Sample entropy


Sample entropy (SampEn) is a modification of approximate entropy (ApEn), used to assess the complexity of physiological time-series signals and to diagnose diseased states. SampEn has two advantages over ApEn: data-length independence and a relatively trouble-free implementation. There is also a small computational difference: in ApEn, the comparison between the template vector (see below) and the rest of the vectors includes a comparison with itself. This guarantees that the probabilities C_i^m(r) are never zero, so it is always possible to take their logarithm. But because these self-matches lower the ApEn value, the signal is interpreted as being more regular than it actually is. Self-matches are not included in SampEn.
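The effect of self-matches can be seen in a toy count. The snippet below (hypothetical data, chosen only for illustration) counts matching length-2 templates under the Chebyshev distance, once ApEn-style (self-matches included) and once SampEn-style (excluded):

```python
import numpy as np

# Toy illustration (hypothetical data): count template matches of length m = 2
# under the Chebyshev distance, with and without self-matches.
x = np.array([1.0, 1.1, 1.0, 1.1, 1.0])
m, r = 2, 0.05
templates = [x[i:i + m] for i in range(len(x) - m + 1)]

# ApEn-style count: every template is compared with all templates, itself included.
with_self = sum(
    np.max(np.abs(templates[i] - templates[j])) < r
    for i in range(len(templates))
    for j in range(len(templates))
)

# SampEn-style count: self-comparisons (i == j) are excluded.
without_self = sum(
    np.max(np.abs(templates[i] - templates[j])) < r
    for i in range(len(templates))
    for j in range(len(templates))
    if i != j
)

# Each template trivially matches itself, so the ApEn-style count is
# inflated by exactly len(templates) guaranteed matches.
```

Because every template is within distance zero of itself, the inflation is exactly one guaranteed match per template, which biases the estimated probabilities upward.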

There is a multiscale version of SampEn as well, suggested by Costa and others.

Definition

Like approximate entropy (ApEn), sample entropy (SampEn) is a measure of complexity, but unlike ApEn it does not count self-similar patterns. For a given embedding dimension m, tolerance r, and number of data points N, SampEn is the negative logarithm of the probability that two sequences of m consecutive data points with distance < r from each other still have distance < r when extended to length m + 1. It is denoted SampEn(m, r, N) (or SampEn(m, r, τ, N) when the sampling time τ is included).

Now assume we have a time series of length N, {x_1, x_2, x_3, ..., x_N}, with a constant time interval τ. We define a template vector of length m as X_m(i) = {x_i, x_{i+1}, x_{i+2}, ..., x_{i+m−1}}, and take the distance function d[X_m(i), X_m(j)] (i ≠ j) to be the Chebyshev distance (although it could be any distance function, including the Euclidean distance). We count the number of template-vector pairs of length m and of length m + 1 having d[X_m(i), X_m(j)] < r, and denote these counts by B and A respectively. The sample entropy is then defined as

SampEn = −log(A / B)

Where

A = number of template-vector pairs of length m + 1 having d[X_{m+1}(i), X_{m+1}(j)] < r

B = number of template-vector pairs of length m having d[X_m(i), X_m(j)] < r

It is clear from the definition that A will always be smaller than or equal to B. Therefore, SampEn(m, r, τ) is always either zero or positive. A smaller value of SampEn indicates more self-similarity in the data set, or less noise.

Generally, the value of m is taken to be 2 and the value of r to be 0.2 × std, where std stands for the standard deviation, which should be taken over a very large data set. For instance, an r value of 6 ms is appropriate for sample entropy calculations of heart-rate intervals, since this corresponds to 0.2 × std for a very large population.
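The definition above can be sketched directly in Python with NumPy. The function name and defaults are illustrative, not a reference implementation; it follows the convention of using the same N − m templates for both lengths so that A and B are comparable, and it excludes self-matches as the definition requires:

```python
import numpy as np

def sampen(x, m=2, r=None):
    """Sketch of sample entropy -log(A/B): B counts template pairs of
    length m within tolerance r (Chebyshev distance), A the corresponding
    pairs of length m + 1. Self-matches are excluded. Defaults follow the
    common choices m = 2 and r = 0.2 * std."""
    x = np.asarray(x, dtype=float)
    N = len(x)
    if r is None:
        r = 0.2 * np.std(x)

    def count_pairs(length):
        # Use the same N - m templates for both lengths so the counts
        # A and B are directly comparable.
        templates = np.array([x[i:i + length] for i in range(N - m)])
        count = 0
        for i in range(len(templates)):
            for j in range(i + 1, len(templates)):  # j > i: no self-matches
                if np.max(np.abs(templates[i] - templates[j])) < r:
                    count += 1
        return count

    B = count_pairs(m)
    A = count_pairs(m + 1)
    return -np.log(A / B)
```

A perfectly regular series (for example, a monotone ramp, where every length-m match extends to a length-(m + 1) match) gives A = B and hence SampEn = 0, while noisier data gives a positive value, consistent with the interpretation above.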

Multiscale SampEn

The definition above is a special case of multiscale SampEn with δ = 1, where δ is called the skipping parameter. In multiscale SampEn, template vectors are defined with an interval of δ between their elements: X_{m,δ}(i) = {x_i, x_{i+δ}, x_{i+2δ}, ..., x_{i+(m−1)δ}}, and the sample entropy can be written as SampEn(m, r, δ) = −log(A_δ / B_δ), where A_δ and B_δ are calculated as before.
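The skipping parameter only changes how templates are extracted; the counting is unchanged. A minimal sketch, with an illustrative function name and the same assumed defaults as before:

```python
import numpy as np

def sampen_multiscale(x, m=2, r=None, delta=1):
    """Sketch of multiscale SampEn with skipping parameter delta:
    templates X_{m,delta}(i) = (x_i, x_{i+delta}, ..., x_{i+(m-1)delta}).
    delta = 1 reduces to ordinary SampEn."""
    x = np.asarray(x, dtype=float)
    N = len(x)
    if r is None:
        r = 0.2 * np.std(x)
    n = N - m * delta  # same number of templates for both lengths

    def count_pairs(length):
        span = (length - 1) * delta  # index span covered by one template
        templates = np.array([x[i:i + span + 1:delta] for i in range(n)])
        count = 0
        for i in range(n):
            for j in range(i + 1, n):  # j > i: no self-matches
                if np.max(np.abs(templates[i] - templates[j])) < r:
                    count += 1
        return count

    B = count_pairs(m)      # B_delta
    A = count_pairs(m + 1)  # A_delta
    return -np.log(A / B)
```

Note the strided slice x[i : i + span + 1 : delta], which picks out the elements x_i, x_{i+δ}, ..., x_{i+(m−1)δ} of the modified template vector.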

Implementation

Sample entropy can be implemented easily in many different programming languages; example implementations exist in Matlab and in R.
