Sample entropy (SampEn) is a modification of approximate entropy (ApEn), used for assessing the complexity of physiological time-series signals and for diagnosing diseased states. SampEn has two advantages over ApEn: data length independence and a relatively trouble-free implementation. There is also a small computational difference: in ApEn, the comparison between the template vector (see below) and the rest of the vectors also includes a comparison with itself. This guarantees that the probabilities $C_i'^{m}(r)$ are never zero, so it is always possible to take their logarithm. Because these self-matches lower ApEn values, signals are interpreted as more regular than they actually are. Self-matches are not included in SampEn.
There is a multiscale version of SampEn as well, suggested by Costa and others.
Like approximate entropy (ApEn), sample entropy (SampEn) is a measure of complexity, but it does not include self-matches as ApEn does. For a given embedding dimension $m$, tolerance $r$ and number of data points $N$, SampEn is the negative logarithm of the probability that if two sets of simultaneous data points of length $m$ have distance $< r$, then two sets of simultaneous data points of length $m+1$ also have distance $< r$. We denote it by $\mathrm{SampEn}(m, r, N)$ (or by $\mathrm{SampEn}(m, r, \tau, N)$ when the sampling time $\tau$ is included).
Now assume we have a time-series data set of $N$ points, $\{x_1, x_2, x_3, \ldots, x_N\}$, with a constant time interval $\tau$. We define a template vector of length $m$, such that $X_m(i) = \{x_i, x_{i+1}, x_{i+2}, \ldots, x_{i+m-1}\}$, and take the distance function $d[X_m(i), X_m(j)]$ ($i \neq j$) to be the Chebyshev distance (but it could be any distance function, including Euclidean distance). We count the number of template vector pairs of length $m$ and of length $m+1$ having distance $< r$ and denote the counts by $B$ and $A$ respectively. We define the sample entropy to be

$$\mathrm{SampEn} = -\log \frac{A}{B}$$

where

$A$ = number of template vector pairs of length $m+1$ having $d[X_{m+1}(i), X_{m+1}(j)] < r$

$B$ = number of template vector pairs of length $m$ having $d[X_m(i), X_m(j)] < r$
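As a rough illustration, the counting procedure above can be sketched in plain Python. The function names `chebyshev`, `count_matches` and `sample_entropy` are hypothetical, and the sketch makes three assumptions that the text does not fix: the logarithm is the natural logarithm, both template lengths use the same $N - m$ starting indices, and the result is reported as infinity when either count is zero.

```python
import math


def chebyshev(u, v):
    """Largest absolute component-wise difference between two templates."""
    return max(abs(a - b) for a, b in zip(u, v))


def count_matches(x, length, n_templates, r):
    """Count template pairs (i < j, so no self-matches) with distance < r."""
    templates = [x[i:i + length] for i in range(n_templates)]
    count = 0
    for i in range(n_templates):
        for j in range(i + 1, n_templates):
            if chebyshev(templates[i], templates[j]) < r:
                count += 1
    return count


def sample_entropy(x, m, r):
    """Sketch of SampEn = -log(A / B) for a sequence x."""
    # Assumption: use the same N - m starting indices for lengths m and m + 1,
    # so that every length-(m + 1) template fits inside the data.
    n_templates = len(x) - m
    if n_templates < 2:
        raise ValueError("need at least m + 2 data points")
    b = count_matches(x, m, n_templates, r)      # pairs of length m
    a = count_matches(x, m + 1, n_templates, r)  # pairs of length m + 1
    if a == 0 or b == 0:
        # Assumption: the ratio is undefined or zero here; report infinity.
        return math.inf
    return -math.log(a / b)
```

For a strictly periodic input such as `[1, 2] * 10` with `m = 2` and `r = 0.5`, every pair that matches at length 2 also matches at length 3, so the sketch returns zero. The double loop is quadratic in the number of templates, which is acceptable for a demonstration but slow for long recordings.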
It is clear from the definition that $A$ will always be smaller than or equal to $B$, because any two templates of length $m+1$ that lie within $r$ of each other also lie within $r$ over their first $m$ points. Therefore, $\mathrm{SampEn}(m, r, \tau)$ will always be either zero or positive. A smaller value of $\mathrm{SampEn}$ also indicates more self-similarity in the data set or less noise.
Generally we take the value of $m$ to be $2$ and the value of $r$ to be $0.2 \times \mathrm{std}$, where std stands for the standard deviation, which should be taken over a very large dataset. For instance, an $r$ value of 6 ms is appropriate for sample entropy calculations of heart rate intervals, since this corresponds to $0.2 \times \mathrm{std}$ for a very large population.
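The tolerance rule can be written out as a short Python sketch. The helper names `tolerance_from_std` and `tolerance_from_signal` are hypothetical. Estimating the standard deviation from the analysed signal itself is an assumption made here for convenience; the text recommends a standard deviation taken over a very large dataset.

```python
import statistics


def tolerance_from_std(std, factor=0.2):
    """Tolerance r = factor * std, for a standard deviation known in advance."""
    return factor * std


def tolerance_from_signal(x, factor=0.2):
    """Tolerance r from the signal's own population standard deviation.

    Assumption: the signal is long enough for its standard deviation to be
    a reasonable stand-in for a large-dataset value.
    """
    return tolerance_from_std(statistics.pstdev(x), factor)
```

With these helpers, the 6 ms figure quoted for heart rate intervals corresponds to `tolerance_from_std(30.0)`, that is, a population standard deviation of 30 ms.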
The definition mentioned above is a special case of multiscale SampEn with $\delta = 1$, where $\delta$ is called the skipping parameter. In multiscale SampEn, template vectors are defined with a fixed interval between successive elements, specified by the value of $\delta$. The modified template vector is defined as

$$X_{m,\delta}(i) = \{x_i, x_{i+\delta}, x_{i+2\times\delta}, \ldots, x_{i+(m-1)\times\delta}\}$$

and SampEn can be written as

$$\mathrm{SampEn}(m, r, \delta) = -\log \frac{A_\delta}{B_\delta}$$

where $A_\delta$ and $B_\delta$ are calculated as before.
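A minimal Python sketch of the variant with a skipping parameter follows. The names `delayed_template`, `count_delayed_matches` and `sample_entropy_delta` are hypothetical. As assumptions, the sketch uses the Chebyshev distance, the natural logarithm, and the same $N - m\delta$ starting indices for both template lengths, and it returns infinity when either count is zero.

```python
import math


def delayed_template(x, i, m, delta):
    """Template x[i], x[i + delta], ..., x[i + (m - 1) * delta] (0-based i)."""
    return [x[i + k * delta] for k in range(m)]


def count_delayed_matches(x, m, delta, n_templates, r):
    """Count template pairs (i < j) whose Chebyshev distance is below r."""
    templates = [delayed_template(x, i, m, delta) for i in range(n_templates)]
    count = 0
    for i in range(n_templates):
        for j in range(i + 1, n_templates):
            dist = max(abs(a - b) for a, b in zip(templates[i], templates[j]))
            if dist < r:
                count += 1
    return count


def sample_entropy_delta(x, m, r, delta=1):
    """Sketch of SampEn(m, r, delta) = -log(A_delta / B_delta)."""
    # Assumption: keep only the starting indices for which a template of
    # length m + 1 with spacing delta still fits inside the data.
    n_templates = len(x) - m * delta
    if n_templates < 2:
        raise ValueError("not enough data points for this m and delta")
    b = count_delayed_matches(x, m, delta, n_templates, r)
    a = count_delayed_matches(x, m + 1, delta, n_templates, r)
    if a == 0 or b == 0:
        return math.inf  # assumption: no defined finite value in this case
    return -math.log(a / b)
```

Calling `sample_entropy_delta(x, m, r, delta=1)` reduces to the ordinary definition, since the template then consists of $m$ consecutive points.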
Sample entropy can be implemented easily in many different programming languages. Example implementations are available for Matlab and for R.