In information theory, the binary entropy function, denoted $H(p)$ or $H_b(p)$, is defined as the entropy of a Bernoulli process with probability of success $p$. Mathematically, the Bernoulli trial is modelled as a random variable $X$ that can take on only two values: 0 and 1. The event $X = 1$ is considered a success and the event $X = 0$ is considered a failure. (These two events are mutually exclusive and exhaustive.)
If $\Pr(X = 1) = p$, then $\Pr(X = 0) = 1 - p$ and the entropy of $X$ (in shannons) is given by

$$H(X) = H_b(p) = -p \log_2 p - (1 - p) \log_2 (1 - p),$$

where $0 \log_2 0$ is taken to be 0. The logarithms in this formula are usually taken to the base 2; see binary logarithm.
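As a concrete illustration, the definition translates directly into a few lines of Python; this is a minimal sketch of our own, and the function name `binary_entropy` is hypothetical rather than a standard library routine:

```python
import math

def binary_entropy(p: float) -> float:
    """Binary entropy H_b(p) in shannons (bits), using the convention 0*log2(0) = 0."""
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must lie in [0, 1]")
    if p in (0.0, 1.0):
        return 0.0  # the convention 0*log2(0) = 0 makes the endpoints exactly zero
    return -p * math.log2(p) - (1.0 - p) * math.log2(1.0 - p)

print(binary_entropy(0.5))   # 1.0, the maximum
print(binary_entropy(0.25))  # ~0.8113
```

The explicit endpoint check mirrors the convention $0 \log_2 0 = 0$, which `math.log2` would otherwise turn into a domain error at $p = 0$ or $p = 1$.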
When $p = \tfrac{1}{2}$, the binary entropy function attains its maximum value. This is the case of the unbiased bit, the most common unit of information entropy.
$H(p)$ is distinguished from the entropy function $H(X)$ in that the former takes a single real number as a parameter whereas the latter takes a distribution or a random variable as a parameter. Sometimes the binary entropy function is also written as $H_2(p)$. However, it is different from and should not be confused with the Rényi entropy, which is denoted as $H_2(X)$.
In terms of information theory, entropy is considered to be a measure of the uncertainty in a message. To put it intuitively, suppose $p = 0$. At this probability, the event is certain never to occur, and so there is no uncertainty at all, leading to an entropy of 0. If $p = 1$, the result is again certain, so the entropy is 0 here as well. When $p = 1/2$, the uncertainty is at a maximum; if one were to place a fair bet on the outcome in this case, there is no advantage to be gained with prior knowledge of the probabilities. In this case, the entropy is maximum at a value of 1 bit. Intermediate values fall between these cases; for instance, if $p = 1/4$, there is still a measure of uncertainty on the outcome, but one can still predict the outcome correctly more often than not, so the uncertainty measure, or entropy, is less than 1 full bit.
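For example, the case $p = 1/4$ can be made concrete by evaluating the definition directly (a routine computation, included here for illustration):

$$H_b\!\left(\tfrac{1}{4}\right) = -\tfrac{1}{4}\log_2\tfrac{1}{4} - \tfrac{3}{4}\log_2\tfrac{3}{4} = \tfrac{1}{2} + \tfrac{3}{4}\log_2\tfrac{4}{3} \approx 0.811 \text{ bits},$$

which is indeed strictly between 0 and 1 full bit.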
The derivative of the binary entropy function may be expressed as the negative of the logit function:

$$\frac{d}{dp} H_b(p) = -\operatorname{logit}_2(p) = -\log_2\!\left(\frac{p}{1-p}\right).$$
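As a quick sanity check (our own illustration, reusing the hypothetical `binary_entropy` helper sketched above), the closed form agrees with a central finite difference:

```python
import math

def binary_entropy(p: float) -> float:
    return -p * math.log2(p) - (1 - p) * math.log2(1 - p)

def neg_logit2(p: float) -> float:
    """The claimed derivative: -log2(p / (1 - p))."""
    return -math.log2(p / (1 - p))

p, h = 0.3, 1e-6
numeric = (binary_entropy(p + h) - binary_entropy(p - h)) / (2 * h)
print(numeric, neg_logit2(p))  # both ~1.2224
```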
The Taylor series of the binary entropy function in a neighborhood of 1/2 is

$$H_b(p) = 1 - \frac{1}{2 \ln 2} \sum_{n=1}^{\infty} \frac{(1-2p)^{2n}}{n(2n-1)}$$

for $0 \le p \le 1$.
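To see the expansion in action, a truncated version of the series can be compared against the exact value; this is again a sketch of our own, and the cutoff `terms=50` is an arbitrary truncation point:

```python
import math

def binary_entropy(p: float) -> float:
    return -p * math.log2(p) - (1 - p) * math.log2(1 - p)

def taylor_binary_entropy(p: float, terms: int = 50) -> float:
    """Partial sum of the Taylor series of H_b about p = 1/2."""
    s = sum((1 - 2 * p) ** (2 * n) / (n * (2 * n - 1))
            for n in range(1, terms + 1))
    return 1 - s / (2 * math.log(2))

for p in (0.5, 0.3, 0.1):
    print(p, binary_entropy(p), taylor_binary_entropy(p))
```

Convergence is fastest near $p = 1/2$, where $(1-2p)^{2n}$ vanishes rapidly, and slowest toward the endpoints of the interval.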