Log probability

In computer science, the use of log probabilities means representing probabilities on a logarithmic scale instead of in the standard [0, 1] interval. This has practical advantages because of the way computers approximate real numbers with floating-point values, and because addition has historically been cheaper for computers than multiplication.

A log probability is simply the logarithm of a probability. The logarithm is not defined at zero, so a probability of zero has no finite log probability; in floating-point arithmetic it is conventionally represented as negative infinity. Since the logarithm of a number in the interval (0, 1) is negative, negative log probabilities are often used instead. In that case the log probabilities in the following formulas are negated. Any base can be selected for the logarithm.

$$x' = \log(x) \in \mathbb{R}, \qquad y' = \log(y) \in \mathbb{R}$$

The product of probabilities x ⋅ y corresponds to addition in logarithmic space.

$$\log(x \cdot y) = \log(x) + \log(y) = x' + y'.$$
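As a minimal sketch of this identity, assuming natural logarithms and plain Python floats (the helper name `log_product` is hypothetical, chosen only for illustration):

```python
import math

def log_product(log_probs):
    """Log of a product of probabilities, given their natural logs."""
    # Multiplying probabilities corresponds to adding their logarithms.
    return sum(log_probs)

x, y = 0.2, 0.5
x_log, y_log = math.log(x), math.log(y)

joint_log = log_product([x_log, y_log])
# Converting back with exp recovers x * y up to rounding error.
joint = math.exp(joint_log)
```

Here `joint` agrees with `x * y` to within floating-point rounding.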

The sum of probabilities x + y is more involved to compute in logarithmic space, requiring the computation of one exponential and one logarithm.

However, in many applications the multiplication of probabilities (giving the probability of all independent events occurring) is used more often than their addition (giving the probability of at least one of several mutually exclusive events occurring). Additionally, the cost of computing the addition can be avoided in some situations by simply using the highest probability as an approximation. Since probabilities are non-negative, the highest probability is a lower bound on the sum. The same relationship is used in reverse to obtain a continuous approximation of the max function.
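The quality of this approximation can be made explicit. As a sketch, assuming x and y are non-negative and not both zero, and writing m for the larger of the two log probabilities (a symbol introduced here only for illustration):

$$m \le \log(x + y) \le m + \log 2, \qquad m = \max(x', y')$$

The left inequality holds because x + y is at least max(x, y), and the right because x + y is at most 2 max(x, y). Approximating the sum by its larger operand is therefore off by at most log 2 in logarithmic space.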

Representing probabilities in this way has two main advantages:

Speed. Since multiplication is more expensive than addition, taking the product of a large number of probabilities can be faster if they are represented in log form. (The conversion to log form is expensive, but is only incurred once.)
Accuracy. The use of log probabilities improves numerical stability when the probabilities are very small, because a product of many small probabilities can underflow to zero in floating-point arithmetic while the sum of their logarithms remains representable.
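The accuracy point can be illustrated with a short sketch, assuming IEEE 754 double-precision floats (the Python default) and Python 3.8 or later for `math.prod`. The values are made up for the demonstration:

```python
import math

# 200 independent events, each with probability 0.01.
# The true product is 1e-400, far below the smallest positive double.
probs = [0.01] * 200

direct = math.prod(probs)                    # underflows to 0.0
log_total = sum(math.log(p) for p in probs)  # stays finite (about -921.03)
```

The direct product loses all information about the result, while the log-space sum still represents it and can be compared against other log probabilities.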

The use of log probabilities is widespread in several fields of computer science, such as information theory and natural language processing, because the negative log probability of an outcome is its surprisal: the minimum length of the message that specifies the outcome in an optimally efficient code.
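As a small illustration of surprisal, the sketch below assumes base-2 logarithms, so that the result is measured in bits. The function name `surprisal_bits` is hypothetical:

```python
import math

def surprisal_bits(p):
    """Surprisal (negative base-2 log probability) of an outcome, in bits."""
    if not 0.0 < p <= 1.0:
        raise ValueError("p must be in the interval (0, 1]")
    return -math.log2(p)

fair_coin = surprisal_bits(0.5)      # 1.0 bit
one_in_eight = surprisal_bits(1 / 8) # 3.0 bits
```

Less probable outcomes have higher surprisal, which matches the intuition that they need longer messages in an efficient code.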

Addition in log space

$$
\begin{aligned}
\log(x + y) &= \log(x + x \cdot y / x) \\
&= \log(x + x \cdot \exp(\log(y / x))) \\
&= \log(x \cdot (1 + \exp(\log(y) - \log(x)))) \\
&= \log(x) + \log(1 + \exp(\log(y) - \log(x))) \\
&= x' + \log(1 + \exp(y' - x'))
\end{aligned}
$$

The formula above is more accurate than $\log(e^{x'} + e^{y'})$, provided one takes advantage of the asymmetry in the addition formula: $x'$ should be the larger (least negative) of the two operands. This also produces the correct behavior if one of the operands is floating-point negative infinity, which corresponds to a probability of zero.

$$-\infty + \log(1 + \exp(y' - (-\infty))) = -\infty + \infty$$

This quantity is indeterminate, and will result in NaN.

$$x' + \log(1 + \exp(-\infty - x')) = x' + 0$$

This is the desired answer.

Note that the above formula alone will incorrectly produce an indeterminate result in the case where both arguments are − ∞ . This should be checked for separately to return − ∞ .

Note also that for numerical reasons, one should use a function that computes log ⁡ ( 1 + x ) (log1p) directly.
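The points above can be combined into one routine. The following is a minimal sketch, not a reference implementation. It assumes natural-log probabilities stored as Python floats, and the name `log_add` is hypothetical:

```python
import math

def log_add(a, b):
    """Return log(exp(a) + exp(b)) for natural-log probabilities a and b."""
    # Both probabilities are zero: handle separately to avoid -inf - (-inf) = NaN.
    if a == -math.inf and b == -math.inf:
        return -math.inf
    # Put the larger (least negative) operand first, as the formula requires.
    hi, lo = (a, b) if a >= b else (b, a)
    # exp(lo - hi) is in [0, 1], so it cannot overflow; log1p keeps precision
    # when that term is tiny.
    return hi + math.log1p(math.exp(lo - hi))

total = log_add(math.log(0.2), math.log(0.3))   # log of 0.5, up to rounding
tiny = log_add(-1000.0, -1000.0)                # -1000 + log(2)
```

The second call shows why the rearranged formula matters: `math.exp(-1000.0)` underflows to `0.0`, so evaluating the naive expression would require the logarithm of zero, whereas the routine above returns a finite value.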

References

Log probability Wikipedia

(Text) CC BY-SA