Loss of significance is an undesirable effect in calculations using finite-precision arithmetic. It occurs when an operation on two numbers increases relative error substantially more than it increases absolute error, for example in subtracting two nearly equal numbers (known as catastrophic cancellation). The effect is that the number of significant digits in the result is reduced unacceptably. Ways to avoid this effect are studied in numerical analysis.
Demonstration of the problem
The effect can be demonstrated with decimal numbers. The following example demonstrates loss of significance for a decimal floating-point data type with 10 significant digits:
Consider the decimal number
0.1234567891234567890

A floating-point representation of this number on a machine that keeps 10 floating-point digits would be

0.1234567891

which is fairly close when measuring the error as a percentage of the value. It is very different when measured in order of precision. The first is accurate to 10 × 10⁻²⁰, while the second is only accurate to 10 × 10⁻¹⁰.
Now perform the calculation
0.1234567891234567890 − 0.1234567890000000000

The answer, accurate to 10 significant digits, is

0.0000000001234567890

However, on the 10-digit floating-point machine, the calculation yields

0.1234567891 − 0.1234567890 = 0.0000000001

In both cases the result is accurate to the same order of magnitude as the inputs (10⁻²⁰ and 10⁻¹⁰, respectively). In the second case, the answer seems to have one significant digit, which would amount to loss of significance. However, in computer floating-point arithmetic, all operations can be viewed as being performed on antilogarithms, for which the rules for significant figures indicate that the number of significant figures remains the same as the smallest number of significant figures in the mantissas. The way to indicate this and represent the answer to 10 significant figures is:

1.000000000 × 10⁻¹⁰

Workarounds
It is possible to do computations using an exact fractional representation of rational numbers and keep all significant digits, but this is often prohibitively slow compared with floating-point arithmetic. Furthermore, it usually only postpones the problem: what if the data are accurate to only ten digits? The same effect will occur.
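For example, Python's standard library can contrast the two approaches (an illustrative sketch, not part of the original text): decimal emulates the 10-digit machine above, while fractions keeps the subtraction exact.

    from decimal import Decimal, getcontext
    from fractions import Fraction

    getcontext().prec = 10                     # a 10-significant-digit machine

    x = +Decimal("0.1234567891234567890")      # unary + rounds to 0.1234567891
    y = +Decimal("0.1234567890000000000")      # rounds to 0.1234567890
    print(x - y)                               # 1E-10: one significant digit left

    # Exact rational arithmetic keeps every digit, at a cost in speed.
    f = Fraction("0.1234567891234567890") - Fraction("0.1234567890000000000")
    print(f == Fraction("0.0000000001234567890"))   # True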
One of the most important parts of numerical analysis is to avoid or minimize loss of significance in calculations. If the underlying problem is well-posed, there should be a stable algorithm for solving it.
Loss of significant bits
Let x and y be positive normalized floating point numbers.
In the subtraction x − y, r significant bits are lost, where

q ≤ r ≤ p,

2^(−p) ≤ 1 − y/x ≤ 2^(−q)

for some positive integers p and q.
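The bounds q and p can be computed directly from this definition (an illustrative Python sketch; the function name is ours):

    import math

    def lost_bits_bounds(x: float, y: float):
        """Return (q, p) with q <= r <= p, from 2**(-p) <= 1 - y/x <= 2**(-q)."""
        d = 1.0 - y / x
        if d <= 0.0:
            raise ValueError("expects 0 < y < x")
        t = -math.log2(d)
        return math.floor(t), math.ceil(t)

    print(lost_bits_bounds(1.0, 0.9999999))   # (23, 24): about 23 bits cancel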
Instability of the quadratic equation
For example, consider the quadratic equation

ax² + bx + c = 0,

with the two exact solutions:

x = (−b ± √(b² − 4ac)) / (2a)
This formula may not always produce an accurate result. For example, when 4ac is small in magnitude compared with b², the square root √(b² − 4ac) is nearly equal to |b|, so forming −b ± √(b² − 4ac) subtracts two nearly equal numbers for one of the two roots; which root is affected depends on the sign of b.
The case a = 1, b = 200, c = −0.000015 will serve to illustrate the problem:

x² + 200x − 0.000015 = 0.

We have

√(b² − 4ac) = √(200² + 4 × 1 × 0.000015) ≈ 200.00000015.

In real arithmetic, the roots are

(−200 − 200.00000015) / 2 ≈ −200.000000075,

(−200 + 200.00000015) / 2 ≈ 0.000000075.

In 10-digit floating-point arithmetic the square root rounds to 200.0000001, and the computed roots are

(−200 − 200.0000001) / 2 = −200.0000000,

(−200 + 200.0000001) / 2 = 0.00000005.
Notice that the solution of greater magnitude is accurate to ten digits, but the first nonzero digit of the solution of lesser magnitude is wrong.
Because of the subtraction between −b and the square root of the discriminant, the quadratic formula as written does not constitute a stable algorithm for calculating the two roots.
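The failure is easy to reproduce with Python's decimal module set to 10 significant digits (an illustrative sketch, not part of the original text):

    from decimal import Decimal, getcontext

    getcontext().prec = 10                       # emulate 10-digit arithmetic

    a, b, c = Decimal(1), Decimal(200), Decimal("-0.000015")
    sqrt_disc = (b * b - 4 * a * c).sqrt()       # 200.0000001
    print((-b - sqrt_disc) / (2 * a))   # -200.0000000: accurate to ~ten digits
    print((-b + sqrt_disc) / (2 * a))   # 5E-8: true root is 7.5E-8, first digit wrong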
A better algorithm
A careful floating-point computer implementation combines several strategies to produce a robust result. Assuming the discriminant, b² − 4ac, is positive and b is nonzero, the computation would be as follows:

x₁ = (−b − sgn(b) √(b² − 4ac)) / (2a),

x₂ = 2c / (−b − sgn(b) √(b² − 4ac)) = c / (a x₁).

Here sgn denotes the sign function, where sgn(b) is 1 if b is positive and −1 if b is negative. Because −b and −sgn(b) √(b² − 4ac) have the same sign, computing x₁ involves no cancellation, and the other root is recovered from the product of the roots, x₁ x₂ = c/a.
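A direct transcription into Python might look as follows (an illustrative sketch; the function name is ours, and the caller must guarantee b ≠ 0 and a positive discriminant):

    import math

    def stable_quadratic_roots(a: float, b: float, c: float):
        """Roots of a*x**2 + b*x + c = 0, assuming b != 0 and b*b - 4*a*c > 0."""
        sqrt_disc = math.sqrt(b * b - 4.0 * a * c)
        # -b and -sign(b)*sqrt_disc have the same sign, so no cancellation here.
        x1 = (-b - math.copysign(sqrt_disc, b)) / (2.0 * a)
        x2 = c / (a * x1)   # product of the roots is c/a (Vieta's formulas)
        return x1, x2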
To illustrate the instability of the standard quadratic formula versus this variant formula, consider a quadratic equation with roots 1.786737601482363 and 1.149782767465722 × 10⁻⁸. To sixteen significant figures, roughly corresponding to double-precision accuracy on a typical computer, the coefficients are a = 1.000000000000000, b = −1.786737612980191 and c = 2.054360104167458 × 10⁻⁸.
Using the standard quadratic formula and maintaining sixteen significant figures at each step, the computation yields the larger root in full, x₁ = 1.786737601482363, but the smaller root only as x₂ = 1.1497827…× 10⁻⁸, with the trailing digits contaminated by rounding error. Note how cancellation between −b and √(b² − 4ac), two quantities that agree to roughly eight significant figures, has resulted in x₂ being computed to only about eight significant digits of accuracy.
Using the variant formula instead, x₂ = c / (a x₁) yields x₂ = 1.149782767465722 × 10⁻⁸. Note the retention of all significant digits for x₂.
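In IEEE 754 double precision the same contrast can be reproduced directly (an illustrative sketch using the coefficients above):

    import math

    a = 1.0
    b = -1.786737612980191
    c = 2.054360104167458e-8

    sqrt_disc = math.sqrt(b * b - 4.0 * a * c)

    # Standard formula: the smaller root suffers cancellation in -b - sqrt_disc.
    naive_x2 = (-b - sqrt_disc) / (2.0 * a)

    # Variant formula: compute the larger root first, then use x1 * x2 == c/a.
    x1 = (-b - math.copysign(sqrt_disc, b)) / (2.0 * a)
    stable_x2 = c / (a * x1)

    print(naive_x2)    # correct to only about eight significant digits
    print(stable_x2)   # ~1.149782767465722e-08: essentially full accuracy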
Note that while the above formulation avoids catastrophic cancellation between −b and √(b² − 4ac), there remains a form of cancellation between the terms b² and 4ac of the discriminant, which can still lead to loss of up to half of the correct significant digits. Avoiding this requires computing the discriminant in arithmetic of twice the working precision, for example with the help of a fused multiply-add (FMA) operation, which evaluates x·y + z with a single rounding.
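One way to obtain the needed extra precision is to recover the rounding errors of the two products with a fused multiply-add, along the lines of an algorithm attributed to Kahan (a sketch under stated assumptions: math.fma requires Python 3.13+, and the 3|d| ≥ p + q test is one common guard for deciding when the correction matters):

    import math

    def accurate_discriminant(a: float, b: float, c: float) -> float:
        """b*b - 4*a*c, with the rounding errors of both products recovered via FMA."""
        p = b * b
        q = 4.0 * a * c
        d = p - q
        if 3.0 * abs(d) >= p + q:
            return d                    # no dangerous cancellation: d is accurate
        e = math.fma(b, b, -p)          # exact rounding error of b*b
        f = math.fma(4.0 * a, c, -q)    # exact rounding error of (4*a)*c
        return d + (e - f)              # corrected difference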
To illustrate this, consider the following quadratic equation, adapted from Kahan (2004):

94906265.625 x² − 189812534 x + 94906268.375 = 0
This equation has discriminant b² − 4ac = 7.5625 and the exact roots

x₁ = 1.000000028975958…, x₂ = 1.
However, when the roots are computed with the standard formula in IEEE 754 double-precision arithmetic, corresponding to 15 to 17 significant digits of accuracy, the computed values of x₁ and x₂ are both false after the eighth significant digit. This is despite the fact that, superficially, the problem seems to require only eleven significant digits of accuracy for its solution.
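A short sketch (ours, not Kahan's code; the values shown assume round-to-nearest-even) makes the failure concrete: the naively computed double-precision discriminant cancels all the way to zero, collapsing the two roots into one:

    import math
    from fractions import Fraction

    a, b, c = 94906265.625, -189812534.0, 94906268.375   # all exactly representable

    disc = b * b - 4.0 * a * c        # true value is 7.5625, but this yields 0.0
    root = (-b - math.sqrt(disc)) / (2.0 * a)
    print(disc, root)                 # 0.0 and ~1.0000000145: a spurious double root

    # Exact arithmetic confirms that x = 1 is a root: a + b + c == 0.
    print(Fraction(a) + Fraction(b) + Fraction(c) == 0)   # True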