Modified Richardson iteration is an iterative method for solving a system of linear equations. It was proposed by Lewis Fry Richardson in his 1910 paper, and is similar to the Jacobi and Gauss–Seidel methods.
We seek the solution to a set of linear equations, expressed in matrix terms as
A x = b . The Richardson iteration is
x ( k + 1 ) = x ( k ) + ω ( b − A x ( k ) ) , where ω is a scalar parameter that has to be chosen such that the sequence x ( k ) converges.
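With linear-algebra support the iteration is only a few lines; the following is a minimal NumPy sketch (the function name richardson, the tolerance, and the test matrix are illustrative, not from the source):

```python
import numpy as np

def richardson(A, b, omega, x0=None, tol=1e-10, max_iter=10_000):
    """Modified Richardson iteration: x_{k+1} = x_k + omega * (b - A @ x_k)."""
    x = np.zeros_like(b, dtype=float) if x0 is None else x0.astype(float)
    for _ in range(max_iter):
        r = b - A @ x                 # residual; zero exactly at a solution
        if np.linalg.norm(r) < tol:
            break
        x = x + omega * r
    return x

# Illustrative symmetric positive-definite system
A = np.array([[4.0, 1.0], [1.0, 3.0]])
b = np.array([1.0, 2.0])
x = richardson(A, b, omega=0.25)
```

The residual b − A x ( k ) doubles as a stopping criterion, since it vanishes exactly at the fixed point.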
It is easy to see that the method has the correct fixed points, because if it converges, then x ( k + 1 ) ≈ x ( k ) and x ( k ) has to approximate a solution of A x = b .
Subtracting the exact solution x and introducing the notation e ( k ) = x ( k ) − x for the error, we obtain a recurrence for the errors
e ( k + 1 ) = e ( k ) − ω A e ( k ) = ( I − ω A ) e ( k ) . Thus,
∥ e ( k + 1 ) ∥ = ∥ ( I − ω A ) e ( k ) ∥ ≤ ∥ I − ω A ∥ ∥ e ( k ) ∥ , for any vector norm and the corresponding induced matrix norm. Hence, if ∥ I − ω A ∥ < 1 , the method converges.
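This sufficient condition is easy to check numerically for a given ω ; a small sketch (the matrix and the value of ω are illustrative):

```python
import numpy as np

A = np.array([[4.0, 1.0], [1.0, 3.0]])   # illustrative symmetric matrix
omega = 0.25

M = np.eye(2) - omega * A                # iteration matrix I - omega*A
# Induced 2-norm of M; a value below 1 guarantees convergence.
norm_M = np.linalg.norm(M, 2)
```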
Suppose that A is diagonalizable and that ( λ_j , v_j ) are the eigenvalues and eigenvectors of A . The error converges to 0 if | 1 − ω λ_j | < 1 for all eigenvalues λ_j . If, e.g., all eigenvalues are positive, this can be guaranteed if ω is chosen such that 0 < ω < 2 / λ_max ( A ) . The optimal choice, minimizing all | 1 − ω λ_j | , is ω = 2 / ( λ_min ( A ) + λ_max ( A ) ) , which gives the simplest Chebyshev iteration.
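For a symmetric positive-definite A the optimal ω and the resulting contraction factor can be computed directly from the spectrum; a hedged NumPy sketch (the matrix is illustrative):

```python
import numpy as np

A = np.array([[4.0, 1.0], [1.0, 3.0]])   # illustrative SPD matrix: all eigenvalues positive
eigs = np.linalg.eigvalsh(A)             # eigenvalues in ascending order
lam_min, lam_max = eigs[0], eigs[-1]

omega_opt = 2.0 / (lam_min + lam_max)    # minimizes max_j |1 - omega * lambda_j|

# Contraction factor of the error: spectral radius of I - omega*A.
rho = max(abs(1.0 - omega_opt * lam) for lam in eigs)
```

With this choice rho equals ( λ_max − λ_min ) / ( λ_max + λ_min ), so the iteration converges faster when A is well conditioned.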
If there are both positive and negative eigenvalues, the method will diverge for any ω : whichever sign ω takes, | 1 − ω λ_j | > 1 for the eigenvalues of the opposite sign, so any component of the initial error e ( 0 ) along the corresponding eigenvectors is amplified.
Consider minimizing the function F ( x ) = ½ ∥ Ã x − b̃ ∥₂² . Since this is a convex function, a sufficient condition for optimality is that the gradient is zero ( ∇ F ( x ) = 0 ), which gives rise to the equation
Ãᵀ Ã x = Ãᵀ b̃ . Define A = Ãᵀ Ã and b = Ãᵀ b̃ . Because of the form of A , it is a positive semi-definite matrix, so it has no negative eigenvalues.
A step of gradient descent is
x ( k + 1 ) = x ( k ) − t ∇ F ( x ( k ) ) = x ( k ) − t ( A x ( k ) − b ) , which is equivalent to the Richardson iteration with t = ω .
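The equivalence can be verified numerically: gradient descent on F and Richardson iteration on the normal equations, started from the same point with t = ω , produce the same iterates. A sketch under stated assumptions (the random Ã, b̃, and step size are illustrative; Ã is assumed to have full column rank so that A is positive definite):

```python
import numpy as np

rng = np.random.default_rng(0)
At = rng.standard_normal((20, 3))    # plays the role of A-tilde (full column rank assumed)
bt = rng.standard_normal(20)         # plays the role of b-tilde

A = At.T @ At                        # normal-equations matrix, positive definite here
b = At.T @ bt

t = 1.0 / np.linalg.eigvalsh(A)[-1]  # step size t = omega, safely below 2 / lambda_max

x_gd = np.zeros(3)                   # gradient-descent iterate
x_ri = np.zeros(3)                   # Richardson iterate
for _ in range(5000):
    x_gd = x_gd - t * (At.T @ (At @ x_gd - bt))  # gradient step on F
    x_ri = x_ri + t * (b - A @ x_ri)             # Richardson step with omega = t
```

Both loops compute the same update, since ∇ F ( x ) = Ãᵀ ( Ã x − b̃ ) = A x − b .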