In mathematics and multivariate statistics, the centering matrix is a symmetric and idempotent matrix that, when multiplied by a vector, has the same effect as subtracting the mean of the vector's components from every component.
The centering matrix of size n is defined as the n-by-n matrix
$$C_n = I_n - \frac{1}{n} O,$$
where $I_n$ is the identity matrix of size n and $O$ is an n-by-n matrix of all 1's. This can also be written as
$$C_n = I_n - \frac{1}{n} \mathbf{1}\mathbf{1}^\top,$$
where $\mathbf{1}$ is the column vector of n ones and $\top$ denotes matrix transpose.
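For example,
$$C_1 = \begin{bmatrix} 0 \end{bmatrix},$$
$$C_2 = \begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix} - \frac{1}{2} \begin{bmatrix} 1 & 1 \\ 1 & 1 \end{bmatrix} = \begin{bmatrix} \frac{1}{2} & -\frac{1}{2} \\ -\frac{1}{2} & \frac{1}{2} \end{bmatrix},$$
$$C_3 = \begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{bmatrix} - \frac{1}{3} \begin{bmatrix} 1 & 1 & 1 \\ 1 & 1 & 1 \\ 1 & 1 & 1 \end{bmatrix} = \begin{bmatrix} \frac{2}{3} & -\frac{1}{3} & -\frac{1}{3} \\ -\frac{1}{3} & \frac{2}{3} & -\frac{1}{3} \\ -\frac{1}{3} & -\frac{1}{3} & \frac{2}{3} \end{bmatrix}.$$
Given a column vector $\mathbf{v}$ of size n, the centering property of $C_n$ can be expressed as
$$C_n \mathbf{v} = \mathbf{v} - \left( \tfrac{1}{n} \mathbf{1}^\top \mathbf{v} \right) \mathbf{1},$$
where $\tfrac{1}{n} \mathbf{1}^\top \mathbf{v}$ is the mean of the components of $\mathbf{v}$.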
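The centering property can be checked numerically. The following is a minimal sketch in Python with NumPy; the helper name `centering_matrix` and the sample vector are illustrative assumptions, not part of any library.

```python
import numpy as np

def centering_matrix(n: int) -> np.ndarray:
    """Hypothetical helper: build C_n = I_n - (1/n) * 1 1^T."""
    return np.eye(n) - np.ones((n, n)) / n

# Illustrative vector whose mean is 3
v = np.array([1.0, 2.0, 6.0])
C = centering_matrix(v.size)

centered = C @ v          # mean removal written as a matrix product
direct = v - v.mean()     # mean removal done directly

# The two results agree up to floating-point rounding
agree = np.allclose(centered, direct)
print(centered, agree)
```

Here the mean of the sample vector is 3, so both computations yield a vector close to (−2, −1, 3), whose components sum to zero.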
$C_n$ is symmetric positive semi-definite.
$C_n$ is idempotent, so that $C_n^k = C_n$ for $k = 1, 2, \ldots$. Once the mean has been removed, it is zero and removing it again has no effect.
$C_n$ is singular. The effects of applying the transformation $C_n \mathbf{v}$ cannot be reversed.
$C_n$ has the eigenvalue 1 of multiplicity n − 1 and the eigenvalue 0 of multiplicity 1.
$C_n$ has a nullspace of dimension 1, along the vector $\mathbf{1}$.
$C_n$ is a projection matrix. That is, $C_n \mathbf{v}$ is a projection of $\mathbf{v}$ onto the (n − 1)-dimensional subspace that is orthogonal to the nullspace $\mathbf{1}$. (This is the subspace of all n-vectors whose components sum to zero.)
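The properties listed above can be spot-checked for a particular size. This sketch assumes NumPy and picks n = 4 arbitrarily; it illustrates the properties for one value of n and is not a proof.

```python
import numpy as np

n = 4  # arbitrary illustrative size
C = np.eye(n) - np.ones((n, n)) / n

is_symmetric = np.allclose(C, C.T)
is_idempotent = np.allclose(C @ C, C)

# eigvalsh is intended for symmetric matrices and returns eigenvalues
# in ascending order: one value near 0 followed by n - 1 values near 1
eigenvalues = np.linalg.eigvalsh(C)

# Rank n - 1 confirms that C is singular with a one-dimensional nullspace
rank = np.linalg.matrix_rank(C)

# The all-ones vector is mapped to (numerically) zero
ones_in_nullspace = np.allclose(C @ np.ones(n), 0.0)

print(is_symmetric, is_idempotent, rank, ones_in_nullspace)
print(np.round(eigenvalues, 12))
```

Because all eigenvalues are 0 or 1, the same check also illustrates positive semi-definiteness.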
Although multiplication by the centering matrix is not a computationally efficient way of removing the mean from a vector, it forms an analytical tool that conveniently and succinctly expresses mean removal. It can be used not only to remove the mean of a single vector, but also of multiple vectors stored in the rows or columns of a matrix. For an m-by-n matrix $X$, the multiplication $C_m X$ removes the means from each of the n columns, while $X C_n$ removes the means from each of the m rows.
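Column and row centering can be compared against direct mean subtraction. This is a sketch assuming NumPy; the helper `centering_matrix` and the 2-by-3 example matrix are hypothetical choices made for illustration.

```python
import numpy as np

def centering_matrix(n: int) -> np.ndarray:
    """Hypothetical helper: build C_n = I_n - (1/n) * 1 1^T."""
    return np.eye(n) - np.ones((n, n)) / n

# Illustrative matrix with m = 2 rows and n = 3 columns
X = np.array([[1.0, 2.0, 3.0],
              [4.0, 6.0, 11.0]])
m, n = X.shape

col_centered = centering_matrix(m) @ X   # left-multiply: every column gets zero mean
row_centered = X @ centering_matrix(n)   # right-multiply: every row gets zero mean

# Direct equivalents using broadcasting, which avoid building C at all
col_direct = X - X.mean(axis=0, keepdims=True)
row_direct = X - X.mean(axis=1, keepdims=True)

cols_match = np.allclose(col_centered, col_direct)
rows_match = np.allclose(row_centered, row_direct)
print(cols_match, rows_match)
```

In practice the broadcasting form is the cheaper one: it needs a number of operations proportional to the size of the matrix, whereas the explicit matrix products also require forming and multiplying by an m-by-m or n-by-n matrix.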
The centering matrix provides in particular a succinct way to express the scatter matrix,
$$S = (X - \mu \mathbf{1}^\top)(X - \mu \mathbf{1}^\top)^\top,$$
of a data sample $X$ whose n observations are stored as columns, where $\mu = \tfrac{1}{n} X \mathbf{1}$ is the sample mean. The centering matrix allows us to express the scatter matrix more compactly as
$$S = X C_n (X C_n)^\top = X C_n C_n X^\top = X C_n X^\top,$$
where the second equality uses the symmetry of $C_n$ and the third uses its idempotence.
$C_n$ is the covariance matrix of the multinomial distribution, in the special case where the parameters of that distribution are $k = n$ and $p_1 = p_2 = \cdots = p_n = \tfrac{1}{n}$.
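Both statements above can be checked numerically. The sketch below assumes NumPy, uses randomly generated data with a fixed seed purely for illustration, and relies on the standard multinomial covariance formula $k(\operatorname{diag}(p) - p p^\top)$ for k trials with probability vector p.

```python
import numpy as np

rng = np.random.default_rng(0)   # fixed seed; the data are illustrative only
m, n = 3, 50                     # m variables, n observations stored as columns
X = rng.normal(size=(m, n))

ones = np.ones((n, 1))
C = np.eye(n) - ones @ ones.T / n

# Scatter matrix from its definition
mu = X @ ones / n                        # sample mean, shape (m, 1)
deviations = X - mu @ ones.T
S_direct = deviations @ deviations.T

# Scatter matrix in the compact form X C_n X^T
S_compact = X @ C @ X.T
scatter_matches = np.allclose(S_direct, S_compact)

# Multinomial covariance k * (diag(p) - p p^T) with k = n and p_i = 1/n
p = np.full(n, 1.0 / n)
multinomial_cov = n * (np.diag(p) - np.outer(p, p))
multinomial_matches = np.allclose(multinomial_cov, C)

print(scatter_matches, multinomial_matches)
```

The resulting scatter matrix is m-by-m, one row and column per variable, regardless of the number of observations n.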