Rahul Sharma (Editor)

Centering matrix

In mathematics and multivariate statistics, the centering matrix is a symmetric and idempotent matrix, which when multiplied with a vector has the same effect as subtracting the mean of the components of the vector from every component.

Definition

The centering matrix of size n is defined as the n-by-n matrix

$$C_n = I_n - \frac{1}{n} O$$

where $I_n$ is the identity matrix of size $n$ and $O$ is an $n$-by-$n$ matrix of all 1's. This can also be written as:

$$C_n = I_n - \frac{1}{n} \mathbf{1}\mathbf{1}^\top$$

where $\mathbf{1}$ is the column vector of $n$ ones and $^\top$ denotes matrix transpose.

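For example:

$$C_1 = \begin{bmatrix} 0 \end{bmatrix},$$

$$C_2 = \begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix} - \frac{1}{2} \begin{bmatrix} 1 & 1 \\ 1 & 1 \end{bmatrix} = \begin{bmatrix} \frac{1}{2} & -\frac{1}{2} \\ -\frac{1}{2} & \frac{1}{2} \end{bmatrix},$$

$$C_3 = \begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{bmatrix} - \frac{1}{3} \begin{bmatrix} 1 & 1 & 1 \\ 1 & 1 & 1 \\ 1 & 1 & 1 \end{bmatrix} = \begin{bmatrix} \frac{2}{3} & -\frac{1}{3} & -\frac{1}{3} \\ -\frac{1}{3} & \frac{2}{3} & -\frac{1}{3} \\ -\frac{1}{3} & -\frac{1}{3} & \frac{2}{3} \end{bmatrix}$$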
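The definition translates directly into a few lines of NumPy. The following is a minimal sketch; the function name `centering_matrix` is chosen here for illustration and does not come from any library.

```python
import numpy as np


def centering_matrix(n: int) -> np.ndarray:
    """Return the n-by-n centering matrix: identity minus (1/n) times the all-ones matrix."""
    return np.eye(n) - np.ones((n, n)) / n


# The three small cases worked out in the text.
C1 = centering_matrix(1)
C2 = centering_matrix(2)
C3 = centering_matrix(3)

print(C1)
print(C2)
print(C3)
```

Printing `C2` and `C3` should reproduce the matrices shown in the example, with the fractions in decimal form.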

Properties

Given a column vector $v$ of size $n$, the centering property of $C_n$ can be expressed as

$$C_n v = v - \left(\tfrac{1}{n} \mathbf{1}^\top v\right) \mathbf{1}$$

where $\tfrac{1}{n} \mathbf{1}^\top v$ is the mean of the components of $v$.

$C_n$ is symmetric positive semi-definite.

$C_n$ is idempotent, so that $C_n^k = C_n$ for $k = 1, 2, \ldots$. Once the mean has been removed, it is zero and removing it again has no effect.

$C_n$ is singular. The effects of applying the transformation $C_n v$ cannot be reversed.

$C_n$ has the eigenvalue 1 of multiplicity $n - 1$ and the eigenvalue 0 of multiplicity 1.

$C_n$ has a nullspace of dimension 1, along the vector $\mathbf{1}$.

$C_n$ is a projection matrix. That is, $C_n v$ is a projection of $v$ onto the $(n - 1)$-dimensional subspace that is orthogonal to the nullspace spanned by $\mathbf{1}$. (This is the subspace of all $n$-vectors whose components sum to zero.)
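The properties listed above can be checked numerically. The sketch below does so for one arbitrarily chosen size, $n = 5$, and a random test vector. The variable names and the seed are illustrative assumptions, and a numerical check for a single $n$ is a sanity test rather than a proof.

```python
import numpy as np

n = 5  # arbitrary size chosen for the check
C = np.eye(n) - np.ones((n, n)) / n

rng = np.random.default_rng(0)  # fixed seed so the check is repeatable
v = rng.normal(size=n)

# Centering property: C v equals v with its mean subtracted from every component.
is_centering = np.allclose(C @ v, v - v.mean())

# Symmetry and idempotence.
is_symmetric = np.allclose(C, C.T)
is_idempotent = np.allclose(C @ C, C)

# Eigenvalues of a symmetric matrix, returned in ascending order:
# one eigenvalue near 0 and n - 1 eigenvalues near 1.
eigenvalues = np.linalg.eigvalsh(C)

# Rank n - 1 means C is singular with a one-dimensional nullspace.
rank = np.linalg.matrix_rank(C, tol=1e-10)

# The all-ones vector lies in the nullspace.
ones_in_nullspace = np.allclose(C @ np.ones(n), 0.0)

# The centered vector has components summing to zero.
sums_to_zero = np.isclose((C @ v).sum(), 0.0)
```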

Application

Although multiplication by the centering matrix is not a computationally efficient way of removing the mean from a vector, it forms an analytical tool that conveniently and succinctly expresses mean removal. It can be used not only to remove the mean of a single vector, but also of multiple vectors stored in the rows or columns of a matrix. For an $m$-by-$n$ matrix $X$, the multiplication $C_m X$ removes the means from each of the $n$ columns, while $X C_n$ removes the means from each of the $m$ rows.
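As an illustration of left versus right multiplication, the sketch below centers the columns and the rows of a small matrix and compares the results with direct mean subtraction, which is how the operation would normally be computed in practice. The matrix entries and the helper name `centering_matrix` are made up for the example.

```python
import numpy as np


def centering_matrix(n: int) -> np.ndarray:
    """Illustrative helper: identity minus (1/n) times the all-ones matrix."""
    return np.eye(n) - np.ones((n, n)) / n


# A small m-by-n example matrix (m = 2 rows, n = 3 columns); values are arbitrary.
X = np.array([[1.0, 2.0, 6.0],
              [3.0, 8.0, 4.0]])
m, n = X.shape

# Left multiplication by C_m removes the mean of each column.
col_centered = centering_matrix(m) @ X

# Right multiplication by C_n removes the mean of each row.
row_centered = X @ centering_matrix(n)

# Direct equivalents, which avoid forming an m-by-m or n-by-n matrix.
col_direct = X - X.mean(axis=0, keepdims=True)
row_direct = X - X.mean(axis=1, keepdims=True)
```

The matrix products cost on the order of $m^2 n$ and $m n^2$ operations, whereas the direct subtraction costs on the order of $mn$, which is why the centering matrix is mainly an analytical device.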

The centering matrix provides in particular a succinct way to express the scatter matrix, $S = (X - \mu \mathbf{1}^\top)(X - \mu \mathbf{1}^\top)^\top$, of a data sample $X$ (with the $n$ observations stored as columns), where $\mu = \tfrac{1}{n} X \mathbf{1}$ is the sample mean. Because $C_n$ is symmetric and idempotent, the scatter matrix can be written more compactly as

$$S = X C_n (X C_n)^\top = X C_n C_n X^\top = X C_n X^\top.$$
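The identity can be verified numerically. The sketch below assumes the data matrix stores one observation per column and uses random data with arbitrary dimensions; all variable names are illustrative.

```python
import numpy as np

rng = np.random.default_rng(1)  # fixed seed so the check is repeatable
m, n = 3, 10                    # m variables, n observations (one per column)
X = rng.normal(size=(m, n))

ones = np.ones((n, 1))
mu = X @ ones / n               # sample mean as an m-by-1 column

# Scatter matrix from the definition: subtract the mean from every column first.
D = X - mu @ ones.T
S_direct = D @ D.T

# Compact form using the centering matrix.
C = np.eye(n) - np.ones((n, n)) / n
S_compact = X @ C @ X.T

# Dividing by n - 1 gives the usual unbiased sample covariance, which
# np.cov computes when rows are variables and columns are observations.
S_cov = np.cov(X) * (n - 1)
```

All three of `S_direct`, `S_compact` and `S_cov` should agree up to floating-point rounding.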

$C_n$ is the covariance matrix of the multinomial distribution, in the special case where the parameters of that distribution are $k = n$ and $p_1 = p_2 = \cdots = p_n = \tfrac{1}{n}$.
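This can be checked against the standard expression for the multinomial covariance, $k\,(\operatorname{diag}(p) - p p^\top)$, which has variances $k p_i (1 - p_i)$ on the diagonal and covariances $-k p_i p_j$ elsewhere. The sketch below evaluates that expression for an arbitrarily chosen $n = 4$; the variable names are illustrative.

```python
import numpy as np

n = 4                     # arbitrary size chosen for the check
k = n                     # number of trials equals the number of categories
p = np.full(n, 1.0 / n)   # uniform category probabilities

# Multinomial covariance: k * (diag(p) - p p^T).
multinomial_cov = k * (np.diag(p) - np.outer(p, p))

# Centering matrix of the same size.
C = np.eye(n) - np.ones((n, n)) / n

matches = np.allclose(multinomial_cov, C)
```

With these parameters each diagonal entry is $1 - \tfrac{1}{n}$ and each off-diagonal entry is $-\tfrac{1}{n}$, exactly the entries of $C_n$.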

References

Centering matrix Wikipedia