In mathematics and multivariate statistics, the centering matrix is a symmetric and idempotent matrix which, when multiplied with a vector, has the same effect as subtracting the mean of the vector's components from every component.
The centering matrix of size $n$ is defined as the $n$-by-$n$ matrix
$$C_n = I_n - \tfrac{1}{n} O$$
where $I_n$ is the identity matrix of size $n$ and $O$ is an $n$-by-$n$ matrix of all 1's. This can also be written as:
$$C_n = I_n - \tfrac{1}{n} \mathbf{1}\mathbf{1}^\top$$
where $\mathbf{1}$ is the column-vector of $n$ ones and where $\top$ denotes matrix transpose.
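For example,
$$C_1 = \begin{bmatrix} 0 \end{bmatrix},$$
$$C_2 = \begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix} - \frac{1}{2} \begin{bmatrix} 1 & 1 \\ 1 & 1 \end{bmatrix} = \begin{bmatrix} \frac{1}{2} & -\frac{1}{2} \\ -\frac{1}{2} & \frac{1}{2} \end{bmatrix},$$
$$C_3 = \begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{bmatrix} - \frac{1}{3} \begin{bmatrix} 1 & 1 & 1 \\ 1 & 1 & 1 \\ 1 & 1 & 1 \end{bmatrix} = \begin{bmatrix} \frac{2}{3} & -\frac{1}{3} & -\frac{1}{3} \\ -\frac{1}{3} & \frac{2}{3} & -\frac{1}{3} \\ -\frac{1}{3} & -\frac{1}{3} & \frac{2}{3} \end{bmatrix}.$$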
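The definition can be sketched in a few lines of code. The following is a minimal illustration using only the Python standard library and exact rational arithmetic; the function name `centering_matrix` is a hypothetical helper chosen for this example, not part of any library.

```python
from fractions import Fraction

def centering_matrix(n):
    """Return C_n = I_n - (1/n) * (all-ones matrix) as nested lists of Fractions."""
    # Entry (i, j) is 1 - 1/n on the diagonal and -1/n elsewhere.
    return [[Fraction(int(i == j)) - Fraction(1, n) for j in range(n)]
            for i in range(n)]

# Print the matrices for n = 1, 2, 3 so they can be compared with the text.
for n in (1, 2, 3):
    print(f"C_{n} =")
    for row in centering_matrix(n):
        print("  ", [str(x) for x in row])
```

Using `Fraction` rather than floats keeps entries such as 2/3 exact, which makes the comparison with the matrices written out above straightforward.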
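Given a column-vector $\mathbf{v}$ of size $n$, the centering property of $C_n$ can be expressed as
$$C_n \mathbf{v} = \mathbf{v} - \left(\tfrac{1}{n} \mathbf{1}^\top \mathbf{v}\right) \mathbf{1}$$
where $\tfrac{1}{n} \mathbf{1}^\top \mathbf{v}$ is the mean of the components of $\mathbf{v}$.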
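As a quick numerical check of the centering property, the sketch below multiplies a small vector by the centering matrix and compares the result with subtracting the mean directly. The helper names `centering_matrix` and `matvec` and the sample values are assumptions made for illustration only.

```python
from fractions import Fraction

def centering_matrix(n):
    """Return C_n = I_n - (1/n) * (all-ones matrix) as nested lists of Fractions."""
    return [[Fraction(int(i == j)) - Fraction(1, n) for j in range(n)]
            for i in range(n)]

def matvec(A, v):
    """Multiply matrix A (list of rows) by vector v (flat list)."""
    return [sum(a * x for a, x in zip(row, v)) for row in A]

v = [Fraction(x) for x in (4, 8, 15, 16, 23)]  # arbitrary sample data
n = len(v)
mean = sum(v) / n

centered_by_matrix = matvec(centering_matrix(n), v)  # C_n v
centered_directly = [x - mean for x in v]            # v - mean * 1

print(centered_by_matrix == centered_directly)
```

Both routes produce the same vector, and its components sum to zero.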
$C_n$ is symmetric positive semi-definite.
$C_n$ is idempotent, so that $C_n^k = C_n$, for $k = 1, 2, \ldots$. Once the mean has been removed, it is zero and removing it again has no effect.
$C_n$ is singular. The effects of applying the transformation $C_n \mathbf{v}$ cannot be reversed.
$C_n$ has the eigenvalue 1 of multiplicity $n - 1$ and eigenvalue 0 of multiplicity 1.
$C_n$ has a nullspace of dimension 1, along the vector $\mathbf{1}$.
$C_n$ is a projection matrix. That is, $C_n \mathbf{v}$ is a projection of $\mathbf{v}$ onto the $(n - 1)$-dimensional subspace that is orthogonal to the nullspace spanned by $\mathbf{1}$. (This is the subspace of all $n$-vectors whose components sum to zero.)
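Several of these properties can be checked exactly for a small case. The sketch below uses $n = 4$ and standard-library Python only; the helpers `centering_matrix` and `matmul` are hypothetical names introduced for this example. It does not compute eigenvalues directly. Instead it relies on the fact that an idempotent matrix has only the eigenvalues 0 and 1, so its trace equals the multiplicity of the eigenvalue 1.

```python
from fractions import Fraction

def centering_matrix(n):
    """Return C_n = I_n - (1/n) * (all-ones matrix) as nested lists of Fractions."""
    return [[Fraction(int(i == j)) - Fraction(1, n) for j in range(n)]
            for i in range(n)]

def matmul(A, B):
    """Multiply two matrices given as lists of rows."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)]
            for row in A]

n = 4
C = centering_matrix(n)
ones = [[Fraction(1)] for _ in range(n)]       # column vector of ones
w = [[Fraction(x)] for x in (1, -1, 2, -2)]    # components sum to zero

is_symmetric = C == [list(col) for col in zip(*C)]
is_idempotent = matmul(C, C) == C
ones_in_nullspace = matmul(C, ones) == [[0] for _ in range(n)]
fixes_zero_sum = matmul(C, w) == w             # projection leaves w unchanged
trace = sum(C[i][i] for i in range(n))         # equals n - 1 for C_n

print(is_symmetric, is_idempotent, ones_in_nullspace, fixes_zero_sum, trace)
```

A single value of $n$ is of course only a sanity check, not a proof of the general statements.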
Although multiplication by the centering matrix is not a computationally efficient way of removing the mean from a vector, it forms an analytical tool that conveniently and succinctly expresses mean removal. It can be used not only to remove the mean of a single vector, but also of multiple vectors stored in the rows or columns of a matrix. For an $m$-by-$n$ matrix $X$, the multiplication $C_m X$ removes the means from each of the $n$ columns, while $X C_n$ removes the means from each of the $m$ rows.
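The difference between left and right multiplication can be illustrated with a small 2-by-3 matrix. This is a minimal sketch under the assumption that matrices are stored as lists of rows; `centering_matrix`, `matmul`, and the sample entries of `X` are chosen for illustration only.

```python
from fractions import Fraction

def centering_matrix(n):
    """Return C_n = I_n - (1/n) * (all-ones matrix) as nested lists of Fractions."""
    return [[Fraction(int(i == j)) - Fraction(1, n) for j in range(n)]
            for i in range(n)]

def matmul(A, B):
    """Multiply two matrices given as lists of rows."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)]
            for row in A]

X = [[1, 2, 3],
     [4, 5, 9]]                 # m = 2 rows, n = 3 columns
m, n = len(X), len(X[0])

col_centered = matmul(centering_matrix(m), X)   # C_m X: each column has mean 0
row_centered = matmul(X, centering_matrix(n))   # X C_n: each row has mean 0

col_sums = [sum(col) for col in zip(*col_centered)]
row_sums = [sum(row) for row in row_centered]
print(col_sums, row_sums)
```

Here the row means of `X` are 2 and 6, so `row_centered` has rows (-1, 0, 1) and (-2, -1, 3).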
The centering matrix provides in particular a succinct way to express the scatter matrix, $S = (X - \mu \mathbf{1}^\top)(X - \mu \mathbf{1}^\top)^\top$, of a data sample $X$, where $\mu = \tfrac{1}{n} X \mathbf{1}$ is the sample mean. The centering matrix allows us to express the scatter matrix more compactly as
$$S = X C_n (X C_n)^\top = X C_n C_n X^\top = X C_n X^\top,$$
where the second equality uses the symmetry of $C_n$ and the third its idempotence.
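The two expressions for the scatter matrix can be compared on a small data sample. The sketch below assumes the $n$ observations are stored as the columns of $X$, which is consistent with the formula for the sample mean above; the helper names and the sample values are illustrative assumptions.

```python
from fractions import Fraction

def centering_matrix(n):
    """Return C_n = I_n - (1/n) * (all-ones matrix) as nested lists of Fractions."""
    return [[Fraction(int(i == j)) - Fraction(1, n) for j in range(n)]
            for i in range(n)]

def matmul(A, B):
    """Multiply two matrices given as lists of rows."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)]
            for row in A]

def transpose(A):
    return [list(col) for col in zip(*A)]

X = [[1, 2, 3],
     [4, 5, 9]]                 # 2 variables (rows), n = 3 observations (columns)
n = len(X[0])

# Direct definition: subtract the sample mean from every observation first.
mu = [Fraction(sum(row), n) for row in X]
D = [[x - mu_i for x in row] for row, mu_i in zip(X, mu)]
S_direct = matmul(D, transpose(D))

# Compact form: S = X C_n X^T.
S_compact = matmul(matmul(X, centering_matrix(n)), transpose(X))

print(S_direct == S_compact)
```

For this sample both computations give the matrix with rows (2, 5) and (5, 14).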
$C_n$ is the covariance matrix of the multinomial distribution, in the special case where the parameters of that distribution are $k = n$ and $p_1 = p_2 = \cdots = p_n = \tfrac{1}{n}$.
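This can be verified with a short calculation. It relies on the standard formula for the covariance matrix of a multinomial distribution with $k$ trials and probability vector $\mathbf{p} = (p_1, \ldots, p_n)^\top$, namely $k\left(\operatorname{diag}(\mathbf{p}) - \mathbf{p}\mathbf{p}^\top\right)$, which is assumed here rather than derived. Substituting $k = n$ and $\mathbf{p} = \tfrac{1}{n}\mathbf{1}$ gives

$$k\left(\operatorname{diag}(\mathbf{p}) - \mathbf{p}\mathbf{p}^\top\right) = n\left(\tfrac{1}{n} I_n - \tfrac{1}{n^2} \mathbf{1}\mathbf{1}^\top\right) = I_n - \tfrac{1}{n} \mathbf{1}\mathbf{1}^\top = C_n.$$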