Suvarna Garge (Editor)

Scatter matrix

Updated on
Edit
Like
Comment
Share on FacebookTweet on TwitterShare on LinkedInShare on Reddit

In multivariate statistics and probability theory, the scatter matrix is a statistic that is used to make estimates of the covariance matrix, for instance of the multivariate normal distribution.

Contents

Definition

Given n samples of m-dimensional data, represented as the m-by-n matrix, X = [ x 1 , x 2 , , x n ] , the sample mean is

x ¯ = 1 n j = 1 n x j

where x j is the jth column of X .

The scatter matrix is the m-by-m positive semi-definite matrix

S = j = 1 n ( x j x ¯ ) ( x j x ¯ ) T = j = 1 n ( x j x ¯ ) ( x j x ¯ ) = ( j = 1 n x j x j T ) n x ¯ x ¯ T

where T denotes matrix transpose, and multiplication is with regards to the outer product. The scatter matrix may be expressed more succinctly as

S = X C n X T

where C n is the n-by-n centering matrix.

Application

The maximum likelihood estimate, given n samples, for the covariance matrix of a multivariate normal distribution can be expressed as the normalized scatter matrix

C M L = 1 n S .

When the columns of X are independently sampled from a multivariate normal distribution, then S has a Wishart distribution.

References

Scatter matrix Wikipedia