Chou's invariance theorem

Updated on Sep 29, 2024

Chou's invariance theorem, named after Kuo-Chen Chou, is a result in multivariate statistics that is used in bioinformatics and cheminformatics. It concerns the case where a distance that standard statistical theory would define as a Mahalanobis distance cannot be defined that way because the relevant covariance matrix is singular. One remedy is to reduce the dimension of the multivariate space until the covariance matrix becomes invertible, which can be achieved simply by omitting one or more of the original coordinates until a space of full rank is reached. Chou's invariance theorem states that it does not matter which coordinates are removed: the resulting distance values are the same.

Background

When the Mahalanobis distance or covariant discriminant is used to calculate the similarity of two proteins from their amino acid compositions, the 20 constituent components are subject to a normalization condition, which leads to a divergence problem. To avoid it, a dimension-reducing operation is needed: one of the 20 components is left out so that the remaining 19 components are completely independent. This raises two questions: which of the 20 components should be removed, and will the result differ if a different component is removed? The same questions arise when the calculation is based on the (20 + λ)-dimensional pseudo amino acid composition, where λ is an integer. More generally, calculating the Mahalanobis distance or covariant discriminant between two vectors that each have Ω normalized components always requires the dimension-reducing operation, and so always raises these questions. Chou's invariance theorem was developed in 1995 to address them.
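
The link between the normalization condition and the divergence problem can be made explicit with a short calculation. The notation here is an assumption introduced for illustration and is not taken from the original paper: x_i^{(n)} is the i-th component of the n-th of N sample vectors, \bar{x}_i is its sample mean, and S is the sample covariance matrix.

```latex
% Assumed notation: x_i^{(n)} = i-th component of sample n, S = sample covariance.
\begin{aligned}
\sum_{i=1}^{\Omega} x_i^{(n)} = 1 \ \text{for all } n
  &\;\Longrightarrow\; \sum_{j=1}^{\Omega} \bigl(x_j^{(n)} - \bar{x}_j\bigr) = 0, \\
\sum_{j=1}^{\Omega} S_{ij}
  &= \frac{1}{N-1} \sum_{n=1}^{N} \bigl(x_i^{(n)} - \bar{x}_i\bigr)
     \sum_{j=1}^{\Omega} \bigl(x_j^{(n)} - \bar{x}_j\bigr) = 0
     \quad \text{for every } i .
\end{aligned}
```

Every row of S therefore sums to zero, so the all-ones vector lies in the null space of S, det S = 0, and the inverse required by the Mahalanobis distance does not exist in the full Ω-dimensional space.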

Essence

According to Chou's invariance theorem, the value of the Mahalanobis distance or covariant discriminant remains the same regardless of which one of the components is left out. Any one of the normalized constituent components can therefore be omitted to overcome the divergence problem without changing the final result.
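
The statement can be checked numerically. The following is a minimal sketch, not code from any published implementation: it assumes synthetic 20-component compositions drawn from a Dirichlet distribution (so each vector sums to 1), and the helper name `reduced_mahalanobis_sq` is hypothetical. It computes the squared Mahalanobis distance from a query vector to the sample mean, dropping each component in turn.

```python
import numpy as np


def reduced_mahalanobis_sq(x, mean, cov, drop):
    """Squared Mahalanobis distance after omitting component `drop`.

    Hypothetical helper for illustration only.
    """
    keep = [i for i in range(len(x)) if i != drop]
    diff = (x - mean)[keep]
    cov_reduced = cov[np.ix_(keep, keep)]
    # Solve cov_reduced @ z = diff rather than forming an explicit inverse.
    return float(diff @ np.linalg.solve(cov_reduced, diff))


# Synthetic data (an assumption): Dirichlet samples are normalized to sum to 1,
# mimicking 20-component amino acid compositions.
rng = np.random.default_rng(0)
samples = rng.dirichlet(np.ones(20), size=200)
mean = samples.mean(axis=0)
cov = np.cov(samples, rowvar=False)  # singular 20 x 20 matrix
query = rng.dirichlet(np.ones(20))

# Drop each of the 20 components in turn and compare the results.
distances = [reduced_mahalanobis_sq(query, mean, cov, k) for k in range(20)]
print("spread across choices of dropped component:", np.ptp(distances))
```

Under these assumptions the 20 values should agree up to floating-point rounding, so the printed spread is expected to be negligible compared with the distances themselves.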

Proof

The rigorous mathematical proof for the theorem was given in the appendix of a paper by Chou.
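
The following outline is a standard linear-algebra argument for why such an invariance holds. It is offered only as a sketch and is not the proof given in Chou's paper. The notation is assumed for illustration: d is the difference between two normalized vectors (so its components sum to zero), S is the full covariance matrix, E_k is the (Ω − 1) × Ω matrix that deletes the k-th coordinate, and the reduced matrices S_k are assumed to be invertible.

```latex
% Assumed notation: E_k deletes coordinate k; d_k = E_k d; S_k = E_k S E_k^{T}.
\begin{aligned}
&\text{Since the omitted coordinate equals minus the sum of the others, there is an invertible } T \\
&\text{with } d_j = T\, d_k \text{ for every zero-sum } d, \text{ and hence } S_j = T\, S_k\, T^{T}. \\[4pt]
d_j^{T} S_j^{-1} d_j
  &= (T d_k)^{T} \bigl(T S_k T^{T}\bigr)^{-1} (T d_k) \\
  &= d_k^{T}\, T^{T} T^{-T}\, S_k^{-1}\, T^{-1} T\, d_k \\
  &= d_k^{T} S_k^{-1} d_k .
\end{aligned}
```

In words, switching the omitted component amounts to an invertible linear change of coordinates on the reduced space, and the Mahalanobis distance is unchanged by such a transformation.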

Applications

The theorem has been used in predicting protein subcellular localization, identifying apoptosis protein subcellular location, predicting protein structural classification, as well as identifying various other important attributes for proteins.

References

Chou's invariance theorem Wikipedia

(Text) CC BY-SA
