Cramér's V

In statistics, Cramér's V (sometimes referred to as Cramér's phi and denoted as φ_c) is a measure of association between two nominal variables, giving a value between 0 and +1 (inclusive). It is based on Pearson's chi-squared statistic and was published by Harald Cramér in 1946.

Usage and interpretation

φ_c is the intercorrelation of two discrete variables and may be used with variables having two or more levels. φ_c is a symmetrical measure: it does not matter which variable is placed in the columns and which in the rows. The order of the rows and columns does not matter either, so φ_c may be used with nominal data types or higher (ordered, numerical, etc.).

Cramér's V may also be applied to goodness of fit chi-squared models when there is a 1×k table (i.e. r = 1). In this case k is taken as the number of possible outcomes, and V functions as a measure of tendency towards a single outcome.

Cramér's V varies from 0 (corresponding to no association between the variables) to 1 (complete association) and can reach 1 only when each variable is completely determined by the other.

φ_c² is the mean square canonical correlation between the variables.

In the case of a 2×2 contingency table Cramér's V is equal to the Phi coefficient.
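As an illustrative check of the 2×2 case, the Python sketch below computes the phi coefficient from its usual closed form for a table with cells a, b (first row) and c, d (second row), and compares it with V obtained from the chi-squared statistic. The function names phi_2x2 and cramers_v_2x2 and the example counts are hypothetical, chosen only for this illustration. Because V is a square root it is never negative, so the comparison is made against the absolute value of phi. The sketch assumes no row or column total is zero.

```python
from math import sqrt

def phi_2x2(a, b, c, d):
    """Phi coefficient of the 2x2 table [[a, b], [c, d]] (signed)."""
    return (a * d - b * c) / sqrt((a + b) * (c + d) * (a + c) * (b + d))

def cramers_v_2x2(a, b, c, d):
    """Cramér's V of the same table, via Pearson's chi-squared statistic."""
    n = a + b + c + d
    rows, cols = (a + b, c + d), (a + c, b + d)
    cells = ((a, b), (c, d))
    chi2 = sum(
        (cells[i][j] - rows[i] * cols[j] / n) ** 2 / (rows[i] * cols[j] / n)
        for i in range(2)
        for j in range(2)
    )
    # For a 2x2 table min(k - 1, r - 1) = 1, so V = sqrt(chi2 / n).
    return sqrt(chi2 / n)

# Made-up counts for illustration only.
phi = phi_2x2(30, 10, 15, 45)
v = cramers_v_2x2(30, 10, 15, 45)
print(abs(phi), v)  # the two values agree up to rounding error
```

Swapping the two rows flips the sign of phi but leaves V unchanged, which is consistent with V ignoring the order of rows and columns.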

Note that, because chi-squared values tend to increase with the number of cells, the greater the difference between r (rows) and k (columns), the more likely φ_c is to tend to 1 without strong evidence of a meaningful correlation.

V may be viewed as the association between two variables as a percentage of their maximum possible variation.

Calculation

Let a sample of size $n$ of the simultaneously distributed variables $A$ and $B$, for $i = 1, \ldots, r$ and $j = 1, \ldots, k$, be given by the frequencies

$n_{ij}$ = number of times the values $(A_i, B_j)$ were observed.

The chi-squared statistic then is:

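$$\chi^2 = \sum_{i,j} \frac{\left(n_{ij} - \frac{n_{i.}\, n_{.j}}{n}\right)^2}{\frac{n_{i.}\, n_{.j}}{n}}$$

where $n_{i.} = \sum_j n_{ij}$ is the number of times the value $A_i$ was observed and $n_{.j} = \sum_i n_{ij}$ is the number of times the value $B_j$ was observed.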

Cramér's V is computed by taking the square root of the chi-squared statistic divided by the sample size and the minimum dimension minus 1:

$$V = \sqrt{\frac{\varphi^2}{\min(k - 1,\, r - 1)}} = \sqrt{\frac{\chi^2 / n}{\min(k - 1,\, r - 1)}}$$

where:

$\varphi$ is the phi coefficient, so that $\varphi^2 = \chi^2 / n$,

$\chi^2$ is derived from Pearson's chi-squared test,

$n$ is the grand total of observations,

$k$ is the number of columns, and

$r$ is the number of rows.
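The calculation above can be sketched in a few lines of plain Python. This is a minimal illustration rather than a reference implementation: the function names chi_squared and cramers_v are hypothetical, the table is assumed to be a list of rows of counts with at least two rows and two columns, and no row or column total may be zero (otherwise an expected frequency would be zero).

```python
from math import sqrt

def chi_squared(table):
    """Return (chi2, n) for an r x k contingency table given as a list of rows."""
    row_totals = [sum(row) for row in table]        # n_i.
    col_totals = [sum(col) for col in zip(*table)]  # n_.j
    n = sum(row_totals)                             # grand total
    chi2 = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            chi2 += (observed - expected) ** 2 / expected
    return chi2, n

def cramers_v(table):
    """Cramér's V: sqrt((chi2 / n) / min(k - 1, r - 1))."""
    r, k = len(table), len(table[0])
    chi2, n = chi_squared(table)
    return sqrt((chi2 / n) / min(k - 1, r - 1))

# Made-up tables for illustration only.
independent = [[10, 20], [20, 40]]           # rows are proportional
perfect = [[5, 0, 0], [0, 7, 0], [0, 0, 3]]  # each variable determines the other

print(cramers_v(independent))  # 0.0
print(cramers_v(perfect))      # approximately 1
```

The two example tables correspond to the endpoints of the range: proportional rows give no association, and a table in which each row has a single non-zero cell in its own column gives complete association.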

The p-value for the significance of V is the same one that is calculated using Pearson's chi-squared test.

The formula for the variance of V=φ_c is known.

In R, the function cramersV() from the lsr package calculates V using the chisq.test function from the stats package.

Bias correction

Cramér's V can be a heavily biased estimator of its population counterpart and will tend to overestimate the strength of association. A bias correction, using the above notation, is given by

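$$\tilde{V} = \sqrt{\frac{\tilde{\varphi}^2}{\min(\tilde{k} - 1,\, \tilde{r} - 1)}}$$

where

$$\tilde{\varphi}^2 = \max\left(0,\ \varphi^2 - \frac{(k - 1)(r - 1)}{n - 1}\right)$$

and

$$\tilde{k} = k - \frac{(k - 1)^2}{n - 1}, \qquad \tilde{r} = r - \frac{(r - 1)^2}{n - 1}$$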

Then $\tilde{V}$ estimates the same population quantity as Cramér's V but with typically much smaller mean squared error. The rationale for the correction is that under independence, $E[\varphi^2] = \frac{(k - 1)(r - 1)}{n - 1}$.
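The correction can be sketched in Python as follows. The function name corrected_cramers_v and the example counts are hypothetical, and the sketch assumes a table with at least two rows and two columns, no zero row or column totals, and a sample size large enough that the corrected dimensions stay above 1.

```python
from math import sqrt

def corrected_cramers_v(table):
    """Return (V, V_tilde): uncorrected and bias-corrected Cramér's V."""
    r, k = len(table), len(table[0])
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    chi2 = sum(
        (table[i][j] - row_totals[i] * col_totals[j] / n) ** 2
        / (row_totals[i] * col_totals[j] / n)
        for i in range(r)
        for j in range(k)
    )
    phi2 = chi2 / n
    v = sqrt(phi2 / min(k - 1, r - 1))

    # Subtract the expected value of phi^2 under independence, floored at 0.
    phi2_tilde = max(0.0, phi2 - (k - 1) * (r - 1) / (n - 1))
    # Shrink the table dimensions in the same spirit.
    k_tilde = k - (k - 1) ** 2 / (n - 1)
    r_tilde = r - (r - 1) ** 2 / (n - 1)
    v_tilde = sqrt(phi2_tilde / min(k_tilde - 1, r_tilde - 1))
    return v, v_tilde

# Made-up small sample with a weak apparent association.
v, v_tilde = corrected_cramers_v([[12, 8], [9, 11]])
print(v, v_tilde)  # v is positive, v_tilde is 0.0
```

In this example the sample value of φ² is smaller than its expected value under independence, so the corrected estimate is floored at zero even though the uncorrected V is positive.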

References

Cramér's V Wikipedia

(Text) CC BY-SA
