Pearson's chi-square test (χ²) is one of several chi-square tests, statistical procedures whose results are evaluated by reference to the chi-square distribution. It tests the null hypothesis that the relative frequencies of observed events follow a specified frequency distribution. The events must be mutually exclusive. One of the simplest examples is the hypothesis that an ordinary six-sided die is "fair", i.e., that all six outcomes are equally likely. The chi-square statistic is calculated by taking the difference between each observed frequency and the corresponding theoretical frequency, squaring it, dividing by the theoretical frequency, and summing the results over all categories:

$$\chi^2 = \sum_{i=1}^{k} \frac{(O_i - E_i)^2}{E_i}$$

- O_i = an observed frequency
- E_i = an expected (theoretical) frequency, asserted by the null hypothesis
- k = the number of categories (possible outcomes)
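The calculation described above (square each difference between observed and expected frequency, divide by the expected frequency, sum over categories) can be sketched in a few lines of Python. This is a minimal illustration and not a full test procedure: the function name `pearson_chi_square` and the roll counts are hypothetical, invented only to exercise the fair-die example.

```python
def pearson_chi_square(observed, expected):
    """Return the sum of (O - E)^2 / E over all categories."""
    if len(observed) != len(expected):
        raise ValueError("observed and expected must have the same length")
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


# Hypothetical counts for faces 1..6 from 60 rolls (made-up data).
observed = [5, 8, 9, 8, 10, 20]

# Under the null hypothesis of a fair die, each face is expected n/6 times.
n = sum(observed)
expected = [n / 6] * 6

statistic = pearson_chi_square(observed, expected)
print(f"chi-square statistic: {statistic:.2f}")
```

A larger statistic indicates a larger discrepancy between the observed counts and those asserted by the null hypothesis; judging significance additionally requires comparing the value against the chi-square distribution.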
Pearson's chi-square is used to assess two types of comparison: tests of goodness of fit and tests of independence. A test of goodness of fit assesses whether an observed frequency distribution differs from a theoretical distribution. A test of independence assesses whether paired observations on two variables are independent of each other, for example, whether people from different regions differ in the frequency with which they report that they support a political candidate.
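For a test of independence, the observations are usually arranged in a contingency table, and the expected count for each cell is conventionally taken as (row total × column total) / grand total. That construction is the standard one but is not spelled out in the text above, so treat the following as a sketch under that assumption. The helper names and the 2×2 table of regions against candidate support are hypothetical, made-up values.

```python
def expected_counts(table):
    """Expected cell counts if the row and column variables were independent."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    total = sum(row_totals)
    return [[r * c / total for c in col_totals] for r in row_totals]


def chi_square_independence(table):
    """Pearson statistic summed over every cell of the contingency table."""
    expected = expected_counts(table)
    return sum(
        (o - e) ** 2 / e
        for obs_row, exp_row in zip(table, expected)
        for o, e in zip(obs_row, exp_row)
    )


# Hypothetical survey: rows are two regions, columns are
# "supports the candidate" and "does not support the candidate".
table = [
    [30, 20],
    [20, 30],
]

print(expected_counts(table))
print(chi_square_independence(table))
```

The statistic has the same form as in the goodness-of-fit case; only the source of the expected frequencies changes, from a theoretical distribution to the table's own margins.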
Pearson's chi-square is the original and most widely used chi-square test.
The null distribution of the Pearson statistic is only approximately a chi-square distribution. The approximation arises because, under the null hypothesis, the true distribution of an observed frequency O is binomial, with expected value E = np:

$$O \sim \mathrm{Bin}(n, p)$$

- p = probability of the outcome under the null hypothesis
- n = number of observations
When comparing the Pearson test statistic against a chi-square distribution, the above binomial distribution is approximated as a Gaussian (normal) distribution with the same mean and variance:

$$\mathrm{Bin}(n, p) \approx \mathcal{N}\big(np,\; np(1 - p)\big)$$
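To get a feel for how close the Gaussian approximation is, the binomial probabilities can be compared with the normal density that has the same mean np and variance np(1 − p). The sketch below uses only the Python standard library; the choice n = 60 and p = 1/6 (one face of a fair die over 60 rolls) is an illustrative assumption, as are the function names.

```python
import math


def binomial_pmf(k, n, p):
    """Exact probability of observing k successes in n trials."""
    return math.comb(n, k) * p ** k * (1 - p) ** (n - k)


def normal_pdf(x, mean, variance):
    """Density of a normal distribution with the given mean and variance."""
    return math.exp(-((x - mean) ** 2) / (2 * variance)) / math.sqrt(
        2 * math.pi * variance
    )


# Illustrative assumption: count of one face in 60 rolls of a fair die.
n, p = 60, 1 / 6
mean = n * p
variance = n * p * (1 - p)

# Largest pointwise gap between the exact pmf and the normal density.
max_gap = max(
    abs(binomial_pmf(k, n, p) - normal_pdf(k, mean, variance))
    for k in range(n + 1)
)
print(f"mean={mean:.2f}, variance={variance:.2f}, max gap={max_gap:.4f}")
```

Rerunning the comparison with a smaller n or a p closer to 0 or 1 shows the gap growing, which is one way to see why the chi-square reference distribution is only an approximation.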