Cross tabs (or cross tabulations) display the joint distribution of two or more variables. They are usually presented as a matrix called a contingency table. Whereas a frequency distribution table describes the distribution of one variable, a contingency table describes the distribution of two or more variables simultaneously, in effect merging two or more frequency distribution tables into one. Each cell gives the number of respondents who gave that combination of responses; that is, each cell contains a single cross tabulation.
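As a minimal sketch of how such a table is built, the following Python counts each combination of responses and then recovers the single-variable frequency distributions as the row and column totals. The response data and the helper name `cross_tab` are hypothetical, chosen only for illustration.

```python
from collections import Counter

# Hypothetical paired responses: (usage, intelligence) for each respondent.
responses = [
    ("heavy", "smart"), ("heavy", "smart"), ("light", "smart"),
    ("light", "air head"), ("non", "air head"), ("non", "air head"),
    ("heavy", "air head"), ("light", "air head"),
]

def cross_tab(pairs):
    """Count how many respondents gave each (row, column) combination."""
    counts = Counter(pairs)
    rows = sorted({r for r, _ in pairs})
    cols = sorted({c for _, c in pairs})
    # Counter returns 0 for combinations that never occur.
    return {r: {c: counts[(r, c)] for c in cols} for r in rows}

table = cross_tab(responses)

# The margins are the ordinary frequency distributions of each variable.
row_totals = {r: sum(cells.values()) for r, cells in table.items()}
col_totals = {}
for cells in table.values():
    for c, n in cells.items():
        col_totals[c] = col_totals.get(c, 0) + n

for r, cells in table.items():
    print(r, cells, "total:", row_totals[r])
print("column totals:", col_totals)
```

The row and column totals are the two one-variable frequency tables that the contingency table combines.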

The following is an example of a 3 × 2 contingency table. The variable “Wikipedia usage” has three categories: heavy user, light user, and non-user. These categories are all-inclusive, so each column sums to 100%. The other variable, “intelligence”, has two categories: smart and air head. Because the percentages are calculated within each intelligence category, the rows need not sum to 100%. Each cell gives the percentage of subjects in that intelligence category who fall into that usage category.

                   smart    air head
heavy Wiki user     70%        5%
light Wiki user     25%       35%
non Wiki user        5%       60%
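The percentages in the table can be derived from absolute counts by dividing each cell by its column total. The sketch below assumes hypothetical counts (200 smart subjects and 100 air heads) picked so that the result reproduces the table; the source does not give the underlying counts.

```python
# Hypothetical counts, chosen only so that they reproduce the table above.
counts = {
    "heavy Wiki user": {"smart": 140, "air head": 5},
    "light Wiki user": {"smart": 50, "air head": 35},
    "non Wiki user": {"smart": 10, "air head": 60},
}

def column_percentages(table):
    """Express each cell as a percentage of its column total."""
    cols = list(next(iter(table.values())))
    col_totals = {c: sum(row[c] for row in table.values()) for c in cols}
    return {
        r: {c: 100 * row[c] / col_totals[c] for c in cols}
        for r, row in table.items()
    }

pct = column_percentages(counts)

# Columns sum to 100%; rows generally do not.
col_sums = {c: sum(pct[r][c] for r in pct) for c in ("smart", "air head")}
row_sums = {r: sum(cells.values()) for r, cells in pct.items()}

print(pct)
print("column sums:", col_sums)
print("row sums:", row_sums)
```

Keeping the counts alongside the percentages matters because significance tests such as chi-squared need the counts, not the percentages.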

Cross tabs are frequently used because:

  1. They are easy to understand. They appeal to people who do not understand more sophisticated measures.
  2. They can be used with any level of data: nominal, ordinal, interval, or ratio. Cross tabs treat all data as if it were nominal.
  3. A table can provide greater insight than a single statistic.
  4. They can alleviate the problem of empty or sparse cells that arises in more complex multivariate analyses.

The statistics associated with cross tabs are:

  • Chi-squared - This tests the statistical significance of the association in the cross tabulation. Chi-squared should not be calculated from percentages; the cross tabs must be converted back to absolute counts before calculating it. Chi-squared is also unreliable when any cell has an expected frequency of less than five.
  • Contingency Coefficient - This measures the strength of association of the cross tabulations. It is a variant of the phi coefficient that adjusts for sample size. Values range from 0 (no association) toward 1, although the maximum attainable value depends on the size of the table and is always less than 1.
  • Cramér’s V - This measures the strength of association of the cross tabulations. It is a variant of the phi coefficient that adjusts for the number of rows and columns. Values range from 0 (no association) to 1 (the theoretical maximum possible association).
  • Lambda Coefficient - This measures the strength of association of the cross tabulations when the variables are measured at the nominal level. Values range from 0 (no association) to 1 (the theoretical maximum possible association). Asymmetric lambda measures the percentage improvement in predicting the dependent variable when the value of the independent variable is known. Symmetric lambda measures the percentage improvement when prediction is done in both directions.
  • Tau b - This measures the strength of association of the cross tabulations when both variables are measured at the ordinal level. It makes adjustments for ties and is most suitable for square tables. Values range from -1 (perfect negative association) through 0 (no association) to +1 (perfect positive association).
  • Tau c - This measures the strength of association of the cross tabulations when both variables are measured at the ordinal level. It adjusts for table size rather than for ties and is most suitable for rectangular tables. Values range from -1 (perfect negative association) through 0 (no association) to +1 (perfect positive association).
  • Gamma - This measures the strength of association of the cross tabulations when both variables are measured at the ordinal level. It makes no adjustment for either table size or ties. Values range from -1 (perfect negative association) through 0 (no association) to +1 (perfect positive association).
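Several of these statistics can be computed directly from a table of absolute counts. The following is a minimal sketch using the standard textbook formulas (Pearson chi-squared, C = sqrt(χ² / (χ² + N)), V = sqrt(χ² / (N (min(rows, columns) - 1))), and gamma = (concordant - discordant) / (concordant + discordant)). The function names and the small example tables are hypothetical, and the sketch assumes every row and column total is non-zero.

```python
import math

def chi_squared(table):
    """Pearson chi-squared for a table of absolute counts (list of rows).

    Returns the statistic, the sample size, and the smallest expected
    cell count, which is what the rule of five is usually checked against.
    """
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    chi2 = 0.0
    min_expected = float("inf")
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            min_expected = min(min_expected, expected)
            chi2 += (observed - expected) ** 2 / expected
    return chi2, n, min_expected

def contingency_coefficient(chi2, n):
    """Contingency coefficient: chi-squared rescaled by sample size."""
    return math.sqrt(chi2 / (chi2 + n))

def cramers_v(chi2, n, n_rows, n_cols):
    """Cramér's V: chi-squared rescaled by sample size and table shape."""
    return math.sqrt(chi2 / (n * (min(n_rows, n_cols) - 1)))

def gamma(table):
    """Goodman and Kruskal's gamma; rows and columns must be in rank order."""
    concordant = discordant = 0
    for i, row in enumerate(table):
        for j, a in enumerate(row):
            for k in range(i + 1, len(table)):      # rows ranked higher
                for m, b in enumerate(table[k]):
                    if m > j:
                        concordant += a * b
                    elif m < j:
                        discordant += a * b
    return (concordant - discordant) / (concordant + discordant)

# Hypothetical 2 x 2 tables of counts.
independent = [[10, 20], [20, 40]]   # rows are proportional: no association
perfect = [[10, 0], [0, 10]]         # perfect positive association
reversed_ = [[0, 10], [10, 0]]       # perfect negative association

chi2_ind, n_ind, _ = chi_squared(independent)
chi2_per, n_per, min_exp_per = chi_squared(perfect)
v_per = cramers_v(chi2_per, n_per, 2, 2)
c_per = contingency_coefficient(chi2_per, n_per)

print("independent: chi2 =", chi2_ind, "gamma =", gamma(independent))
print("perfect: chi2 =", chi2_per, "V =", v_per, "C =", c_per)
print("gamma:", gamma(perfect), gamma(reversed_))
```

In the perfectly associated 2 × 2 table, Cramér’s V reaches 1 while the contingency coefficient stays below 1, and gamma changes sign when the direction of the association is reversed.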

See also: marketing, marketing research, quantitative marketing research