Simpson's paradox

Simpson's paradox is a statistical paradox described by E. H. Simpson in 1951: an association that holds in each of several sub-populations may be reversed when the sub-populations are combined. Two sets of data may each appear to support a certain hypothesis but, when considered together, support the opposite hypothesis.

As an example, suppose we have two people, Ann and Bob. In the first test, Ann and Bob are let loose on Wikipedia, and Ann edits 100 articles, improving 60 of them, while Bob edits 10 articles, improving 9 of them. In the second test, Ann and Bob are again let loose on Wikipedia, and this time Ann edits 10 articles, improving 1 of them, while Bob edits 100 articles, improving 30 of them.

Now we can summarise, introduce some notation (that will be useful later) and generate the paradox:

In the first test, Ann improved 60% of the articles she edited (S_A(1) = 60%), while Bob's success rate was 90% (S_B(1) = 90%). Success is associated with Bob.
In the second test, Ann managed 10% (S_A(2) = 10%) while Bob achieved 30% (S_B(2) = 30%). On both occasions Bob's edits were more successful than Ann's. Success is again associated with Bob.
But if we combine the two tests, we see that Ann and Bob each edited 110 articles, and that Ann improved 61 (S_A = 61/110, about 55%) while Bob improved only 39 (S_B = 39/110, about 35%).
S_B < S_A. Success is now associated with Ann. Bob is better on every test but worse overall!
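
The reversal can be checked with a few lines of arithmetic. The following is a minimal Python sketch using only the counts given in the example; the names results, rate and overall are chosen here for illustration and are not part of the original notation.

```python
from fractions import Fraction

# (articles improved, articles edited) per person and test, from the example
results = {
    "Ann": {1: (60, 100), 2: (1, 10)},
    "Bob": {1: (9, 10), 2: (30, 100)},
}

def rate(improved, edited):
    """Success rate as an exact fraction."""
    return Fraction(improved, edited)

def overall(person):
    """Pooled success rate: total improved divided by total edited."""
    improved = sum(i for i, _ in results[person].values())
    edited = sum(e for _, e in results[person].values())
    return Fraction(improved, edited)

# Per-test comparison: Bob is ahead in each test
for test in (1, 2):
    s_a = rate(*results["Ann"][test])
    s_b = rate(*results["Bob"][test])
    print(f"test {test}: S_A = {float(s_a):.0%}, S_B = {float(s_b):.0%}, "
          f"Bob better: {s_b > s_a}")

# Pooled comparison: Ann is ahead overall
print(f"overall: S_A = {overall('Ann')}, S_B = {overall('Bob')}, "
      f"Ann better: {overall('Ann') > overall('Bob')}")
```

Exact fractions are used rather than floating-point numbers so that the comparisons cannot be affected by rounding.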

The arithmetical basis of the paradox is uncontroversial. If S_B(1) > S_A(1) and S_B(2) > S_A(2) we feel that S_B must be greater than S_A. However, if different weights are used to form each person's overall score, this expectation can fail. Here the first test is weighted 100/110 for Ann and 10/110 for Bob, while the weights are reversed on the second test.

S_A = (100/110) S_A(1) + (10/110) S_A(2).

S_B = (10/110) S_B(1) + (100/110) S_B(2).
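
As a check on these two identities, the sketch below recomputes each overall score as a weighted average of the per-test rates. It assumes only the numbers from the example; the helper name weighted and the dictionary names are illustrative.

```python
from fractions import Fraction

# Per-test success rates S_A(1), S_A(2), S_B(1), S_B(2) from the example
s_a = {1: Fraction(60, 100), 2: Fraction(1, 10)}
s_b = {1: Fraction(9, 10), 2: Fraction(30, 100)}

# Weights: the share of each person's 110 edits that fell in each test
w_a = {1: Fraction(100, 110), 2: Fraction(10, 110)}
w_b = {1: Fraction(10, 110), 2: Fraction(100, 110)}

def weighted(rates, weights):
    """Weighted average of per-test rates."""
    return sum(weights[t] * rates[t] for t in rates)

overall_a = weighted(s_a, w_a)
overall_b = weighted(s_b, w_b)

print("S_A =", overall_a)
print("S_B =", overall_b)
```

The weighted averages agree with the pooled rates 61/110 and 39/110, which shows that the reversal comes entirely from the two people being given different weights.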

By more extreme reweighting, A's overall score can be pushed up towards 60% and B's down towards 30%.
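
The effect of the weights can be explored numerically. In this sketch the per-test rates are those of the example, while the sequence of weights is an assumption made purely for illustration; pooled is a hypothetical helper name.

```python
from fractions import Fraction

# Per-test success rates from the example: (test 1, test 2)
S_A = (Fraction(6, 10), Fraction(1, 10))
S_B = (Fraction(9, 10), Fraction(3, 10))

def pooled(rates, w1):
    """Overall score when test 1 has weight w1 and test 2 has weight 1 - w1."""
    return w1 * rates[0] + (1 - w1) * rates[1]

# Illustrative weights: A's weight on test 1 grows, B's shrinks by the same amount
for w in (Fraction(10, 11), Fraction(99, 100), Fraction(999, 1000)):
    a = pooled(S_A, w)       # A does almost all of its work in test 1
    b = pooled(S_B, 1 - w)   # B does almost all of its work in test 2
    print(f"w = {float(w):.3f}: S_A = {float(a):.2%}, S_B = {float(b):.2%}")
```

The first weight, 10/11, reproduces the original example. The limiting values of 60% and 30% are reached exactly only when the weights are 1 and 0, that is, when each person takes part in just one of the tests.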

The arithmetic allows us to see through the paradox, but there is still a conflict between the individual performances and the overall performance: who is better, A or B? Ann and Bob's creator thought Ann was better, since her overall success rate is higher. But it is possible to retell the story so that it appears obvious that B is better. Suppose A and B are now hospitals and the two tests have become two types of patient: mild and severe. The numerical data are as before: B is better at curing both types of patient, but its overall success rate is worse because almost all (100/110) of its patients are severe cases while almost all (100/110) of A's are mild. The association of success with A is misleading, even spurious.
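
One way to make the hospital comparison concrete is to ask what each hospital's overall cure rate would be if both treated the same mix of patients. The sketch below does this for a few hypothetical case mixes; the mixes, and the names cure_rate and standardized_rate, are assumptions introduced here and do not come from the example itself.

```python
from fractions import Fraction

# Cure rates by patient type, reusing the numbers from the example
# (test 1 becomes "mild", test 2 becomes "severe")
cure_rate = {
    "A": {"mild": Fraction(60, 100), "severe": Fraction(1, 10)},
    "B": {"mild": Fraction(9, 10), "severe": Fraction(30, 100)},
}

def standardized_rate(hospital, share_mild):
    """Overall cure rate if a fraction share_mild of the patients were mild."""
    rates = cure_rate[hospital]
    return share_mild * rates["mild"] + (1 - share_mild) * rates["severe"]

# Hypothetical common case mixes (assumed for illustration)
for share_mild in (Fraction(1, 10), Fraction(1, 2), Fraction(9, 10)):
    a = standardized_rate("A", share_mild)
    b = standardized_rate("B", share_mild)
    print(f"mild share {float(share_mild):.0%}: "
          f"A = {float(a):.1%}, B = {float(b):.1%}")
```

Whenever both hospitals are given the same mix, B comes out ahead, because it has the higher cure rate for each type of patient. A appears better only when the two hospitals are weighted by their own, very different, mixes.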

In this retelling, has something been added, or has a tacit assumption of the Ann and Bob story been changed? These issues are discussed in the modern literature on Simpson's paradox. Although statisticians have known about the phenomenon for over a century, there has lately been a revival of interest in it, and philosophers, computer scientists, epidemiologists, economists and others have discussed it too.

External links

For a brief history of the origins of the paradox see the entries on Simpson's Paradox and Spurious Correlation in

Earliest known uses of some of the words of mathematics: S

For a recent technical discussion with many references see

Simpson's Paradox: An Anatomy by Judea Pearl