Simpson's paradox


Simpson's paradox for continuous data: a positive trend appears for each of two separate groups (blue and red), while a negative trend (black, dashed) appears when the data are combined.

In probability and statistics, Simpson's paradox, or the Yule-Simpson effect, is a paradox in which a trend that appears in several separate sets of data disappears when the sets are combined, and the opposite trend appears in the aggregated data. This situation occurs frequently in the social sciences, in Andre's experiments, and in medical statistics, and it causes confusion when frequency data are given a causal interpretation without justification. The paradox disappears when the causal relationships present are analyzed.

Although relatively unknown to most people, Simpson's paradox is well known to statisticians and is described in many introductory statistics books. Many statisticians believe that the public should be informed of counterintuitive results such as Simpson's paradox.

The phenomenon was described by Edward H. Simpson in a 1951 technical paper, but it had already been described by Karl Pearson et al. in 1899 and by Udny Yule in 1903. The name Simpson's paradox was first used by Colin R. Blyth in 1972.

Since Edward Simpson did not actually discover this statistical paradox (it being a case of Stigler's law of eponymy), some writers prefer to use the impersonal terms reversal paradox and amalgamation paradox, or sometimes the Yule-Simpson effect.

Examples

Treatment of kidney stones

This is a real example taken from a medical study that compares the success rates of two treatments for kidney stones.

The table below shows the success rates and the number of treatments involving large and small stones. Here Treatment A refers to open procedures and Treatment B to percutaneous nephrolithotomy:

Stone size	Treatment A	Treatment B
Small stones	Group 1: 93% (81/87)	Group 2: 87% (234/270)
Large stones	Group 3: 73% (192/263)	Group 4: 69% (55/80)
Both	78% (273/350)	83% (289/350)

The paradoxical conclusion is that treatment A is more effective when used on small stones and also when used on large stones, yet treatment B appears more effective when both sizes are considered together. In this case, stone size is the "hidden" variable (or confounding factor): it was not known to be important until its effects were included in the analysis.

Which treatment is considered better is determined by an inequality between two proportions (successes/total). Simpson's paradox is the reversal of that inequality, and it arises when the following two effects occur simultaneously:

The sizes of the groups that are combined when the hidden variable is ignored are very different. Doctors tend to give the severe cases (large stones) the better treatment (A) and the milder cases (small stones) treatment B. Therefore, the totals are dominated by groups 3 and 2, and not by the much smaller groups 1 and 4.
The hidden variable has a large effect on the proportions: the success rate depends more strongly on the severity of the case than on the choice of treatment. Therefore, the patients with large stones who received treatment A (group 3) do worse than the patients with small stones (group 2), even though the latter received the inferior treatment B.
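
The two effects can be checked numerically. The following is a minimal sketch in Python, using only the counts from the table above; the helper names (`stratum_rate`, `stratum_weight`, `overall_rate`) are illustrative and do not come from the study. It writes each overall success rate as a weighted average of the per-size rates, where the weights are the shares of small and large stones among the patients who received that treatment.

```python
from fractions import Fraction

# (successes, total) for each treatment and stone size, copied from the table above.
counts = {
    "A": {"small": (81, 87), "large": (192, 263)},
    "B": {"small": (234, 270), "large": (55, 80)},
}


def stratum_rate(treatment, size):
    """Success rate within one stone-size group."""
    successes, total = counts[treatment][size]
    return Fraction(successes, total)


def stratum_weight(treatment, size):
    """Share of a treatment's patients that fall in one stone-size group."""
    treated = sum(total for _, total in counts[treatment].values())
    return Fraction(counts[treatment][size][1], treated)


def overall_rate(treatment):
    """Pooled success rate, written as a weighted average of the stratum rates."""
    return sum(
        stratum_weight(treatment, size) * stratum_rate(treatment, size)
        for size in counts[treatment]
    )


# Within each stone size, treatment A has the higher success rate.
for size in ("small", "large"):
    a, b = stratum_rate("A", size), stratum_rate("B", size)
    print(f"{size} stones: A = {float(a):.1%}, B = {float(b):.1%}")

# Pooled over sizes, the order flips because the case mixes differ.
for treatment in ("A", "B"):
    share_large = stratum_weight(treatment, "large")
    print(
        f"treatment {treatment}: {float(share_large):.0%} large stones, "
        f"overall success {float(overall_rate(treatment)):.1%}"
    )
```

About three quarters of the patients on treatment A had large stones, against fewer than a quarter of those on treatment B, so the overall rate for A is pulled toward its lower large-stone rate while the overall rate for B is pulled toward its higher small-stone rate.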

Medical treatment using extreme numbers

This fictitious example follows the theme of the previous case, but with deliberately extreme numbers in order to make the phenomenon easier to understand.

As in the real example, the following table shows, for this fictitious case, the success rates and the number of treatments for problem type "1" and problem type "2":

Problem type	Treatment A	Treatment B
Problem "1"	Group 1: 100% (1/1)	Group 2: 99.0% (98/99)
Problem "2"	Group 3: 1% (1/99)	Group 4: 0% (0/1)
Both	2% (2/100)	98% (98/100)

In this case it is clear that the study is not valid because the samples are so extreme, but the underlying paradox remains: treatment A is better for each type of problem, yet treatment B looks better when both types are combined. The example also makes the source of the risk more evident, since it is deliberately exaggerated here: the apparent contradiction arises because the treatments are distributed so unevenly between the types of problem.
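
One way to see that the uneven case mix drives the reversal is to compare both treatments under a common mix of problem types. The sketch below is an illustration under that assumption, not part of the original example: it applies direct standardization, with the combined population of all 200 patients as the reference mix, and the function names are hypothetical.

```python
from fractions import Fraction

# (successes, total) per treatment and problem type, from the table above.
counts = {
    "A": {"1": (1, 1), "2": (1, 99)},
    "B": {"1": (98, 99), "2": (0, 1)},
}


def crude_rate(treatment):
    """Success rate with both problem types pooled, as in the 'Both' row."""
    successes = sum(s for s, _ in counts[treatment].values())
    total = sum(n for _, n in counts[treatment].values())
    return Fraction(successes, total)


def standardized_rate(treatment, mix):
    """Success rate the treatment would show if its patients followed `mix`."""
    return sum(
        mix[kind] * Fraction(s, n) for kind, (s, n) in counts[treatment].items()
    )


# Reference mix (an assumption of this sketch): the share of each problem
# type among all patients, regardless of the treatment they received.
grand_total = sum(n for strata in counts.values() for _, n in strata.values())
mix = {
    kind: Fraction(sum(counts[t][kind][1] for t in counts), grand_total)
    for kind in ("1", "2")
}

for treatment in ("A", "B"):
    print(
        f"treatment {treatment}: crude {float(crude_rate(treatment)):.1%}, "
        f"standardized {float(standardized_rate(treatment, mix)):.1%}"
    )
```

With the same half-and-half mix applied to both treatments, A comes out slightly ahead of B (50/99 against 49/99), in agreement with the comparison within each problem type.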

Gender discrimination at Berkeley

One of the best-known examples of Simpson's paradox occurred when a lawsuit was filed against the University of California, Berkeley for discrimination against women who had applied to graduate school. The admissions figures for the fall of 1973 showed that male applicants were more likely than female applicants to be admitted, and that the difference was so large that it was unlikely to be due to chance.

Sex	Applications	Admitted
Men	8442	44%
Women	4321	35%

However, when the departments were examined individually, it was found that no department was biased against women. In fact, most departments had a "small but statistically significant bias in favor of women". The data for the six largest departments are listed below.

Department	Men: applications	Men: admitted	Women: applications	Women: admitted
A	825	62%	108	82%
B	560	63%	25	68%
C	325	37%	593	34%
D	417	33%	375	35%
E	191	28%	393	24%
F	272	6%	341	7%

The research article by Bickel et al. concluded that women tended to apply to competitive departments with low admission rates (such as English), while men tended to apply to less competitive departments with high admission rates (such as engineering and chemistry). The conditions under which admission frequency data from specific departments constitute a defense against charges of discrimination are set out in the book Causality by Pearl.
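
The same effect can be reproduced from the six-department table. The sketch below is only an approximation: the table gives rounded percentages rather than admission counts, so the pooled rates it computes are estimates, and the variable and function names are illustrative.

```python
# department: (men applications, men admitted rate, women applications, women admitted rate)
# Values copied from the six-department table; the rates are rounded percentages.
departments = {
    "A": (825, 0.62, 108, 0.82),
    "B": (560, 0.63, 25, 0.68),
    "C": (325, 0.37, 593, 0.34),
    "D": (417, 0.33, 375, 0.35),
    "E": (191, 0.28, 393, 0.24),
    "F": (272, 0.06, 341, 0.07),
}


def pooled_rate(groups):
    """Approximate admission rate over several (applications, rate) pairs."""
    admitted = sum(n * rate for n, rate in groups)
    applications = sum(n for n, _ in groups)
    return admitted / applications


men = pooled_rate([(m, m_rate) for m, m_rate, _, _ in departments.values()])
women = pooled_rate([(w, w_rate) for _, _, w, w_rate in departments.values()])

# Departments where the women's admission rate exceeds the men's.
women_ahead = [
    name for name, (_, m_rate, _, w_rate) in departments.items() if w_rate > m_rate
]

print(f"pooled over six departments: men {men:.0%}, women {women:.0%}")
print("departments where women have the higher rate:", ", ".join(women_ahead))
```

Across these six departments the pooled rates come out at roughly 46% for men and 30% for women, even though women have the higher admission rate in four of the six. The difference comes from where the applications went: most of the men's applications were to departments A and B, which admitted over 60% of applicants, while most of the women's were to departments C to F.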
