OVERVIEW: We can study relations between two or more categorical variables by arranging their counts in tables. Up to this point, we have studied relations in which at least the response variable was quantitative.
A two-way table of counts describes the relationship between two categorical variables: the row variable and the column variable. The row totals and column totals give the marginal distribution of each variable separately, but they give no information about the relationship between the variables. Probabilities, including conditional probabilities, can be calculated from two-way tables.
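To see why the marginal totals alone say nothing about the relationship, here is a minimal Python sketch. The helper `marginals` is a hypothetical name introduced only for illustration. The first table uses the employee counts from the example that follows; the second is a made-up table with identical row and column totals but a different relationship (every property is split 60/40 between females and males).

```python
# Marginal distributions from a two-way table of counts.
# Each inner list is one row (a category of the row variable);
# each position within a row is a category of the column variable.

def marginals(table):
    """Return (row_totals, column_totals, grand_total) for a list-of-lists table."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    return row_totals, col_totals, sum(row_totals)

# Employee counts (rows: female, male; columns: properties A, B, C).
observed = [[20, 40, 60],
            [30, 10, 40]]

# Hypothetical table with the same totals: each property is 60% female.
alternative = [[30, 30, 60],
               [20, 20, 40]]

print(marginals(observed))     # ([120, 80], [50, 50, 100], 200)
print(marginals(alternative))  # ([120, 80], [50, 50, 100], 200)
```

Both tables have the same margins, yet in the observed table only 20 of the 50 people with property A are female, while in the alternative table 30 of 50 are. The relationship lives in the interior cells, not in the totals.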
Simple example:
Notation:
...Prob(X) is the probability that X is true.
...Prob(X|Y) is the probability that X is true, given that Y is true.

Two hundred employees of a company are classified according to the following 2-by-3 table, where A, B, and C are mutually exclusive properties.
                 Have A   Have B   Have C   ROW TOTALS
FEMALE               20       40       60          120
MALE                 30       10       40           80
COLUMN TOTALS        50       50      100          200

o What is the probability that a randomly chosen person is female?
Ans. Prob(F) = 120/200 = 60%.
o What is the probability that a randomly chosen person has property A?
Ans. Prob(A) = 50/200 = 25%.
o If a randomly chosen person is female, what is the probability that she has property B?
Ans. Prob(B|F) = 40/120 = 33 1/3% [= Prob(B and F)/Prob(F) = (40/200)/(120/200).]
o If a randomly chosen person has property C, what is the probability that the individual is male?
Ans. Prob(M|C) = 40/100 = 40% [= Prob(C and M)/Prob(C) = (40/200)/(100/200).]
o If a randomly chosen person has B or C, what is the probability that the person is male?
Ans. Prob(M|B or C) = (10 + 40)/(50 + 100) = 50/150 = 33 1/3%.
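The five answers above can be reproduced directly from the cell counts. This is a minimal sketch, not the textbook's method: the helpers `prob`, `cond_prob`, `female`, `male`, and `has` are hypothetical names, events are modeled as predicates on a (sex, property) cell, and `fractions.Fraction` keeps the results exact. The conditional probability uses the same rule as the bracketed notes, Prob(X|Y) = Prob(X and Y)/Prob(Y).

```python
from fractions import Fraction

# Employee counts keyed by (sex, property). A, B and C are mutually exclusive,
# so every employee falls in exactly one cell.
COUNTS = {
    ("F", "A"): 20, ("F", "B"): 40, ("F", "C"): 60,
    ("M", "A"): 30, ("M", "B"): 10, ("M", "C"): 40,
}
TOTAL = sum(COUNTS.values())  # 200 employees

def prob(event):
    """Prob(event): fraction of employees whose (sex, property) cell satisfies event."""
    return Fraction(sum(n for cell, n in COUNTS.items() if event(*cell)), TOTAL)

def cond_prob(event, given):
    """Prob(event | given) = Prob(event and given) / Prob(given)."""
    both = lambda sex, prop: event(sex, prop) and given(sex, prop)
    return prob(both) / prob(given)

def female(sex, prop):
    return sex == "F"

def male(sex, prop):
    return sex == "M"

def has(*props):
    """Event: the person has one of the listed properties."""
    return lambda sex, prop: prop in props

print(prob(female))                    # 3/5  (60%)
print(prob(has("A")))                  # 1/4  (25%)
print(cond_prob(has("B"), female))     # 1/3  (33 1/3%)
print(cond_prob(male, has("C")))       # 2/5  (40%)
print(cond_prob(male, has("B", "C")))  # 1/3  (33 1/3%)
```

Because "B or C" is just a predicate that accepts either property, the last question needs no special handling: the sketch adds the matching cells (10 + 40) and divides by the matching column totals (50 + 100), exactly as in the hand calculation.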
===================================
An example of Simpson's paradox:
Here are the batting averages of two baseball players for both halves of a season.
[Batting average is simply the ratio of hits to times at bat.]
                  FIRST HALF-SEASON                      SECOND HALF-SEASON
            Hits   Times at bat   Batting average   Hits   Times at bat   Batting average
Caldwell     60        200             .300           50        200             .250
Wilson       29        100             .290            1          5             .200
Here are the batting averages for the entire season.
Caldwell: 110/400 = .275
Wilson: 30/105 = .286
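Caldwell, despite having a better average than Wilson in both halves of the season, ends up with a lower overall average than Wilson. The reversal happens because the season average weights each half by times at bat: Caldwell's two halves count equally, while 100 of Wilson's 105 times at bat fall in his stronger first half. Using percentages, one can construct numerous examples of Simpson's paradox.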
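A short Python check of the reversal, using only the hits and times at bat from the table. The names `SEASONS`, `batting_average`, and `results` are illustrative choices, not from the source. The season figure pools the counts rather than averaging the two half-season averages.

```python
# (hits, times at bat) for each half of the season, copied from the table.
SEASONS = {
    "Caldwell": [(60, 200), (50, 200)],
    "Wilson": [(29, 100), (1, 5)],
}

def batting_average(hits, at_bats):
    """Batting average = hits / times at bat."""
    return hits / at_bats

results = {}
for player, halves in SEASONS.items():
    halves_avg = [batting_average(h, ab) for h, ab in halves]
    # Whole-season average pools the counts: it is a mean of the half-season
    # averages weighted by times at bat, not their simple mean.
    season_avg = batting_average(sum(h for h, _ in halves),
                                 sum(ab for _, ab in halves))
    results[player] = (halves_avg, season_avg)
    print(player, [round(a, 3) for a in halves_avg], round(season_avg, 3))

caldwell, wilson = results["Caldwell"], results["Wilson"]
# Caldwell leads in each half...
print(all(c > w for c, w in zip(caldwell[0], wilson[0])))  # True
# ...but trails for the season as a whole.
print(caldwell[1] < wilson[1])  # True
```

The printed lines are `Caldwell [0.3, 0.25] 0.275` and `Wilson [0.29, 0.2] 0.286`, followed by `True` twice, matching the hand calculation.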
From an algebraic standpoint:
If a/b > c/d and p/q > r/s, then
...it is true that a/b + p/q > c/d + r/s.
...it is not necessarily true that (a+p)/(b+q) > (c+r)/(d+s).
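Substituting the batting figures makes the algebra concrete, taking a/b and p/q as Caldwell's two half-season averages and c/d and r/s as Wilson's:

```latex
\begin{aligned}
&\frac{a}{b} = \frac{60}{200} = .300 > .290 = \frac{29}{100} = \frac{c}{d},
  \qquad \frac{p}{q} = \frac{50}{200} = .250 > .200 = \frac{1}{5} = \frac{r}{s} \\
&\frac{a}{b} + \frac{p}{q} = .550 > .490 = \frac{c}{d} + \frac{r}{s} \\
&\frac{a+p}{b+q} = \frac{110}{400} = .275 < .286 \approx \frac{30}{105} = \frac{c+r}{d+s}
\end{aligned}
```

The pooled ratio is a weighted average, (a+p)/(b+q) = (b/(b+q))(a/b) + (q/(b+q))(p/q), so Wilson's weights of 100/105 and 5/105 put nearly all of his season on the .290 half, while Caldwell's weights are 1/2 and 1/2.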