"I abhor averages. I like the individual case. A man may have six meals one day and none the next, making an average of three meals per day, but that is not a good way to live."

Louis D. Brandeis, A Free Man's Life


OVERVIEW: This section deals with tests and confidence intervals for a population proportion p, based on the sample proportion p(hat). When the sample size n is large, the distribution of p(hat) is approximately normal with mean p and standard deviation sqrt(p(1-p)/n). The inference procedures are reasonably accurate if the population is at least ten times larger than the sample, and if the sample size is such that np and n(1-p) are both at least 10.
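The two rules of thumb in the overview are easy to check mechanically. Here is a minimal Python sketch; the function name conditions_ok and its parameters are my own illustration, not something from the textbook.

```python
def conditions_ok(n, p, population_size=None):
    """Rule-of-thumb checks for normal-approximation inference on a proportion.

    Hypothetical helper: n is the sample size, p the proportion used in the
    check, and population_size is optional.
    """
    # Expected counts of successes and failures should both be at least 10.
    counts_ok = n * p >= 10 and n * (1 - p) >= 10
    if population_size is None:
        return counts_ok
    # The population should be at least ten times larger than the sample.
    return counts_ok and population_size >= 10 * n


print(conditions_ok(867, 0.63))                        # large sample, both counts well above 10
print(conditions_ok(20, 0.10))                         # only 2 expected successes
print(conditions_ok(867, 0.63, population_size=5000))  # population too small for this sample
```

When p is unknown, p(hat) (for an interval) or p0 (for a test) is the usual stand-in for the check.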

A confidence interval for the population proportion p, based on the sample proportion p(hat), has the form

p(hat) plus/minus z*sqrt[p(hat)(1-p(hat))/n].

The expression SE = sqrt[p(hat)(1-p(hat))/n] is called the standard error of p(hat).

Tests of the null hypothesis H0: p = p0 are based on the statistic

z = (p(hat) - p0)/sqrt((p0(1-p0)/n))


A random sample of 867 registered voters found that 63% favored Proposition A. Calculate a 95% confidence interval for the proportion of voters in the population who favor Proposition A.

Response: The desired confidence interval is

.63 plus/minus 1.96sqrt(.63(1-.63)/867) = .63 plus/minus .0321 = [.5979, .6621].
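For anyone who wants to check the arithmetic away from the calculator, here is a small Python sketch of the same interval. It uses only the standard library; the function name proportion_ci is made up for illustration, and the critical value comes from the standard normal distribution rather than being typed in as 1.96.

```python
from math import sqrt
from statistics import NormalDist


def proportion_ci(p_hat, n, confidence=0.95):
    """Large-sample confidence interval for a population proportion."""
    # Critical value z*: leaves (1 - confidence)/2 in each tail of N(0, 1).
    z_star = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    # Standard error of p(hat).
    se = sqrt(p_hat * (1 - p_hat) / n)
    margin = z_star * se
    return p_hat - margin, p_hat + margin


low, high = proportion_ci(0.63, 867)
print(f"[{low:.4f}, {high:.4f}]")  # prints [0.5979, 0.6621]
```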

We see this type of thing (indirectly) quite frequently in newspapers and magazines when they report the results of surveys. For instance, a survey might report that 56% of a sample of 1,446 voters support candidate Herkimer, with a margin of error of 3%. Those who are statistically literate know that the 95% confidence interval is [53%, 59%]. The margin of error reported is 2 standard deviations. In this example, one standard deviation is sqrt(.56(.44)/1446) = .013, and 2(.013) = .026, which is rounded up to 3%.


A random sample of 1700 voters from a large population found that 1250 favored Proposition B. Test the hypothesis that at most 70% of the voters in the population favor Proposition B.


Null hypothesis H0: p = .7

Alt. hypothesis Ha: p > .7

Sample proportion p(hat) = 1250/1700 = .7353

z = (.7353-.7)/sqrt(.7(.3)/1700) = 3.18

P-value = normalcdf(3.18,1E99,0,1) = .0007 = .07%

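Conclusion: If the population proportion were 70% (or less), a sample proportion this large would almost never occur. We reject H0; the sample gives strong evidence that more than 70% of the population favors Proposition B.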
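The same test can be sketched in Python, with the upper-tail area from the standard normal distribution playing the role of normalcdf(z,1E99,0,1). The function name one_sided_z_test is hypothetical. Note that the standard deviation is computed from p0, not from p(hat), because the test assumes H0 is true.

```python
from math import sqrt
from statistics import NormalDist


def one_sided_z_test(successes, n, p0):
    """Upper-tailed z test of H0: p = p0 against Ha: p > p0."""
    p_hat = successes / n
    # Standard deviation of p(hat) when H0 is true.
    sd0 = sqrt(p0 * (1 - p0) / n)
    z = (p_hat - p0) / sd0
    # Area to the right of z under the standard normal curve.
    p_value = 1 - NormalDist().cdf(z)
    return p_hat, z, p_value


p_hat, z, p_value = one_sided_z_test(1250, 1700, 0.70)
print(f"p_hat = {p_hat:.4f}, z = {z:.2f}, P-value = {p_value:.4f}")
# prints p_hat = 0.7353, z = 3.18, P-value = 0.0007
```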

Another approach:

The standard error, SE = sqrt(.7(.3)/1700) = .011114

P-value = normalcdf(.7353,1E99,.70,.011114) = .0007, which agrees with the calculations above.
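This second approach skips the standardizing step and works directly with the sampling distribution of p(hat) under H0. A short Python sketch (variable names are my own) shows that the two routes give the same tail area:

```python
from math import sqrt
from statistics import NormalDist

p0, n = 0.70, 1700
p_hat = 1250 / n

# Standard deviation of p(hat) under H0 (called SE in the notes).
se = sqrt(p0 * (1 - p0) / n)

# Route 1: tail area of N(p0, se) to the right of p(hat).
p_value_direct = 1 - NormalDist(mu=p0, sigma=se).cdf(p_hat)

# Route 2: standardize first, then use N(0, 1).
p_value_standardized = 1 - NormalDist().cdf((p_hat - p0) / se)

print(round(se, 6))               # prints 0.011114
print(round(p_value_direct, 4))   # prints 0.0007
```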

There is a subsection of this section called Choosing the Sample Size, which begins on page 669. This section is well-written and easy to follow. I'm going to attempt to put a real-life perspective on this. Here's a hypothetical (but realistic) situation.

Suppose you are heading a scientific polling organization. Candidate Herkimer wants you to poll a group of voters to determine his popularity. And he would like a margin of error of 2%. The question now becomes: how large a sample is needed to get a 2% margin of error?

Sounds simple enough. We are interested in a 95% confidence interval, which involves the z-value 1.96. For simplicity, let's just use 2 instead of 1.96. We need only solve the equation

.02 = 2sqrt(p(1-p)/n)

Easily done: We get (.02)^2 = 4p(1-p)/n, so the required sample size is N = 4p(1-p)/(.02)^2. But we have a Catch-22 situation. We don't have a value of p, and we can't get an approximation for p before we take a sample. But it is the sample size we are trying to determine, etc., etc.

Note that, as a function of p, the equation N = 4p(1-p)/(.02)^2 represents a parabola that opens downward. If you graph this downward-opening parabola, you will find that N attains its maximum value when p = 0.5. Substituting p = 0.5 into the equation, we get N = 4(.5)(1-.5)/(.02)^2 = 1/(.02)^2 = 2,500. Hence, to get the margin of error requested by candidate Herkimer, we would need an SRS of at least 2,500 voters.

In general, if we are interested in a specific margin of error = me for a 95% confidence interval, the sample size needed is 1/(me)^2.
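The 1/(me)^2 rule is the worst-case (p = 0.5) formula with z rounded to 2. The Python sketch below, with the hypothetical name conservative_sample_size, keeps z as a parameter so you can see how much the rounding costs. The small round() call is only there to keep floating-point noise from pushing a whole number up to the next integer.

```python
from math import ceil


def conservative_sample_size(margin_of_error, z=2.0):
    """Smallest n whose margin of error is at most margin_of_error for any p.

    Uses the worst case p = 0.5, where p(1-p) reaches its maximum of 0.25.
    """
    raw = z**2 * 0.25 / margin_of_error**2
    # Round away floating-point noise before taking the ceiling.
    return ceil(round(raw, 9))


print(conservative_sample_size(0.02))           # prints 2500 (z rounded to 2)
print(conservative_sample_size(0.02, z=1.96))   # prints 2401 (z = 1.96)
```

Using 2 in place of 1.96 asks for about a hundred more voters than strictly necessary here, which is the price of the simpler formula.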

Let's revisit the problem: Suppose candidate Herkimer wants a margin of error of 1.5% or less. Basically, he wants a 95% confidence interval that is 3 percentage points wide. The sample size needed is 1/(.015)^2 = 4444.4, which rounds up to 4445. Let's now assume that a sample of this size shows that 43% of the voters sampled will vote for Herkimer. This translates to a 95% confidence interval [41.5%, 44.5%]. Putting the proper interpretation on this interval, and assuming that Herkimer is involved in a 2-person political race, the polls indicate that he needs to do something to attract voters. Why? Simply because the statistics suggest he would lose the election if it were held today.
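As a rough check of the Herkimer numbers, the following Python sketch recomputes the sample size and the interval, using z = 2 as in the notes. The variable names are my own, and the 43% figure is the hypothetical poll result from the paragraph above.

```python
from math import ceil, sqrt

target_margin = 0.015
z = 2.0  # the notes round 1.96 up to 2

# Worst-case sample size, 1/(me)^2, rounded up to a whole voter.
n = ceil(1 / target_margin**2)

# Hypothetical poll result from the notes: 43% of the sample favors Herkimer.
p_hat = 0.43
margin = z * sqrt(p_hat * (1 - p_hat) / n)
low, high = p_hat - margin, p_hat + margin

print(n)                                   # prints 4445
print(f"[{low:.3f}, {high:.3f}]")          # prints [0.415, 0.445]
print("entire interval below 50%:", high < 0.5)
```

The whole interval sits below 50%, which is what supports the conclusion that Herkimer is currently trailing in a two-person race.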