"I abhor averages. I like theindividual case. A man may have six meals one day and none the next,making an average of three meals per day, but that is not a good wayto live."

12.1 INFERENCE FOR A POPULATION PROPORTION (Pages 658 - 674)

OVERVIEW: This section deals with tests and confidence intervals for a population proportion p, based on the sample proportion p(hat). When the sample size is large, the sampling distribution of p(hat) is approximately normal with mean p and standard deviation sqrt(p(1-p)/n). The inference procedures are reasonably accurate if the population is at least ten times larger than the sample, and if the sample size is such that np and n(1-p) are both at least 10.
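The rules of thumb in the overview can be written as a small check. This is a minimal sketch: the function name `large_sample_conditions_ok` is hypothetical, and because the true p is unknown in practice, the caller must supply a stand-in value (for example p(hat) when building an interval, or the hypothesized value when running a test). The population size of 1,000,000 in the usage lines is also an assumed, illustrative number.

```python
def large_sample_conditions_ok(n, p, population_size):
    """Check the rules of thumb for normal-based inference about a proportion.

    n: sample size
    p: proportion used in the check (a stand-in such as p(hat) or a
       hypothesized value, since the true p is unknown)
    population_size: size of the population being sampled
    """
    population_big_enough = population_size >= 10 * n  # population >= 10 x sample
    enough_successes = n * p >= 10                      # np >= 10
    enough_failures = n * (1 - p) >= 10                 # n(1 - p) >= 10
    return population_big_enough and enough_successes and enough_failures


# Hypothetical population of 1,000,000 voters, n = 1700, p = 0.70
print(large_sample_conditions_ok(1700, 0.70, 1_000_000))  # True
# Too few expected successes: 50 * 0.10 = 5 < 10
print(large_sample_conditions_ok(50, 0.10, 1_000_000))    # False
```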

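A confidence interval for the population proportion p has the form

$\hat{p} \pm z^* \sqrt{\hat{p}(1-\hat{p})/n}$

where z* is the critical value for the chosen confidence level (z* = 1.96 for 95% confidence). The expression $SE = \sqrt{\hat{p}(1-\hat{p})/n}$ is called the standard error of p(hat). Tests of the null hypothesis $H_0: p = p_0$ are based on the statistic

$z = (\hat{p} - p_0)/\sqrt{p_0(1-p_0)/n}$

Note that the test uses the hypothesized value $p_0$, not p(hat), in the denominator.

Example: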

A random sample of 867 registered voters found that 63% favored Proposition A. Calculate a 95% confidence interval for the proportion of voters in the population who favor Proposition A.

Response: The desired confidence interval is

.63 plus/minus 1.96 sqrt(.63(1-.63)/867) = .63 plus/minus .0321 = [.5979, .6621].
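The interval arithmetic above can be reproduced in a few lines. This is a sketch, not a library routine: `proportion_ci` is a hypothetical helper that simply applies p(hat) plus/minus z* times the standard error, with z* = 1.96 assumed as the default for 95% confidence.

```python
import math


def proportion_ci(p_hat, n, z_star=1.96):
    """Large-sample confidence interval p_hat +/- z* * SE for a proportion."""
    se = math.sqrt(p_hat * (1 - p_hat) / n)  # standard error of p(hat)
    margin = z_star * se                      # margin of error
    return p_hat - margin, p_hat + margin


low, high = proportion_ci(0.63, 867)
print(f"[{low:.4f}, {high:.4f}]")  # [0.5979, 0.6621]
```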

We see this type of thing (indirectly) quite frequently in newspapers and magazines when they report the results of surveys. For instance, a survey might report that 56% of a sample of 1,446 voters support candidate Herkimer, with a margin of error of 3%. Those who are statistically literate know that the 95% confidence interval is [53%, 59%]. The reported margin of error is about 2 standard errors (strictly, 1.96 standard errors for 95% confidence). In this example, one standard error is sqrt(.56(.44)/1446) = .013, and 2(.013) = .026, which is rounded up to 3%.
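The newspaper arithmetic can be checked directly. This is a minimal sketch using the numbers in the example (p(hat) = .56, n = 1446); it shows that the "2 standard errors" rule of thumb and the exact 1.96 multiplier both round to .026 here.

```python
import math

p_hat, n = 0.56, 1446
se = math.sqrt(p_hat * (1 - p_hat) / n)  # one standard error of p(hat)

print(round(se, 3))         # 0.013
print(round(2 * se, 3))     # 0.026 (rule-of-thumb margin of error)
print(round(1.96 * se, 3))  # 0.026 (exact 95% multiplier)
```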

Example:

A random sample of 1700 voters from a large population found that 1250 favored Proposition B. Test the hypothesis that at most 70% of the voters in the population favor Proposition B.

Analysis:

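Null hypothesis $H_0: p = .7$
Alt. hypothesis $H_a: p > .7$
Sample proportion $\hat{p} = 1250/1700 = .7353$

$z = (.7353 - .7)/\sqrt{.7(.3)/1700} = 3.18$

P-value = normalcdf(3.18, 1E99, 0, 1) = .0007 = .07%

Conclusion: If only 70% of the population favored Proposition B, a sample proportion as large as .7353 would occur in only about 7 of every 10,000 samples. This is strong evidence against $H_0$, so we conclude that more than 70% of the voters in the population favor Proposition B.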
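The test above can be sketched in Python. The helper name `upper_tail_proportion_z_test` is hypothetical; the normal areas come from the standard library's `statistics.NormalDist` (Python 3.8+), which plays the role of the calculator's normalcdf. The last lines show the equivalent calculation on the p(hat) scale, using a normal curve with mean .70 and the standard deviation computed under $H_0$.

```python
import math
from statistics import NormalDist  # Python 3.8+


def upper_tail_proportion_z_test(successes, n, p0):
    """One-sample z test of H0: p = p0 against Ha: p > p0."""
    p_hat = successes / n
    sd0 = math.sqrt(p0 * (1 - p0) / n)  # SD of p(hat) when H0 is true (uses p0)
    z = (p_hat - p0) / sd0
    p_value = 1 - NormalDist().cdf(z)   # area to the right of z under N(0, 1)
    return p_hat, z, p_value


p_hat, z, p_value = upper_tail_proportion_z_test(1250, 1700, 0.70)
print(f"p_hat={p_hat:.4f} z={z:.2f} P-value={p_value:.4f}")
# p_hat=0.7353 z=3.18 P-value=0.0007

# Same P-value without standardizing: area above p_hat under N(0.70, sd0)
sd0 = math.sqrt(0.70 * 0.30 / 1700)
print(f"{1 - NormalDist(mu=0.70, sigma=sd0).cdf(p_hat):.4f}")  # 0.0007
```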

Another approach:

The standard error computed under $H_0$ is SE = sqrt(.7(.3)/1700) = .011114, so

P-value = normalcdf(.7353, 1E99, .70, .011114) = .0007,

which agrees with the calculation above.

There is a subsection of this section called Choosing the Sample Size, which begins on page 669. This subsection is well written and easy to follow. I'm going to attempt to put a real-life perspective on it. Here's a hypothetical (but realistic) situation. Suppose you are heading a scientific polling organization. Candidate Herkimer wants you to poll a group of voters to determine his popularity, and he would like a margin of error of 2%. The question now becomes: how large a sample is needed to get a 2% margin of error?

Sounds simple enough. We are interested in a 95% confidence interval, which involves the z-value 1.96. For simplicity, let's just use 2 instead of 1.96. We need only solve the equation

$.02 = 2\sqrt{p(1-p)/n}$

Easily done: squaring both sides gives $(.02)^2 = 4p(1-p)/n$, so $n = 4p(1-p)/(.02)^2$. But we have a Catch-22 situation. We don't have a value of p, and we can't get an approximation for p before we take a sample. But it is the sample size we are trying to determine, etc., etc.

Note that the equation $n = 4p(1-p)/(.02)^2$ represents a parabola in p that opens downward. If you graph it, you will find that n reaches its maximum value when p = 0.5. Using p = 0.5 is therefore the conservative choice: the resulting sample size gives a margin of error of at most 2% no matter what the true p is. Substituting p = 0.5 into the equation, we get

$n = 4(.5)(1-.5)/(.02)^2 = 1/(.02)^2 = 2500$

Hence, to get the margin of error requested by candidate Herkimer, we would need an SRS of at least 2,500 voters. In general, if we want a specific margin of error $m_e$ for a 95% confidence interval, the sample size needed is $1/(m_e)^2$, rounded up to the next whole number.

Now let's change the request: suppose candidate Herkimer wants a margin of error of 1.5% or less. Basically, he wants a 95% confidence interval that is 3 percentage points wide. The sample size needed is $1/(.015)^2 = 4444.4$, which rounds up to 4,445.

Let's now assume that a sample of this size shows that 43% of the sampled voters will vote for Herkimer. This translates to a 95% confidence interval of [41.5%, 44.5%]. Putting the proper interpretation on this interval, and assuming that Herkimer is involved in a 2-person political race, the polls indicate that he needs to do something to attract voters. Why? Simply because the entire interval lies below 50%, so the statistics suggest he would lose the election if it were held today.
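The sample-size arithmetic in this subsection can be sketched as a small function. The name `sample_size_for_margin` is hypothetical; it solves z* sqrt(p(1-p)/n) = margin for n, defaults to z* = 2 and the conservative guess p = 0.5 as the notes do, and always rounds up. The `round(n, 6)` step is an implementation choice to discard floating-point noise before rounding up.

```python
import math


def sample_size_for_margin(margin, z_star=2.0, p_guess=0.5):
    """Smallest whole n with z* * sqrt(p(1 - p) / n) <= margin.

    The default p_guess = 0.5 maximizes p(1 - p), giving the conservative
    (largest) n. With z* = 2 this reduces to 1 / margin**2.
    """
    n = z_star ** 2 * p_guess * (1 - p_guess) / margin ** 2
    # Drop tiny floating-point noise (e.g. 2400.9999999999995), then round UP:
    # a fractional voter still means one more voter.
    return math.ceil(round(n, 6))


print(sample_size_for_margin(0.02))               # 2500
print(sample_size_for_margin(0.015))              # 4445
print(sample_size_for_margin(0.02, z_star=1.96))  # 2401
```

Using the exact multiplier 1.96 instead of 2 lowers the requirement for a 2% margin from 2,500 to 2,401 voters.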