12.1 INFERENCE FOR A POPULATION PROPORTION (Pages 658 - 674)
OVERVIEW: This section deals with tests and confidence intervals for a population proportion. When the sample size is large, the distribution of p(hat) is approximately normal with mean p and standard deviation sqrt(p(1-p)/n). The inference procedures are reasonably accurate if the population is at least ten times larger than the sample, and if the sample size is such that np and n(1-p) are both at least 10.
A confidence interval for a population proportion p, based on the sample proportion p(hat), has the form
p(hat) plus/minus z* sqrt(p(hat)(1-p(hat))/n),
where z* is the critical value for the desired confidence level. The expression SE = sqrt(p(hat)(1-p(hat))/n) is called the standard error of p(hat).
Tests of a null hypothesis H0: p = p0 are based on the statistic
z = (p(hat) - p0)/sqrt(p0(1-p0)/n)
Example:
A random sample of 867 registered voters found that 63% favored Proposition A. Calculate a 95% confidence interval for the proportion of voters in the population who favor Proposition A.
Response: The desired confidence interval is
.63 plus/minus 1.96 sqrt(.63(1-.63)/867) = .63 plus/minus .0321 = [.5979, .6621].
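The interval above can be checked with a short script. This is only a sketch: the function name proportion_ci is made up for illustration, and it uses Python's standard-library NormalDist to look up z* rather than a table or calculator.

```python
from math import sqrt
from statistics import NormalDist

def proportion_ci(p_hat, n, confidence=0.95):
    """Large-sample confidence interval for a population proportion.

    proportion_ci is a hypothetical helper name, not from the textbook.
    """
    # Critical value z* for the requested confidence level (about 1.96 for 95%).
    z_star = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    # Standard error of p(hat).
    se = sqrt(p_hat * (1 - p_hat) / n)
    margin = z_star * se
    return p_hat - margin, p_hat + margin

low, high = proportion_ci(0.63, 867)
print(f"[{low:.4f}, {high:.4f}]")  # prints [0.5979, 0.6621]
```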
We see this type of thing (indirectly) quite frequently in newspapers and magazines when they report results of surveys. For instance, a survey might report that 56% of a sample of 1,446 voters support candidate Herkimer, with a margin of error = 3%. Those who are statistically literate know that the 95% confidence interval is [53%, 59%]. The margin of error reported is about 2 standard errors. In this example, one standard error is sqrt(.56(.44)/1446) = .013, and 2(.013) = .026, which is rounded up to 3%.
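The arithmetic behind a reported margin of error can be reproduced the same way. The sketch below assumes the "2 standard errors" rule of thumb used in these notes (not the more exact 1.96), and the name margin_of_error is hypothetical.

```python
from math import sqrt

def margin_of_error(p_hat, n, z=2.0):
    """Margin of error as z standard errors of p(hat); z=2 is the rule of thumb."""
    return z * sqrt(p_hat * (1 - p_hat) / n)

me = margin_of_error(0.56, 1446)
print(f"{me:.3f}")  # prints 0.026, which a news report would round up to 3%
```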
Example:
A random sample of 1700 voters from a large population found that 1250 favored Proposition B. Test the hypothesis that at most 70% of the voters in the population favor Proposition B.
Analysis:
Null hypothesis H0: p = .7
Alt. hypothesis Ha: p > .7
Sample proportion p(hat) = 1250/1700 = .7353
z = (.7353-.7)/sqrt(.7(.3)/1700) = 3.18
P-value = normalcdf(3.18,1E99,0,1) = .0007 = .07%
Conclusion: If the population proportion favoring Proposition B were 70% or less, a sample result this extreme would be highly unlikely, so we reject H0 in favor of Ha: p > .7.
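For readers without a TI calculator, the same test can be sketched in Python. The function name one_proportion_z_test is an assumption for illustration; the upper-tail area that normalcdf(z,1E99,0,1) returns is computed here as 1 minus the standard normal CDF.

```python
from math import sqrt
from statistics import NormalDist

def one_proportion_z_test(successes, n, p0):
    """Upper-tailed z test of H0: p = p0 against Ha: p > p0.

    one_proportion_z_test is a hypothetical helper name.
    """
    p_hat = successes / n
    # Standard deviation of p(hat) computed under the null value p0.
    sd0 = sqrt(p0 * (1 - p0) / n)
    z = (p_hat - p0) / sd0
    # Area to the right of z under the standard normal curve.
    p_value = 1 - NormalDist().cdf(z)
    return p_hat, z, p_value

p_hat, z, p_value = one_proportion_z_test(1250, 1700, 0.70)
print(f"p_hat = {p_hat:.4f}, z = {z:.2f}, P-value = {p_value:.4f}")
```

Because the script carries p(hat) at full precision instead of rounding to .7353, its z and P-value may differ from the hand calculation in the last digit.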
Another approach:
The standard error, SE = sqrt(.7(.3)/1700) = .011114
P-value = normalcdf(.7353,1E99,.70,.011114) = .0007, which agrees with the calculations above.
There is a subsection of this section called Choosing the Sample Size, which begins on page 669. This section is well-written and easy to follow. I'm going to attempt to put a real-life perspective on this. Here's a hypothetical (but realistic) situation. Suppose you are heading a scientific polling organization. Candidate Herkimer wants you to poll a group of voters to determine his popularity. And, he would like a margin of error of 2%. The question now becomes: how large a sample size is needed to get a 2% margin of error?
Sounds simple enough. We are interested in a 95% confidence interval, which involves the z-value 1.96. For simplicity, let's just use 2 instead of 1.96. We need only solve the equation
.02 = 2 sqrt(p(1-p)/N)
Easily done: We get (.02)^2 = 4p(1-p)/N ==> N = 4p(1-p)/(.02)^2.
But, we have a Catch-22 situation. We don't have a value of p, and we can't get an approximation for p before we take a sample. But, it is the sample size we are trying to determine, etc., etc.
Note that the equation N = 4p(1-p)/(.02)^2, with N viewed as a function of p, represents a parabola that opens downward. If you graph this downward-opening parabola, you will find that N attains its maximum value when p = 0.5. Substituting p = 0.5 into the equation, we get N = 4(.5)(1-.5)/(.02)^2 = 1/(.02)^2 = 2,500. Hence, to get the margin of error requested by candidate Herkimer whatever the true value of p, we would need an SRS of at least 2,500 voters.
In general, if we are interested in a specific margin of error = me for a 95% confidence interval, the sample size needed is 1/(me)^2.
Let's revisit the problem: Suppose candidate Herkimer wants a margin of error of 1.5% or less. Basically, he wants a 95% confidence interval that is 3 percentage points wide. The sample size needed is 1/(.015)^2 = 4444.4, which we round up to 4445. Let's now assume that a sample of this size shows that 43% of the voters sampled will vote for Herkimer. This translates to a 95% confidence interval [41.5%, 44.5%]. Putting the proper interpretation on this interval, and assuming that Herkimer is involved in a 2-person political race, the polls indicate that he needs to do something to attract voters. Why? Simply because the statistics suggest he would lose the election if it were held today.
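The sample-size rule above is easy to wrap in a small function. This is a sketch under the same assumptions as the notes: z is taken as 2 instead of 1.96, and p defaults to the worst case 0.5. The name sample_size is made up for illustration.

```python
from math import ceil

def sample_size(me, p=0.5, z=2.0):
    """Smallest N with z*sqrt(p(1-p)/N) <= me.

    With the defaults p=0.5 and z=2 this reduces to 1/(me)^2.
    sample_size is a hypothetical helper name.
    """
    n = z**2 * p * (1 - p) / me**2
    # Round away floating-point noise before taking the ceiling, so an
    # exact answer such as 2500 is not bumped up to 2501.
    return ceil(round(n, 6))

print(sample_size(0.02))   # prints 2500
print(sample_size(0.015))  # prints 4445
```

Passing z=1.96 instead of 2 gives slightly smaller samples (2401 rather than 2500 for a 2% margin), which shows how little is lost by the simplification.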