Sanderson M. Smith

THOUGHTS ON STATISTICAL SIGNIFICANCE

(A Note to Advanced Placement Statistics Students)

OK, we are really down to nitty-gritty time. We need to start concentrating on being statistically literate, saying the right things, and so on. So, attention is important. If you feel the need to be antsy, just take a short break and step outside to loosen up.

I will be sending notes like this more frequently in an attempt to get you in tune for what you will experience on the AP Statistics Test. It will be sophisticated, and you must communicate like a statistician and not wander off into layman's language when addressing questions you will be asked. (This is not a silly statement. As an AP Stat. reader last summer at the University of Nebraska, I encountered many essay responses that sounded like they came from someone who had never set foot in a statistics course.)

Focus of this note: STATISTICAL SIGNIFICANCE.

Understand that this is something that must be defined in any real statistical problem. For instance, here is a question.

IS A DIFFERENCE OF 20% STATISTICALLY SIGNIFICANT?

Well, if a stock you owned dropped 20% in value, or if gasoline prices suddenly rose by 20%, most folks would probably consider this significant.

However, I hope that you realize that the question, as stated, is virtually meaningless.

Suppose you flipped 20 coins and came up with 12 heads. The probability you would get 12 or more heads is

1 - binomcdf(20,.5,11) = .2517

The event is not that unlikely. It's not that rare an occurrence.

Suppose you flipped another set of 20 coins and got 8 heads. The probability of 8 or fewer heads is

binomcdf(20,.5,8) = .2517.

(It shouldn't surprise you that the two calculated probabilities are equal.) So, 8 or fewer heads is not that unlikely.
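If you don't have your calculator handy, the two coin-flip probabilities above can be checked with a few lines of Python. This is only a sketch: the helper name binom_cdf is made up here to play the role of the calculator's binomcdf, and it simply adds up the binomial probabilities from 0 through k.

```python
from math import comb

def binom_cdf(n, p, k):
    """P(X <= k) for X ~ Binomial(n, p); stands in for binomcdf(n, p, k)."""
    return sum(comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k + 1))

# P(12 or more heads in 20 flips of a fair coin)
upper_tail = 1 - binom_cdf(20, 0.5, 11)

# P(8 or fewer heads in 20 flips of a fair coin)
lower_tail = binom_cdf(20, 0.5, 8)

print(round(upper_tail, 4), round(lower_tail, 4))
```

The two results agree because a fair coin makes the binomial distribution symmetric about 10 heads: 12 or more heads is the mirror image of 8 or fewer.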

Now, the proportion of heads in the first case is 12/20 = 60%. In the second case, the proportion of heads is 8/20 = 40%. The difference in the two sample proportions is 20%. In this situation, IS THE 20% SIGNIFICANT? With just a bit of common-sense thinking, and realizing that getting 12 heads or 8 heads in 20 flips of a coin hardly represents an unusual outcome, you would hardly consider the 20% difference in proportions significant.

=========

Now, suppose Herkimer is a candidate for political office. A month ago, a scientifically conducted poll found that 75% of a sample of registered voters would vote for him. Yesterday, another scientifically conducted poll found 70% would vote for him. IS THE 5% DIFFERENCE SIGNIFICANT? (Please refrain from saying the difference is not significant because he will still win the election. We are now statistically literate. We know better than to give a response like this.)

Being statistically literate, you know that the question is still relatively meaningless. To give it any meaning at all, you need to know something about the sample sizes. Let's look at two scenarios.

--------

Scenario #1:

                 Sample Size    # who would vote for Herky    Proportion of Herky voters
One month ago         80                    60                           .75
Yesterday            100                    70                           .70

We want to test

Ho: p(month ago) = p(yesterday), versus

Ha: p(month ago) > p(yesterday)

SE = sqrt[(.70)(.30)/100 + (.75)(.25)/80] = 0.06667

p-value = normalcdf(.05,1E99,0,.06667) = .2266, or about 23%.

There is insufficient evidence to reject Ho. That is, we don't have strong evidence to suggest a change in voter attitude. (NOTE: We are not claiming there hasn't been a change. We are simply saying there isn't enough evidence to suggest a change in voter attitude has taken place.)
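Here is one way the Scenario #1 calculation might look in Python, as a rough sketch. The function name upper_tail_p_value is invented for illustration. It uses the same unpooled standard error as the calculation above, and NormalDist from Python's standard statistics module takes the place of normalcdf.

```python
from math import sqrt
from statistics import NormalDist

def upper_tail_p_value(x1, n1, x2, n2):
    """One-sided p-value for Ha: p1 > p2, using the unpooled SE from the note."""
    p1, p2 = x1 / n1, x2 / n2
    se = sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)
    diff = p1 - p2
    # Same idea as normalcdf(diff, 1E99, 0, se): area to the right of diff
    # under a normal curve centered at 0 with standard deviation se.
    p_value = 1 - NormalDist(mu=0, sigma=se).cdf(diff)
    return se, p_value

# Scenario #1: 60 of 80 a month ago, 70 of 100 yesterday
se, p_value = upper_tail_p_value(60, 80, 70, 100)
print(round(se, 4), round(p_value, 4))
```

The p-value comes out near 23%, far too large to call the 5% drop statistically significant with samples this small.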

---------------

Scenario #2:

                 Sample Size    # who would vote for Herky    Proportion of Herky voters
One month ago        800                   600                           .75
Yesterday           1000                   700                           .70

We still want to test

Ho: p(month ago) = p(yesterday), versus

Ha: p(month ago) > p(yesterday)

SE = sqrt[(.70)(.30)/1000 + (.75)(.25)/800] = 0.02108.

p-value = normalcdf(.05,1E99,0,.02108) = .0088 = 0.88% (less than 1%)

Now there is sufficient evidence to reject Ho. If Ho is true, there is less than a 1% chance that we would see a difference of 5% or more in the sample proportions. (NOTE: We are not claiming there has been a change in voter attitude. We are simply saying that there is strong evidence to suggest that the proportion of voters who support Herky has decreased.)
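To see the two scenarios side by side, here is a short Python sketch that holds the sample proportions fixed at .75 and .70 and changes only the sample sizes. The variable names are made up for illustration, and the standard error is the same unpooled one used in both scenarios.

```python
from math import sqrt
from statistics import NormalDist

p_before, p_now = 0.75, 0.70   # sample proportions used in both scenarios
results = {}

for n_before, n_now in [(80, 100), (800, 1000)]:
    se = sqrt(p_before * (1 - p_before) / n_before + p_now * (1 - p_now) / n_now)
    z = (p_before - p_now) / se              # how many SEs the 5% difference is from 0
    results[(n_before, n_now)] = 1 - NormalDist().cdf(z)   # one-sided p-value
    print(n_before, n_now, round(se, 5), round(z, 2), round(results[(n_before, n_now)], 4))
```

Multiplying both sample sizes by 10 divides the standard error by sqrt(10), so the same 5% difference moves from less than one standard error away from 0 to more than two. That is the whole story behind the change in the p-value.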

Referencing Scenario #2, we could produce a confidence interval for the proportion difference. A 95% confidence interval has the form

.05 plus/minus 1.96(.02108) = .05 plus/minus .0413 = (.0087,.0913).

In other words, we are 95% confident that support for Herkimer has dropped by somewhere between about 1 and 9 percentage points during the past month. The method used to construct the interval captures the actual difference in about 95% of all possible samples, and this particular interval does not contain 0.
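The interval for Scenario #2 can be reproduced with a short Python sketch. The variable names are chosen here for illustration. The critical value is looked up with NormalDist().inv_cdf(0.975) from the standard statistics module, which gives approximately the 1.96 used above.

```python
from math import sqrt
from statistics import NormalDist

# Scenario #2 sample proportions and sample sizes
p_before, n_before = 0.75, 800
p_now, n_now = 0.70, 1000

diff = p_before - p_now
se = sqrt(p_before * (1 - p_before) / n_before + p_now * (1 - p_now) / n_now)

z_star = NormalDist().inv_cdf(0.975)   # critical value for 95% confidence, about 1.96
margin = z_star * se
lower, upper = diff - margin, diff + margin

print(round(lower, 4), round(upper, 4))
```

Since the lower endpoint is above 0, the interval agrees with the test: a difference of 0 (no change in support) is not among the plausible values.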

==========================

MATH POWER TO ALL.