The binomial distributions

"It is a capital mistake to theorize before one has data."

Sir Arthur Conan Doyle

11.1 INFERENCE FOR THE MEAN OF A POPULATION (Pages 586 - 605)
OVERVIEW: In reality, one frequently does not know the standard deviation of the population from which a random sample was obtained. When sample sizes are small and the population standard deviation is not known, statisticians make use of the t-distribution, which is bell-shaped but has heavier tails than the normal distribution. Hence, there is a t-distribution table that differs from the normal distribution table. As sample sizes get larger, the t-distribution approaches the normal distribution in shape. In a one-sample test, the number of degrees of freedom (needed to use the table) is one less than the sample size.
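To make the claim about the t-distribution approaching the normal distribution concrete, here is a minimal Python sketch that compares the two densities at the center (x = 0) and in the tail (x = 3) for increasing degrees of freedom. The function names t_pdf and normal_pdf are illustrative choices, and only the standard library is assumed.

```python
import math

def t_pdf(x, df):
    """Density of Student's t-distribution with df degrees of freedom."""
    # lgamma avoids overflow of the gamma function when df is large
    log_coef = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
                - 0.5 * math.log(df * math.pi))
    return math.exp(log_coef - (df + 1) / 2 * math.log1p(x * x / df))

def normal_pdf(x):
    """Density of the standard normal distribution."""
    return math.exp(-x * x / 2) / math.sqrt(2 * math.pi)

# Center height and a tail point: the gap to the normal shrinks as df grows
for df in (2, 11, 30, 1000):
    print(df, round(t_pdf(0, df), 4), round(t_pdf(3, df), 4))
print("normal", round(normal_pdf(0), 4), round(normal_pdf(3), 4))
```

For small degrees of freedom the t density is lower at the center and higher in the tails, which is why t-table critical values are larger than the corresponding normal-table values.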
This is a lengthy and important section. The summary will consist primarily of examples that will supplement the reading. A few important things to note:
- When the standard deviation of a statistic is estimated from the sample data, the result is called the standard error of the statistic. The standard error of the sample mean x(bar) is s/sqrt(n), where n is the sample size and s is the sample standard deviation.
- If a sample has mean x(bar), the one-sample t-statistic for testing a hypothesized population mean μ is t = (x(bar) - μ)/(s/sqrt(n)), with n-1 degrees of freedom.
- A confidence interval for the population mean based on the t-distribution has the form x(bar) plus/minus t*(s/sqrt(n)), where t* is the appropriate value from the t-distribution table.
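The standard error and the one-sample t-statistic defined above can be sketched in a few lines of Python. This is a minimal illustration using only the standard library; the function name one_sample_t and the toy data are hypothetical and do not come from the text.

```python
import math
import statistics

def one_sample_t(data, mu0):
    """Return (mean, s, standard error, t, degrees of freedom)."""
    n = len(data)
    xbar = statistics.mean(data)
    s = statistics.stdev(data)   # sample standard deviation (divides by n - 1)
    se = s / math.sqrt(n)        # standard error of the sample mean
    t = (xbar - mu0) / se        # one-sample t-statistic
    return xbar, s, se, t, n - 1

# Hypothetical toy data, chosen only so the arithmetic is easy to follow
sample = [4.0, 6.0, 5.0, 7.0, 3.0]
xbar, s, se, t, df = one_sample_t(sample, mu0=4.0)
print(f"mean = {xbar}, s = {s:.4f}, SE = {se:.4f}, t = {t:.4f}, df = {df}")
```

Note that statistics.stdev divides by n - 1, which matches the sample standard deviation s used throughout this section; statistics.pstdev (which divides by n) would not.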
Example 1 (Single sample):
Here are 12 hypothetical measurements representing an SRS from a large population.
23.2, 25.1, 26.3, 24.8, 24.3, 25.6, 23.8, 24.5, 25.1, 25.4, 24.8, 25.7
(a) Calculate a 95% confidence interval for the mean of the population from which the sample was chosen.
(b) Is it reasonable to think that the sample came from a population with μ = 25?
Response to (a): We have x(bar) = 24.88 and s = .8558 with 12-1 = 11 degrees of freedom. Using the table value t* = 2.201 for 11 degrees of freedom, the desired 95% confidence interval is
24.88 plus/minus 2.201(.8558/sqrt(12)) = 24.88 plus/minus .5438 = [24.34, 25.42].
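As a check on the arithmetic in part (a), the interval can be recomputed from the raw data. This is a sketch using only the Python standard library; the value t* = 2.201 is taken from the text rather than computed. Because the code carries the unrounded mean (about 24.8833) instead of 24.88, the upper endpoint comes out near 25.43 rather than 25.42.

```python
import math
import statistics

data = [23.2, 25.1, 26.3, 24.8, 24.3, 25.6,
        23.8, 24.5, 25.1, 25.4, 24.8, 25.7]
T_STAR = 2.201  # table value for 95% confidence, 11 degrees of freedom (from the text)

n = len(data)
xbar = statistics.mean(data)
s = statistics.stdev(data)            # sample standard deviation
margin = T_STAR * s / math.sqrt(n)    # t* times the standard error
low, high = xbar - margin, xbar + margin

print(f"mean = {xbar:.4f}, s = {s:.4f}, margin = {margin:.4f}")
print(f"95% CI: [{low:.2f}, {high:.2f}]")
```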
Response to (b): t = (24.88-25)/(.8558/sqrt(12)) = -0.4857 with 11 degrees of freedom. Using the TI-83, prob(t < -.4857) is tcdf(-1E99,-.4857,11) = .3184. Since this is a 2-tail situation, our P-value is 2(.3184) = .6368. With a P-value this large, one would not reject a null hypothesis stating μ = 25. In other words, it would not be unreasonable to assume that the population mean is 25. (Don't go wicky-wacky here and make wild claims that the population mean is 25; failing to reject a hypothesis does not prove it.)
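For readers without a TI-83, the tail probability in part (b) can be approximated by integrating the t density numerically. The sketch below is a stand-in for the calculator's tcdf, not a replacement for a statistics library: the function names are illustrative, and Simpson's rule is one of several reasonable ways to do the integration.

```python
import math

def t_pdf(x, df):
    """Density of Student's t-distribution with df degrees of freedom."""
    log_coef = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
                - 0.5 * math.log(df * math.pi))
    return math.exp(log_coef - (df + 1) / 2 * math.log1p(x * x / df))

def t_upper_tail(t, df, steps=2000):
    """P(T > |t|): integrate the density over [0, |t|] with Simpson's rule
    and subtract from 0.5, using the symmetry of the t-distribution."""
    b = abs(t)
    h = b / steps                      # steps must be even for Simpson's rule
    total = t_pdf(0.0, df) + t_pdf(b, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    return 0.5 - total * h / 3

def two_sided_p(t, df):
    """Two-tail P-value for a t-statistic with df degrees of freedom."""
    return 2 * t_upper_tail(t, df)

one_tail = t_upper_tail(-0.4857, 11)
print(round(one_tail, 4), round(two_sided_p(-0.4857, 11), 4))
```

By symmetry, P(T < -0.4857) equals P(T > 0.4857), so the same function serves for either tail.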
Example 2 (Matched Pairs):
Assume that the following data represent a pretest/posttest situation. In a situation such as this, one calculates a difference column and tests the null hypothesis that the mean of the difference column is 0. In a 2-tail test, the alternate hypothesis is that the mean of the differences is not zero.

Individual #	After Score (A)	Before Score (B)	Diff. = (A) - (B)
1	78	73	5
2	92	86	6
3	68	70	-2
4	69	62	7
5	76	72	4
6	80	83	-3
7	91	85	6
8	79	76	3
9	86	86	0
10	59	54	5

Mean of difference column			3.1
s for difference column			3.5418137
SE = standard error			1.1200198 = s/sqrt(10)
Degrees of freedom			9 = 10-1
t-statistic			2.767808 = (3.1 - 0)/SE
Using the TI-83, tcdf(2.767808,1E99,9) = .0109158076, or about 1.1%. In other words, there is about a 1% chance that we would get a mean difference of 3.1 or larger if the numbers in the difference column came from a population with μ = 0. Since we are using a 2-tail test, the P-value is 2*tcdf(2.767808,1E99,9) = .0218316152, or about 2.18%. Since this probability is small, one would probably reject the null hypothesis H₀: μ = 0.
However, let us note that if we are being very formal, and if we decide beforehand to run a 2-tail test at the 1% level of significance, we would not reject H₀. At this level of significance, the critical region of rejection would be t < -3.250 or t > 3.250. Our t-statistic of 2.767808 is not in the critical region.
If we decide to run a 2-tail test at the 5% level of significance, the critical region becomes t < -2.262 or t > 2.262. Our calculated t = 2.767808 is in the critical region. We would reject H₀ at this level.
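The matched pairs calculation and the two fixed-level decisions can be reproduced with a short Python sketch. It uses only the standard library; the critical values 3.250 and 2.262 are copied from the text rather than computed, and the variable names are illustrative.

```python
import math
import statistics

after  = [78, 92, 68, 69, 76, 80, 91, 79, 86, 59]
before = [73, 86, 70, 62, 72, 83, 85, 76, 86, 54]

# Matched pairs: analyze the single column of differences
diffs = [a - b for a, b in zip(after, before)]
n = len(diffs)
mean_diff = statistics.mean(diffs)
s = statistics.stdev(diffs)          # sample standard deviation of the differences
se = s / math.sqrt(n)                # standard error of the mean difference
t = (mean_diff - 0) / se             # test H0: mean difference = 0

# Two-tail critical values for 9 degrees of freedom, as quoted in the text
CRITICAL = {0.01: 3.250, 0.05: 2.262}
decisions = {alpha: abs(t) > t_crit for alpha, t_crit in CRITICAL.items()}

print(f"mean = {mean_diff}, s = {s:.7f}, SE = {se:.7f}, t = {t:.6f}")
for alpha, reject in decisions.items():
    print(f"alpha = {alpha}: {'reject' if reject else 'do not reject'} H0")
```

The same t-statistic leads to opposite decisions at the two levels, which is the situation the quotation that follows warns about.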
This example emphasizes a point made by the text authors in the SUMMARY on page 566.
"P-values are more informative than the reject-or-not result of a fixed alpha test. Beware of placing too much weight on traditional values of alpha, such as alpha = .05."
Rules for using the t-test:
- Ideally, the sample comes from a normal distribution.
- The assumption of an SRS is important.
- For sample sizes less than 15, the t-procedures can be used if the data are close to normal. Do not use t-procedures if the data are clearly nonnormal or if outliers are present.
- For sample sizes 15 or greater, t-procedures can be safely used except in the presence of outliers or strong skewness.
- For sample sizes 40 or greater, t-procedures can be used even if the data are heavily skewed.

Note: Problem #5 in Section II of the 2001 Advanced Placement Statistics Examination involves the matched pairs t procedure. You can see a detailed solution to this actual AP problem by going to the home page of Herkimer's Hideaway, taking the link to WRITINGS AND REFLECTIONS, and then the link to item #66. Please note that this link does not contain a statement of the problem, just a detailed solution. The location does provide a link to the College Board, where you can get the problem statement if you don't have it available. (The College Board does not allow for the copying of an actual problem statement on a site such as this.)

