"An approximate answer to the rightquestion is worth a good deal more than the exact answer to anapproximate question."

John Tukey
5.1 DESIGNING SAMPLES (Pages 245-261)

OVERVIEW: If one wishes to obtain reliable statistical information from sampling, one must design the sampling process very carefully. Good sampling techniques, which involve the use of chance, can produce meaningful and useful results. Bad sampling techniques often produce worthless data. This section introduces or reinforces many important definitions.

Voluntary response sample: Consists of people who choose themselves. (As contrasted with being chosen by some designed process.)

Example: Listeners calling in to respond to a question asked by a talk show host.

Two variables are confounded when their effects on a response variable cannot be distinguished from one another.

Example (Pre-test, post-test situation)
...Group asked for opinion (favorable, unfavorable) on political candidate Herkimer.
...Group then reads favorable propaganda about Herkimer for a period of time, and is then asked once again to express an opinion.
...Result: More of the group now have favorable opinions about Herkimer.
...Purpose is to determine if the propaganda changed attitudes.
...But, during the reading period, Herkimer makes headlines by donating a good amount of his personal fortune to a worthwhile charity.

-How much did the explanatory variable (reading favorable propaganda) affect the change in attitude? We'll never really know, since some (perhaps most) of the change may well have been due to Herkimer's well-publicized donation to charity. The reading of propaganda and the general knowledge of the charity donation are confounded variables.

Statistical inference: Provides ways to give "reasonable" answers to specific questions by examining data.

Population: Group from which information is desired.

Frame: A list of sampling units. (Note: Frame might be different from population. For example, population might be city residents, frame might be telephone book for the city.)

Sample: Part of a population that is examined in an attempt to obtain information about the population.

Design: The method used to choose the sample.

Convenience sample: Just select the individuals easiest to reach.
...
Example: Getting opinions about political candidate Herkimer by simply asking people who enter a specific store.

A sample is biased if it systematically favors certain outcomes.
...
Example: Call-in responses to a question posed by a talk show host produce results that are almost certainly biased.

Simple random sample (SRS) of size N: Sample chosen in such a way that every set of N individuals has an equal chance of being chosen.
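As a rough illustration of the definition, the following Python sketch draws an SRS with the standard library's random.sample, which selects without replacement so that every set of N individuals is equally likely. The function name simple_random_sample, the student labels, and the seed are assumptions made for this illustration only.

```python
import random

def simple_random_sample(population, n, seed=None):
    """Return n distinct individuals; every set of n is equally likely."""
    rng = random.Random(seed)          # seed is optional, used for repeatability
    return rng.sample(list(population), n)

# Hypothetical class list (labels are placeholders, not real data)
students = ["A", "B", "C", "D", "E", "F", "G", "H"]
sample = simple_random_sample(students, 2, seed=1)
print(sample)
```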

Careful...the notion of an SRS can be tricky. Here's an example.
A class consists of 4 boys and 4 girls. The teacher wants a sample of two students. She decides to flip a coin. If the coin comes up heads, she will choose two boys by a random process. If tails, she will choose two girls by a random process.

Question 1: Does each student have an equal probability of being in the sample?
...Answer: Yes. Each student has a 25% probability of being in the sample.

Question 2: Is this an SRS of two students from the class?
...Answer: No, since not all groups of two students are equally likely to be selected. The design does not allow for a sample consisting of one girl and one boy.
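Both answers can be checked by listing every pair the coin-flip design can produce. The sketch below assumes a fair coin and that "a random process" means each pair within the chosen group is equally likely; the labels B1...G4 are placeholders.

```python
from fractions import Fraction
from itertools import combinations

boys = ["B1", "B2", "B3", "B4"]
girls = ["G1", "G2", "G3", "G4"]

# Probability of each pair under the coin-flip design:
# 1/2 for the coin, times 1/6 for one of the 6 pairs within the chosen group.
pair_prob = {}
for group in (boys, girls):
    pairs = list(combinations(group, 2))
    for pair in pairs:
        pair_prob[pair] = Fraction(1, 2) * Fraction(1, len(pairs))

# Question 1: chance that each individual student ends up in the sample
inclusion = {
    s: sum(p for pair, p in pair_prob.items() if s in pair)
    for s in boys + girls
}
print(set(inclusion.values()))   # {Fraction(1, 4)}

# Question 2: pairs that an SRS could produce but this design never does
all_pairs = list(combinations(boys + girls, 2))
impossible = [pair for pair in all_pairs if pair not in pair_prob]
print(len(all_pairs), len(impossible))   # 28 16
```

Every student has probability 1/4, yet 16 of the 28 possible pairs (the boy-girl pairs) have probability zero. Under an SRS each of the 28 pairs would have probability 1/28.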

Probability sample... gives each member of the population a known chance (greater than zero) of being chosen.

Stratified random sample... divide the population into strata, choose an SRS in each stratum, and combine the SRSs to get a bigger sample (which would not be an SRS).

Example: Cate Headmaster Benjamin Williams wants to randomly select twelve students to receive twelve free tickets to a concert. To make sure all classes are represented, he stratifies the population by class (senior, junior, sophomore, freshman) and then randomly chooses three students from each stratum. He then has a randomly chosen sample of twelve Cate students who will receive the free tickets. This is not an SRS of twelve Cate students, since not all combinations of twelve students can be obtained by this process.
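A minimal sketch of the ticket drawing, assuming a made-up roster of 60 students per class (the real class sizes are not given here). The function name stratified_sample and the roster are hypothetical. The final lines compare the number of samples this design can produce with the number an SRS of twelve could produce.

```python
import random
from math import comb

def stratified_sample(strata, per_stratum, seed=None):
    """Draw an SRS of size per_stratum within each stratum and combine them."""
    rng = random.Random(seed)
    chosen = []
    for members in strata.values():
        chosen.extend(rng.sample(members, per_stratum))
    return chosen

# Hypothetical roster: 60 placeholder students in each of the four classes
classes = ("senior", "junior", "sophomore", "freshman")
roster = {cls: [f"{cls}-{i}" for i in range(1, 61)] for cls in classes}

winners = stratified_sample(roster, 3, seed=0)
print(winners)

# Samples reachable by this design versus all possible sets of twelve
reachable = comb(60, 3) ** 4
all_sets = comb(240, 12)
print(reachable < all_sets)   # True
```

Because the design can only produce sets with exactly three students per class, it reaches far fewer sets of twelve than an SRS would, which is why the result is not an SRS.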

Reasons for stratification
o Possible reduced variation of the estimators.
o Administrative convenience and reduced cost.
o Estimates needed for subgroups of the population.

Multistage sample design... select successively smaller groups within a population by stages.

Example
...Population: Residents of California.
...From the collection of counties, randomly pick 20 counties.
...Then, randomly pick 10 cities/towns in each of the selected counties.
...Then, randomly pick 100 individuals from phone listings in each of the selected cities/towns.
...Note: This design does not yield an SRS of California residents. (Why?)
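The stages above can be sketched on a toy scale. Everything in this block is a stand-in: the nested dictionary of counties, towns, and people is invented, and the stage sizes (2 counties, 2 towns, 3 people) are shrunk from 20, 10, and 100 so the example stays small.

```python
import random

def multistage_sample(counties, n_counties, n_towns, n_people, seed=None):
    """Pick counties, then towns within them, then people within those towns."""
    rng = random.Random(seed)
    sample = []
    for county in rng.sample(sorted(counties), n_counties):
        towns = counties[county]
        for town in rng.sample(sorted(towns), n_towns):
            sample.extend(rng.sample(towns[town], n_people))
    return sample

# Toy population: 6 counties, 4 towns per county, 10 listed people per town
counties = {
    f"county{c}": {
        f"town{c}.{t}": [f"person{c}.{t}.{p}" for p in range(1, 11)]
        for t in range(1, 5)
    }
    for c in range(1, 7)
}

sample = multistage_sample(counties, n_counties=2, n_towns=2, n_people=3, seed=0)
print(sample)
```

One thing the sketch makes visible: the people chosen always come from only the selected counties and towns, so many sets of individuals can never appear together in the sample.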

Undercoverage: Some groups in the population are left out in the process of choosing the sample.

Nonresponse: Occurs when an individual can't be contacted or refuses to cooperate.

Response bias: Refers to a variety of things that can lead to incorrect or false responses.

Example: An interviewer directly asking a person "Have you ever shoplifted?" may well get some answers that are lies. A sample design that would allow for an anonymous response would probably get more accurate results.

Wording of questions... can greatly influence the response.

Example (From Statistics: Concepts and Controversies, by David Moore...NY Times/CBS News Poll):
-Version 1: Do you think there should be an amendment to the Constitution prohibiting abortions?
-Version 2: Do you think there should be an amendment to the Constitution protecting the life of the unborn child?
...Results: Version 1 (29% Yes, 62% No), Version 2 (50% Yes, 39% No).

Poorly worded questions can confuse those responding to them.

Example: Do you think it is wrong when the government doesn't interfere in potentially dangerous religious matters?
YES / NO