""Itain't so much the things we don't know that get us into trouble. It'sthe things we know that ain't so." -- ArtemusWard


OVERVIEW: Correlation and regression need to be interpreted with caution. Two variables may be strongly associated, but this does not mean that one causes the other. Bottom line: High correlation does not imply causation. Among other things, lurking variables and common response should always be considered.

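Extrapolation: Using the least-squares regression line (LSRL) to predict values outside the domain of the explanatory variable used to create the line. (It is generally best to avoid extrapolating, because the linear pattern seen in the data may not continue outside that domain.)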
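The risk of extrapolating can be seen in a small sketch. The data here are hypothetical: the response is assumed to follow 10 * sqrt(x), observed only for x = 1 to 10, and the helper names (true_y, fit_lsrl, predict) are made up for illustration. The fitted line does reasonably well inside the observed domain and poorly far outside it.

```python
# Hypothetical relationship (an assumption for illustration): y = 10 * sqrt(x).
def true_y(x):
    return 10 * x ** 0.5


def fit_lsrl(xs, ys):
    """Return (slope, intercept) of the least-squares regression line."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


# The line is built only from x = 1, 2, ..., 10.
xs = list(range(1, 11))
ys = [true_y(x) for x in xs]
slope, intercept = fit_lsrl(xs, ys)


def predict(x):
    return intercept + slope * x


inside_error = predict(5) - true_y(5)      # x = 5 lies inside the observed domain
outside_error = predict(40) - true_y(40)   # x = 40 lies far outside it

print(f"prediction error at x = 5:  {inside_error:.2f}")
print(f"prediction error at x = 40: {outside_error:.2f}")
```

The line keeps rising at a constant rate while the assumed true curve flattens, so the error grows the farther the prediction moves from the data.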

Lurking variable: A variable that affects the relationship of the variables in a study, but is not included among the variables studied. Simple example: A strong positive association might exist between shirt size and intelligence for teenage boys. An obvious lurking variable is age, since shirt size and intelligence among teenagers generally increase with age.
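The shirt-size example can be imitated with simulated data. Everything in this sketch is an assumption made for illustration: the ages 13 to 19, the formulas linking age to shirt size and to a test score, and the amount of random noise. Shirt size and score are generated independently of each other once age is fixed, yet the overall correlation comes out strong.

```python
import random


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / (sxx * syy) ** 0.5


rng = random.Random(0)  # fixed seed so the run is repeatable

# Simulated (not real) boys: both measurements depend on age plus noise,
# and neither depends on the other.
boys = []
for age in range(13, 20):
    for _ in range(200):
        shirt = 2 * age + rng.gauss(0, 1)   # hypothetical shirt-size scale
        score = 5 * age + rng.gauss(0, 3)   # hypothetical test score
        boys.append((age, shirt, score))

# Correlation when age is ignored.
overall_r = pearson_r([b[1] for b in boys], [b[2] for b in boys])

# Correlation among boys of the same age.
within_age_r = {}
for age in range(13, 20):
    group = [b for b in boys if b[0] == age]
    within_age_r[age] = pearson_r([b[1] for b in group], [b[2] for b in group])

print(f"all ages together: r = {overall_r:.2f}")
for age, r in within_age_r.items():
    print(f"age {age} only: r = {r:.2f}")
```

Once age is held fixed, the association in this simulation all but disappears, which is what one would expect if age alone is driving both variables.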

Correlations based on averages are frequently higher than correlations based on all of the numbers that make up the averages. Example: Consider the temperature at noon each day of the year in a specific city. If the days of the year are numbered 1, 2, 3, ..., 365 starting with January 1, then we could create a scatterplot {(day of year, temp. at noon)} of 365 points and calculate a regression line and a correlation coefficient. If, instead of doing this, we simply calculated the mean temperature for each month, we would then have 12 points to plot. Chances are that the correlation for the "month" line would be higher than that of the "day" line. Using a single mean to represent a set of about 30 numbers hides the variation among those numbers.
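A rough simulation of the "day" versus "month" comparison follows. To keep the sketch simple it assumes a hypothetical city whose noon temperature drifts upward in a straight line through the year with random day-to-day noise. Real temperatures are seasonal rather than linear, and the numbers 10, 0.05 and 6 are invented for illustration.

```python
import random


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / (sxx * syy) ** 0.5


rng = random.Random(1)  # fixed seed so the run is repeatable

# Simulated noon temperatures: an assumed linear trend plus daily noise.
days = list(range(1, 366))
temps = [10 + 0.05 * d + rng.gauss(0, 6) for d in days]
r_daily = pearson_r(days, temps)          # 365 points

# Replace each month's daily values with a single mean (non-leap year).
month_lengths = [31, 28, 31, 30, 31, 30, 31, 31, 30, 31, 30, 31]
monthly_means = []
start = 0
for length in month_lengths:
    month_temps = temps[start:start + length]
    monthly_means.append(sum(month_temps) / length)
    start += length
r_monthly = pearson_r(list(range(1, 13)), monthly_means)   # 12 points

print(f"correlation using 365 daily values:  {r_daily:.2f}")
print(f"correlation using 12 monthly means: {r_monthly:.2f}")
```

Averaging cancels much of the day-to-day scatter within each month, so the 12 means sit closer to a line than the 365 individual readings do.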

If there is a strong association between two variables x and y, any one of the following statements could be true.

o x causes y (remember, association does not imply causation, but causation could exist.)

o both x and y are responding to changes in some unobserved variable or variables. (This is called common response.)

o the effect of x on y is hopelessly mixed up with the effects of other variables on y. Because of this, it is impossible to directly determine the effects of x on y. This is called confounding. (Confounding is always a potential problem in observational studies, but can be controlled to some degree in experiments where there is a control group and a treatment group.)
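The contrast between an observational study and an experiment can be sketched with simulated subjects. All of the following is assumed for illustration: an unobserved "health" variable, an outcome that depends on health only, and a treatment with no real effect at all. In the observational version healthier subjects tend to choose the treatment themselves; in the experimental version a coin flip decides.

```python
import random

rng = random.Random(2)  # fixed seed so the run is repeatable


def mean(values):
    return sum(values) / len(values)


def treated_minus_control(assign):
    """Simulate 2000 subjects; return mean outcome of treated minus control."""
    treated, control = [], []
    for _ in range(2000):
        health = rng.gauss(0, 1)                       # unobserved variable
        outcome = 50 + 10 * health + rng.gauss(0, 5)   # treatment has NO effect
        if assign(health):
            treated.append(outcome)
        else:
            control.append(outcome)
    return mean(treated) - mean(control)


# Observational study: healthier subjects are more likely to pick the treatment.
observational_diff = treated_minus_control(
    lambda health: health + rng.gauss(0, 1) > 0
)

# Experiment: treatment and control groups are formed by a coin flip.
experimental_diff = treated_minus_control(lambda health: rng.random() < 0.5)

print(f"observational study, treated minus control: {observational_diff:.1f}")
print(f"randomized experiment, treated minus control: {experimental_diff:.1f}")
```

In the observational run the treatment appears to help even though it does nothing, because its apparent effect is mixed up with the effect of health. Random assignment balances health across the two groups, so the difference shrinks toward zero.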
