"Data, data everywhere, but not a thought to think." -- Librarian and information scientist Jesse Shera

3.2 CORRELATION (Pages 129 - 135)

OVERVIEW: The correlation coefficient, r, is a number that "measures the strength and direction of the linear relationship between two quantitative variables." r takes values between -1 and 1, inclusive. The formula for r is shown below.

$$r = \frac{1}{n-1} \sum \frac{(x - \bar{x})(y - \bar{y})}{s_x s_y}$$

where n is the number of (x, y) pairs, x̄ and ȳ are the means of the x and y values, and s_x and s_y are the standard deviations of the x and y values.
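To make the formula concrete, here is a minimal Python sketch that computes r term by term, exactly as written above. It assumes s_x and s_y are the sample standard deviations (dividing by n - 1), which is what `statistics.stdev` returns; the function name `correlation` and the hours/scores data are hypothetical, chosen only for illustration.

```python
from statistics import mean, stdev

def correlation(xs, ys):
    """Correlation coefficient r from the formula
    r = 1/(n-1) * sum((x - x_bar)(y - y_bar) / (s_x * s_y))."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need at least two (x, y) pairs")
    x_bar, y_bar = mean(xs), mean(ys)
    s_x, s_y = stdev(xs), stdev(ys)  # sample standard deviations (divide by n - 1)
    total = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    return total / ((n - 1) * s_x * s_y)

# Hypothetical data: hours studied (x) and quiz score (y)
hours = [1, 2, 3, 4, 5]
scores = [52, 60, 61, 70, 77]
print(round(correlation(hours, scores), 3))  # 0.981
```

A value this close to 1 signals a strong positive linear association; a scatterplot should still be checked before trusting it.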

Note that the formula for r uses the standardized values (x - x̄)/s_x and (y - ȳ)/s_y, and these standardized values have no units.
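One way to see why the standardized values are unitless: measure the same quantities in two different units and standardize both lists. The sketch below uses hypothetical heights recorded in inches and then converted to feet; the helper name `standardize` is also hypothetical.

```python
from statistics import mean, stdev

def standardize(values):
    """Return the standardized value (v - mean) / s for each v."""
    m, s = mean(values), stdev(values)
    return [(v - m) / s for v in values]

heights_in = [60, 64, 66, 70, 72]          # hypothetical heights in inches
heights_ft = [h / 12 for h in heights_in]  # the same heights in feet

z_in = standardize(heights_in)
z_ft = standardize(heights_ft)

# The standardized values agree (up to floating-point rounding),
# so the choice of unit has disappeared.
print(all(abs(a - b) < 1e-9 for a, b in zip(z_in, z_ft)))  # True
```

Because r is built only from these standardized values, changing units this way cannot change r.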

• r > 0 indicates a positive association. (r = 1 indicates all points lie on a straight line with a positive slope.)
• r < 0 indicates a negative association. (r = -1 indicates all points lie on a straight line with a negative slope.)
• r close to 0 indicates a very weak linear relationship.
• r does not change when the units of measurement change (from inches to feet, for instance).
• r has no unit of measurement. It is calculated using standardized values, which have no units of measurement.
• r measures only the strength of a linear relationship. (It does not describe a curved relationship, for instance.)
• r is affected by outliers (as are the mean and the standard deviation).
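Several of the properties in the list above can be checked numerically. This sketch restates the formula as a small function so the block runs on its own, then applies it to three hypothetical data sets: a perfect line with negative slope, a perfect (but curved) parabola, and a tight linear cloud with and without one outlier. All data values are made up for illustration.

```python
from statistics import mean, stdev

def correlation(xs, ys):
    """r = 1/(n-1) * sum((x - x_bar)(y - y_bar) / (s_x * s_y))."""
    n = len(xs)
    x_bar, y_bar = mean(xs), mean(ys)
    s_x, s_y = stdev(xs), stdev(ys)
    total = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    return total / ((n - 1) * s_x * s_y)

xs = [1, 2, 3, 4, 5]

# Perfect straight line with negative slope: r = -1 (up to rounding)
print(correlation(xs, [10 - 2 * x for x in xs]))

# Perfect curved relationship y = (x - 3)^2: r = 0, because r measures
# only linear association
print(correlation(xs, [(x - 3) ** 2 for x in xs]))

# Nearly linear data: r is close to 1 ...
base_x = [1, 2, 3, 4, 5]
base_y = [2.1, 3.9, 6.2, 7.8, 10.1]
print(correlation(base_x, base_y))

# ... but adding one outlier at (6, 0) drags r far down
print(correlation(base_x + [6], base_y + [0]))
```

The parabola case is a reminder that r near 0 means "no linear relationship", not "no relationship".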

Extremely important for success on the Advanced Placement Statistics Examination:
If you are given a numerical data set, always (I repeat, always) display the shape of the distribution.

On the TI-83, this is easy to do with a histogram or a boxplot. For two-variable data, show a scatterplot.

Here are some Algebra II scholars illustrating the concept of correlation with their textbooks. What are they demonstrating?
(A) Correlation r close to 1.
(B) Correlation r close to -1.
(C) Correlation r close to 0.