OVERVIEW: A numerical summary of a data distribution should indicate its center and its spread. The concept of spread is important. As a simple example, sets A and B both have a mean of 50. However, the sets are very different in terms of spread.
A = {50,50,50,50,50,50,50,50,50,50} and B = {0,0,0,0,0,100,100,100,100,100}
Measures of center.
Mean (important to understand sigma notation)
Median (the 50th percentile)
Mode (most frequent score. A data set can be multimodal).
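As a quick check of the three measures of center, here is a minimal Python sketch applied to sets A and B from the OVERVIEW. It relies on the standard `statistics` module (`multimode` needs Python 3.8 or later); the variable names are only illustrative.

```python
from statistics import mean, median, multimode

# Sets A and B from the OVERVIEW
A = [50] * 10
B = [0] * 5 + [100] * 5

for name, data in (("A", A), ("B", B)):
    # multimode returns every value tied for most frequent,
    # so a multimodal set such as B yields more than one mode
    print(name, "mean:", mean(data), "median:", median(data),
          "modes:", multimode(data))
```

Both sets share the same mean and median, while B turns out to have two modes (0 and 100). None of the three measures of center says how far the values are spread out.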
Measures of spread.
Quartiles [Q1 (25th percentile), Median (50th percentile), Q3 (75th percentile)]
Range (Max. value - min. value)
Interquartile range (Q3 - Q1)
The Five-Number Summary (Min., Q1, Median, Q3, Max.)
It is very important to note that there are definite conventions for establishing the "big five" for a numerical data set. For instance, the median for a data set is unique. You should understand how the values are determined for the following data sets.
Set#  Data Set  Min.  Q1  Median  Q3  Max.  Range  IQ Range 
#1  1,3,5,20,22        
#2  1,3,5,20,22,40        
#3  1,3,5,20,22,40,40,50        
#4  1,3,5,20,22,40,40,50,140        
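One way to check your work on the four data sets is the short Python sketch below. It assumes the convention in which Q1 and Q3 are the medians of the lower and upper halves of the sorted data, with the overall median left out of both halves when the number of values is odd. That convention is an assumption here, chosen because it reproduces the values Q3 = 45 and IQ Range = 41 quoted for Data Set #4; other textbooks and software packages use slightly different rules. The function name `five_number_summary` is hypothetical.

```python
from statistics import median

def five_number_summary(data):
    """Return (min, Q1, median, Q3, max).

    Assumed convention: Q1 and Q3 are the medians of the lower and
    upper halves; the middle value is excluded when n is odd.
    """
    xs = sorted(data)
    n = len(xs)
    half = n // 2
    lower = xs[:half]
    upper = xs[half + (n % 2):]  # skip the middle value when n is odd
    return (xs[0], median(lower), median(xs), median(upper), xs[-1])

data_sets = {
    "#1": [1, 3, 5, 20, 22],
    "#2": [1, 3, 5, 20, 22, 40],
    "#3": [1, 3, 5, 20, 22, 40, 40, 50],
    "#4": [1, 3, 5, 20, 22, 40, 40, 50, 140],
}

for label, data in data_sets.items():
    lo, q1, med, q3, hi = five_number_summary(data)
    print(label, "five-number summary:", (lo, q1, med, q3, hi),
          "range:", hi - lo, "IQ range:", q3 - q1)
```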
The "big five" are all that is needed to construct a boxplot (sometimes called a box-and-whisker plot) for a data set. Boxplots are useful when you have lots of data to summarize, where displays like dotplots and stemplots become impractical. A boxplot does not identify every individual piece of data, but rather summarizes the data by quartiles. A modified boxplot is frequently used to identify outliers. An outlier is defined to be a number that is more than 1.5 IQ ranges above Q3, or more than 1.5 IQ ranges below Q1.
For example, in Data Set #4 above, the IQ Range is 41.
1.5 x 41 = 61.5.
Q3 + 1.5(IQ Range) = 45 + 61.5 = 106.5. Since 140 > 106.5, the number 140 is an outlier, and it would be so identified in a modified boxplot. (The TI-83 graphics calculator produces both types of boxplots.)
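The 1.5 x IQ Range rule can be sketched in a few lines of Python. The function name `find_outliers` is hypothetical, and the quartiles are passed in rather than computed, since different quartile conventions exist. Q3 = 45 is the value used above, and Q1 = 4 follows from the stated IQ Range of 41.

```python
def find_outliers(data, q1, q3):
    """Return the values lying more than 1.5 IQ ranges outside [Q1, Q3]."""
    iqr = q3 - q1
    low_fence = q1 - 1.5 * iqr    # anything below this is an outlier
    high_fence = q3 + 1.5 * iqr   # anything above this is an outlier
    return [x for x in data if x < low_fence or x > high_fence]

data_set_4 = [1, 3, 5, 20, 22, 40, 40, 50, 140]
print(find_outliers(data_set_4, q1=4, q3=45))  # [140]
```

For Data Set #4 the fences are -57.5 and 106.5, so 140 is the only value flagged.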
Important to note:
A statistic is a number that is computed from a sample. A parameter is a number that is computed from a population. Means, medians, IQ ranges, etc. could be statistics or parameters.
In statistics, the standard deviation is frequently a very important measure of spread. The variance is the square of the standard deviation. There are two different standard deviations, depending on whether it is being computed from a population or from a sample.
A population standard deviation is designated by σ, and it is a parameter.
A sample standard deviation is designated by s, and it is a statistic.
σ and s are calculated slightly differently, as demonstrated below for the small data set W = {10,20,30}. The mean of W is 20. (If a data set is large, the difference between σ and s is very small.)
Data (X)  (X - 20)  (X - 20)^{2}
10  -10  100
20  0  0
30  10  100
TOTALS  0  200
If W is considered to be a population, then the standard deviation and variance are, respectively,
σ = sqrt(200/3) = 8.164965809 and σ^{2} = 66.6666667.
If W is considered to be a sample, then the standard deviation and variance are, respectively,
s = sqrt(200/2) = 10 and s^{2} = 100.
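The two calculations for W can be reproduced with the following Python sketch. It first divides the sum of squared deviations by N (population) and by N - 1 (sample) by hand, then compares the results with `pstdev` and `stdev` from the standard `statistics` module. The variable names are illustrative only.

```python
from math import sqrt
from statistics import pstdev, stdev

W = [10, 20, 30]
n = len(W)
mean_w = sum(W) / n                          # 20.0
sum_sq = sum((x - mean_w) ** 2 for x in W)   # sum of squared deviations: 200.0

sigma = sqrt(sum_sq / n)       # population: divide by N
s = sqrt(sum_sq / (n - 1))     # sample: divide by N - 1

print("by hand:   ", sigma, s)
print("statistics:", pstdev(W), stdev(W))
```

Dividing by N - 1 instead of N always gives the larger result, and the gap shrinks as N grows.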
It's significant to note that units attached to a variance are square units, whereas the standard deviation has the same unit as the data itself.
For the present time we will be mostly concerned with the sample standard deviation, s. Things to note:
Remember the sets A and B described in the OVERVIEW. Set A has mean = 50, standard deviation = s = 0, and variance = s^{2} = 0 (square units). Set B has mean = 50, standard deviation = s = 52.705, and variance = s^{2} = 2777.778 (square units).
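The figures for sets A and B can be checked with a minimal Python sketch, treating each set as a sample (so `stdev` and `variance` from the standard `statistics` module divide by N - 1). Rounding to three decimals is a presentation choice, not part of the definition.

```python
from statistics import mean, stdev, variance

A = [50] * 10
B = [0] * 5 + [100] * 5

for name, data in (("A", A), ("B", B)):
    # stdev and variance are the sample versions (divide by N - 1)
    print(name, "mean:", mean(data),
          "s:", round(stdev(data), 3),
          "s^2:", round(variance(data), 3))
```

For set B the sum of squared deviations is 25000, so s^{2} = 25000/9 and s is its square root. Squaring the rounded value 52.705 gives a slightly different number, which is why the variance is best computed before rounding.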
The phrase degrees of freedom is mentioned in this section. This concept will be important in future studies, but for now, an intuitive feeling for the phrase will be provided. Consider a set of five numbers with a definite sum (say 100), and hence, a definite mean (100/5 = 20). Note the table:
Set of five numbers  Sum  Mean
12  18  5  43  x  100  20
Note that one could replace the four numbers 12, 18, 5, and 43 with any other set of four numbers, and then "adjust" the value of x so that the sum of the five numbers is 100. That is, four of the numbers can vary freely and, for each set of four numbers, the value x can be "adjusted" to preserve the sum of 100. In this situation, we say that there are 5 - 1 = 4 degrees of freedom. If we had a set of N numbers with a definite sum, then the degrees of freedom would be N - 1.
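The "adjusting" idea can be illustrated with a tiny Python sketch: choose any four numbers, and the fifth is forced by the required sum. The function name `adjust_last_value` and the two extra sets of four numbers are made up for illustration.

```python
def adjust_last_value(free_values, total):
    """Return the one remaining value that makes the whole set sum to total."""
    return total - sum(free_values)

# The first set of four comes from the table; the others are arbitrary choices.
for four in ([12, 18, 5, 43], [1, 2, 3, 4], [25, 25, 25, 25]):
    x = adjust_last_value(four, total=100)
    print(four, "forces x =", x, "so the sum is", sum(four) + x)
```

Four values are free and one is determined, which matches the 5 - 1 = 4 degrees of freedom.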
Note: Problem #1 in Section II of the 2001 Advanced Placement Statistics Examination involves the concept of an outlier and other statistical concepts introduced in Chapter 1. You can see a detailed solution to this actual AP problem by going to the home page of Herkimer's Hideaway, taking the link to WRITINGS AND REFLECTIONS, and then the link to item #67. Please note that this link does not contain a statement of the problem, just a detailed solution. The location does provide a link to the College Board, where you can get the problem statement if you don't have it available. (The College Board does not allow for the copying of an actual problem statement on a site such as this.) 
This photo shows a nonrandom sample of sixteen members of the Cate School community. The tall gentleman is Mr. Jim Masker, a distinguished Cate history teacher. (He's not standing on a stool. He is actually as tall as he appears.) Assume that we record the heights of the sixteen individuals and calculate the following statistics:
Mean  Median  Standard Deviation  Variance  Range  Interquartile Range  25th Percentile  75th Percentile
How does Mr. Masker's height affect each statistic?