Lesson 7 Summary
Non-symmetrical data distributions are referred to as skewed.
Left-skewed or skewed to the left means the data spreads out longer (like a tail) on the left side.
Right-skewed or skewed to the right means the data spreads out longer (like a tail) on the right side.
The center of a skewed data distribution is described by the median.
Variability of a skewed data distribution is described by the interquartile range (IQR).
The IQR describes variability by specifying the length of the interval that contains the middle 50% of the data values.
Outliers in a data set are defined as those values more than 1.5(IQR) from the nearest quartile. Outliers are usually identified by an “*” or a “•” in a box plot.
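The summary above can be checked by hand or with a few lines of code. The sketch below is one possible way to do it in Python, assuming quartiles are found as the medians of the lower and upper halves of the sorted data (other software may use slightly different quartile rules). The function names and the list `sample` are made up for illustration and are not part of the lesson.

```python
from statistics import median

def five_number_summary(values):
    """Return (minimum, Q1, median, Q3, maximum).

    Assumption: Q1 and Q3 are the medians of the lower and upper halves
    of the sorted data; the middle value is left out when the count is odd.
    """
    data = sorted(values)
    n = len(data)
    half = n // 2
    lower = data[:half]
    upper = data[half + 1:] if n % 2 else data[half:]
    return data[0], median(lower), median(data), median(upper), data[-1]

def find_outliers(values):
    """Return the values more than 1.5 * IQR beyond the nearest quartile."""
    _, q1, _, q3, _ = five_number_summary(values)
    iqr = q3 - q1
    low_cutoff = q1 - 1.5 * iqr
    high_cutoff = q3 + 1.5 * iqr
    return [v for v in values if v < low_cutoff or v > high_cutoff]

# Hypothetical data, invented only to illustrate the rule.
sample = [2, 3, 3, 4, 5, 5, 6, 7, 8, 20]
summary = five_number_summary(sample)
outliers = find_outliers(sample)
print(summary)
print(outliers)
```

For this made-up list, Q1 is 3 and Q3 is 7, so the IQR is 4 and the cutoffs are 3 - 6 = -3 and 7 + 6 = 13. The value 20 is the only one beyond a cutoff, so it is the only outlier.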
Exercises 1–3: Skewed Data and Its Measure of Center
Consider the following scenario. A television game show, “Fact or Fiction”, was canceled after nine shows. Many people watched the nine shows and were rather upset when it was taken off the air. A random sample of eighty viewers of the show was selected. Viewers in the sample responded to several questions. The dot plot below shows the distribution of ages of these eighty viewers:
Is this distribution symmetrical?
Is it skewed to the left or to the right?
What does the leftmost dot in this dot plot tell us?
What is the age of the youngest viewer?
What is the age of the oldest viewer?
Where is the mean?
What age would describe a typical age for this sample of viewers?
Constructing and Interpreting the Box Plot
Minimum, Maximum, Quartile 1, Median, Quartile 3
Recall that the 5 values used to construct the box plot make up the 5-number summary.
What is the 5-number summary for this data set of ages?
What percent of the data does the box part of the box plot capture?
What percent of the data falls between the minimum value and Q1?
What percent of the data falls between Q3 and the maximum value?
Students at Waldo High School are involved in a special project that involves communicating with people in Kenya.
Consider a box plot of the ages of 200 randomly selected people from Kenya:
A data distribution may contain extreme data (specific data values that are unusually large or unusually small relative to the median and the interquartile range). A box plot can be used to display extreme data values that are identified as outliers.
Each “*” in the box plot marks the age of one of four people from this sample. Based on the sample, these four ages were considered outliers.
An outlier is defined to be any data value that is more than 1.5 × (IQR) away from the nearest quartile.
1. Estimate the values of the four ages represented by an *.
2. What is the median age of the sample of ages from Kenya? What are the approximate values of Q1 and Q3? What is the approximate IQR of this sample?
3. Multiply the IQR by 1.5. What value do you get?
4. Add 1.5 × (IQR) to the 3rd quartile age (Q3). What do you notice about the four ages identified by an *?
5. Are there any age values that are less than Q1 − 1.5 × (IQR)? If so, these ages would also be considered outliers.
6. Explain why there is no * on the low side of the box plot for ages of the people in the sample from Kenya.
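Steps 3 through 5 above always follow the same arithmetic, so they can be sketched as a short Python function. This is only an illustration: the name `outlier_cutoffs` is made up, and the quartile values passed in below are placeholders, not readings from the Kenya box plot. Substitute your own estimates of Q1 and Q3.

```python
def outlier_cutoffs(q1, q3):
    """Return (low_cutoff, high_cutoff) for the 1.5 * IQR outlier rule."""
    iqr = q3 - q1            # length of the box
    step = 1.5 * iqr         # step 3: multiply the IQR by 1.5
    high_cutoff = q3 + step  # step 4: values above this are outliers
    low_cutoff = q1 - step   # step 5: values below this are outliers
    return low_cutoff, high_cutoff

# Placeholder quartiles (hypothetical, NOT taken from the box plot).
low, high = outlier_cutoffs(q1=10, q3=40)
print(low, high)
```

With these placeholder quartiles the IQR is 30, so the cutoffs are 10 - 45 = -35 and 40 + 45 = 85. A low cutoff that is negative cannot be crossed by any age, which is one situation in which a box plot of ages would show no outliers on the low side.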
1. A data set consisting of the number of hours each of 40 students watched television over the weekend has a minimum value of 3 hours, a Q1 value of 5 hours, a median value of 6 hours, a Q3 value of 9 hours, and a maximum value of 12 hours. Draw a box plot representing this data distribution.
2. What is the interquartile range (IQR) for this distribution? What percent of the students fall within this interval?
3. Do you think the data distribution represented by the box plot is a skewed distribution? Why or why not?
4. Estimate the typical number of hours students watched television. Explain why you chose this value.