More Statistics Lesson
Data can be organized and summarized using a variety of methods. Tables are commonly used, and there are many graphical and numerical methods as well. The appropriate type of representation for a collection of data depends in part on the nature of the data, such as whether the data are numerical or nonnumerical.
In these lessons, we will learn some common graphical methods for describing and summarizing data: Frequency Distributions, Bar Graphs, Circle Graphs, Histograms, Scatterplots and Timeplots.
The frequency, or count, of a particular category or numerical value is the number of times that the category or value appears in the data. A frequency distribution is a table or graph that presents the categories or numerical values along with their associated frequencies. The relative frequency of a category or a numerical value is the associated frequency divided by the total number of data values.
Relative frequencies may be expressed as percents, fractions, or decimals. A relative frequency distribution is a table or graph that presents the relative frequencies of the categories or numerical values. The relative frequencies in a relative frequency distribution always sum to 1 when written as decimals or fractions, or equivalently to 100% when written as percents.
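As a small illustration of these definitions, the sketch below builds a frequency and relative frequency table in Python. The function name `frequency_table` and the sample data are hypothetical, chosen only for this example.

```python
from collections import Counter

def frequency_table(data):
    """Return {value: (frequency, relative_frequency)} for a non-empty list."""
    if not data:
        raise ValueError("data must contain at least one value")
    counts = Counter(data)   # frequency of each distinct value
    total = len(data)        # total number of data values
    return {value: (count, count / total) for value, count in counts.items()}

# Hypothetical data: number of children in each of 10 families.
children = [1, 2, 0, 2, 3, 1, 2, 0, 1, 2]
table = frequency_table(children)

for value in sorted(table):
    count, rel = table[value]
    print(f"{value}: frequency = {count}, relative frequency = {rel:.0%}")
```

Adding up the relative frequencies in `table` gives 1 (up to floating-point rounding), which matches the rule stated above.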
Differences between frequency distribution table and relative frequency distribution table
A commonly used graphical display for representing frequencies, or counts, is a bar graph, or bar chart. In a bar graph, rectangular bars are used to represent the categories of the data, and the height of each bar is proportional to the corresponding frequency or relative frequency. All of the bars are drawn with the same width, and the bars can be presented either vertically or horizontally. Bar graphs enable comparisons across several categories, making it easy to identify frequently and infrequently occurring categories.
Bar graphs are commonly used to compare frequencies. They are sometimes also used to compare numerical data that could be displayed in a table, such as temperatures, dollar amounts, percents, heights, and weights.
A bar graph is a graph that compares amounts in each category to each other using bars.
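To make the idea concrete, here is a minimal text-only sketch of a horizontal bar graph in Python, where the length of each bar equals the frequency of its category. The function name `text_bar_graph` and the survey data are hypothetical.

```python
def text_bar_graph(frequencies, symbol="#"):
    """Return one text line per category; bar length equals the frequency."""
    # Pad category labels to the same width so the bars line up.
    width = max(len(str(category)) for category in frequencies)
    lines = []
    for category, count in frequencies.items():
        lines.append(f"{str(category).ljust(width)} | {symbol * count} {count}")
    return lines

# Hypothetical survey: favorite fruit of 15 students.
favorite_fruit = {"apple": 5, "banana": 8, "cherry": 2}
for line in text_bar_graph(favorite_fruit):
    print(line)
```

Because every bar uses the same symbol and starts at the same position, the longest bar immediately shows the most frequent category.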
How to read and interpret a bar graph?
A segmented bar graph is used to show how different subgroups or subcategories contribute to an entire group or category. In a segmented bar graph, each bar represents a category that consists of more than one subcategory. Each bar is divided into segments that represent the different subcategories. The height of each segment is proportional to the frequency or relative frequency of the subcategory that the segment represents.
How to interpret percentage segmented bar charts?
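For a percentage segmented bar graph, the height of each segment is the subcategory's share of its own bar. The sketch below computes those shares; the function name `segment_percents` and the survey numbers are hypothetical.

```python
def segment_percents(subcategory_counts):
    """Convert one bar's subcategory counts to percents of that bar's total."""
    total = sum(subcategory_counts.values())
    return {name: 100 * count / total
            for name, count in subcategory_counts.items()}

# Hypothetical survey: how students in two grades travel to school.
survey = {
    "Grade 7": {"bus": 30, "walk": 15, "car": 5},
    "Grade 8": {"bus": 20, "walk": 20, "car": 10},
}
for grade, counts in survey.items():
    print(grade, segment_percents(counts))
```

Each bar's segments add up to 100%, so bars for groups of different sizes can be compared directly.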
Bar graphs can also be used to compare different groups using the same categories. Such a graph is sometimes called a double bar graph.
Interpreting Double Bar Graphs
How to interpret data shown in a double bar graph?
Circle graphs, often called pie charts, are used to represent data with a relatively small number of categories. They illustrate how a whole is separated into parts. The area of the circle graph representing each category is proportional to the part of the whole that the category represents.
Each part of a circle graph is called a sector. Because the area of each sector is proportional to the percent of the whole that the sector represents, the measure of the central angle of a sector is proportional to the percent of 360 degrees that the sector represents.
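In practice, the central angle of a sector is the category's fraction of the whole multiplied by 360 degrees. The following sketch applies that rule; the function name `central_angles` and the budget figures are hypothetical.

```python
def central_angles(counts):
    """Return the central angle, in degrees, of each sector of a circle graph."""
    total = sum(counts.values())
    # Each sector gets the same fraction of 360 degrees as its share of the total.
    return {name: 360 * count / total for name, count in counts.items()}

# Hypothetical monthly budget, in percent of income.
budget = {"rent": 50, "food": 25, "transport": 15, "other": 10}
for name, angle in central_angles(budget).items():
    print(f"{name}: {angle:.0f} degrees")
```

The angles always add up to 360 degrees, just as the percents add up to 100%.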
Creating a Circle Graph
How to create a circle graph, or “pie chart” from some given data?
When a list of data is large and contains many different values of a numerical variable, it is useful to organize it by grouping the values into intervals, often called classes. To do this, divide the entire interval of values into smaller intervals of equal length and then count the values that fall into each interval. In this way, each interval has a frequency and a relative frequency. The intervals and their frequencies (or relative frequencies) are often displayed in a histogram.
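The grouping step can be sketched in Python as follows. This example assumes that each interval includes its left endpoint and excludes its right endpoint, which is one common convention and not something the lesson specifies. The function name `group_into_classes` and the test scores are hypothetical.

```python
def group_into_classes(values, start, width, num_classes):
    """Count the values in num_classes equal-length intervals.

    Interval i covers start + i*width <= value < start + (i+1)*width
    (left endpoint included, right endpoint excluded; an assumed convention).
    """
    counts = [0] * num_classes
    for value in values:
        index = int((value - start) // width)
        if not 0 <= index < num_classes:
            raise ValueError(f"{value} falls outside the chosen intervals")
        counts[index] += 1
    return [((start + i * width, start + (i + 1) * width), counts[i])
            for i in range(num_classes)]

# Hypothetical test scores grouped into classes of length 10.
scores = [52, 67, 71, 75, 78, 83, 85, 88, 91, 99]
for (low, high), frequency in group_into_classes(scores, 50, 10, 5):
    relative = frequency / len(scores)
    print(f"{low} to under {high}: frequency = {frequency}, relative = {relative:.0%}")
```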
Histograms are graphs of frequency distributions that are similar to bar graphs, but they have a number line for the horizontal axis. Also, in a histogram, there are no regular spaces between the bars. Any spaces between bars in a histogram indicate that there are no data in the intervals represented by the spaces.
How to create a histogram from the given data?
How to create a relative frequency histogram?
A relative frequency histogram shows the percentage of data values on the vertical axis rather than the frequency.
Step 1: Find the total number of data values.
Step 2: Find the percent of data values in each interval (organize the results in a table).
Step 3: Draw the histogram.
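The three steps above can be sketched in Python as shown below. The interval counts are hypothetical, the function name `relative_frequency_percents` is made up for this example, and the "drawing" is only a rough text picture with one `*` per 5 percentage points.

```python
def relative_frequency_percents(interval_counts):
    """Convert interval frequencies to percents of all data values."""
    total = sum(interval_counts.values())              # Step 1: total number of values
    return {interval: 100 * count / total              # Step 2: percent per interval
            for interval, count in interval_counts.items()}

# Hypothetical frequencies for three equal-length intervals.
counts = {"0-9": 4, "10-19": 10, "20-29": 6}
percents = relative_frequency_percents(counts)

# Step 3: rough text histogram, one '*' for every 5 percentage points.
for interval, pct in percents.items():
    print(f"{interval.ljust(5)} | {'*' * round(pct / 5)} {pct:.0f}%")
```

The bars have the same relative heights as in the frequency histogram; only the scale on the vertical axis changes.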
To study the connection between a histogram and the corresponding relative frequency histogram, consider the histogram below showing Kyle’s 20 homework grades for a semester. Notice that since each bar represents a single whole number (6, 7, 8, 9, or 10), those numbers are best placed in the middle of the bars on the horizontal axis. In this case, Kyle has one grade of 6 and five grades of 7.
a) Make a relative frequency histogram of these grades by copying the histogram but making a scale that shows proportion of all grades on the vertical axis rather than frequency.
b) Compare the shape, centre, and spread of the two histograms.
Differences between a bar graph and a histogram
• Bar graph shows the number of items in specific categories.
• Drawn with space between the columns.
• Do not have to be organized into equal intervals of data.
• Bars show categories of data.
• Histogram shows frequency of data divided into equal intervals.
• No space between the columns.
• Must be organized into equal intervals of data.
• Bars show continuous data.
All examples used thus far have involved data resulting from a single characteristic or variable. These types of data are referred to as univariate, that is, data observed for one variable. Sometimes data are collected to study two different variables in the same population of individuals or objects. Such data are called bivariate data. We might want to study the variables separately or investigate a relationship between the two variables. If the variables were to be analyzed separately, each of the graphical methods for univariate numerical data presented above could be applied.
To show the relationship between two numerical variables, the most useful type of graph is a scatterplot. In a scatterplot, the values of one variable appear on the horizontal axis of a rectangular coordinate system and the values of the other variable appear on the vertical axis. For each individual or object in the data, an ordered pair of numbers is collected, one number for each variable, and the pair is represented by a point in the coordinate system.
A scatterplot makes it possible to observe an overall pattern, or trend, in the relationship between the two variables. Also, the strength of the trend as well as striking deviations from the trend are evident. In many cases, a line or a curve that best represents the trend is also displayed in the graph and is used to make predictions about the population.
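One common way to choose a line that represents the trend is the least-squares method. The lesson does not name a specific method, so treat the sketch below as one possible approach and not the only one. The function name `least_squares_line` and the data are hypothetical.

```python
def least_squares_line(xs, ys):
    """Return (slope, intercept) of the least-squares line through the points."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of products of deviations, and sum of squared x-deviations.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept

# Hypothetical bivariate data: hours studied and test score for 5 students.
hours = [1, 2, 3, 4, 5]
scores = [52, 60, 61, 70, 77]
slope, intercept = least_squares_line(hours, scores)

# Use the line to predict the score for a student who studies 6 hours.
predicted = slope * 6 + intercept
print(f"score = {slope:.1f} * hours + {intercept:.1f}; prediction for 6 hours: {predicted:.1f}")
```

Predictions made this way are most trustworthy for values close to the range of the observed data.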
Scatter Plots: Introduction to Positive and Negative Correlation
A scatter plot is a graph of a collection of ordered pairs (x, y).
The graph looks like a bunch of dots, but in some graphs the dots form a general shape or move in a general direction.
If the y-coordinates tend to increase as the x-coordinates increase, then the correlation is positive. This means that as the value of one variable increases, the value of the other tends to increase as well. The variables are related.
If the y-coordinates tend to decrease as the x-coordinates increase, then the correlation is negative. This means that as one variable increases, the other tends to decrease.
If there seems to be no pattern and the points look scattered, then there is no correlation. This suggests that the two variables are not related: as one variable increases, the other shows no consistent tendency to increase or decrease.
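The direction of a linear trend can also be checked numerically with the correlation coefficient r, which lies between -1 and 1. The sketch below computes r directly from its definition. The function names, the data sets, and the cutoff of 0.3 used to label a correlation as "positive" or "negative" are all hypothetical choices made for illustration, not standard values.

```python
import math

def correlation(xs, ys):
    """Return the correlation coefficient r for paired data."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

def describe(r, cutoff=0.3):
    """Label r using an arbitrary, illustrative cutoff."""
    if r > cutoff:
        return "positive correlation"
    if r < -cutoff:
        return "negative correlation"
    return "no clear linear correlation"

# Hypothetical data sets sharing the same x-values.
xs = [1, 2, 3, 4, 5]
print(describe(correlation(xs, [2, 4, 5, 7, 9])))   # y rises as x rises
print(describe(correlation(xs, [9, 8, 6, 3, 1])))   # y falls as x rises
print(describe(correlation(xs, [2, 4, 1, 4, 2])))   # no consistent direction
```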
Which scatterplots below show a linear trend?
Sometimes data are collected in order to observe changes in a variable over time. For example, sales for a department store may be collected monthly or yearly.
A time plot (sometimes called a time series) is a graphical display useful for showing changes in data collected at regular intervals of time. A time plot of a variable plots each observation against the time at which it was measured. A time plot uses a coordinate plane similar to a scatterplot, but the time is always on the horizontal axis, and the variable measured is always on the vertical axis. Additionally, consecutive observations are connected by a line segment to emphasize increases and decreases over time.
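Each line segment in a time plot rises or falls by the change between two consecutive observations. The sketch below computes those changes; the function name `consecutive_changes` and the monthly sales figures are hypothetical.

```python
def consecutive_changes(observations):
    """Return (time_from, time_to, change) for each pair of consecutive observations.

    `observations` is a list of (time, value) pairs in time order.
    """
    return [(t0, t1, v1 - v0)
            for (t0, v0), (t1, v1) in zip(observations, observations[1:])]

# Hypothetical monthly sales for a department store, in thousands of dollars.
monthly_sales = [("Jan", 120), ("Feb", 135), ("Mar", 128), ("Apr", 150)]
for start, end, change in consecutive_changes(monthly_sales):
    direction = "increase" if change > 0 else "decrease" if change < 0 else "no change"
    print(f"{start} to {end}: {change:+d} ({direction})")
```

A positive change corresponds to a segment that slopes upward, and a negative change to a segment that slopes downward.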
What is a time plot?