
Distributions of Data, Random Variables, and Probability Distributions





In data analysis, variables whose values depend on chance play an important role in linking distributions of data to probability distributions. Such variables are called random variables. In this lesson, we will learn about distributions of data, random variables, and probability distributions.


Distributions of Data

Recall that relative frequency distributions given in a table or histogram are a common way to show how numerical data are distributed. In a histogram, the areas of the bars indicate where the data are concentrated.

Recall that the sum of the areas of the bars of a relative frequency histogram is 1. Although the units on the horizontal axis of a histogram vary from one data set to another, the vertical scale can be adjusted (stretched or shrunk) so that the sum of the areas of the bars is 1. With this vertical scale adjustment, the area under the curve that models the distribution is also 1. This model curve is called a distribution curve, but it has other names as well, including density curve and frequency curve.
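The vertical scale adjustment can be sketched in a few lines of Python. This is only an illustration: the sample values, the class boundaries, and the helper name `density_heights` are made up for the example and are not part of the lesson. Each bar's height is its relative frequency divided by its class width, so that height times width gives back the relative frequency.

```python
def density_heights(data, edges):
    """Return bar heights scaled so that the bar areas sum to 1."""
    n = len(data)
    last = len(edges) - 2
    heights = []
    for i in range(len(edges) - 1):
        left, right = edges[i], edges[i + 1]
        # Each class is [left, right); the final class also includes its right edge.
        count = sum(1 for x in data if left <= x < right or (i == last and x == right))
        relative_frequency = count / n
        # Dividing by the class width makes area (height * width) equal the relative frequency.
        heights.append(relative_frequency / (right - left))
    return heights


data = [1, 2, 2, 3, 3, 3, 4, 4, 5, 7]  # made-up sample
edges = [0, 2, 4, 8]                   # hypothetical classes of unequal width

heights = density_heights(data, edges)
areas = [h * (edges[i + 1] - edges[i]) for i, h in enumerate(heights)]
total_area = sum(areas)
print(heights, areas, total_area)
```

With these assumed classes, the bar areas are the relative frequencies 0.1, 0.5, and 0.4, so they add up to 1 even though the classes have different widths.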

The purpose of the distribution curve is to give a good illustration of a large distribution of numerical data that doesn’t depend on specific classes. To achieve this, the main property of a distribution curve is that the area under the curve in any vertical slice, just like a histogram bar, represents the proportion of the data that lies in the corresponding interval on the horizontal axis, which is at the base of the slice.

Density Curves and Introduction to the Normal Distribution
Properties of Density Curves
• The curve is never below the x-axis
• The total area under the curve equals 1
• The area under the curve between two values corresponds to the proportion of all observations that fall within that range.
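As a rough check of these properties, the sketch below uses a made-up density curve, f(x) = x/2 on the interval from 0 to 2, and approximates areas with a simple midpoint sum. The function names `f` and `area` are hypothetical and chosen only for this illustration.

```python
def f(x):
    """A made-up density curve: a straight line rising from 0 to 1 over [0, 2]."""
    return x / 2 if 0 <= x <= 2 else 0.0


def area(func, a, b, steps=10_000):
    """Approximate the area under func between a and b with a midpoint sum."""
    width = (b - a) / steps
    return sum(func(a + (i + 0.5) * width) for i in range(steps)) * width


total = area(f, 0, 2)          # total area under the curve
middle = area(f, 0.5, 1.5)     # proportion of observations between 0.5 and 1.5
print(round(total, 6), round(middle, 6))
```

For this assumed curve the total area is 1, and the slice between 0.5 and 1.5 has area 0.5, meaning half of the observations would fall in that range.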





Random Variables

When analyzing data, it is common to choose a value of the data at random and consider that choice as a random experiment, as introduced in section 4.4. Then, the probabilities of events involving the randomly chosen value may be determined. Given a distribution of data, a variable, say X, may be used to represent a randomly chosen value from the distribution. Such a variable X is an example of a random variable, which is a variable whose value is a numerical outcome of a random experiment.

In the histogram for a random variable, the area of each bar is proportional to the probability represented by the bar. The sum of the areas is 1 and the sum of the probabilities is 1.
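One way to picture a random variable X that represents a randomly chosen value from a distribution of data is to compute the probabilities directly from relative frequencies. The sketch below assumes a small made-up data set; the helper `prob` is a hypothetical name used only here.

```python
from collections import Counter
from fractions import Fraction

data = [1, 2, 2, 3, 3, 3, 4, 4, 5, 7]  # made-up distribution of data
n = len(data)

# P(X = value) is the relative frequency of that value in the data.
pmf = {value: Fraction(count, n) for value, count in sorted(Counter(data).items())}


def prob(event):
    """Probability that the randomly chosen value X satisfies the event."""
    return sum(p for value, p in pmf.items() if event(value))


p_equals_3 = prob(lambda x: x == 3)
p_between = prob(lambda x: 2 <= x <= 4)
p_total = sum(pmf.values())
print(p_equals_3, p_between, p_total)
```

Exact fractions are used so that the probabilities visibly add up to 1 without rounding.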

This is also true for a continuous probability distribution: the area of the region under the curve is 1, and the areas of vertical slices of the region (similar to the bars of a histogram) are equal to probabilities of a random variable associated with the distribution. Such a random variable is called a continuous random variable, and it plays the same role as a random variable that represents a randomly chosen value from a distribution of data. The main difference is that we seldom consider the event in which a continuous random variable is equal to a single value like X = 4; rather, we consider events that are described by intervals of values such as 1 < X < 4 and X > 15. Such events correspond to vertical slices under a continuous probability distribution, and the areas of the vertical slices are the probabilities of the corresponding events. (Consequently, the probability of an event such as X = 4 would correspond to the area of a line segment, which is 0.)
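In symbols, the slice idea can be written as follows. The symbol f for the curve of the continuous probability distribution, and the endpoints a, b, and c, are notation assumed here for illustration and do not appear in the lesson.

```latex
P(a < X < b) = \int_a^b f(x)\,dx,
\qquad
\int_{-\infty}^{\infty} f(x)\,dx = 1,
\qquad
P(X = c) = \int_c^c f(x)\,dx = 0.
```

The last equality is the "area of a line segment" remark: a slice of zero width has zero area.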

Basic idea and definitions of random variables.


Discrete and continuous random variables
Defining discrete and continuous random variables. Working through examples of both discrete and continuous random variables.





Discrete uniform distribution
Working through more examples of discrete probability distributions.
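As a small illustration of a discrete uniform distribution, the sketch below assumes a fair six-sided die, which is an example chosen here and not necessarily the one used in the video. Every outcome gets the same probability.

```python
from fractions import Fraction

faces = range(1, 7)  # assumed example: a fair six-sided die

# Discrete uniform distribution: each face has the same probability.
pmf = {face: Fraction(1, len(faces)) for face in faces}

p_total = sum(pmf.values())
p_even = sum(p for face, p in pmf.items() if face % 2 == 0)
p_at_most_2 = sum(p for face, p in pmf.items() if face <= 2)
print(p_total, p_even, p_at_most_2)
```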


The Normal Distribution

Many natural processes yield data that have a relative frequency distribution shaped somewhat like a bell, as in the distribution below with mean m and standard deviation d.

Such data are said to be approximately normally distributed and have the following properties.
• The mean, median, and mode are all nearly equal.
• The data are grouped fairly symmetrically about the mean.
• About two-thirds of the data are within 1 standard deviation of the mean.
• Almost all of the data are within 2 standard deviations of the mean.
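These properties can be checked informally on simulated data. The sketch below draws values with Python's `random.gauss`; the mean of 50, the standard deviation of 10, the sample size, and the seed are all arbitrary assumptions made for the illustration.

```python
import random
import statistics

random.seed(0)            # fixed seed so the run is repeatable
m, d = 50.0, 10.0         # assumed mean and standard deviation
data = [random.gauss(m, d) for _ in range(50_000)]

mean = statistics.fmean(data)
sd = statistics.pstdev(data)
median = statistics.median(data)


def share_within(k):
    """Proportion of the data within k standard deviations of the mean."""
    return sum(1 for x in data if abs(x - mean) <= k * sd) / len(data)


within_1 = share_within(1)
within_2 = share_within(2)
print(round(mean, 2), round(median, 2), round(within_1, 3), round(within_2, 3))
```

On a run like this, the mean and median should come out nearly equal, roughly two-thirds of the values should fall within 1 standard deviation of the mean, and nearly all should fall within 2.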

As stated above, you can always associate a random variable X with a distribution of data by letting X be a randomly chosen value from the distribution. If X is such a random variable for the distribution above, we say that X is approximately normally distributed.

As described, relative frequency distributions are often approximated using a smooth curve—a distribution curve or density curve—for the tops of the bars in the histogram. The region below such a curve represents a distribution, called a continuous probability distribution. There are many different continuous probability distributions, but the most important one is the normal distribution, which has a bell-shaped curve.

Just as a data distribution has a mean and standard deviation, the normal probability distribution has a mean and standard deviation. Also, the properties listed above for the approximately normal distribution of data hold for the normal distribution, except that the mean, median, and mode are exactly the same and the distribution is perfectly symmetric about the mean.

A normal distribution, though always shaped like a bell, can be centered around any mean and can be spread out to a greater or lesser degree, depending on the standard deviation.
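The effect of the mean and standard deviation can be sketched with `statistics.NormalDist` from the Python standard library. The three parameter choices below are arbitrary examples, and `share_within_one_sd` is a hypothetical helper name.

```python
from statistics import NormalDist

narrow = NormalDist(mu=0, sigma=1)    # assumed example parameters
wide = NormalDist(mu=0, sigma=3)      # same center, more spread out
shifted = NormalDist(mu=10, sigma=1)  # same spread, different center


def share_within_one_sd(dist):
    """Area under the curve within 1 standard deviation of the mean."""
    return dist.cdf(dist.mean + dist.stdev) - dist.cdf(dist.mean - dist.stdev)


# A larger standard deviation gives a lower, wider bell; shifting the mean only moves it.
peak_narrow = narrow.pdf(narrow.mean)
peak_wide = wide.pdf(wide.mean)
peak_shifted = shifted.pdf(shifted.mean)

shares = [share_within_one_sd(dist) for dist in (narrow, wide, shifted)]
print(peak_narrow, peak_wide, peak_shifted, shares)
```

Although the three curves differ in position and spread, the share of area within 1 standard deviation of the mean is the same for each.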



Normal Distribution
Properties of the Normal Distribution
1. The curve is continuous.
2. The curve is bell-shaped.
3. The curve is symmetrical about the mean.
4. The mean, median, and mode are equal to each other.
5. The curve never touches the x-axis.
6. The area under the curve is 1.
7. The distribution is described by the mean and standard deviation.
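For reference, the standard formula for the normal curve makes property 7 concrete. The symbols μ for the mean and σ for the standard deviation are the usual notation and are assumed here; the lesson itself does not state the formula.

```latex
f(x) = \frac{1}{\sigma\sqrt{2\pi}} \exp\!\left(-\frac{(x-\mu)^2}{2\sigma^2}\right),
\qquad -\infty < x < \infty .
```

Because x appears only through (x − μ)², the curve is symmetric about the mean, and because the exponential is always positive, the curve never touches the x-axis.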


The Normal Distribution and the 68-95-99.7 Rule.
This video shows the normal distribution and what percentage of observed values fall within 1, 2, or 3 standard deviations of the mean. One specific example is discussed.
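The percentages in the 68-95-99.7 rule can be reproduced from the standard normal distribution. This is a minimal sketch using `statistics.NormalDist` from the Python standard library; the helper name `share_within` is hypothetical.

```python
from statistics import NormalDist

Z = NormalDist()  # standard normal: mean 0, standard deviation 1


def share_within(k):
    """Area under the normal curve within k standard deviations of the mean."""
    return Z.cdf(k) - Z.cdf(-k)


rule = {k: round(share_within(k), 4) for k in (1, 2, 3)}
print(rule)  # {1: 0.6827, 2: 0.9545, 3: 0.9973}
```

The same proportions hold for any normal distribution, since measuring distance in standard deviations from the mean removes the effect of the particular mean and standard deviation.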


