# Association Between Categorical Variables

Examples, solutions, videos, and lessons to help Grade 8 students learn how to use row relative frequencies or column relative frequencies to informally determine if there is an association between two categorical variables.

### New York State Common Core Math Grade 8, Module 6, Lesson 14

Lesson Summary

• Saying that two variables ARE NOT associated means that knowing the value of one variable provides no information about the value of the other variable.
• Saying that two variables ARE associated means that knowing the value of one variable provides information about the value of the other variable.
• To determine if two variables are associated, calculate row relative frequencies. If the row relative frequencies are about the same for all of the rows, it is reasonable to say that there is no association between the two variables that define the table.
• Another way to decide if there is an association between two categorical variables is to calculate column relative frequencies. If the column relative frequencies are about the same for all of the columns, it is reasonable to say that there is no association between the two variables that define the table.
• If the row relative frequencies are quite different for some of the rows, it is reasonable to say that there is an association between the two variables that define the table.
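
The summary points above can be sketched in a few lines of Python. This is only an illustration: the two tables use made-up counts, the function names are hypothetical, and the 0.05 cutoff for "about the same" is an assumption chosen for the example, not a rule from the lesson.

```python
def row_relative_frequencies(table):
    """Divide each cell count by the total of its own row."""
    result = {}
    for row_label, counts in table.items():
        row_total = sum(counts.values())
        result[row_label] = {col: n / row_total for col, n in counts.items()}
    return result


def rows_look_alike(rel_freqs, tolerance=0.05):
    """True if, in every column, the rows differ by at most `tolerance`.

    The tolerance is an assumed stand-in for "about the same".
    """
    rows = list(rel_freqs.values())
    columns = rows[0].keys()
    return all(
        max(r[c] for r in rows) - min(r[c] for r in rows) <= tolerance
        for c in columns
    )


# Hypothetical counts, invented for illustration only.
no_association = {
    "Group A": {"Yes": 30, "No": 70},
    "Group B": {"Yes": 60, "No": 140},
}
association = {
    "Group A": {"Yes": 30, "No": 70},
    "Group B": {"Yes": 150, "No": 50},
}

for name, table in [("first table", no_association), ("second table", association)]:
    rel = row_relative_frequencies(table)
    verdict = "no association" if rows_look_alike(rel) else "association"
    print(name, rel, "->", verdict)
```

In the first table both rows have the same split between "Yes" and "No", so knowing the group tells you nothing about the answer. In the second table the rows differ sharply, which suggests an association.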

Lesson 14 Classwork

Example 1
Suppose a random group of people are surveyed about their use of smartphones. The results of the survey are summarized in the tables below.

Example 2
Suppose a sample of participants (teachers and students) was randomly selected from the middle schools and high schools in a large city. These participants responded to the question: Which type of movie do you prefer to watch?

1. Action (The Avengers, Man of Steel, etc.)
2. Drama (42 (The Jackie Robinson Story), The Great Gatsby, etc.)
3. Science Fiction (Star Trek into Darkness, World War Z, etc.)
4. Comedy (Monsters University, Despicable Me 2, etc.)
Movie preference and status (teacher/student) were recorded for each participant.

Exercises

1. Two variables were recorded. Are these variables categorical or numerical? Both variables are categorical.
2. The results of the survey are summarized in the table below.
a. What proportion of participants who are teachers would prefer “action” movies?
b. What proportion of participants who are teachers would prefer “drama” movies?
c. What proportion of participants who are teachers would prefer “science fiction” movies?
d. What proportion of participants who are teachers would prefer “comedy” movies?
The answers to Exercise 2 are called row relative frequencies. Notice that you divided each cell frequency in the teacher row by the row total for that row. Below is a blank relative frequency table.
Write your answers from Exercise 2 in the indicated cells in the table above.
3. Find the row relative frequencies for the “student” row. Write your answers in the table above.
a. What proportion of participants who are students would prefer “action” movies?
b. What proportion of participants who are students would prefer “drama” movies?
c. What proportion of participants who are students would prefer “science fiction” movies?
d. What proportion of participants who are students would prefer “comedy” movies?
4. Is a participant’s status (i.e., teacher or student) related to what type of movie he or she would prefer to watch? Why or why not? Discuss this with your group.
5. What does it mean when we say that there is “no association” between two variables? Discuss this with your group.
6. Notice that the row relative frequencies for each movie type are the same for both the teacher and student rows. When this happens we say that the two variables, movie preference and status (student/teacher), are NOT associated. Another way of thinking about this is to say that knowing if a participant is a teacher (or a student) provides no information about his or her movie preference.
What does it mean if row relative frequencies are not the same for all rows of a two-way table?
7. You can also evaluate whether two variables are associated by looking at column relative frequencies instead of row relative frequencies. A column relative frequency is a cell frequency divided by the corresponding column total. For example, the column relative frequency for the Student-Action cell is 120/160 = 0.75.
a. Calculate the other column relative frequencies and write them in the table below.
b. What do you notice about the column relative frequencies for the four columns?
c. What would you conclude about association based on the column relative frequencies?
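
The column calculation in Exercise 7 can be sketched in Python as follows. The survey table itself is not reproduced on this page, so the counts below are assumed values, chosen only to agree with the two facts the text does give: the Student-Action cell is 120 out of 160 action responses, and the row relative frequencies match for teachers and students. The function name is hypothetical.

```python
def column_relative_frequencies(table):
    """Divide each cell count by the total of its own column."""
    columns = next(iter(table.values())).keys()
    col_totals = {c: sum(row[c] for row in table.values()) for c in columns}
    return {
        row_label: {c: row[c] / col_totals[c] for c in columns}
        for row_label, row in table.items()
    }


# Assumed counts (not taken from the page); see the note above.
counts = {
    "Teacher": {"Action": 40, "Drama": 20, "Science Fiction": 10, "Comedy": 30},
    "Student": {"Action": 120, "Drama": 60, "Science Fiction": 30, "Comedy": 90},
}

col_rel = column_relative_frequencies(counts)
for label, row in col_rel.items():
    print(label, {c: round(v, 3) for c, v in row.items()})
```

With these assumed counts, every column splits the same way between teachers and students, which is the pattern that indicates no association.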

Example 3
In the survey described in Example 2, gender for each of the participants was also recorded. Some results of the survey are given below:
• 160 participants preferred action movies
• 80 participants preferred drama movies
• 40 participants preferred science fiction movies
• 240 participants were females
• 78 female participants preferred drama movies
• 32 male participants preferred science fiction movies
• 60 female participants preferred action movies

Exercises 8–15
Use the results from Example 3 to answer the following questions. Be sure to discuss these questions with your group members.
8. Complete the two-way frequency table that summarizes the data on movie preference and gender.
9. What proportion of the participants is female?
10. If there were no association between gender and movie preference, should you expect more females than males or fewer females than males to prefer action movies? Explain.
11. Make a table of row relative frequencies of each movie type for the male row and the female row. Refer to Exercises 2 through 4 to review how to complete the table below.
Suppose that you randomly pick 1 of the 400 participants. Use the table of row relative frequencies above to answer the following questions.
12. If you had to predict what type of movie this person chose, what would you predict? Explain why you made this choice.
13. If you know that the randomly selected participant is female, would you predict that her favorite type of movie was action? If not, what would you predict and why?
14. If knowing the value of one of the variables provides information about the value of the other variable, then there is an association between the two variables.
Is there an association between the variables gender and movie preference? Explain.
15. So what can be said when two variables are associated? Read the following sentences. Decide if the sentence is a correct statement based upon the survey data. If it is not correct, explain why not.
a. More females than males participated in the survey.
b. Males tend to prefer action and science fiction movies.
c. Being female causes one to prefer drama movies.
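
One way to check your work on Exercises 8 and 11 is to fill in the missing cells by subtraction, as in the Python sketch below. It uses only the facts listed in Example 3 plus the total of 400 participants mentioned before Exercise 12. The variable names are hypothetical, and the order of the subtractions is just one of several that work.

```python
TOTAL = 400  # "randomly pick 1 of the 400 participants"

# Column totals given in Example 3; comedy is whatever is left over.
col_totals = {"Action": 160, "Drama": 80, "Science Fiction": 40}
col_totals["Comedy"] = TOTAL - sum(col_totals.values())

female_total = 240
male_total = TOTAL - female_total

# Cells stated directly in Example 3.
female = {"Action": 60, "Drama": 78}
male = {"Science Fiction": 32}

# Fill each remaining cell from a row or column total.
male["Action"] = col_totals["Action"] - female["Action"]
male["Drama"] = col_totals["Drama"] - female["Drama"]
female["Science Fiction"] = col_totals["Science Fiction"] - male["Science Fiction"]
female["Comedy"] = female_total - sum(female.values())
male["Comedy"] = col_totals["Comedy"] - female["Comedy"]

# Consistency check: the male row must add up to the male total.
assert sum(male.values()) == male_total

# Row relative frequencies: each cell divided by its row total.
for label, row, row_total in [("Female", female, female_total),
                              ("Male", male, male_total)]:
    rel = {movie: round(row[movie] / row_total, 3) for movie in col_totals}
    print(label, row, rel)
```

Comparing the two printed rows shows whether knowing a participant's gender changes the prediction you would make about movie preference.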

Lesson 14 Exit Ticket
A random sample of 100 eighth-grade students is asked to record two variables: whether they have a television in their bedroom and whether they passed or failed their last math test. The results of the survey are summarized below.

55 students have a television in their bedroom.
35 students do not have a television in their bedroom and passed their last math test.
25 students have a television and failed their last math test.
35 students failed their last math test.

1. Complete the two-way table.
2. Calculate the row relative frequencies and enter the values in the table above. Round to the nearest thousandth.
3. Is there evidence of association between the variables? If so, does this imply there is a cause-and-effect relationship? Explain.
