6.2 Cross-tabulation
We can examine the relationship between two categorical variables with a simple extension of the code from Section 6.1. In the example below, we check whether the gender of a film's subject (SubjectSex) varies with the period of the film (Period). To do so, we simply add the variable Period to the group_by() function that we used in the previous section:
# Frequency table
XTab <- film %>%                   # create new object 'XTab' starting with object 'film'
  group_by(SubjectSex, Period) %>% # separately for every combination of values
                                   # of 'SubjectSex' and 'Period'
  summarize(Count = n())           # count the number of observations
## `summarise()` has grouped output by 'SubjectSex'. You can override using the
## `.groups` argument.
XTab                               # print object 'XTab' to the console
## # A tibble: 6 × 3
## # Groups: SubjectSex [2]
## SubjectSex Period Count
## <chr> <chr> <int>
## 1 Female 1915--1965 44
## 2 Female 1965--1999 59
## 3 Female 2000--2014 74
## 4 Male 1915--1965 132
## 5 Male 1965--1999 203
## 6 Male 2000--2014 249
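As a cross-check, dplyr's count() combines group_by() and summarize(n()) in a single call. The sketch below assumes dplyr 1.0 or later. Because the original film data are not reproduced here, it builds a hypothetical stand-in, film_demo, with one row per biopic, rebuilt from the six cell counts in the table above; cells, film_demo and XTabCount are illustrative names, not part of the original code.

```r
library(dplyr)

# Hypothetical stand-in for 'film': the six cell counts from the table above
cells <- tibble(
  SubjectSex = rep(c("Female", "Male"), each = 3),
  Period     = rep(c("1915--1965", "1965--1999", "2000--2014"), times = 2),
  Count      = c(44, 59, 74, 132, 203, 249)
)

# Expand to one row per film, keeping only SubjectSex and Period
film_demo <- cells[rep(seq_len(nrow(cells)), cells$Count), c("SubjectSex", "Period")]

# count() = group_by() + summarize(n()); the input is ungrouped, so is the output
XTabCount <- film_demo %>%
  count(SubjectSex, Period, name = "Count")

XTabCount
```

Because count() keeps the grouping of its input (none here), the result is an ungrouped tibble with the same six counts as XTab.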
In this table, the comparison we want to make (whether the gender focus of biopics has changed over time) is obscured by the growth in the total number of biopics across periods, from 176 films in the first period to 323 in the most recent one. We can again adapt code from Section 6.1 to calculate the share of films with a subject of each gender separately for each period.
# Frequency table with percentages by Period
XTabPct <- XTab %>%    # create new object 'XTabPct' starting with object 'XTab'
  group_by(Period) %>% # separately for every value of 'Period'
                       # (this replaces the 'SubjectSex' grouping left by summarize())
  mutate(PercentPeriod = Count / sum(Count)) # create new variable 'PercentPeriod':
  # Count in each row divided by the sum of Count over all rows with the same
  # 'Period', i.e. the relative frequency within each period (a proportion
  # between 0 and 1, not yet multiplied by 100)
XTabPct                # print object 'XTabPct' to the console
## # A tibble: 6 × 4
## # Groups: Period [3]
## SubjectSex Period Count PercentPeriod
## <chr> <chr> <int> <dbl>
## 1 Female 1915--1965 44 0.25
## 2 Female 1965--1999 59 0.225
## 3 Female 2000--2014 74 0.229
## 4 Male 1915--1965 132 0.75
## 5 Male 1965--1999 203 0.775
## 6 Male 2000--2014 249 0.771
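Reshaping the result so that the periods sit side by side makes the comparison across periods direct. The sketch below is one possible approach, assuming tidyr 1.0 or later for pivot_wider(); it re-enters the counts from XTab by hand as a hypothetical XTab_demo, and, as a design choice not in the original, scales the shares to 0 to 100 and rounds them to one decimal.

```r
library(dplyr)
library(tidyr)

# Hypothetical re-entry of the counts shown in XTab (not the original 'film' data)
XTab_demo <- tibble(
  SubjectSex = rep(c("Female", "Male"), each = 3),
  Period     = rep(c("1915--1965", "1965--1999", "2000--2014"), times = 2),
  Count      = c(44L, 59L, 74L, 132L, 203L, 249L)
)

XTabWide <- XTab_demo %>%
  group_by(Period) %>%                                     # shares within each period
  mutate(Percent = round(100 * Count / sum(Count), 1)) %>% # 0-100 scale, 1 decimal
  ungroup() %>%
  select(SubjectSex, Period, Percent) %>%
  pivot_wider(names_from = Period, values_from = Percent)  # one column per period

XTabWide
```

Each period column of XTabWide sums to 100, and the Female row reads 25.0, 22.5 and 22.9.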
We see that the share of biopics with female subjects is roughly constant over time, between 22.5 and 25 percent. The function tabyl() in the janitor package is a compact alternative for producing frequency tables and cross-tabulations, which can then be customized with janitor's adorn_*() helpers (for example, to convert counts to percentages).
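A minimal sketch of the same by-period shares with janitor, assuming the janitor package is installed. As before, film_demo is a hypothetical stand-in for film, rebuilt from the counts in XTab. Since Period ends up in the columns, adorn_percentages() divides by column totals.

```r
library(dplyr)
library(janitor)

# Hypothetical stand-in for 'film' (one row per biopic), rebuilt from XTab's counts
cells <- data.frame(
  SubjectSex = rep(c("Female", "Male"), each = 3),
  Period     = rep(c("1915--1965", "1965--1999", "2000--2014"), times = 2),
  Count      = c(44, 59, 74, 132, 203, 249)
)
film_demo <- cells[rep(seq_len(nrow(cells)), cells$Count), c("SubjectSex", "Period")]

# Two-way table: rows = SubjectSex, columns = Period
XTabProp <- film_demo %>%
  tabyl(SubjectSex, Period) %>%
  adorn_percentages(denominator = "col")  # each cell / its column (Period) total

# Display the proportions as formatted percentages with one decimal
XTabProp %>% adorn_pct_formatting(digits = 1)
```

XTabProp keeps the numeric proportions (useful for further calculation), while adorn_pct_formatting() produces a character version intended only for display.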