6.2 Cross-tabulation

We can examine the relationship between two categorical variables with a simple extension of the code from Section 6.1. In the example below, we check whether the gender of the subject (SubjectSex) varies by the period of the film (Period). To do so, we simply add the variable Period to the group_by() function that we used in the previous section:

# Frequency table
  XTab <- film %>% # create new object 'XTab' starting with object 'film'
    group_by(SubjectSex,Period) %>% # separately for every combination of values
                                    # of 'SubjectSex' and 'Period'
    summarize(Count=n()) # count the number of observations
## `summarise()` has grouped output by 'SubjectSex'. You can override using the
## `.groups` argument.
  XTab # print object 'XTab' to the console
## # A tibble: 6 × 3
## # Groups:   SubjectSex [2]
##   SubjectSex Period     Count
##   <chr>      <chr>      <int>
## 1 Female     1915--1965    44
## 2 Female     1965--1999    59
## 3 Female     2000--2014    74
## 4 Male       1915--1965   132
## 5 Male       1965--1999   203
## 6 Male       2000--2014   249

In this table, the comparison we want to make (whether the gender focus of biopics has changed over time) is somewhat obscured by the change in the total number of biopics over time. We can again adapt code from Section 6.1 to calculate, separately for each period, the proportion of films with a subject of each gender.

# Frequency table with percentages by Period
  XTabPct <- XTab %>% # create new object 'XTabPct' starting with object 'XTab'
    group_by(Period) %>% # separately for every value of 'Period'
    mutate(PercentPeriod = Count/sum(Count)) # create new variable
      # 'PercentPeriod' and set it equal to the value of Count in each row 
      # divided by the value of Count summed across all rows (within a value of 
      # 'Period'), i.e. relative frequency by period.

  XTabPct # print object 'XTabPct' to the console
## # A tibble: 6 × 4
## # Groups:   Period [3]
##   SubjectSex Period     Count PercentPeriod
##   <chr>      <chr>      <int>         <dbl>
## 1 Female     1915--1965    44         0.25 
## 2 Female     1965--1999    59         0.225
## 3 Female     2000--2014    74         0.229
## 4 Male       1915--1965   132         0.75 
## 5 Male       1965--1999   203         0.775
## 6 Male       2000--2014   249         0.771
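The long layout above can be harder to scan than a classic cross-tabulation with one row per period and one column per gender. The following is a minimal sketch of that reshaping, assuming the tidyr package is installed. The objects 'XTabPctDemo' and 'XTabWide' are hypothetical names introduced here for illustration; 'XTabPctDemo' is a stand-in rebuilt from the counts printed above so that the block runs on its own.

```r
library(dplyr)
library(tidyr)

# Stand-in for 'XTabPct' (hypothetical name), rebuilt from the printed counts
XTabPctDemo <- tibble(
  SubjectSex = rep(c("Female", "Male"), each = 3),
  Period     = rep(c("1915--1965", "1965--1999", "2000--2014"), times = 2),
  Count      = c(44L, 59L, 74L, 132L, 203L, 249L)
) %>%
  group_by(Period) %>%                          # separately for every period
  mutate(PercentPeriod = Count / sum(Count)) %>% # relative frequency by period
  ungroup()                                     # drop grouping before reshaping

# One row per period, one column per value of 'SubjectSex'
XTabWide <- XTabPctDemo %>%
  select(-Count) %>% # drop 'Count' so that each period collapses to one row
  pivot_wider(names_from = SubjectSex, values_from = PercentPeriod)

XTabWide # print object 'XTabWide' to the console
```

In this layout the 'Female' and 'Male' entries in each row should sum to one, which is a quick check that the grouping in the mutate() step was applied by period and not by gender.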

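Reading PercentPeriod as a proportion (0.25 corresponds to 25 percent), we see that the share of biopics with female subjects is relatively constant over time, between 22.5 and 25 percent, even though the total number of biopics grows from one period to the next. The function tabyl() in the janitor package is a simple but powerful alternative for producing customized frequency tables.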
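As a rough sketch of the tabyl() approach, the code below produces a comparable table of column percentages in a single pipeline. It assumes the janitor package is installed. The data frame 'film_demo' is a hypothetical stand-in for 'film' with one row per biopic, rebuilt from the counts printed above; 'XTabProp' and 'XTabJanitor' are likewise illustrative names.

```r
library(dplyr)
library(janitor)

# Stand-in for 'film' (hypothetical): one row per biopic, rebuilt from the counts
Counts <- c(44, 59, 74, 132, 203, 249)
film_demo <- tibble(
  SubjectSex = rep(rep(c("Female", "Male"), each = 3), times = Counts),
  Period     = rep(rep(c("1915--1965", "1965--1999", "2000--2014"), times = 2),
                   times = Counts)
)

# Two-way table with proportions computed within each column (period)
XTabProp <- film_demo %>%
  tabyl(SubjectSex, Period) %>% # counts: rows are 'SubjectSex', columns 'Period'
  adorn_percentages("col")      # divide each count by its column total

# Same table, formatted for display
XTabJanitor <- XTabProp %>%
  adorn_pct_formatting(digits = 1) %>% # show proportions as percentages
  adorn_ns()                           # append the underlying counts

XTabJanitor # print object 'XTabJanitor' to the console
```

The denominator argument of adorn_percentages() controls the comparison: "col" gives the share of each gender within a period, as in the table above, whereas "row" would instead give the share of each period within a gender.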