6.1 Describing one variable

To calculate the frequency of each value of a categorical variable, we adapt the combination of group_by() and summarize() from Section 5.3.1. In this example, we count the number of biopics that featured women vs. men, as captured by the variable SubjectSex. First, we use the command group_by(SubjectSex) to instruct R that all subsequent commands should be evaluated separately for each value of SubjectSex. Then we use summarize() to create a custom set of summary statistics. Inside summarize(), the function n() counts the observations in each group, and we store the result in a new variable, Count.

# Frequency table
  Tab <- film %>% # create new object 'Tab' starting with object 'film'
    group_by(SubjectSex) %>% # separately for each value of SubjectSex
    summarize(Count=n()) # count the number of observations
  
  Tab # print object 'Tab' to the console
## # A tibble: 2 × 2
##   SubjectSex Count
##   <chr>      <int>
## 1 Female       177
## 2 Male         584
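
As a cross-check on the approach above, the following sketch compares the group_by() and summarize() combination with dplyr's count() shorthand. It assumes the dplyr package is installed, and it uses a small made-up tibble named toy (hypothetical values, not the film data) so that it runs on its own.

```r
library(dplyr)

# Toy stand-in for 'film': hypothetical values, for illustration only
toy <- tibble(SubjectSex = c("Female", "Male", "Male", "Female", "Male"))

# Long form, as in the text: group, then count rows per group
Tab_long <- toy %>%
  group_by(SubjectSex) %>%
  summarize(Count = n())

# Shorthand: count() groups and tallies in one step;
# the 'name' argument sets the name of the count column
Tab_short <- toy %>%
  count(SubjectSex, name = "Count")

# Compare the two tables (converted to plain data frames)
all.equal(as.data.frame(Tab_long), as.data.frame(Tab_short))
```

The long form is more flexible, because other summary statistics can be added inside summarize() alongside n().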

Calculating the fraction of total cases in each category (relative frequency) requires an extra line of code (mutate(Percent = Count/sum(Count))). The mutate() function creates a new variable (see Section 8.3 for details). Its argument gives the name of the new variable (Percent), followed by an equals sign and the definition of the new variable. In this case, we set Percent equal to the value of Count in each row divided by the sum of the values of Count across all rows. Note that, despite its name, Percent holds a proportion between 0 and 1 rather than a percentage.

# Relative frequency
  Tab_pct <- Tab %>% # create new object 'Tab_pct' starting with object 'Tab'
    mutate(Percent = Count/sum(Count)) # create new variable 'Percent'
      # and set it equal to the value of Count in each row divided by
      # the value of Count summed across all rows, i.e. relative frequency
  
  Tab_pct # print object 'Tab_pct' to the console
## # A tibble: 2 × 3
##   SubjectSex Count Percent
##   <chr>      <int>   <dbl>
## 1 Female       177   0.233
## 2 Male         584   0.767
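
If percentages on a 0 to 100 scale are preferred, the proportion can be multiplied by 100. The sketch below is one way to do this; it rebuilds the table from the counts printed above so that it runs on its own, and the names Proportion and Tab_pct100 are illustrative choices rather than names used elsewhere in this book.

```r
library(dplyr)

# Rebuild the frequency table from the counts printed above
Tab <- tibble(SubjectSex = c("Female", "Male"),
              Count = c(177L, 584L))

Tab_pct100 <- Tab %>%
  mutate(Proportion = Count / sum(Count),       # relative frequency, 0 to 1
         Percent = round(100 * Proportion, 1))  # same quantity, 0 to 100

Tab_pct100

# Sanity check: relative frequencies should sum to 1
sum(Tab_pct100$Proportion)
```

Checking that the proportions sum to 1 is a quick way to confirm that the denominator covers all rows of the table.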

Frequency tables can also be visualized as bar charts; the two are different ways of representing the distribution of a categorical variable. See Chapter 9 for plotting in general and Section 9.1 for examples of bar charts.
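
As a brief preview, a frequency table can be turned into a bar chart roughly as follows. This is a minimal sketch that assumes the ggplot2 package is installed; the plotting approach used in Chapter 9 may differ, and the object name p is illustrative.

```r
library(ggplot2)

# Frequency table rebuilt from the counts printed earlier in this section
Tab <- data.frame(SubjectSex = c("Female", "Male"),
                  Count = c(177L, 584L))

# geom_col() draws one bar per row, with bar height taken from Count
p <- ggplot(Tab, aes(x = SubjectSex, y = Count)) +
  geom_col() +
  labs(x = "Sex of subject", y = "Number of biopics")

p # print the plot
```

Because the counts are already computed, geom_col() is used here; geom_bar() would instead count rows of the raw data itself.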