5.3 Group comparisons
5.3.1 Summary statistics by group
Comparing outcomes across groups can reveal important patterns. We conduct such comparisons using the group_by() function, here in combination with summarize(). The group_by() function takes as arguments a comma-separated list of variables. R will then evaluate all subsequent piped commands separately for every combination of values that appears in the listed variables. Here’s an example that calculates the mean and standard deviation of math proficiency (ProfMath) separately for elementary, middle, and high schools (SchType).
dcps %>%
  group_by(SchType) %>%      # separately for each value of SchType
  summarize(
    Avg = mean(ProfMath),    # calculate mean of ProfMath
    StDev = sd(ProfMath)     # calculate SD of ProfMath
  )
## # A tibble: 3 × 3
##   SchType      Avg StDev
##   <fct>      <dbl> <dbl>
## 1 Elementary  34.0  23.7
## 2 Middle      19.6  17.6
## 3 High        12.9  22.5
Note: the arguments for group_by() should be categorical variables and not continuous variables. Why? What happens when you try using a continuous variable?
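The "every combination of values" behavior described above is easiest to see with more than one grouping variable. The following is a minimal sketch on a small made-up data set, not the dcps data: the toy tibble, its Ward column, and all of its values are hypothetical and exist only for illustration. It assumes dplyr version 1.0.0 or later, where summarize() accepts the .groups argument.

```r
library(dplyr)

# Hypothetical toy data for illustration only; this is NOT the dcps data set
toy <- tibble(
  SchType  = c("Elementary", "Elementary", "Elementary",
               "Middle", "Middle", "Middle"),
  Ward     = c("A", "A", "B", "A", "B", "B"),   # made-up second grouping variable
  ProfMath = c(30, 40, 50, 10, 20, 30)          # made-up proficiency values
)

by_type_ward <- toy %>%
  group_by(SchType, Ward) %>%   # one group per SchType-Ward combination
  summarize(
    N   = n(),                  # number of rows in each group
    Avg = mean(ProfMath),       # mean of ProfMath within each group
    .groups = "drop"            # return an ungrouped tibble
  )

by_type_ward
```

The result has one row for each combination of SchType and Ward that actually occurs in the data (four rows here), rather than one row per school type. Including n() alongside the mean is a design choice worth considering, since a group average based on only one or two schools deserves more caution than one based on many.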