5.1 Describing one variable
5.1.1 Summary statistics
A useful first step in analyzing the distribution of scores on a single numeric variable is to calculate the relevant summary statistics. Use the summary() function for a quick, general overview. It returns the minimum, mean, and maximum scores, as well as the scores at the 1st, 2nd (median), and 3rd quartiles.
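As a minimal sketch of what summary() reports, the example below applies it to a small vector of made-up scores. The values and the name `scores` are hypothetical and are not taken from the dcps data; with the chapter's dataset the call would presumably be summary(dcps$ProfLang).

```r
# Hypothetical scores, invented for illustration (not the dcps data)
scores <- c(12, 25, 31, 47, 58, 63, 90)

# summary() on a numeric vector returns six named values:
# Min., 1st Qu., Median, Mean, 3rd Qu., Max.
overview <- summary(scores)
overview

# Individual statistics can be pulled out by name
overview[["Median"]]
```

The result is a named numeric object, so any single statistic can be extracted by name when only one value is needed.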
For specific inquiries, use the summarize() function and customize your report. For example:
library(tidyverse) # load tidyverse package for pipe syntax and summary functions
dcps %>% # start by piping in the dataset
  summarize(
    Avg = mean(ProfLang), # calculates the mean
    StdDev = sd(ProfLang), # standard deviation
    Range = max(ProfLang) - min(ProfLang) # range: maximum minus minimum
  )
## # A tibble: 1 × 3
##     Avg StdDev Range
##   <dbl>  <dbl> <dbl>
## 1  29.7   24.6  94.1
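The same pattern extends to other statistics. The sketch below, which uses a hypothetical tibble named `toy` with a made-up `score` column rather than the dcps data, reports the number of observations, the median, and the interquartile range.

```r
library(dplyr) # part of the tidyverse; provides summarize(), n(), and %>%

# Hypothetical data, invented for illustration (not the dcps dataset)
toy <- tibble(score = c(12, 25, 31, 47, 58, 63, 90))

report <- toy %>%
  summarize(
    N = n(),                # number of observations
    Median = median(score), # middle value
    IQRange = IQR(score)    # 3rd quartile minus 1st quartile
  )
report
```

Each argument to summarize() becomes one column of the one-row result, so the report can include whichever statistics suit the question at hand.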
5.1.2 Graphing the distribution
In addition to collapsing the data to summary statistics, it can be informative to plot the distribution of a variable, that is, to display how often different values or ranges of values are observed. Chapter 9 introduces plotting generally, and Section 9.1 covers techniques for plotting distributions of a single variable.
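As a brief preview, one common way to plot the distribution of a single numeric variable is a histogram. The sketch below uses ggplot2 (part of the tidyverse) with made-up data; the data frame `toy`, its `score` column, and the bin width of 10 are assumptions chosen for illustration, not the approach Section 9.1 necessarily takes.

```r
library(ggplot2) # plotting package included in the tidyverse

# Hypothetical data, invented for illustration (not the dcps dataset)
toy <- data.frame(score = c(12, 25, 31, 47, 58, 63, 90))

# Histogram: counts how many scores fall into each 10-point bin
p <- ggplot(toy, aes(x = score)) +
  geom_histogram(binwidth = 10) +
  labs(x = "Score", y = "Count")
p
```

The bin width controls how coarse the picture is: wider bins smooth over detail, while narrower bins show more of it at the cost of a noisier plot.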