4.3 Referencing variables of a data frame

A data frame (or tibble) is a two-dimensional structure in which each variable forms a column and each observation forms a row. Each element is the value that a given observation takes on for a particular variable.

How do you reference or identify the variables in a data frame (e.g. to calculate the mean number of students tested, NumTested from the dcps schools data)? The $ and [[ extraction operators both pull variables from a dataframe or items from a list. Note that $ requires the variable name, whereas [[ allows you to use either the variable’s name (in quotes) or column number in the data frame. Do what you want. I prefer [[ for anyone in more advanced programming, but I’ll typically use $ in this course.

# Extract the variable (all observations)
  dcps$NumTested
##   [1]   72  147   67  180  213  224  158  112  157  375  222  149  167   67   83
##  [16]   87   96  121  289  109  495   62 1423  146  117  156  109  189  212  159
##  [31]  112   77  137  112  362  310  110  160   94   91  178  334  290  235  354
##  [46]  114  156  137  115  181  306  111   96  193  246  132   12  129   93  153
##  [61]  143  140  212  143  105  148  258  129   66  144  399  133  117   79  144
##  [76]  213  120  288   96  170   20   61  121  261  135  102  113  134  121   73
##  [91]  217  211  175  409  239  126  102  374  199  180  173  190   23  253  172
## [106]  154  152  421
# Mean on a variable (3 ways)
  mean(dcps$NumTested) # object$variable
## [1] 180.1019
  mean(dcps[['NumTested']]) # object[['variable']]
## [1] 180.1019
  mean(dcps[[4]]) # object[[column#]]
## [1] 180.1019