8.4 Appending and merging data

It is often useful to combine data from different sources. This may take the form of appending (adding new observations that record the same variables) or merging (adding new variables that describe the same observations).

8.4.1 Appending new observations

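To append new observations to your data frame, use bind_rows(OldData,NewData). Columns are matched by name, so the variable names should be identical across data frames; the variable order does not matter. A variable that appears in only one of the data frames is kept, and its values are filled with NA for the rows coming from the other.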

  # load dplyr, which provides tribble(), bind_rows() and left_join()
    library(dplyr)

  # old data
    myData = tribble(
      ~District, ~Students,
      115, 985,
      116, 1132
    )

  # new data to add
    new = tribble(
      ~District, ~Students,
      117, 419,
      118, 633
    )
    
  # Append new to old
    myData = bind_rows(myData,new)
    
    myData
## # A tibble: 4 × 2
##   District Students
##      <dbl>    <dbl>
## 1      115      985
## 2      116     1132
## 3      117      419
## 4      118      633
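Because bind_rows() matches columns by name, the sketch below appends data frames whose variables are listed in a different order, one of which carries an extra variable. The district values repeat the example above; the `Region` column and the `old`/`new` labels are hypothetical, for illustration only. The `.id` argument adds a column recording which input each row came from:

```r
library(dplyr)

old = tribble(
  ~District, ~Students,
  115, 985,
  116, 1132
)

# Same variables in a different order, plus a hypothetical Region column
newer = tribble(
  ~Students, ~District, ~Region,
  419, 117, "North",
  633, 118, "South"
)

# Columns are matched by name; Region is NA for the rows from `old`.
# .id = "source" stores the list names ("old", "new") in a new column.
combined = bind_rows(list(old = old, new = newer), .id = "source")
combined
```

The `source` column is handy when the appended files might later need to be told apart, for example when one batch turns out to contain errors.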

8.4.2 Merging

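To merge data frames (add new variables for existing observations), use left_join(OldData,NewData). To link rows in one data frame to rows in the other, the data sets must share a common identifier (a key) whose values match across them. By default, left_join() joins on every variable whose name appears in both data frames; the by argument lets you name the key explicitly. left_join() keeps every row of OldData, and where a row has no match in NewData, the new variables are filled with NA. Building on the example above: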

  # new variables
    newvars = tribble(
      ~District,~Teachers,
      115, 43,
      116, 71,
      118, 55
    )

  # join new to old
    myData = left_join(myData,newvars)
## Joining with `by = join_by(District)`
    myData
## # A tibble: 4 × 3
##   District Students Teachers
##      <dbl>    <dbl>    <dbl>
## 1      115      985       43
## 2      116     1132       71
## 3      117      419       NA
## 4      118      633       55
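Naming the key with `by` documents the join and suppresses the "Joining with" message, and `join_by()` (dplyr 1.1.0 or later) can also link keys whose names differ between data frames. anti_join() returns the rows of the first data frame that found no match, which is a quick way to check a merge. A minimal sketch, assuming the teacher counts arrive in a table whose key is called `DistrictID` (a hypothetical name):

```r
library(dplyr)

students = tribble(
  ~District, ~Students,
  115, 985,
  116, 1132,
  117, 419,
  118, 633
)

# Hypothetical lookup table whose key has a different name
teachers = tribble(
  ~DistrictID, ~Teachers,
  115, 43,
  116, 71,
  118, 55
)

# District in `students` matches DistrictID in `teachers`.
# (Before dplyr 1.1.0: by = c("District" = "DistrictID"))
merged = left_join(students, teachers, by = join_by(District == DistrictID))

# Rows of `students` with no match in `teachers` (here, District 117)
unmatched = anti_join(students, teachers, by = join_by(District == DistrictID))

merged
unmatched
```

Only the key from the first data frame (`District`) is kept in the result, so `merged` has the same three columns as the joined myData above.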