7.2 OLS regression

Estimating a regression model by ordinary least squares (OLS) is straightforward in R. The lm() function estimates a model with one or more independent variables. Specify the formula using the syntax Y ~ X1 + X2, where Y is the dependent variable, and name the data frame in the data argument.

# Bivariate (unconditional) estimate
  Model1 <- lm(ProfMath ~ ProfLang, data = dcps)

# Multiple regression (conditional) estimate
  Model2 <- lm(ProfMath ~ ProfLang + NumTested, data = dcps)
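The same formula syntax can be tried on any data frame. The sketch below is only an illustration: because the dcps data are not reproduced here, it assumes the built-in mtcars data set as a stand-in, and the object names fit_bivariate and fit_multiple are hypothetical.

```r
# Stand-in data: mtcars ships with base R (an assumption made for illustration;
# it is not the dcps data used in this section)
data(mtcars)

# One independent variable: fuel economy (mpg) regressed on weight (wt)
fit_bivariate <- lm(mpg ~ wt, data = mtcars)

# Two independent variables: add horsepower (hp) to the formula with `+`
fit_multiple <- lm(mpg ~ wt + hp, data = mtcars)

# coef() returns the estimated intercept and slopes as a named vector
coef(fit_bivariate)
coef(fit_multiple)
```

Each term added to the right-hand side of the formula adds one slope coefficient to the fitted model.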

To view the coefficient estimates and evaluate hypotheses about the relationship, apply the summary() function to the model object.

  summary(Model1)

## 
## Call:
## lm(formula = ProfMath ~ ProfLang, data = dcps)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -38.232  -5.147  -0.910   7.169  26.919 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept)  0.90964    1.50490   0.604    0.547    
## ProfLang     0.87614    0.03906  22.429   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 9.945 on 106 degrees of freedom
## Multiple R-squared:  0.826,  Adjusted R-squared:  0.8243 
## F-statistic: 503.1 on 1 and 106 DF,  p-value: < 2.2e-16

  summary(Model2)

## 
## Call:
## lm(formula = ProfMath ~ ProfLang + NumTested, data = dcps)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -39.329  -5.407  -0.804   6.975  26.433 
## 
## Coefficients:
##              Estimate Std. Error t value Pr(>|t|)    
## (Intercept)  2.210855   1.714578   1.289    0.200    
## ProfLang     0.894302   0.040536  22.062   <2e-16 ***
## NumTested   -0.010223   0.006595  -1.550    0.124    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 9.88 on 105 degrees of freedom
## Multiple R-squared:  0.8299, Adjusted R-squared:  0.8266 
## F-statistic: 256.1 on 2 and 105 DF,  p-value: < 2.2e-16
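The quantities printed by summary() can also be extracted from the summary object rather than read off the console. A minimal sketch, again assuming the built-in mtcars data as a stand-in for dcps and using hypothetical object names:

```r
# Fit a two-variable model on the stand-in data (mtcars, not dcps)
fit <- lm(mpg ~ wt + hp, data = mtcars)
fit_summary <- summary(fit)

# Coefficient table as a numeric matrix: one row per term, with columns
# "Estimate", "Std. Error", "t value", and "Pr(>|t|)"
coef_table <- coef(fit_summary)
coef_table["wt", "Estimate"]   # slope estimate for wt
coef_table["wt", "Pr(>|t|)"]   # its two-sided p-value

# Model-level statistics stored in the summary object
fit_summary$adj.r.squared      # adjusted R-squared
fit_summary$sigma              # residual standard error

# 95% confidence intervals for all coefficients
confint(fit, level = 0.95)
```

Indexing the coefficient matrix by row and column name avoids copying numbers by hand when reporting results.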

In each coefficient table, the rows correspond to the intercept and the independent variables. In Model2, the estimated slope coefficient for ProfLang is 0.89 with a p-value less than 0.001. This means that, on average and net of the number of students tested, a 1-percentage-point increase in language proficiency is associated with a 0.89-percentage-point increase in math proficiency, and the association is statistically significant (\(p<0.001\)). The coefficient for NumTested, by contrast, is not statistically significant at conventional levels (\(p=0.124\)). We might also note that the variables in the model account for about 83% of the observed variation in math proficiency across DC Public Schools (\(\text{Adj.}~R^2=0.83\)).
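To make the interpretation concrete, the Model2 estimates can be written as a fitted equation and evaluated by hand. The school below is hypothetical (50% language proficiency, 100 students tested) and is chosen only to illustrate the arithmetic:

\[
\widehat{\text{ProfMath}} = 2.211 + 0.894 \times \text{ProfLang} - 0.010 \times \text{NumTested}
\]

\[
\widehat{\text{ProfMath}} = 2.210855 + 0.894302 \times 50 - 0.010223 \times 100 = 2.210855 + 44.7151 - 1.0223 \approx 45.9
\]

Holding NumTested at 100 and raising ProfLang from 50 to 51 increases the predicted value by the slope, 0.894, to about 46.8.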