7.2 OLS regression
Estimating a regression model by OLS is simple in R. The lm() function estimates a model with one or more independent variables. Specify the model as a formula, with the dependent variable to the left of ~ and the independent variables to the right: Y ~ X1 + X2.
# Bivariate (unconditional) estimate
Model1 <- lm(ProfMath ~ ProfLang, data = dcps)
# Multivariate (conditional) estimate
Model2 <- lm(ProfMath ~ ProfLang + NumTested, data = dcps)
To view the coefficient estimates and evaluate hypotheses about the relationship, apply the summary() function to each model object.
summary(Model1)
summary(Model2)
##
## Call:
## lm(formula = ProfMath ~ ProfLang, data = dcps)
##
## Residuals:
## Min 1Q Median 3Q Max
## -38.232 -5.147 -0.910 7.169 26.919
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 0.90964 1.50490 0.604 0.547
## ProfLang 0.87614 0.03906 22.429 <2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 9.945 on 106 degrees of freedom
## Multiple R-squared: 0.826, Adjusted R-squared: 0.8243
## F-statistic: 503.1 on 1 and 106 DF, p-value: < 2.2e-16
##
## Call:
## lm(formula = ProfMath ~ ProfLang + NumTested, data = dcps)
##
## Residuals:
## Min 1Q Median 3Q Max
## -39.329 -5.407 -0.804 6.975 26.433
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 2.210855 1.714578 1.289 0.200
## ProfLang 0.894302 0.040536 22.062 <2e-16 ***
## NumTested -0.010223 0.006595 -1.550 0.124
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 9.88 on 105 degrees of freedom
## Multiple R-squared: 0.8299, Adjusted R-squared: 0.8266
## F-statistic: 256.1 on 2 and 105 DF, p-value: < 2.2e-16
Notice that in each coefficient table the rows are defined by the model terms: the intercept, followed by each independent variable. In Model2, the estimated slope coefficient for ProfLang is 0.89 with a p-value less than 0.001. This means that, on average and net of the number of students tested, a 1-percentage-point increase in language proficiency is associated with a 0.89-percentage-point increase in math proficiency. The association is statistically significant (\(p<0.001\)). We might also note that the variables in the model account for about 83% of the observed variation in math proficiency across DC Public Schools (\(Adj~R^2=0.83\)).
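The quantities discussed above (a slope, its p-value, and the adjusted R-squared) can also be pulled out of the summary object instead of being read off the printed output. The following is a minimal sketch of one way to do that. It does not use the real dcps data: the data frame sim is a simulated, hypothetical stand-in with the same variable names, so its estimates will not match the output shown earlier.

```r
# Hypothetical stand-in for dcps: simulated values, NOT the real schools data
set.seed(42)
n <- 108
sim <- data.frame(
  ProfLang  = runif(n, 5, 95),          # assumed range of language proficiency (%)
  NumTested = round(runif(n, 50, 600))  # assumed range of students tested
)
sim$ProfMath <- 2 + 0.9 * sim$ProfLang - 0.01 * sim$NumTested + rnorm(n, sd = 10)

# Same model specification as Model2, fitted to the simulated data
fit <- lm(ProfMath ~ ProfLang + NumTested, data = sim)
fit_summary <- summary(fit)

# Coefficient table as a numeric matrix: one row per model term,
# columns "Estimate", "Std. Error", "t value", "Pr(>|t|)"
coef_table <- coef(fit_summary)

slope_lang <- coef_table["ProfLang", "Estimate"]   # estimated slope
p_lang     <- coef_table["ProfLang", "Pr(>|t|)"]   # two-sided p-value
adj_r2     <- fit_summary$adj.r.squared            # adjusted R-squared

# 95% confidence intervals for the intercept and both slopes
ci <- confint(fit, level = 0.95)

print(round(coef_table, 4))
cat("Slope on ProfLang:", round(slope_lang, 3), "\n")
cat("Adjusted R-squared:", round(adj_r2, 3), "\n")
print(ci)
```

With the real data, replacing fit with Model1 or Model2 should give the values shown in the corresponding summary() output. Indexing the coefficient matrix by row and column name, rather than by position, keeps the code correct if terms are added to or removed from the formula.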