7.4 Logistic regression

With a binary outcome, logistic regression is generally more appropriate than linear (OLS) regression. Use the glm() function to estimate a generalized linear model, and set the family argument to binomial, which uses the logit link by default.

# create binary measure of "above average math proficiency"
  dcps <- dcps %>%
    mutate(AboveAvgMath = if_else(ProfMath > mean(ProfMath), 1, 0))

  Model3 <-
    glm(
      AboveAvgMath ~ ProfLang + NumTested,  # specify model
      family = 'binomial',  # logistic estimation
      data = dcps
    )
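
Written out, the model specified above is linear in the log-odds of the outcome. The symbols \(p_i\) and \(\beta_0, \beta_1, \beta_2\) are notation introduced here for illustration and do not appear in the original text:

\[
\log\left(\frac{p_i}{1 - p_i}\right) = \beta_0 + \beta_1\,\text{ProfLang}_i + \beta_2\,\text{NumTested}_i,
\qquad p_i = \Pr(\text{AboveAvgMath}_i = 1)
\]

The coefficients that glm() estimates are therefore on the log-odds scale, which is why they are exponentiated to obtain odds ratios.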

To view the coefficient estimates and evaluate hypotheses, again apply the summary() function to the model object.

# View estimates
  summary(Model3)  
## 
## Call:
## glm(formula = AboveAvgMath ~ ProfLang + NumTested, family = "binomial", 
##     data = dcps)
## 
## Coefficients:
##              Estimate Std. Error z value Pr(>|z|)    
## (Intercept) -3.224266   0.612526  -5.264 1.41e-07 ***
## ProfLang     0.113658   0.024119   4.712 2.45e-06 ***
## NumTested   -0.002393   0.002292  -1.044    0.296    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## (Dispersion parameter for binomial family taken to be 1)
## 
##     Null deviance: 144.342  on 107  degrees of freedom
## Residual deviance:  77.127  on 105  degrees of freedom
## AIC: 83.127
## 
## Number of Fisher Scoring iterations: 6

The coefficients above are on the log-odds scale. Exponentiate them to obtain odds ratios.

# Odds ratios
  exp(coef(Model3))
## (Intercept)    ProfLang   NumTested 
##  0.03978499  1.12036941  0.99760944
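
Point estimates of odds ratios are easier to judge alongside an interval. The sketch below builds approximate 95% Wald intervals by hand from the estimates and standard errors printed in the summary output, so it runs without the dcps data. The object names est, se, and or_ci are illustrative, and Wald intervals are only one of several possible interval types.

```r
# Estimates and standard errors copied from the summary(Model3) output
est <- c("(Intercept)" = -3.224266, ProfLang = 0.113658, NumTested = -0.002393)
se  <- c("(Intercept)" =  0.612526, ProfLang = 0.024119, NumTested =  0.002292)

z <- qnorm(0.975)  # critical value for a 95% interval

# Wald interval on the log-odds scale, exponentiated to the odds-ratio scale
or_ci <- cbind(
  OR    = exp(est),
  lower = exp(est - z * se),
  upper = exp(est + z * se)
)
round(or_ci, 3)

# With the fitted model in the workspace, the same Wald intervals
# should be available via: exp(confint.default(Model3))
```

An interval that excludes 1 corresponds to a coefficient that differs significantly from zero at the 5% level. Here that holds for ProfLang but not for NumTested.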

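The results indicate that a one-percentage-point increase in a school’s language proficiency is expected to raise the odds of being above average in math by about 12% (odds ratio of 1.12), holding the number of students tested constant. Again, the increase is statistically significant (\(p < 0.001\)). The number of students tested, by contrast, is not a significant predictor (\(p = 0.296\)).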
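
A change in odds is not the same as a change in probability, so it can help to convert the estimates into predicted probabilities. The following is a minimal sketch that reuses the coefficients printed above. The school profiles (ProfLang of 20, 30, and 40 with NumTested fixed at 100) are hypothetical values chosen only for illustration, and the names b, schools, pct_1, and pct_10 are not from the original.

```r
# Coefficients copied from the summary(Model3) output (log-odds scale)
b <- c("(Intercept)" = -3.224266, ProfLang = 0.113658, NumTested = -0.002393)

# Hypothetical school profiles, for illustration only
schools <- data.frame(ProfLang = c(20, 30, 40), NumTested = 100)

# Linear predictor (log-odds), then the inverse logit via plogis()
log_odds <- b[["(Intercept)"]] +
  b[["ProfLang"]]  * schools$ProfLang +
  b[["NumTested"]] * schools$NumTested
schools$prob <- plogis(log_odds)
schools

# Percent change in the odds for a 1-point and a 10-point rise in ProfLang
pct_1  <- (exp(b[["ProfLang"]]) - 1) * 100
pct_10 <- (exp(10 * b[["ProfLang"]]) - 1) * 100
c(one_point = pct_1, ten_points = pct_10)

# With the fitted model in the workspace, the probabilities should match:
# predict(Model3, newdata = schools, type = "response")
```

Odds ratios compound multiplicatively, so the effect of a ten-point increase is exp(10 * b) rather than ten times the one-point effect.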