9.2 Visualizing relationships
Examining relationships between variables with ggplot() generally involves adding aesthetics. We might be interested in comparing the same histogram or box plot across different groupings within the data. We can do that by mapping the grouping variable to the aesthetic fill inside aes(), so that each group is drawn in its own fill colour.
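As a minimal sketch of the fill aesthetic, the code below draws a histogram and a box plot of math proficiency split by school type. The data frame dcps_demo is a hypothetical stand-in built with invented values so that the example runs on its own; only the column names ProfMath and SchType are borrowed from dcps, and the school type labels are assumptions.

```r
library(ggplot2)

# Hypothetical stand-in for dcps: all values and category labels are invented
set.seed(1)
dcps_demo <- data.frame(
  ProfMath = runif(60, min = 0, max = 100),
  SchType  = rep(c("Elementary", "Middle", "High"), each = 20)
)

# Histogram of math proficiency, bars filled by school type
p_hist <- ggplot(data = dcps_demo, aes(x = ProfMath, fill = SchType)) +
  geom_histogram(bins = 20)

# Box plot of the same variable, one filled box per school type
p_box <- ggplot(data = dcps_demo, aes(x = ProfMath, fill = SchType)) +
  geom_boxplot()

p_hist
p_box
```

With the real data, replacing dcps_demo with dcps should be the only change needed, assuming those two columns exist as named.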
To produce a scatter plot, we need to specify the aesthetic y, i.e. name the variable that will appear on the vertical axis, and set the plot type to geom_point().
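A basic scatter plot along those lines might look like the following sketch. Again, dcps_demo is a hypothetical stand-in with invented values; only the column names ProfMath and ProfLang are taken from dcps.

```r
library(ggplot2)

# Hypothetical stand-in for dcps: values are simulated, not real scores
set.seed(1)
dcps_demo <- data.frame(ProfMath = runif(60, min = 0, max = 100))
dcps_demo$ProfLang <- 0.8 * dcps_demo$ProfMath + rnorm(60, sd = 10)

# Scatter plot: x on the horizontal axis, y on the vertical axis
p_scatter <- ggplot(data = dcps_demo, aes(x = ProfMath, y = ProfLang)) +
  geom_point()

p_scatter
```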
We could also visually distinguish different school types in our scatter plot. However, for certain plot types like geom_point() and geom_line(), we use the aesthetic colour rather than fill as above.
# Scatter plot by categorical variable
ggplot(data=dcps,aes(x=ProfMath,y=ProfLang,colour=SchType)) +
  geom_point()
Different plot types can be superimposed by adding further layers with +. One common example is adding a simple linear regression (“best fit”) line to a scatter plot. That requires the plot type geom_smooth(), to which we pass two arguments here. First, you must specify the “smoothing method”, i.e. the mathematical approach to summarizing the relationship between your X and Y variables. For a “best fit” line, use the argument method='lm', which fits a linear model. Then, R expects a formula relating X and Y. To get a straight “best fit” line, use the argument formula = y ~ x, where y and x refer to the aesthetics rather than to column names. By default, geom_smooth() also draws a shaded confidence band around the line, computed from the standard errors of the fit. Unless you fully understand the standard error calculations of the underlying stat_smooth() function, it’s best to suppress that band by including se=F (shorthand for se=FALSE).
# Scatter plot with "best fit" line
ggplot(data=dcps,aes(x=ProfMath,y=ProfLang)) +
  geom_point() +
  geom_smooth(method='lm',formula=y~x,se=F)