Unpacking the Tidyverse - ggplot2

Introduction

This is the first of eight installments of my Unpacking the Tidyverse series. Each installment focuses on one of the eight core packages in Hadley Wickham’s tidyverse. Instructions given in each post are mainly derived from Hadley’s textbook, R for Data Science, and CRAN package documentation. This installment of Unpacking the Tidyverse focuses on the graphing package, ggplot2. The next installment focuses on the readr package and can be found here.

The ggplot2 package is a powerful data visualization tool based on the grammar of graphics: an idea that any plot can be described using seven distinct parameters. In this blog post, I run through these parameters attempting summarize them in an easy-to-access resource for some of the most common tasks in data visualization.

library('ggplot2')       
library('RColorBrewer')  
library('viridisLite')  

Template

ggplot( ***dataset*** ) +            
geom_***geometry***(              
  mapping = aes( ***variables*** ),    
  stat = ***statistical transformation***,          
  position = ***position adjustment***        
) + 
coord_***coordinate system****() +          
facet_***facet declaration***()              

Not all parameters must be defined for every ggplot call, but by defining the dataset, plot geometry, variables, statistical transformations, position adjustments, coordinate system and facets, you can create almost any plot with ggplot2. Of course, there are additional functions that affect the aesthetics of the plot, such as annotations, titles, axes, and legends. Let’s briefly cover all seven of these parameters, discussing the commands that I most commonly find useful for each of them.

Data

ggplot(mpg)

Just calling ggplot() with an associated dataset will create a canvas for your graphic, not an actual plot. This is because R doesn’t know what variables you want plotted from that dataset, or how you want them plotted. To create a basic plot, you’ll have to provide information on the plot’s geometry.

Geometries

ggplot(mpg) +
  geom_point(mapping = aes(x = displ, y = hwy, color = cyl))

Plot geometries are the most important parameter, and the only parameter other than ggplot() that must be defined in every single visualization. There are hundreds of geometries available for your use, courtesy of the R community; check out http://www.ggplot2-exts.org/ for more information. Here’s a list of the handful of geometries, mappings, and aesthetics that I use most often.

Geometries

geom_density( ) - 1 variable Gaussian distribution
geom_point( ) - 2, 3, or 4 variable scatterplot
geom_smooth( ) - 2 or 3 variable line plot
geom_bar(stat=”identity”) - 2 or 3 variable bar plot
geom_hex( ) - 2 variable distribution
geom_tile( ) - 3 variable tile plot
geom_map( ) - 2, 3, or 4 variable geospatial

Mappings

aes( x = ) - x-axis location
aes( y = ) - y-axis location
aes( …, color = ) - color dependent on variable’s value
aes( …, linetype = ) - line type dependent on variable’s value
aes( …, size = ) - size of object dependent on variable’s value

Aestetics

size = size of a point, declared by an integer
linetype = 0 (blank), 1 (solid), 1 (dashed), 3 (dotted), 4 (dot-dash), 5 (long-dash), 6 (two-dash)
weight = (line thickness) integer
color = (outline) string
fill = (inside) string
alpha = (transparency) double from 0 to 1

Statistical Transformation

Statistical transformations are most useful with geom_bar. Each geometry has an associated transformation, and most of the time this transformation is useful and should not be changed. The default transformation associated with geom_bar is "count", or the total number of occurrences of each variable. Sometimes you want a bar chart to visualize something else, like the unique occurrences of each variable, or a different variable all together. These three transformations allow you control over bar charts to visualize your data in entirely new ways.

geom_bar(aes(…), stat = “count”) - visualize the number of entries in a variable, n()
geom_bar(aes(…), stat = “identity”) - visualize the variable, not the count
geom_bar(aes(…), stat = “unique”) - visualize only unique components of the variable

Position Adjustment

When there are multiple data points occupying the same space or are stacked too closely together, position adjustments allow for a clearer visualization. Most often used with bar charts and scatter plots, these are the three position adjustments I find to be the most useful.

geom_point(aes(…), position = “jitter”) - adds a small amount of noise to better view overlapping points
geom_bar(aes(…), position = “dodge”) - arranges bar elements side-by-side
geom_bar(aes(…), position = “fill”) - arranges bar elements on top of each other, normalizing height

Coordinate Systems

ggplot(mpg) +
  geom_point(mapping = aes(x = displ, y = hwy)) + 
  coord_flip()  

Cartesian coordinates are the go-to most visualizations, but the coord_ function does more than call different coordinate systems. Often, it’s useful to fix the x-y ratio, or flip the axis of bar charts. When you’re working with exponential data, you may want to transform an axis to a logarithmic system. These useful functions and many more can be done by the coord_ function.

coord_fixed(ratio, xlim, ylim) - fixing the aspect ratio between x and y
coord_flip(xlim, wlim) - flipping the axes
coord_polar(theta, start, direction) - converting Cartesian to polar coordinates
coord_trans(xtrans, ytrans, limx, limy) - transform Cartesian coordinates, use with log functions
coord_map(projection, orientation, xlim, ylim) - mapproj package projections

ggplot(data = mpg) + 
  geom_point(mapping = aes(x = displ, y = hwy)) + 
  facet_wrap(~ class, nrow = 2)

I don’t use facets that often, but if you’re a fan multiple miniature plots, you might find these faceting options useful.

facet_grid(x ~ y, labeller, scales) - facets the display into rows and columns based on the two given variables (“.”” placeholder)
facet_wrap(~ x, labeller, scales, nrow, ncol) - wraps facets into a rectangular layout

Labels

ggplot(mpg) + 
  labs(
    title = "title", 
    subtitle = "subtitle",
    caption = "caption",
    xlab = "x-label", 
    ylab = "y-label"
  )

Labels are helpful for properly communicating your visualization. Including these five labels with your plots can help clarify your topic, variables, and sources.

Annotations

Annotations can be helpful to point out specific properties of a visualization. I think annotating multiple individual observations is a mistake, it clutters the graphic and takes away from the effectiveness of the visualization. Packages like plotly and shiny do a better job of creating interactive graphics if you want individual labels. With that in mind, here are some annotations that are easy to use in ggplot2.

geom_text(aes(label = “text here”), vjust = , hjust = ) - places a summary annotation within the graphic, according to vjust and hjust
geom_hline(yintercept = , size = , color = , linetype) - creates a horizontal reference line through the plot according to yintercept
geom_vline(xintercept = , size =, color =, linetype =) - creates a vertical reference line through the plot according to xintercept
geom_rect(aes(xmin = , xmax = , ymin = , ymax = )) - creates a rectangle to box in and highlight interesting data
geom_segment(aes(x = , y = , xend = , yend = ,) arrow = ) - creates a line segment within the plot, it can be an arrow

Scales

Scales are automatically set when you create a ggplot, but sometimes it’s useful to alter the color scheme, legend, or axes. The naming scheme for these functions is scale_ followed by the name of the aesthetic then another _, and then the name of the scale.

Aestetic Names

_x_
_y_
_alpha_
_color_
_fill_
_linetype_
_shape_
_size_

Scale Names

_discrete - maps discrete variables to visual values
_continous - maps continuous variables to visual values
_identity - uses data values as visual values
_manual - values = c( ), maps discrete values to manually chose visual values
_gradient - creates a two-color gradient, low - high
_gradient2 - creates a diverging two color gradients, positive - negative
_gray -creates a gradient in gray
_brewer - brewer( palette = “ “), calls RColorBrewer palettes

Axes

Scales can also be used to alter axes in ggplot2, these are some commented examples for fixing common behavioral issues you may have with your axes.

scale_x_continuous(breaks = seq(0, 10, by = 1), labels = c(1:9, "ten")) 
# define breaks by sequence, define labels of breaks

scale_x_discrete(breaks=c("ctrl", "trt1", "trt2"),            # defined tick marks
                 labels=c("Control", "Treat 1", "Treat 2"))   # new names
                 
expand_limits(x = c(0,8), y = c(0,8))                         # expand the limits of the graph to visualize specific values

scale_y_reverse()     # Reverse y-axis direction (zero on top)

element_blank()       # Hiding axis labels 

theme(axis.title.x = element_text(face="bold", color="red", size=20),  # change axis labels, font and style
      axis.text.x = element_text(angle=90, vjust=0.5, size=16))        # change tick mark label, font, style

theme(panel.grid.major=element_blank(),  # hiding major gridlines
      panel.grid.minor=element_blank())  # hiding minor gridlines

library(scales)   # changing the y-axis to log 10
scale_y_log10(breaks = trans_breaks("log10", function(x) 10^x),
              labels = trans_format("log10", math_format(10^.x)))

Legends

Legends can be a bit harder to wrangle than axes. Here are examples to several common problems I have with legends.

theme(legend.position = "left")    # legend to the left of plot
theme(legend.position = "top")     # legend above plot
theme(legend.position = "bottom")  # legend below plot
theme(legend.position = "right")   # the default
theme(legend.position = "none")    # no legend will be generated

geom_point(aes(...), show.legend=FALSE) # don't include this geom in the legend    

theme(legend.title = element_blank()) # hides the legends title

# changing the order of items in your legend, scale_ function is dependent upon the type of plot
scale_fill_discrete(breaks=(c("item1", "item2", "item3")))

# simply reverse the current item display in the legend
scale_fill_discrete(guide = guide_legend(reverse=TRUE))

# manually alter color, break, label, and name of legend items.
scale_fill_manual(values=c("#999999", "#E69F00", "#56B4E9"), 
                  name="Experimental\nCondition",
                  breaks=c("ctrl", "trt1", "trt2"),
                  labels=c("Control", "Treatment 1", "Treatment 2"))

Colors

Some of my favorite preset colors include

thistle
slateblue2
orchid
midnightblue
deepskyblue
lightslateblue

The RColorBrewer package also has some great color palettes.

RColorBrewer::display.brewer.all()

The viridisLite Package has contains 5 color scales that are color-blind and gray-scale friendly.

Themes

Sometimes you don’t want to go through the effort of creating your own theme, so ggplot2 has some built in. Here are four basic themes that are excellent for exploratory analysis or creating a solid foundation to build off.

theme_bw() - A while background with gridlines
theme_grey() - the default theme
theme_classic() - A white background with no gridlines
theme_minimal() - mystery theme, check it out!

Final Thoughts

This is just a small taste of what ggplot2 can do, but with this information you should be able to solve 80% of your graphing needs. Again, the references listed at the beginning of this write up are excellent and worth checking out. If you want to learn more about R for Data Science written by Hadley Wickham, it’s free to read at http://r4ds.had.co.nz/. If you don’t want to read through the entire book, check out the other posts in my R for Data Science Summary Series.

For more useful resources, including advanced ggplot2 techniques, check out these links:

Thanks for reading,
- Fisher

R R4DS tidyverse tutorial data visualization ggplot2

Unpacking the Tidyverse - ggplot2

Introduction

Template

Data

Geometries

Geometries

Mappings

Aestetics

Statistical Transformation

Position Adjustment

Coordinate Systems

Facets

Labels

Annotations

Scales

Aestetic Names

Scale Names

Axes

Legends

Colors

Themes

Final Thoughts

Comments

Introduction

Template

Data

Geometries

Geometries

Mappings

Aestetics

Statistical Transformation

Position Adjustment

Coordinate Systems

Facets

Labels

Annotations

Scales

Aestetic Names

Scale Names

Axes

Legends

Colors

Themes

Final Thoughts

Comments

Related Posts

The Stringr Package 16 Jun 2019

The Lubridate Package 16 Jun 2019

The Forcats Package 16 Jun 2019