Tidy evaluation in R: Part 2 - Complex use cases (feat. facet zoom)

In an earlier post I gave a gentle introduction to tidy evaluation in the R tidyverse using simple examples. I covered quoting with enquo and unquoting with !! in brief dplyr and ggplot2 snippets. Today, I aim to build a collection of more complex use cases involving additional tools.

These are the libraries we need:

libs <- c('dplyr', 'stringr',             # wrangling
          'knitr','kableExtra',           # table styling
          'ggplot2','ggforce')            # plots
invisible(lapply(libs, library, character.only = TRUE))

This time, the diamonds dataset will be our best friend in exploring the depths of tidy eval. Included in the ggplot2 package, this dataset describes the price of about 54,000 diamonds along with their cut, weight, clarity, size, and other relevant properties. Here are the first 4 rows:

carat  cut      color  clarity  depth  table  price  x     y     z
0.23   Ideal    E      SI2      61.5   55     326    3.95  3.98  2.43
0.21   Premium  E      SI1      59.8   61     326    3.89  3.84  2.31
0.23   Good     E      VS1      56.9   65     327    4.05  4.07  2.31
0.29   Premium  I      VS2      62.4   58     334    4.20  4.23  2.63

Meet enquos and !!!: The equivalent to enquo for quoting more than one variable is called enquos. So far, so plural. The corresponding unquoting method is !!! - the big bang operator (remember that !! is bang-bang). It splices a list of quoted variables into a function call as separate arguments. The tidyverse certainly doesn’t shy away from cosmological superlatives. (The tidyeval cheat sheet calls it bang-bang-bang, which makes more intuitive sense but is less poetic; as a trained astronomer my choice is clear.) Here we see both operators in action:

group_mean <- function(df, g, x, y){
  
  group_cols <- enquos(x, y)
  mean_col <- enquo(g)
  df %>% 
    group_by(!!! group_cols) %>% 
    summarise(mean = mean(!! mean_col))
}

group_mean(diamonds, price, cut, color) %>% 
  head(5)
## # A tibble: 5 x 3
## # Groups:   cut [1]
##   cut   color  mean
##   <ord> <ord> <dbl>
## 1 Fair  D     4291.
## 2 Fair  E     3682.
## 3 Fair  F     3827.
## 4 Fair  G     4239.
## 5 Fair  H     5136.
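
To make the splicing step concrete, here is a minimal sketch on a tiny made-up data frame. The toy data and its column names are hypothetical and not part of the original analysis, and the .groups argument assumes dplyr 1.0.0 or later. The idea is that group_by(!!! group_cols) should behave like typing the grouping columns out by hand:

```r
library(dplyr)

# Hypothetical toy data, not part of the original post
toy <- tibble(
  a   = c("u", "u", "v", "v"),
  b   = c("p", "q", "p", "p"),
  val = c(1, 2, 3, 5)
)

group_mean <- function(df, g, x, y) {
  group_cols <- enquos(x, y)   # a list of two quoted variables
  mean_col   <- enquo(g)
  df %>%
    group_by(!!!group_cols) %>%   # spliced in as group_by(a, b)
    summarise(mean = mean(!!mean_col), .groups = "drop")
}

spliced <- group_mean(toy, val, a, b)

# The same computation with the column names written out by hand
by_hand <- toy %>%
  group_by(a, b) %>%
  summarise(mean = mean(val), .groups = "drop")
```

Both spliced and by_hand contain three rows, one per combination of a and b that occurs in the toy data.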

Alternative: use ... aka dots: Note that if all you need to do is group together a bunch of variables (or treat them as one group in any other way), then R offers the nifty ... construct. You might have seen this style in function definitions or help pages already. With the dots you can capture everything that is not explicitly named and pass it on as one entity. This simplifies our above function in the following way:

group_mean <- function(df, g, ...){
  
  mean_col <- enquo(g)
  df %>% 
    group_by(...) %>% 
    summarise(mean = mean(!! mean_col))
}

group_mean(diamonds, price, cut, color) %>% 
  head(5)
## # A tibble: 5 x 3
## # Groups:   cut [1]
##   cut   color  mean
##   <ord> <ord> <dbl>
## 1 Fair  D     4291.
## 2 Fair  E     3682.
## 3 Fair  F     3827.
## 4 Fair  G     4239.
## 5 Fair  H     5136.
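
A side effect of the dots is that the number of grouping variables becomes flexible, including none at all. The following sketch illustrates this on a hypothetical toy data frame (not from the original post; .groups assumes dplyr 1.0.0 or later):

```r
library(dplyr)

# Hypothetical toy data, not part of the original post
toy <- tibble(
  a   = c("u", "u", "v", "v"),
  b   = c("p", "q", "p", "p"),
  val = c(1, 2, 3, 5)
)

group_mean <- function(df, g, ...) {
  mean_col <- enquo(g)
  df %>%
    group_by(...) %>%   # the dots are forwarded unchanged
    summarise(mean = mean(!!mean_col), .groups = "drop")
}

overall <- group_mean(toy, val)         # no grouping variable
by_a    <- group_mean(toy, val, a)      # one grouping variable
by_ab   <- group_mean(toy, val, a, b)   # two grouping variables
```

With no grouping variable the result is a single row holding the overall mean; each additional variable refines the grouping.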

It’s important to note that !!! currently doesn’t work in ggplot(aes()). There is a workaround and hopefully soon a fix that I will cover in a future post.

The := operator: To give a variable a quoted name you need the := operator, because R does not allow !! on the left-hand side of a regular =. Think of it as a maths-style definition if that helps you to remember the syntax. Here’s how it works, giving our mean price variable a custom name:

group_mean <- function(df, g, n, ...){
  
  mean_col <- enquo(g)
  new_name <- enquo(n)
  
  df %>% 
    group_by(...) %>% 
    summarise(!! new_name := mean(!! mean_col))
}

group_mean(diamonds, price, mean_price, cut, color) %>% 
  head(5)
## # A tibble: 5 x 3
## # Groups:   cut [1]
##   cut   color mean_price
##   <ord> <ord>      <dbl>
## 1 Fair  D          4291.
## 2 Fair  E          3682.
## 3 Fair  F          3827.
## 4 Fair  G          4239.
## 5 Fair  H          5136.

This operator becomes more useful in complex functions or when you are writing your own packages.
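
As one possible illustration of that, the new column name can also be a string that is built inside the function. The sketch below is an assumption-laden variant, not the function from above: mean_by and the toy data are hypothetical names, rlang::as_label is used to turn the quoted variable into a string, and .groups assumes dplyr 1.0.0 or later.

```r
library(dplyr)

# Hypothetical toy data, not part of the original post
toy <- tibble(
  a   = c("u", "u", "v", "v"),
  val = c(1, 2, 3, 5)
)

# Hypothetical helper: derives the output name from the input variable
mean_by <- function(df, g, ...) {
  mean_col <- enquo(g)
  new_name <- paste0("mean_", rlang::as_label(mean_col))   # e.g. "mean_val"
  df %>%
    group_by(...) %>%
    summarise(!!new_name := mean(!!mean_col), .groups = "drop")
}

res <- mean_by(toy, val, a)
```

Here the caller no longer supplies the name; the result has the columns a and mean_val.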

Encoding strings with ensym: In some scenarios you want to quote your input not as an expression but a symbol. In the context of helper functions this will often involve strings - and a common use case is ggplot2 wrappers. The strings can then be further manipulated for instance with the tidy stringr package.
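
Before the full example, a minimal sketch of the idea in isolation. The helper name label_of is hypothetical; it assumes that ensym is available via dplyr and that stringr 1.4.0 or later provides str_to_sentence. The explicit as.character() step is my addition to make the symbol-to-string conversion visible:

```r
library(dplyr)
library(stringr)

# Hypothetical helper: turn a bare column name (or a string) into a label
label_of <- function(x) {
  ensym(x) %>%          # capture the argument as a symbol
    as.character() %>%  # symbol -> string
    str_to_sentence()   # capitalise the first letter
}

label_of(carat)     # "Carat"
label_of("price")   # "Price"
```

Unlike enquo, ensym only accepts a single name or string, so a call such as label_of(carat + 1) fails with an error.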

In this final example of the post I will showcase the use of ensym alongside the other main tidyeval operators. The function will be a ggplot2 convenience wrapper that builds a scatter plot of two numerical features colour-coded by a categorical variable. Custom axis labels and a plot title will be added. For a little extra flourish, I will add a zoom view on one particular category using the powerful facet_zoom function from the ggforce package. Here’s what it looks like:

plot_xy <- function(df, x, y, col, var_zoom, ...){
  
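  # Quote the plotting variables and any additional grouping variables
  x <- enquo(x)
  y <- enquo(y)
  col <- enquo(col)
  group_vars <- enquos(...)
  
  # Capture the argument names as symbols and capitalise them for labels
  dfname <- ensym(df) %>% str_to_sentence()
  xname <- ensym(x) %>% str_to_sentence()
  yname <- ensym(y) %>% str_to_sentence()
  colname <- ensym(col) %>% str_to_sentence()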
  
  df %>% 
    mutate(!! col := as.factor(!! col)) %>% 
    group_by(!! col, !!! group_vars) %>% 
    summarise(mean_x = mean(!!x),
              mean_y = mean(!!y)) %>% 
    ungroup() %>% 
    ggplot(aes(mean_x, mean_y, col = !!col)) +
    geom_point() +
    scale_color_brewer(type = "qual", palette = "Set1") +
    labs(x = xname, y = yname, col = colname) +
    ggtitle(str_c(dfname, " dataset: ",
                  xname, " vs ", yname,
                  " with colour coding by ", colname),
            subtitle = str_c("Zoom view to emphasise ",
                             colname, " = ", var_zoom)) +
    facet_zoom(x = (!! col == var_zoom))
}

plot_xy(diamonds, carat, price, clarity, "IF", color, cut)

Let’s break it down:

  • The x and y features are encoded using enquo and !!, as covered in the previous post. Those variables will form our scatter plot. But now, they are also encoded using ensym as xname and yname. Those are symbols that we can now use in string functions to build custom plot titles and labels.

  • The col feature is also encoded both as a quote and a symbol. This needs to be a categorical feature that we will use to colour-code the data points. The legend is the default style and position. Note that we use := to preserve the column name when converting this feature to a factor.

  • The str_to_sentence tool, from the stringr package, capitalises the first letter of each input string.

  • Additional grouping variables are encoded using enquos and spliced into the group_by call via !!!. By using the dots ... in the function definition we give ourselves the option to use an arbitrary number of grouping features in this function.

  • The function groups the data by the grouping variables (here: Color and Cut) plus the colour-coding feature (here: Clarity). Then it computes the group mean for the x and y features (here: Carat and Price). It plots these group means in a colour-coded scatter plot.

  • Finally, it zooms into one particular category of the colour-coding (here: Clarity = “IF”) and provides a magnified view. This zoom view is shown in the lower panel. The upper panel shows the entire data set. Note that this upper panel has a darker background (and a connecting region) to indicate where the zoom view is located in the overall picture.
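
The grouping and averaging steps described in this list can be checked separately from the plotting. The sketch below isolates the data-preparation half of the function under a hypothetical name, summarise_xy, and runs it on made-up toy data (neither is part of the original post; .groups assumes dplyr 1.0.0 or later):

```r
library(dplyr)

# Hypothetical helper: the data-preparation half of plot_xy, without the plot
summarise_xy <- function(df, x, y, col, ...) {
  x <- enquo(x)
  y <- enquo(y)
  col <- enquo(col)
  group_vars <- enquos(...)

  df %>%
    mutate(!!col := as.factor(!!col)) %>%    # keep the column name, change the type
    group_by(!!col, !!!group_vars) %>%       # colour feature plus extra groups
    summarise(mean_x = mean(!!x),
              mean_y = mean(!!y),
              .groups = "drop")
}

# Hypothetical toy data, not part of the original post
toy <- tibble(
  kind = c("a", "a", "b", "b"),
  grp  = c("g1", "g1", "g1", "g2"),
  w    = c(1, 3, 5, 7),
  p    = c(10, 20, 30, 40)
)

res <- summarise_xy(toy, w, p, kind, grp)
```

Each row of res corresponds to one point in the scatter plot that the full function would draw.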

The zoom facet is provided by the ggforce tool facet_zoom, which is very useful for examining specific subsets of the data. Here we only zoom into the x-axis, but it can also provide zooms on the y-axis or on both axes simultaneously.
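
For reference, here is a minimal standalone sketch of facet_zoom outside of any wrapper function. It uses the built-in iris data rather than diamonds, purely as an assumption for brevity; the x, y, and xy arguments each take a logical expression that selects the data to zoom in on.

```r
library(ggplot2)
library(ggforce)

base_plot <- ggplot(iris, aes(Petal.Length, Petal.Width, colour = Species)) +
  geom_point()

# Zoom along the x-axis only, as in the wrapper function
zoom_x <- base_plot + facet_zoom(x = Species == "versicolor")

# Zoom along the y-axis only
zoom_y <- base_plot + facet_zoom(y = Species == "versicolor")

# Zoom along both axes at once
zoom_xy <- base_plot + facet_zoom(xy = Species == "versicolor")
```

Printing any of the three objects draws the full plot together with its zoomed panel.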

More Resources: