In an earlier post I gave a gentle introduction to tidy evaluation in the R tidyverse using simple examples. I covered quoting with `enquo` and unquoting with `!!` in brief `dplyr` and `ggplot2` snippets. Today, I aim to build a collection of more complex use cases involving additional tools.
These are our libraries:

```r
libs <- c('dplyr', 'stringr',        # wrangling
          'knitr', 'kableExtra',     # table styling
          'ggplot2', 'ggforce')      # plots
invisible(lapply(libs, library, character.only = TRUE))
```
This time, the Diamonds dataset will be our best friend in exploring the depths of tidy eval. Included in the ggplot2 package, this dataset describes the price of 54k diamonds along with their cut, weight, clarity, size, and other relevant properties. Here are the first 4 rows:
carat | cut | color | clarity | depth | table | price | x | y | z |
---|---|---|---|---|---|---|---|---|---|
0.23 | Ideal | E | SI2 | 61.5 | 55 | 326 | 3.95 | 3.98 | 2.43 |
0.21 | Premium | E | SI1 | 59.8 | 61 | 326 | 3.89 | 3.84 | 2.31 |
0.23 | Good | E | VS1 | 56.9 | 65 | 327 | 4.05 | 4.07 | 2.31 |
0.29 | Premium | I | VS2 | 62.4 | 58 | 334 | 4.20 | 4.23 | 2.63 |
Meet `enquos` and `!!!`: The equivalent of `enquo` for quoting more than one variable is called `enquos`. So far, so plural. The corresponding unquoting method is `!!!`, the big bang operator (remember that `!!` is bang-bang). The tidyverse certainly doesn't shy away from cosmological superlatives. (The tidyeval cheat sheet calls it bang-bang-bang, which makes more intuitive sense but is less poetic; as a trained astronomer my choice is clear.) Here we see both operators in action:
```r
group_mean <- function(df, g, x, y){
  group_cols <- enquos(x, y)   # quote both grouping columns
  mean_col <- enquo(g)         # quote the column to average
  df %>%
    group_by(!!! group_cols) %>%
    summarise(mean = mean(!! mean_col))
}
group_mean(diamonds, price, cut, color) %>%
  head(5)
```

```
## # A tibble: 5 x 3
## # Groups: cut [1]
## cut color mean
## <ord> <ord> <dbl>
## 1 Fair D 4291.
## 2 Fair E 3682.
## 3 Fair F 3827.
## 4 Fair G 4239.
## 5 Fair H 5136.
```
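If it helps to see what `!!!` does to a call, here is a minimal sketch that builds a `group_by()` call without evaluating it. It assumes `rlang::expr()` and `rlang::syms()`, which the post itself doesn't use; the two symbols stand in for the columns that `enquos(x, y)` captures above.

```r
library(rlang)

# A list of column symbols, similar in spirit to what enquos(x, y) captures
group_cols <- syms(c("cut", "color"))

# !!! splices each list element into the call as a separate argument
spliced <- expr(group_by(df, !!!group_cols))
print(spliced)
#> group_by(df, cut, color)

# !! instead inserts the whole list as one single argument
unspliced <- expr(group_by(df, !!group_cols))
```

The spliced call has two grouping arguments, while the unspliced one has a single (list) argument, which is why `group_by()` needs `!!!` here.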
Alternative: use `...` (aka dots): Note that if all you need to do is group together a bunch of variables (or treat them as one group in any other way), R offers the nifty `...` argument. You might have seen this style in function definitions or help pages already. With the dots you can capture every argument that is not explicitly named and pass them on as one entity. This simplifies our function above as follows:
```r
group_mean <- function(df, g, ...){
  mean_col <- enquo(g)
  df %>%
    group_by(...) %>%            # the dots are forwarded unchanged
    summarise(mean = mean(!! mean_col))
}
group_mean(diamonds, price, cut, color) %>%
  head(5)
```

```
## # A tibble: 5 x 3
## # Groups: cut [1]
## cut color mean
## <ord> <ord> <dbl>
## 1 Fair D 4291.
## 2 Fair E 3682.
## 3 Fair F 3827.
## 4 Fair G 4239.
## 5 Fair H 5136.
```
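Forwarding the dots is all `group_mean` needs, but sometimes you want to inspect what came through them, for example to build labels. Here is a minimal sketch with a hypothetical helper, `dot_labels()`, that captures the dots with `enquos(...)` and converts each quosure to text with `rlang::as_label()` (assuming a reasonably recent rlang that provides `as_label()`):

```r
library(rlang)

# Hypothetical helper: return the names of all columns passed through the dots
dot_labels <- function(...) {
  captured <- enquos(...)                              # one quosure per extra argument
  unname(vapply(captured, as_label, character(1)))     # quosure -> readable label
}

dot_labels(cut, color)
#> [1] "cut"   "color"
```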
It's important to note that, at the time of writing, `!!!` doesn't work in `ggplot(aes())`. There is a workaround, and hopefully soon a fix, which I will cover in a future post.
The `:=` operator: to give a new variable a name that is stored in a quoted argument, you need the `:=` operator, because R's `=` only accepts a plain name on its left-hand side. Think of it as a maths-style definition if that helps you remember the syntax. Here's how it works, giving our mean price variable a custom name:
```r
group_mean <- function(df, g, n, ...){
  mean_col <- enquo(g)
  new_name <- enquo(n)           # quote the desired column name
  df %>%
    group_by(...) %>%
    summarise(!! new_name := mean(!! mean_col))
}
group_mean(diamonds, price, mean_price, cut, color) %>%
  head(5)
```

```
## # A tibble: 5 x 3
## # Groups: cut [1]
## cut color mean_price
## <ord> <ord> <dbl>
## 1 Fair D 4291.
## 2 Fair E 3682.
## 3 Fair F 3827.
## 4 Fair G 4239.
## 5 Fair H 5136.
```
This operator becomes more useful in complex functions or when you are writing your own packages.
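The example above quotes the new name with `enquo()`. The same `:=` pattern also works when the name arrives as a plain string, which is common when names are built programmatically. A minimal sketch, using a hypothetical `named_mean()` helper and a made-up three-row data frame:

```r
library(dplyr)

# Hypothetical helper: the new column name is passed as an ordinary string
named_mean <- function(df, col, new_name) {
  col <- enquo(col)
  df %>%
    # `new_name = ...` would create a column literally called "new_name";
    # !! on the left of := injects the string's value instead
    summarise(!! new_name := mean(!! col))
}

toy <- data.frame(price = c(326, 334, 327))
res <- named_mean(toy, price, "mean_price")
res
```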
Encoding strings with `ensym`: In some scenarios you want to quote your input not as an expression but as a symbol. In helper functions this often involves strings, and a common use case is `ggplot2` wrappers. The symbols can be converted to strings and then further manipulated, for instance with the tidyverse `stringr` package.
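Here is a minimal sketch of that symbol-to-string route on its own, using a hypothetical `axis_label()` helper. `ensym()` accepts either a bare name or a string, and `as.character()` turns the resulting symbol into a string that `stringr` can work with:

```r
library(rlang)
library(stringr)

# Hypothetical helper: turn a bare column name (or a string) into a display label
axis_label <- function(arg) {
  arg_sym <- ensym(arg)                      # `carat` and "carat" both become a symbol
  str_to_sentence(as.character(arg_sym))     # symbol -> string -> sentence case
}

axis_label(carat)
#> [1] "Carat"
axis_label("price")
#> [1] "Price"
```

Unlike `enquo()`, `ensym()` refuses anything that is not a name or a string, so a call such as `axis_label(carat + 1)` fails with an error.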
In this final example of the post I will showcase the use of `ensym` alongside the other main tidyeval operators. The function is a `ggplot2` convenience wrapper that builds a scatter plot of two numerical features, colour-coded by a categorical variable, with custom axis labels and a plot title. For a little extra flourish, I add a zoom view on one particular category using the powerful `facet_zoom` function from the `ggforce` package. Here's what it looks like:
```r
plot_xy <- function(df, x, y, col, var_zoom, ...){
  # Capture the argument names as symbols first and turn them into
  # sentence-case strings for the title and axis labels
  dfname  <- str_to_sentence(as.character(ensym(df)))
  xname   <- str_to_sentence(as.character(ensym(x)))
  yname   <- str_to_sentence(as.character(ensym(y)))
  colname <- str_to_sentence(as.character(ensym(col)))

  # Quote the same arguments as quosures for use in dplyr and ggplot2
  x <- enquo(x)
  y <- enquo(y)
  col <- enquo(col)
  group_vars <- enquos(...)

  df %>%
    mutate(!! col := as.factor(!! col)) %>%
    group_by(!! col, !!! group_vars) %>%
    summarise(mean_x = mean(!! x),
              mean_y = mean(!! y)) %>%
    ungroup() %>%
    ggplot(aes(mean_x, mean_y, col = !! col)) +
    geom_point() +
    scale_color_brewer(type = "qual", palette = "Set1") +
    labs(x = xname, y = yname, col = colname) +
    ggtitle(str_c(dfname, " dataset: ",
                  xname, " vs ", yname,
                  " with colour coding by ", colname),
            subtitle = str_c("Zoom view to emphasise ",
                             colname, " = ", var_zoom)) +
    facet_zoom(x = (!! col == var_zoom))
}

plot_xy(diamonds, carat, price, clarity, "IF", color, cut)
```
Let's break it down:

- The `x` and `y` features are encoded using `enquo` and `!!`, as covered in the previous post. Those variables form our scatter plot. But now they are also encoded using `ensym` as `xname` and `yname`. These are symbols that we can use in string functions to build custom plot titles and labels.
- The `col` feature is also encoded both as a quosure and as a symbol. This needs to be a categorical feature, which we use to colour-code the data points. The legend uses the default style and position. Note that we use `:=` so that the column keeps its name when we convert this feature to a factor.
- The `str_to_sentence` function from the `stringr` package converts our input strings to sentence case, i.e. it capitalises the first letter.
- Additional grouping variables are encoded using `enquos` and spliced into the `group_by` call via `!!!`. By using the dots `...` in the function definition we give ourselves the option to use an arbitrary number of grouping features.

The function groups the data by the grouping variables (here: Color and Cut) plus the colour-coding feature (here: Clarity). Then it computes the group means of the x and y features (here: Carat and Price) and plots these group means in a colour-coded scatter plot.
Finally, it zooms into one particular category of the colour-coding feature (here: Clarity = "IF") and provides a magnified view. This zoom view is shown in the lower panel, while the upper panel shows the entire data set. Note that the upper panel has a darker background (and a connecting region) to indicate where the zoom view is located in the overall picture.
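To check the data-wrangling half of `plot_xy` without drawing anything, the grouping and summarising steps can be pulled out into their own function. This is a sketch with a hypothetical name, `group_xy_means()`; it mirrors the pipeline inside `plot_xy` but stops before `ggplot()`:

```r
library(dplyr)
library(ggplot2)   # only needed for the diamonds data

# Hypothetical helper: the grouping/summarising part of plot_xy(), without the plot
group_xy_means <- function(df, x, y, col, ...) {
  x <- enquo(x)
  y <- enquo(y)
  col <- enquo(col)
  group_vars <- enquos(...)
  df %>%
    group_by(!! col, !!! group_vars) %>%   # colour feature plus any extra groups
    summarise(mean_x = mean(!! x),
              mean_y = mean(!! y)) %>%
    ungroup()
}

means <- group_xy_means(diamonds, carat, price, clarity, color, cut)
head(means)
```

Each row is one observed combination of clarity, color and cut, i.e. one point in the scatter plot.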
The zoom facet is provided by the `facet_zoom` function from `ggforce`, which is very useful for examining specific data points. Here we only zoom into the x-axis, but it can also zoom into the y-axis or into both axes simultaneously.
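As a sketch of the other zoom modes, the snippet below zooms into both axes at once through the `xy` argument of `facet_zoom()` (the `y` argument works analogously for the y-axis alone). It is applied directly to the raw diamonds rather than to group means, an illustrative choice that is not part of the original function:

```r
library(ggplot2)
library(ggforce)

# Zoom into the "IF" clarity points on both axes at the same time
p <- ggplot(diamonds, aes(carat, price, colour = clarity)) +
  geom_point(alpha = 0.2) +
  facet_zoom(xy = clarity == "IF")

# print(p) draws the full data plus the magnified view;
# with ~54k points this can take a moment
```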
More Resources:

- RStudio's excellent cheat sheets include one on tidyeval.
- The prolific RStudio Community has a tag for tidyeval questions and solutions, among many other interesting topics.