My Kaggle Hidden Gems Notebooks competition officially concluded on May 17th. A number of great notebooks were being submitted, and I wanted to design a nice leaderboard to announce the final scores and winners. When I discovered the new gtExtras package, I instantly knew that it would be the perfect tool for this task. Let me show you how.
The goal of the Hidden Gems competition was to build insightful and engaging notebooks, in R or Python, exploring aspect of my Hidden Gems dataset on Kaggle, which chronicles 100 episodes of me highlighting underrated notebooks published by Kagglers. The competition ran from April 5th to May 17th, and the contributed notebooks were evaluated by a stellar panel of Kaggle community judges on five key criteria that I have outlined in a previous post.
The gtExtras
package was written by Tom Mock, and is a high-level wrapper around the fantastic gt package to create beautiful tables. I have used gt
extensively since I’ve discovered it about a year ago. It is now my go-to package for presenting data in tables. Alongside gt
and gtExtras
we will need only a few basic tidyverse libraries:
libs <- c('dplyr', 'tibble', # wrangling
'tidyr', 'stringr', # wrangling
'readr', # input
'gt', 'gtExtras') # tables
invisible(lapply(libs, library, character.only = TRUE))
The data for this leaderboard was aggregated from the scores of the individual judges. Those detailed scores will remain confidential, but in their aggregated form they constitute the public leaderboard. Here I’m reading in the aggregated table:
scores_total <- read_csv("../../static/files/gems_comp_scores_total.csv", col_types = cols())
A small set of helper variables defines some Kaggle-specific colours. If you’re active on the platform then those should look familiar to you. I like to use those colours for styling in Kaggle-related analyses; such as my own Hidden Gems EDA notebook:
kaggle_blue <- "#008abc"
kaggle_gold <- "#fed348"
kaggle_silver <- "#e9e9e9"
kaggle_bronze <- "#f0ba7c"
This is the setup for now. I will explain another preparation aspect along the way. For now, let’s start at the end. This is the full code to produce our final table:
avatars <- tibble(
avatar = fs::dir_ls("../../static/pics/gems_comp_img/", glob = "*.*")
) %>%
separate(avatar, sep = "/", into = c("a", "b", "c", "d", "e", "user"), remove = FALSE) %>%
mutate(user = str_remove(user, ".jpg")) %>%
mutate(user = str_remove(user, ".png")) %>%
select(-a, -b, -c, -d, -e)
scores_total %>%
rownames_to_column(var = "rank") %>%
mutate(perc = total/(7*5*5) * 100) %>%
left_join(avatars, by = "user") %>%
select(rank, avatar, author, total, perc) %>%
gt() %>%
gt_img_rows(avatar, img_source = "local") %>%
gt_plt_bar_pct(column = perc, scaled = TRUE, fill = kaggle_blue) %>%
gt_highlight_rows(rows = 1, font_weight = "normal", fill = kaggle_gold) %>%
gt_highlight_rows(rows = 2, font_weight = "normal", fill = kaggle_silver) %>%
gt_highlight_rows(rows = 3, font_weight = "normal", fill = kaggle_bronze) %>%
tab_header(
title = md("**Kaggle Hidden Gems Competition - Final Results**"),
subtitle = html(str_c("<em>A 2022 Notebooks Competition scored by Community Judges </em>"))
) %>%
tab_style(
style = cell_text(size = px(18), weight = "bold", style = "italic"),
locations = cells_body(
columns = c(author)
)
) %>%
tab_style(
style = cell_text(size = px(18), weight = "bold"),
locations = cells_body(
columns = c(total, rank)
)
) %>%
cols_label(
rank = "Rank",
avatar = "",
author = "Notebook Author",
total = "Final Score",
perc = "(vs max possible score)"
) %>%
opt_row_striping()
Kaggle Hidden Gems Competition - Final Results | ||||
---|---|---|---|---|
A 2022 Notebooks Competition scored by Community Judges | ||||
Rank | Notebook Author | Final Score | (vs max possible score) | |
1 | Leonie | 153.0 | ||
2 | Michal Bogacz | 149.5 | ||
3 | Bukun | 139.5 | ||
4 | Schubert de Abreu | 132.0 | ||
5 | Abir ELTAIEF | 102.0 | ||
6 | Karthik | 81.0 | ||
7 | Tanya Dayanand | 77.5 | ||
8 | GSD | 64.0 | ||
9 | sravya yellapragada | 55.0 |
I hope you like this table as much as I do. gtExtras
helped a lot in building this with relatively few lines of code. I was very impressed. The table itself lists the final rank of the author, alongside their name, profile picture, and score. The score is expressed in absolute points and as a bar chart relative to the maximum possible points. For the top 3 finishers, there is a gold - silver - bronze shading using Kaggle’s colours. Now let’s find out how we get there.
We start with the raw input table in standard gt
formatting:
scores_total %>%
select(author, total) %>%
gt()
author | total |
---|---|
Leonie | 153.0 |
Michal Bogacz | 149.5 |
Bukun | 139.5 |
Schubert de Abreu | 132.0 |
Abir ELTAIEF | 102.0 |
Karthik | 81.0 |
Tanya Dayanand | 77.5 |
GSD | 64.0 |
sravya yellapragada | 55.0 |
Arguably, this is a clean but rather plain table. Scroll up to see the contrast to the final table. In our next step, we’ll add some basic gt
styling in the shape of title & subtitle, column names, and row striping. We also add the final rank, which is already implicit in the table, as a new rank
column:
scores_total %>%
select(author, total) %>%
rownames_to_column(var = "rank") %>%
gt() %>%
tab_header(
title = md("**Kaggle Hidden Gems Competition - Final Results**"),
subtitle = html(str_c("<em>A Notebooks Competition scored by Community Judges </em>"))
) %>%
cols_label(
rank = "Rank",
author = "Notebook Author",
total = "Final Score"
) %>%
opt_row_striping()
Kaggle Hidden Gems Competition - Final Results | ||
---|---|---|
A Notebooks Competition scored by Community Judges | ||
Rank | Notebook Author | Final Score |
1 | Leonie | 153.0 |
2 | Michal Bogacz | 149.5 |
3 | Bukun | 139.5 |
4 | Schubert de Abreu | 132.0 |
5 | Abir ELTAIEF | 102.0 |
6 | Karthik | 81.0 |
7 | Tanya Dayanand | 77.5 |
8 | GSD | 64.0 |
9 | sravya yellapragada | 55.0 |
So far, so good. This is normally the extent to which I’m formatting tables that I’m using in basic EDA projects. But this is a special occasion, so let’s go the extra mile here. First: colour. The gold-silver-bronze shades are added through the gt_highlight_rows function. You can see that the parameters are pretty straightforward here: we specify the row(s), the fill colour, and also have control over the font:
scores_total %>%
select(author, total) %>%
rownames_to_column(var = "rank") %>%
gt() %>%
gt_highlight_rows(rows = 1, font_weight = "normal", fill = kaggle_gold) %>%
gt_highlight_rows(rows = 2, font_weight = "normal", fill = kaggle_silver) %>%
gt_highlight_rows(rows = 3, font_weight = "normal", fill = kaggle_bronze) %>%
tab_header(
title = md("**Kaggle Hidden Gems Competition - Final Results**"),
subtitle = html(str_c("<em>A Notebooks Competition scored by Community Judges </em>"))
) %>%
cols_label(
rank = "Rank",
author = "Notebook Author",
total = "Final Score"
) %>%
opt_row_striping()
Kaggle Hidden Gems Competition - Final Results | ||
---|---|---|
A Notebooks Competition scored by Community Judges | ||
Rank | Notebook Author | Final Score |
1 | Leonie | 153.0 |
2 | Michal Bogacz | 149.5 |
3 | Bukun | 139.5 |
4 | Schubert de Abreu | 132.0 |
5 | Abir ELTAIEF | 102.0 |
6 | Karthik | 81.0 |
7 | Tanya Dayanand | 77.5 |
8 | GSD | 64.0 |
9 | sravya yellapragada | 55.0 |
Next, we want to add the percentage of the score vs maximum possible score as a horizontal bar. This creates a nice visual contrast to the text and numbers in the table. The scoring system had a maximum of 5 points in 5 categories from 7 judges, for a total of 175 points. First, we compute the percentage with mutate
, as usual. The we use gt_plt_bar_pct to the percentage as a bar visual:
scores_total %>%
select(author, total) %>%
rownames_to_column(var = "rank") %>%
mutate(perc = total/(7*5*5) * 100) %>%
gt() %>%
gt_plt_bar_pct(column = perc, scaled = TRUE, fill = kaggle_blue) %>%
gt_highlight_rows(rows = 1, font_weight = "normal", fill = kaggle_gold) %>%
gt_highlight_rows(rows = 2, font_weight = "normal", fill = kaggle_silver) %>%
gt_highlight_rows(rows = 3, font_weight = "normal", fill = kaggle_bronze) %>%
tab_header(
title = md("**Kaggle Hidden Gems Competition - Final Results**"),
subtitle = html(str_c("<em>A Notebooks Competition scored by Community Judges </em>"))
) %>%
cols_label(
rank = "Rank",
author = "Notebook Author",
total = "Final Score",
perc = "(vs max possible score)"
) %>%
opt_row_striping()
Kaggle Hidden Gems Competition - Final Results | |||
---|---|---|---|
A Notebooks Competition scored by Community Judges | |||
Rank | Notebook Author | Final Score | (vs max possible score) |
1 | Leonie | 153.0 | |
2 | Michal Bogacz | 149.5 | |
3 | Bukun | 139.5 | |
4 | Schubert de Abreu | 132.0 | |
5 | Abir ELTAIEF | 102.0 | |
6 | Karthik | 81.0 | |
7 | Tanya Dayanand | 77.5 | |
8 | GSD | 64.0 | |
9 | sravya yellapragada | 55.0 |
Ok, we’re getting closer to our final goal. The last major element will be the profile pictures. I was really excited when I saw examples of gtExtras adding images to tables. On Kaggle, the picture that you set for your profile appears as your avatar pretty much everywhere you create or comment on something. Many prolific Kagglers are easily identified by their avatars. Thus, I wanted to add those avatars to the table not just as a different visual element, but also to strengthen the connection to the individual Kagglers.
You can go to any Kaggler’s profile to see the full-size version of their avatar picture. Here is my profile. The way that those images are stored on Kaggle’s side makes it a bit difficult to access them through a script (but we might look into that in a future post). For now, I’m downloading the pictures for our nine participants manually and store them in a folder called gems_comp_img
. These are the contents of that folder:
fs::dir_ls("../../static/pics/gems_comp_img/", glob = "*.*")
## ../../static/pics/gems_comp_img/abireltaief.jpg
## ../../static/pics/gems_comp_img/ambarish.jpg
## ../../static/pics/gems_comp_img/gsdeepakkumar.jpg
## ../../static/pics/gems_comp_img/iamleonie.jpg
## ../../static/pics/gems_comp_img/karthikrg.png
## ../../static/pics/gems_comp_img/michau96.jpg
## ../../static/pics/gems_comp_img/spitfire2nd.jpg
## ../../static/pics/gems_comp_img/sravyaysk.png
## ../../static/pics/gems_comp_img/tanyadayanand.jpg
Now, I made things a little easier by naming the pictures after the Kaggle user names. This is the user
column in the input table. This will allow us to join this list of images to our scores table. First, we create a helper table called avatars
, where store the link to image and also parse out the user name using the tidyr
tool separate
:
avatars <- tibble(
avatar = fs::dir_ls("../../static/pics/gems_comp_img/", glob = "*.*")
) %>%
separate(avatar, sep = "/", into = c("a", "b", "c", "d", "e", "user"), remove = FALSE) %>%
mutate(user = str_remove(user, ".jpg")) %>%
mutate(user = str_remove(user, ".png")) %>%
select(-a, -b, -c, -d, -e)
avatars %>%
gt() %>%
opt_row_striping()
avatar | user |
---|---|
../../static/pics/gems_comp_img/abireltaief.jpg | abireltaief |
../../static/pics/gems_comp_img/ambarish.jpg | ambarish |
../../static/pics/gems_comp_img/gsdeepakkumar.jpg | gsdeepakkumar |
../../static/pics/gems_comp_img/iamleonie.jpg | iamleonie |
../../static/pics/gems_comp_img/karthikrg.png | karthikrg |
../../static/pics/gems_comp_img/michau96.jpg | michau96 |
../../static/pics/gems_comp_img/spitfire2nd.jpg | spitfire2nd |
../../static/pics/gems_comp_img/sravyaysk.png | sravyaysk |
../../static/pics/gems_comp_img/tanyadayanand.jpg | tanyadayanand |
Now we can join those tables and add the images to our leaderboard! All the complexity of displaying an image in a table column is handled by gt_img_rows. Just marvellous! Only 2 additional lines to join and to format the column. A third, minor line removes the column header for those pictures
scores_total %>%
rownames_to_column(var = "rank") %>%
mutate(perc = total/(7*5*5) * 100) %>%
left_join(avatars, by = "user") %>%
select(rank, avatar, author, total, perc) %>%
gt() %>%
gt_img_rows(avatar, img_source = "local") %>%
gt_plt_bar_pct(column = perc, scaled = TRUE, fill = kaggle_blue) %>%
gt_highlight_rows(rows = 1, font_weight = "normal", fill = kaggle_gold) %>%
gt_highlight_rows(rows = 2, font_weight = "normal", fill = kaggle_silver) %>%
gt_highlight_rows(rows = 3, font_weight = "normal", fill = kaggle_bronze) %>%
tab_header(
title = md("**Kaggle Hidden Gems Competition - Final Results**"),
subtitle = html(str_c("<em>A 2022 Notebooks Competition scored by Community Judges </em>"))
) %>%
cols_label(
rank = "Rank",
avatar = "",
author = "Notebook Author",
total = "Final Score",
perc = "(vs max possible score)"
) %>%
opt_row_striping()
Kaggle Hidden Gems Competition - Final Results | ||||
---|---|---|---|---|
A 2022 Notebooks Competition scored by Community Judges | ||||
Rank | Notebook Author | Final Score | (vs max possible score) | |
1 | Leonie | 153.0 | ||
2 | Michal Bogacz | 149.5 | ||
3 | Bukun | 139.5 | ||
4 | Schubert de Abreu | 132.0 | ||
5 | Abir ELTAIEF | 102.0 | ||
6 | Karthik | 81.0 | ||
7 | Tanya Dayanand | 77.5 | ||
8 | GSD | 64.0 | ||
9 | sravya yellapragada | 55.0 |
Brief note of caution: if you’re using fmt_markdown(columns = TRUE)
for styling your table columns, then this seems to clash with gt_img_rows
. Which probably makes sense, since an image link can be affected by markdown formatting. But it took me a while to figure out why my images wouldn’t work, so I’m adding this warning here to save you some time if you have similar issues. And maybe there’s a way for fmt_markdown
and gt_img_rows
to coexist; let me know if you find out how.
At this point, our leaderboard is practically finished. All the information and visual elements are there. But for the final touch, I will add a bit of text formatting to ensure that those author names and scores don’t blend too much into the background next to our avatars and score bars. This is the final polishing step that gets us to our table at the beginning.
I’m using tab_style
, a gt
command, to increase the size of the font and make it bold and, in case of the author names, also italic:
scores_total %>%
rownames_to_column(var = "rank") %>%
mutate(perc = total/(7*5*5) * 100) %>%
left_join(avatars, by = "user") %>%
select(rank, avatar, author, total, perc) %>%
gt() %>%
gt_img_rows(avatar, img_source = "local") %>%
gt_plt_bar_pct(column = perc, scaled = TRUE, fill = kaggle_blue) %>%
gt_highlight_rows(rows = 1, font_weight = "normal", fill = kaggle_gold) %>%
gt_highlight_rows(rows = 2, font_weight = "normal", fill = kaggle_silver) %>%
gt_highlight_rows(rows = 3, font_weight = "normal", fill = kaggle_bronze) %>%
tab_header(
title = md("**Kaggle Hidden Gems Competition - Final Results**"),
subtitle = html(str_c("<em>A 2022 Notebooks Competition scored by Community Judges </em>"))
) %>%
tab_style(
style = cell_text(size = px(18), weight = "bold", style = "italic"),
locations = cells_body(
columns = c(author)
)
) %>%
tab_style(
style = cell_text(size = px(18), weight = "bold"),
locations = cells_body(
columns = c(total, rank)
)
) %>%
cols_label(
rank = "Rank",
avatar = "",
author = "Notebook Author",
total = "Final Score",
perc = "(vs max possible score)"
) %>%
opt_row_striping()
Kaggle Hidden Gems Competition - Final Results | ||||
---|---|---|---|---|
A 2022 Notebooks Competition scored by Community Judges | ||||
Rank | Notebook Author | Final Score | (vs max possible score) | |
1 | Leonie | 153.0 | ||
2 | Michal Bogacz | 149.5 | ||
3 | Bukun | 139.5 | ||
4 | Schubert de Abreu | 132.0 | ||
5 | Abir ELTAIEF | 102.0 | ||
6 | Karthik | 81.0 | ||
7 | Tanya Dayanand | 77.5 | ||
8 | GSD | 64.0 | ||
9 | sravya yellapragada | 55.0 |
And that’s it! We went from a plain table to a leaderboard design that I consider worthy of my first Kaggle (Notebooks) competition that I’ve organised. The combination of gt
and gtExtras
is impressively powerful, and I hope that it gives you motivation for your own table styling projects.
If you’re looking for more inspiration on visualisation or presentation ideas, then Kaggle is always a great place to start. You can check out the Hidden Gems competition notebooks, and specifically the winning contributions by Leonie, Michal Bogacz, and Bukun.
Have fun!