40 min read

Building a Kaggle Leaderboard with gtExtras

My Kaggle Hidden Gems Notebooks competition officially concluded on May 17th. A number of great notebooks were being submitted, and I wanted to design a nice leaderboard to announce the final scores and winners. When I discovered the new gtExtras package, I instantly knew that it would be the perfect tool for this task. Let me show you how.

The goal of the Hidden Gems competition was to build insightful and engaging notebooks, in R or Python, exploring aspect of my Hidden Gems dataset on Kaggle, which chronicles 100 episodes of me highlighting underrated notebooks published by Kagglers. The competition ran from April 5th to May 17th, and the contributed notebooks were evaluated by a stellar panel of Kaggle community judges on five key criteria that I have outlined in a previous post.

The gtExtras package was written by Tom Mock, and is a high-level wrapper around the fantastic gt package to create beautiful tables. I have used gt extensively since I’ve discovered it about a year ago. It is now my go-to package for presenting data in tables. Alongside gt and gtExtras we will need only a few basic tidyverse libraries:

libs <- c('dplyr', 'tibble',    # wrangling
          'tidyr', 'stringr',   # wrangling
          'readr',              # input
          'gt', 'gtExtras')     # tables
invisible(lapply(libs, library, character.only = TRUE))

The data for this leaderboard was aggregated from the scores of the individual judges. Those detailed scores will remain confidential, but in their aggregated form they constitute the public leaderboard. Here I’m reading in the aggregated table:

scores_total <- read_csv("../../static/files/gems_comp_scores_total.csv", col_types = cols())

A small set of helper variables defines some Kaggle-specific colours. If you’re active on the platform then those should look familiar to you. I like to use those colours for styling in Kaggle-related analyses; such as my own Hidden Gems EDA notebook:

kaggle_blue <- "#008abc"
kaggle_gold <- "#fed348"
kaggle_silver <- "#e9e9e9"
kaggle_bronze <- "#f0ba7c"

This is the setup for now. I will explain another preparation aspect along the way. For now, let’s start at the end. This is the full code to produce our final table:

avatars <- tibble(
  avatar = fs::dir_ls("../../static/pics/gems_comp_img/", glob = "*.*")
) %>% 
  separate(avatar, sep = "/", into = c("a", "b", "c", "d", "e", "user"), remove = FALSE) %>% 
  mutate(user = str_remove(user, ".jpg")) %>% 
  mutate(user = str_remove(user, ".png")) %>% 
  select(-a, -b, -c, -d, -e)


scores_total %>% 
  rownames_to_column(var = "rank") %>% 
  mutate(perc = total/(7*5*5) * 100) %>% 
  left_join(avatars, by = "user") %>% 
  select(rank, avatar, author, total, perc) %>% 
  gt() %>% 
  gt_img_rows(avatar, img_source = "local") %>% 
  gt_plt_bar_pct(column = perc, scaled = TRUE, fill = kaggle_blue) %>%
  gt_highlight_rows(rows = 1, font_weight = "normal", fill = kaggle_gold) %>% 
  gt_highlight_rows(rows = 2, font_weight = "normal", fill = kaggle_silver) %>% 
  gt_highlight_rows(rows = 3, font_weight = "normal", fill = kaggle_bronze) %>% 
  tab_header(
    title = md("**Kaggle Hidden Gems Competition - Final Results**"),
    subtitle = html(str_c("<em>A 2022 Notebooks Competition scored by Community Judges </em>"))
    ) %>% 
  tab_style(
    style = cell_text(size = px(18), weight = "bold", style = "italic"),
    locations = cells_body(
      columns = c(author)
    )
  ) %>% 
  tab_style(
    style = cell_text(size = px(18), weight = "bold"),
    locations = cells_body(
      columns = c(total, rank)
    )
  ) %>% 
  cols_label(
    rank = "Rank",
    avatar = "",
    author = "Notebook Author",
    total = "Final Score",
    perc = "(vs max possible score)"
  ) %>%
  opt_row_striping()
Kaggle Hidden Gems Competition - Final Results
A 2022 Notebooks Competition scored by Community Judges
Rank Notebook Author Final Score (vs max possible score)
1 Leonie 153.0
2 Michal Bogacz 149.5
3 Bukun 139.5
4 Schubert de Abreu 132.0
5 Abir ELTAIEF 102.0
6 Karthik 81.0
7 Tanya Dayanand 77.5
8 GSD 64.0
9 sravya yellapragada 55.0

I hope you like this table as much as I do. gtExtras helped a lot in building this with relatively few lines of code. I was very impressed. The table itself lists the final rank of the author, alongside their name, profile picture, and score. The score is expressed in absolute points and as a bar chart relative to the maximum possible points. For the top 3 finishers, there is a gold - silver - bronze shading using Kaggle’s colours. Now let’s find out how we get there.

We start with the raw input table in standard gt formatting:

scores_total %>% 
  select(author, total) %>% 
  gt()
author total
Leonie 153.0
Michal Bogacz 149.5
Bukun 139.5
Schubert de Abreu 132.0
Abir ELTAIEF 102.0
Karthik 81.0
Tanya Dayanand 77.5
GSD 64.0
sravya yellapragada 55.0

Arguably, this is a clean but rather plain table. Scroll up to see the contrast to the final table. In our next step, we’ll add some basic gt styling in the shape of title & subtitle, column names, and row striping. We also add the final rank, which is already implicit in the table, as a new rank column:

scores_total %>% 
  select(author, total) %>% 
  rownames_to_column(var = "rank") %>% 
  gt() %>% 
  tab_header(
    title = md("**Kaggle Hidden Gems Competition - Final Results**"),
    subtitle = html(str_c("<em>A Notebooks Competition scored by Community Judges </em>"))
    ) %>% 
  cols_label(
    rank = "Rank",
    author = "Notebook Author",
    total = "Final Score"
  ) %>%
  opt_row_striping()
Kaggle Hidden Gems Competition - Final Results
A Notebooks Competition scored by Community Judges
Rank Notebook Author Final Score
1 Leonie 153.0
2 Michal Bogacz 149.5
3 Bukun 139.5
4 Schubert de Abreu 132.0
5 Abir ELTAIEF 102.0
6 Karthik 81.0
7 Tanya Dayanand 77.5
8 GSD 64.0
9 sravya yellapragada 55.0

So far, so good. This is normally the extent to which I’m formatting tables that I’m using in basic EDA projects. But this is a special occasion, so let’s go the extra mile here. First: colour. The gold-silver-bronze shades are added through the gt_highlight_rows function. You can see that the parameters are pretty straightforward here: we specify the row(s), the fill colour, and also have control over the font:

scores_total %>% 
  select(author, total) %>% 
  rownames_to_column(var = "rank") %>% 
  gt() %>% 
  gt_highlight_rows(rows = 1, font_weight = "normal", fill = kaggle_gold) %>% 
  gt_highlight_rows(rows = 2, font_weight = "normal", fill = kaggle_silver) %>% 
  gt_highlight_rows(rows = 3, font_weight = "normal", fill = kaggle_bronze) %>% 
  tab_header(
    title = md("**Kaggle Hidden Gems Competition - Final Results**"),
    subtitle = html(str_c("<em>A Notebooks Competition scored by Community Judges </em>"))
    ) %>% 
  cols_label(
    rank = "Rank",
    author = "Notebook Author",
    total = "Final Score"
  ) %>%
  opt_row_striping()
Kaggle Hidden Gems Competition - Final Results
A Notebooks Competition scored by Community Judges
Rank Notebook Author Final Score
1 Leonie 153.0
2 Michal Bogacz 149.5
3 Bukun 139.5
4 Schubert de Abreu 132.0
5 Abir ELTAIEF 102.0
6 Karthik 81.0
7 Tanya Dayanand 77.5
8 GSD 64.0
9 sravya yellapragada 55.0

Next, we want to add the percentage of the score vs maximum possible score as a horizontal bar. This creates a nice visual contrast to the text and numbers in the table. The scoring system had a maximum of 5 points in 5 categories from 7 judges, for a total of 175 points. First, we compute the percentage with mutate, as usual. The we use gt_plt_bar_pct to the percentage as a bar visual:

scores_total %>% 
  select(author, total) %>% 
  rownames_to_column(var = "rank") %>% 
  mutate(perc = total/(7*5*5) * 100) %>% 
  gt() %>% 
  gt_plt_bar_pct(column = perc, scaled = TRUE, fill = kaggle_blue) %>%
  gt_highlight_rows(rows = 1, font_weight = "normal", fill = kaggle_gold) %>% 
  gt_highlight_rows(rows = 2, font_weight = "normal", fill = kaggle_silver) %>% 
  gt_highlight_rows(rows = 3, font_weight = "normal", fill = kaggle_bronze) %>% 
  tab_header(
    title = md("**Kaggle Hidden Gems Competition - Final Results**"),
    subtitle = html(str_c("<em>A Notebooks Competition scored by Community Judges </em>"))
    ) %>% 
  cols_label(
    rank = "Rank",
    author = "Notebook Author",
    total = "Final Score",
    perc = "(vs max possible score)"
  ) %>%
  opt_row_striping()
Kaggle Hidden Gems Competition - Final Results
A Notebooks Competition scored by Community Judges
Rank Notebook Author Final Score (vs max possible score)
1 Leonie 153.0
2 Michal Bogacz 149.5
3 Bukun 139.5
4 Schubert de Abreu 132.0
5 Abir ELTAIEF 102.0
6 Karthik 81.0
7 Tanya Dayanand 77.5
8 GSD 64.0
9 sravya yellapragada 55.0

Ok, we’re getting closer to our final goal. The last major element will be the profile pictures. I was really excited when I saw examples of gtExtras adding images to tables. On Kaggle, the picture that you set for your profile appears as your avatar pretty much everywhere you create or comment on something. Many prolific Kagglers are easily identified by their avatars. Thus, I wanted to add those avatars to the table not just as a different visual element, but also to strengthen the connection to the individual Kagglers.

You can go to any Kaggler’s profile to see the full-size version of their avatar picture. Here is my profile. The way that those images are stored on Kaggle’s side makes it a bit difficult to access them through a script (but we might look into that in a future post). For now, I’m downloading the pictures for our nine participants manually and store them in a folder called gems_comp_img. These are the contents of that folder:

fs::dir_ls("../../static/pics/gems_comp_img/", glob = "*.*")
## ../../static/pics/gems_comp_img/abireltaief.jpg
## ../../static/pics/gems_comp_img/ambarish.jpg
## ../../static/pics/gems_comp_img/gsdeepakkumar.jpg
## ../../static/pics/gems_comp_img/iamleonie.jpg
## ../../static/pics/gems_comp_img/karthikrg.png
## ../../static/pics/gems_comp_img/michau96.jpg
## ../../static/pics/gems_comp_img/spitfire2nd.jpg
## ../../static/pics/gems_comp_img/sravyaysk.png
## ../../static/pics/gems_comp_img/tanyadayanand.jpg

Now, I made things a little easier by naming the pictures after the Kaggle user names. This is the user column in the input table. This will allow us to join this list of images to our scores table. First, we create a helper table called avatars, where store the link to image and also parse out the user name using the tidyr tool separate:

avatars <- tibble(
  avatar = fs::dir_ls("../../static/pics/gems_comp_img/", glob = "*.*")
) %>% 
  separate(avatar, sep = "/", into = c("a", "b", "c", "d", "e", "user"), remove = FALSE) %>% 
  mutate(user = str_remove(user, ".jpg")) %>% 
  mutate(user = str_remove(user, ".png")) %>% 
  select(-a, -b, -c, -d, -e)

avatars %>% 
  gt() %>%
  opt_row_striping()
avatar user
../../static/pics/gems_comp_img/abireltaief.jpg abireltaief
../../static/pics/gems_comp_img/ambarish.jpg ambarish
../../static/pics/gems_comp_img/gsdeepakkumar.jpg gsdeepakkumar
../../static/pics/gems_comp_img/iamleonie.jpg iamleonie
../../static/pics/gems_comp_img/karthikrg.png karthikrg
../../static/pics/gems_comp_img/michau96.jpg michau96
../../static/pics/gems_comp_img/spitfire2nd.jpg spitfire2nd
../../static/pics/gems_comp_img/sravyaysk.png sravyaysk
../../static/pics/gems_comp_img/tanyadayanand.jpg tanyadayanand

Now we can join those tables and add the images to our leaderboard! All the complexity of displaying an image in a table column is handled by gt_img_rows. Just marvellous! Only 2 additional lines to join and to format the column. A third, minor line removes the column header for those pictures

scores_total %>% 
  rownames_to_column(var = "rank") %>% 
  mutate(perc = total/(7*5*5) * 100) %>% 
  left_join(avatars, by = "user") %>% 
  select(rank, avatar, author, total, perc) %>% 
  gt() %>% 
  gt_img_rows(avatar, img_source = "local") %>% 
  gt_plt_bar_pct(column = perc, scaled = TRUE, fill = kaggle_blue) %>%
  gt_highlight_rows(rows = 1, font_weight = "normal", fill = kaggle_gold) %>% 
  gt_highlight_rows(rows = 2, font_weight = "normal", fill = kaggle_silver) %>% 
  gt_highlight_rows(rows = 3, font_weight = "normal", fill = kaggle_bronze) %>% 
  tab_header(
    title = md("**Kaggle Hidden Gems Competition - Final Results**"),
    subtitle = html(str_c("<em>A 2022 Notebooks Competition scored by Community Judges </em>"))
    ) %>% 
  cols_label(
    rank = "Rank",
    avatar = "",
    author = "Notebook Author",
    total = "Final Score",
    perc = "(vs max possible score)"
  ) %>%
  opt_row_striping()
Kaggle Hidden Gems Competition - Final Results
A 2022 Notebooks Competition scored by Community Judges
Rank Notebook Author Final Score (vs max possible score)
1 Leonie 153.0
2 Michal Bogacz 149.5
3 Bukun 139.5
4