Building a Kaggle Leaderboard with gtExtras - head spin

My Kaggle Hidden Gems Notebooks competition officially concluded on May 17th. A number of great notebooks were being submitted, and I wanted to design a nice leaderboard to announce the final scores and winners. When I discovered the new gtExtras package, I instantly knew that it would be the perfect tool for this task. Let me show you how.

The goal of the Hidden Gems competition was to build insightful and engaging notebooks, in R or Python, exploring aspect of my Hidden Gems dataset on Kaggle, which chronicles 100 episodes of me highlighting underrated notebooks published by Kagglers. The competition ran from April 5th to May 17th, and the contributed notebooks were evaluated by a stellar panel of Kaggle community judges on five key criteria that I have outlined in a previous post.

The gtExtras package was written by Tom Mock, and is a high-level wrapper around the fantastic gt package to create beautiful tables. I have used gt extensively since I’ve discovered it about a year ago. It is now my go-to package for presenting data in tables. Alongside gt and gtExtras we will need only a few basic tidyverse libraries:

libs <- c('dplyr', 'tibble',    # wrangling
          'tidyr', 'stringr',   # wrangling
          'readr',              # input
          'gt', 'gtExtras')     # tables
invisible(lapply(libs, library, character.only = TRUE))

The data for this leaderboard was aggregated from the scores of the individual judges. Those detailed scores will remain confidential, but in their aggregated form they constitute the public leaderboard. Here I’m reading in the aggregated table:

scores_total <- read_csv("../../static/files/gems_comp_scores_total.csv", col_types = cols())

A small set of helper variables defines some Kaggle-specific colours. If you’re active on the platform then those should look familiar to you. I like to use those colours for styling in Kaggle-related analyses; such as my own Hidden Gems EDA notebook:

kaggle_blue <- "#008abc"
kaggle_gold <- "#fed348"
kaggle_silver <- "#e9e9e9"
kaggle_bronze <- "#f0ba7c"

This is the setup for now. I will explain another preparation aspect along the way. For now, let’s start at the end. This is the full code to produce our final table:

avatars <- tibble(
  avatar = fs::dir_ls("../../static/pics/gems_comp_img/", glob = "*.*")
) %>% 
  separate(avatar, sep = "/", into = c("a", "b", "c", "d", "e", "user"), remove = FALSE) %>% 
  mutate(user = str_remove(user, ".jpg")) %>% 
  mutate(user = str_remove(user, ".png")) %>% 
  select(-a, -b, -c, -d, -e)


scores_total %>% 
  rownames_to_column(var = "rank") %>% 
  mutate(perc = total/(7*5*5) * 100) %>% 
  left_join(avatars, by = "user") %>% 
  select(rank, avatar, author, total, perc) %>% 
  gt() %>% 
  gt_img_rows(avatar, img_source = "local") %>% 
  gt_plt_bar_pct(column = perc, scaled = TRUE, fill = kaggle_blue) %>%
  gt_highlight_rows(rows = 1, font_weight = "normal", fill = kaggle_gold) %>% 
  gt_highlight_rows(rows = 2, font_weight = "normal", fill = kaggle_silver) %>% 
  gt_highlight_rows(rows = 3, font_weight = "normal", fill = kaggle_bronze) %>% 
  tab_header(
    title = md("**Kaggle Hidden Gems Competition - Final Results**"),
    subtitle = html(str_c("<em>A 2022 Notebooks Competition scored by Community Judges </em>"))
    ) %>% 
  tab_style(
    style = cell_text(size = px(18), weight = "bold", style = "italic"),
    locations = cells_body(
      columns = c(author)
    )
  ) %>% 
  tab_style(
    style = cell_text(size = px(18), weight = "bold"),
    locations = cells_body(
      columns = c(total, rank)
    )
  ) %>% 
  cols_label(
    rank = "Rank",
    avatar = "",
    author = "Notebook Author",
    total = "Final Score",
    perc = "(vs max possible score)"
  ) %>%
  opt_row_striping()

Kaggle Hidden Gems Competition - Final Results
A 2022 Notebooks Competition scored by Community Judges
Rank	Notebook Author	Final Score	(vs max possible score)
1	Leonie	153.0
2	Michal Bogacz	149.5
3	Bukun	139.5
4	Schubert de Abreu	132.0
5	Abir ELTAIEF	102.0
6	Karthik	81.0
7	Tanya Dayanand	77.5
8	GSD	64.0
9	sravya yellapragada	55.0

I hope you like this table as much as I do. gtExtras helped a lot in building this with relatively few lines of code. I was very impressed. The table itself lists the final rank of the author, alongside their name, profile picture, and score. The score is expressed in absolute points and as a bar chart relative to the maximum possible points. For the top 3 finishers, there is a gold - silver - bronze shading using Kaggle’s colours. Now let’s find out how we get there.

We start with the raw input table in standard gt formatting:

scores_total %>% 
  select(author, total) %>% 
  gt()

author	total
Leonie	153.0
Michal Bogacz	149.5
Bukun	139.5
Schubert de Abreu	132.0
Abir ELTAIEF	102.0
Karthik	81.0
Tanya Dayanand	77.5
GSD	64.0
sravya yellapragada	55.0

Arguably, this is a clean but rather plain table. Scroll up to see the contrast to the final table. In our next step, we’ll add some basic gt styling in the shape of title & subtitle, column names, and row striping. We also add the final rank, which is already implicit in the table, as a new rank column:

scores_total %>% 
  select(author, total) %>% 
  rownames_to_column(var = "rank") %>% 
  gt() %>% 
  tab_header(
    title = md("**Kaggle Hidden Gems Competition - Final Results**"),
    subtitle = html(str_c("<em>A Notebooks Competition scored by Community Judges </em>"))
    ) %>% 
  cols_label(
    rank = "Rank",
    author = "Notebook Author",
    total = "Final Score"
  ) %>%
  opt_row_striping()

Kaggle Hidden Gems Competition - Final Results
A Notebooks Competition scored by Community Judges
Rank	Notebook Author	Final Score
1	Leonie	153.0
2	Michal Bogacz	149.5
3	Bukun	139.5
4	Schubert de Abreu	132.0
5	Abir ELTAIEF	102.0
6	Karthik	81.0
7	Tanya Dayanand	77.5
8	GSD	64.0
9	sravya yellapragada	55.0

So far, so good. This is normally the extent to which I’m formatting tables that I’m using in basic EDA projects. But this is a special occasion, so let’s go the extra mile here. First: colour. The gold-silver-bronze shades are added through the gt_highlight_rows function. You can see that the parameters are pretty straightforward here: we specify the row(s), the fill colour, and also have control over the font:

scores_total %>% 
  select(author, total) %>% 
  rownames_to_column(var = "rank") %>% 
  gt() %>% 
  gt_highlight_rows(rows = 1, font_weight = "normal", fill = kaggle_gold) %>% 
  gt_highlight_rows(rows = 2, font_weight = "normal", fill = kaggle_silver) %>% 
  gt_highlight_rows(rows = 3, font_weight = "normal", fill = kaggle_bronze) %>% 
  tab_header(
    title = md("**Kaggle Hidden Gems Competition - Final Results**"),
    subtitle = html(str_c("<em>A Notebooks Competition scored by Community Judges </em>"))
    ) %>% 
  cols_label(
    rank = "Rank",
    author = "Notebook Author",
    total = "Final Score"
  ) %>%
  opt_row_striping()

Kaggle Hidden Gems Competition - Final Results
A Notebooks Competition scored by Community Judges
Rank	Notebook Author	Final Score
1	Leonie	153.0
2	Michal Bogacz	149.5
3	Bukun	139.5
4	Schubert de Abreu	132.0
5	Abir ELTAIEF	102.0
6	Karthik	81.0
7	Tanya Dayanand	77.5
8	GSD	64.0
9	sravya yellapragada	55.0

Next, we want to add the percentage of the score vs maximum possible score as a horizontal bar. This creates a nice visual contrast to the text and numbers in the table. The scoring system had a maximum of 5 points in 5 categories from 7 judges, for a total of 175 points. First, we compute the percentage with mutate, as usual. The we use gt_plt_bar_pct to the percentage as a bar visual:

scores_total %>% 
  select(author, total) %>% 
  rownames_to_column(var = "rank") %>% 
  mutate(perc = total/(7*5*5) * 100) %>% 
  gt() %>% 
  gt_plt_bar_pct(column = perc, scaled = TRUE, fill = kaggle_blue) %>%
  gt_highlight_rows(rows = 1, font_weight = "normal", fill = kaggle_gold) %>% 
  gt_highlight_rows(rows = 2, font_weight = "normal", fill = kaggle_silver) %>% 
  gt_highlight_rows(rows = 3, font_weight = "normal", fill = kaggle_bronze) %>% 
  tab_header(
    title = md("**Kaggle Hidden Gems Competition - Final Results**"),
    subtitle = html(str_c("<em>A Notebooks Competition scored by Community Judges </em>"))
    ) %>% 
  cols_label(
    rank = "Rank",
    author = "Notebook Author",
    total = "Final Score",
    perc = "(vs max possible score)"
  ) %>%
  opt_row_striping()

Kaggle Hidden Gems Competition - Final Results
A Notebooks Competition scored by Community Judges
Rank	Notebook Author	Final Score	(vs max possible score)
1	Leonie	153.0
2	Michal Bogacz	149.5
3	Bukun	139.5
4	Schubert de Abreu	132.0
5	Abir ELTAIEF	102.0
6	Karthik	81.0
7	Tanya Dayanand	77.5
8	GSD	64.0
9	sravya yellapragada	55.0

Ok, we’re getting closer to our final goal. The last major element will be the profile pictures. I was really excited when I saw examples of gtExtras adding images to tables. On Kaggle, the picture that you set for your profile appears as your avatar pretty much everywhere you create or comment on something. Many prolific Kagglers are easily identified by their avatars. Thus, I wanted to add those avatars to the table not just as a different visual element, but also to strengthen the connection to the individual Kagglers.

You can go to any Kaggler’s profile to see the full-size version of their avatar picture. Here is my profile. The way that those images are stored on Kaggle’s side makes it a bit difficult to access them through a script (but we might look into that in a future post). For now, I’m downloading the pictures for our nine participants manually and store them in a folder called gems_comp_img. These are the contents of that folder:

fs::dir_ls("../../static/pics/gems_comp_img/", glob = "*.*")

## ../../static/pics/gems_comp_img/abireltaief.jpg
## ../../static/pics/gems_comp_img/ambarish.jpg
## ../../static/pics/gems_comp_img/gsdeepakkumar.jpg
## ../../static/pics/gems_comp_img/iamleonie.jpg
## ../../static/pics/gems_comp_img/karthikrg.png
## ../../static/pics/gems_comp_img/michau96.jpg
## ../../static/pics/gems_comp_img/spitfire2nd.jpg
## ../../static/pics/gems_comp_img/sravyaysk.png
## ../../static/pics/gems_comp_img/tanyadayanand.jpg

Now, I made things a little easier by naming the pictures after the Kaggle user names. This is the user column in the input table. This will allow us to join this list of images to our scores table. First, we create a helper table called avatars, where store the link to image and also parse out the user name using the tidyr tool separate:

avatars <- tibble(
  avatar = fs::dir_ls("../../static/pics/gems_comp_img/", glob = "*.*")
) %>% 
  separate(avatar, sep = "/", into = c("a", "b", "c", "d", "e", "user"), remove = FALSE) %>% 
  mutate(user = str_remove(user, ".jpg")) %>% 
  mutate(user = str_remove(user, ".png")) %>% 
  select(-a, -b, -c, -d, -e)

avatars %>% 
  gt() %>%
  opt_row_striping()

avatar	user
../../static/pics/gems_comp_img/abireltaief.jpg	abireltaief
../../static/pics/gems_comp_img/ambarish.jpg	ambarish
../../static/pics/gems_comp_img/gsdeepakkumar.jpg	gsdeepakkumar
../../static/pics/gems_comp_img/iamleonie.jpg	iamleonie
../../static/pics/gems_comp_img/karthikrg.png	karthikrg
../../static/pics/gems_comp_img/michau96.jpg	michau96
../../static/pics/gems_comp_img/spitfire2nd.jpg	spitfire2nd
../../static/pics/gems_comp_img/sravyaysk.png	sravyaysk
../../static/pics/gems_comp_img/tanyadayanand.jpg	tanyadayanand

Now we can join those tables and add the images to our leaderboard! All the complexity of displaying an image in a table column is handled by gt_img_rows. Just marvellous! Only 2 additional lines to join and to format the column. A third, minor line removes the column header for those pictures

scores_total %>% 
  rownames_to_column(var = "rank") %>% 
  mutate(perc = total/(7*5*5) * 100) %>% 
  left_join(avatars, by = "user") %>% 
  select(rank, avatar, author, total, perc) %>% 
  gt() %>% 
  gt_img_rows(avatar, img_source = "local") %>% 
  gt_plt_bar_pct(column = perc, scaled = TRUE, fill = kaggle_blue) %>%
  gt_highlight_rows(rows = 1, font_weight = "normal", fill = kaggle_gold) %>% 
  gt_highlight_rows(rows = 2, font_weight = "normal", fill = kaggle_silver) %>% 
  gt_highlight_rows(rows = 3, font_weight = "normal", fill = kaggle_bronze) %>% 
  tab_header(
    title = md("**Kaggle Hidden Gems Competition - Final Results**"),
    subtitle = html(str_c("<em>A 2022 Notebooks Competition scored by Community Judges </em>"))
    ) %>% 
  cols_label(
    rank = "Rank",
    avatar = "",
    author = "Notebook Author",
    total = "Final Score",
    perc = "(vs max possible score)"
  ) %>%
  opt_row_striping()

Kaggle Hidden Gems Competition - Final Results
A 2022 Notebooks Competition scored by Community Judges
Rank	Notebook Author	Final Score	(vs max possible score)
1	Leonie	153.0
2	Michal Bogacz	149.5
3	Bukun	139.5
4	Schubert de Abreu	132.0
5	Abir ELTAIEF	102.0
6	Karthik	81.0
7	Tanya Dayanand	77.5
8	GSD	64.0
9	sravya yellapragada	55.0

Brief note of caution: if you’re using fmt_markdown(columns = TRUE) for styling your table columns, then this seems to clash with gt_img_rows. Which probably makes sense, since an image link can be affected by markdown formatting. But it took me a while to figure out why my images wouldn’t work, so I’m adding this warning here to save you some time if you have similar issues. And maybe there’s a way for fmt_markdown and gt_img_rows to coexist; let me know if you find out how.

At this point, our leaderboard is practically finished. All the information and visual elements are there. But for the final touch, I will add a bit of text formatting to ensure that those author names and scores don’t blend too much into the background next to our avatars and score bars. This is the final polishing step that gets us to our table at the beginning.

I’m using tab_style, a gt command, to increase the size of the font and make it bold and, in case of the author names, also italic:

scores_total %>% 
  rownames_to_column(var = "rank") %>% 
  mutate(perc = total/(7*5*5) * 100) %>% 
  left_join(avatars, by = "user") %>% 
  select(rank, avatar, author, total, perc) %>% 
  gt() %>% 
  gt_img_rows(avatar, img_source = "local") %>% 
  gt_plt_bar_pct(column = perc, scaled = TRUE, fill = kaggle_blue) %>%
  gt_highlight_rows(rows = 1, font_weight = "normal", fill = kaggle_gold) %>% 
  gt_highlight_rows(rows = 2, font_weight = "normal", fill = kaggle_silver) %>% 
  gt_highlight_rows(rows = 3, font_weight = "normal", fill = kaggle_bronze) %>% 
  tab_header(
    title = md("**Kaggle Hidden Gems Competition - Final Results**"),
    subtitle = html(str_c("<em>A 2022 Notebooks Competition scored by Community Judges </em>"))
    ) %>% 
  tab_style(
    style = cell_text(size = px(18), weight = "bold", style = "italic"),
    locations = cells_body(
      columns = c(author)
    )
  ) %>% 
  tab_style(
    style = cell_text(size = px(18), weight = "bold"),
    locations = cells_body(
      columns = c(total, rank)
    )
  ) %>% 
  cols_label(
    rank = "Rank",
    avatar = "",
    author = "Notebook Author",
    total = "Final Score",
    perc = "(vs max possible score)"
  ) %>%
  opt_row_striping()

Kaggle Hidden Gems Competition - Final Results
A 2022 Notebooks Competition scored by Community Judges
Rank	Notebook Author	Final Score	(vs max possible score)
1	Leonie	153.0
2	Michal Bogacz	149.5
3	Bukun	139.5
4	Schubert de Abreu	132.0
5	Abir ELTAIEF	102.0
6	Karthik	81.0
7	Tanya Dayanand	77.5
8	GSD	64.0
9	sravya yellapragada	55.0

And that’s it! We went from a plain table to a leaderboard design that I consider worthy of my first Kaggle (Notebooks) competition that I’ve organised. The combination of gt and gtExtras is impressively powerful, and I hope that it gives you motivation for your own table styling projects.

If you’re looking for more inspiration on visualisation or presentation ideas, then Kaggle is always a great place to start. You can check out the Hidden Gems competition notebooks, and specifically the winning contributions by Leonie, Michal Bogacz, and Bukun.

Have fun!