This is the second part in a series of posts about setting and tracking my yearly goals when it comes to Machine Learning & Data Science (ML & DS) as well as Sports & Exercise. In the first post I wrote about my past goals for 2021 and the lessons learned from those. This post will focus on my goals and expectations for 2022, and how last year’s experiences shaped and informed them. This might be more fun to read if you’ve seen the comedy show documentary Silicon Valley.
Stretch to avoid injuries
Here’s the summary tweet with my stretch goals from the 1st of January:
My 2022 stretch goals:
- Spend 400 h learning ML/DS
- Join 2 NLP + 2 Image competitions on @kaggle
- Team up in 3 competitions
- Win 2 Kaggle comp medals
- Write 50 blog posts
- Run a sub-4h marathon
- Do 1 muscle up
- Sleep at least 8 h/night with a std dev <= 0.5 h
— Martin Henze (Heads or Tails) (@heads0rtai1s) January 1, 2022
The idea of those being stretch goals is that I’m not expecting to reach all of them. And that would be fine. Because the important aspect is the journey towards those goals, and the skills and experiences that I hope this journey will give me.
The goals themselves are still important, and I will do my best in reaching them. Some will go better than others. Trying to become better at something can be a vague target, and concrete goals can be very useful in providing a metric to measure progress and success. Similar to ML, finding a good metric can make a big difference. And of course also similar to ML any metric can be gamed; but you would only be fooling yourself. Thus, when striving for those goals it is essential to remember the purpose behind them, and the vision you had when setting them.
Spend 400 h learning ML/DS
As I wrote in the previous post, hours themselves don’t create mastery. Spending your time effectively is more important than spending lots of it with vague intentions. My aspiration here is to grow my ML/DS skills in specific directions. I realised that I’m still lacking the kind of solid understanding of many ML concepts that would allow me to use them as confidently and creatively as I would like to.
Let me try to illustrate what I mean. When I encounter a certain data problem - be it on Kaggle, at work, or anywhere - then in my head I construct ideas for ways to process the data to make it easier to solve the problem. For instance, for the Kaggle Titanic challenge I would try to come up with feature engineering steps to figure out whether someone was travelling alone or in groups, and what kind of groups those could be (family? friends?). That’s the creative part. Translating these ideas into code is where the skill comes in. For traditional (tabular) ML I’m reasonably good at this translation, but when it comes to Deep Learning (DL) my process is still slow and inefficient.
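A minimal sketch of this kind of feature engineering, in plain Python. The rows below are made up for illustration and are not actual competition data; `SibSp`, `Parch` and `Ticket` mirror column names in the Kaggle Titanic data, while `add_group_features` and the `GroupType` labels are hypothetical names. Treating a shared ticket as a sign of travelling with friends is an assumption of this sketch, not an established rule.

```python
from collections import Counter

# Made-up example rows (NOT real Titanic records); only the column names
# SibSp, Parch and Ticket are taken from the Kaggle Titanic data.
passengers = [
    {"Name": "A", "SibSp": 1, "Parch": 2, "Ticket": "T1"},
    {"Name": "B", "SibSp": 0, "Parch": 0, "Ticket": "T2"},
    {"Name": "C", "SibSp": 0, "Parch": 0, "Ticket": "T2"},
    {"Name": "D", "SibSp": 0, "Parch": 0, "Ticket": "T3"},
]


def add_group_features(rows):
    """Add family size, ticket group size and a rough group type to each row."""
    # How many passengers in the data share each ticket
    ticket_counts = Counter(row["Ticket"] for row in rows)
    enriched = []
    for row in rows:
        # Siblings/spouses + parents/children + the passenger themselves
        family_size = row["SibSp"] + row["Parch"] + 1
        ticket_group_size = ticket_counts[row["Ticket"]]
        if family_size > 1:
            group_type = "family"
        elif ticket_group_size > 1:
            # Assumption: a shared ticket without relatives hints at friends
            group_type = "friends?"
        else:
            group_type = "alone"
        enriched.append({
            **row,
            "FamilySize": family_size,
            "TicketGroupSize": ticket_group_size,
            "GroupType": group_type,
        })
    return enriched


features = add_group_features(passengers)
for row in features:
    print(row["Name"], row["FamilySize"], row["TicketGroupSize"], row["GroupType"])
```

The idea ("was this person travelling alone or in a group?") is the creative part; the few lines that turn it into columns are the translation step.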
Besides the tabular ML and feature engineering flow, I also know what to aim for in my skill development through the analogy to data visualisation. I’m a big fan of dataviz, and it arguably contributed a lot to my early successes in the Kaggle community. When it comes to a visualisation problem, I’m able to envision the data in my head and then find the right (ggplot2) tools to realise my imagination through code. Ideas flow almost unimpeded through the power of dataviz skills and tools. From there, my limitations (which are still plenty) mostly arise from a lack of creativity rather than from my coding skills. And this is where I want to be with ML/DL. Maybe not this year; but I want to get much closer.
Join 2 NLP + 2 Image competitions on Kaggle
One way to make my ML goals even more concrete is through Kaggle competitions. Challenges throughout the year are manifold and diverse. They typically attract a few thousand people, hundreds of whom share their ideas or code. I can’t think of a better way of learning than to immerse yourself in such a competition for (typically) 3 months and soak up the skills and problem solving ingenuity that others display.
Specifically, I want to improve my skills in Image problems (aka Computer Vision) and Natural Language Processing (NLP). Both of those are solidly in the domain of DL these days. Image-based challenges are usually plentiful throughout the year. NLP competitions can be less frequent, but it should hopefully still be possible to encounter 2 interesting ones in 2022.
When it comes to Image data, I’m reasonably happy with my progress in 2021. I started learning about FastAI and using it more confidently in building my own competition pipeline and tinkering with custom dataloaders in some competitions. I also started to look into the underlying basic functionality in PyTorch and torch for R (FastAI is a high-level wrapper for PyTorch). While the way that torch works is pretty intuitive, I need to spend more time with the code itself to become more confident in using it creatively.
For NLP, the huggingface libraries have fast become the gold standard, and the ecosystem is growing and evolving rapidly. I enjoyed taking the first huggingface course last year, and I plan to continue with those courses to learn more. There are a few libraries that bring the huggingface tools into the FastAI ecosystem; which is an ideal combination for me at this stage. I plan to focus on those, besides the general huggingface framework, and will hopefully blog about it every now and then.
Team up in 3 competitions
I thoroughly enjoyed my teaming up experiences in 2021 and plan to continue this approach in 2022. There is a lot that you can learn from your team mate(s) on Kaggle, since many people come from such different professional and technical backgrounds. The goal of 3 competitions seems relatively low compared to the rest of the goals, but I’m aiming for gradual progress here. Last year it was 2 competitions.
Finding the right balance in a team can be a challenge, as is the way that progress is planned and implemented. Communication is really crucial here; as in so many areas of life. This applies not necessarily to the goal of doing well in a Kaggle competition, but to getting the most out of the learning experiences that are enabled by the process of collaboration. In all of those aspects I consider myself fortunate in having teamed up with Yassine, and I’m looking forward to our future collaborations.
Win 2 Kaggle comp medals
In 2021, I purposefully didn’t set a goal of a certain performance in a Kaggle competition. I just wanted to grow my (DL) skills by spending more time on Kaggle problems. Now, I feel like I’m at a stage where I can expect to develop those skills to the extent where my progress translates to better finishes on competition leaderboards.
I’m a fan of the Kaggle medal system, and of the gamification aspect in general. I consider it an incarnation of the idea of concrete metrics replacing otherwise vague goals that I wrote about at the beginning of this post. But here there can be an even greater incentive to game the system and take the medals as a goal in themselves, rather than a reflection of your skills. Sometimes, it might be possible to win a (bronze) medal by forking a public notebook or blindly ensembling other people’s solutions without understanding them (and getting lucky in the shake-up). That’s not what I want. Those medals don’t really count for anything, since they are disconnected from your actual abilities.
I want to write my own code and understand (ideally) every line of it. Inspired by other people’s ideas, yes; but incorporating those inspirations purposefully and in an informed way into my own pipelines. Plan my experiments intelligently and choose the best progress based on the results of those experiments. Those medals will mean something. Most importantly, they will mean something to me. Setting this goal and putting it out there will create a bit more pressure to succeed.
Write 50 blog posts
Blogging was an aspect of my journey that suffered in 2021. Of the 12 posts that I had planned, I wrote only 2. But instead of lowering the bar further, I decided to raise it to 50. This might not make a lot of intuitive sense, since a target of 50 looks way more intimidating than a mere dozen. However, I realised that it wasn’t really the number of posts that stopped me from writing. It was the process of writing a single post, which I saw as a significant endeavour that needed preparation and research and comprehensive content to be worth it.
I can be a bit of a perfectionist, which sometimes stops me from making public something that I had written but wasn’t happy with. And I would hate to turn into the kind of spammer who just churns out cookie-cutter posts in the search for engagement and followers. You know the ones. Although I don’t expect a large audience for those posts, I want to write something of value for those who stumble across my blog. There’s enough noise on the internet already.
But here’s the crux: writing valuable content doesn’t have to mean writing long and/or polished content. Short snippets of concise code or brief reflections on recent learnings can have as much value as posts that have been months in the making. I simply need to get out of my comfort zone and write more.
One benefit of writing is analogous to the idea that teaching a concept to someone is a great way of learning about this concept. If you can’t explain it, then do you really understand it? And if you can’t write about it clearly and concisely, then chances are you don’t actually understand it either. But understanding something is not necessarily a binary thing, and sharing even small progress in learning can be useful to people in similar situations.
And after writing only 2 posts in 2021 (albeit more technical posts), this is already the second post of 2022.
Run a sub-4h marathon
My first ever marathon in 2021 (not a race, just the distance) clocked in at 4 hours 25 minutes. Not too bad; now let’s see whether we can get under 4 hours. Shaving 25 minutes off 42km might not look like much at the outset, but it’s about 35s per km: the difference between roughly a 5:41 pace (4h goal) and a 6:17 pace (last year), which is pretty noticeable. (Being European, I measure my pace in min/km.)
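The pace arithmetic can be checked with a few lines of Python. This is a minimal sketch: the helper names `pace_per_km` and `format_pace` are made up for illustration, and it assumes the official marathon distance of 42.195 km rather than a rounded 42 km.

```python
MARATHON_KM = 42.195  # official marathon distance (assumption of this sketch)


def pace_per_km(hours, minutes, distance_km=MARATHON_KM):
    """Average pace in seconds per km for a given finishing time."""
    total_seconds = hours * 3600 + minutes * 60
    return total_seconds / distance_km


def format_pace(seconds_per_km):
    """Format a pace in seconds per km as m:ss, rounded to the nearest second."""
    total = round(seconds_per_km)
    return f"{total // 60}:{total % 60:02d}"


last_year = pace_per_km(4, 25)  # 4 h 25 min marathon
goal = pace_per_km(4, 0)        # 4 h target

print("last year:", format_pace(last_year), "min/km")
print("4h goal:  ", format_pace(goal), "min/km")
print("gap:      ", round(last_year - goal, 1), "s per km")
```

Any pace slower than the printed goal pace, averaged over the full distance, lands on the wrong side of 4 hours.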
My pace for shorter runs is significantly faster than that; and I’ve run a personal best 1h 40min half-marathon (i.e. 4:45 pace). However, a marathon becomes very different after the 2h mark, and most certainly during the final 10km. I will probably still need to take short stretches of walking with water and energy gels, which need to be factored into the overall running pace. I feel like being at 5:00 pace for the first half and then aiming for 5:30 - 6:30 pace for the remainder sounds realistic. In contrast, last year I dropped to 7:30 pace for the last 12km, albeit having done quite well until km 25. My plan is to work more on the stages between km 20 and 30 with long training runs of 2h to 3h durations during most weekends. Should be fun!
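To sanity-check a split plan like this, here is a rough sketch that computes the finishing time for a 5:00 first half followed by a constant pace over the second half. The function names are hypothetical, and the sketch assumes two evenly paced halves of 21.0975 km each, with any walking breaks already folded into the average pace.

```python
HALF_KM = 42.195 / 2  # half of the official marathon distance


def parse_pace(pace):
    """Convert an 'm:ss' pace string into seconds per km."""
    minutes, seconds = pace.split(":")
    return int(minutes) * 60 + int(seconds)


def finish_time(first_half_pace, second_half_pace):
    """Total marathon time in seconds for two evenly paced halves."""
    return HALF_KM * (parse_pace(first_half_pace) + parse_pace(second_half_pace))


def format_time(seconds):
    """Format a duration in seconds as 'Xh MMmin SSs'."""
    total = round(seconds)
    hours, rem = divmod(total, 3600)
    minutes, secs = divmod(rem, 60)
    return f"{hours}h {minutes:02d}min {secs:02d}s"


for second_half in ("5:30", "6:00", "6:30"):
    print(second_half, "->", format_time(finish_time("5:00", second_half)))

# Slowest second-half pace (in s/km) that still finishes in exactly 4 h
limit = 4 * 3600 / HALF_KM - parse_pace("5:00")
print("second-half limit:", round(limit), "s per km")
```

Under these assumptions the second half has to average a bit faster than 6:23 per km to stay under 4 hours, so the slow end of the 5:30 - 6:30 range would land just over the target.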
Do 1 muscle up
This is a repetition of the 2021 goal, since I didn’t accomplish anything in that direction. For 2022, I will have to build a better plan to make at least some progress. I’m pretty sure that a muscle up is mostly a question of technique, given a certain level of strength. None of that silly cross-fit-style swingy stuff though; I’m talking about proper technique. There are different stages of the muscle up movement, and I hope to be able to train some of those in isolation.
Instead of trying to build out the strength first (and my pull up continuity in 2021 wasn’t too bad), this year I will invest more time to research the technique and put together a plan to get me progressively closer to my goal.
Sleep at least 8 h/night with a std dev <= 0.5 h
This last goal might sound weirdly specific, but it’s all about continuity. As a student, then PhD student, then astronomer (of all things!) it’s easy to slip into weird sleeping habits. Even if you don’t actually have to spend your nights at a telescope. Working at a startup might sound like one of those “out of the frying pan and into the fire” type situations when it comes to being sleep deprived, but I’m happy to tell you that there are startups who recognise the value of a healthy work-life balance. Since more and more research is highlighting the importance of a good night’s sleep, I identified this goal as foundational for all the other goals in this post.
It’s honestly also something that you should listen to your body for. From time to time it might be necessary to work late to meet a deadline, but the thought that your brain won’t be impacted by only 4h of shut-eye is pretty untenable. You pay for burning the midnight oil with reduced cognitive spark over the following days, so your average productivity still goes down even though you worked all those extra hours. And you might be able to get away with it for a while in your 20s, since those are pretty much the kind of bad decisions that your 20s are for, but it’s still gonna catch up with you eventually. Anyway, no lecturing intended here. Sleep is super important, and I suggest using it smartly to get where you want to get.
So what about that goal? Well, last year I managed to exceed my planned average of 8h / night. But you can famously drown in a river that’s only 30cm deep on average. Variance matters, and there was too much of that in my sleep patterns. Once more, I was looking for a concrete goal to improve my habits. A standard deviation of 30 min or less means that on about 68% of nights I would be within +/- 30 min of my average (assuming roughly normally distributed sleep durations). That sounds broadly doable. Last year’s standard deviation was 1h.
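How often a night falls within a given window around the average can be estimated if sleep durations are roughly normally distributed. That is an assumption of this sketch, not something measured in this post. The helper name `fraction_within` is hypothetical; it uses the standard relation P(|X - mean| < d) = erf(d / (sigma * sqrt(2))) for a normal distribution.

```python
import math


def fraction_within(window_h, std_dev_h):
    """Fraction of nights within +/- window_h of the mean.

    Assumes normally distributed sleep durations with the given standard deviation.
    """
    return math.erf(window_h / (std_dev_h * math.sqrt(2)))


for std_dev in (1.0, 0.5):      # last year's spread vs. the 2022 goal, in hours
    for window in (0.25, 0.5):  # +/- 15 min and +/- 30 min
        share = fraction_within(window, std_dev)
        print(f"std dev {std_dev} h: {share:.0%} of nights within +/- {window} h")
```

With a standard deviation of 0.5 h, about 68% of nights fall within 30 min of the average, compared with about 38% for a standard deviation of 1 h.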
I’m not suggesting to run around with a stop watch either, or try to plan your day down to the minute. That’s a recipe for a neurotic type of disaster worthy of a silly and slightly problematic 80s comedy. My approach is to pay a bit more attention to guide my habits towards being more sustainable and then let routine take over. You won’t be able to stick to those habits every single day, but if you’re generally maintaining them that’s already a big improvement. So don’t sleep on sleep.
Now you know some of my plans for 2022. I’d be happy to read about yours, in whichever shape or form you’d like to share them.
You might have noticed a pattern in most of my goals, and that is something I will write about more in detail in a future post: my overarching aim is for my 2022 self to be better than my 2021 self. Comparing yourself to other people can be a futile and demoralising exercise, especially if you, like me, tend to look to the most high-achieving people in your field or community for inspiration. Even more so in this age of social media highlight reels. In contrast, I believe that comparing yourself to a previous baseline of yourself can be a great way to learn from past mistakes and experiences. Look at what worked, slightly tweak those things that didn’t work to see if that improves the situation.
It’s the basis for ML experiments, which comes from the basis of any science experiments, which comes from the scientific method, which comes from an evidence-based approach to trying to understand this often perplexing world in which we find ourselves. Reflecting on your choices and their consequences is a valuable tool in any context. Or as Socrates used to say: “The unexamined life is not worth living”.
In the third and final part of this goals series I will write about measuring my goals and (finally!) bring in some code and visualisations. Stay tuned!