In the summer of 2004, I was watching the Athens Summer Olympics and as they usually do, the tv commentators regularly referenced the medal count. The US has a lot of medals! We’re winning the Olympics! And I immediately thought ‘well, of course we have a lot of medals – we sent like 500 athletes. If we don’t have a lot of medals, we’re actually doing pretty badly comparatively.’
So I started thinking of what would make a better mark of Olympic achievement and figured if I could normalize for the number of athletes, that would be a good start. That would start to get out how well each country converted their opportunity into medals. I found some athlete count data for each country online and started working up a spreadsheet. In order to rank, I also needed a medal count, but I also wasn’t satisfied with a straight medal count. Everyone knows that a bronze is just not the same as a gold, so they shouldn’t count equally. So, I worked up a weighted medal count per athlete and the results were pretty interesting!
That spreadsheet has disappeared over the years, but the top of my ranking was Eritrea, who sent something like 4 athletes and won a gold and a bronze – that’s pretty good efficiency! No one would really say that Eritrea ‘won’ the Olympics or that they were the most athletic country, but I was satisfied that I was able to highlight countries that actually had a pretty good Olympics given what they came with and probably wasn’t getting a lot of attention for their effort.
Fast forward a few years to the 2012 London Olympics. I decided I needed to revisit my Olympics Achievement calculations, but was already thinking of some improvements. Adjusting for the number of athletes is a good start, but if a country sends 30 athletes for a soccer team, they can win one medal – but send 30 track and field or swimming athletes and you could be competing for 100 medals. So, I found data for the sports and events each country was competing in and then counted how many possible medals that was. One athlete competing in four different events is four potential medals, but four athletes all competing in the 100m dash is three potential medals.
That year I also added a lot of economic and demographic data to the dataset. I tried normalizing for things like athletes per capita to help account for the fact that China has a whole lot of people to work with and their sending of 100 athletes vs say, Denmark, is pretty different. I kept trying to wrap my brain around how countries like the US or Canada or much of Europe as well as countries like Russia and China were so good at the Olympics, when there were very clearly different athletic cultures and approaches and found datasets like the Human Development Index and the Corruption Perception Index to see how much these would correlate with good Olympic achievement, but it turns out they’re not always good indicators.
This year I also started looking for ways to visualize my data and while I didn’t have access to any real BI tools, I used Google Fusion Tables and its mapping functionality as well as Google’s Developer Charting tools to start learning how to represent the data.
Come the Rio Olympics and I moved on to to using Tableau. We were evaluating BI tools for work, including Tableau, so I thought this was a great opportunity to learn. I refined my formula some more because I decided that while I wanted to normalize the medal count for medal opportunity, it was actually pretty impressive that countries like the US and China were able to develop, support and send 500+ athletes as well as participate in pretty much every sport in the Games. Some countries like say, Kazakhstan did pretty well with their opportunity, but were highly focused in one or two sports like Judo or Wrestling and didn’t have a ton of athletic diversity. So my formula now included a bonus for number of athletes and sports. This seemed to find a good balance between elevating some smaller delegations that performed really well as well as acknowledging that being able to develop a lot of athletes that win medals is a lot of what the Olympics is about. I also started calculating a differential between the standard medal tables and my ranking to see who was better off in which method.
It’s now 2018 and this is the first time I’ve worked on the Olympics data for a Winter Games. This year I’m working in Qlik Sense as that’s the platform we landed on at work, so I’m more familiar with it and this is a great opportunity to develop my skills. The biggest change this year is that I’ve added much more detail around the gender of athletes and events that will allow me to do some gender diversity and equality analysis. I’m looking forward to comparing the data between Summer and Winter games and hope to someday be able to do some year over year/achievement over time type of analysis.
Thanks for reading and make sure to leave ideas for additional data to include in analysis!