Finally, after weeks playing with Ironman results I have produced a ranking of the 2012 races that I’m willing to publish. By extending Monday’s comparison of Kona qualifiers’ performances to all athletes who raced multiple Ironman events I’ve written a small piece of software – I’ll save the full details for another post – that lists races ordered by their relative speed in swim, bike, run and overall splits. Not the perfect solution, but the best I’ve managed so far.
The table below shows the rankings of Ironman races based on 2012 age grouper results, along with the average time difference between each race and the fastest.
The rankings are produced by an algorithm that compares the times of athletes who raced multiple events, then combines and averages the differences in an attempt to place each race in order and determine how far apart they lie. So, unsurprisingly, New York is the fastest swim course of 2012, but perhaps more surprisingly – questionably even – Austria ranks the slowest: on average an age group athlete would swim the 3.8K 26 minutes slower there than they would at New York (the impact of the averages including 2 hour swimmers).
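To make the idea concrete, here is a minimal sketch in Python of that kind of comparison. It is not the actual software (the full details are for another post); the function name rank_races, the (athlete, race, minutes) input format and the exact way the pairwise averages are combined are all assumptions made for illustration.

```python
from collections import defaultdict
from itertools import combinations
from statistics import mean


def rank_races(results):
    """Rank races from (athlete, race, minutes) tuples.

    Hypothetical sketch: only athletes with two or more races contribute.
    Returns a list of (race, minutes_behind_fastest), fastest race first.
    """
    # Group each athlete's times by race (a repeated race keeps the last time seen).
    by_athlete = defaultdict(dict)
    for athlete, race, minutes in results:
        by_athlete[athlete][race] = minutes

    # For every pair of races an athlete completed, record the time difference.
    pair_diffs = defaultdict(list)
    for times in by_athlete.values():
        for a, b in combinations(sorted(times), 2):
            pair_diffs[(a, b)].append(times[a] - times[b])

    # Average within each pair of races, then credit the result to both races.
    race_scores = defaultdict(list)
    for (a, b), diffs in pair_diffs.items():
        d = mean(diffs)
        race_scores[a].append(d)    # positive means a was slower than b
        race_scores[b].append(-d)

    # Averaging the pairwise averages gives one score per race (assumed combination step).
    scores = {race: mean(values) for race, values in race_scores.items()}
    fastest = min(scores.values())
    return sorted(((race, s - fastest) for race, s in scores.items()),
                  key=lambda item: item[1])
```

Note that the last step averages averages, so a race pair with only one or two shared athletes counts as much as a pair with hundreds. A weighted version would be one obvious refinement.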
At this point, having scanned the table, I imagine there are some objections, so let’s have the obligatory discussion of the limitations in this approach. One of the principal reasons for automating the process was to eliminate the potential for my own bias to affect the results, but relying on the data alone has flaws of its own. Firstly, there are issues of data quality to be considered, many of which will become clear when I detail the production of these tables. The size of the dataset also has to be questioned: only a fraction of 2012 Ironman athletes raced more than one event, and the numbers racing specific combinations of events are smaller still. There is also an assumption that an athlete’s performances will be comparable from race to race, or at least that averaging will eliminate the impact of a bad day. Again, I’ll go into more detail when I write about the development of these tables.
All that said, I actually don’t think it’s done a bad job. Races seem generally well placed: for the most part faster races fall near the top and slower ones near the bottom. No ranking system will ever be perfect – at least with this one I can blame it on my code. The time differences are more suspect. I included them for interest, but they reflect the limitations of averaging averages. I would expect the difference between performances at races to be a function of fitness – a 10 hour Ironman will see a smaller difference than a 15 hour Ironman – and these rankings fail to reflect that.
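A quick bit of arithmetic shows why a single averaged time difference struggles here. Suppose, purely hypothetically, that a course is 5% slower than the fastest one; the 5% figure and the function name scaled_difference are made up for the example and are not taken from the rankings.

```python
def scaled_difference(finish_hours, slowdown_ratio):
    """Extra minutes on a slower course, assuming the gap scales with finish time."""
    return finish_hours * 60 * (slowdown_ratio - 1)


# Hypothetical 5% slower course for a 10 hour and a 15 hour athlete.
for hours in (10, 15):
    extra = scaled_difference(hours, 1.05)
    print(f"{hours} hour athlete: about {extra:.0f} minutes slower")
```

Under that assumption the 10 hour athlete loses about 30 minutes and the 15 hour athlete about 45, whereas the table reports one number for everyone. Comparing ratios of times rather than raw differences would be one way to account for this, though that is not what the current rankings do.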
Finally, if I change my pool of athletes to just the pros and run the same software I get an entirely new ranking of races with much narrower time differentials between them. That reflects both the difference in professional racing and the susceptibility of my code to outliers in performance.
As I said: an imperfect solution, but a step closer than I was. It needs some refinement before I’d want to use the time differentials to start bragging about how much faster I would have been at race X, but as a first attempt at normalising race data I’m happy.