CoachCox

Comparing 2012 Ironman Races

Finally, after weeks playing with Ironman results I have produced a ranking of the 2012 races that I’m willing to publish. By extending Monday’s comparison of Kona qualifiers’ performances to all athletes who raced multiple Ironman events I’ve written a small piece of software – I’ll save the full details for another post – that lists races ordered by their relative speed in swim, bike, run and overall splits. Not the perfect solution, but the best I’ve managed so far.

The table below shows the rankings of Ironman races based on 2012 age grouper results, along with the average time difference between each race and the fastest.

2012 Ironman Races Ranked by Differences Between Age Group Athletes' Performances Across Multiple Ironman Races

The rankings are produced by an algorithm that compares the times of athletes who raced multiple events, then combines and averages the differences in an attempt to place each race in order and determine how far apart they lie. So, unsurprisingly, New York is the fastest swim course of 2012, but perhaps more surprisingly – questionably even – Austria ranks the slowest: on average an age group athlete would swim 3.8K 26 minutes slower there than at New York (the impact of averages that include 2 hour swimmers).

At this point, having scanned the table, I imagine there are some objections, so let’s have the obligatory discussion of the limitations in this approach. One of the principal reasons for automating the process was to eliminate the potential for my own bias to affect the results, but relying on the data alone has flaws of its own. Firstly, there are issues of data quality to be considered, many of which will become clear when I detail the production of these tables. The size of the dataset also has to be questioned: only a fraction of 2012 Ironman athletes race more than one event, and the numbers racing specific combinations of events are smaller still. There is also an assumption that an athlete’s performances will be comparable from race to race, or at least that averaging will eliminate the impact of a bad day. Again, I’ll go into more detail when I write about the development of these tables.

All that said, I actually don’t think it’s done a bad job. Races seem generally well placed: for the most part faster races fall near the top and slower ones near the bottom. No ranking system will ever be perfect – at least with this one I can blame it on my code. The time differences are more suspect. I included them for interest, but they reflect the limitations of averaging averages. I would expect the difference between performances at races to be a function of fitness – a 10 hour Ironman will see a smaller difference than a 15 hour Ironman – and these rankings fail to reflect that.
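To put numbers on that last point, here is a small Python sketch. It assumes, purely for illustration, that a course’s effect is proportional to finishing time; the 5% figure and the function name `extra_minutes` are invented for the example and are not drawn from the results.

```python
def extra_minutes(finish_minutes, percent_slower):
    """Extra time on a course that is `percent_slower` percent slower.

    Hypothetical model: the slowdown scales with finishing time.
    """
    return finish_minutes * percent_slower / 100


ten_hour = extra_minutes(10 * 60, 5)      # 10 hour athlete
fifteen_hour = extra_minutes(15 * 60, 5)  # 15 hour athlete
```

Under that assumption the same course costs the 10 hour athlete 30 minutes and the 15 hour athlete 45 minutes, so a single averaged time difference per race cannot describe both.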

Finally, if I change my pool of athletes to just the pros and run the same software I get an entirely new ranking of races with much narrower time differentials between them. A reflection both of the difference in professional racing and the susceptibility of my code to outliers in performance.

2012 Ironman Races Ranked by Differences Between Professional Athletes' Performances Across Multiple Ironman Races

As I said: an imperfect solution, but a step closer than I was. It needs some refinement before I’d want to use the time differentials to start bragging about how much faster I would have been at race X, but as a first attempt at normalising race data I’m happy.


Comments

  • John Levison

    Statto geek heaven! Interesting to see Sweden come out as ‘fastest’, yet not (in this data) be #1 in any of swim, bike or run. That sort of goes with my gut feel that, having been there, the conditions were about as perfect as you are ever going to get on a course which, on the right day, is going to be fast. Ties in to the Pro times too – if you look at those, none of the individual splits stand out as being ‘highly dubious’, but add them all together and you get a good day. I suspect in future years we’ll see a) some sub-8 men, and b) some significantly slower times too when the sea has some chop and the wind blows or it rains! The reverse is probably true of Austria in 2012 – a stinking hot day and non-wetsuit swim at an event that history shows is typically ‘fast’. Florida being ‘slow’ is a bit of a surprise, given the speed at the front (4:04 bike split WR on a course which seems to be backed up by various GPS data) plus three pros sub 8:10. Interestingly, Jan Raphael won Sweden (8:04) and was second in Florida (8:08), with very similar splits in both races! Great work Russ.

  • John,

    Seems that the age grouper table actually does a reasonably good job of ordering the races. Sweden, along with St. George, Wales and the New York swim, were the events I focussed on when judging the rankings, as I expected all of them to fall into certain regions of the list. There are surprises of course, but that’s the nature of averaging fairly small data sets: the odd outlier pushes timings one way or another. I suspect that Florida, for example, doesn’t rank that highly on the Pro bike course because, while there was Starykowicz’s WR bike split, there were a lot of pros there, many of whom fell off the pace on the bike.

    Interestingly, comparing it with Thorsten’s rating table, which includes more historical data, you can see some commonality, but also the differences produced by this year’s conditions.

    Russ

  • Rob Knell

    Really interesting stuff Russ, thanks for sharing this. If your results are susceptible to outliers, is that a reflection of the use of mean values in the analysis? If so, try using medians, which are robust to outliers. It would be interesting to compare the output to what you’ve got already.

    If IM Coz was 1 hour and a minute slower than IM Sweden then my 10:58 in Cozumel translates to 9:57 in Sweden. Excellent. Am virtual sub-10 hour IM finisher.

  • Rob,

    Not sure why I never thought to code in medians too – not hard, so here’s the resulting table using medians in place of means on the differences.

    2012 Ironman Races Ranked By Differences Between Age Group Athletes’ Performances Across Multiple Races (Median Version)

    Lots of races still hold similar positions in the table, but orders have changed as have time differences in the ranks. On a purely subjective level the mean based table feels better, but this is just based on my perception of one year of results, so I can’t say which approach might be the better model.

    Of course we could simply go with the one that gives the best virtual IM finish! Looks like either works for you.

    Russ
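The mean versus median question raised in the comments can be shown with a short Python sketch. The numbers are invented for illustration and are not taken from the 2012 results: five athletes’ time differences between two races, in minutes, one of whom had a very bad day.

```python
from statistics import mean, median

# Made-up differences in minutes (race B time minus race A time);
# the last athlete blew up at race B
differences = [12, 15, 9, 14, 95]

mean_diff = mean(differences)      # dragged upwards by the 95
median_diff = median(differences)  # the middle value, unaffected by it
```

Here the mean says race B is 29 minutes slower while the median says 14. Which of the two is the better model for ranking races is a separate question; the sketch only shows how far a single outlier can move the mean when the pool of athletes is small.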