The Sumo Performance Rating System

I've recently been working on a heavily modified ELO system to predict how well each wrestler will perform in a basho, and I'm just about happy enough with it to share.

You've likely heard of ELO systems if you have even a passing interest in chess or online video games. They are increasingly being used in sport as well, and FIFA has used an ELO system for some of its rankings since 2018.

The idea is simple: you assign each competitor a number, their ELO rating. After each match you add to or subtract from that rating using an equation based on the outcome and on the difference between their ELO rating and their opponent's. The maths is simple and can be found online in many places if you're interested.
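For anyone who wants the maths in concrete form, here is a minimal Python sketch of the standard Elo update. Each side's expected score comes from a logistic curve on the rating difference (the usual 400-point scale), and the rating moves by K times the gap between the actual and expected result. The K-factor of 32 is an assumed, commonly used default and is not necessarily the value used in the system described here.

```python
def expected_score(rating_a: float, rating_b: float) -> float:
    """Expected score of A against B on the standard 400-point logistic Elo curve."""
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400))


def elo_update(rating_a: float, rating_b: float, a_won: bool,
               k: float = 32.0) -> tuple[float, float]:
    """Return both ratings after one bout. K=32 is an assumed default."""
    expected_a = expected_score(rating_a, rating_b)
    actual_a = 1.0 if a_won else 0.0
    change = k * (actual_a - expected_a)
    # Zero-sum: whatever A gains, B loses.
    return rating_a + change, rating_b - change


# Two evenly rated wrestlers: the winner gains 16 points and the loser drops 16.
print(elo_update(1500.0, 1500.0, a_won=True))  # (1516.0, 1484.0)
```

An upset moves the ratings further than an expected result, because the gap between actual and expected score is larger.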


I wanted to use an ELO system for sumo, but kept running into problems. The first came to light during the Hatsu basho of 2020. Tokushoryu won, and his ELO rating skyrocketed accordingly. Most people could tell at a glance that Tokushoryu's win was likely a fluke, and that he wouldn't be a real contender in Haru 2020, when he would have to fight the Sanyaku wrestlers. Of course, this turned out to be true. The ELO system, however, didn't know this and considered Tokushoryu the favourite to win the tournament in Haru.

The next issue was that certain wrestlers with poor win rates would keep average to slightly above average ELO ratings even as they slid down the banzuke. This mainly happened when a wrestler posted several losing records in a row, around the 7-8 or 6-9 mark, but took most of their wins against higher-rated Rikishi. In some cases these slipping wrestlers actually gained ELO rating, and that didn't sit right with me.
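To show how a losing record can still raise a rating, here is a sketch using made-up numbers. It assumes a chess-style rating period, in which the rating is updated once per basho and every expected score uses the pre-basho rating. It also assumes a K-factor of 32. A 6-9 record built from wins over much stronger opponents and losses to only slightly stronger ones still pushes the rating up.

```python
def expected_score(rating: float, opponent: float) -> float:
    # Standard logistic Elo expectation on a 400-point scale.
    return 1.0 / (1.0 + 10 ** ((opponent - rating) / 400))


def basho_update(rating: float, bouts: list[tuple[float, bool]],
                 k: float = 32.0) -> float:
    """Update a rating once per basho from (opponent_rating, won) pairs.

    All expected scores use the pre-basho rating (an assumed rating-period
    scheme); K=32 is also an assumption.
    """
    total = sum((1.0 if won else 0.0) - expected_score(rating, opp)
                for opp, won in bouts)
    return rating + k * total


# Hypothetical 6-9 basho: six wins over 1700-rated opponents,
# nine losses to 1600-rated ones.
bouts = [(1700.0, True)] * 6 + [(1600.0, False)] * 9
new_rating = basho_update(1500.0, bouts)
print(f"{new_rating:.1f}")  # 1542.2: a losing record, yet the rating rises
```

Under these assumptions each win over a 1700-rated opponent is worth about 24 points, while each loss to a 1600-rated opponent costs only about 11.5, so six wins outweigh nine losses.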


If I were to create a new system, it would need to account for the two main problems outlined above.

Firstly, in order to stop fluke winners such as Tokushoryu from rising too quickly, it would need to take the rank they are fighting at into consideration.

Secondly, to stop poorly performing fighters from gaining rating due to lucky scheduling, it would have to take the wrestler's form into account.


Thus, the Performance Rating System was born. The new equation is a function of a wrestler's ELO rating, their current form (wins in the past year), and their current rank vs. their average rank over the past year.
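The exact PR equation isn't given here, so what follows is only a hypothetical Python sketch of one way the three ingredients could be combined. Several choices are assumptions made for illustration, not the actual formula: the multiplicative form, the 90-bout year, encoding rank as a banzuke position (1 = top slot), and capping the rank factor at 1 so that only fighting above one's usual rank is penalised.

```python
from statistics import mean

# Assumption: a full top-division year is six basho of 15 bouts.
BOUTS_PER_YEAR = 90


def performance_rating(elo: float, wins_last_year: int,
                       current_position: int, past_positions: list[int]) -> float:
    """Hypothetical PR combining Elo, form and current-vs-usual rank.

    Positions are ordinals on the banzuke, where 1 is the top slot.
    """
    # Form: share of a full year's bouts that ended in a win (absences count as zero).
    form = wins_last_year / BOUTS_PER_YEAR
    # Usual rank: mean position over the past year's basho.
    usual_position = mean(past_positions)
    # Below 1 when fighting higher up the banzuke than usual; capped at 1 otherwise.
    rank_factor = min(1.0, current_position / usual_position)
    return elo * form * rank_factor


# Hypothetical fluke winner: high Elo, but now ranked far above his usual slot.
fluke = performance_rating(1600.0, 45, 6, [30, 30, 30, 30, 30, 30])
# Hypothetical steady wrestler: lower Elo, ranked where he usually is.
steady = performance_rating(1500.0, 50, 10, [10, 10, 10, 10, 10, 10])
print(f"fluke={fluke:.0f} steady={steady:.0f}")  # fluke=160 steady=833
```

Under these assumptions the fluke winner's PR collapses because his current position is a fifth of his usual one. A slipping wrestler ranked below his usual slot gets no boost from the rank term and is judged on form alone.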

The results have been good so far. Following Tokushoryu's win in January, his ELO rating was high, but his Perfomance Rating (PR) dropped to abysmally low levels since he was fighting at such a higher rank than usual.

A highlight for me was the system predicting that Daieisho would finish in the top 3 in Hatsu 2021, the basho he went on to win.


Below is a graph showing the current Yokozuna and Ozeki, and how their PR has changed since the start of 2020.



Of course, there are problems with the system, and I think the above graph highlights a couple of them.

Hakuho's PR has fallen because the system interprets his recent inactivity as a dramatic fall in form. This is generally a desirable trait, but it is probably a flaw for a Yokozuna, who cannot be demoted and may sit out tournaments through injury.
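The Hakuho problem follows directly from measuring form as raw wins in the past year. The hypothetical record below uses made-up numbers and assumes six basho of 15 bouts. It shows how far that measure falls for a wrestler who misses basho, compared with a win rate over the bouts he actually fought.

```python
# Hypothetical last-year record for an often-injured Yokozuna:
# (wins, bouts fought) per basho, oldest first; a full kyujo is (0, 0)
# and (3, 3) means he withdrew after winning his first three bouts.
record = [(13, 15), (0, 0), (0, 0), (14, 15), (0, 0), (3, 3)]

BOUTS_PER_YEAR = 90  # assumption: six basho of 15 bouts


def form_counting_absences(record: list[tuple[int, int]]) -> float:
    """Wins over a full year's bouts, so missed basho look like losses."""
    return sum(wins for wins, _ in record) / BOUTS_PER_YEAR


def form_when_active(record: list[tuple[int, int]]) -> float:
    """Wins over bouts actually fought, so absences are ignored."""
    fought = sum(bouts for _, bouts in record)
    return sum(wins for wins, _ in record) / fought if fought else 0.0


print(f"{form_counting_absences(record):.2f} vs {form_when_active(record):.2f}")  # 0.33 vs 0.91
```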

Terunofuji's PR should be considerably higher, but his rise from Juryo to Ozeki has been so sudden that the system believes him to be fighting at a much higher rank than his 'true rank'. Of course, this isn't true: Terunofuji has been an Ozeki before. I don't think this flaw should show up too often, as a climb like Terunofuji's isn't an annual event, but I still feel it's worth pointing out.
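The Terunofuji problem comes from the lag in a year-long average rank. This sketch uses made-up banzuke positions (1 = top slot) for a rapid climb. It applies a hypothetical capped rank factor, current position divided by usual position. It is an illustration only, not the real PR calculation.

```python
from statistics import mean


def rank_factor(current_position: int, past_positions: list[int]) -> float:
    """Hypothetical: current / usual position, capped at 1 so that only
    fighting above one's usual rank is penalised."""
    return min(1.0, current_position / mean(past_positions))


# Hypothetical banzuke positions for a rapid Juryo-to-Ozeki climb, oldest first.
climb = [70, 55, 40, 12, 8, 5]
current = 3

full_year = rank_factor(current, climb)        # usual position about 31.7
last_three = rank_factor(current, climb[-3:])  # usual position about 8.3
print(f"{full_year:.2f} {last_three:.2f}")  # 0.09 0.36
```

The full-year average still places the wrestler around position 32, so the factor is about 0.09. Averaging only the last three basho gives 0.36. The window length is an assumption here, and the comparison shows how strongly that choice affects fast risers.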


It's also worth mentioning that one wrestler's PR can't be compared with another's to predict the winner of a head-to-head bout. The ELO system works much better for predicting the winner of a single fight; PR was developed to estimate who will perform best across a whole basho.


I hope this has been of interest to some of you. Once the Banzuke for Natsu 2021 is released, I'd like to show you each wrestler's ELO and PR, and then we can dive a bit deeper into the stats.


James.

