About the Sumo Power Ratings

How the Figures are Calculated

Here is a simple example illustrating how the power ratings are calculated. Say that a tournament were held among six rikishi: A-nohana, B-nonami, C-azuma, D-nonada, E-shuzan and F-taikai. The hoshitori table below shows the results:

          Opponent
Rikishi   A    B    C    D    E    F
A         -    W    W    W    W    DNF
B         L    -    L    W    DNF  W
C         L    W    -    DNF  L    W
D         L    L    DNF  -    W    W
E         L    DNF  W    L    -    L
F         DNF  L    L    L    W    -
W Win
L Loss
DNF Did Not Fight
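To compute any of this by machine, the hoshitori first has to be stored somehow. Below is a minimal Python sketch, assuming the table is kept as a list of (winner, loser) bouts; the names `BOUTS`, `RIKISHI` and `hoshitori` are hypothetical, not part of the actual power rating software. Storing each bout once keeps the grid consistent: a W in one row is always an L in the opponent's row.

```python
# A minimal sketch (hypothetical names, not the site's actual code): the
# hoshitori above stored as a list of (winner, loser) bouts. Pairs that
# did not fight are simply absent from the list.
RIKISHI = ["A", "B", "C", "D", "E", "F"]
BOUTS = [
    ("A", "B"), ("A", "C"), ("A", "D"), ("A", "E"),
    ("C", "B"), ("B", "D"), ("B", "F"),
    ("E", "C"), ("C", "F"),
    ("D", "E"), ("D", "F"),
    ("F", "E"),
]


def hoshitori(bouts, rikishi):
    """Rebuild the W / L / DNF grid (row rikishi vs. column opponent)."""
    # Start with every pairing marked DNF; "-" marks a rikishi against himself.
    grid = {r: {o: "-" if o == r else "DNF" for o in rikishi} for r in rikishi}
    for winner, loser in bouts:
        grid[winner][loser] = "W"
        grid[loser][winner] = "L"
    return grid


grid = hoshitori(BOUTS, RIKISHI)
for r in RIKISHI:
    print(r, " ".join(f"{grid[r][o]:<3}" for o in RIKISHI))
```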

The first step in calculating our power ratings is to calculate each rikishi's raw score: his won-loss percentage (wins divided by bouts actually fought, so DNF pairings don't count) minus 0.5. E-shuzan, for example, won 1 of his 4 bouts, so his raw score is 0.25 - 0.50 = -0.25. We subtract 0.5 because we want the raw scores to sum to 0, and the choice of 0.5 is not arbitrary: it is the average won-loss percentage. The table below shows the scores for each rikishi:

Rikishi Won-Loss Raw Score
A 1.00 0.50
B 0.50 0.00
C 0.50 0.00
D 0.50 0.00
E 0.25 -0.25
F 0.25 -0.25
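The raw score step can be sketched in a few lines of Python. This is a hypothetical illustration, not the site's own code: it assumes the bouts are listed as (winner, loser) pairs read off the hoshitori and that every rikishi fought at least one bout; `raw_scores` is an illustrative name.

```python
from collections import Counter

# Hypothetical encoding of the example hoshitori: one (winner, loser) pair per bout.
BOUTS = [
    ("A", "B"), ("A", "C"), ("A", "D"), ("A", "E"),
    ("C", "B"), ("B", "D"), ("B", "F"),
    ("E", "C"), ("C", "F"),
    ("D", "E"), ("D", "F"),
    ("F", "E"),
]


def raw_scores(bouts):
    """Raw score = wins / bouts actually fought - 0.5.

    DNF pairings never appear in `bouts`, so they count neither as a win
    nor as a loss. Assumes every rikishi fought at least one bout.
    """
    wins, fought = Counter(), Counter()
    for winner, loser in bouts:
        wins[winner] += 1
        fought[winner] += 1
        fought[loser] += 1
    return {r: wins[r] / fought[r] - 0.5 for r in sorted(fought)}


scores = raw_scores(BOUTS)
print(scores)
print("sum of raw scores:", sum(scores.values()))
```

Every rikishi in this example fought exactly four bouts, which is why the raw scores sum to exactly zero here.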

So that's it, right? A-nohana is the best, B-nonami, C-azuma and D-nonada are deadlocked in the middle, and E-shuzan and F-taikai bring up the rear. Well, if we look at the hoshitori, that hardly seems fair. C and D got to face both of the also-rans (E and F), but B faced only F. And E had to face the champion, but F didn't. How do we account for this?

The next step in calculating our power ratings is to account for the strength of schedule each rikishi faced. A rikishi's strength of schedule, or opposition score, is the average raw score of the opponents he actually fought. For example, A-nohana faced B, C, D and E, so his opposition score is (0.00 + 0.00 + 0.00 - 0.25) / 4 = -0.0625, shown below rounded to -0.063. We then add each rikishi's raw score to his opposition score to get a revised score:

Rikishi   Raw Score   Opposition Score   Revised Score
A          0.500          -0.063            0.437
B          0.000           0.063            0.063
C          0.000           0.000            0.000
D          0.000           0.000            0.000
E         -0.250           0.063           -0.187
F         -0.250          -0.063           -0.313
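The strength-of-schedule step can be sketched the same way. Again this is a minimal, hypothetical Python sketch (`opponents` and `opposition_scores` are illustrative names) that takes the raw scores from the earlier table and averages them over each rikishi's actual opponents, skipping DNF pairings.

```python
from collections import defaultdict

# Hypothetical encoding of the example: bouts as (winner, loser) pairs and
# the raw scores from the earlier table.
BOUTS = [
    ("A", "B"), ("A", "C"), ("A", "D"), ("A", "E"),
    ("C", "B"), ("B", "D"), ("B", "F"),
    ("E", "C"), ("C", "F"),
    ("D", "E"), ("D", "F"),
    ("F", "E"),
]
RAW = {"A": 0.50, "B": 0.00, "C": 0.00, "D": 0.00, "E": -0.25, "F": -0.25}


def opponents(bouts):
    """Map each rikishi to the list of opponents he actually faced."""
    opp = defaultdict(list)
    for winner, loser in bouts:
        opp[winner].append(loser)
        opp[loser].append(winner)
    return opp


def opposition_scores(scores, opp):
    """Strength of schedule: the average score of the opponents faced."""
    return {r: sum(scores[o] for o in opp[r]) / len(opp[r]) for r in scores}


opp = opponents(BOUTS)
opposition = opposition_scores(RAW, opp)
revised = {r: RAW[r] + opposition[r] for r in RAW}
for r in RAW:
    print(r, RAW[r], opposition[r], revised[r])
```

The printed values are unrounded (for example 0.4375 rather than 0.437).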

Now, in our revised ratings, B is shown to be slightly superior to C and D, and E looks to be a fair bit better than F.

Is that it? Not exactly. Our ratings are based on the performance of an individual rikishi and the ratings of his opponents. Since we have just calculated new ratings for every rikishi, why not recalculate the strength of schedule, this time averaging the opponents' revised scores instead of their raw scores? And then why not calculate the revised ratings again, using the following formula:

NewRating = OldRating + OppositionScore - LastOppositionScore

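If you understand all that, you are home free; the above equation is the trickiest part of the whole process. Since the old rating is just the raw score plus the last opposition score, the formula swaps the old opposition score for the new one, so every new rating works out to the raw score plus the latest opposition score. Below is the new set of scores: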

Rikishi Old Rating Opposition Score Old Opposition Score New Rating
A 0.437 -0.031 -0.063 0.469
B 0.063 0.031 0.063 0.031
C 0.000 0.000 0.000 0.000
D 0.000 0.000 0.000 0.000
E -0.187 0.031 0.063 -0.219
F -0.313 -0.031 -0.063 -0.281
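A single pass of the update formula can be sketched like this. It is a minimal, hypothetical Python illustration: `next_ratings` is an invented name, and the starting values are assumed to be the unrounded numbers behind the previous tables (for example 0.4375 rather than 0.437).

```python
from collections import defaultdict

# Hypothetical encoding of the example. OLD_RATING and LAST_OPPOSITION are the
# unrounded values behind the earlier revised-score table.
BOUTS = [
    ("A", "B"), ("A", "C"), ("A", "D"), ("A", "E"),
    ("C", "B"), ("B", "D"), ("B", "F"),
    ("E", "C"), ("C", "F"),
    ("D", "E"), ("D", "F"),
    ("F", "E"),
]
OLD_RATING = {"A": 0.4375, "B": 0.0625, "C": 0.0, "D": 0.0, "E": -0.1875, "F": -0.3125}
LAST_OPPOSITION = {"A": -0.0625, "B": 0.0625, "C": 0.0, "D": 0.0, "E": 0.0625, "F": -0.0625}


def next_ratings(old, last_opposition, bouts):
    """One pass of NewRating = OldRating + OppositionScore - LastOppositionScore."""
    opp = defaultdict(list)
    for winner, loser in bouts:
        opp[winner].append(loser)
        opp[loser].append(winner)
    # Strength of schedule recomputed from the opponents' current ratings.
    opposition = {r: sum(old[o] for o in opp[r]) / len(opp[r]) for r in old}
    new = {r: old[r] + opposition[r] - last_opposition[r] for r in old}
    return new, opposition


new, opposition = next_ratings(OLD_RATING, LAST_OPPOSITION, BOUTS)
for r in new:
    print(r, opposition[r], new[r])
```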

The process can be repeated any number of times. In practice, you repeat it until the ratings stop changing significantly from one iteration to the next (calculating the power ratings for one year's worth of basho takes approximately 30 iterations). Obviously, this is where having a computer helps!
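Putting the steps together, the repeat-until-stable loop might look like the Python sketch below. It is an illustration under stated assumptions, not the actual program used for these ratings: `power_ratings` is a hypothetical name, and the stopping threshold `tol` (no rating moving by more than 0.000001) is an assumed stand-in for "not significant", since the real cutoff isn't given. Setting the initial "last opposition score" to zero makes the first pass produce raw score plus opposition score, matching the first revision above.

```python
from collections import defaultdict

# Hypothetical encoding of the example hoshitori: one (winner, loser) per bout.
BOUTS = [
    ("A", "B"), ("A", "C"), ("A", "D"), ("A", "E"),
    ("C", "B"), ("B", "D"), ("B", "F"),
    ("E", "C"), ("C", "F"),
    ("D", "E"), ("D", "F"),
    ("F", "E"),
]


def power_ratings(bouts, tol=1e-6, max_iter=1000):
    """Iterate NewRating = OldRating + OppositionScore - LastOppositionScore
    until no rating moves by more than `tol` (an assumed threshold)."""
    opp, wins = defaultdict(list), defaultdict(int)
    for winner, loser in bouts:
        opp[winner].append(loser)
        opp[loser].append(winner)
        wins[winner] += 1
    # Raw score: winning percentage over bouts actually fought, minus 0.5.
    ratings = {r: wins[r] / len(opp[r]) - 0.5 for r in opp}
    # Zero "last opposition" makes the first pass RawScore + OppositionScore.
    last_opposition = {r: 0.0 for r in ratings}
    for iteration in range(1, max_iter + 1):
        opposition = {r: sum(ratings[o] for o in opp[r]) / len(opp[r]) for r in ratings}
        new = {r: ratings[r] + opposition[r] - last_opposition[r] for r in ratings}
        change = max(abs(new[r] - ratings[r]) for r in ratings)
        ratings, last_opposition = new, opposition
        if change < tol:
            return ratings, iteration
    raise RuntimeError(f"no convergence after {max_iter} iterations")


ratings, iterations = power_ratings(BOUTS)
for r in sorted(ratings):
    print(f"{r}  {ratings[r]:+.3f}")
print("iterations:", iterations)
```

For this six-rikishi example the ratings keep summing to zero on every pass, which makes a handy sanity check.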

After a few more iterations, we get the final results:

Rikishi   Power Rating
A          0.458
B          0.042
C          0.000
D          0.000
E         -0.208
F         -0.292

Note that these numbers don't correspond to the power rating values that appear on this site. That is because the published values are scaled to a more "natural" range. The scaling process is detailed in the next section.

What the Figures Mean

Sorry, under construction.

Comments? Let me know