Here is a simple example illustrating the power rating calculation process. Suppose a tournament is held among six rikishi: A-nohana, B-nonami, C-azuma, D-nonada, E-shuzan and F-taikai. The hoshitori table below shows the results:
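(The hoshitori table is not reproduced here. The scores that follow imply that each rikishi fought four bouts, and that the only pairings not contested were A vs F, B vs E and C vs D.)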
The first step in calculating our power ratings is to calculate each rikishi's raw score. The raw score of each rikishi is his won-loss percentage (wins divided by bouts fought), minus 0.5. We subtract 0.5 because we want the raw scores to sum to 0; the choice of 0.5 is not arbitrary, since it is the average won-loss percentage. The table below shows the scores for each rikishi:
| Rikishi | Won-Loss | Raw Score |
|---|---|---|
| A | 1.00 | 0.50 |
| B | 0.50 | 0.00 |
| C | 0.50 | 0.00 |
| D | 0.50 | 0.00 |
| E | 0.25 | -0.25 |
| F | 0.25 | -0.25 |
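The raw-score step can be sketched in a few lines of Python. This is only an illustration: the win and loss counts below are an assumption, chosen as four bouts per rikishi so that they reproduce the won-loss percentages in the table, and the names `records` and `raw_score` are made up for this example.

```python
# Assumed records as (wins, losses); four bouts each is an assumption that
# reproduces the won-loss percentages in the table above.
records = {
    "A": (4, 0),
    "B": (2, 2),
    "C": (2, 2),
    "D": (2, 2),
    "E": (1, 3),
    "F": (1, 3),
}


def raw_score(wins, losses):
    """Won-loss percentage minus 0.5."""
    return wins / (wins + losses) - 0.5


raw = {name: raw_score(w, l) for name, (w, l) in records.items()}

for name, score in raw.items():
    print(f"{name}: {score:+.2f}")

# Sanity check: the raw scores of this example sum to zero.
total = sum(raw.values())
print(f"sum of raw scores: {total:.2f}")
```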
So that's it, right? A-nohana is the best, B-nonami, C-azuma and D-nonada are deadlocked in the middle, and E-shuzan and F-taikai bring up the rear. Well, if we look at the hoshitori, that hardly seems fair. C and D got to face both of the also-rans (E and F), but B only got to face F. And E had to face the champion, but F didn't. How do we account for this?
The next step in calculating our power ratings is to incorporate the strength of schedule that each competitor faced. To calculate the strength of schedule for a rikishi, we take the average of the raw scores of the opponents he faced; this is his opposition score. We then add his raw score to his opposition score to get a revised score:
| Rikishi | Raw Score | Opposition Score | Revised Score |
|---|---|---|---|
| A | 0.500 | -0.063 | 0.437 |
| B | 0.000 | 0.063 | 0.063 |
| C | 0.000 | 0.000 | 0.000 |
| D | 0.000 | 0.000 | 0.000 |
| E | -0.250 | 0.063 | -0.187 |
| F | -0.250 | -0.063 | -0.313 |
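As a rough sketch, the strength-of-schedule step looks like this in Python. The `opponents` mapping is an assumption: it is the schedule inferred from the opposition scores in the table above (four bouts each, with A vs F, B vs E and C vs D not contested), not a copy of the hoshitori. The function name `opposition_score` is likewise only illustrative.

```python
# Raw scores from the first table.
raw = {"A": 0.50, "B": 0.00, "C": 0.00, "D": 0.00, "E": -0.25, "F": -0.25}

# Assumed schedule, inferred from the opposition scores in the table above.
opponents = {
    "A": ["B", "C", "D", "E"],
    "B": ["A", "C", "D", "F"],
    "C": ["A", "B", "E", "F"],
    "D": ["A", "B", "E", "F"],
    "E": ["A", "C", "D", "F"],
    "F": ["B", "C", "D", "E"],
}


def opposition_score(name, ratings):
    """Average rating of the opponents that `name` faced."""
    faced = opponents[name]
    return sum(ratings[o] for o in faced) / len(faced)


opposition = {name: opposition_score(name, raw) for name in raw}
revised = {name: raw[name] + opposition[name] for name in raw}

# Four decimals are printed so the exact values are visible; the table
# above shows the same numbers to three places.
for name in raw:
    print(f"{name}: raw {raw[name]:+.4f}  "
          f"opposition {opposition[name]:+.4f}  "
          f"revised {revised[name]:+.4f}")
```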
Now, in our revised ratings, B is shown to be slightly superior to C and D, and E looks to be a fair bit better than F.
Is that it? Not exactly. Our ratings are based on the performance of an individual rikishi, and the rating of his opponents. Well, since we just calculated the ratings of the rikishi, why don't we figure out the strength of schedule again? And why don't we calculate the revised ratings again, using the following formula:
NewRating = OldRating + OppositionScore - LastOppositionScore
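Here OppositionScore is the average of the opponents' old ratings, and LastOppositionScore is the opposition score used in the previous step. If you understand all that, you are home free; the above equation is the trickiest part of the whole process. Below is the new set of scores: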
| Rikishi | Old Rating | Opposition Score | Old Opposition Score | New Rating |
|---|---|---|---|---|
| A | 0.437 | -0.031 | -0.063 | 0.469 |
| B | 0.063 | 0.031 | 0.063 | 0.031 |
| C | 0.000 | 0.000 | 0.000 | 0.000 |
| D | 0.000 | 0.000 | 0.000 | 0.000 |
| E | -0.187 | 0.031 | 0.063 | -0.219 |
| F | -0.313 | -0.031 | -0.063 | -0.281 |
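One pass of the update formula can be sketched as follows. As before, the `opponents` schedule is an assumption inferred from the opposition scores, and the helper name `average_opposition` is hypothetical.

```python
raw = {"A": 0.50, "B": 0.00, "C": 0.00, "D": 0.00, "E": -0.25, "F": -0.25}

# Assumed schedule, inferred from the opposition scores in the tables.
opponents = {
    "A": ["B", "C", "D", "E"],
    "B": ["A", "C", "D", "F"],
    "C": ["A", "B", "E", "F"],
    "D": ["A", "B", "E", "F"],
    "E": ["A", "C", "D", "F"],
    "F": ["B", "C", "D", "E"],
}


def average_opposition(ratings):
    """Opposition score of every rikishi under the given ratings."""
    return {
        name: sum(ratings[o] for o in faced) / len(faced)
        for name, faced in opponents.items()
    }


# Previous step: revised score = raw score + opposition score.
last_opposition = average_opposition(raw)
old_rating = {n: raw[n] + last_opposition[n] for n in raw}

# This step: NewRating = OldRating + OppositionScore - LastOppositionScore.
opposition = average_opposition(old_rating)
new_rating = {
    n: old_rating[n] + opposition[n] - last_opposition[n] for n in raw
}

for n in raw:
    print(f"{n}: old {old_rating[n]:+.4f}  new {new_rating[n]:+.4f}")
```

Because the old rating already equals the raw score plus the last opposition score, each pass amounts to the raw score plus the freshly computed opposition score.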
The process can be repeated any number of times. In practice, you repeat it until the change in ratings from one iteration to the next is no longer significant (calculating the power ratings for one year's worth of basho takes approximately 30 iterations). Obviously, this is where having a computer helps!
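A minimal sketch of the repeat-until-stable loop is shown below. The stopping rule is an assumption, since the text only says the changes should no longer be significant: here the loop stops once the largest change falls below a hypothetical `tolerance`, with `max_iterations` as a safeguard in case a schedule does not settle down. The schedule is again the one inferred from the opposition scores.

```python
def power_ratings(raw, opponents, tolerance=1e-6, max_iterations=1000):
    """Repeat the update until no rating changes by more than `tolerance`.

    Returns the ratings and the number of iterations performed.
    """
    ratings = dict(raw)
    last_opposition = {name: 0.0 for name in raw}
    for iteration in range(1, max_iterations + 1):
        opposition = {
            name: sum(ratings[o] for o in faced) / len(faced)
            for name, faced in opponents.items()
        }
        # NewRating = OldRating + OppositionScore - LastOppositionScore
        updated = {
            name: ratings[name] + opposition[name] - last_opposition[name]
            for name in raw
        }
        change = max(abs(updated[name] - ratings[name]) for name in raw)
        ratings, last_opposition = updated, opposition
        if change < tolerance:
            return ratings, iteration
    return ratings, max_iterations


raw = {"A": 0.50, "B": 0.00, "C": 0.00, "D": 0.00, "E": -0.25, "F": -0.25}

# Assumed schedule, inferred from the opposition scores in the tables.
opponents = {
    "A": ["B", "C", "D", "E"],
    "B": ["A", "C", "D", "F"],
    "C": ["A", "B", "E", "F"],
    "D": ["A", "B", "E", "F"],
    "E": ["A", "C", "D", "F"],
    "F": ["B", "C", "D", "E"],
}

ratings, iterations = power_ratings(raw, opponents)
print(f"stopped after {iterations} iterations")
for name, rating in ratings.items():
    print(f"{name}: {rating:+.3f}")
```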
After a few more iterations, we get the final results:
| Rikishi | Power Rating |
|---|---|
| A | 0.457 |
| B | 0.043 |
| C | 0.000 |
| D | 0.000 |
| E | -0.207 |
| F | -0.293 |
Note that these numbers don't seem to correspond with the actual power rating values appearing here. That is because the values are scaled to a more "natural" range. The scaling process is detailed in the next section.
Sorry, under construction.