
Stats: Scoring the Crunchy Power Rankings

The Crunchy Power Rankings are really only useful if they help us predict future performance. So do they? Let's ride the Wayback Machine to 2012 to find out.

A couple of weeks ago I posted the first edition of the 2013 Crunchy Power Rankings. I'll post another update after this weekend's games, but today I wanted to write about some validation I did of the methodology.

The basic idea of the CPR rankings (yes, this is redundant, like MLS soccer and ATM machine) is to rank teams based on large-sample-size stats that correlate with long-term success rather than just goals and wins, which fluctuate significantly over the short term. To accomplish that, I find stats with high correlations, weight them appropriately, and aggregate them into a single number. That's all well and good for a final ranking, but it isn't as useful as I'd like if that ranking doesn't in turn have some predictive utility. The best sports statistics aren't simply backward-looking recaps of performance; they should provide some insight and help predict future performance as well.
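As a sketch of that aggregation step (the stat names and weights below are hypothetical, since the actual CPR inputs and weightings aren't listed in this post), each stat can be z-scored against the league and combined with a fixed weight:

```python
# Hypothetical stats and weights for illustration only; the real CPR
# uses its own set of correlated stats and weightings.
WEIGHTS = {"shot_ratio": 0.4, "possession": 0.3, "recoveries": 0.3}

def cpr_score(team, league):
    """Weighted sum of the team's z-scores for each stat, computed
    against the whole league's distribution of that stat."""
    score = 0.0
    for stat, weight in WEIGHTS.items():
        values = [t[stat] for t in league]
        mean = sum(values) / len(values)
        std = (sum((v - mean) ** 2 for v in values) / len(values)) ** 0.5
        if std > 0:
            score += weight * (team[stat] - mean) / std
    return score

# Made-up three-team league for illustration.
league = [
    {"shot_ratio": 1.2, "possession": 55.0, "recoveries": 300},
    {"shot_ratio": 1.0, "possession": 50.0, "recoveries": 280},
    {"shot_ratio": 0.8, "possession": 45.0, "recoveries": 260},
]
ranked = sorted(league, key=lambda t: cpr_score(t, league), reverse=True)
```

Sorting the league by this single number produces the kind of aggregate ranking described above.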

To test that CPR is actually doing that, I went back to last season and took the standings on July 4 (a nicely patriotic cutoff point and roughly the halfway point of the season). Note I use points-per-game for the standings rather than raw points, since teams had played different numbers of games. Then I also calculated the CPR rankings as of that date (I didn't actually publish a CPR that week, but I can calculate them retroactively). Because I changed the formula this season by shifting some weights and adding the Recoveries stat, I actually calculated the retroactive CPR twice: once with the old weightings and once with the new. I'll call the latter CPR 2.0 for the purposes of this evaluation.
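Points-per-game is the fairer mid-season comparison when teams have played different numbers of games. A quick sketch with made-up records (not the real 2012 standings):

```python
# (team, points, games played) -- made-up records for illustration.
teams = [("A", 25, 17), ("B", 24, 16), ("C", 25, 18)]

# Rank by points per game rather than raw points: team C has as many
# raw points as team A but more games played, so it ranks lower.
standings = sorted(teams, key=lambda t: t[1] / t[2], reverse=True)
for rank, (name, pts, gp) in enumerate(standings, 1):
    print(rank, name, round(pts / gp, 2))
# → 1 B 1.5 / 2 A 1.47 / 3 C 1.39
```

Note that by raw points A and C would tie for first; PPG puts B on top instead.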

Here are the teams ranked by each of those measures. Note the big disparities with New England, LA, DC, Colorado, and Vancouver, which I wrote a lot about last year. Adding Recoveries in CPR 2.0 drops New England from the top of the old CPR to just below their standings rank, while Vancouver and DC sit well below their standings rank in both versions and both LA and Colorado sit well above.

2012 First-Half Rankings
Team             PPG   CPR   CPR 2.0
San Jose           1     4      2
Kansas City        2     3      1
DC United          3    14     13
New York           4     9     11
Real Salt Lake     5     5      8
Vancouver          6    17     15
Chicago            7    11     12
Seattle            8     6      3
Columbus           9    10      6
Houston           10     7      5
Chivas            11    16     18
New England       12     1     16
Portland          13    18     19
Colorado          14     2      4
Los Angeles       15     8      7
Montreal          16    12     10
Philadelphia      17    19     17
FC Dallas         18    15      9
Toronto           19    13     14

Then I calculated the teams' second-half PPG standings to determine how well each of the three inputs (the actual standings, the old CPR, and CPR 2.0) predicted the teams' second-half performances. The question is simple: if you want to predict a team's second-half performance, are you better off looking at the current standings or at one of the CPR implementations?

Here are the teams in the same order with their second-half standings and the difference between that standings rank and each of the three candidate metrics.

Team             2nd-Half PPG   1st-Half Diff   CPR Diff   CPR 2.0 Diff
San Jose               2             -1             2            0
Kansas City            4             -2            -1           -3
DC United              9             -6             5            4
New York              10             -6            -1            1
Real Salt Lake         7             -2            -2            1
Vancouver             18            -12            -1           -3
Chicago                5              2             6            7
Seattle                3              5             3            0
Columbus              11             -2            -1           -5
Houston                8              2            -1           -3
Chivas                19             -8            -3           -1
New England           14             -2           -13            2
Portland              16             -3             2            3
Colorado              15             -1           -13          -11
Los Angeles            1             14             7            6
Montreal               6             10             6            4
Philadelphia          13              4             6            4
FC Dallas             12              6             3           -3
Toronto               17              2            -4           -3
RMSE                                26.3          24.1         18.4
Corr                                0.39          0.49         0.70

For each metric I also calculated the Root Mean Squared Error, a standard measure of prediction error, across all the teams. (Strictly, the RMSE row above is the root of the summed squared rank differences rather than the mean, but since all three metrics cover the same 19 teams, the comparison comes out identical either way.) For that, a lower value (less error) is better. You can see that last year's CPR was somewhat better than just looking at the standings, and the improved version is significantly better. I also calculated the Pearson correlation between the predicted and actual rankings, which measures how closely the predicted ordering tracks the actual one. There you want a higher value, and the difference is pretty stark. Roughly speaking, if you wanted to predict teams' second-half results last year, the first-half standings got you about 40% of the way there, the old CPR got you about halfway, and the new CPR gets you 70% of the way there.
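Both numbers can be reproduced straight from the tables above. A minimal sketch: the RMSE row matches the root of the summed squared rank differences, and the correlation row matches Spearman's rho, which for tie-free rankings equals the Pearson correlation computed on the rank values themselves.

```python
import math

# Rank data transcribed from the tables above:
# (team, second-half PPG rank, first-half PPG rank, CPR rank, CPR 2.0 rank)
rows = [
    ("San Jose",        2,  1,  4,  2),
    ("Kansas City",     4,  2,  3,  1),
    ("DC United",       9,  3, 14, 13),
    ("New York",       10,  4,  9, 11),
    ("Real Salt Lake",  7,  5,  5,  8),
    ("Vancouver",      18,  6, 17, 15),
    ("Chicago",         5,  7, 11, 12),
    ("Seattle",         3,  8,  6,  3),
    ("Columbus",       11,  9, 10,  6),
    ("Houston",         8, 10,  7,  5),
    ("Chivas",         19, 11, 16, 18),
    ("New England",    14, 12,  1, 16),
    ("Portland",       16, 13, 18, 19),
    ("Colorado",       15, 14,  2,  4),
    ("Los Angeles",     1, 15,  8,  7),
    ("Montreal",        6, 16, 12, 10),
    ("Philadelphia",   13, 17, 19, 17),
    ("FC Dallas",      12, 18, 15,  9),
    ("Toronto",        17, 19, 13, 14),
]

actual = [r[1] for r in rows]

def root_sq_error(pred, actual):
    # Root of the summed squared rank differences (the RMSE row above).
    return math.sqrt(sum((p - a) ** 2 for p, a in zip(pred, actual)))

def rank_corr(pred, actual):
    # Spearman's rho for two tie-free rankings; identical to Pearson
    # correlation computed on the rank values.
    n = len(actual)
    d2 = sum((p - a) ** 2 for p, a in zip(pred, actual))
    return 1 - 6 * d2 / (n * (n * n - 1))

for label, col in (("Standings", 2), ("CPR", 3), ("CPR 2.0", 4)):
    pred = [r[col] for r in rows]
    print(f"{label:10s} RMSE {root_sq_error(pred, actual):4.1f}  "
          f"Corr {rank_corr(pred, actual):.2f}")
# → RMSE 26.3 / 24.1 / 18.4 and Corr 0.39 / 0.49 / 0.70, matching the table.
```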

I also highlighted some Significant Deviations, which I arbitrarily defined as being 8 or more standings places off. The first-half standings had 4, with Vancouver and Chivas significantly underperforming their first-half results in the second half and Los Angeles and Montreal overperforming. Last year's CPR had only 2, predicting that New England and Colorado would be significantly better than they actually were. And the modified CPR, which fixes the New England deviation by adding Recoveries, has only one: Colorado.
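The Significant Deviation flag is just a threshold on the absolute rank difference. A small sketch using the first-half-standings diff column from the table above:

```python
# First-half-standings diffs from the table (predicted rank minus actual).
diffs = {
    "San Jose": -1, "Kansas City": -2, "DC United": -6, "New York": -6,
    "Real Salt Lake": -2, "Vancouver": -12, "Chicago": 2, "Seattle": 5,
    "Columbus": -2, "Houston": 2, "Chivas": -8, "New England": -2,
    "Portland": -3, "Colorado": -1, "Los Angeles": 14, "Montreal": 10,
    "Philadelphia": 4, "FC Dallas": 6, "Toronto": 2,
}

THRESHOLD = 8  # "8 or more standings places off"
significant = sorted(team for team, d in diffs.items() if abs(d) >= THRESHOLD)
print(significant)  # → ['Chivas', 'Los Angeles', 'Montreal', 'Vancouver']
```

Those are exactly the 4 first-half-standings deviations called out above; running the same filter on the CPR and CPR 2.0 columns yields the 2 and 1 deviations respectively.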

Why exactly Colorado deviated so much from the CPR-predicted results is probably worth some digging, and maybe a future article, but the Rapids aside it looks like CPR (particularly the updated version) is a big improvement over the standings for predicting future performance. In particular, CPR "called" Vancouver and Chivas' late-season stumbles and LA and Montreal's surges into the playoff races. Of course, it's only one season of data and you can't draw too many conclusions from that sample, but for now I'm confident that CPR is providing enough insight to be worth maintaining.
