The relationship RPM and other metrics have with predictive wins
Every year on the APBR forums, there is a major competition for NBA team win prediction. The contest is simple: each entrant tries to guess the total wins for every NBA team. Last year’s competition had 23 entrants, counting some involuntary ones like FiveThirtyEight and Vegas. This competition plays a large part in determining which statistics the stats community at large regularly uses. It is the reason RPM and PT-PM are the predominant publicly available statistics.
But to understand why this is such a big deal in determining the value of statistics, you have to look back to 2013. Neil Paine, now of FiveThirtyEight, more or less shortcut the whole process. He went back to prior years to see which stats best predicted the next season if you already knew the minute distribution. Paine’s findings are arguably the most cited post on APBR ever. This was an extremely important study for a number of reasons. First, it gave us a standard for out-of-sample validation (testing the quality of a model on a sample other than the one used to create it) of the aforementioned statistics and similar ones. Second, it corrected a problem that prior-year (and future-year) competitions had when viewed in isolation.
Most of the time, the winner of a given competition was the person who had the best picture of how minutes would be split, and some luck played a role as well. Paine’s approach eliminated that non-metric portion by giving the test advance knowledge of the minutes players would actually play. Using a larger sample also strengthened the test heavily by minimizing the randomness from year-to-year outliers.
The independent variable here is, for each team-season and each statistic, the team average of its players’ prior-season values weighted by the minutes those players went on to play the next year. So the 2015 Chicago Bulls and the 2014 Dallas Mavericks, for example, each have an entry for BPM, Win Shares, PER, RAPM, and RPM. PT-PM, as much as I would’ve liked to test it, was omitted from this study because the formula has changed over the last few years. Moreover, the data isn’t readily available for any season other than the last two. That cuts the sample too small for me to be comfortable relying on any results it generates.
Players with fewer than 250 minutes in the prior season, and those without a recorded value for a statistic, are assigned the average value for that statistic. These averages are 0.00 for RPM, RAPM, and BPM, .100 for WS/48, and 15 for PER. These stipulations are designed to mirror the original test and allow comparison between statistics on relatively equal grounds. One thing to note is that RPM omits players who were waived and not re-signed from its rankings. This occasionally matters when, say, Mario Chalmers tears his Achilles after playing 1373 minutes for the Heat and Grizzlies combined. Most of those cases are covered by the 250-minute threshold, but some are not. And of course, there are rookies, whom this method likely overrates.
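As a rough illustration of the team-level input described above, here is a minimal Python sketch. The function name, the dictionary keys, and the sample roster are all made up for the example; only the 250-minute threshold and the league-average fill values come from the text.

```python
# League-average fill values and the minutes threshold are taken from the text.
# Everything else (names, keys, roster numbers) is hypothetical.
DEFAULTS = {"RPM": 0.0, "RAPM": 0.0, "BPM": 0.0, "WS48": 0.100, "PER": 15.0}
MIN_PRIOR_MINUTES = 250


def team_rating(players, stat):
    """Minutes-weighted average of one statistic for one team-season.

    Each player is a dict with these assumed keys:
      prior_minutes: minutes played in the prior season
      prior_value:   prior-season value of the statistic, or None if unrecorded
      next_minutes:  minutes actually played in the season being predicted
    """
    weighted_sum = 0.0
    total_minutes = 0.0
    for p in players:
        value = p["prior_value"]
        # Low-minute players and players with no recorded value get the average.
        if value is None or p["prior_minutes"] < MIN_PRIOR_MINUTES:
            value = DEFAULTS[stat]
        weighted_sum += value * p["next_minutes"]
        total_minutes += p["next_minutes"]
    if total_minutes == 0:
        raise ValueError("team has no minutes to weight by")
    return weighted_sum / total_minutes


# Made-up roster: one established player, one low-minute player, one rookie.
example_roster = [
    {"prior_minutes": 2000, "prior_value": 4.0, "next_minutes": 3000},
    {"prior_minutes": 100, "prior_value": -6.0, "next_minutes": 1000},
    {"prior_minutes": 0, "prior_value": None, "next_minutes": 1000},
]
example_bpm = team_rating(example_roster, "BPM")
```

In this toy roster the low-minute player's -6.0 is replaced by the league average, which is the kind of substitution that can flatter rookies and fringe players.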
The dependent variable here is the next-year Win Percentage of the teams. As a sanity check, I also looked at next-year Pythagorean Win Percentage, to avoid a statistic accidentally benefiting from randomness.
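For readers unfamiliar with it, Pythagorean Win Percentage estimates a team's win rate from points scored and allowed. The sketch below uses the standard form of the formula. The exponent of 14 is an assumption (values around 14 are common for the NBA), since the exponent used in this study isn't stated, and the function name is only illustrative.

```python
def pythagorean_win_pct(points_for, points_against, exponent=14.0):
    """Expected win percentage from points scored and points allowed.

    Equivalent to PF**k / (PF**k + PA**k), rewritten with a ratio so the
    intermediate numbers stay small. The default exponent is an assumption.
    """
    if points_for <= 0 or points_against <= 0:
        raise ValueError("point totals must be positive")
    ratio = points_against / points_for
    return 1.0 / (1.0 + ratio ** exponent)
```

Because it is built from point differential rather than game results, it is less sensitive to a team's luck in close games, which is why it works as a sanity check on actual Win Percentage.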
The sample for the test is the last three year groupings (i.e. ’13-’14 predicting ’14-’15, ’14-’15 predicting ’15-’16, and ’15-’16 predicting ’16-’17), which constitutes all possible pairings for which RPM is available. Because the fit scaled somewhat with additional years’ worth of data, the earlier seasons available for the non-RPM statistics were omitted for the sake of fairness to RPM. The effect of the independent variable on the dependent variable was determined by OLS regression, and the fit of the model was measured by R². This is because we’re aiming for the percent of the variation in the data described rather than pure statistical significance, and because each model has the same number of independent variables, so there is no need for adjusted R².
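The regression step can be sketched in a few lines of plain Python. This is a minimal single-predictor OLS fit written out by hand, not the code behind the table. The function name and the toy numbers are hypothetical.

```python
def ols_r_squared(x, y):
    """Fit y = intercept + slope * x by ordinary least squares.

    Returns (slope, intercept, r_squared), where r_squared is the share of
    the variation in y that the fitted line describes.
    """
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equally sized samples with at least 2 points")
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    # Residual and total sums of squares give R^2 = 1 - SS_res / SS_tot.
    ss_res = sum((yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, y))
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    return slope, intercept, 1.0 - ss_res / ss_tot


# Toy data only: team ratings (x) against next-year win percentage (y).
toy_ratings = [-2.0, 0.0, 2.0, 4.0]
toy_win_pct = [0.3, 0.5, 0.7, 0.9]
toy_fit = ols_r_squared(toy_ratings, toy_win_pct)
```

With one predictor per model and the same team-seasons in every fit, adjusted R² would rank the statistics in the same order as plain R², so comparing the unadjusted values is enough.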
[table id=6 /]
Based on this data, we can conclude the following things:
- RPM is clearly king among these metrics, beating even Vanilla RAPM. This wasn’t assured, given that some changes to the publicly available RPM were designed to make the statistic descriptive rather than predictive.
- Over a sample as large as the original study, it doesn’t seem implausible that RPM would explain about 80% of the variation in next year’s wins, given that RAPM was at 75% previously with a 12-year sample. Based on this, it should always be used over the other four statistics in this post, assuming that RPM is actually available for that year.
- BPM is, similarly, the king of the metrics that don’t require any lineup data. This was something widely assumed, given that BPM replaced ASPM from the previous study. Validating that information is very important as well.
- For player evaluation, which is what these stats are so frequently used for, there is no reason ever to use Win Shares or PER. RPM and BPM are simply better. In fact, BPM is good enough that it beats “pure” lineup data in Vanilla RAPM, though that’s by a statistically insignificant margin.
Another interesting finding is just how divergent RAPM is in predicting actual wins and Pythagorean expectation, going from right around as valuable as BPM to barely ahead of Win Shares. That indicates a bit of randomness associated with the relatively restricted sample size that I had available to me. Given the previous studies, I’m content to assume that the number right around BPM is the one that should be more directly valued, since an estimator should not outperform the actual number.
Ultimately, this test more or less validates what most people were already doing with their stats work. Using RPM and BPM in their respective contexts was generally perceived as the correct path already, and this provides a mathematical backing for that practice. We still don’t have validation of tracking-data-based statistics, though studies determining their statistical significance have been performed.