How significant is NBA tracking data? That is now the question at hand
Over the past few years, NBA.com’s tracking data has been one of the most useful tools for NBA fans, writers, podcasters, etc. It adds context beyond the standard counting statistics. We now know the percentage of contested rebounds, the number of pull-up shot attempts, the number of assists generated off drives to the basket, and so forth.
These additional details bring forth truly impressive statistics and facts, such as the 2016-17 Golden State Warriors having three players in the Top 10 of catch-and-shoot efficiency at high volume: Stephen Curry is 1st with a 67.4 eFG% on 5.1 FGA, Kevin Durant is 5th with a 63.7 eFG% on 4.7 FGA, and Klay Thompson is 9th with a 62.1 eFG% on a whopping 9.3 FGA. Thompson led the NBA in catch-and-shoot FGA and catch-and-shoot points per game. No wonder the Warriors’ offense was so potent and unstoppable this season.
Earlier in the year, I performed a study on different shot types using tracking data in the context of the New York Knicks. With the season now complete, I wanted to improve upon that study and take a deeper dive into NBA tracking data to determine how well it explains the variation in a team’s win total. If you are not familiar with my previous analytical work, my methodology, or robust Random-effects GLS regression and would prefer to get familiar before diving into the meat of the article, do not hesitate to check out two of my previous articles, on shooting zones and on general NBA data.
The goal of this study is to determine how much of the variation in a team’s win total (the study’s dependent variable) key tracking data variables explain, as well as which specific variables are statistically significant when holding the other variables constant. The data was gathered from NBA.com and consists of team tracking data from the 2013–14 season to the 2016–17 season (four NBA seasons in total). The independent variables of the analysis are listed below:
- Drive True Shooting Attempts
- Drive True Shooting Percentage
- Drive Assist Percentage
- Elbow True Shooting Attempts
- Elbow True Shooting Percentage
- Elbow Assist Percentage
- Paint True Shooting Attempts
- Paint True Shooting Percentage
- Paint Assist Percentage
- Catch & Shoot Field Goal Attempts
- Catch & Shoot Effective Field Goal Percentage
- Pull Up Field Goal Attempts
- Pull Up Effective Field Goal Percentage
- Contested Offensive Rebounding Percentage
- Contested Defensive Rebounding Percentage
- Average Offensive Speed
- Average Defensive Speed
- Defended Field Goal Attempts
- Defended Field Goal Percentage
- Each NBA season
The logic behind the selection of these variables is to capture the key, fundamental aspects of basketball, especially in the current era of the NBA: shooting volume and efficiency, shot selection, passing, rebounding, defense, and speed/pace. There are a few things to note:
- NBA tracking data does not include true shooting statistics, so I calculated them myself using the formulas provided on Basketball-Reference, which account for free throw shooting.
- Instead of using the general tracking passing statistics to measure the effects of assists, I wanted to make the study a bit more granular to determine the effects of assists from specific areas and actions, such as passes out of the elbow and off drives.
- Figure One details the results of four correlation matrices used to check for multicollinearity, which occurs when multiple variables are highly correlated and measure the same phenomenon. If you look at the top two charts, you will see that Post Touches and Post True Shooting Attempts are both highly correlated with their respective Paint counterparts. Because of this, and because the Paint variables are less correlated with the Elbow variables than the Post variables are, I selected the Paint variables to combat multicollinearity.
- Shots made are highly correlated with shot attempts, so there would be multicollinearity issues if the regression included both. As with the previous point, and as presented in the bottom two charts in Figure One, I selected the Attempts variables over the Made variables because they capture the same phenomenon: the more shots you take, the more shots you make.
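For readers who want to reproduce the true shooting columns, here is a minimal Python sketch of the Basketball-Reference formulas: true shooting attempts are FGA + 0.44 × FTA, and TS% is points divided by twice that. The function names and the example drive numbers are hypothetical; NBA.com's drive, elbow, and paint splits would supply the real inputs.

```python
def true_shooting_attempts(fga: float, fta: float) -> float:
    """Basketball-Reference TSA: field goal attempts plus 0.44 per free throw attempt."""
    return fga + 0.44 * fta


def true_shooting_pct(pts: float, fga: float, fta: float) -> float:
    """Basketball-Reference TS%: points per two true shooting attempts."""
    tsa = true_shooting_attempts(fga, fta)
    if tsa == 0:
        raise ValueError("no shot attempts")
    return pts / (2 * tsa)


# Hypothetical per-game drive numbers for one team (not real NBA.com data).
drive_pts, drive_fga, drive_fta = 20.0, 16.0, 5.0
print(round(true_shooting_attempts(drive_fga, drive_fta), 2))          # 18.2
print(round(true_shooting_pct(drive_pts, drive_fga, drive_fta), 3))    # 0.549
```

The 0.44 weight approximates the share of free throw attempts that end a possession, which is why free throws are folded into the attempt count rather than ignored.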
Figure One: Correlation Results
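The screening step behind Figure One can be approximated in a few lines of Python: compute the Pearson correlation for every pair of candidate variables and flag pairs whose absolute correlation exceeds a cutoff. The 0.8 cutoff, the helper names, and the four sample values per column are illustrative assumptions, not the study's actual threshold or data.

```python
from itertools import combinations
from math import sqrt


def pearson(xs: list[float], ys: list[float]) -> float:
    """Pearson correlation coefficient of two equal-length samples."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


def flag_collinear(columns: dict[str, list[float]],
                   threshold: float = 0.8) -> list[tuple[str, str, float]]:
    """Return every pair of columns whose absolute correlation meets the threshold."""
    flagged = []
    for a, b in combinations(columns, 2):
        r = pearson(columns[a], columns[b])
        if abs(r) >= threshold:
            flagged.append((a, b, round(r, 3)))
    return flagged


# Made-up per-game team values over four seasons.
columns = {
    "paint_fga": [30.1, 34.5, 28.7, 36.2],
    "paint_fgm": [16.0, 18.9, 14.8, 20.1],
    "elbow_fga": [5.2, 3.1, 4.0, 6.8],
}
print(flag_collinear(columns))  # only the paint attempts/makes pair is flagged
```

Keeping one variable from each flagged pair (here, attempts over makes) mirrors the choice described in the notes above.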
Below are the results of a robust Random-effects GLS regression performed on the aforementioned variables. The table does not list the results for each NBA season because that variable was a control variable and its results are not relevant to the study; the tracking data variables are the relevant pieces of information.
Figure Two: Robust Random-Effects Regression Results (Year Results Not Shown)
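As background on what a random-effects GLS estimator does, here is a minimal sketch of the standard quasi-demeaning step for a balanced panel (for example, 30 teams observed over 4 seasons, assuming teams are the panel units). Each observation has a fraction θ of its team's mean subtracted, with θ = 1 − sqrt(σ_e² / (σ_e² + T·σ_u²)), and ordinary least squares is then run on the transformed data. The variance components are treated as known inputs here, whereas real software estimates them, and the "robust" standard errors are not shown. This is a generic textbook illustration with hypothetical numbers, not the exact routine behind Figure Two.

```python
from math import sqrt


def re_theta(sigma_e2: float, sigma_u2: float, t_periods: int) -> float:
    """Random-effects GLS weight for a balanced panel with T periods per unit."""
    return 1.0 - sqrt(sigma_e2 / (sigma_e2 + t_periods * sigma_u2))


def quasi_demean(panel: dict[str, list[float]], theta: float) -> dict[str, list[float]]:
    """Subtract theta times each unit's (team's) mean from every observation of that unit."""
    out = {}
    for unit, values in panel.items():
        mean = sum(values) / len(values)
        out[unit] = [v - theta * mean for v in values]
    return out


# Hypothetical variance components and win totals for two teams over four seasons.
theta = re_theta(sigma_e2=36.0, sigma_u2=27.0, t_periods=4)
wins = {"team_a": [48.0, 52.0, 50.0, 54.0], "team_b": [30.0, 26.0, 28.0, 32.0]}
print(theta)                       # 0.5
print(quasi_demean(wins, theta))   # team_a -> [22.5, 26.5, 24.5, 28.5]
```

Every regressor (including the constant, which becomes 1 − θ) gets the same transformation. Setting θ = 0 reproduces pooled OLS, and θ approaching 1 approaches the fixed-effects (within) estimator, so random effects sits between the two.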
The overall R² for this regression is 0.7803, meaning the twenty independent variables explain 78.03% of the variation in the Team Wins dependent variable. Of those independent variables, eleven are statistically significant at varying confidence levels:
Figure Three: Statistically Significant Variables & Their Confidence Intervals From Figure Two
Of these eleven variables, the z-statistics of Elbow True Shooting Attempts (-4.33) and Defended Field Goal Attempts (-1.73) stand out the most. Both values are negative, meaning that the more shots a team attempts from the elbow and the more defended shot attempts it faces, the fewer games it tends to win. Regarding shots from the elbow, this analysis provides further support and context for the theory of “Morey Ball,” coinciding with the findings of my previous study on shooting zones.
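A z-statistic is the coefficient divided by its (robust) standard error, so its size maps onto a confidence level through the standard normal distribution. Here is a minimal standard-library sketch of that mapping, applied to the z-statistics quoted in this article; the function names are illustrative.

```python
from math import erfc, sqrt


def two_sided_p(z: float) -> float:
    """Two-sided p-value for a z-statistic under the standard normal distribution."""
    return erfc(abs(z) / sqrt(2))


def significance_level(z: float) -> str:
    """Highest conventional confidence level (99/95/90%) at which z is significant."""
    p = two_sided_p(z)
    for level, alpha in (("99%", 0.01), ("95%", 0.05), ("90%", 0.10)):
        if p < alpha:
            return level
    return "not significant"


# z-statistics quoted in this article.
for name, z in [("Elbow TSA", -4.33), ("Defended FGA", -1.73),
                ("Paint TSA, first model", 2.55), ("Paint TSA, second model", 2.79)]:
    print(f"{name}: z = {z}, p = {two_sided_p(z):.4f}, {significance_level(z)}")
```

The sign of z only tells you the direction of the relationship; significance depends on its absolute value, which is why -1.73 clears the 90% bar while 2.55 and 2.79 land at 95% and 99%.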
Essentially, shots from the elbow are bad and teams should not take many of them. It’s no coincidence that three of the Top 5 NBA teams in total Elbow FGA this past season had losing records: the New York Knicks (1st), the New Orleans Pelicans (3rd), and the Sacramento Kings (5th). Meanwhile, four of the Bottom 5 teams on that list had 50-plus wins: the Houston Rockets (30th), the Toronto Raptors (29th), the Boston Celtics (28th), and the Cleveland Cavaliers (27th). Granted, one of those Top 5 teams is the Golden State Warriors; however, they are elite in so many other facets that it makes up for their shot attempt total at the elbow.
Regarding Defended Field Goal Attempts, the variable’s negative z-statistic does not mean either “defending shots is bad” or “allowing open shots is good.” As detailed in this Nylon Calculus article from 2015, the “defended” part of Defended Field Goal Attempts is, for all intents and purposes, statistical noise and not a good perimeter defense metric. What the Defended Field Goal Attempts variable is actually telling us is that the fewer shot attempts a team gives up, the better its chances of winning; whether or not the shot was defended is irrelevant, and similar logic applies to the Defended Field Goal Percentage variable.
It should be noted that the defensive tracking statistics are better suited to shots at the rim than to perimeter shots, because contesting a layup does actually affect the difficulty of the shot, whereas whether a perimeter shot goes in is more random.
Another interesting aspect of this regression is the results for the two Speed variables. Both Average Offensive Speed and Average Defensive Speed have very low z-statistics. They do not have the lowest values (some of the assist variables are lower), but passing needs to be controlled for when discussing what influences winning, so the assist variables stay in the model while the speed variables do not. I removed the speed variables and ran another regression without them; the results are below.
Figure Four: Robust Random-Effects Regression Results (Year Results Not Shown)
There were no drastic changes to the results, but rather subtle ones that are quite important. The overall R² for this second regression is 0.778, meaning the eighteen independent variables explain 77.8% of the variation in the Team Wins dependent variable. Of those independent variables, twelve are statistically significant at varying confidence levels:
Figure Five: Statistically Significant Variables & Their Confidence Intervals From Figure Four
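Both regressions summarize fit with an overall R² (0.7803 and 0.778). Assuming the convention used by Stata's xtreg for random-effects models, where the overall R² is the squared correlation between actual and fitted values, the calculation can be sketched as below; for an OLS fit with an intercept this equals the familiar 1 − SS_res/SS_tot. The win totals are made up for illustration.

```python
def overall_r2(actual: list[float], fitted: list[float]) -> float:
    """Squared Pearson correlation between actual and fitted values."""
    n = len(actual)
    ma, mf = sum(actual) / n, sum(fitted) / n
    saf = sum((a - ma) * (f - mf) for a, f in zip(actual, fitted))
    saa = sum((a - ma) ** 2 for a in actual)
    sff = sum((f - mf) ** 2 for f in fitted)
    return saf ** 2 / (saa * sff)


# Hypothetical actual and fitted win totals for four team-seasons.
actual_wins = [60.0, 41.0, 25.0, 50.0]
fitted_wins = [55.0, 45.0, 30.0, 46.0]
print(f"{overall_r2(actual_wins, fitted_wins):.2%} of the variation explained")
```

Dropping regressors can only lower (or leave unchanged) the in-sample fit of a least-squares model, which is consistent with the small dip from 0.7803 to 0.778 after removing the two speed variables.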
The new statistically significant variable in this second regression is Paint True Shooting Percentage, significant at the 90% confidence level. What is interesting about this regression is that, despite the overall R² decreasing slightly, the z-statistics for the statistically significant variables increase slightly. For example, in the first regression the Paint True Shooting Attempts variable is significant at the 95% confidence level with a z-statistic of 2.55, whereas in the second regression it is significant at the 99% level with a z-statistic of 2.79.
In both regressions, the most intriguing variables are the different assist percentage variables. We know from my analysis of general NBA data that assists are statistically significant and contribute to winning basketball. Yet the only statistically significant assist variable of the three in both regressions is Elbow Assist Percentage, while Drive Assist Percentage has the lowest z-statistic in both regressions.
What makes the statistical insignificance of Drive Assist Percentage interesting is that when you isolate it in a regression and control only for defense, the variable becomes statistically significant. So, what the heck is occurring here? Essentially, when you add more explanatory variables to a regression to control for more phenomena, some variables explain the variation in the dependent variable better than others and absorb the explanatory power that overlapping variables had on their own. That happens to be the case for generating assists off drives to the basket.
Other regressions show different variables, like Drive True Shooting Percentage, to be statistically significant, with a stronger relationship to a team’s win total that holds up even as more control variables are added. When it comes to driving, shooting a high percentage is more important than generating an assist.
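The mechanism described above, where a variable looks important on its own but fades once a correlated variable enters the model, can be reproduced with a toy example. In the made-up data below, x2 stands in for something like Drive True Shooting Percentage and fully determines the outcome, while x1, a hypothetical stand-in for Drive Assist Percentage, merely moves with x2. To keep it short, the sketch compares OLS coefficients rather than z-statistics; it illustrates the general idea and is not a reconstruction of the study's data.

```python
def centered_cross(xs: list[float], ys: list[float]) -> float:
    """Sum of cross-products of deviations from the mean."""
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    return sum((x - mx) * (y - my) for x, y in zip(xs, ys))


def simple_slope(x: list[float], y: list[float]) -> float:
    """OLS slope of y on x alone (with an intercept)."""
    return centered_cross(x, y) / centered_cross(x, x)


def two_regressor_slopes(x1: list[float], x2: list[float],
                         y: list[float]) -> tuple[float, float]:
    """OLS slopes for y = a + b1*x1 + b2*x2, solved from the 2x2 normal equations."""
    s11, s22, s12 = centered_cross(x1, x1), centered_cross(x2, x2), centered_cross(x1, x2)
    s1y, s2y = centered_cross(x1, y), centered_cross(x2, y)
    det = s11 * s22 - s12 ** 2
    b1 = (s22 * s1y - s12 * s2y) / det
    b2 = (s11 * s2y - s12 * s1y) / det
    return b1, b2


# Toy data (made up): x2 drives the outcome; x1 only tags along with x2.
x2 = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
x1 = [v + d for v, d in zip(x2, [0.5, -0.5, 0.5, -0.5, 0.5, -0.5])]
y = [2.0 * v for v in x2]

print(simple_slope(x1, y))              # 2.0: x1 looks important on its own
print(two_regressor_slopes(x1, x2, y))  # (0.0, 2.0): x1 adds nothing once x2 is in
```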
Elbow Assist Percentage being statistically significant at the 90% confidence level in both analyses may very well be the most interesting finding in this study, given how detrimental shooting from the elbow is to a team’s chances of winning. The elbow may be a poor spot to shoot from, but it is a prime spot for teams to generate assists.
The spacing allows teams to get the catch & shoot looks and shots in the paint that, per the findings of this study, are vital to a team’s success. Which two teams had the highest assist percentage from the elbow this past season? The Golden State Warriors and the Cleveland Cavaliers. Which two teams had the best effective field goal percentage in catch & shoot situations this past season? The Golden State Warriors and the Cleveland Cavaliers. Who were two of the Top 5 teams in field goal percentage in the paint this past season? I think you catch my drift.
The last remaining question is: what should the main takeaways from this study be? The NBA is currently in a “Golden Age,” so to speak, of using analytics to dissect and understand the game of basketball. Being able to run a regression, or any sort of test for that matter, on different shot types, passing from different areas of the court, contested rebounding, and so on to determine which aspects matter more than others allows fans to have a greater appreciation for the game.
Knowing the impact that catch & shoot efficiency and volume have on winning makes for a more in-depth and fun discussion among fans about their team’s strengths and weaknesses. Furthermore, tracking data is extremely useful because these rather specific aspects of basketball explain almost 80% of the variation in a team’s win total over the past four seasons. Hopefully, NBA.com will continue to provide tracking data for years to come, because it would be fun to see the results of a robust Random-effects GLS regression with ten years of tracking data instead of four.