In the (Most Efficient) Zone

We are told that “Moreyball” is the way NBA teams should play. Let’s see if the numbers support these claims.

Shooting has never been as important as it currently is in the NBA. If you’re a perimeter player in the league who cannot shoot, especially if you’re a guard, you better be able to contribute on defense at an elite level, create space via off-the-ball movement, and make smart, quick passes; otherwise, you’re going to be out of the league real quick. If you’re a big man who has 3-Point range, you should be able to be at least a rotational player, even if you’re not a good defender whatsoever. Essentially, there is a role for players in this league if they can consistently knock down shots from anywhere on the court.

And in an era where the Houston Rockets are averaging 40 3-Point field goal attempts per game, where a player shoots from on the court has never been more important than it is now. With the realization that 3 > 2, there has been more of an emphasis on three-point shooting than two-point shooting outside of the paint. The midrange, long two-point shot is now widely considered the most inefficient shot on the court. Why take a 20-foot shot only worth two points when you can step back just 2–4 feet (depending on where you are on the court) and take a shot worth three points? Take the risk on the extra point when the distance and difficulty of the shot aren’t that much greater.

This philosophical change in where to take shots continues to attract more and more supporters as this decade continues to progress. What it also does is force us to ask two larger questions:

  1. Does shooting in different zones on the court affect a team’s chances of winning a basketball game?
  2. And if so, which are the best zones?

To address these questions, I performed a random-effects GLS regression on the per-game shooting data provided on NBA.com from the 1996–97 to 2015–16 seasons, minus the two lockout seasons (I will explain why later), to determine which shooting zones impact a team’s win total for a given season.

Now, there may be a number of you reading this right now who are asking yourselves, “What the bleep is a random-effects GLS regression?” So, instead of just throwing out random numbers and leaving you confused, I’m going to explain what a regression analysis is, do a quick FAQ on related topics, and explain my methodology for this analysis. Following this will be the results of the study performed. If you are someone who is familiar with my work, the following FAQ will sound familiar, as much of it is from a previous article I wrote back in January about defensive zones. I made some tweaks to the explanations that hopefully will be easy to understand for everyone while maintaining the integrity of the definitions.

What is a Regression Analysis?

Unlike in a correlation, the decision of which variable you call X and which you call Y matters in regression, as you will get a different result if you swap the two variables. When you run a regression, you are trying to explain the variation in the dependent variable with a set of independent variables. Furthermore, a regression analysis allows you to control for other variables, whereas a correlation is only between two variables.
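To make the point about swapping X and Y concrete, here is a minimal sketch of a one-variable least-squares fit. The data points are hypothetical numbers made up for illustration, not figures from the study, and `ols_slope` is a helper name I chose.

```python
from statistics import mean


def ols_slope(x, y):
    """Slope of the least-squares line that predicts y from x."""
    x_bar, y_bar = mean(x), mean(y)
    cov_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    var_x = sum((xi - x_bar) ** 2 for xi in x)
    return cov_xy / var_x


# Hypothetical data: a shooting percentage (x) and season wins (y) for five teams.
fg_pct = [44.0, 45.0, 46.0, 47.0, 48.0]
wins = [30, 41, 38, 50, 52]

slope_wins_on_pct = ols_slope(fg_pct, wins)  # wins explained by shooting percentage
slope_pct_on_wins = ols_slope(wins, fg_pct)  # shooting percentage explained by wins

# If swapping X and Y did not matter, these two numbers would be equal.
print(slope_wins_on_pct)
print(1 / slope_pct_on_wins)
```

The two printed values only agree when the points fall exactly on a line, which is why the choice of dependent variable is part of the question being asked.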

What is a Random-effects GLS Regression?

This is a type of regression analysis used on “panel data,” which is something that needs a definition in its own right. Panel data is a dataset in which the behavior of entities is observed over a set time period, or across time generally. This type of data allows you to control for variables you cannot observe or measure like cultural factors, a difference in business practices across companies, or variables that change over time but not across entities. The performance of NBA teams since the 1996–97 season clearly fits this description.

There are two methods of analyzing panel data that I will discuss: fixed effects and random effects regression. Statistician Andrew Gelman provides five key definitions for the two effects that help clarify their differences. Here are the two simplest definitions:

(2) Effects are fixed if they are interesting in themselves or random if there is interest in the underlying population. Searle, Casella, and McCulloch (1992, Section 1.4) explore this distinction in depth.

(3) “When a sample exhausts the population, the corresponding variable is fixed; when the sample is a small (i.e., negligible) part of the population the corresponding variable is random.” (Green and Tukey, 1960)

Despite the study consisting of all NBA teams, the two sample sets only cover a fraction of all of the NBA seasons played. Furthermore, as Princeton University Data Services Specialist Oscar Torres-Reyna explains, “If you have reason to believe that differences across entities have some influence on your dependent variable then you should use random effects.”

To determine if the panel data needs a fixed effects or random effects model, a Hausman test needs to be performed. The key figure to look at for a Hausman test is the Prob > chi2 figure (in a Stata output); if the figure is less than 0.05, then use the results of a fixed effects regression. The Prob > chi2 figure for this specific study is 0.9944, a figure that is significantly larger than 0.05 and indicates that the results of the random effects regression model are the results to use.
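The decision rule above can be sketched in a few lines. This is a simplified illustration, not the Stata procedure used for the study: `hausman_scalar` handles only the special case of a single coefficient (so the statistic has one degree of freedom), and both function names and the coefficient values in the example are hypothetical. Only the 0.05 threshold and the 0.9944 figure come from the article.

```python
import math


def hausman_scalar(b_fe, b_re, var_fe, var_re):
    """Hausman statistic and p-value for ONE coefficient (chi-square, 1 df).

    b_fe, b_re: fixed-effects and random-effects estimates of the coefficient.
    var_fe, var_re: their estimated variances (standard errors squared).
    """
    diff_var = var_fe - var_re
    if diff_var <= 0:
        raise ValueError("variance difference must be positive for this sketch")
    h = (b_fe - b_re) ** 2 / diff_var
    # Survival function of a chi-square with 1 degree of freedom.
    p_value = math.erfc(math.sqrt(h / 2))
    return h, p_value


def choose_panel_model(p_value, alpha=0.05):
    """Apply the rule from the text: small p-value means fixed effects."""
    return "fixed effects" if p_value < alpha else "random effects"


# Hypothetical estimates, for illustration only.
h, p = hausman_scalar(b_fe=1.40, b_re=1.37, var_fe=0.09, var_re=0.08)
print(h, p, choose_panel_model(p))

# The Prob > chi2 figure reported for this study.
print(choose_panel_model(0.9944))
```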

The Methodology

As previously mentioned, I collected shooting data by zone for the 1996–97 through 2015–16 seasons via NBA.com. I removed the lockout seasons from the study because a full 82-game season was not played. This would have skewed the results because the dependent variable of this study is a team’s Wins for the season. Playing fewer games means fewer wins.
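As a quick sanity check on the sample, the sketch below builds the list of seasons, drops the two lockout seasons (1998–99 and 2011–12), and counts team-seasons. The league sizes are my own assumption rather than something stated in the data source: 29 teams through 2003–04 and 30 teams from 2004–05 onward. Under that assumption the count comes to 533, which matches the number of observations reported in the results.

```python
# Shortened seasons that are excluded from the study.
LOCKOUT_SEASONS = {"1998-99", "2011-12"}


def season_label(start_year):
    """Turn 1996 into '1996-97', 1999 into '1999-00', and so on."""
    return f"{start_year}-{(start_year + 1) % 100:02d}"


def teams_in_league(start_year):
    """Assumed league size: 29 teams before 2004-05, 30 teams afterwards."""
    return 29 if start_year < 2004 else 30


all_start_years = range(1996, 2016)  # 1996-97 through 2015-16
kept_years = [y for y in all_start_years if season_label(y) not in LOCKOUT_SEASONS]
observations = sum(teams_in_league(y) for y in kept_years)

print(len(kept_years), "seasons kept")
print(observations, "team-seasons")
```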

The explanatory/independent variables of this study are the following:

  • Restricted Area Field Goal Attempts
  • Restricted Area Field Goal Percentage
  • In the Paint (non-RA) Field Goal Attempts
  • In the Paint (non-RA) Field Goal Percentage
  • Midrange Field Goal Attempts
  • Midrange Field Goal Percentage
  • Left Corner 3-Point Field Goal Attempts
  • Left Corner 3-Point Field Goal Percentage
  • Right Corner 3-Point Field Goal Attempts
  • Right Corner 3-Point Field Goal Percentage
  • Above the Break 3-Point Field Goal Attempts
  • Above the Break 3-Point Field Goal Percentage
  • Defensive Rating
  • Each NBA season in the study

A few things need to be addressed before we continue. First, the broad question being asked is: does where a team shoots on the court explain the variation in said team’s win total? Another way to phrase this question is: what zones on the court do teams need to shoot from in order to win? Second, there are more specific questions being asked, such as: when holding all other variables constant, how much variation does restricted area field goal percentage explain? Third, Defensive Rating is in this study to control for the other side of the game. Fourth, each NBA season is included in the regression to take into account the specific results of each season.
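Put together, the setup described above can be written as one equation. This is my own sketch of a standard random-effects specification and the notation is assumed, not taken from the regression output: i indexes teams, t indexes seasons, the x terms are the twelve zone attempt and percentage variables, δ is the season effect, u is the random team effect, and ε is the leftover error.

```latex
\text{Wins}_{it} = \beta_0 + \sum_{k=1}^{12} \beta_k \, x_{k,it}
                 + \gamma \, \text{DefRtg}_{it} + \delta_t + u_i + \varepsilon_{it}
```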

And fifth, if you noticed, I did not use field goals made, but rather field goal attempts. During preliminary testing, field goal attempts and field goals made were highly correlated with one another. If independent variables are highly correlated, it creates inaccurate results and messes up the entire regression analysis. I selected shot attempts over shots made because the shots made variables were more correlated with the shooting percentage variables than the shot attempts variables were.

Figure One: Correlation Results

The data above consists of two Pearson correlation matrices, one detailing the relationship among midrange field goals made, attempted, and percentage (MIDFGM, MIDFGA, & MIDFG%) and the other detailing the relationship among above the break field goals made, attempted, and percentage (ATBFGM, ATBFGA, & ATBFG%). A Pearson correlation is measured on a scale from -1 to +1, with +1 being a complete positive correlation, -1 being a complete negative correlation, and 0 being no correlation.

As you can see, the field goals made variables are more correlated with the field goal percentage variables than the field goal attempts variables are. When developing a study like this, you want each independent variable to measure a specific entity and not have multiple variables measuring that same entity. Having multiple variables measure the same entity creates an inaccurate regression analysis. Furthermore, the correlation between shots attempted and shots made is both positive and significant, meaning when shot attempts increase, so do shots made (and vice versa). Essentially, the field goal attempts variables represent both volume shooting and volume scoring, while the field goal percentage variables represent efficient scoring.
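For readers who want to run this kind of check themselves, here is a minimal sketch of a Pearson correlation and the three pairwise comparisons discussed above. The per-game numbers are hypothetical and will not reproduce Figure One; the variable names simply mirror the MIDFGA, MIDFGM, and MIDFG% labels.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient, a value between -1 and +1."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    cov = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    spread_x = math.sqrt(sum((xi - x_bar) ** 2 for xi in x))
    spread_y = math.sqrt(sum((yi - y_bar) ** 2 for yi in y))
    return cov / (spread_x * spread_y)


# Hypothetical per-game midrange numbers for five teams.
mid_fga = [20.0, 22.0, 24.0, 26.0, 28.0]     # attempts
mid_fg_pct = [0.42, 0.38, 0.41, 0.39, 0.40]  # percentage
mid_fgm = [a * p for a, p in zip(mid_fga, mid_fg_pct)]  # makes = attempts x percentage

print("attempts vs. makes:     ", pearson_r(mid_fga, mid_fgm))
print("makes vs. percentage:   ", pearson_r(mid_fgm, mid_fg_pct))
print("attempts vs. percentage:", pearson_r(mid_fga, mid_fg_pct))
```

Because makes are just attempts multiplied by percentage, a makes variable carries information from both of the other two, which is the overlap the study avoids by dropping it.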

The Results

A regression output is rather confusing for those who have never read one before, so I will provide both the most relevant information and simple explanations of what it means. Below is the table of information from the regression analysis:

Figure Two: Robust Random-Effects Regression Results (Year Results Not Shown)

There are a lot of numbers here, so let’s break down what’s going on:

  • 533 observations were analyzed in this regression.
  • R2 shows the amount of variance of Y (the dependent variable Wins) explained by X (the independent variables). For this study, the overall R2 value is 0.829, which means that the fourteen independent variables explain 82.9% of the variation of a team’s win total for that season.
  • Coefficients are the values of the independent variables indicating how much Y changes when X is increased by one unit. More on this in a bit.
  • The Robust Standard Error figure does two things. First, a standard error is a measure of the statistical accuracy of an estimate; the lower the value, the more accurate the estimate. Second, “Robust” means that the standard errors were adjusted for heteroskedasticity, which is when the size of the errors is not constant across observations.
  • The z-statistic is the coefficient divided by its standard error, and it is used to test whether a given coefficient is significantly different from zero. The threshold for a z-statistic to be statistically significant is roughly “+” or “-” 2. The larger the z-statistic is in absolute value, the stronger the evidence that the variable is related to the dependent variable.
  • P > z shows the 2-tailed p-values used in testing the null hypothesis that the coefficient (parameter) is 0 while using an alpha of 0.05. In other words, if an independent variable’s P > z-value is greater than 0.05, the variable is not statistically significant.
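The relationship between the coefficient, standard error, z-statistic, and P > z columns can be sketched as follows. The coefficient is the Restricted Area Field Goal Percentage value from the study, but the standard error of 0.5 is a hypothetical placeholder, so the resulting z and p are illustrative only. The function names are mine.

```python
import math


def z_statistic(coef, std_err):
    """z is the coefficient measured in units of its standard error."""
    return coef / std_err


def two_tailed_p(z):
    """Two-tailed p-value for a z-statistic under the standard normal."""
    return math.erfc(abs(z) / math.sqrt(2))


coef = 1.366194  # Restricted Area FG% coefficient from the study
std_err = 0.5    # HYPOTHETICAL standard error, for illustration only

z = z_statistic(coef, std_err)
p = two_tailed_p(z)
print(z, p, p < 0.05)

# A z-statistic of about 1.96 sits right at the 0.05 cutoff, which is
# where the "roughly plus or minus 2" rule of thumb comes from.
print(two_tailed_p(1.96))
```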

So, how do we actually translate all of this into English? Let’s use the independent variables of Restricted Area Field Goal Percentage and Midrange Field Goal Attempts as examples so you can then apply the same logic to the other variables. The Restricted Area Field Goal Percentage coefficient is 1.366194; so, for every unit increase in Restricted Area Field Goal Percentage, a 1.366194 unit increase in a team’s Win Total is predicted, holding all other variables constant. Its P > z value is below 0.05, meaning that Restricted Area Field Goal Percentage is a statistically significant independent variable. Translation: teams that shoot high percentages in the restricted area produce winning basketball.

Regarding the second example, Midrange Field Goal Attempts has a P > z value of 0.202, which is larger than the 0.05 threshold, meaning that the variable is not statistically significant for this specific study. Another way to interpret the results for this independent variable is that the number of midrange shot attempts (and in part made as well due to the high correlation) is unrelated to a team’s win total.
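The two examples above boil down to simple arithmetic, sketched here. I am assuming that one "unit" of Restricted Area Field Goal Percentage is one percentage point; the helper names are hypothetical, while the coefficient and the 0.202 P > z value come from the study.

```python
def predicted_win_change(coef, unit_change):
    """Predicted change in wins, holding all other variables constant."""
    return coef * unit_change


def is_significant(p_value, alpha=0.05):
    """True when the P > z value is below the alpha threshold."""
    return p_value < alpha


RA_FG_PCT_COEF = 1.366194  # from the study
MIDRANGE_FGA_P = 0.202     # from the study

# Assumed example: a team raises its restricted area FG% by two points.
print(predicted_win_change(RA_FG_PCT_COEF, 2))

# Midrange attempts miss the 0.05 cutoff, so the variable is not significant.
print(is_significant(MIDRANGE_FGA_P))
```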

So you do not have to continue going back and forth, here are the statistically significant shooting-related variables in the study ranked in order of highest z-statistic:

  • Restricted Area Field Goal Percentage
  • Restricted Area Field Goal Attempts
  • Above the Break 3-Point Field Goal Percentage
  • Midrange Field Goal Percentage
  • Paint (Non-RA) Field Goal Percentage
  • Above the Break 3-Point Field Goal Attempts
  • Right Corner 3-Point Field Goal Percentage

Five of the seven statistically significant independent variables are related to percentage, while the two shot attempts variables are for the restricted area and above the break zones. The results of this study should not be surprising in the slightest, given that the Golden State Warriors lead the league in Restricted Area Field Goal Percentage (and are 2nd in the league in Above the Break 3-Point Field Goal Percentage) and the Houston Rockets are a Top-5 team in Restricted Area Field Goal Attempts and lead the NBA in Above the Break 3-Point Field Goal Attempts, per NBA.com. Both are teams that embrace analytics.

Another interesting bit of information in this study is that the midrange zone is not as dead or unimportant as currently portrayed. Long 2-Point shots are what teams want their opponents shooting, but making those shots at a high percentage from midrange does increase a team’s chances of winning. It’s no coincidence that most of the teams in the Top-10 in midrange field goal percentage are either in the playoffs or were fighting to get in (the Knicks are the only exception and ranked 6th in midrange FG%, slightly ahead of both the Cavaliers and the Spurs). The Warriors just so happen to be ranked 1st in midrange shooting percentage.

If you want to have at least one takeaway from the results of this study, it is a rather intuitive one, especially if you follow the NBA closely: the more efficient a team’s offense is, in terms of having a high shooting percentage, that team more often than not will be one of the best teams in the league. Two other important takeaways should also be that the numbers do support “Moreyball” (floor spacing and an emphasis on restricted area and 3-point shots) and that the midrange isn’t this taboo region on the court, but a team needs to have the right personnel to exploit it, i.e. have elite midrange players.
