A few days ago, Conor Tompkins of Null Hypothesis Hockey tweeted out an interesting set of graphs showing the correlation between a goalie’s save percentage in each of the War-On-Ice danger zones and their overall success rate. Conor found that (unsurprisingly) a goalie’s performance on high danger shots was most closely correlated with overall success, with medium shots having slightly less influence, and low danger shots showing almost no relationship. While Conor’s model focused on correlations within the same season, Sam Ventura suggested that a useful extension would be to look at how well the danger zone save percentages predicted future overall save percentages. After all, if performance on high danger shots is most critical for a goalie in determining his current season save percentage, it stands to reason that this would also be a key predictor of future success.
One way we can look at this is to run a multiple linear regression between a goalie’s current season save percentage and his past save percentages broken down by danger zone. We’ll focus on 5v5 data only to avoid the issue of varying penalty rates between teams, and look at goalies who played at least 1000 minutes in back-to-back seasons (all data from War On Ice).
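As a rough sketch of the sample selection described above, the snippet below pairs each goalie's qualifying season with the one that follows it. The row layout and the key names (`goalie`, `season`, `minutes`) are assumptions made for illustration, not the War On Ice export format, and the rows themselves are made up.

```python
from collections import defaultdict

MIN_MINUTES = 1000  # 5v5 minutes threshold used in the text


def pair_consecutive_seasons(rows, min_minutes=MIN_MINUTES):
    """Return (previous, current) row pairs for goalies who cleared
    min_minutes in two back-to-back seasons.

    Each row is a dict with hypothetical keys: 'goalie', 'season'
    (starting year as an int, e.g. 2013 for 2013-14) and 'minutes'.
    """
    by_goalie = defaultdict(dict)
    for row in rows:
        if row["minutes"] >= min_minutes:
            by_goalie[row["goalie"]][row["season"]] = row

    pairs = []
    for seasons in by_goalie.values():
        for year in sorted(seasons):
            if year + 1 in seasons:
                pairs.append((seasons[year], seasons[year + 1]))
    return pairs


# Made-up rows purely for illustration (not real goalie data).
rows = [
    {"goalie": "A", "season": 2012, "minutes": 1800},
    {"goalie": "A", "season": 2013, "minutes": 2100},
    {"goalie": "A", "season": 2014, "minutes": 600},   # below the threshold
    {"goalie": "B", "season": 2012, "minutes": 1500},
    {"goalie": "B", "season": 2014, "minutes": 1700},  # not back-to-back
]
pairs = pair_consecutive_seasons(rows)
print([(prev["goalie"], prev["season"], curr["season"]) for prev, curr in pairs])
# [('A', 2012, 2013)]
```

Each pair then supplies one observation: the previous season's save percentages as predictors and the current season's overall save percentage as the response.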
Before we dive into the analysis, we need a baseline to compare results against. We can’t say that our model is useful just because the predictors are significant; we need to be sure that it makes better predictions than the most basic guesses we can make. In this instance, the simplest model that we can possibly come up with is using a goalie’s previous overall save percentage as the only variable in our analysis.
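To make the Basic Model concrete, here is a minimal sketch of a one-variable regression of current save percentage on previous save percentage. The data are synthetic stand-ins generated with arbitrary parameters, so the fitted numbers say nothing about real goalies; `fit_line` and `predict` are hypothetical helper names.

```python
import random


def fit_line(x, y):
    """Ordinary least squares for y = intercept + slope * x.
    Returns (slope, intercept, r_squared)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((a - mean_x) ** 2 for a in x)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    ss_res = sum((b - (intercept + slope * a)) ** 2 for a, b in zip(x, y))
    ss_tot = sum((b - mean_y) ** 2 for b in y)
    return slope, intercept, 1.0 - ss_res / ss_tot


# Synthetic stand-in data (assumed values, not the War On Ice numbers).
random.seed(1)
prev_sv = [random.gauss(0.925, 0.008) for _ in range(80)]
curr_sv = [0.925 + 0.25 * (p - 0.925) + random.gauss(0, 0.008) for p in prev_sv]

slope, intercept, r_squared = fit_line(prev_sv, curr_sv)


def predict(previous_sv):
    """Baseline prediction of next season's overall save percentage."""
    return intercept + slope * previous_sv


print(f"slope={slope:.3f} intercept={intercept:.3f} R^2={r_squared:.3f}")
```

Any richer model has to beat this one-variable fit to justify the extra predictors.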
So how does our Danger Zone Model compare to the Basic Model? Not very well, it turns out. The Adjusted R^2 for our Danger Zone Model is actually lower than that of our Basic Model, which means that giving the model more information on how a goalie performed on shots in each zone gives us worse predictions of future performance than simply knowing his overall performance.
|Model|Adjusted R^2|
|---|---|
|Basic Model|0.059|
|Danger Zone Model|0.041|
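For readers less familiar with the metric, Adjusted R^2 penalizes each extra predictor, so a model with more variables can score lower even when its raw R^2 is a touch higher. The sketch below uses the standard formula with hypothetical inputs (these are not the values behind the models in this post).

```python
def adjusted_r2(r2, n, p):
    """Adjusted R^2 for a model with p predictors (intercept excluded)
    fit to n observations."""
    if n - p - 1 <= 0:
        raise ValueError("need more observations than fitted parameters")
    return 1.0 - (1.0 - r2) * (n - 1) / (n - p - 1)


# Hypothetical inputs: three predictors buy a slightly higher raw R^2
# than one predictor, but not enough to offset the penalty.
one_predictor = adjusted_r2(0.10, n=50, p=1)
three_predictors = adjusted_r2(0.11, n=50, p=3)
print(one_predictor > three_predictors)  # True
```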
One interesting thing to note about our Danger Zone Model is that a goalie’s past Low Danger Save Percentage wasn’t a significant predictor of future total Save Percentage. There’s simply too much noise and too little difference in performance in the low danger data to allow us to get any useful information out of it.
Unfortunately, even though our simple model is better, the fit really isn’t that great and it’s still not all that good at predicting how a goalie is going to do based on past season data. This isn’t really surprising – goalies are notoriously hard to predict and a single season’s worth of shot data is insufficient.
So if the danger zone breakdown doesn’t help us make better predictions, we really should just give up, right? Well, not quite. One of the issues with our first pass danger zone model was its implicit assumption that the distribution of shots between high/medium/low zones would be constant from season to season. This assumption is one that we should naturally question, and as we saw above, one that’s probably not true.
The question we need to ask, then, is this: how can we adjust for differences in shot distributions between years? In the past, I’ve written about how goalies who face more shots tend to post higher save percentages, and hypothesized that the increase in shot volume was driven mainly by weaker, low-probability shots, which could falsely inflate a goalie’s save percentage.
If this hypothesis is true, then adding a variable to our basic model that accounts for the number of shots a goalie faced in the current period should allow us to better estimate his save percentage. And this is exactly what we see: the change in fit after adding in a goalie’s shots against is quite drastic. Our Adjusted R^2 increases from 0.059 to 0.141, and as you can see below, the data aligns much more closely with our predictions when shots against are included.
While it’s important to know that shots against impact save percentage, what we really want to test is whether differences in shot location distributions are driving this impact. To do this, we can build one more model, one that looks at predicting save percentage using a goalie’s previous year’s save percentage, as well as the number of low and medium danger shots he faced in the current year.
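A model like this is an ordinary multiple regression, and a minimal standard-library sketch of one is below. In practice a statistics package would do the fitting; the normal-equations solver here only keeps the example self-contained. The data are synthetic, the coefficients used to generate them are arbitrary assumptions, and `solve` and `fit_ols` are hypothetical helper names.

```python
import random


def solve(a, b):
    """Solve the linear system a @ x = b by Gaussian elimination
    with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - tail) / m[i][i]
    return x


def fit_ols(columns, y):
    """Least-squares fit of y on the predictor columns plus an intercept.
    Returns (coefficients, adjusted R^2); coefficients[0] is the intercept."""
    n, p = len(y), len(columns)
    means = [sum(col) / n for col in columns]
    mean_y = sum(y) / n
    # Centering the predictors keeps the normal equations better conditioned.
    centered = [[v - mu for v in col] for col, mu in zip(columns, means)]
    y_centered = [v - mean_y for v in y]
    xtx = [[sum(u * v for u, v in zip(ci, cj)) for cj in centered] for ci in centered]
    xty = [sum(u * v for u, v in zip(ci, y_centered)) for ci in centered]
    slopes = solve(xtx, xty)
    intercept = mean_y - sum(s * mu for s, mu in zip(slopes, means))
    fitted = [intercept + sum(s * col[i] for s, col in zip(slopes, columns))
              for i in range(n)]
    ss_res = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    r2 = 1.0 - ss_res / ss_tot
    return [intercept] + slopes, 1.0 - (1.0 - r2) * (n - 1) / (n - p - 1)


# Synthetic stand-in data; every number below is an assumption.
random.seed(7)
n_goalies = 90
prev_sv = [random.gauss(0.925, 0.008) for _ in range(n_goalies)]
low_shots = [random.gauss(450, 80) for _ in range(n_goalies)]
med_shots = [random.gauss(300, 60) for _ in range(n_goalies)]
curr_sv = [
    0.925 + 0.25 * (p - 0.925) + 0.00002 * (lo - 450) + 0.00001 * (md - 300)
    + random.gauss(0, 0.007)
    for p, lo, md in zip(prev_sv, low_shots, med_shots)
]

coef_basic, adj_basic = fit_ols([prev_sv], curr_sv)
coef_shots, adj_shots = fit_ols([prev_sv, low_shots, med_shots], curr_sv)
print(f"previous Sv% only:          adjusted R^2 = {adj_basic:.3f}")
print(f"plus low/medium shot counts: adjusted R^2 = {adj_shots:.3f}")
```

Swapping `prev_sv` for a column of previous-season High Danger Save Percentage would give the alternative specification, with the rest of the code unchanged.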
The fit of this model is better than that of our previous model, which included all shots against: the Adjusted R^2 increases to 0.180. While this model provides the best fit to the data of any we’ve tried, if we substitute past season High Danger Save Percentage for Total Save Percentage in the regression, we don’t see a meaningful drop in the fit (the Adjusted R^2 only decreases to 0.176).
This is critical, as it tells us that a lot of the randomness we see in standard save percentage is driven by the number of low and medium danger shots a goalie faces. If we have that information, we’re essentially just as good at guessing his save percentage using High Danger Save Percentage as we are with total Save Percentage. In other words, when we look at data for a single season, the key piece of information we need is high danger save percentage: the medium and low danger data are often too noisy to reveal a netminder’s talent.
All of this is reassuring, of course, because it makes perfect logical sense. Shots from the low and medium danger zones often go through a few sets of legs or are deflected before they get to the net. Given that the slightest touch off the skate blade of a teammate can be the difference between an easy save and an OT winner, we shouldn’t expect a goalie’s performance on shots from these zones to be a good indicator of underlying ability.
The other takeaway here is that shots against do matter when we’re evaluating goaltending because, all else being equal, a goalie who faced more shots likely faced shots that were easier to save. This may not be a universal rule, and we should still look to measures like Adjusted Save Percentage to account for differences in shot distribution. However, focusing on metrics like Goals Saved Above Average may overvalue the contributions of netminders who face more shots.