# Predicting Save Percentage: Danger Zones and Shot Volumes

A few days ago, Conor Tompkins of Null Hypothesis Hockey tweeted out an interesting set of graphs showing the correlation between a goalie’s save percentage in each of the War-On-Ice danger zones and his overall save percentage. Conor found that (unsurprisingly) a goalie’s performance on high danger shots was most closely correlated with overall success, with medium danger shots having slightly less influence, and low danger shots showing almost no relationship. While Conor’s model focused on correlations within the same season, Sam Ventura suggested that a useful extension would be to look at how well the danger zone save percentages predicted future overall save percentages. After all, if performance on high danger shots is most critical for a goalie in determining his current season save percentage, it stands to reason that this would also be a key predictor of future success.

One way we can look at this is to run a multiple linear regression between a goalie’s current season save percentage and his past save percentages broken down by danger zone. We’ll focus on 5v5 data only to avoid the issue of varying penalty rates between teams, and look at goalies who played at least 1000 minutes in back-to-back seasons (all data from War On Ice).
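For anyone who wants to try this at home, here is a minimal sketch of that kind of regression in Python with NumPy. The goalie-seasons below are made up purely for illustration (they are not War On Ice data), and `fit_ols` is a hypothetical helper name, not the code used for this post.

```python
import numpy as np

def fit_ols(X, y):
    """Ordinary least squares with an intercept.

    Returns (coefficients, fitted values); coefficients[0] is the intercept.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    design = np.column_stack([np.ones(len(y)), X])  # prepend intercept column
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    return coef, design @ coef

# Made-up goalie-seasons for illustration only (NOT real data).
# Columns: previous-season low, medium, and high danger save percentage.
prev_zone_sv = np.array([
    [0.975, 0.925, 0.840],
    [0.970, 0.930, 0.820],
    [0.980, 0.915, 0.850],
    [0.972, 0.920, 0.810],
    [0.978, 0.935, 0.835],
    [0.968, 0.910, 0.825],
])
# Current-season overall save percentage for the same (made-up) goalies.
curr_sv = np.array([0.925, 0.921, 0.928, 0.915, 0.926, 0.918])

coef, fitted = fit_ols(prev_zone_sv, curr_sv)
print("intercept, low, medium, high:", np.round(coef, 3))
```

With real data, each row would be one goalie who cleared the 1000 minute cutoff in back-to-back seasons.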

Before we dive into the analysis, we need a baseline to compare results against. We can’t say that our model is useful just because the predictors are significant; we need to be sure that it makes better predictions than the most basic guesses we can make. In this instance, the simplest model that we can possibly come up with is using a goalie’s previous overall save percentage as the only variable in our analysis.

So how does our Danger Zone Model compare to the Basic Model? Not very well, it turns out. The Adjusted R^2 for our Danger Zone Model is actually lower than that of our Basic Model, which means that giving the model more information on how a goalie performed on shots in each zone actually gives us worse predictions of future performance than simply knowing his overall performance.

| Model | Adjusted R^2 |
| --- | --- |
| Basic Model | 0.059 |
| Danger Zone Model | 0.041 |
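As a reminder of why a model with more predictors can score lower here, the standard textbook definition of Adjusted R^2 is shown below (the post doesn't spell out the formula, so treat this as the usual convention rather than a description of the exact calculation used):

$$
\bar{R}^2 = 1 - \left(1 - R^2\right)\frac{n - 1}{n - p - 1}
$$

Here $n$ is the number of goalie-seasons in the sample and $p$ is the number of predictors, so $p = 1$ for the Basic Model and $p = 3$ for the Danger Zone Model. Each extra predictor inflates the penalty term, so the three zone save percentages need to explain meaningfully more variance than the single overall number just to break even.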

One interesting thing to note about our Danger Zone Model is that a goalie’s past Low Danger Save Percentage wasn’t a significant predictor of future total Save Percentage. There’s simply too much noise and too little difference in performance in the low danger data to allow us to get any useful information out of it.

Unfortunately, even though our simple model is better, the fit really isn’t that great and it’s still not all that good at predicting how a goalie is going to do based on past season data. This isn’t really surprising – goalies are notoriously hard to predict and a single season’s worth of shot data is insufficient.

Observed Save Percentage vs. Expected Save Percentage (using Past Save Percentage)

So if the danger zone breakdown doesn’t help us make better predictions, we really should just give up, right? Well, not quite. One of the issues with our first pass danger zone model was its implicit assumption that the distribution of shots between high/medium/low zones would be constant from season to season. This assumption is one that we should naturally question, and as we saw above, one that’s probably not true.

The question we need to ask then is this: how can we adjust for differences in shot distributions between years? In the past, I’ve written about how goalies who face more shots tend to post higher save percentages, and hypothesized that the increase in shot volume was driven mainly by weaker, low probability shots which could falsely inflate a goalie’s save percentage.
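To see how that inflation would work, remember that overall save percentage is just the shot-weighted average of the save percentages in each zone. The short sketch below uses hypothetical zone save percentages and shot counts (none of these numbers come from real goalies) and holds the goalie's per-zone performance fixed while adding low danger shots.

```python
def overall_sv(shots, sv_by_zone):
    """Overall save percentage as the shot-weighted average of per-zone save percentages."""
    total = sum(shots.values())
    return sum(shots[zone] * sv_by_zone[zone] for zone in shots) / total

# Hypothetical per-zone save percentages, identical in both scenarios.
sv_by_zone = {"low": 0.97, "medium": 0.92, "high": 0.83}

# Hypothetical shot counts: the second scenario adds 200 low danger shots.
baseline = {"low": 400, "medium": 300, "high": 300}
more_low = {"low": 600, "medium": 300, "high": 300}

print(round(overall_sv(baseline, sv_by_zone), 4))  # 0.913
print(round(overall_sv(more_low, sv_by_zone), 4))  # 0.9225
```

The goalie is exactly as good in every zone in both cases, but the extra low danger shots pull his overall number up by almost ten points of save percentage.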

If this hypothesis is true, then adding a variable to our basic model that accounts for the number of shots a goalie faced in the current period should allow us to better estimate his save percentage. And this is exactly what we see: the change in fit after adding in a goalie’s shots against is quite drastic. Our Adjusted R^2 increases from 0.059 to 0.141, and as you can see below, the data aligns much more closely with our predictions when shots against are included.

Observed Save Percentage vs. Expected Save Percentage (using Past Save Percentage, Current Shots Against)

While it’s important to know that shots against impact save percentage, what we really want to test is whether differences in shot location distributions are driving this impact. To do this, we can build one more model, one that looks at predicting save percentage using a goalie’s previous year’s save percentage, as well as the number of low and medium danger shots he faced in the current year.

Observed Save Percentage vs. Expected Save Percentage (using Past Save Percentage, Current Low and Medium Shots Against)

The fit of this model is better than our previous model that included all shots against, with an increase in the Adjusted R^2 to 0.180. While this model provides the best fit of the data we have, if we substitute past season High Danger Save Percentage for Total Save Percentage in the regression we don’t really see a significant drop in the fit (the Adjusted R^2 only decreases to 0.176).
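For readers who want to run this kind of comparison on their own data, here is a rough sketch of scoring several candidate models by Adjusted R^2. Everything in it is an assumption made for illustration: the data is randomly generated from a process I made up, `adjusted_r2` is a hypothetical helper, and treating low and medium danger shots as two separate predictors is one possible reading of the model described above. The printed values will not match the numbers in this post.

```python
import numpy as np

def adjusted_r2(X, y):
    """Fit OLS with an intercept and return the adjusted R^2."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    if X.ndim == 1:
        X = X.reshape(-1, 1)  # treat a 1-D input as a single predictor
    n, p = X.shape
    design = np.column_stack([np.ones(n), X])
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    resid = y - design @ coef
    ss_res = float(resid @ resid)
    ss_tot = float(((y - y.mean()) ** 2).sum())
    r2 = 1.0 - ss_res / ss_tot
    return 1.0 - (1.0 - r2) * (n - 1) / (n - p - 1)

# Synthetic goalie-seasons: an assumed generating process, NOT real data.
rng = np.random.default_rng(0)
n = 200
prev_sv = rng.normal(0.920, 0.008, n)
low_shots = rng.normal(500, 80, n)
med_shots = rng.normal(300, 50, n)
high_shots = rng.normal(250, 40, n)
noise = rng.normal(0.0, 0.006, n)
curr_sv = (0.920 + 0.3 * (prev_sv - 0.920)
           + 0.00002 * (low_shots + med_shots - 800) + noise)

# Each candidate model is just a different set of predictor columns.
models = {
    "Basic": np.column_stack([prev_sv]),
    "Basic + all shots": np.column_stack(
        [prev_sv, low_shots + med_shots + high_shots]),
    "Basic + low and medium shots": np.column_stack(
        [prev_sv, low_shots, med_shots]),
}
results = {name: adjusted_r2(X, curr_sv) for name, X in models.items()}
for name, value in results.items():
    print(f"{name}: {value:.3f}")
```

Swapping previous-season High Danger Save Percentage in for `prev_sv` would give the substitution test described above.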

This is critical, as it tells us that a lot of the randomness we see in standard save percentage is driven by the number of low and medium danger shots a goalie faces. If we have that information, we’re essentially just as good at guessing his save percentage using High Danger Save Percentage as we are with total Save Percentage. In other words, when we look at data for a single season, the key piece of information we need is high danger save percentage – the medium and low danger data often has too much noise to find a netminder’s talent in.

All of this is reassuring, of course, because it makes perfect logical sense. Shots from the low and medium danger zone often go through a few sets of legs or are deflected before they get to the net. Given that the slightest touch off the skate blade of a teammate can be the difference between an easy save and an OT winner, we shouldn’t expect a goalie’s performance on shots from these zones to be good indicators of underlying ability.

The other takeaway here is that shots against do matter when we’re evaluating goaltending, because all things being equal, a goalie who faced more shots likely faced easier shots to save. This may not be a universal rule, and we should still look to measures like Adjusted Save Percentage to account for differences in shot distribution. However, focusing on metrics like Goals Saved Above Average may overvalue the contributions of netminders who face more shots.
