Shot Location Data and Strategy II: Evaluating Individual Defensive Play

This is the second post in a (likely) 3-part series going through the data, methods and results I presented at the Pittsburgh Hockey Analytics Workshop. Part I, which covers whether defencemen play worse on their off-hand, is available here. If you’re interested in seeing the slides or hearing the presentation (or the other presentations, which I highly recommend), they’re available on the WaR on Ice website here.

As anyone who has ever dug into the data around defensive zone play knows, evaluating an individual player’s contribution in his own end is a difficult task. Most defensive metrics that we have today show far less year-over-year repeatability than offensive metrics, suggesting that they’re more likely measuring team or system effects than individual abilities. For defencemen, who tend to drive offensive play much less than forwards, this is a particularly tricky problem: if we are unable to isolate their ability to keep the opposition away from their net, we aren’t left with much to judge them on.

Part of this challenge is just the nature of the data that’s recorded: when a player takes a shot attempt we note who took the shot, and for goals we also note the players who assisted. Both of these data points, while far from perfect, give us a better ability to differentiate the players who are driving the bus in the offensive zone from the skaters who are simply along for the ride. At the other end of the rink, though, we don’t have the same luxury: we know who took a shot against, but no one collects information on which defender was closest to the shooter, or who was (in theory at least) responsible for defending him. In an ideal world we’d have dozens of scorers or assistant coaches writing this data down, or better yet an intricate series of cameras to track all of this information automatically (we can dream, right?), but even if this data is or will be collected, it’s unlikely to ever make it into the public sphere.

While I may be painting a bleak picture here, the situation isn’t completely hopeless. The NHL’s game files do contain information on the location each shot was taken from, and with a bit of effort we can leverage this data to get a better sense of an individual defenceman’s efforts at his own end of the rink. This matters: if a GM is thinking about adding a shutdown defenceman at the trade deadline to make a playoff push, we want to know whether he’s actually preventing shots himself or whether his partner is doing the heavy lifting. While we can use our current metrics to figure out whether a player generally allows more or fewer shots when he’s on the ice, at first glance we can’t really tell whether that’s due to his own efforts. It’s with that thought in mind that I’m going to present an initial attempt at modelling a defenceman’s individual shot prevention ability in his own zone.

The first step to figuring out defensive zone coverage, though, is knowing which players are playing on which side of the ice. In Part I, I introduced a new measure called “Side Bias”, which measures a player’s propensity to take shots from the left or right side of the ice.

Side Bias = (# of Shots Taken from Left Side) / (# of Shots Taken from Left or Right Side) – 50%

Side Bias relies on the fact that a left defenceman or winger will tend to take the majority of his shots from the left side of the ice, while the opposite will be true for right defencemen and wingers. Side Bias is the numerical formulation of this idea: players with a Side Bias above zero tend to take more shots from the left side, while players with a Side Bias below zero shoot from the right side more frequently. These numbers are incredibly useful because they allow us to determine which player was playing LD/LW or RD/RW without watching each game or having in-depth knowledge of a coach’s lineup preferences.
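To make the definition concrete, here is a minimal Python sketch of Side Bias. It assumes the shot counts for each side have already been tallied and that shots from the exact middle of the ice were dropped, since they belong to neither side; the function names are hypothetical, not part of any existing tool.

```python
def side_bias(shots_left: int, shots_right: int) -> float:
    """Side Bias = left / (left + right) - 0.5, returned as a fraction.

    Assumes shots from the exact middle of the ice were already excluded.
    """
    total = shots_left + shots_right
    if total == 0:
        raise ValueError("need at least one shot from the left or right side")
    return shots_left / total - 0.5


def preferred_side(bias: float) -> str:
    """Positive bias -> mostly shoots from the left; negative -> right."""
    if bias > 0:
        return "left"
    if bias < 0:
        return "right"
    return "even"


# Hypothetical player: 30 shots from the left side, 20 from the right.
bias = side_bias(30, 20)
print(f"{bias:+.1%}", preferred_side(bias))  # prints: +10.0% left
```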

To apply these numbers to classify defensive pairings, we’ll use the same rules we did in Part I:

  • If a pairing has played together a significant amount (which I’ve defined as at least 10 shots taken by each player while they were on the ice together), we’ll use the side bias data from when they played together.
  • If a pairing has rarely played together (if either player has taken fewer than 10 shots while the pairing was on the ice together), we’ll use their overall side bias numbers to figure out which player was on which side.
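The two rules can be sketched in Python as follows. The rules don’t spell out how the partners’ numbers are compared, so this version assumes the partner with the higher (more left-leaning) Side Bias is the left defenceman, and it counts only shots from the left or right side toward the 10-shot threshold. The class, field and function names are hypothetical.

```python
from dataclasses import dataclass
from typing import Tuple

MIN_SHOTS_TOGETHER = 10  # each partner needs at least this many shots while paired


@dataclass
class Defenceman:
    name: str
    together_left: int   # own shots from the left side while on ice with this partner
    together_right: int  # own shots from the right side while on ice with this partner
    overall_left: int    # own shots from the left side over all ice time
    overall_right: int   # own shots from the right side over all ice time


def _bias(left: int, right: int) -> float:
    """Side Bias as a fraction; 0.0 when there are no side shots at all."""
    total = left + right
    return left / total - 0.5 if total else 0.0


def assign_sides(a: Defenceman, b: Defenceman) -> Tuple[Defenceman, Defenceman]:
    """Return (left_d, right_d) for a pairing, following the two rules above."""
    played_together = (
        a.together_left + a.together_right >= MIN_SHOTS_TOGETHER
        and b.together_left + b.together_right >= MIN_SHOTS_TOGETHER
    )
    if played_together:
        bias_a = _bias(a.together_left, a.together_right)
        bias_b = _bias(b.together_left, b.together_right)
    else:
        bias_a = _bias(a.overall_left, a.overall_right)
        bias_b = _bias(b.overall_left, b.overall_right)
    # Assumption: the more left-leaning partner is treated as the LD.
    return (a, b) if bias_a >= bias_b else (b, a)


a = Defenceman("Player A", 12, 3, 100, 60)
b = Defenceman("Player B", 4, 11, 50, 90)
left_d, right_d = assign_sides(a, b)
print(left_d.name, right_d.name)  # prints: Player A Player B
```

Falling back to the overall numbers only when either partner is below the threshold mirrors the second rule: a handful of shots together is too noisy to overrule a full season of Side Bias.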

This may not be a perfect system, as we’ll likely make a few mistakes with pairings who play together infrequently, but since these will form a small subset of any given player’s overall sample, the wrong numbers won’t have a significant effect in the long run.

So how do we turn our knowledge of who’s playing on which side of the ice into a measurable result for a defenceman? The simplest method is to divide the defensive zone in half down the middle and assign each defenceman responsibility for defending his half. All the shots taken from the left side of the ice (as seen by the defending team) are the responsibility of the left defenceman, while all the shots taken from the right side are a negative mark against his partner. It’s a simple model, but as we’ll see later, it does provide a half-decent view of individual defensive zone play.
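A minimal sketch of this split, assuming (x, y) shot coordinates like those in the NHL’s game files. It rotates each shot so the attack always moves toward +x (assuming shots are taken in the attacking half) and treats y > 0 as the shooter’s left; that orientation is an assumption to check against the actual feed. Because a left defenceman covers the attacking right side, shots from the shooter’s right are charged to the LD and shots from the shooter’s left to the RD, while shots from exactly y = 0 are charged to no one. The function names are hypothetical.

```python
from typing import Optional


def shooter_side(x: float, y: float) -> Optional[str]:
    """Classify a shot as coming from the shooter's 'left' or 'right'.

    Assumed convention (verify against your data source): after rotating the
    rink so the attack moves toward +x, y > 0 is the shooter's left.
    """
    if x < 0:
        # Shot taken in the negative-x end: rotate 180 degrees so the attack faces +x.
        y = -y
    if y > 0:
        return "left"
    if y < 0:
        return "right"
    return None  # exact middle of the ice


def responsible_defender(x: float, y: float, left_d: str, right_d: str) -> Optional[str]:
    """Charge a shot against to the defenceman covering that half of the ice."""
    side = shooter_side(x, y)
    if side == "right":  # shooter's right is the defending LD's side
        return left_d
    if side == "left":   # shooter's left is the defending RD's side
        return right_d
    return None


print(responsible_defender(75, -12, "LD", "RD"))   # prints: LD
print(responsible_defender(-75, -12, "LD", "RD"))  # prints: RD
```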

One thing I should address before we move any further is the accuracy of the NHL’s shot location data. While many have noted issues with the league’s data, it’s unlikely to have a major impact on our results since we don’t expect there to be significant (or any, really) bias in the side of the ice that scorers place a shot on. In other words, while we know some scorers may tend to record shots from closer than they actually occurred, there’s not really any reason to think that any scorers are moving shots to the left or right side systematically. So while in small samples we may have problems with a shot being misclassified, over the long run we expect these to even out and have a net zero effect on any individual defenceman’s results.

With that out of the way, what can we learn about a defenceman when we look at the divided-ice numbers? Well, the first and simplest calculation we can do is to figure out what percentage of the shots against are coming from his side of the ice. In my presentation I called this “% Shots Against From Side”, but Defensive Side Shots % (DSS%) is a more concise and equally accurate name that I’ll use going forward. For a left defenceman, the calculation is just:

DSS% = # of shots from right side of ice/(# of shots from right side of ice + # of shots from left side of ice)

Note that when I say “right side” in this context, it’s from the point of view of the shooter, so although a left defenceman is covering the left side of the ice, he’s responsible for the attacking right winger/right defenceman (and any other player who crosses to the attacking right side). The benefit of this metric is that it gives us an idea of which member of a pairing is defending his side better on a relative basis. For example, Marc-Edouard Vlasic hasn’t allowed more than 48.8% of the total shots against to come from his side of the ice since 2010-2011, one of the best records of any defenceman in the league.

Season   Player                DSS%
2010-11  Marc-Edouard Vlasic   46.1%
2011-12  Marc-Edouard Vlasic   48.8%
2012-13  Marc-Edouard Vlasic   42.4%
2013-14  Marc-Edouard Vlasic   45.7%
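The DSS% calculation is easy to sketch in Python. This version takes side-shot counts from the shooter’s point of view, as described above, and assumes shots from the exact middle of the ice have already been dropped; the function name and arguments are hypothetical, and the shot counts in the example are made up.

```python
def dss_pct(shots_from_left: int, shots_from_right: int, position: str) -> float:
    """Defensive Side Shots %, as a fraction.

    Sides are from the shooter's point of view, so a left defenceman ("LD")
    is charged with shots from the shooter's right and a right defenceman
    ("RD") with shots from the shooter's left. Middle-of-ice shots are
    assumed to be excluded from both counts.
    """
    total = shots_from_left + shots_from_right
    if total == 0:
        raise ValueError("no side shots against")
    if position == "LD":
        return shots_from_right / total
    if position == "RD":
        return shots_from_left / total
    raise ValueError("position must be 'LD' or 'RD'")


# Hypothetical pairing: 100 side shots against, 55 from the shooter's left.
print(f"LD: {dss_pct(55, 45, 'LD'):.1%}, RD: {dss_pct(55, 45, 'RD'):.1%}")
# prints: LD: 45.0%, RD: 55.0%
```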

The weakness in DSS%, however, is that it really only compares a defenceman to his partner(s). If you put me out there alongside Vlasic, his numbers would likely look even better, as most forwards would take advantage of my inability to keep up with the pace of NHL play. In order to correct for this, we need to start taking into account the offensive side of play as well, to give us a sense as to whether a defenceman is defending his side of the ice well relative to the number of shots his team is taking.

To do this, we can calculate a player’s Defensive-Zone Adjusted Shots For % (or DZA-SF%):

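DZA-SF% = Shots For / (2 * DSS% * Shots Against + Shots For)

The factor of 2 scales a defenceman’s share of the shots against back up to a full-ice equivalent, so a player with a DSS% of exactly 50% has a DZA-SF% equal to his raw SF%.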

Edited on Dec. 30th to update the adjustment for defensive side. This new formula better reflects the number of shots against a player faces as it includes shots from the exact middle of the ice (y=0).
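Here is a minimal Python sketch of DZA-SF% next to plain SF%. It uses a factor of 2 on the shots-against term so that a DSS% of exactly 50% reproduces the raw SF%; that reading is our assumption, but it matches the tables in this post (e.g. P.K. Subban’s 2012-13 row: 50.0% DSS%, 58.8% DZA-SF%, 58.7% SF%). The on-ice shot counts in the example are hypothetical.

```python
def sf_pct(shots_for: float, shots_against: float) -> float:
    """Plain Shots For %: SF / (SF + SA)."""
    return shots_for / (shots_for + shots_against)


def dza_sf_pct(shots_for: float, shots_against: float, dss: float) -> float:
    """Defensive-Zone Adjusted SF%, read here as SF / (2 * DSS% * SA + SF).

    The factor of 2 is an assumption chosen so that dss == 0.5 reproduces
    the plain SF%; dss is a fraction (e.g. 0.457 for 45.7%).
    """
    return shots_for / (2 * dss * shots_against + shots_for)


# Hypothetical season: 590 shots for and 410 against while on ice.
for dss in (0.45, 0.50, 0.55):
    print(f"DSS% {dss:.0%}: SF% {sf_pct(590, 410):.1%}, "
          f"DZA-SF% {dza_sf_pct(590, 410, dss):.1%}")
```

A DSS% below 50% lifts a player above his raw SF%, and a DSS% above 50% pulls him below it, which is what produces both the positive and negative deltas discussed further down.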

This number gives us a more complete view of a defenceman’s contribution. While we might think that a player like Kevin Shattenkirk isn’t pulling his weight if we look only at his last 3 seasons of DSS% (all over 52.5%), his DZA-SF% was also above 53% in each of those years, suggesting that his weakness in his own zone is more than made up for at the offensive end of the rink.

If we look at the best DZA-SF% seasons since 2010, a few things pop out pretty quickly. First off, all these players posted a solid SF% to begin with. Second, preventing shots on your own side can have a fairly significant impact on our evaluation of a player. Of the 10 players listed below, only Jake Muzzin and P.K. Subban, both in 2012-2013, posted a DSS% of 50% or higher. And for players like Anton Stralman last year or Matt Greene in 2011-2012, their defence of their side is enough to boost them from very good to elite.

Season   Player                DSS%    DZA-SF%   SF%
2013-14  Michal Rozsival       48.6%   61.5%     60.7%
2013-14  Marc-Edouard Vlasic   45.7%   61.4%     59.0%
2012-13  Lubomir Visnovsky     47.6%   61.0%     59.6%
2012-13  Jake Muzzin           53.1%   60.7%     62.3%
2013-14  Jake Muzzin           46.3%   60.2%     58.4%
2013-14  Anton Stralman        43.9%   59.9%     56.6%
2011-12  Nicklas Lidstrom      46.8%   59.0%     57.5%
2012-13  P.K. Subban           50.0%   58.8%     58.7%
2011-12  Matt Greene           45.4%   58.6%     56.2%
2011-12  Zdeno Chara           46.3%   58.6%     56.7%

If we look at the bottom end of the chart, the worst names show the same themes in the opposite direction: they’re primarily players who are either woefully bad at defending their own side (Michael Kostka and Jack Johnson) or players who struggle to generate any offense while putting up respectable DSS% numbers (Mike Weber and Andreas Lilja). Morgan Rielly’s is the only name on the list that really stood out to me. Traditional hockey thought says that offensive defencemen tend to be worse in their own end because they take more risks to generate offense, but that doesn’t necessarily appear to be the case when you take a quick look through the data: since 2010, the trio of Drew Doughty, Erik Karlsson and P.K. Subban have combined to post only 3 seasons where their DSS% topped 50%. While some of Rielly’s numbers are obviously driven by the fact that he played for the Leafs last year, his 53.1% DSS% seems to be more the exception than the rule when it comes to offensive defencemen.

Season   Player                DSS%    DZA-SF%   SF%
2012-13  Michael Kostka        57.3%   40.1%     43.4%
2012-13  Jack Johnson          56.0%   40.2%     43.1%
2010-11  Clayton Stoner        55.0%   40.3%     42.8%
2013-14  Mike Weber            51.3%   40.8%     41.5%
2010-11  Cam Barker            53.7%   40.8%     42.7%
2010-11  Andreas Lilja         49.9%   40.9%     40.9%
2010-11  Keith Aulie           52.2%   41.1%     42.3%
2011-12  Marco Scandella       52.6%   41.3%     42.6%
2013-14  Morgan Rielly         53.1%   41.5%     42.8%

What’s also interesting is the delta between standard SF% and DZA-SF%, as it allows us to identify players who may be under- or overvalued by traditional possession metrics that don’t attempt to isolate their defensive zone ability.
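A short sketch of how the delta and the gainer/loser lists could be produced from season records. It reuses the same reading of DZA-SF% (factor of 2 on the shots-against term), which is our assumption; the record type, function names and sample players are hypothetical.

```python
from typing import List, NamedTuple


class DefenceSeason(NamedTuple):
    season: str
    player: str
    shots_for: int
    shots_against: int
    dss: float  # Defensive Side Shots % as a fraction


def delta(s: DefenceSeason) -> float:
    """DZA-SF% minus plain SF%, in fractional (not percentage-point) units."""
    sf = s.shots_for / (s.shots_for + s.shots_against)
    dza = s.shots_for / (2 * s.dss * s.shots_against + s.shots_for)
    return dza - sf


def rank_by_delta(seasons: List[DefenceSeason], n: int = 10,
                  gainers: bool = True) -> List[DefenceSeason]:
    """Top n seasons by delta: biggest gainers, or biggest losers if gainers=False."""
    return sorted(seasons, key=delta, reverse=gainers)[:n]


sample = [  # hypothetical players and counts, for illustration only
    DefenceSeason("2013-14", "Player A", 500, 500, 0.40),
    DefenceSeason("2013-14", "Player B", 500, 500, 0.60),
    DefenceSeason("2013-14", "Player C", 500, 500, 0.50),
]
for s in rank_by_delta(sample, n=3):
    print(s.player, f"{delta(s) * 100:+.1f} pts")
```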

Season   Player                DZA-SF%   SF%     Delta
2010-11  Mike Green            57.7%     52.3%   5.4%
2010-11  Jared Spurgeon        51.0%     45.8%   5.1%
2012-13  Marc-Edouard Vlasic   56.0%     51.7%   4.3%
2011-12  Nate Prosser          48.4%     44.2%   4.2%
2013-14  Bryce Salvador        53.2%     49.0%   4.2%
2012-13  Justin Faulk          51.8%     48.0%   3.9%
2011-12  Mike Weaver           49.9%     46.0%   3.9%
2013-14  Robert Bortuzzo       52.6%     48.7%   3.8%
2012-13  Sergei Gonchar        54.2%     50.4%   3.8%
2010-11  Nicklas Grossmann     50.4%     46.6%   3.8%

Obviously, uber-defender Marc-Edouard Vlasic appears as one of the biggest gainers, but so does Mike Green, who has actually had a fair amount of success in the defensive zone, posting 2 seasons in which including his defensive zone stats boosted his SF% by more than 2.5 percentage points. We can also see more than a few examples of players who go from below average to above average when we bring their side-defending prowess into the equation: Jared Spurgeon, Bryce Salvador, Justin Faulk, Robert Bortuzzo and Nicklas Grossmann all go from being sub-50% players to above 50% when we incorporate DSS% into our analysis.

Season   Player                DZA-SF%   SF%     Delta
2010-11  Michael Del Zotto     47.5%     51.1%   -3.6%
2011-12  Derek Smith           43.5%     47.1%   -3.6%
2011-12  Steve Montador        50.6%     54.0%   -3.4%
2012-13  Michael Kostka        40.1%     43.4%   -3.3%
2013-14  Brooks Orpik          45.5%     48.8%   -3.3%
2011-12  Philip Larsen         47.4%     50.7%   -3.3%
2012-13  Ryan Suter            47.9%     51.1%   -3.3%
2010-11  John Erskine          48.6%     51.7%   -3.0%
2011-12  Tomas Kaberle         42.7%     45.8%   -3.0%
2010-11  Deryk Engelland       45.2%     48.2%   -3.0%

At the other end of the spectrum, there are a few interesting names to look at. The first is Brooks Orpik, who I took a few digs at during my talk in Pittsburgh. Orpik has generally been one of the worst players at defending his side, posting only one year in the past 4 in which he allowed fewer shots from his side than from his partner’s (a DSS% below 50%). In fact, Orpik has gotten progressively worse each of the past 4 seasons, going from a positive DZA-SF% player with a near-50% DSS% in 2010/11 and 2011/12 all the way down to a sub-46% DZA-SF% player in each of the past 2 years, while allowing more than 57% of the shots from his side of the ice last year.

The other name that stands out to me here is Ryan Suter’s. Believe it or not, Suter has actually been worse than Orpik at defending his own side, never once posting a DSS% below 51% and going over 56% each of the past two years. And in spite of the fact that he’s been a positive SF% player 3 of the last 4 seasons, only once were his adjusted numbers on the right side of 50%. The one thing Suter does seem to have going for him is that he generates a fair number of shots on goal himself. While I won’t dive into it too much here, Suter is actually a middle-of-the-pack player when you take the ratio of his individual shots to his defensive side shots against. Obviously, that may not be enough to justify a $7.5MM cap hit, but it may help to explain why he’s playing top-pairing minutes when analytically he seems to be somewhat suspect in the defensive zone.

While all of these stats are interesting enough on their own, if we’re going to use DZA-SF% as an evaluation tool we need to know how repeatable it is year-over-year. After all, it doesn’t do us a lot of good to create a metric that shows that a given player is good or bad if we can’t use that metric to make predictions about how that player will do in subsequent years. And luckily for us, DZA-SF% shows a reasonable repeatability within our sample, with a correlation between year Y and Y + 1 of 0.43. This is actually slightly higher than the year-over-year correlation of our unmodified SF% (0.42), and not much lower than the year-to-year correlation in CF% for defencemen (0.47).
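The post doesn’t describe exactly how the year-over-year pairs were built, so here is a minimal sketch under stated assumptions: a plain Pearson correlation over every player who has a value in both year Y and year Y + 1. The dictionary layout and the sample values are hypothetical. On Python 3.10+, `statistics.correlation` computes the same Pearson coefficient and could replace the hand-rolled helper.

```python
from math import sqrt
from typing import Dict, List, Tuple


def pearson(xs: List[float], ys: List[float]) -> float:
    """Pearson correlation coefficient of two equal-length samples."""
    n = len(xs)
    if n < 2 or n != len(ys):
        raise ValueError("need at least two paired observations")
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


def year_over_year_r(metric: Dict[Tuple[str, int], float]) -> float:
    """Correlate each player's value in year Y with his value in year Y + 1.

    `metric` maps (player, season start year) -> value; seasons without a
    following season for the same player are skipped.
    """
    this_year, next_year = [], []
    for (player, year), value in metric.items():
        following = metric.get((player, year + 1))
        if following is not None:
            this_year.append(value)
            next_year.append(following)
    return pearson(this_year, next_year)


# Hypothetical DZA-SF% values (in %), keyed by (player, season start year).
dza = {
    ("Player A", 2011): 54.0, ("Player A", 2012): 52.0,
    ("Player B", 2011): 47.0, ("Player B", 2012): 49.5,
    ("Player C", 2011): 50.0, ("Player C", 2012): 48.0,
    ("Player C", 2013): 51.0,
}
print(round(year_over_year_r(dza), 2))
```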

What is really interesting to me, though, is that decreasing our sample size by approximately 25% hasn’t significantly reduced the repeatability of our metric. DZA-SF% keeps every shot for but only the roughly half of the shots against that come from a defenceman’s own side, so when shots for and against are roughly even it discards about a quarter of the shots that feed into a standard SF%. Normally, if I told you I was only going to use 60 games to evaluate a player, you would assume the results wouldn’t be as accurate as if we used the whole season’s data. But our correlation numbers show that even after cutting out roughly half of the shots-against data, we’re no worse at predicting a player’s performance in future years. That is about the best validation of DZA-SF% you could ask for: as we get more and more data, we should expect the metric to become even more accurate at measuring individual performance, and even with the data we have now, our predictions aren’t any worse, which is obviously a huge win for our metric.

While DZA-SF% may be a crude metric in its design, it’s important to remember that even in other sports (where analytics have made far greater strides than they have in hockey), defensive metrics are often broad estimates compared to the relative precision of offensive metrics. Within baseball, the sport in which analytics has arguably made the greatest impact, defensive stats often disagree on the value of a player, but that doesn’t mean they have no value. Even if UZR and Defensive Runs Saved don’t always line up, an analyst who knows how each is calculated, and the strengths and weaknesses of each approach, can make a subjective evaluation of each stat’s “opinion” and use that to inform his or her view of a player. The idea that we can simply divide the ice in half and assign a defenceman responsibility for all shots on one half is obviously wrong in some cases, but unless we have specific reasons why it wouldn’t make sense for a given player, isn’t it better than a purely subjective evaluation? It’s not meant to be the only thing one should look at when evaluating a player (there isn’t ever going to be a single number, nor a single subjective quality, that we should look for to make personnel decisions), but it does give us a way to say whether a player is good or bad without relying solely on popular opinion.

What’s also critical to know is that even without player tracking technology or a whole team of individuals recording new data for us, our estimates using a methodology like this should get better simply with more detail in the data that the NHL is already providing us. Greg Sinclair has already pointed out that co-ordinate data is available for all events this season (see here for an example), and the simple inclusion of all shot attempts should give us even better results to work with. Even if full game tracking is still a ways off, we can still get better just by pulling more data into this methodology.

There are also several improvements that could be made to this approach with a little technical effort and thought. Obviously, giving a defenceman responsibility for an entire side of the ice is a huge over-simplification, and even adding wingers who are responsible for the opposition’s pointmen into the analysis should yield better results. In addition, in his review of the PGH Analytics Workshop at Hockey Prospectus, Arik Parnass suggested a novel way to include zone entry data in our methodology to further refine our view of defensive zone responsibility. There are obviously lots of ways to make this better; it’s simply a matter of further focussing our approach to better align with how we know the game of hockey works.

DZA-SF% isn’t meant to be a be-all and end-all stat, nor is it meant to present the “right way” to evaluate defencemen; rather, what I’m hoping to do is illustrate a technique to make better use of the data we have available to us right now. More data will obviously make more complex analyses easier, but that doesn’t mean there isn’t more we can squeeze out of the data we have today. There’s still lots to be learned from the basic data included in the NHL’s RTSS and JSON files; it’s simply a matter of digging a bit deeper into the data and putting thought into how it connects to what we know about how hockey works. It’s these insights and analyses that will start to chip away at the argument that hockey is too fluid and too complex to measure, and provide us with more reliable methods to understand the impact of a given player.

Posted in Defence, Statistics
