How much skill exists in on-ice shooting percentages?

Earlier today, Phil Birnbaum posted a piece offering further arguments in favour of his view that shot quality exists and is a legitimate strategy choice for teams.  For those of you unfamiliar with Phil’s work, he’s long been a proponent of the idea that teams can sustain high shooting percentages, and while I’m not necessarily in agreement with all of his theories, the points he makes are generally well thought out and argued (all of which is to say that regardless of where you stand on the topic you should read what he’s written to date).

In his article Phil made two main arguments: first, that players have the ability to increase their teammates’ shooting percentage while they’re on the ice; and second, that because of this we can conclude that team shooting percentage isn’t random. I’m not going to dig too much into the second argument, as that’s not something that’s ever really been disputed by any analyst (most claim that while shooting is a skill, at the team level the differences are frequently so small that they’re negligible), and I think that the salary cap structure and the fact that teams tend to pay for past Sh% may explain why the differences we observe are small.

I do, however, want to look into Phil’s first claim, that players are able to influence their on-ice shooting percentage. The main evidence that Phil offers in support of this argument is that, for a set of elite players, their ability to maintain a positive On-Ice Sh% WOWY persists year-over-year. While this finding certainly supports his point, it’s far from conclusive: after all, his analysis focuses on 10 players and we can’t really reach broad conclusions based on 1% of the league’s population. Furthermore, several of the players he highlights are excellent shooters themselves, and what we’re seeing in their on-ice shooting percentage may be more of a reflection of their individual shooting percentage than any goal creation ability. Lastly, all of these players are amongst the best in the league and tend to play with other very good players: any year-over-year trends we see may be more reflective of consistent top-level teammates than of ability.

With all that being said, I do believe that there is skill in on-ice shooting percentage; it’s simply a matter of how much talent there is compared to luck, and how persistent it is in season-by-season numbers. To investigate this, I pulled all the individual seasonal data from, and calculated each player’s Individual Corsi Sh% (iCSh% = iGF/iCF) and their On-Ice Teammate Corsi Shooting Percentage (TMCSh% = (GF – iGF)/(CF – iCF)). After filtering out anyone who wasn’t on the ice for at least 200 CF in a season, I split the data between forwards and defencemen and calculated the correlation in both iCSh% and TMCSh% over several seasonal pairs (i.e. the correlation between year y and year y + x for x in [1,6]). By looking at the correlation over multiple years we should be able to figure out the extent to which similar teammates/deployment factor into each of our metrics’ repeatability.
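As a rough illustration of the calculation described above, here is a minimal Python sketch of the two metrics and the lagged correlation. The row layout (`player`, `year`, `value`, `cf`) and the sample numbers are made up for illustration and are not the data used in the post.

```python
from math import sqrt

def icsh(i_gf, i_cf):
    """Individual Corsi Sh%: a player's own goals per own shot attempt."""
    return i_gf / i_cf

def tmcsh(gf, i_gf, cf, i_cf):
    """Teammate Corsi Sh%: on-ice goals and attempts, excluding the player's own."""
    return (gf - i_gf) / (cf - i_cf)

def pearson(xs, ys):
    """Plain Pearson correlation coefficient."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / sqrt(vx * vy)

def lagged_correlation(rows, lag, min_cf=200):
    """Correlate a metric in year y with the same metric in year y + lag.

    Player-seasons with fewer than min_cf on-ice Corsi events are dropped.
    """
    kept = {(r["player"], r["year"]): r["value"] for r in rows if r["cf"] >= min_cf}
    pairs = [(v, kept[(p, y + lag)]) for (p, y), v in kept.items() if (p, y + lag) in kept]
    xs, ys = zip(*pairs)
    return pearson(xs, ys)

# Made-up player-seasons, purely to show the call pattern.
rows = [
    {"player": "A", "year": 2012, "value": 0.04, "cf": 900},
    {"player": "A", "year": 2013, "value": 0.05, "cf": 950},
    {"player": "B", "year": 2012, "value": 0.06, "cf": 800},
    {"player": "B", "year": 2013, "value": 0.07, "cf": 820},
    {"player": "C", "year": 2012, "value": 0.05, "cf": 700},
    {"player": "C", "year": 2013, "value": 0.06, "cf": 710},
    {"player": "D", "year": 2012, "value": 0.09, "cf": 150},  # below the 200 CF cutoff
    {"player": "D", "year": 2013, "value": 0.01, "cf": 600},
]
one_year_r = lagged_correlation(rows, lag=1)
```

Running the same function for lags 1 through 6, separately for forwards and defencemen, would give the shape of the curves discussed in the post.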

Year to Year Correlations - Forward Shooting Percentages

Starting with the forwards, we see that our results do seem to align with Phil’s theory – the correlations for iCSh% never drop below 0.2, and average roughly 0.22 if we exclude the last datapoint, which seems to be way out of line. Similarly, forwards seem to have at least some ability to influence their teammates’ shooting percentage, with the correlations averaging 0.18 for up to 5 year differences. And while none of the correlations that we see are phenomenal (as a point of comparison, CF% generally correlates at levels between 0.33 and 0.57 over 5 years for forwards), there’s very clearly some talent there. What we see also makes a lot of sense intuitively – forwards have more control over the shots coming off of their own sticks, but we expect that they should have at least some ability to create scoring opportunities for their teammates.

Year to Year Correlations - Defencemen Shooting

For defencemen, on the other hand, we see a much different story. Defencemen show little repeatability in individual shooting ability, with the correlation generally sitting in the 0.08-0.10 range (again, with the exception of our odd 6 year delta seasons). When it comes to improving their teammates’ shooting, the picture is even worse, with the correlation dropping to roughly 0.06 for a 1 year delta, and down to basically 0 at lengths of more than 4 years.

While it’s not exactly news that forwards drive offensive play more than defencemen, this really underscores how little control defencemen have once the puck leaves their sticks. This helps explain (in part at least) why it’s so difficult for a defenceman to look “dominant” for multiple seasons at a time: so much luck exists in their year-to-year shooting percentages that it’s really tough to get the bounces to go your way for multiple years running. And while forwards do show more control over whether the puck goes in when they’re on the ice, a single season’s data is still about 75% luck for their own shots and 80% luck for their teammates’. All of which is to say that while generating and converting on scoring opportunities is definitely a skill for forwards, it can take quite some time before we know if what we’re seeing is more signal than noise. We needn’t ignore shooting percentages, but we do need to keep in mind that bounces can take a long while to even out, so if you’re seeing names like Chris Neil near the top of the league in shooting percentage (19th at the moment!!!) it may be a sign that it’s too early to be checking the data.

Posted in Shooting Percentage

Score Adjusted Weighted Shots

Last week, Tom Tango, Sabermetrician extraordinaire for the Chicago Cubs (and one of the original hockey analysts to, you know, actually make money doing this thing), posted an article on his site proposing a new metric to better weight the components of Corsi. Tango defined his new metric (which he proposed, with tongue planted firmly in cheek, be called Tango or, failing that, Weighted Shots) as a simple linear combination of goals and non-goal Corsi events:

wSH = Goals + 0.2*(Shots + Missed Shots + Blocked Shots)

The weighting of 0.2 was informed by (although not strictly derived from) a regression between half-season Corsi components and half-season Goals For (i.e. calculating the weights to maximize the predictive value of future Goal Differential). Tango’s goal was to preserve the predictive information that we see in Corsi while properly taking into account the fact that goals are really what the game is all about. And while no analyst has ever argued that Corsi alone is enough to evaluate a team or player, Tango’s point was that we had the data to make intuitive improvements to Corsi in a relatively easy manner.
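For concreteness, the wSH formula can be sketched in a few lines of Python. The function name and the sample counts are illustrative, and the sketch assumes "Shots" means saved shots on goal, since goals are counted separately.

```python
def weighted_shots(goals, shots, missed, blocked, shot_weight=0.2):
    """wSH = Goals + 0.2 * (Shots + Missed Shots + Blocked Shots).

    Assumption: `shots` counts saved shots on goal only (goals are separate).
    """
    return goals + shot_weight * (shots + missed + blocked)

# Made-up single-game totals, only to show the call.
example_wsh = weighted_shots(goals=3, shots=27, missed=12, blocked=15)
```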

One of the problems with how wSH is formulated, though, is that it aims behind the current state of Hockey Analytics. As Micah McCurdy has illustrated, Score Adjusted metrics vastly outperform standard possession metrics, since both the location of the game and the current score state have significant impacts on how teams perform. Unless we take score and venue effects into account, even an improved metric like wSH is missing important information. Fortunately, if we follow along with Micah’s original methodology, we can figure out the appropriate adjustments to bring these factors into wSH.

Using data from 2008-2014, we can first calculate the probability of a given team recording an event based on the event type, score state and game location (home/away):

| Score | Home Goal | Away Goal | Home Shot/Miss/Block | Away Shot/Miss/Block |
| --- | --- | --- | --- | --- |
| -3 | 52.87% | 50.83% | 59.25% | 56.66% |
| -2 | 51.19% | 50.49% | 57.34% | 55.31% |
| -1 | 53.20% | 49.77% | 54.96% | 53.22% |
| 0 | 52.91% | 47.09% | 50.95% | 49.05% |
| 1 | 50.23% | 46.80% | 46.78% | 45.04% |
| 2 | 49.51% | 48.81% | 44.69% | 42.66% |
| 3 | 49.17% | 47.13% | 43.34% | 40.75% |

Then, we can take the probabilities, along with Tango’s wSH weights (1 for goals, 0.2 for shots/misses/blocks) and combine them to calculate weighted adjustment factors for a Score Adjusted Weighted Shots metric (SAwSH):

| Score | Home Goal Weight | Away Goal Weight | Home Shot/Miss/Block Weight | Away Shot/Miss/Block Weight |
| --- | --- | --- | --- | --- |
| -3 | 0.943 | 0.983 | 0.163 | 0.173 |
| -2 | 0.976 | 0.990 | 0.171 | 0.179 |
| -1 | 0.936 | 1.005 | 0.180 | 0.187 |
| 0 | 0.942 | 1.058 | 0.196 | 0.204 |
| 1 | 0.995 | 1.064 | 0.213 | 0.220 |
| 2 | 1.010 | 1.024 | 0.221 | 0.229 |
| 3 | 1.017 | 1.057 | 0.227 | 0.237 |

As you can see, the value of a goal relative to a shot isn’t constant in our new method. It ranges from one goal being worth 5.78 shots/misses/blocks (for a home team down 3 goals) all the way down to 4.46 shots/misses/blocks per goal (for a visiting team up by 3).
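The post doesn’t spell out the exact formula behind the adjustment factors, but the tabulated weights appear to be consistent with weight = 2 × (1 − p) × base weight, where p is the probability from the first table and the base weight is 1 for goals and 0.2 for shots/misses/blocks. The sketch below rests on that inferred relationship, so treat it as an assumption rather than the documented method.

```python
GOAL_WEIGHT = 1.0   # Tango's wSH weight for goals
SHOT_WEIGHT = 0.2   # Tango's wSH weight for shots/misses/blocks

def adjustment_weight(p_event, base_weight):
    """Inferred rule: events a team is more likely to record count for less."""
    return 2.0 * (1.0 - p_event) * base_weight

# Home team down 3: probabilities taken from the first table.
home_down3_goal = adjustment_weight(0.5287, GOAL_WEIGHT)
home_down3_shot = adjustment_weight(0.5925, SHOT_WEIGHT)

# Visiting team up 3: probabilities taken from the first table.
away_up3_goal = adjustment_weight(0.4713, GOAL_WEIGHT)
away_up3_shot = adjustment_weight(0.4075, SHOT_WEIGHT)

# How many shots/misses/blocks one goal is worth in each situation.
home_down3_ratio = home_down3_goal / home_down3_shot
away_up3_ratio = away_up3_goal / away_up3_shot
```

Under this rule the two ratios come out at roughly 5.78 and 4.46, matching the range quoted above.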

Now that we’ve defined how to calculate SAwSH, let’s look at how well it performs compared to Score Adjusted Corsi. Whenever we evaluate a new stat there are two things we need to look at to decide how much trust to put in it: 1) the repeatability of the metric, that is how well our measurement over one period predicts the same measurement over another period; and 2) how well the metric predicts our result of interest (winning hockey games). A metric that’s not repeatable doesn’t do us much good when we’re evaluating a team or player, since we don’t know whether the results we observe are due to luck or talent. At the same time, a measure that’s highly repeatable but doesn’t relate to winning is a metric that we should just ignore.

The best way for us to test for repeatability at the team level is to look at the correlation between our results in odd-numbered games and in even-numbered games. Since there’s nothing in the game number that would relate to our results, if we see a high correlation it’s a good sign that what we’re observing is a talent.

| Metric | Correlation |
| --- | --- |
| Score Adjusted Corsi | 0.873 |
| Score Adjusted Weighted Shots | 0.841 |
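The odd/even split described above can be sketched as follows. The data layout (a list of per-game for/against totals per team, in schedule order) and the sample numbers are assumptions made for illustration; the real test would use each team’s score-adjusted totals.

```python
from math import sqrt

def pearson(xs, ys):
    """Plain Pearson correlation coefficient."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / sqrt(vx * vy)

def share_for(games):
    """Share of events in a team's favour over a set of games."""
    events_for = sum(g[0] for g in games)
    events_against = sum(g[1] for g in games)
    return events_for / (events_for + events_against)

def split_half_correlation(team_games):
    """Correlate each team's share in odd-numbered vs even-numbered games."""
    odd_shares, even_shares = [], []
    for games in team_games.values():
        odd_shares.append(share_for(games[0::2]))   # games 1, 3, 5, ...
        even_shares.append(share_for(games[1::2]))  # games 2, 4, 6, ...
    return pearson(odd_shares, even_shares)

# Made-up (for, against) totals for three hypothetical teams.
team_games = {
    "Team A": [(30, 20), (30, 20), (30, 20), (30, 20)],
    "Team B": [(20, 30), (20, 30), (20, 30), (20, 30)],
    "Team C": [(25, 25), (25, 25), (25, 25), (25, 25)],
}
repeatability = split_half_correlation(team_games)
```

The same structure works for the predictive test: compute the metric in one half and GF% in the other before correlating.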

While Score Adjusted Corsi shows slightly more repeatability, the difference is more or less negligible. Both metrics show enough repeatability that we don’t have to worry that they’re influenced too heavily by luck. This is particularly important for SAwSH as it dispels one of the biggest worries that many people had about it: that the inherent variability in shooting and save percentage would mean that we’d need a much larger sample before we could trust the results.

If we move on to predictability we can run a similar test, but instead of correlating the same metric between even and odd-numbered games, we’ll look at how well our Score Adjusted numbers in even games predict a team’s Goals For Percentage in odd games (and vice-versa).

| Metric | Correlation (Even -> Odd GF%) | Correlation (Odd -> Even GF%) |
| --- | --- | --- |
| Score Adjusted Corsi | 0.475 | 0.421 |
| Score Adjusted Weighted Shots | 0.495 | 0.446 |

In both datasets, SAwSH does a better job of predicting out of sample Goals For %. This makes sense of course, since SAwSH includes goal scoring/goaltending data where SAC doesn’t. The size of the gap between SAC and SAwSH is also interesting to note: the correlations are roughly 5% higher using wSH rather than Corsi (which works out to about 2 more percentage points of the variance in out of sample GF% explained), illustrating the fact that shooting percentage and save percentage do matter at the team level. While they’re obviously not as important as possession (after all, we still do fairly well using only SAC), there’s clearly a benefit to including them in our analyses.

While the computational cost of SAwSH may be slightly higher than standard CF, the benefits are more than just an increase in predictive power: wSH makes much more sense intuitively, and is a direct counterpoint to the argument that analytics are too focussed on possession. SAwSH makes a much better argument for analytics while giving up very little in the way of repeatability. While there are obviously further areas to investigate (the weightings in Tango’s original regression equations are worth a deeper look as there’s likely further value to be extracted there), SAwSH is clearly a step forward for the analytics movement. And although some may argue that the power of SAwSH is a repudiation of Corsi as a metric, I instead look at it as a validation of possession-based analyses: the value of the sample that Corsi offers is obvious; SAwSH is just a small tweak to better reflect the inherent shooting and goaltending differences that Corsi can miss in some cases.


On his site Tango asked for correlations for Score Adjusted Goals, and so I’m happy to oblige:

| Comparison | Correlation |
| --- | --- |
| Odd(SAG%)->Even(SAG%) | 0.365 |
| Odd(SAG%)->Even(GF%) | 0.366 |
| Even(SAG%)->Odd(GF%) | 0.354 |

Obviously, SAwSH is quite a bit better in both repeatability and predictability, but what’s more interesting is how little additional value we get from adjusting GF% for Score State/Venue. The correlation between raw GF% in odd and even games is 0.353, which means that we’re getting almost no additional information from our adjustments.

Posted in Statistics

Are the Buffalo Sabres worse than an AHL team?

The Buffalo Sabres are not a good hockey team. This is not news to anyone. At 6-13-2 the Sabres sit last in the Atlantic Division by 6 points, and are tied with the Carolina Hurricanes and Edmonton Oilers for the fewest points to date across the NHL. What’s worse for Buffalo is that they’re almost certainly much worse than their record suggests. Their Pythagorean Win Percentage, which calculates a team’s expected winning percentage based on their Goals For and Goals Against (and is a better predictor of future success than regular winning percentage), sits at 20.9%, roughly 8 percentage points lower than their actual winning percentage.

It’s not easy to describe how bad Buffalo’s 20.9% Pythagorean winning percentage is: the only teams since 1992 to achieve anywhere close to that level of futility were the 1992-93 and 1993-94 Ottawa Senators, and they at least had the excuse of playing in the first 2 years of their franchise history. One question that’s come up a few times across the sports analytics world recently is whether or not a minor league/college team could defeat the worst professional team in a given sport. Over at FiveThirtyEight, Neil Paine concluded that even the 0-14 Philadelphia 76ers would still be about a 78% favourite over the Kentucky Wildcats. In addition, Tom Tango ran through the math for MLB on his blog, and found that a top tier minor league team could score up to 70% as many runs while allowing 143% more and still wind up as even money against the worst MLB team. The natural question that follows, of course, is: are the Sabres really bad enough to lose to an AHL team?

To answer this, we’ll first need to figure out how well we’d expect the best AHL team to do (from a goal differential point of view) if we moved them up to the NHL. As of Tuesday night, the top team in the AHL was the Manchester Monarchs, who have posted 57 goals for and 38 goals against en route to a 12-4-1 start. While the Monarchs have been slightly lucky to date (their Pythagorean Win Percentage is about 70% at that Goal Differential), they’re obviously still a good team. But because they play in the AHL we can’t just use the Goal Differential that they’ve posted there; we have to adjust it to reflect how we feel they’d perform if we airdropped them onto an NHL rink.

Fortunately for us, someone else has done the legwork to come up with a translation factor already! NHLe (NHL Equivalency) is a stat first created by Gabe Desjardins, and its purpose is to allow us to convert the number of points a player scored in a non-NHL league into NHL points. Based on Gabe’s work 1 goal in the AHL is worth approximately 0.45 in the NHL, meaning Manchester’s 57 AHL goals for are worth 25.65 in the NHL, and their 38 goals against translate to roughly 84.44 goals against. You don’t have to be a math major to see that a 25.65/84.44 GF/GA ratio is worse than 36/70, but how much worse is it?

If we look at Pythagorean Win Expectancy, the Monarchs’ NHL equivalent goal differential translates into roughly an 8.4% expected win percentage. We can compare that to Buffalo’s 20.9% expected win percentage by using an odds ratio method to come up with a neutral ice expected win percentage for both the Sabres and Monarchs:

| Team | Neutral Ice Expected Win % |
| --- | --- |
| Manchester Monarchs | 25.9% |
| Buffalo Sabres | 74.1% |
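The chain of calculations above (NHLe translation, Pythagorean expectation, odds ratio) can be sketched in Python. The post doesn’t state the Pythagorean exponent; an exponent of 2 is assumed here because it reproduces the 20.9% and 8.4% figures, and the function names are illustrative.

```python
NHLE = 0.45  # AHL-to-NHL translation factor quoted above

def pythagorean_win_pct(gf, ga, exponent=2.0):
    """Expected win % from goals for and against (exponent of 2 is an assumption)."""
    return gf ** exponent / (gf ** exponent + ga ** exponent)

def neutral_ice_win_pct(p_team, p_opponent):
    """Odds ratio method: chance that the first team beats the second."""
    odds_team = p_team / (1.0 - p_team)
    odds_opponent = p_opponent / (1.0 - p_opponent)
    return odds_team / (odds_team + odds_opponent)

# Manchester: 57 GF and 38 GA in the AHL, translated to NHL terms.
monarchs_gf = 57 * NHLE   # goals for shrink
monarchs_ga = 38 / NHLE   # goals against grow
monarchs_pct = pythagorean_win_pct(monarchs_gf, monarchs_ga)

# Buffalo: 36 GF and 70 GA in the NHL.
sabres_pct = pythagorean_win_pct(36, 70)

monarchs_vs_sabres = neutral_ice_win_pct(monarchs_pct, sabres_pct)
sabres_vs_monarchs = 1.0 - monarchs_vs_sabres
```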

Even the best AHL team will only win about 1 in 4 times against an historically bad NHL team, which really displays how big the difference in talent is between the NHL and AHL. While the Sabres may be in the middle of one of the worst non-expansion campaigns in recent memory, they’re nowhere near the level that we’d want to relegate them down to the American Hockey League.

One assumption we’ve made in our analysis, however, is that the NHLe is the same at the team level for goals for and against. While I feel fairly confident that it should work out for team goals for (you’re likely to have good players who will outperform it and bad players who will underperform), on the defensive side of the puck you could make an argument that our assumption won’t hold. By setting the NHLe for defense to 0.45 we’re essentially saying that we expect a team of AHLers to give up more than twice as many goals in the NHL as they did in the AHL (1/0.45 ≈ 2.2 times). This doesn’t seem all that intuitive: although we’d expect them to give up more shots and get slightly worse goaltending, general team defense should be easier to transfer between leagues than offense.

We can account for this by looking into how Manchester’s expected winning percentage varies as a function of the Goals Against NHLe.

Manchester Winning Percentage vs. GA NHLe

In the graph above we see that the break-even point for our GA Equivalence multiplier is around 0.765, which is to say that if we believe that the Monarchs would give up 1.3 (1/0.765) times as many goals in the NHL as they would playing in the AHL, they’d be even money against the Sabres. While we don’t have a great way to test this, intuitively it doesn’t seem unreasonable, particularly if you consider the effect a strong goaltender could have. To date, the Monarchs have received 0.913 goaltending in all situations; if you were to drop that down to 0.905 and assume a 20% increase in shots against, their goals against increases to 48, which is 1.28 times their GA now. While we can’t say conclusively that this would be the case, it also doesn’t look like that unreasonable of a comparison to me. A 50/50 game may be the upper bound for the Monarchs, and while that may not be great for Manchester, it’s certainly worse for Buffalo.
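The break-even point can also be found numerically rather than read off the graph. The sketch below holds the goals-for NHLe at 0.45, assumes a Pythagorean exponent of 2 (not stated in the post), and bisects on the goals-against NHLe; the function names are illustrative.

```python
def pythagorean_win_pct(gf, ga, exponent=2.0):
    """Expected win % from goals for and against (exponent of 2 is an assumption)."""
    return gf ** exponent / (gf ** exponent + ga ** exponent)

def monarchs_neutral_win_pct(ga_nhle, gf_nhle=0.45):
    """Manchester's neutral-ice win % against Buffalo for a given GA NHLe."""
    monarchs = pythagorean_win_pct(57 * gf_nhle, 38 / ga_nhle)
    sabres = pythagorean_win_pct(36, 70)
    odds_m = monarchs / (1.0 - monarchs)
    odds_s = sabres / (1.0 - sabres)
    return odds_m / (odds_m + odds_s)

def find_break_even(lo=0.45, hi=1.0, iterations=60):
    """Bisect for the GA NHLe at which the matchup is a coin flip.

    Win % rises with the GA NHLe (fewer translated goals against),
    so a simple bisection on the interval is enough.
    """
    for _ in range(iterations):
        mid = (lo + hi) / 2.0
        if monarchs_neutral_win_pct(mid) < 0.5:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2.0

break_even_ga_nhle = find_break_even()
```

Under these assumptions the search lands near 0.76, in line with the value of around 0.765 taken from the graph.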

While it does seem clear that the Sabres are at least not worse than the best AHL team, it’s still not exactly cause for celebration in Buffalo. Even in last place the Sabres are a bit lucky to be where they are in the standings given their goal differential, and while help may be coming up from Erie at the end of the season, the rest of this year is sure to be a long one for Sabres fans.

Posted in Theoretical

Shot Location Data and Strategy II: Evaluating Individual Defensive Play

This is the second post in a (likely) 3 part series going through the data/methods/results which I presented at the Pittsburgh Hockey Analytics Workshop. Part I, which covers whether defencemen play worse on their off-hand, is available here. If you’re interested in seeing the slides or hearing the presentation (or the other presentations, which I highly recommend), they’re available on the WaR on Ice website here.

As anyone who has ever dug into the data around defensive zone play knows, evaluating an individual player’s contribution in his own end is a difficult task. Most defensive metrics that we have today show far less year-over-year repeatability than offensive metrics, suggesting that they’re more likely measuring team or system effects than individual abilities. For defencemen, who tend to drive offensive play much less than forwards, this presents a particularly tricky challenge, as if we are unable to isolate their ability to defend their net from the opposition we aren’t left with much to judge them on.

Part of this challenge is just the nature of the data that’s recorded: when a player takes a shot attempt we note who took the shot, and for goals we also note the players that assisted as well. Both of these data points, while far from perfect, give us a better ability to differentiate the players who are driving the bus in the offensive zone from the skaters that are simply along for the ride. At the other end of the rink though we don’t have the same luxury: we know who took a shot against, but no one collects information on which defender was closest to the shooter, or who was (in theory at least) responsible for defending him. In an ideal world we’d have dozens of scorers or assistant coaches writing this data down, or better yet we’d have an intricate series of cameras available to track all of this information automatically (we can dream, right?), but even if this data is or will be collected it’s unlikely to ever make it into the public sphere.

While I may be painting a bleak picture here, the situation isn’t completely hopeless. The NHL’s game files do contain information on the location that each shot was taken from, and (with a bit of effort) we can leverage this data to get a better sense of an individual defenceman’s efforts at his own end of the rink. This is important, after all, because if a GM is thinking about adding a shutdown defenceman at the trade deadline to make a playoff push, we want to know whether he’s actually preventing shots himself or whether his partner is doing the heavy lifting. While we can use our current metrics to figure out whether a player generally allows more or fewer shots when he’s on the ice, we can’t really get a sense as to whether it’s due to his own efforts or not at first glance. And it’s with that thought in mind that I’m going to present an initial attempt at modelling a defenceman’s individual shot prevention ability in his own zone.

The first step to figuring out defensive zone coverage though is knowing which players are playing on which side of the ice. In Part I, I introduced a new measure called “Side Bias” which measures a player’s propensity to take shots from the left or right side of the ice.

Side Bias = (# of Shots Taken from Left Side) / (# of Shots Taken from Left or Right Side) – 50%

Side Bias relies on the fact that a left defenceman or winger will tend to take the majority of their shots from the left side of the ice, while the opposite will be true for right defencemen/wingers. Side Bias is the numerical formulation of this idea: players with a Side Bias above zero tend to take more shots from the left side, while players with a Side Bias below zero are shooting from the right side more frequently. These numbers are incredibly useful because they allow us to determine which player was playing LD/LW or RD/RW without observing each game or having an in-depth knowledge of a coach’s lineup preferences.
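In code, Side Bias is a one-liner. The function name, the handling of a player with no left or right shots, and the sample counts below are illustrative choices rather than anything specified in the post.

```python
def side_bias(left_shots, right_shots):
    """Side Bias = left / (left + right) - 50%.

    Positive values mean more shots from the left side, negative from the right.
    Returns None when the player has no left- or right-side shots (an
    illustrative choice for the empty case).
    """
    total = left_shots + right_shots
    if total == 0:
        return None
    return left_shots / total - 0.5

# Made-up shot counts for a hypothetical left-side player.
example_bias = side_bias(left_shots=70, right_shots=30)
```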

To apply these numbers to classify defensive pairings, we’ll use the same rules we did in Part I:

  • If a pairing has played together a significant amount (which I’ve defined as at least 10 shots taken by each player while they were on the ice together), we’ll use the side bias data from when they played together.
  • If a pairing has rarely played together (if either player has taken fewer than 10 shots when the pairing were on the ice together), we’ll use their overall side bias numbers to figure out which player was on which side.
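The two rules above might be sketched like this. The post doesn’t say how a tie is broken or whether centre-ice shots count toward the 10-shot threshold; this sketch counts only left- and right-side shots and labels the partner with the higher Side Bias as the left defenceman, both of which are assumptions. Player names and counts are made up.

```python
def side_bias(left_shots, right_shots):
    """Side Bias = left / (left + right) - 50%."""
    return left_shots / (left_shots + right_shots) - 0.5

def assign_sides(pair_shots, overall_bias, min_shots=10):
    """Decide which partner is the LD and which is the RD.

    pair_shots:   {player: (left, right)} shots taken while the pair was together
    overall_bias: {player: Side Bias over all of the player's ice time}
    """
    first, second = pair_shots  # exactly two players expected
    enough_together = all(sum(pair_shots[p]) >= min_shots for p in (first, second))
    if enough_together:
        # Rule 1: the pair has played together a lot, use their shared data.
        bias = {p: side_bias(*pair_shots[p]) for p in (first, second)}
    else:
        # Rule 2: too little shared data, fall back to overall numbers.
        bias = {p: overall_bias[p] for p in (first, second)}
    left = first if bias[first] >= bias[second] else second
    right = second if left == first else first
    return {"LD": left, "RD": right}

# Hypothetical pairing with plenty of shared shots.
regular_pair = assign_sides({"X": (12, 3), "Y": (2, 14)}, {"X": 0.0, "Y": 0.0})

# Hypothetical pairing where X has only 7 shared shots, so overall bias is used.
rare_pair = assign_sides({"X": (1, 6), "Y": (6, 10)}, {"X": 0.2, "Y": -0.3})
```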

This may not be a perfect system, as we’ll likely make a few mistakes with pairings who play together infrequently, but since these will form a small subset of any given player’s overall sample, the wrong numbers won’t have a significant effect in the long run.

So how do we turn our knowledge of who’s playing on what side of the ice into a measurable result for defencemen? The simplest method is to just divide the defensive zone in half up the middle and assign each defenceman responsibility for defending their half. All the shots taken from the left side of the ice are the responsibility of the left defenceman, while all the shots taken from the right side are a negative mark against his partner. It’s a simple model, but as we’ll see later, it does provide a half decent view of individual defensive zone play.

One thing I should address before we move any further is the accuracy of the NHL’s shot location data. While many have noted issues with the league’s data, it’s unlikely to have a major impact on our results since we don’t expect there to be significant (or any, really) bias in the side of the ice that scorers place a shot on. In other words, while we know some scorers may tend to record shots from closer than they actually occurred, there’s not really any reason to think that any scorers are moving shots to the left or right side systematically. So while in small samples we may have problems with a shot being misclassified, over the long run we expect these to even out and have a net zero effect on any individual defenceman’s results.

With that out of the way, what can we learn about a defenceman when we look at the divided-ice numbers? Well the first, and simplest, calculation we can do is to figure out what percentage of the shots against are coming from his side of the ice. In my presentation I called this the % Shots Against From Side but Defensive Side Shots % or DSS% is a more concise and just as accurate name that I’ll use going forward. For a left defenceman, the calculation is just:

DSS% = # of shots from right side of ice/(# of shots from right side of ice + # of shots from left side of ice)

Note that when I say “right side” in this context it’s from the point of view of the shooter, so although a left defenceman is covering the left side of the ice, he’s responsible for the attacking right winger/right defenceman (and any other player who crosses to the attacking right side). The benefit of this metric is that it gives us an idea of which member of a pairing is defending their side better on a relative basis. For example, Marc-Edouard Vlasic hasn’t allowed more than 48.8% of the total shots against to come from his side of the ice since 2010-2011, one of the best records of any defenceman in the league.

| Season | Player | DSS% |
| --- | --- | --- |
| 2010-2011 | Marc-Edouard Vlasic | 46.1% |
| 2011-2012 | Marc-Edouard Vlasic | 48.8% |
| 2012-2013 | Marc-Edouard Vlasic | 42.4% |
| 2013-2014 | Marc-Edouard Vlasic | 45.7% |
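A small sketch of the DSS% calculation, keeping the shooter’s point of view from the formula above: a left defenceman is charged with shots from the attacking right side, and a right defenceman with shots from the attacking left. The function signature and shot counts are illustrative.

```python
def dss_pct(attacking_left_shots, attacking_right_shots, position):
    """Defensive Side Shots %: share of side shots against from a defenceman's side.

    Sides are from the shooter's point of view, so an LD defends the
    attacking right side and an RD defends the attacking left side.
    Centre-ice shots are excluded, as in the formula above.
    """
    total = attacking_left_shots + attacking_right_shots
    if position == "LD":
        own_side = attacking_right_shots
    elif position == "RD":
        own_side = attacking_left_shots
    else:
        raise ValueError("position must be 'LD' or 'RD'")
    return own_side / total

# Made-up on-ice shot counts for a hypothetical pairing.
ld_share = dss_pct(attacking_left_shots=540, attacking_right_shots=460, position="LD")
rd_share = dss_pct(attacking_left_shots=540, attacking_right_shots=460, position="RD")
```

By construction the two partners’ DSS% values sum to 100%, which is why the metric only compares a defenceman to his partner(s).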

The weakness in DSS%, however, is that it really only compares a defenceman to his partner(s). If you put me out there alongside Vlasic, his numbers would likely look even better, as most forwards would take advantage of my inability to keep up with the pace of NHL play. In order to correct for this, we need to start taking into account the offensive side of play as well, to give us a sense as to whether a defenceman is defending his side of the ice well relative to the number of shots his team is taking.

To do this, we can calculate a player’s Defensive-Zone Adjusted Shots For % (or DZA-SF%):

DZA-SF% = Shots For / (2.04 * Defensive Side Shots Against + Shots For)

The 2.04 that we multiply the Defensive Side Shots Against by in our formula is an adjustment so that the end result is centered around 50% (it’s not 2 because we have to take into account the fact that 2.2% of shots come from the center of the ice, so 2.04 = 2/0.978).
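To see why the multiplier centres the metric, here is a short sketch of DZA-SF% with a sanity check: a defenceman on a 50% SF% team who allows exactly half of the non-centre shots from his side should come out at 50%. The constant names and sample counts are illustrative.

```python
CENTRE_SHARE = 0.022                          # share of shots from centre ice
EXACT_MULTIPLIER = 2 / (1 - CENTRE_SHARE)     # 2 / 0.978, quoted as 2.04

def dza_sf_pct(shots_for, side_shots_against, multiplier=2.04):
    """DZA-SF% = SF / (multiplier * Defensive Side Shots Against + SF)."""
    return shots_for / (multiplier * side_shots_against + shots_for)

# Hypothetical average defenceman: 500 shots for, 500 against,
# with half of the 97.8% non-centre shots coming from his side.
average_side_shots = 500 * (1 - CENTRE_SHARE) / 2
average_exact = dza_sf_pct(500, average_side_shots, EXACT_MULTIPLIER)
average_rounded = dza_sf_pct(500, average_side_shots)

# Same offence, but fewer shots allowed from his side lifts the number.
strong_defender = dza_sf_pct(500, 0.9 * average_side_shots)
```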

This number actually gives us a better view as to the complete contribution of a defenceman. While we may think that a player like Kevin Shattenkirk isn’t pulling his weight if we look only at his last 3 seasons of DSS% (which were all over 52.5%), we also see that his DZA-SF% is above 53% in each of those years as well, suggesting that his weakness in his own zone is more than made up for at the offensive end of the rink.

If we look at the best DZA-SF% seasons since 2010 a few things pop out pretty quickly. First off, all these players posted a solid SF% to begin with. Second, the effect of preventing shots on your own side can have a somewhat significant impact on our evaluation of a player. Of the 10 players listed below, only Jake Muzzin and P.K. Subban in 2012-2013 posted a DSS% over 50%. And for players like Anton Stralman last year or Matt Greene in 2011-2012, their defence of their side is enough to boost them from very good to elite.

| Season | Player | DSS% | DZA-SF% | SF% |
| --- | --- | --- | --- | --- |
| 2013-2014 | Michal Rozsival | 48.6% | 61.5% | 60.7% |
| 2013-2014 | Marc-Edouard Vlasic | 45.7% | 61.4% | 59.0% |
| 2012-2013 | Lubomir Visnovsky | 47.6% | 61.0% | 59.6% |
| 2012-2013 | Jake Muzzin | 53.1% | 60.7% | 62.3% |
| 2013-2014 | Jake Muzzin | 46.3% | 60.2% | 58.4% |
| 2013-2014 | Anton Stralman | 43.9% | 59.9% | 56.6% |
| 2011-2012 | Nicklas Lidstrom | 46.8% | 59.0% | 57.5% |
| 2012-2013 | P.K. Subban | 50.0% | 58.8% | 58.7% |
| 2011-2012 | Matt Greene | 45.4% | 58.6% | 56.2% |
| 2011-2012 | Zdeno Chara | 46.3% | 58.6% | 56.7% |

If we look at the bottom end of the chart we see that the worst names show the same themes but in the opposite direction: it’s primarily players who are either woefully bad at defending their own side (Michael Kostka and Jack Johnson) or players who struggle to generate any offense while putting up a respectable DSS% (Mike Weber and Andreas Lilja). Morgan Rielly’s name is the only one on the list that really stood out to me – while traditional hockey thought says that offensive defencemen tend to be worse in their own end because they take more risks to generate that offense, that doesn’t necessarily appear to be the case when you take a quick look through the data. Since 2010, the trio of Drew Doughty, Erik Karlsson and P.K. Subban have combined to post only 3 seasons where their DSS% topped 50%. While some of Rielly’s numbers are obviously driven by the fact that he played for the Leafs last year, his 53.1% DSS% seems to be more the exception than the rule when it comes to offensive defencemen.

| Season | Player | DSS% | DZA-SF% | SF% |
| --- | --- | --- | --- | --- |
| 2012-2013 | Michael Kostka | 57.3% | 40.1% | 43.4% |
| 2012-2013 | Jack Johnson | 56.0% | 40.2% | 43.1% |
| 2010-2011 | Clayton Stoner | 55.0% | 40.3% | 42.8% |
| 2013-2014 | Mike Weber | 51.3% | 40.8% | 41.5% |
| 2010-2011 | Cam Barker | 53.7% | 40.8% | 42.7% |
| 2010-2011 | Andreas Lilja | 49.9% | 40.9% | 40.9% |
| 2010-2011 | Keith Aulie | 52.2% | 41.1% | 42.3% |
| 2011-2012 | Marco Scandella | 52.6% | 41.3% | 42.6% |
| 2013-2014 | Morgan Rielly | 53.1% | 41.5% | 42.8% |

What’s also interesting to look at is the delta between standard SF% and DZA-SF%, as this allows us to identify players who may be under or overvalued by traditional possession metrics that don’t attempt to isolate their defensive zone ability.

| Season | Player | DZA-SF% | SF% | Delta |
| --- | --- | --- | --- | --- |
| 2010-2011 | Mike Green | 57.7% | 52.3% | 5.4% |
| 2010-2011 | Jared Spurgeon | 51.0% | 45.8% | 5.1% |
| 2012-2013 | Marc-Edouard Vlasic | 56.0% | 51.7% | 4.3% |
| 2011-2012 | Nate Prosser | 48.4% | 44.2% | 4.2% |
| 2013-2014 | Bryce Salvador | 53.2% | 49.0% | 4.2% |
| 2012-2013 | Justin Faulk | 51.8% | 48.0% | 3.9% |
| 2011-2012 | Mike Weaver | 49.9% | 46.0% | 3.9% |
| 2013-2014 | Robert Bortuzzo | 52.6% | 48.7% | 3.8% |
| 2012-2013 | Sergei Gonchar | 54.2% | 50.4% | 3.8% |
| 2010-2011 | Nicklas Grossmann | 50.4% | 46.6% | 3.8% |

Obviously, uber-defender Marc-Edouard Vlasic appears as one of the biggest gainers, but so does Mike Green, who has actually had a fair amount of success in the defensive zone, posting 2 seasons in which he received more than a 2.5% boost in SF% from including his defensive zone stats. But we can also see more than a few examples of players who actually go from being below to above average players when we bring their side-defending prowess into the equation. Jared Spurgeon, Bryce Salvador, Justin Faulk, Robert Bortuzzo and Nicklas Grossmann all go from being sub-50% players to above-average when we incorporate DSS% into our analysis.

| Season | Player | DZA-SF% | SF% | Delta |
| --- | --- | --- | --- | --- |
| 2010-2011 | Michael Del Zotto | 47.5% | 51.1% | -3.6% |
| 2011-2012 | Derek Smith | 43.5% | 47.1% | -3.6% |
| 2011-2012 | Steve Montador | 50.6% | 54.0% | -3.4% |
| 2012-2013 | Michael Kostka | 40.1% | 43.4% | -3.3% |
| 2013-2014 | Brooks Orpik | 45.5% | 48.8% | -3.3% |
| 2011-2012 | Philip Larsen | 47.4% | 50.7% | -3.3% |
| 2012-2013 | Ryan Suter | 47.9% | 51.1% | -3.3% |
| 2010-2011 | John Erskine | 48.6% | 51.7% | -3.0% |
| 2011-2012 | Tomas Kaberle | 42.7% | 45.8% | -3.0% |
| 2010-2011 | Deryk Engelland | 45.2% | 48.2% | -3.0% |

At the other end of the spectrum, there are a few interesting names to look at here: the first is Brooks Orpik, who I took a few digs at during my talk in Pittsburgh. Orpik has generally been one of the worst players at defending his side, posting only one year in the past 4 in which he allowed fewer shots from his side. In fact, Orpik has gotten progressively worse each of the past 4 seasons, going from being a positive DZA-SF% and near 50% DSS% player in 2010/11 and 2011/12 all the way down to a sub 46% DZA-SF% player each of the past 2 years while allowing more than 57% of the shots from his side of the ice last year.

The other name that stands out to me here is Ryan Suter’s – believe it or not, Suter has actually been worse than Orpik at defending his own side, never once posting a sub-51% DSS% and going over 56% each of the past two years. And in spite of the fact that he’s been a positive SF% player 3 of the last 4 seasons, only one time were his adjusted numbers on the right side of 50%. The one thing Suter does seem to have going for him is that he generates a fair amount of shots on goal himself. While I won’t dive into it too much here, Suter is actually a middle of the pack player when you take the ratio of his individual shots to his defensive side shots against. Obviously, that may not be enough to justify a 7.5MM cap hit, but it may help to explain why he’s playing top pairing minutes when analytically he seems to be somewhat suspect in the defensive zone.

While all of these stats are interesting enough on their own, if we’re going to use DZA-SF% as an evaluation tool we need to know how repeatable it is year-over-year. After all, it doesn’t do us a lot of good to create a metric that shows that a given player is good or bad if we can’t use that metric to make predictions about how that player will do in subsequent years. And luckily for us, DZA-SF% shows a reasonable repeatability within our sample, with a correlation between year Y and Y + 1 of 0.43. This is actually slightly higher than the year-over-year correlation of our unmodified SF% (0.42), and not much lower than the year-to-year correlation in CF% for defencemen (0.47).
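For anyone who wants to run the same kind of repeatability check on their own numbers, here’s a minimal sketch of a year Y vs. year Y + 1 correlation. It isn’t the code behind the figures above: the record layout (player, season start year, metric value) and the function names are assumptions made purely for illustration.

```python
from math import sqrt

def pearson(xs, ys):
    """Plain Pearson correlation between two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)

def year_over_year_pairs(records):
    """Pair each player-season with the same player's following season.

    records: iterable of (player, season_start_year, value) tuples.
    This layout is hypothetical; adapt it to however your data is stored.
    """
    by_key = {(player, season): value for player, season, value in records}
    return [
        (value, by_key[(player, season + 1)])
        for (player, season), value in by_key.items()
        if (player, season + 1) in by_key
    ]

def year_over_year_correlation(records):
    pairs = year_over_year_pairs(records)
    return pearson([a for a, _ in pairs], [b for _, b in pairs])
```

Players who only appear in one season simply drop out of the pairing step, so the correlation is computed only over defencemen with back-to-back qualifying years.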

What is really interesting to me though is that decreasing our sample size by approximately 25% hasn’t significantly reduced the repeatability of our metric. Normally, if I told you I was only going to use 60 games to evaluate a player you would assume that the results we’d get wouldn’t be as accurate as if we used the whole season’s data. But what our correlation numbers show is that even after cutting out roughly half of the shots against data we have we’re not any worse at predicting a player’s performance in future years. Which really is the best validation of DZA-SF% you could ask for – as we get more and more data, we should expect the metric to become even more accurate at measuring individual performance. But even with the data we have now, we know that our predictions aren’t any worse, which is obviously a huge win for our metric.

While DZA-SF% may be a crude metric in its design, it’s important to remember that even in other sports (where analytics have made far greater strides than they have in hockey) defensive metrics are often broad estimates when compared to the relative precision that we’re able to use for offensive metrics. Within baseball, the sport for which analytics has made arguably the greatest impact, defensive stats often disagree on the value of a player, but that’s not to say that they have no value. Even if UZR and Defensive Runs Saved don’t always line up, if an analyst knows how each is calculated, and the strengths and weaknesses of each approach, he or she can make a subjective evaluation of each stat’s “opinion” and use that to inform his or her view on a player. The idea that we can simply divide the ice in half and assign a defenceman responsibility for all shots on one half is obviously wrong in some cases, but unless we have specific individual reasons why it wouldn’t make sense for a given player, isn’t it better than a subjective evaluation? It’s not meant to be the only thing that one should look at when evaluating a player (because there really isn’t ever going to be a single number, nor a single subjective quality that we should look for to make personnel decisions), but it does give us a way to say whether a player is good or bad outside of relying solely on popular opinion.

What’s also critical to know is that even without player tracking technology or a whole team of individuals recording new data for us, our estimates using a methodology like this should get better simply with more detail in the data that the NHL is already providing us. Greg Sinclair has already pointed out that co-ordinate data is available for all events this season (see here for an example), and the simple inclusion of all shot attempts should give us even better results to work with. Even if full game tracking is still a ways off, we can still get better just by pulling more data into this methodology.

There are also several improvements that could be made to this approach with a little technical effort and thought. Obviously giving a defenceman responsibility for an entire side of the ice is a huge over-simplification, and even adding wingers who are responsible for the opposition’s pointmen into the analysis should yield better results. In addition, in his review of the PGH Analytics Workshop at Hockey Prospectus, Arik Parnass suggested a novel way to include zone entry data in our methodology to further refine our view of defensive zone responsibility. There are obviously lots of ways to make this better; it’s simply a matter of further focussing our approach to better align to how we know the game of hockey works.

DZA-SF% isn’t meant to be a be-all end-all stat, nor is it meant to present the “right way” to evaluate defencemen; rather, what I’m hoping to do is illustrate a technique to make better use of the data we have available to us right now. More data will obviously make more complex analyses easier, but that doesn’t mean there’s not more we can squeeze out of the data we have right now. There’s still lots to be learned from the basic data included in the NHL’s RTSS and JSON files; it’s simply a matter of digging a bit deeper into the data and putting thought into how it connects to what we know about how hockey works. It’s these insights and analyses that will start to chip away at the argument that hockey is too fluid and too complex to measure, and provide us with more reliable methods to understand the impact of a given player.

Posted in Statistics

Shot Location Data and Strategy I: Off-Hand Defencemen

This is the first post in a (likely) 3 part series going through the data/methods/results which I presented at the Pittsburgh Hockey Analytics Workshop. If you’re interested in seeing the slides or hearing the presentation (or the other presentations, which I highly recommend), they’re available on the WaR on Ice website here.

One of the biggest challenges facing the hockey analytics community right now is getting beyond the player analysis stage and starting to look at how analytics can impact team and player strategy. While possession based metrics (CF%, FF-Rel) and their derivatives (dCorsi, xGD20) have vastly improved our ability to identify those players who are truly driving on-ice performance, it won’t be long until every team is more or less working with the same baseline of player identification data, eliminating any edge that analytics might have provided for early adopters.

Applying analytical techniques to on-ice strategy is one area where teams can begin to regain that advantage, and what’s more, it’s one that teams need to constantly re-evaluate as the effectiveness of strategies often changes (and sometimes drastically) over time. Perhaps the best recent example of applying data to look at team-level strategy is Eric Tulsky and company’s pioneering work on zone entries. While almost every coach from Peewee up to the NHL has probably preached the “Dump and Chase” methodology at some point, Tulsky’s work on zone entries showed that this was far from an optimal way to play the game; in fact dumping the puck in was a significant detriment to generating shot attempts for most teams. Not only was Tulsky’s work lauded by the analytical community, it has made a huge impact with many teams and players now explicitly aiming to generate more controlled entries.

The dump and chase is just one example of how a data driven approach can lead to potentially valuable new ways to play the game, or to put a team or lineup together. There are dozens of other age-old hockey wisdoms and questions that can be addressed by analytics: Is it beneficial to play 4 forwards on the powerplay? Hint: probably. When should teams pull their goaltender? Hint: earlier than you’d think. Do teams get a momentum boost after a fight? Hint: no, and it’s worse than you’d think.

One question that comes up rather frequently (at least in my mind) is whether coaches should focus on balancing their lineup so that players don’t play on their off-hand side (i.e. a left shot playing RW or a right shot playing LD). Last year, as people were debating whether PK Subban should make the Canadian Olympic team if he’d end up playing on his off-hand, I took a look at the offensive performance of off-hand defencemen. While I found that there was a slight drop-off in all-situations offensive performance, ultimately the difference was often negligible when compared to the difference in talent between players.

That study, however, only focussed on offensive play, and didn’t look at how well defencemen performed in their own end. While that made sense when looking at whether Subban, one of the greatest offensive players in the league, would struggle on his off-hand, if we want to make broader decisions about lineup construction we need to know what’s happening at both ends of the ice. If we have an up-and-coming left-handed defenceman that we want to get more minutes by moving him to his off-hand on the first pairing we need to know what kind of a drop in performance (if any) we expect to see in order to properly weigh the cost and benefits of the change.

In order to look into this, however, we first need to figure out who is playing on what side of the ice. One way we can tackle this for defencemen (and the same idea applies to forwards, although we have to factor in the fact that there are centers to deal with) is to look at who is shooting from what side of the ice when 2 players are on together. The basic idea is that the player playing left defence should be taking most of his shots from the left side, while the player playing right defence should be taking most of his shots from the right side.

To quantify this, we can use the NHL’s shot location data to find each defenceman’s “Side Bias”, which calculates the percentage of shots that a player takes from a given side of the ice:

Side Bias = (# of Shots Taken from Left Side) / (# of Shots Taken from Left or Right Side) – 50%

Side bias numbers that are greater than 0 indicate that a player took most of his shots from the left side of the ice, while side bias numbers that are less than 0 indicate that a player took most of his shots from the right side of the ice.
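To make the formula concrete, here’s a small sketch of the Side Bias calculation. It assumes the left/right shot counts have already been tallied from the shot location data; the function name and the choice to return `None` when a player has no left or right side shots are my own illustrative assumptions.

```python
def side_bias(left_shots, right_shots):
    """Share of a player's left/right shots taken from the left, minus 50%.

    Positive values mean mostly left-side shots, negative values mean
    mostly right-side shots. Returns None when there are no shots to
    classify (an illustrative choice, not something the post specifies).
    """
    total = left_shots + right_shots
    if total == 0:
        return None
    return left_shots / total - 0.5

# Made-up counts: 30 shots from the left side, 10 from the right.
print(side_bias(30, 10))   # 0.25, i.e. +25%: a left-side bias
print(side_bias(10, 30))   # -0.25, i.e. -25%: a right-side bias
```

Only shots classified as coming from the left or right side enter the denominator, which matches the formula above.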

In order to use these numbers to figure out which defenceman was on which side of the ice for a given pairing we’ll use a simple decision system:

  • If a pairing has played together a significant amount (which I’ve defined as at least 10 shots taken by each player while they were on the ice together), we’ll use the side bias data from when they played together.
  • If a pairing has rarely played together (if either player has taken fewer than 10 shots when the pairing were on the ice together), we’ll use their overall side bias numbers to figure out which player was on which side.
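The two rules above can be sketched roughly as follows. The data layout (dictionaries of left/right shot counts per player) and the function names are hypothetical, and two details are my own assumptions rather than something spelled out in the rules: the 10 shot cutoff is applied to left plus right side shots only, and the final step puts whichever player has the higher side bias on the left.

```python
MIN_SHOTS = 10  # cutoff used in the rules above

def side_bias(left_shots, right_shots):
    """Left-side share of left/right shots, minus 50% (0.0 if no shots)."""
    total = left_shots + right_shots
    return left_shots / total - 0.5 if total else 0.0

def assign_sides(player_a, player_b, together, overall):
    """Guess which defenceman played the left side and which the right.

    together: {player: (left_shots, right_shots)} while the pair was on the ice
    overall:  {player: (left_shots, right_shots)} across all of a player's shots
    Returns (left_defenceman, right_defenceman).
    """
    # Rule 1 applies only if BOTH players took enough shots as a pair.
    enough = all(
        sum(together.get(p, (0, 0))) >= MIN_SHOTS for p in (player_a, player_b)
    )
    # Rule 2: otherwise fall back to each player's overall numbers.
    source = together if enough else overall
    bias_a = side_bias(*source[player_a])
    bias_b = side_bias(*source[player_b])
    # Assumption: the player with the larger left-side bias is the left D.
    return (player_a, player_b) if bias_a >= bias_b else (player_b, player_a)
```

Changing `MIN_SHOTS` to 5 is an easy way to test how sensitive the classification is to the cutoff.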

There are obviously flaws with this method – the 10 shots that we use as our cutoff is entirely arbitrary, and I suspect that you could probably get by using only 5 shots. In addition, for extremely rare pairs, we’re almost certain to guess wrong some of the time, although this will only have a small effect on our analysis overall.

So how do coaches tend to use their defencemen when we look at the data? Well the first thing we see is that coaches prefer, when possible, to play defencemen on their on-hand, with 64% of total shots occurring when a pairing was on their on-hand, and only 0.2% coming with both defencemen on their off-hand (this number may be overstated as well, as we may have misclassified some of the rare pairs).

Pairing (L-Side/R-Side) % of Total Shots
L/L 32.1%
L/R 64.2%
R/L 0.2%
R/R 3.5%

The second thing to note is that L/L pairs are significantly more common (roughly 9X more common in fact) than R/R pairs. Obviously this makes sense in a league where left handed shooters are more prevalent than right handed shooters, but the size of the difference will allow us to proceed by breaking the data down into simply same-handed or opposite-handed pairs without worrying that we’re missing anything.

With that in mind, we can take a look at our 3 primary possession measures broken down by same-handed vs. opposite-handed pairs to see whether defencemen on their off-hand may be holding their teams back.

Pairing Handedness CF% FF% SF%
Opposite (L/R) 50.71% 50.63% 50.54%
Same (L/L or R/R) 49.32% 49.42% 49.52%

I should make note of a few things before I dig too deep into the numbers here: first, this data covers 2008/09-2013/14, as I chose to ignore the possibility of both defencemen playing on their off-hands so I could use the RTSS rather than the shot location data (since the data above suggests it’s extremely rare). Second, the shots for percentage numbers here are slightly different than in the slides I presented in Pittsburgh as I had to adjust my analysis to exclude situations where there were 3 defencemen on the ice.

Nevertheless, we see that in aggregate the opposite handed pairs perform better from a possession point of view than the same-handed pairs (and in fact, both analyses show a 1% bump in SF%). We also see that the advantage tends to decrease as we exclude blocks and misses, going from a 1.4% gap in CF% to a 1.2% gap in FF% and down to just a 1% gap in SF%. While we can’t say for certain, it seems likely that this is driven primarily by fewer blocks and misses in the offensive zone, as it wouldn’t make sense that defencemen would be significantly better at blocking shots or forcing misses on their off-hand (this also agrees with the research I’ve done in the past).
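For anyone checking the arithmetic, the gaps quoted above fall straight out of the aggregate table. A quick sketch using the table’s own numbers (the variable names are just for illustration):

```python
# Aggregate possession numbers from the same-handed vs. opposite-handed table.
opposite = {"CF%": 50.71, "FF%": 50.63, "SF%": 50.54}
same = {"CF%": 49.32, "FF%": 49.42, "SF%": 49.52}

# Gap between opposite-handed and same-handed pairs, in percentage points.
gaps = {metric: round(opposite[metric] - same[metric], 2) for metric in opposite}

for metric, gap in gaps.items():
    print(f"{metric}: {gap:.2f} percentage points")
```

The gap shrinks from 1.39 points in CF% to 1.21 in FF% and 1.02 in SF%, which is the narrowing described above as blocks and misses are stripped out.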

So if opposite handed pairings are outperforming same handed pairings, what’s driving it? Are off-hand defencemen having trouble preventing shots in their own end, or is it something else that’s holding them back possession wise? To get a better sense, we can go back to the shot location data and take a look at where the shots against are coming from.

Shot Side Opposite Same
Shots Against – Left Side % 48.2% 48.1%
Shots Against – Right Side % 49.6% 49.7%

What this table tells us is that there isn’t really a difference between the opposite or same-handed pairs when it comes to defending one side of the ice or the other. If a left-handed defenceman playing on the right side of the ice is hampering his team defensively, it certainly doesn’t show up when we look at where the shots against are coming from when he’s on the ice.

So if the drag on possession numbers isn’t being driven by defencemen having trouble defending their side of the ice on their off-hand, where is it coming from? One suggestion that was brought up at the conference was that defencemen on their off-hand have trouble both exiting their own zone and setting up controlled entries into the offensive zone, and I think in the context of this data it makes a lot of sense. After all, playing on your on-hand is going to significantly increase the ease with which you can make or take a pass in the neutral zone, which should translate into more opportunities for controlled entries. Similarly a pressured player on their off-hand is probably more likely to dump the puck in than attempt to make a backhand pass through a tight opening. We don’t have definitive evidence, but it is a theory that makes sense and seems to agree with the numbers we have available.

So we can conclude that teams should never play defencemen on their off-hand, right? Well, not quite. Since we’re looking at aggregate data, it’s still possible that there are other factors driving the differences that we see, and that the delta that we’re attributing to off-hand play may really be a function of some other variable. In particular, one potential issue that Arik Parnass suggested in his write-up of the conference for Hockey Prospectus was that the differences between the pairings could be related to how coaches are deploying their pairings. After all, if coaches are aware of, or at least believe there to be, an advantage to keeping defencemen on their on-hand, we should expect them to avoid same-handed pairings whenever possible. And it would follow then that we’d expect most of the same-handed pairings to be the 3rd pairings, who we naturally expect to post weaker possession numbers as the worst players on the team.

We can test out Arik’s theory by breaking up the pairings into 3 buckets by Even Strength TOI Rank, and seeing whether our results still hold when we control for a coach’s view of talent. The first thing we should look at though is whether coaches really are avoiding same-handed pairings where possible. We can do that by taking a look at what % of the total Corsi events (both for and against) for a pairing-bucket are taken when opposite-handed pairs are on the ice vs. same-handed pairs. Looking at the % of total Corsi events should give us a pretty good proxy for time on ice, as we wouldn’t expect the overall shot attempt rates (i.e. the game pace) to vary substantially between opposite and same-handed pairings.

% of Corsi Events
Pairing Opposite Same
1 62.58% 37.42%
2 58.30% 41.70%
3 59.90% 40.10%
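A table like the one above can be built with a simple tally. This is only a sketch: it assumes each pairing has already been assigned a TOI-rank bucket (1, 2 or 3) and a handedness label, and the row layout and function name are hypothetical.

```python
from collections import defaultdict

def corsi_share_by_bucket(rows):
    """Split each bucket's Corsi events between opposite- and same-handed pairs.

    rows: iterable of (bucket, handedness, corsi_events), where handedness is
    "opposite" or "same" and corsi_events counts attempts for AND against.
    Returns {bucket: {"opposite": share, "same": share}}.
    """
    totals = defaultdict(lambda: {"opposite": 0, "same": 0})
    for bucket, handedness, events in rows:
        totals[bucket][handedness] += events

    shares = {}
    for bucket, counts in totals.items():
        total = counts["opposite"] + counts["same"]
        if total:  # skip buckets with no recorded events
            shares[bucket] = {k: v / total for k, v in counts.items()}
    return shares

# Made-up event counts, purely to show the shape of the output.
example = [(1, "opposite", 60), (1, "same", 40), (1, "opposite", 20),
           (2, "opposite", 50), (2, "same", 50)]
print(corsi_share_by_bucket(example))
```

Using events both for and against in the tally is what makes the share a proxy for ice time rather than for performance.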

What we see when we dig into the data is that while coaches appear to favour opposite-handed first-pairings slightly more than 2nd or 3rd pairings, the difference isn’t significant at all, and it certainly doesn’t appear as if coaches are avoiding playing same handed pairs as their first unit.

Now let’s take a look at how same and opposite-handed pairings tend to do from a possession standpoint when we’ve broken it down by even strength time on ice.

Pairing CF%-Opposite CF%-Same FF%-Opposite FF%-Same SF%-Opposite SF%-Same
1 51.16% 49.46% 51.07% 49.55% 50.97% 49.65%
2 49.93% 49.55% 49.95% 49.64% 49.92% 49.68%
3 50.24% 48.69% 50.07% 48.84% 49.94% 48.99%

There are two things that stand out to me in this table: First and foremost, opposite handed pairs still outperform same-handed pairs in every grouping, so it appears as if playing on your off-hand as a defenceman does have a detrimental effect on puck possession, even after we’ve controlled for differences in talent level (or at least coaches’ views of talent level).

The second thing that’s interesting is that the difference appears to be significantly smaller for 2nd pairing defencemen. While same-handed 1st and 3rd pairs experience a drop of 1% or greater in almost every possession metric, same-handed 2nd pairs see their numbers fall by less than 0.4% in each metric. This difference isn’t easily explainable with the data we have available, but one theory I have is that it might be related to the fact that the 2nd pairing is generally not relied on to contribute heavily in the offensive side of the rink. If most coaches are using their 2nd pairing as a shutdown pair, and if the difference in possession numbers is related to the ability to generate offense as I’ve hypothesized above, then the inability to generate offense likely doesn’t matter as much to a 2nd pairing as it would to a 1st or 3rd pairing.

One thing to keep in mind (as I’ll go over in more detail in part 3) is that players playing on their off-hand do tend to post higher shooting percentages than those shooting primarily from their on-hand. So while a defenceman playing on their off-hand may be giving up some ground on the possession front, they’re getting at least part of that back by having more of their shots get past the keeper. Whether this trade-off is worthwhile obviously depends on the team and player, and there are certainly circumstances where the shooting benefits outweigh the costs. But all else being equal, most teams would be better off taking the possession boost rather than the shooting boost, since most defencemen tend to shoot the puck relatively infrequently and at lower overall percentages than forwards.

The other factor that needs to be mentioned is that ultimately teams should look to play their best players most, regardless of structural factors like this. If a team is choosing between playing a 50.5% Corsi defenceman on his off-hand, or promoting up a 50% Corsi player so he can play on his on-hand, then going with the on-hand player is obviously the better choice. But if the choice is between a 55% off-hand player and a 50% on-hand skater the team should stick with the better player. As we’ve seen here, lineup balance is obviously important, but not so much so that you should put your best players out for less time (or your worst players out for more) just to maintain the balance.

Posted in Team Strategy

Numbered Days: November 2nd

Numbered Days is a weekly feature looking back at the statistical week that was in the NHL. Raw data was extracted from War On Ice, Natural Stat Trick and Hockey Reference.

Sunday, October 26: The Winnipeg Jets put only 28 of their 82 shot attempts on net in their 2-1 win over the Avs. Since 2007-2008, only 5 teams have had more than 54 of their shot attempts blocked or miss the net.

Monday, October 27: Wild lose 5-4 to the New York Rangers despite leading 3-0 with 17:07 left in the 3rd period. According to, the Wild’s win probability was up to 98.03% before Kevin Klein scored to spark the Rangers comeback. Somehow, the Wild managed to follow that up the next night with an almost as impressive comeback, rallying from 2 down (and a win probability which had dropped to 7.06%) with 16:40 remaining in the final frame versus the Bruins.

Tuesday, October 28: The Buffalo Sabres, relentless in their pursuit of Connor McDavid, put a measly 10 shots on goal in a shutout loss to the Leafs. Buffalo’s performance marked only the 20th time since 1987 that a team has been held to less than 10 shots, and only the 5th of those games in which the team was shut out.

Wednesday, October 29: Alexander Ovechkin records 13 individual Corsi attempts as the Caps fall to the Red Wings 4-2. This was the 301st time in 523 games since the start of 07-08 that he’s posted more than 10 individual Corsi attempts. The player with the next most 10+ Corsi attempt games over that time span: Ilya Kovalchuk, who only managed to reach the 10 Corsi mark 80 times.

Thursday, October 30: New Jersey earns a 2-1 shootout win over the Winnipeg Jets as Jacob Josefson scores the only goal of the 6-shot skills competition. The win was the Devils’ first in a shootout since March 10, 2013, a period over which New Jersey lost 17 consecutive shootout games, bringing their total shootout wins since the start of the 2012-2013 season to 3. Oddly enough, the Devils are only tied for last in number of shootout wins over the past 3 years – the Carolina Hurricanes have also posted only 3 wins, although they’ve recorded their 3 victories in 10 tries, or less than half of the Devils’ 21 shootout games.

Friday, October 31: The Flames outshoot the Predators 27-25 at even-strength as they scored 3 goals in the 3rd period to win 4-3. In spite of winning the shots on goal battle, Calgary was out-Corsid 83-45 during the game, marking only the 2nd time since 2007-2008 that a team has won the shots on goal battle while posting a sub-36% CF%.

Saturday, November 1: Thomas Vanek scores a powerplay goal at 19:03 of the 2nd period to put the Minnesota Wild up 3-1 over the Dallas Stars. Vanek’s goal was the Wild’s first powerplay marker of the year, making them the last team to put one past the opposing goalie on the man advantage. Before this season, the latest a team had scored their first PPG in the BTN era was October 20th, when the 2011-2012 New York Rangers finally scored up a man in a 3-2 OT win over the Calgary Flames.

Posted in Numbered Days

Numbered Days: October 26th

Numbered Days is a weekly feature looking back at the statistical week that was in the NHL. Raw data was extracted from War On Ice, Natural Stat Trick and Hockey Reference.

Sunday, October 19: The LA Kings post a sub-40% 5v5 Corsi for the 2nd straight game. This is the first time since their 2012 playoff series versus Vancouver that the Kings have been held below the 40% Corsi mark at even strength in consecutive games.

Monday, October 20: The Oilers use only 4 different players in the faceoff circle against the Lightning en route to a 3-2 win. Each of those centres finished below 50% on the night, marking only the 31st time since 2007-2008 that a team has used only 4 centres none of whom finished with a winning record on the dot (yes, this was a tough night to find something interesting to report on).

Tuesday, October 21: The Phoenix, err, Arizona Coyotes record 24 of their 59 shot attempts over two spells of just about 3 minutes each (from 16:34 to 18:59 of the 2nd and 4:40-7:42 of the 3rd). That’s just over 40% of their shot attempts coming in only 8% of the game. Amazingly, Arizona manages to finish at exactly 50% in all-strengths Corsi despite recording only 35 shot attempts over the other 54:33 of play.

Wednesday, October 22: Claude Giroux leads the Flyers in total time-on-ice as the Flyers defeat their in-state rival Penguins. Since 2007-2008, Giroux has been the most played skater on the Flyers 43 times. Only 3 forwards have been the most-used skater more often over that time period, with Ilya Kovalchuk accomplishing it an amazing 139(!!!) times in 505 games.

Thursday, October 23: Darcy Kuemper records his 3rd shutout of the season in the Wild’s 2-0 win over the Coyotes. Over the past 2 seasons Kuemper has recorded 5 shutouts in 30 games played; no goalie has recorded a better shutout/GP rate over that period.

Friday, October 24: Ondrej Pavelec is pulled after allowing 4 goals to the Tampa Bay Lightning in 40 minutes. Since 2009-2010, only 1 goalie has more games with between 20 and 50 minutes played (i.e. he got pulled). That goalie: Steve Mason of the Philadelphia Flyers.

Saturday, October 25: Jason Zucker scores on both of his shots on goal while playing only 10 minutes in Minnesota’s 7-2 thumping of the Lightning. In the Behind The Net era, only 17 players have scored at least 2 goals and posted a 100% shooting percentage while playing 10 minutes or less. His Wild teammate Zach Parise is the only player to score a hat-trick on only 3 shots in that time span, needing only 9:36 to put 3 goals past Michal Neuvirth in March of 2012.
Posted in Numbered Days
