Improving Corsi Rel: Adjusting for Team Talent

Corsi Rel is a stat that, in theory at least, is meant to address the fact that a good player on a poor team is still likely to post a bad CF%. We don’t want to punish superstars who are surrounded by replacement-level players in the same way that we don’t want to reward hangers-on playing on Cup winners (*cough* Dave Bolland *cough*). For defencemen in particular, Corsi Rel is often a better way to measure their impact, given that they have much less control over play in general and are driven heavily (at least in terms of raw results) by the talent up front that they’re paired with.

The problem with Corsi Rel, however, is that it’s too blunt an instrument – it assumes that each player can only affect his team’s results by a set amount, regardless of the talent of that team. A good player on a bad team is assumed to be a good player on any team he plays on, which we know is unlikely to be true in practice. A player with a +1% Corsi Rel on a 42% team is unlikely to make a 56% team into a 57% squad, but pure Corsi Rel assumes that this would be the case. So while we know that there’s value in the information that Corsi Rel contains, the question is how to maximize that value.

Posted in Theoretical

How much do zone starts matter part II: A lot on their own, not that much in aggregate

In Part I of our review of zone starts, we looked at how the traditional definition of zone starts varied from what most people would consider a “true” zone start, and found that when we applied the true zone start definition to our data, the spread between players in zone start percentages decreased significantly. One key reason for the difference between methods is the inclusion of on-the-fly starts, which tend to make up around 60% of a player’s total shifts, and which drastically decrease the impact of each defensive/offensive/neutral faceoff. Another driver is the fact that often a player’s zone start percentage is impacted by their own performance: bad players end up with more defensive zone faceoffs due to their inability to drive possession, which incorrectly inflates their defensive zone start percentages. This also helps to create a false link between zone start percentages and possession numbers, leading people to incorrectly infer that tough zone starts are a key driver behind a player’s results.

While it’s useful to know that the true difference in zone starts between players is generally minimal, that doesn’t necessarily mean that we can just ignore them completely. To make a judgement about the overall impact that zone starts have we first need to figure out what the impact of a single zone start is on possession. To do that, we can simply look at all the 5v5 shifts taken since 2008 in aggregate, and calculate the overall Corsi For Percentage broken down by starting location.
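The aggregation described above can be sketched roughly as follows. This is only an illustration: the record layout, the zone labels ("OZ", "NZ", "DZ", "OTF") and the sample numbers are assumptions made for the example, not the actual data or code behind the post.

```python
from collections import defaultdict

# Hypothetical 5v5 shift records: (start_location, corsi_for, corsi_against).
# "OZ"/"NZ"/"DZ" are faceoff starts, "OTF" is an on-the-fly start.
shifts = [
    ("OZ", 2, 0),
    ("OZ", 1, 1),
    ("DZ", 0, 2),
    ("DZ", 1, 2),
    ("NZ", 1, 1),
    ("OTF", 1, 0),
    ("OTF", 0, 1),
]


def cf_pct_by_start(shift_records):
    """Pool shot attempts by starting location and return CF% for each."""
    totals = defaultdict(lambda: [0, 0])  # location -> [CF, CA]
    for start, cf, ca in shift_records:
        totals[start][0] += cf
        totals[start][1] += ca
    # Skip locations with no shot attempts to avoid dividing by zero.
    return {
        start: cf / (cf + ca)
        for start, (cf, ca) in totals.items()
        if cf + ca > 0
    }


cf_pct = cf_pct_by_start(shifts)
```

Pooling the events before dividing (rather than averaging per-shift percentages) keeps short, low-event shifts from being over-weighted.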

Posted in Zone Starts

How much do zone starts matter part I: (Maybe) not as much as we thought

In 2013-2014 Boyd Gordon and Manny Malhotra were two of the worst players in raw CF% across the league at 42.3% and 41.6% respectively. Most people would argue that their results were not all that surprising given that they faced the toughest zone starts of any players in the league, with over 59% of their shifts starting in the defensive zone according to stats.hockeyanalysis.com, almost 10% higher than anyone else in the NHL. The problem with this argument, however, is that neither player actually started 59% of their shifts in their own end. While both players did see 59% of the faceoffs they were on the ice for come at their own end of the rink, if we look at where each shift actually started and ignore faceoffs that started mid-shift, we see a much different story. While both players still faced some of the toughest zone starts of any player in the league, the actual percentage of Boyd Gordon’s shifts that started in front of his own goaltender was only about 32%, only a little more than half of what’s traditionally reported. Malhotra, on the other hand, has a much larger gap: only 25% of his shifts actually started in the defensive zone, nearly 35 percentage points lower than his faceoff-based metric.

It’s not just Malhotra and Gordon and those at the extreme ends of the spectrum who are grossly misrepresented by traditional zone start percentage either. Every player across the NHL has their usage numbers skewed by the fact that most sites use faceoffs to measure zone starts rather than looking at the actual shift data (I should point out that most of the main stats sites do make it very clear that they use faceoffs, and that Hockey Analysis actually refers to the metrics as OZFO%/DZFO%/NZFO% now). Part of the reason for the differences is that the traditional measurements don’t take into account shifts that start on-the-fly as opposed to at a stoppage in play. And while this explains some of the difference we see, it’s not the bulk of the problem. The main issue with the current approach to measuring zone starts is that the measurement is often skewed (and sometimes heavily) by the performance and talent of the player in question. Bad players tend to end up with more defensive zone faceoffs because their opponents tend to get more shot attempts against them, which leads to more opportunities for their goalie to freeze the puck and more defensive zone faceoffs. The same idea is true in reverse for good players, and it all adds up to a false correlation between the traditional zone start measure and possession numbers.
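To make the difference between the two definitions concrete, here is a minimal sketch that computes both from the same shift log. The data structure and the sample shifts are hypothetical, and treating the faceoff-based number as "defensive zone faceoffs divided by all on-ice faceoffs" is an assumption about how the traditional metric is built.

```python
# Hypothetical shift log for one player. "start" is "OZ", "NZ" or "DZ" when the
# shift began with a faceoff and "OTF" when it began on the fly;
# "mid_shift_faceoffs" lists the zones of any later faceoffs during the shift.
shifts = [
    {"start": "DZ", "mid_shift_faceoffs": ["DZ"]},
    {"start": "OTF", "mid_shift_faceoffs": ["DZ"]},
    {"start": "OTF", "mid_shift_faceoffs": []},
    {"start": "OZ", "mid_shift_faceoffs": []},
    {"start": "OTF", "mid_shift_faceoffs": ["DZ", "NZ"]},
]


def faceoff_based_dz_pct(shift_log):
    """Share of all on-ice faceoffs that took place in the defensive zone."""
    faceoffs = []
    for shift in shift_log:
        if shift["start"] != "OTF":
            faceoffs.append(shift["start"])
        faceoffs.extend(shift["mid_shift_faceoffs"])
    return faceoffs.count("DZ") / len(faceoffs)


def shift_based_dz_pct(shift_log):
    """Share of all shifts (on-the-fly included) that began with a DZ faceoff."""
    dz_starts = sum(1 for shift in shift_log if shift["start"] == "DZ")
    return dz_starts / len(shift_log)


traditional = faceoff_based_dz_pct(shifts)
true_starts = shift_based_dz_pct(shifts)
```

In this toy log the player is on the ice for four defensive zone faceoffs out of six, but only one of his five shifts actually begins in his own end, which is the same kind of gap described for Gordon and Malhotra.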

Posted in Zone Starts

Hockey Prospectus: Whose special teams are really special this year?

Today on Hockey Prospectus I’ve got an article up looking at how each team’s special teams units have performed against expectations this year. Take a look at it here.

Posted in Hockey Prospectus

Hockey Prospectus: Randy Carlyle’s Effect on Puck Possession and Scoring Chances

I’ve got an article up over at Hockey Prospectus on Randy Carlyle and his effect on puck possession and scoring chances. Hint: it wasn’t pretty.

If you’re interested, you can read it here.

Posted in Hockey Prospectus

2015 World Junior Prediction Update – December 27th

Updated Predictions

Day 1 of the World Juniors is in the books and while there weren’t any upsets, two medal favourites nearly stumbled out of the gate with both Russia and the US needing the shoot-out to get past Denmark and Finland respectively.

Updated tournament predictions are given in the table below – as a reminder, these predictions take into account both our initial predictions generated using NHL equivalencies, as well as an Elo-based adjustment to take into account the results that we’ve seen to date.

Team P(1) P(2) P(3) P(4) P(5) P(6) P(7) P(8) P(9) P(10)
CAN 55.3% 26.1% 9.5% 1.4% 6.3% 0.8% 0.5% 0.0% 0.1% 0.0%
USA 35.6% 37.6% 12.7% 2.5% 8.8% 1.4% 1.1% 0.1% 0.2% 0.0%
RUS 4.5% 12.7% 26.6% 15.4% 19.2% 8.6% 8.4% 2.5% 1.7% 0.4%
SWE 2.7% 9.8% 20.3% 18.5% 18.5% 11.4% 11.3% 4.5% 2.3% 0.7%
CZE 0.1% 1.2% 1.8% 4.4% 5.9% 12.2% 11.3% 14.7% 26.8% 21.5%
SVK 0.0% 0.7% 1.8% 6.9% 2.4% 7.2% 6.9% 14.5% 21.9% 37.7%
DEN 0.2% 1.9% 3.1% 8.1% 8.4% 16.3% 15.3% 19.2% 15.6% 11.8%
SUI 0.4% 3.0% 5.4% 11.3% 11.1% 16.5% 17.1% 15.9% 11.9% 7.4%
GER 0.2% 1.7% 4.5% 13.2% 5.4% 12.6% 12.0% 19.1% 14.2% 17.3%
FIN 0.9% 5.4% 14.2% 18.2% 14.0% 13.0% 16.0% 9.5% 5.4% 3.3%


After defeating Slovakia handily to open the tournament, Canada remains the prohibitive favourite, with their odds of winning gold inching upwards to 55.3%. Most of the uptick in the Canucks’ odds comes at the expense of the Americans, whose shootout win decreased their standing in the model slightly. Sweden is the big winner of the day, however, with their medal odds jumping up to roughly 1 in 3, an increase of about 6% over their initial chances.

Today’s Games

Not many big games on the schedule today, with three teams’ odds of winning sitting above 70% according to our model. The lone “close” match looks like it could be the deciding factor in who avoids the relegation game in Group B, with the Swiss sitting as slight favourites over the Czechs. The other interesting match-up is the first of the day, where the Danes, who may be underrated by our model due to a lack of NHLe data for the Danish leagues, take on the Swedes, who had no trouble handling the Czechs in their opener. Canada takes on Germany, and will win.

Visitor      Home            Visitor Win %  Home Win %
Sweden       Denmark         73.0%          27.0%
Finland      Slovakia        72.9%          27.1%
Switzerland  Czech Republic  56.5%          43.5%
Canada       Germany         95.8%          4.2%

 

Posted in Predictions, Uncategorized, World Juniors

2015 World Junior Predictions

It’s finally here, the most important day of the year in Canada, World Junior Hockey Day! While the tournament may not have the same appeal in other areas of the world (or any appeal, in some cases), for Canadians the World Junior tournament is serious business, which makes the fact that Canada hasn’t won since 2009 a bit of a sore spot. While Canadians always enter the tournament with an often warranted air of confidence, predicting what will happen in the Juniors is tough given the lack of information we have to work with about the teams and players. While most analysts will be aware of the top level talent on each squad, it’s difficult to know how a 3rd liner toiling away in the German second division compares to a player in the Swiss Jr. A league. We can say with a reasonable degree of confidence that the Canadians will be better than the Swiss, but how much better can be a tough question to answer. To take a more rigorous approach, we have to dig into the data and adjust for varying league strengths, because that German second division player may be just as good as the guy manning the point for Canada’s first powerplay unit. With that in mind, let’s get started with the first ever set of Puck++ World Junior Predictions.

Posted in Predictions, World Juniors

How much skill exists in on-ice shooting percentages?

Earlier today, Phil Birnbaum posted a piece offering further arguments in favour of his view that shot quality exists and is a legitimate strategy choice for teams.  For those of you unfamiliar with Phil’s work, he’s long been a proponent of the idea that teams can sustain high shooting percentages, and while I’m not necessarily in agreement with all of his theories, the points he makes are generally well thought out and argued (all of which is to say that regardless of where you stand on the topic you should read what he’s written to date).

In his article Phil made two main arguments: first, that players have the ability to increase their teammates’ shooting percentage while they’re on the ice; and second, that because of this we can conclude that team shooting percentage isn’t random. I’m not going to dig too much into the second argument, as that’s not something that’s ever really been argued by any analyst (most claim that while shooting is a skill, at the team level the differences are frequently so small that they’re negligible), and I think that the salary cap structure and the fact that teams tend to pay for past Sh% may explain why the differences we observe are small.

I do, however, want to look into Phil’s first claim, that players are able to influence their on-ice shooting percentage. The main evidence that Phil offers in support of this argument is that for a set of elite players their ability to maintain a positive On-Ice Sh% WOWY persists year-over-year. While this finding certainly supports his point, it’s far from conclusive: after all, his analysis focuses on 10 players and we can’t really reach broad conclusions based on 1% of the league’s population. Furthermore, several of the players he highlights are excellent shooters themselves, and what we’re seeing in their on-ice shooting percentage may be more of a reflection of their individual shooting percentage than any goal creation ability. Lastly, all of these players are amongst the best in the league and tend to play with other very good players: any year-over-year trends we see may be more reflective of consistent top-level teammates than of ability.

With all that being said, I do believe that there is skill in on-ice shooting percentage; it’s simply a matter of how much talent there is compared to luck, and how persistent it is in season-by-season numbers. To investigate this, I pulled all the individual seasonal data from stats.hockeyanalysis.com, and calculated each player’s Individual Corsi Sh% (iCSh% = iGF/iCF) and their On-Ice Teammate Corsi Shooting Percentage (TMCSh% = (GF – iGF)/(CF – iCF)). After filtering out anyone who wasn’t on the ice for at least 200 CF in a season, I split the data between forwards and defencemen and calculated the correlation in both iCSh% and TMCSh% over several seasonal pairs (i.e. correlation between year y and year y + x for x in [1,6]). By looking at the correlation over multiple years we should be able to figure out the extent to which similar teammates and deployment factor into each metric’s repeatability.
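For readers who want to try this on their own data, the calculation can be sketched along these lines. The two formulas come straight from the definitions above; everything else (the nested-dictionary layout, the field names, the sample players, and applying the 200 CF cut-off to both seasons of a pair) is an assumption made for illustration.

```python
from math import sqrt


def icsh(row):
    """Individual Corsi shooting percentage: iGF / iCF."""
    return row["iGF"] / row["iCF"]


def tmcsh(row):
    """On-ice teammate Corsi shooting percentage: (GF - iGF) / (CF - iCF)."""
    return (row["GF"] - row["iGF"]) / (row["CF"] - row["iCF"])


def pearson(xs, ys):
    """Plain Pearson correlation coefficient."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


def lagged_correlation(seasons, metric, lag, min_cf=200):
    """Correlate a metric in year y with the same metric in year y + lag.

    seasons maps player -> {year: row}; a pair is kept only when the player
    was on the ice for at least min_cf shot attempts in both seasons.
    """
    xs, ys = [], []
    for by_year in seasons.values():
        for year, row in by_year.items():
            later = by_year.get(year + lag)
            if later is None or row["CF"] < min_cf or later["CF"] < min_cf:
                continue
            xs.append(metric(row))
            ys.append(metric(later))
    return pearson(xs, ys)


# Hypothetical players, purely to show the shape of the input.
seasons = {
    "A": {2012: {"GF": 40, "CF": 800, "iGF": 10, "iCF": 200},
          2013: {"GF": 42, "CF": 800, "iGF": 12, "iCF": 200}},
    "B": {2012: {"GF": 30, "CF": 700, "iGF": 6, "iCF": 200},
          2013: {"GF": 33, "CF": 700, "iGF": 8, "iCF": 200}},
    "C": {2012: {"GF": 50, "CF": 900, "iGF": 16, "iCF": 200},
          2013: {"GF": 48, "CF": 900, "iGF": 18, "iCF": 200}},
    "D": {2012: {"GF": 5, "CF": 150, "iGF": 1, "iCF": 40},   # below the cut-off
          2013: {"GF": 20, "CF": 500, "iGF": 9, "iCF": 100}},
}

one_year_icsh = lagged_correlation(seasons, icsh, lag=1)
```

Running the same function with `metric=tmcsh` and `lag` from 1 to 6, separately for forwards and defencemen, would give the kind of series discussed in this post.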

Year to Year Correlations – Forward Shooting Percentages

Starting with the forwards, we see that our results do seem to align with Phil’s theory – the correlations for iCSh% never drop below 0.2, and average roughly 0.22 if we exclude the last datapoint, which seems to be way out of line. Similarly, forwards seem to have at least some ability to influence their teammates’ shooting percentage, with the correlations averaging 0.18 for up to 5 year differences. And while none of the correlations that we see are phenomenal (as a point of comparison CF% generally correlates at levels between 0.33 and 0.57 over 5 years for forwards), there’s very clearly some talent there. What we see also makes a lot of sense intuitively – forwards have more control over the shots coming off of their own sticks, but we expect that they should have at least some ability to create scoring opportunities for their teammates.

Year to Year Correlations – Defencemen Shooting

For defencemen, on the other hand, we see a much different story. Defencemen show little repeatability in individual shooting ability, with the correlation generally sitting in the 0.08-0.10 range (again, with the exception of our odd 6 year delta seasons). When it comes to improving their teammates’ shooting, however, the picture is even worse, with the correlation dropping to roughly 0.06 for a 1 year delta, and down to basically 0 at lengths of more than 4 years.

While it’s not exactly news that forwards drive offensive play more than defencemen, this really underscores how little control defencemen have once the puck leaves their sticks. This helps explain (in part at least) why it’s so difficult for a defenceman to look “dominant” for multiple seasons at a time: with so much luck in their year-to-year shooting percentages, it’s really tough to get the bounces to go your way for multiple years running. And while forwards do show more control of whether the puck goes in when they’re on the ice, a single season’s data is still about 75% luck for their own shots and 80% luck for their teammates’. All of which is to say that while generating and converting on scoring opportunities is definitely a skill for forwards, it can take quite some time before we know if what we’re seeing is signal more than noise. We needn’t ignore shooting percentages, but rather we need to keep in mind that bounces can take a long while to even out, so if you’re seeing names like Chris Neil near the top of the league in shooting percentage (19th at the moment!!!) it may be a sign that it’s too early to be checking the data.

Posted in Shooting Percentage

Score Adjusted Weighted Shots

Last week, Tom Tango, Sabermetrician extraordinaire for the Chicago Cubs (and one of the original hockey analysts to, you know, actually make money doing this thing), posted an article on his site proposing a new metric to better weight the components of Corsi. Tango defined his new metric (which he proposed, with tongue planted firmly in cheek, be called Tango, or failing that Weighted Shots) as a simple linear combination of goals and non-goal Corsi events:

wSH = Goals + 0.2*(Shots + Missed Shots + Blocked Shots)

The weighting of 0.2 was informed by (although not strictly derived from) a regression between half-season Corsi components and half-season Goals For (i.e. calculating the weights to maximize the predictive value of future Goal Differential). Tango’s goal was to preserve the predictive information that we see in Corsi while properly taking into account the fact that goals are really what the game is all about. And while no analyst has ever argued that Corsi alone is enough to evaluate a team or player, Tango’s point was that we had the data to make intuitive improvements to Corsi in a relatively easy manner.

One of the problems with how wSH is formulated, though, is that it’s aimed behind the current state of hockey analytics. As Micah McCurdy has illustrated, Score Adjusted metrics vastly outperform standard possession metrics, since both the location of the game and the current score state have significant impacts on how teams perform. Unless we take score and venue effects into account, even an improved metric like wSH is missing important information. Fortunately, if we follow along with Micah’s original methodology, we can figure out the appropriate adjustments to bring these factors into wSH.

Using data from 2008-2014, we can first calculate the probability of a given team recording an event based on the event type, score state and game location (home/away):

Score  Home Goal  Away Goal  Home Shot/Miss/Block  Away Shot/Miss/Block
-3     52.87%     50.83%     59.25%                56.66%
-2     51.19%     50.49%     57.34%                55.31%
-1     53.20%     49.77%     54.96%                53.22%
0      52.91%     47.09%     50.95%                49.05%
1      50.23%     46.80%     46.78%                45.04%
2      49.51%     48.81%     44.69%                42.66%
3      49.17%     47.13%     43.34%                40.75%

Then, we can take the probabilities, along with Tango’s wSH weights (1 for goals, 0.2 for shots/misses/blocks) and combine them to calculate weighted adjustment factors for a Score Adjusted Weighted Shots metric (SAwSH):

Score  Home Goal Weight  Away Goal Weight  Home Shot/Miss/Block Weight  Away Shot/Miss/Block Weight
-3     0.943             0.983             0.163                        0.173
-2     0.976             0.990             0.171                        0.179
-1     0.936             1.005             0.180                        0.187
0      0.942             1.058             0.196                        0.204
1      0.995             1.064             0.213                        0.220
2      1.010             1.024             0.221                        0.229
3      1.017             1.057             0.227                        0.237

As you can see, the value of a goal relative to a shot isn’t constant in our new method. It ranges from one goal being worth 5.78 shots/misses/blocks (for a home team down 3 goals) all the way down to 4.46 shots/misses/blocks per goal (for a visiting team up by 3).
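The post doesn’t write out the formula that links the two tables, but the published weights are consistent with scaling each base weight by (1 − p) / 0.5, where p is the probability from the first table. The sketch below makes that assumption explicit and rebuilds the weights; the dictionary layout and function name are mine, and the score state is read as being from the perspective of the team recording the event.

```python
# Tango's base weights: 1 for goals, 0.2 for shots/misses/blocks.
BASE_WEIGHT = {"goal": 1.0, "shot": 0.2}

# Probabilities copied from the first table: score -> (home, away).
EVENT_PROB = {
    "goal": {
        -3: (0.5287, 0.5083), -2: (0.5119, 0.5049), -1: (0.5320, 0.4977),
        0: (0.5291, 0.4709), 1: (0.5023, 0.4680), 2: (0.4951, 0.4881),
        3: (0.4917, 0.4713),
    },
    "shot": {
        -3: (0.5925, 0.5666), -2: (0.5734, 0.5531), -1: (0.5496, 0.5322),
        0: (0.5095, 0.4905), 1: (0.4678, 0.4504), 2: (0.4469, 0.4266),
        3: (0.4334, 0.4075),
    },
}


def adjusted_weight(event, score, venue):
    """Assumed adjustment: base weight * (1 - p) / 0.5.

    Events that are more likely than a coin flip in a given score/venue
    state are discounted; less likely events are given extra credit.
    """
    home_p, away_p = EVENT_PROB[event][score]
    p = home_p if venue == "home" else away_p
    return BASE_WEIGHT[event] * (1 - p) / 0.5


def shots_per_goal(score, venue):
    """How many shots/misses/blocks one goal is worth in a given state."""
    return adjusted_weight("goal", score, venue) / adjusted_weight("shot", score, venue)


home_down_three = shots_per_goal(-3, "home")
away_up_three = shots_per_goal(3, "away")
```

Under that assumption the two extremes quoted above fall out directly: roughly 5.78 shots/misses/blocks per goal for a home team down three, and roughly 4.46 for a visiting team up three.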

Now that we’ve defined how to calculate SAwSH, let’s look at how well it performs compared to Score Adjusted Corsi. Whenever we evaluate a new stat there are two things we need to look at to decide how much trust to put in it: 1) the repeatability of the metric, that is how well our measurement over one period predicts the same measurement over another period; and 2) how well the metric predicts our result of interest (winning hockey games). A metric that’s not repeatable doesn’t do us much good when we’re evaluating a team or player, since we don’t know whether the results we observe are due to luck or talent. At the same time, a measure that’s highly repeatable but doesn’t relate to winning is a metric that we should just ignore.

The best way for us to test for repeatability at the team level is to look at the correlation between our results in odd-numbered games and in even-numbered games. Since there’s nothing in the game number that would relate to our results, if we see a high correlation it’s a good sign that what we’re observing is a talent.
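A minimal sketch of the odd/even split is shown here. The team names and per-game event counts are made up, and pooling events within each half before taking the share is one reasonable reading of the method rather than a confirmed detail of the original analysis.

```python
from math import sqrt


def pearson(xs, ys):
    """Plain Pearson correlation coefficient."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


def share(games):
    """Pooled for / (for + against) over a list of (for, against) games."""
    total_for = sum(f for f, _ in games)
    total_against = sum(a for _, a in games)
    return total_for / (total_for + total_against)


def odd_even_shares(team_games):
    """Return each team's share in odd-numbered and even-numbered games."""
    odd, even = [], []
    for games in team_games.values():
        odd.append(share(games[0::2]))   # games 1, 3, 5, ...
        even.append(share(games[1::2]))  # games 2, 4, 6, ...
    return odd, even


# Hypothetical (adjusted events for, adjusted events against) per game.
team_games = {
    "Team A": [(30, 20), (28, 22), (32, 18), (29, 21)],
    "Team B": [(25, 25), (24, 26), (26, 24), (25, 25)],
    "Team C": [(20, 30), (22, 28), (18, 32), (21, 29)],
}

odd, even = odd_even_shares(team_games)
repeatability = pearson(odd, even)
```

The same helper covers the predictive test as well: correlate the metric’s share in one half with Goals For Percentage in the other half instead of with itself.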

Metric Correlation
Score Adjusted Corsi 0.873
Score Adjusted Weighted Shots 0.841

While Score Adjusted Corsi shows slightly more repeatability, the difference is more or less negligible. Both metrics show enough repeatability that we don’t have to worry that they’re influenced too heavily by luck. This is particularly important for SAwSH as it dispels one of the biggest worries that many people had about it, namely that the inherent variability in shooting and save percentage would mean that we’d need a much larger sample before we could trust the results.

If we move on to predictability we can run a similar test, but instead of correlating the same metric between even and odd-numbered games, we’ll look at how well our Score Adjusted numbers in even games predict a team’s Goals For Percentage in odd games (and vice-versa).

Metric                         Correlation (Even -> Odd GF%)  Correlation (Odd -> Even GF%)
Score Adjusted Corsi           0.475                          0.421
Score Adjusted Weighted Shots  0.495                          0.446

In both datasets, SAwSH does a better job of predicting out of sample Goals For %. This makes sense of course, since SAwSH includes goal scoring/goaltending data where SAC doesn’t. The difference between SAC and SAwSH is also interesting to note: we seem to be able to explain ~5% more of the variance in out of sample GF% by using wSH rather than Corsi, illustrating the fact that shooting percentage and save percentage do matter at the team level. While they’re obviously not as important as possession (after all, we still do fairly well using only SAC), there’s clearly a benefit to including them in our analyses.

While the computational cost of SAwSH may be slightly higher than standard CF, the benefits are more than just an increase in predictive power: wSH makes much more sense intuitively, and is a direct counterpoint to the argument that analytics are too focussed on possession. SAwSH makes a much better argument for analytics while giving up very little in the way of repeatability. While there are obviously further areas to investigate (the weightings in Tango’s original regression equations are worth a deeper look as there’s likely further value to be extracted there), SAwSH is clearly a step forward for the analytics movement. And although some may argue that the power of SAwSH is a repudiation of Corsi as a metric, I instead look at it as a validation of possession-based analyses: the value of the sample that Corsi offers is obvious; SAwSH is just a small tweak to better reflect the inherent shooting and goaltending differences that Corsi can miss in some cases.

Addendum

On his site Tango asked for correlations for Score Adjusted Goals, and so I’m happy to oblige:

Comparison Correlation
Odd(SAG%)->Even(SAG%) 0.365
Odd(SAG%)->Even(GF%) 0.366
Even(SAG%)->Odd(GF%) 0.354

Obviously, SAwSH is quite a bit better in both repeatability and predictability, but what’s more interesting is how little additional value we get from adjusting GF% for Score State/Venue. The correlation between raw GF% in odd and even games is 0.353, which means that we’re getting almost no additional information from our adjustments.

Posted in Statistics

Are the Buffalo Sabres worse than an AHL team?

The Buffalo Sabres are not a good hockey team. This is not news to anyone. At 6-13-2 the Sabres sit last in the Atlantic Division by 6 points, and are tied with the Carolina Hurricanes and Edmonton Oilers for the fewest points to date across the NHL. What’s worse for Buffalo is that they’re almost certainly much worse than their record suggests. Their Pythagorean Win Percentage, which calculates a team’s expected winning percentage based on their Goals For and Goals Against (and is a better predictor of future success than regular winning percentage), sits at 20.9%, 8 percentage points lower than their actual winning percentage.

It’s not easy to describe how bad Buffalo’s 20.9% Pythagorean winning percentage is: the only teams since 1992 to achieve anywhere close to that level of futility were the 1992-93 and 1993-94 Ottawa Senators, and they at least had the excuse of playing in the first 2 years of their franchise history. One question that’s come up a few times across the sports analytics world recently is whether or not a minor league/college team could defeat the worst professional team in a given sport. Over at FiveThirtyEight, Neil Paine concluded that even the 0-14 Philadelphia 76ers would still be about a 78% favourite over the Kentucky Wildcats. In addition, Tom Tango ran through the math for MLB on his blog, and found that a top tier minor league team could score up to 70% as many runs while allowing 143% more and wind up as even money against the worst MLB team. The natural question that follows, of course, is: are the Sabres really bad enough to lose to an AHL team?

To answer this, we’ll first need to figure out how well we’d expect the best AHL team to do (from a goal differential point of view) if we moved them up to the NHL. As of Tuesday night, the top team in the AHL was the Manchester Monarchs, who have posted 57 goals for and 38 goals against en route to a 12-4-1 start. While the Monarchs have been slightly lucky to date (their Pythagorean Win Percentage is about 70% at that Goal Differential), they’re obviously still a good team. But because they play in the AHL we can’t just use the Goal Differential that they’ve posted there; we have to adjust it to reflect how we feel they’d perform if we airdropped them onto an NHL rink.

Fortunately for us, someone else has done the legwork to come up with a translation factor already! NHLe (NHL Equivalency) is a stat first created by Gabe Desjardins, and its purpose is to allow us to convert the number of points a player scored in a non-NHL league into NHL points. Based on Gabe’s work 1 goal in the AHL is worth approximately 0.45 in the NHL, meaning Manchester’s 57 AHL goals for are worth 25.65 in the NHL, and their 38 goals against translate to roughly 84.44 goals against. You don’t have to be a math major to see that a 25.65/84.44 GF/GA ratio is worse than Buffalo’s 36/70, but how much worse is it?

If we look at Pythagorean Win Expectancy, the Monarchs’ NHL-equivalent goal differential translates into roughly an 8.4% expected win percentage. We can compare that to Buffalo’s 20.9% expected win percentage by using an odds ratio method to come up with a neutral ice expected win percentage for both the Sabres and Monarchs:

Team Neutral Ice Expected Win %
Manchester Monarchs 25.9%
Buffalo Sabres 74.1%
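The arithmetic behind this table can be reproduced with a short sketch. Two things here are assumptions rather than details given in the post: a Pythagorean exponent of 2 and the standard log5 form of the odds ratio. Both were chosen because they reproduce the quoted figures (roughly 70%, 20.9%, 8.4% and 25.9%), and Buffalo’s 36 goals for and 70 against are taken from the 36/70 ratio mentioned earlier.

```python
NHLE = 0.45          # AHL-to-NHL translation factor quoted in the post
PYTH_EXPONENT = 2    # assumption: matches the percentages quoted in the post


def pythagorean(goals_for, goals_against, exponent=PYTH_EXPONENT):
    """Expected win percentage from goals for and goals against."""
    gf, ga = goals_for ** exponent, goals_against ** exponent
    return gf / (gf + ga)


def neutral_ice_win_prob(p_a, p_b):
    """Odds-ratio (log5) estimate of team A beating team B (assumed form)."""
    return p_a * (1 - p_b) / (p_a * (1 - p_b) + p_b * (1 - p_a))


# Manchester in the AHL, then translated to NHL-equivalent goals.
monarchs_ahl = pythagorean(57, 38)
monarchs_nhl = pythagorean(57 * NHLE, 38 / NHLE)

# Buffalo's actual goals for and against.
sabres = pythagorean(36, 70)

monarchs_win = neutral_ice_win_prob(monarchs_nhl, sabres)
sabres_win = 1 - monarchs_win
```

Note that goals for are multiplied by the NHLe factor while goals against are divided by it, which is exactly the symmetric treatment questioned later in this post.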

Even the best AHL team will only win about 1 in 4 times against an historically bad NHL team, which really displays how big the difference in talent is between the NHL and AHL. While the Sabres may be in the middle of one of the worst non-expansion campaigns in recent memory, they’re nowhere near the level that we’d want to relegate them down to the American Hockey League.

One assumption we’ve made in our analysis, however, is that the NHLe is the same at the team level for goals for and against. While I feel fairly confident that it should work out for team goals for (you’re likely to have good players who will outperform it and bad players who will underperform), on the defensive side of the puck you could make an argument that our assumption won’t hold. By setting the NHLe for defense to 0.45 we’re essentially saying that we expect a team of AHLers to give up a little more than twice as many goals in the NHL as they did in the AHL (1/0.45 is about 2.2). This doesn’t seem all that intuitive: although we’d expect them to give up more shots and get slightly worse goaltending, general team defense should be easier to transfer between leagues than offense.

We can account for this by looking into how Manchester’s expected winning percentage varies as a function of the Goals Against NHLe.

Manchester Winning Percentage vs. GA NHLe

In the graph above we see that the break-even point for our GA Equivalence multiplier is around 0.765, which is to say that if we believe that the Monarchs would give up 1.3 (1/0.765) times as many goals in the NHL as they would playing in the AHL, they’d be even money against the Sabres. While we don’t have a great way to test this, intuitively it doesn’t seem unreasonable, particularly if you consider the effect a strong goaltender could have. To date, the Monarchs have received 0.913 goaltending in all situations; if you were to drop that down to 0.905 and assume a 20% increase in shots against, their goals against would increase to 48, which is 1.28 times higher than their GA now. While we can’t say conclusively that this would be the case, it also doesn’t look like that unreasonable of a comparison to me. A 50/50 game may be the upper bound for the Monarchs, and while that may not be great for Manchester, it’s certainly worse for Buffalo.
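The break-even multiplier can also be found numerically rather than read off a graph. As a rough sketch, this again assumes a Pythagorean exponent of 2 and that "even money" means the two teams have the same Pythagorean expectation (which gives 50/50 under a log5 odds ratio). The function names and the bisection search are illustrative choices, not the method used to draw the original chart.

```python
def pythagorean(goals_for, goals_against, exponent=2):
    """Expected win percentage; the exponent of 2 is an assumption."""
    gf, ga = goals_for ** exponent, goals_against ** exponent
    return gf / (gf + ga)


SABRES = pythagorean(36, 70)  # about 20.9%


def monarchs_expectation(ga_nhle, gf_nhle=0.45):
    """Manchester's NHL-equivalent expectation for a given GA multiplier."""
    return pythagorean(57 * gf_nhle, 38 / ga_nhle)


def break_even_ga_nhle(low=0.45, high=1.0, tol=1e-6):
    """Bisection search for the GA multiplier that matches Buffalo.

    monarchs_expectation rises as the multiplier rises (fewer translated
    goals against), so the crossing point is unique on this interval.
    """
    while high - low > tol:
        mid = (low + high) / 2
        if monarchs_expectation(mid) < SABRES:
            low = mid
        else:
            high = mid
    return (low + high) / 2


break_even = break_even_ga_nhle()
ga_inflation = 1 / break_even  # how many times more goals against in the NHL
```

Under these assumptions the search lands at about 0.76, in line with the roughly 0.765 read from the graph and with the 1.3-times figure for goals against.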

While it does seem clear that the Sabres are at least not worse than the best AHL team, it’s still not exactly cause for celebration in Buffalo. Even in last place the Sabres are a bit lucky to be where they are in the standings given their goal differential, and while help may be coming up from Erie at the end of the season, the rest of this year is sure to be a long one for Sabres fans.

Posted in Theoretical