Introducing xGF20: A Context-Neutral, Corsi-Based Goal Creation Metric

If you follow the hockey analytics world at all (and since you’re reading this, I’m assuming you do), you likely have a pretty good understanding of, and appreciation for, Corsi. Analysts like to use Corsi because the much larger sample it gives you makes it a better predictor of future success than simple goal-based metrics. While a team might ride the percentages to score and win more in the short run, if you really want a good sense of where a team is going to end up you’re better off looking at their Corsi % (more specifically, at their Even Strength Corsi % in close games).

This, however, is problematic for many traditional hockey-types. A common criticism of analytics from the “haters” is that Corsi doesn’t take into account shooting percentage or the play of teammates and that it isn’t a “be-all-end-all” statistic (not that anyone ever said it was). To a degree, these criticisms are fair: we know that teammates have a large effect on play, which is why we look at stats like Corsi Rel. In addition, there’s at least some convincing evidence that higher or lower on-ice shooting percentages are sustainable over the longer term. Critics of analytics will argue to no end that goal creation, not shot creation, is the key to success. Their view is that the “Corsi Hockey League” and the NHL where goals are scored are different worlds which can’t be bridged by silly things like numbers.

Luckily for those of us more in touch with reality, numbers are actually really good at building bridges (admittedly, this metaphor is going a bit far). Creating a metric that measures goal-creation ability isn’t straightforward, but it definitely can be done. So hold on to your hats as I explain xGF20 in as long-winded a fashion as is humanly possible.

At the most basic level, there are two ways a player can increase the scoring that occurs for his team while he’s on the ice: first, he can increase the number of shots that are taken when he’s on the ice (increasing his CF20 while hopefully maintaining a reasonable shooting percentage); or second, he can cause more of those shots to go into the net (increasing his on-ice shooting percentage). But this is a bit of an oversimplification: it doesn’t take into account who’s doing the shooting while the player is on the ice. Sidney Crosby and Alex Ovechkin are both positive Corsi players with above-average on-ice shooting percentages, but each of them accomplishes it in a different way.

Fortunately, we can break the data down further to give us insight into these differences between players. For each of the metrics we listed above, we can look at both the individual component of it and the teammates’ component of it. Each player’s CF20 is made up of 2 distinct parts: the number of shots he takes himself and the number of shots his teammates take (this can also be viewed as the number of shots he creates). Similarly, a player’s on-ice shooting percentage can be separated out into his individual shooting percentage and his teammates’ shooting percentage. We expect that snipers will naturally score on more of their shots, while we’d predict the linemates of a natural playmaker to score on a higher percentage of theirs.

With these ideas defined, let’s look at the metrics we can use to measure them at the individual level. We’ll start on the Corsi front, and how we can split a player’s shot-creation ability into the portion that he contributes himself and the portion his playmaking ability creates (note: I’m going to use the terms shot and Corsi interchangeably in this article, so all references to shots or shooting percentage refer to all shot attempts, not just shots on goal). Brian Macdonald recently had a great series of articles up outlining his efforts to create a metric that quantified a player’s playmaking ability beyond just his assists. His method breaks up a player’s WOWY (the difference between his team’s performance when he’s on the ice and off of it) into an altruistic and an individual contribution to separate out playmaking skill (such as that possessed by Sidney Crosby) from individual shot creation skill (see Ovechkin, Alex).

I’m going to copy his method here, using a few small tweaks that I feel allow us to get a better breakdown of the individual/altruistic components. In his article, he defines a player’s altruistic contribution as being the difference between his WOWY and his individual contribution (defined simply as the Corsi events per 60 that he individually takes; for our purposes we’ll use the per-20-minute rate, iCorsi20). I’m going to stray from that approach slightly and look at the player’s individual contribution as the number of shot attempts above what would be expected for an average forward or defenceman playing on the same line. So for forwards, we define our individual contribution as:

Individual Contribution = iCorsi20 – CF20*0.7*1/3

The 0.7 comes from the fact that since 2007-2008 forwards have taken 70% of all the shot attempts, and the 3 is obviously the number of forwards on the ice in a 5v5 situation. Similarly, for defencemen we have:

Individual Contribution = iCorsi20 – CF20*0.3*1/2

The explanation is the same as above: defencemen take the remaining 30% of shot attempts, and there are 2 of them on the ice at 5v5. I prefer this definition, as it allows us to have negative individual contribution numbers, and gives a greater range of altruistic scores. The best way to show this might be with an example: imagine a player with a WOWY of 2 who takes 0 shots over the course of a whole season. Under Brian’s method, his altruistic contribution would be 2 (Altr. = WOWY – Indiv), but this doesn’t really capture the situation well: a WOWY is essentially supposed to show us how much better a team does when we replace a random average player with a given player. But an average player wouldn’t take zero shots, so we have to remove those average shots when we determine how much a player is contributing individually to get a proper view of his personal shot creation.
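To make the two definitions above concrete, here is a minimal Python sketch. The function and constant names are my own (hypothetical), and the sample inputs are made-up rates rather than real player data; only the 0.7/0.3 shares and the 3F/2D split come from the text.

```python
# Share of on-ice shot attempts taken by each position group (from the text),
# and the number of skaters at that position on the ice at 5v5.
FORWARD_SHARE, FORWARDS_ON_ICE = 0.7, 3
DEFENCE_SHARE, DEFENCEMEN_ON_ICE = 0.3, 2


def positional_multiplier(position: str) -> float:
    """Expected fraction of on-ice attempts taken by one average skater."""
    if position == "F":
        return FORWARD_SHARE / FORWARDS_ON_ICE
    if position == "D":
        return DEFENCE_SHARE / DEFENCEMEN_ON_ICE
    raise ValueError(f"unknown position: {position!r}")


def individual_contribution(icorsi20: float, cf20: float, position: str) -> float:
    """iCorsi20 minus the attempts an average skater at that position would take."""
    return icorsi20 - cf20 * positional_multiplier(position)


# Hypothetical forward who never shoots while his line generates 18 attempts per 20.
no_shot_forward = individual_contribution(icorsi20=0.0, cf20=18.0, position="F")
# Hypothetical defenceman taking 3 attempts per 20 on the same 18-attempt line.
shooting_defenceman = individual_contribution(icorsi20=3.0, cf20=18.0, position="D")
```

Note that the forward who takes no shots ends up with a negative individual contribution, which is exactly the behaviour the definition is meant to allow.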

Anyways, back to the point at hand: with our Individual Contribution defined, we can then calculate a player’s altruistic contribution, in the same way that Brian does.

Altruistic Contribution = WOWY – Individual Contribution

Or

Altruistic Contribution = (CF20 – TMCF20) – (iCorsi20 – CF20*Positional Multiplier)

With the positional multipliers given above.
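As a rough sketch of the altruistic calculation, the snippet below plugs in the zero-shot player from the earlier example. I’m assuming TMCF20 is the team’s CF20 with the player off the ice (so that CF20 – TMCF20 is his WOWY), and the CF20 of 18 is a made-up number chosen only so the WOWY comes out to 2; the function name is hypothetical.

```python
def altruistic_contribution(cf20: float, tmcf20: float, icorsi20: float,
                            multiplier: float) -> float:
    """WOWY minus the individual contribution, as defined in the text.

    multiplier is the positional multiplier: 0.7 * 1/3 for a forward,
    0.3 * 1/2 for a defenceman.
    """
    wowy = cf20 - tmcf20                        # on-ice rate minus off-ice rate (assumed)
    individual = icorsi20 - cf20 * multiplier   # attempts above an average skater
    return wowy - individual


# Hypothetical forward: WOWY of 2 (18 on-ice vs 16 off-ice), zero attempts of his own.
zero_shot_altruism = altruistic_contribution(
    cf20=18.0, tmcf20=16.0, icorsi20=0.0, multiplier=0.7 / 3
)
```

Under these assumed inputs the player’s altruistic contribution is larger than his WOWY of 2, because the shots an average forward would have taken himself are credited to his playmaking instead.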

Now that we have the shot creation side of things out of the way, let’s focus our attention on shot quality/shooting percentages. Fortunately, our metrics here require a lot less explanation. Our individual component here is just going to be our standard Individual Corsi Shooting Percentage (iCSh%). On the other hand, we’ll define our teammate Corsi shooting percentage metric as being the number of goals a player’s teammates score while he’s on the ice divided by the number of Corsi attempts taken by his teammates while he’s on the ice:

Teammate CSh% = TmCSh% = (GF20 – iG20)/(CF20 – iCorsi20)
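Here is the same formula as a small Python sketch. The function name and the guard against a zero denominator are my additions, and the sample rates are invented for illustration.

```python
def teammate_csh_pct(gf20: float, ig20: float, cf20: float, icorsi20: float) -> float:
    """Goals scored by teammates divided by Corsi attempts taken by teammates."""
    teammate_attempts = cf20 - icorsi20
    if teammate_attempts <= 0:
        # No teammate attempts means the percentage is undefined.
        raise ValueError("teammates took no Corsi attempts")
    return (gf20 - ig20) / teammate_attempts


# Hypothetical player: 0.9 GF20 of which 0.3 are his own goals,
# 18 CF20 of which 4 are his own attempts.
example_tm_csh = teammate_csh_pct(gf20=0.9, ig20=0.3, cf20=18.0, icorsi20=4.0)
```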

Simple, right? With these metrics defined we can start looking at how we can put it all together to get a view of how a player creates goals for his team. The one thing we need to know before we get started though, is how repeatable each stat is. This is kind of a key point: if we’re looking to isolate a player’s goal scoring “talent” then we had better be sure the components of our metric are indicative of talent themselves rather than luck.

In order to do this I looked at the 3-year correlations (2007-2010 and 2010-2013) for each of the statistics we outlined above. I restricted my sample to players who played at least 500 even-strength minutes in each period, and looked at the overall correlations, as well as the correlations for forwards and defensemen individually (all data taken from Hockey Analysis).

Stat                     Forwards  Defensemen  All
Altruistic Contribution  0.70      0.38        0.64
Individual Contribution  0.81      0.74        0.79
iCSh%                    0.49      0.21        0.77
TmCSh%                   0.29      -0.03       0.32
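For anyone who wants to reproduce this kind of check, here is a minimal sketch of a split-period correlation with the 500-minute filter. The data layout (a dict of player to minutes and stat value), the function names and the sample numbers are all assumptions of mine; the real numbers above come from Hockey Analysis data that isn’t reproduced here.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Plain Pearson correlation coefficient between two equal-length lists."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equal-length samples of at least 2 points")
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sample")
    return cov / sqrt(var_x * var_y)


def split_period_correlation(period_1, period_2, min_minutes=500):
    """Correlate a stat across two periods for players qualifying in both.

    Each period maps player -> (even_strength_minutes, stat_value).
    """
    qualified = [
        player for player in period_1
        if player in period_2
        and period_1[player][0] >= min_minutes
        and period_2[player][0] >= min_minutes
    ]
    xs = [period_1[player][1] for player in qualified]
    ys = [period_2[player][1] for player in qualified]
    return pearson_r(xs, ys)


# Made-up data: PLAYER_D is dropped because he misses the minutes cutoff in period 1.
first = {"PLAYER_A": (900, 1.0), "PLAYER_B": (800, 2.0),
         "PLAYER_C": (700, 3.0), "PLAYER_D": (100, 50.0)}
second = {"PLAYER_A": (900, 2.0), "PLAYER_B": (800, 4.0),
          "PLAYER_C": (600, 6.0), "PLAYER_D": (900, -5.0)}
r = split_period_correlation(first, second)
```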

What we see shouldn’t be surprising for anyone with some background knowledge of hockey analytics. The Corsi-based metrics (Altruistic and Individual Contributions) show a much higher correlation, and likely represent a measurable skill. On the other hand, the shooting-percentage metrics are definitely a mixed bag. There certainly appears to be some skill in how often a player is able to score on his own shots, but when we start looking at the frequency with which his teammates score we start to see much more randomness.

While the repeatability of the Corsi-based metrics is reasonably high, we would ideally like to see more in the shooting percentage stats if we’re going to use them in our new metric. Fortunately, we can correct for the lower correlation values by regressing a player’s shooting stats to the mean. For forwards, we’re going to add on 430 “league average” Corsi attempts (i.e. Corsi attempts with a 5.36% chance of going in), while for defensemen we’ll add 1300 Corsi attempts at a 1.83% CSh% (see this article by Tango Tiger for the method used). Because the TmCSh% correlations are so low, we’ll actually just use the league average TmCSh%. While 430 Corsi attempts isn’t a lot to add on, the 1300 Corsi attempts is actually a huge number, as only 5 defensemen took that many attempts between 2007 and 2013. It just goes to reinforce that while defensemen may run hot from time to time, the expected regression can be pretty severe (which Mike Green can attest to).
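The “add league-average attempts” regression described above can be sketched in a few lines of Python. The function name is hypothetical and the 40-goals-on-500-attempts forward is an invented example; the 430/5.36% and 1300/1.83% priors are the ones quoted in the text.

```python
# (prior attempts, prior shooting rate) by position, as quoted in the text.
PRIORS = {"F": (430, 0.0536), "D": (1300, 0.0183)}


def regressed_icsh_pct(goals: float, attempts: float, position: str) -> float:
    """Blend a player's observed iCSh% with league-average 'phantom' attempts."""
    prior_attempts, prior_rate = PRIORS[position]
    phantom_goals = prior_attempts * prior_rate
    return (goals + phantom_goals) / (attempts + prior_attempts)


# Hypothetical forward who has scored 40 goals on 500 Corsi attempts (8.0% raw).
hot_forward = regressed_icsh_pct(goals=40, attempts=500, position="F")
# A player with no attempts at all simply gets the league-average rate.
no_data_defenseman = regressed_icsh_pct(goals=0, attempts=0, position="D")
```

In this made-up case the forward’s 8.0% raw rate is pulled down to about 6.8%, part of the way towards the 5.36% league average; the more attempts a player has, the less the phantom attempts matter.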

So, we’ve defined our metrics and figured out which are repeatable, and now we’ve decided which to regress and how much to regress them. Let’s start pulling it all together. As I mentioned at the start of the article, what we’re working towards is creating a context-neutral GF20 metric, which should give a good idea of the goal-creation ability of any given player. To figure out how to bring all of our metrics together, we simply have to look at how goals are defined:

Goals For = Individual Shots * Individual Shooting % + Teammate Shots * Teammate Shooting %

We spent a lot of time above defining metrics for each of these items, but how we bring them together isn’t necessarily straightforward, and differs between forwards and defensemen. Let’s start by looking at forwards. Our formula for expected goals for per 20 minutes (or xGF20) is:

xGF20(Forward) = (Individual Contribution + 17.5*0.7/3) * (Regressed Historical iCSh%) +

                 (17.5*0.7*2/3 + Altruistic Contribution * (0.7*2/3 + 0.7*1/3*0.7)) * 5.36% +

                 (17.5*0.3 + Altruistic Contribution * (0.3 + 0.7*1/3*0.3)) * 1.8%

That’s a lot of variables and numbers so let’s go through it line by line. The first line is simply the individual expected goals scored. It takes the Individual Contribution and adds in the league average expected iCorsi rate for a forward (17.5 * 0.7/3), and multiplies by what I’ve called the Regressed Historical iCSh%. That’s essentially a player’s historical iCSh% (for all seasons prior to and including the current season) regressed by adding in 430 league average Corsi attempts. There’s room to quibble about using all the historical data, but I think it gives a better estimate than limiting to just one year.

The next line is the expected goals scored by the other forwards while our player is on the ice. We start with the expected shots by a pair of league average linemates (17.5*0.7*2/3). Then we add in the altruistic bonus. This isn’t easy though, as we have to divide the altruistic bonus amongst forwards and defensemen. The 0.7*2/3 term represents the shot attempts that would normally go to his linemates (since 70% of all Corsi attempts are taken by forwards). We then have to take the other 0.7*1/3 (which would go to him, but since he’s creating the shots he can’t take them) and divide them amongst his teammates again, which in this case means taking 70% of those 0.7*1/3 (or 0.7*1/3*0.7). We then take all those “shots created” and apply a league average forward’s Corsi shooting rate to them (5.36%).

The last line, as you’ve likely figured out, is the expected goals scored by the defensemen when our player is on the ice. The analysis is similar to what we did above: there’s a baseline Corsi attempt rate (17.5*0.3), the base altruistic bonus (0.3 of the altruistic bonus) and the distribution effect that we outlined above (0.7*1/3*0.3). And all of those shot attempts get an average defenseman’s scoring rate of 1.86%.
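The three lines of the forward formula translate fairly directly into code. What follows is a sketch with hypothetical names, not a reference implementation; it uses the 1.8% defenseman rate exactly as written in the formula, and treats 17.5 as the league-average CF20.

```python
LEAGUE_CF20 = 17.5                 # league-average on-ice Corsi attempts per 20
F_SHARE, D_SHARE = 0.7, 0.3        # share of attempts taken by forwards / defensemen
F_CSH, D_CSH = 0.0536, 0.018       # league-average Corsi shooting rates used in the formula

# How one altruistic attempt is split between linemates and defensemen.
ALT_TO_FORWARDS = F_SHARE * 2 / 3 + F_SHARE * 1 / 3 * F_SHARE
ALT_TO_DEFENSE = D_SHARE + F_SHARE * 1 / 3 * D_SHARE


def xgf20_forward(individual: float, altruistic: float, regressed_icsh: float) -> float:
    """Expected goals for per 20 minutes for a forward, line by line."""
    # Line 1: the player's own attempts times his regressed shooting rate.
    own = (individual + LEAGUE_CF20 * F_SHARE / 3) * regressed_icsh
    # Line 2: attempts by his two linemates at the league-average forward rate.
    linemates = (LEAGUE_CF20 * F_SHARE * 2 / 3 + altruistic * ALT_TO_FORWARDS) * F_CSH
    # Line 3: attempts by the two defensemen at the league-average defenseman rate.
    defense = (LEAGUE_CF20 * D_SHARE + altruistic * ALT_TO_DEFENSE) * D_CSH
    return own + linemates + defense


# A perfectly average forward: no individual or altruistic edge, league-average shooting.
average_forward = xgf20_forward(individual=0.0, altruistic=0.0, regressed_icsh=F_CSH)
```

Two sanity checks fall out of this. The two altruistic shares add up to exactly 1, so every created attempt is handed to someone, and the perfectly average forward comes out at roughly 0.75 goals per 20 minutes.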

For defensemen the analysis is similar so I won’t go into the details of the derivation, but here’s the formula for those curious:

xGF20(Defenseman) = (Individual Contribution + 17.5*0.3/2) * (Regressed Historical iCSh%) +

                    (17.5*0.7 + Altruistic Contribution * (0.7 + 0.3*1/2*0.7)) * 5.36% +

                    (17.5*0.3*1/2 + Altruistic Contribution * (0.3*1/2 + 0.3*1/2*0.3)) * 1.8%

This may seem a bit overcomplicated, but dividing the additional shot attempts up is very important due to the differences in shooting percentages between forwards and defensemen.
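The defenseman version can be sketched the same way. Again the names are hypothetical, the 1.8% rate is taken as written in the formula, and the example at the end is a made-up league-average player rather than real data.

```python
LEAGUE_CF20 = 17.5                 # league-average on-ice Corsi attempts per 20
F_SHARE, D_SHARE = 0.7, 0.3        # share of attempts taken by forwards / defensemen
F_CSH, D_CSH = 0.0536, 0.018       # league-average Corsi shooting rates used in the formula

# How one altruistic attempt is split between the forwards and the defense partner.
ALT_TO_FORWARDS = F_SHARE + D_SHARE * 1 / 2 * F_SHARE
ALT_TO_PARTNER = D_SHARE * 1 / 2 + D_SHARE * 1 / 2 * D_SHARE


def xgf20_defenseman(individual: float, altruistic: float, regressed_icsh: float) -> float:
    """Expected goals for per 20 minutes for a defenseman, line by line."""
    # Line 1: the player's own attempts times his regressed shooting rate.
    own = (individual + LEAGUE_CF20 * D_SHARE / 2) * regressed_icsh
    # Line 2: attempts by the three forwards at the league-average forward rate.
    forwards = (LEAGUE_CF20 * F_SHARE + altruistic * ALT_TO_FORWARDS) * F_CSH
    # Line 3: attempts by his defense partner at the league-average defenseman rate.
    partner = (LEAGUE_CF20 * D_SHARE * 1 / 2 + altruistic * ALT_TO_PARTNER) * D_CSH
    return own + forwards + partner


# A perfectly average defenseman under these constants.
average_defenseman = xgf20_defenseman(individual=0.0, altruistic=0.0, regressed_icsh=D_CSH)
```

Because most of an altruistic attempt from a defenseman lands on a forward’s stick (about 80% under these shares), it is worth more in expected goals than one that stays on the blue line, which is why the split matters.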

So now that I’ve bored most people half to death with the derivation, let’s look at the results and see if they actually make any sense intuitively. First, let’s look at the top 10 and bottom 10 seasons by forwards since 2007-2008:

Top 10                             Bottom 10
Season    Player          xGF20    Season    Player            xGF20
20122013  CROSBY, SIDNEY  1.12     20102011  NICHOL, SCOTT     0.50
20072008  CROSBY, SIDNEY  1.09     20092010  ADAMS, CRAIG      0.49
20112012  CROSBY, SIDNEY  1.03     20092010  JOHNSON, RYAN     0.49
20092010  CROSBY, SIDNEY  1.00     20122013  HALL, ADAM        0.49
20122013  SEDIN, DANIEL   1.00     20092010  HORDICHUK, DARCY  0.49
20122013  EBERLE, JORDAN  1.00     20082009  ORR, COLTON       0.48
20122013  PARISE, ZACH    0.99     20082009  GORDON, BOYD      0.48
20102011  CROSBY, SIDNEY  0.99     20072008  SMITHSON, JERRED  0.47
20122013  KESSEL, PHIL    0.99     20112012  ADAMS, CRAIG      0.47
20082009  PARISE, ZACH    0.98     20102011  BETTS, BLAIR      0.46

Yeah, Sidney Crosby is pretty good. We also see Zach Parise up there twice, perhaps justifying the massive deal given to him by the Wild prior to the 2012-2013 season. In the bottom 10 we get the usual grab bag of grinders: most of those guys have other seasons that are fairly low down the scale as well.

And what about the defensemen you ask?

Top 10                                 Bottom 10
Season    Player              xGF20    Season    Player               xGF20
20102011  BYFUGLIEN, DUSTIN   0.93     20112012  FERENCE, ANDREW      0.61
20082009  GREEN, MIKE         0.92     20082009  LEACH, JAY           0.60
20122013  EHRHOFF, CHRISTIAN  0.91     20112012  O'BYRNE, RYAN        0.60
20112012  ENSTROM, TOBIAS     0.91     20122013  JACKMAN, BARRET      0.60
20082009  STREIT, MARK        0.91     20112012  ALBERTS, ANDREW      0.60
20082009  PITKANEN, JONI      0.90     20122013  LARSSON, ADAM        0.59
20102011  VISNOVSKY, LUBOMIR  0.90     20102011  REGEHR, ROBYN        0.59
20102011  PICARD, ALEXANDRE   0.90     20082009  REGEHR, ROBYN        0.58
20072008  LEOPOLD, JORDAN     0.90     20082009  SAUER, KURT          0.57
20092010  SCHLEMKO, DAVID     0.90     20082009  HJALMARSSON, NIKLAS  0.57

This list isn’t as clear cut: Dustin Byfuglien, Mike Green and Lubo Visnovsky are all obvious names, but Alexandre Picard and David Schlemko (small sample size, admittedly)? Not so much. Lubo Visnovsky is actually loved by this metric and has the highest average xGF20 of any defenseman with more than one qualifying season.

I should note that the range of our xGF20 metrics is actually quite a bit smaller than the range we observe in actual GF20 metrics. As an example, Sidney Crosby posted the top xGF20 in the league last year at 1.12, while his actual GF20 was 1.72 (good for 2nd behind Joffrey Lupul who only played 200 even strength minutes). The main cause of this discrepancy is the teammate effects that we remove from our xGF20 metric. When we calculate xGF20 we assume that a player is on the ice with teammates who shoot at league average rates all the time, which is of course never true. Crosby played more than 400 minutes last year with Chris Kunitz, who posted an 8.8% iCSh%, well above league average. This discrepancy is of course intentional: we want to know how a player did regardless of who his teammates are, and aren’t necessarily concerned with whether that number is exactly in line with his real-world performance.

So what do we know about our new stat in general? We can start by looking at simple descriptive statistics to get a sense of which levels are good and which are bad.

Position  Mean  Std. Dev.
F         0.75  0.09
D         0.75  0.06

The first thing we notice is that the mean value for both forwards and defensemen is nearly equal (they appear to be equal due to rounding, but in actuality forwards are about 0.05 higher). This is, obviously, a good thing as we can’t really have forwards’ on-ice scoring rates being significantly higher than defensemen’s while still icing a standard 3F/2D lineup. I suspect that the small difference between the groups is likely due to a preference for high shooting % forwards vs. a relative indifference for defensemen, although I don’t have any data to back that up.

Looking at the standard deviation, we also see a wider spread in the distribution of xGF20 amongst forwards. This too makes sense intuitively from my point of view: defensemen tend to have less control over offensive results than forwards as they generate a smaller percentage of the offensive opportunities (Corsi events). In addition, they tend to play with a wider range of teammates than forwards (they’ll play with both grinders and goal-scorers) which will tend to even out their results.

So we know that our metric looks to provide reasonable results and passes the eyeball test: the players we see at the top and bottom are, broadly speaking, the players we expect to see. The next thing we need to look into is how repeatable our new metric is. When we looked at repeatability above, we looked at how well the 3-year period from 2007-2010 was able to predict the 3-year period from 2010-2013. But this isn’t really practical from a player evaluation point of view, since we won’t always have 6 years of data to make judgements with. A better method is to look at how consistent our metric is year-over-year.
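A year-over-year check like this one boils down to pairing each player-season with the same player’s following season and correlating the two columns. Here is a minimal sketch, assuming seasons are coded the way they appear in the tables above (e.g. 20072008); the function names and the three sample players are invented for illustration.

```python
from math import sqrt


def next_season(season: int) -> int:
    """20072008 -> 20082009, matching the season codes used in the tables."""
    return season + 10001


def year_over_year_pairs(xgf20_by_player_season):
    """Pair each (player, season) value with that player's next-season value."""
    pairs = []
    for (player, season), value in xgf20_by_player_season.items():
        following = xgf20_by_player_season.get((player, next_season(season)))
        if following is not None:
            pairs.append((value, following))
    return pairs


def pearson_r(pairs):
    """Pearson correlation over a list of (x, y) pairs."""
    n = len(pairs)
    if n < 2:
        raise ValueError("need at least 2 pairs")
    mean_x = sum(x for x, _ in pairs) / n
    mean_y = sum(y for _, y in pairs) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in pairs)
    var_x = sum((x - mean_x) ** 2 for x, _ in pairs)
    var_y = sum((y - mean_y) ** 2 for _, y in pairs)
    return cov / sqrt(var_x * var_y)


# Made-up xGF20 values; PLAYER_A's 20102011 season has no adjacent season to pair with.
sample = {
    ("PLAYER_A", 20072008): 0.90, ("PLAYER_A", 20082009): 0.95,
    ("PLAYER_A", 20102011): 1.00,
    ("PLAYER_B", 20072008): 0.50, ("PLAYER_B", 20082009): 0.55,
    ("PLAYER_C", 20082009): 0.70, ("PLAYER_C", 20092010): 0.75,
}
pairs = year_over_year_pairs(sample)
r = pearson_r(pairs)
```

Seasons separated by a gap are deliberately left unpaired here; whether to pair across a missed season is a judgement call that the sketch doesn’t try to settle.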

xGF20 Year-to-Year Comparison – Forwards

Looking at the forwards we see that the year-over-year correlation is actually quite strong at R = 0.71. This is in line with the high degree of repeatability that we saw earlier with our 3-year forward metric correlations, so we haven’t actually “lost” any information with our new metric.

xGF20 Year-to-Year Comparison – Defensemen

For defensemen we see a bit more regression towards the mean, with an overall year-over-year correlation of 0.41. While this isn’t a phenomenal number it does reinforce the notion that defensemen have a lot less control over offensive results.

While our new metric does appear to be a reasonably good predictor of future results, it isn’t without issues. First off, it assumes regression towards a mean that’s based only on position. While this might not be as much of an issue for forwards, I suspect it’s more of an issue for defensemen. The shots that players like Erik Karlsson and PK Subban are taking likely have a much different probability distribution than those of stay-at-home defenders like Marc Methot or Josh Gorges. Because of this, our model is very likely undervaluing heavy-offense defensemen, as it regresses them too far towards the wrong mean.

We’ve also only looked at one half of the equation (or less than that, if you count goaltending): none of this work looks at how well players play defense, or what the best way to measure that is. Coming up with a way to estimate xGA20 is a lot tougher, as there’s no easy way to isolate individual contributions when a player doesn’t have the puck. There are certainly ways to come up with a decent estimate but there are often effects beyond just a player’s teammates (system effects) that are harder to quantify.

There are also issues with using average shooting percentages for teammates’ shots. We may be giving “shot creators” less credit than they deserve, because they may be giving up their own shots for a teammate who’s a better natural shooter. But this is probably a small issue: the consistency in the altruistic and individual contribution metrics suggests that this isn’t often the case.

xGF20 isn’t perfect and the numerous revised versions that I’m sure will be coming over the next few months won’t be either. But it is a statistic that does try to capture a bit more information about a player and the context that he plays in: his teammates, his shooting ability, and his playmaking ability. And ultimately, the more information we can capture in a statistic and the better we’re able to isolate talent from luck and circumstance, the more likely we are to make good hockey judgements going forward.
