Score Adjusted Weighted Shots

Last week, Tom Tango, sabermetrician extraordinaire for the Chicago Cubs (and one of the original hockey analysts to, you know, actually make money doing this thing), posted an article on his site proposing a new metric to better weight the components of Corsi. Tango defined his new metric (which he proposed, with tongue planted firmly in cheek, be called Tango, or failing that, Weighted Shots) as a simple linear combination of goals and non-goal Corsi events:

wSH = Goals + 0.2*(Shots + Missed Shots + Blocked Shots)
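As a quick illustration, the formula translates directly into a one-line function. The function and parameter names are hypothetical, and "Shots" is assumed to mean shots on goal that did not score, since the metric combines goals with non-goal Corsi events and so should not count a goal twice.

```python
def weighted_shots(goals, saved_shots, missed_shots, blocked_shots):
    """Tango's wSH: each goal counts 1, each non-goal shot attempt counts 0.2.

    saved_shots is assumed to exclude goals (shots on goal that were stopped),
    so no attempt is counted twice.
    """
    return goals + 0.2 * (saved_shots + missed_shots + blocked_shots)


# Hypothetical game line: 3 goals, 27 saved shots, 12 misses, 14 blocks.
# 3 + 0.2 * (27 + 12 + 14) = 3 + 10.6 = 13.6 (up to floating-point rounding)
print(weighted_shots(3, 27, 12, 14))
```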

The weighting of 0.2 was informed by (although not strictly derived from) a regression between half-season Corsi components and half-season Goals For (i.e., calculating the weights that maximize the predictive value for future Goal Differential). Tango's goal was to preserve the predictive information that we see in Corsi while properly accounting for the fact that goals are really what the game is all about. And while no analyst has ever argued that Corsi alone is enough to evaluate a team or player, Tango's point was that we had the data to make intuitive improvements to Corsi in a relatively easy manner.
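Tango's exact regression isn't reproduced here, but the general idea (a split-half regression that estimates how much a non-goal shot attempt is worth relative to a goal when predicting future goals) can be sketched as a two-predictor least-squares fit. The function name, the use of per-team differentials, and the no-intercept model are all assumptions made for illustration; league-wide differentials sum to zero, which is why the intercept is dropped here.

```python
def fit_split_half_weights(first_goals, first_other, second_goals):
    """Least-squares fit of second_goals ~ a * first_goals + b * first_other (no intercept).

    All inputs are hypothetical per-team differentials (for minus against):
      first_goals:  goal differential in the first half of the season
      first_other:  non-goal Corsi differential (saved shots + misses + blocks), first half
      second_goals: goal differential in the second half
    Returns (a, b, b / a); b / a plays the role of the non-goal weight relative to a goal.
    """
    # Normal equations for two predictors without an intercept:
    # [sgg sgo] [a]   [sgy]
    # [sgo soo] [b] = [soy]
    sgg = sum(g * g for g in first_goals)
    soo = sum(o * o for o in first_other)
    sgo = sum(g * o for g, o in zip(first_goals, first_other))
    sgy = sum(g * y for g, y in zip(first_goals, second_goals))
    soy = sum(o * y for o, y in zip(first_other, second_goals))
    det = sgg * soo - sgo * sgo
    if det == 0:
        raise ValueError("predictors are collinear; cannot solve")
    # Cramer's rule for the 2x2 system above.
    a = (sgy * soo - sgo * soy) / det
    b = (sgg * soy - sgo * sgy) / det
    return a, b, b / a
```

With real data the fitted ratio would be noisy, which is one reason Tango treated the regression as a guide rather than deriving 0.2 from it strictly.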

One problem with how wSH is formulated, though, is that it lags behind the current state of hockey analytics. As Micah McCurdy has illustrated, Score Adjusted metrics vastly outperform standard possession metrics, since both the location of the game and the current score state have significant impacts on how teams perform. Unless we take score and venue effects into account, even an improved metric like wSH is missing important information. Fortunately, if we follow Micah's original methodology, we can work out the appropriate adjustments to bring these factors into wSH.

Using data from 2008-2014, we can first calculate the probability of a given team recording an event based on the event type, score state and game location (home/away):

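| Score (team's goal differential) | Home Goal | Away Goal | Home Shot/Miss/Block | Away Shot/Miss/Block |
|---|---|---|---|---|
| -3 | 52.87% | 50.83% | 59.25% | 56.66% |
| -2 | 51.19% | 50.49% | 57.34% | 55.31% |
| -1 | 53.20% | 49.77% | 54.96% | 53.22% |
| 0 | 52.91% | 47.09% | 50.95% | 49.05% |
| 1 | 50.23% | 46.80% | 46.78% | 45.04% |
| 2 | 49.51% | 48.81% | 44.69% | 42.66% |
| 3 | 49.17% | 47.13% | 43.34% | 40.75% |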
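The probabilities above can be built by tallying, for each event type and score state, how often the home team is the one recording the event. A minimal sketch follows; the event-tuple layout, the function names, the use of the score before a goal is scored, and the clamping of leads larger than 3 goals are all assumptions, since the post doesn't specify them. It also relies on a property visible in the table itself: the Away columns mirror the Home columns. A home team trailing by 1 records 53.20% of the goals in that state, while a visiting team leading by 1 records 46.80%, and the two sum to 100%.

```python
from collections import defaultdict


def home_share_by_state(events, cap=3):
    """Fraction of events recorded by the home team, keyed by (event_type, home_diff).

    events: iterable of (event_type, home_diff, by_home) tuples, where home_diff is
    home goals minus away goals when the event happens (for a goal, the score
    before it went in) and by_home is True when the home team recorded the event.
    Leads beyond +/-cap are clamped to +/-cap (assumed; the table stops at 3).
    """
    tallies = defaultdict(lambda: [0, 0])  # key -> [home events, all events]
    for event_type, home_diff, by_home in events:
        state = max(-cap, min(cap, home_diff))
        tally = tallies[(event_type, state)]
        tally[0] += 1 if by_home else 0
        tally[1] += 1
    return {key: home / total for key, (home, total) in tallies.items()}


def away_share(home_shares, event_type, away_diff):
    """Away team's share when it leads by away_diff, i.e. the home team is at -away_diff."""
    return 1.0 - home_shares[(event_type, -away_diff)]
```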

Then we can combine these probabilities with Tango's wSH weights (1 for goals, 0.2 for shots/misses/blocks) to calculate weighted adjustment factors for a Score Adjusted Weighted Shots metric (SAwSH):

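| Score (team's goal differential) | Home Goal Weight | Away Goal Weight | Home Shot/Miss/Block Weight | Away Shot/Miss/Block Weight |
|---|---|---|---|---|
| -3 | 0.943 | 0.983 | 0.163 | 0.173 |
| -2 | 0.976 | 0.990 | 0.171 | 0.179 |
| -1 | 0.936 | 1.005 | 0.180 | 0.187 |
| 0 | 0.942 | 1.058 | 0.196 | 0.204 |
| 1 | 0.995 | 1.064 | 0.213 | 0.220 |
| 2 | 1.010 | 1.024 | 0.221 | 0.229 |
| 3 | 1.017 | 1.057 | 0.227 | 0.237 |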
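The post doesn't spell out how the probabilities and weights are combined, but every entry in the weights table is reproduced, to three decimals, by multiplying the base wSH weight by 2(1 − p), where p is the matching probability from the probabilities table (for example, a home goal while tied: 2 × (1 − 0.5291) × 1 ≈ 0.942). The sketch below assumes that rule; the names are hypothetical.

```python
# Tango's base wSH weights.
BASE_WEIGHT = {"goal": 1.0, "attempt": 0.2}


def adjusted_weight(p_team, event_kind):
    """Score/venue-adjusted SAwSH weight for one event.

    p_team: probability that the team recording the event records that event
    type in the current score state and venue.
    The factor 2 * (1 - p_team) is 1 in an even state (p = 0.5), below 1 when the
    team usually records more than its share, and above 1 when it records less.
    """
    return 2.0 * (1.0 - p_team) * BASE_WEIGHT[event_kind]


# Home team down 3: goal probability 52.87%, shot/miss/block probability 59.25%.
print(round(adjusted_weight(0.5287, "goal"), 3), round(adjusted_weight(0.5925, "attempt"), 3))
# 0.943 0.163, matching the first row of the weights table
```

The design effect is that events by teams that usually dominate a given state (trailing teams, home teams) are discounted, and events by teams that usually get outshot are boosted.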

As you can see, unlike unadjusted wSH, where a goal is always worth 1/0.2 = 5 shots/misses/blocks, the value of a goal relative to a shot isn't constant in our new method. It ranges from 5.78 shots/misses/blocks per goal (for a home team down 3 goals) all the way down to 4.46 (for a visiting team up by 3).
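The exchange rate in each state can be checked directly. Assuming each base weight is scaled by 2(1 − p), where p is the probability from the first table (a rule that reproduces the weights table to three decimals), one goal is worth [2(1 − p_goal) × 1] / [2(1 − p_attempt) × 0.2] = 5(1 − p_goal)/(1 − p_attempt) attempts. Computing it from the unrounded probabilities, rather than the three-decimal weights, recovers the 5.78 and 4.46 figures. The data layout and names below are hypothetical.

```python
# Probabilities from the first table, keyed by the team's own goal differential.
# Columns: home goal, away goal, home shot/miss/block, away shot/miss/block.
PROBS = {
    -3: (0.5287, 0.5083, 0.5925, 0.5666),
    -2: (0.5119, 0.5049, 0.5734, 0.5531),
    -1: (0.5320, 0.4977, 0.5496, 0.5322),
    0: (0.5291, 0.4709, 0.5095, 0.4905),
    1: (0.5023, 0.4680, 0.4678, 0.4504),
    2: (0.4951, 0.4881, 0.4469, 0.4266),
    3: (0.4917, 0.4713, 0.4334, 0.4075),
}


def attempts_per_goal():
    """Attempts one goal is worth in each (venue, score) state: 5(1 - pg) / (1 - pa)."""
    ratios = {}
    for score, (home_g, away_g, home_a, away_a) in PROBS.items():
        ratios[("home", score)] = (1 - home_g) / (0.2 * (1 - home_a))
        ratios[("away", score)] = (1 - away_g) / (0.2 * (1 - away_a))
    return ratios


ratios = attempts_per_goal()
high = max(ratios, key=ratios.get)
low = min(ratios, key=ratios.get)
print(high, round(ratios[high], 2))
print(low, round(ratios[low], 2))
```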

Now that we've defined how to calculate SAwSH, let's look at how well it performs compared to Score Adjusted Corsi. Whenever we evaluate a new stat, there are two things we need to look at to decide how much trust to put in it: 1) the repeatability of the metric, that is, how well our measurement over one period predicts the same measurement over another period; and 2) how well the metric predicts our result of interest (winning hockey games). A metric that's not repeatable doesn't do us much good when we're evaluating a team or player, since we don't know whether the results we observe are due to luck or talent. At the same time, a measure that's highly repeatable but doesn't relate to winning is one we should just ignore.

The best way for us to test for repeatability at the team level is to look at the correlation between our results in odd-numbered games and in even-numbered games. Since a game's number has nothing to do with its result, a high correlation is a good sign that what we're observing is talent rather than luck.
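A minimal sketch of the odd/even split, assuming each team's games are numbered in its own schedule order and that the metric is expressed as a share (for / (for + against)); the data layout and function names are hypothetical.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


def share(games):
    """Metric share over a set of games: total for / (total for + total against)."""
    total_for = sum(metric_for for _, metric_for, _ in games)
    total_against = sum(metric_against for _, _, metric_against in games)
    return total_for / (total_for + total_against)


def split_half_repeatability(team_games):
    """team_games: {team: [(game_number, metric_for, metric_against), ...]}.

    Correlates each team's share in its odd-numbered games with its share in
    its even-numbered games, across all teams.
    """
    teams = sorted(team_games)
    odd = [share([g for g in team_games[t] if g[0] % 2 == 1]) for t in teams]
    even = [share([g for g in team_games[t] if g[0] % 2 == 0]) for t in teams]
    return pearson(odd, even)
```

Pooling totals before dividing (rather than averaging per-game shares) keeps high-event games from being under-weighted; that choice is an assumption, not something the post specifies.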

| Metric | Correlation (Odd vs. Even Games) |
|---|---|
| Score Adjusted Corsi | 0.873 |
| Score Adjusted Weighted Shots | 0.841 |

While Score Adjusted Corsi shows slightly more repeatability, the difference is more or less negligible. Both metrics are repeatable enough that we don't have to worry that they're too heavily influenced by luck. This is particularly important for SAwSH, as it dispels one of the biggest worries many people had about it: that the inherent variability in shooting and save percentages would mean we'd need a much larger sample before we could trust the results.

Moving on to predictability, we can run a similar test, but instead of correlating the same metric between even- and odd-numbered games, we'll look at how well our Score Adjusted numbers in even games predict a team's Goals For Percentage (GF%) in odd games (and vice versa).
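The cross-half version swaps the target: a team's metric share from one half of its games is correlated with its GF% in the other half. A sketch, assuming the per-half totals (metric for/against, goals for/against) have already been summed for each team; the layout and names are hypothetical, and any metric share (SAC% or SAwSH%) can be plugged in.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


def pct(for_total, against_total):
    """Share of the total that belongs to the team."""
    return for_total / (for_total + against_total)


def cross_half_prediction(teams):
    """teams: {team: {"even": (metric_for, metric_against, gf, ga),
                      "odd":  (metric_for, metric_against, gf, ga)}}

    Returns (r_even_to_odd, r_odd_to_even): the correlation between a team's
    metric share in one half of its games and its GF% in the other half.
    """
    names = sorted(teams)
    even_metric = [pct(*teams[t]["even"][:2]) for t in names]
    odd_metric = [pct(*teams[t]["odd"][:2]) for t in names]
    even_gf = [pct(*teams[t]["even"][2:]) for t in names]
    odd_gf = [pct(*teams[t]["odd"][2:]) for t in names]
    return pearson(even_metric, odd_gf), pearson(odd_metric, even_gf)
```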

| Metric | Correlation (Even -> Odd GF%) | Correlation (Odd -> Even GF%) |
|---|---|---|
| Score Adjusted Corsi | 0.475 | 0.421 |
| Score Adjusted Weighted Shots | 0.495 | 0.446 |

In both directions, SAwSH does a better job of predicting out-of-sample Goals For %. This makes sense, of course, since SAwSH includes goal scoring/goaltending data that SAC doesn't. The size of the gap between SAC and SAwSH is also worth noting: the correlation rises by roughly 4 to 6% (from 0.475 to 0.495, and from 0.421 to 0.446), which means SAwSH explains about 2 percentage points more of the variance in out-of-sample GF% (an r² of about 0.25 versus 0.23, and 0.20 versus 0.18). That illustrates that shooting percentage and save percentage do matter at the team level. While they're obviously not as important as possession (after all, we still do fairly well using only SAC), there's clearly a benefit to including them in our analyses.
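The variance comparison is simple arithmetic on the correlations in the predictability table: squaring r gives the share of variance explained (r²). The helper below is a hypothetical illustration that only does that arithmetic.

```python
def compare_fits(r_old, r_new):
    """Return (r2_old, r2_new, gain in percentage points of r^2, % increase in r)."""
    r2_old, r2_new = r_old ** 2, r_new ** 2
    return r2_old, r2_new, 100 * (r2_new - r2_old), 100 * (r_new / r_old - 1)


# Correlations with out-of-sample GF% (SAC first, SAwSH second).
for label, sac, sawsh in (("Even -> Odd", 0.475, 0.495), ("Odd -> Even", 0.421, 0.446)):
    r2_sac, r2_sawsh, gain_pts, r_pct = compare_fits(sac, sawsh)
    print(f"{label}: r^2 {r2_sac:.3f} -> {r2_sawsh:.3f} "
          f"(+{gain_pts:.1f} points of variance; r up {r_pct:.1f}%)")
# Even -> Odd: r^2 0.226 -> 0.245 (+1.9 points of variance; r up 4.2%)
# Odd -> Even: r^2 0.177 -> 0.199 (+2.2 points of variance; r up 5.9%)
```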

While the computational cost of SAwSH may be slightly higher than that of standard CF, the benefits go beyond an increase in predictive power: wSH makes much more sense intuitively, and it is a direct counterpoint to the argument that analytics are too focused on possession. SAwSH makes a much better argument for analytics while giving up very little in the way of repeatability. There are obviously further areas to investigate (the weightings in Tango's original regression equations are worth a deeper look, as there's likely further value to be extracted there), but SAwSH is clearly a step forward for the analytics movement. And although some may argue that the power of SAwSH is a repudiation of Corsi as a metric, I instead see it as a validation of possession-based analyses: the value of the sample that Corsi offers is obvious, and SAwSH is just a small tweak to better reflect the inherent shooting and goaltending differences that Corsi can miss in some cases.

Addendum

On his site, Tango asked for correlations for Score Adjusted Goals (SAG), so I'm happy to oblige:

| Comparison | Correlation |
|---|---|
| Odd(SAG%) -> Even(SAG%) | 0.365 |
| Odd(SAG%) -> Even(GF%) | 0.366 |
| Even(SAG%) -> Odd(GF%) | 0.354 |

Obviously, SAwSH is quite a bit better in both repeatability and predictability, but what's more interesting is how little additional value we get from adjusting GF% for score state and venue. The correlation between raw GF% in odd and even games is 0.353, barely below the 0.365 we see for SAG%, which means we're getting almost no additional information from our adjustments.

Posted in Statistics
4 comments on “Score Adjusted Weighted Shots”
  1. […] Fenwick and Score-Adjusted Goals instead, creating Score-Adjusted Weighted Shots. Matt Cane recently looked into this as well using Tango’s method so the difference here is excluding blocked shots and weighing non-goal […]

  2. […] (Oct. 10/15) to their respective 2014/15 season series numbers. Specifically, I will compare Weighted Shots (WghSh%; 1 point for goals, 0.2 points for shot attempts); Shot Attempts (SAT%; blocked, missed, and shots on goal), Scoring Chances (SC%; as defined by […]

  3. […] of games, to the previous season’s series against a specific team. For metrics, I compare Weighted Shots (WghtSh%; 1 point for goals, 0.2 points for shot attempts); Shot Attempts (SAT%; blocked, missed, and shots on goal), Scoring Chances (SC%; defined by […]

  4. […] is by using the split-half regression technique most recently used in hockey by Matt Cane in his Weighted Shots model. There are other ways to perform this type of analysis but, as I mentioned above, they resemble […]
