Predicting Free Agent Salaries

With the NHL draft behind us, attention across the hockey world has turned to July 1st, when unrestricted free agents will be free to sign with the highest bidder (or, as they’d tell it, the team that gives them the best shot at winning a Cup), and fans of all franchises pray for a superstar to vault them into contender status (or at least that their GM doesn’t screw anything up). And while every deal signed will be scrutinized from a million different angles to determine whether or not a team paid fair money, most of this discussion will end up conflating the ideas of value (will Martin St. Louis still be putting up 20+ goals at age 42?) and price (should the Bruins, err Flames, really pay Dougie Hamilton $5.5MM per year?).

This, unfortunately, is a rather large mistake, because what teams pay for a player and what that player is actually worth to a team are two critical yet extremely different questions. Worth (or value) is what a player adds to a team on the ice (in goals or wins), and must be measured based on a player’s total contribution (see, for example, WAR on Ice’s Wins Above Replacement, Hockey Reference’s Point Shares, or Hockey Prospectus’ Goals Versus Threshold). Price, on the other hand, is simply what a team is willing to pay for a player, in contract dollars or cap hit. While ideally these numbers would align, the reality is that teams often value observed historical results more than they should. Teams tend to pay for basic counting stats while ignoring other potentially useful indicators of future success (such as general shot generation or prevention) or contextual factors (such as quality of teammates).

This market inefficiency presents an opportunity for teams, as GMs who can identify which players are over- or underpaid relative to their actual contribution should have a long-term advantage in a cap-restricted world. The key to taking advantage of these opportunities, however, is being able to predict what your opponents are going to do. If a general manager knows what the rest of the league will pay a player, this information can be used to help identify potential targets in free agency before the negotiating period begins. Given that marquee free agents can often sign within hours of hitting the open market, this knowledge can help teams avoid chasing after players they know will be out of their budget. On the other hand, it can also help GMs know when they can play hardball with their own players and refuse to match demands that are above what the market is likely to pay. All of which is to say that there’s a lot of value in being able to guess what the other 29 clubs in the league will be willing to pay a free agent, which brings us to the main question we’ll look at in this article: can we build a model to predict how much a free agent will end up signing for, based on his historical stats?

To ensure that our predictions hold up over time, there are a few key things we need to take into account before building our model. First, we need to recognize that player salaries generally increase over time, so trying to guess a player’s raw cap hit is likely to provide us with a less precise estimate than we’d like. Instead, we’ll try to predict each player’s salary as a percentage of the cap in the year that they signed the deal. This should allow for proper comparisons across free agency periods, and will hopefully ensure our model remains valid when predicting in future years.
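As a rough illustration, the conversion to cap percentage can be sketched in a few lines of Python. The function names are made up for this example, and the $71.4MM ceiling is an assumed input; in practice you would use the ceiling for the season in which each contract starts.

```python
def cap_share(cap_hit, cap_ceiling):
    """Return a cap hit as a fraction of that season's salary cap."""
    if cap_ceiling <= 0:
        raise ValueError("cap_ceiling must be positive")
    return cap_hit / cap_ceiling


def cap_hit_from_share(share, cap_ceiling):
    """Convert a cap fraction back into dollars for a given season."""
    return share * cap_ceiling


# Assumed example: a $5.5MM cap hit against a $71.4MM ceiling.
share = cap_share(5_500_000, 71_400_000)
print(f"{share:.2%} of the cap")
```

Because the target is a fraction of the cap, a prediction can be turned back into dollars for any future season by multiplying by that season’s ceiling.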

Second, we need to look at forwards and defencemen separately: a goal from a forward is presumably worth less on the market than a goal from a defenceman, so modelling them together doesn’t make intuitive sense. Further, we’re not including goalies, because goalies are voodoo (but really, it’s a different dataset with a lot fewer data points to base a model on).

Third, rather than looking at rate stats, which we’d use if we were trying to calculate a player’s value, we’ll instead use aggregate totals over the three years before a contract kicked in. This is beneficial for two reasons. First, it implicitly takes time-on-ice into account, so the “coach’s view” gets built into the model alongside performance. Second, it captures games missed to injuries and benchings, both factors that would lead teams to reduce their offers to a player.
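A minimal sketch of the aggregation step, assuming each season is stored as a dict of counting stats (the key names and both players are invented for illustration):

```python
def three_year_totals(seasons):
    """Sum counting stats over a player's three most recent seasons.

    `seasons` is a list of dicts ordered oldest to newest; the keys are
    hypothetical names chosen for this example.
    """
    keys = ("games", "goals_5v5", "assists_5v5", "pp_points")
    recent = seasons[-3:]
    return {k: sum(s[k] for s in recent) for k in keys}


# Two made-up players with similar per-game scoring; B missed half a season.
player_a = [
    {"games": 82, "goals_5v5": 20, "assists_5v5": 25, "pp_points": 15},
    {"games": 82, "goals_5v5": 20, "assists_5v5": 25, "pp_points": 15},
    {"games": 82, "goals_5v5": 20, "assists_5v5": 25, "pp_points": 15},
]
player_b = [
    {"games": 82, "goals_5v5": 20, "assists_5v5": 25, "pp_points": 15},
    {"games": 41, "goals_5v5": 10, "assists_5v5": 12, "pp_points": 8},
    {"games": 82, "goals_5v5": 20, "assists_5v5": 25, "pp_points": 15},
]

totals_a = three_year_totals(player_a)
totals_b = three_year_totals(player_b)
print(totals_a)
print(totals_b)
```

On a per-game basis the two players look alike, but the totals for player B come out lower, which is exactly how missed time ends up reflected in the model inputs.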

Lastly, and perhaps most importantly, we’ll focus on contracts that include at least one of a player’s UFA-eligible seasons (in other words, contracts that cover seasons where the player is 27 or older, for simplicity’s sake). This should give us a better sense of a player’s free market cost, and will allow us to ignore the complexities that come into play when dealing with restricted free agents.
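The age-27 rule of thumb can be expressed as a simple filter. This is only a sketch: the field names and sample contracts are hypothetical, and it assumes a player’s age goes up by one for each contract year.

```python
UFA_AGE = 27  # the simplification described above


def covers_ufa_season(start_age, length_years, ufa_age=UFA_AGE):
    """True if any season of the contract is played at `ufa_age` or older."""
    final_season_age = start_age + length_years - 1
    return final_season_age >= ufa_age


# Made-up contracts for illustration.
contracts = [
    {"player": "Player A", "start_age": 24, "length": 2},  # ages 24-25: excluded
    {"player": "Player B", "start_age": 25, "length": 4},  # ages 25-28: included
    {"player": "Player C", "start_age": 31, "length": 1},  # age 31: included
]

ufa_sample = [
    c for c in contracts if covers_ufa_season(c["start_age"], c["length"])
]
print([c["player"] for c in ufa_sample])
```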

So which variables should we include when we build our model? We ideally want to use indicators that reflect what NHL General Managers look at when they make their offers, so if our hypothesis that they focus primarily on counting stats is true (which we can check by looking at our model fit), we should include mostly basic, non-advanced/enhanced/fancy stats (for lack of a better term). After a little bit of playing around, the variables that end up in the final model are:

  • Games Played
  • 5v5 Goals
  • 5v5 Assists
  • Powerplay Points
  • Shorthanded TOI %
  • Age (Contract Start Year)

To calculate the coefficients for each variable, we’ll use a straightforward linear regression of each player’s cap hit as a % of the cap in the contract start year (all data from WAR On Ice) on his totals over the previous three seasons (with adjustments made for the lockout-shortened 2012-2013 year). While the model variables are the same for both forwards and defencemen, the premium placed on each variable differs by position.

Variable             Coefficient (Forward)   Coefficient (Defence)
Intercept                   6.567%                  4.617%
Games Played               -0.011%                 -0.003%
5v5 Goals                   0.076%                  0.078%
5v5 Assists                 0.043%                  0.078%
Powerplay Points            0.051%                  0.062%
Shorthanded TOI %           2.140%                  6.053%
Age                        -0.186%                 -0.174%
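To make the table concrete, here is a sketch of how the coefficients might be applied to a single player. The stat line is invented, the key names are hypothetical, and the code assumes that Shorthanded TOI % enters the model as a fraction between 0 and 1 (the article does not spell out the units).

```python
# Coefficients copied from the table above, in percentage points of the cap.
COEFFS = {
    "forward": {"intercept": 6.567, "games": -0.011, "goals_5v5": 0.076,
                "assists_5v5": 0.043, "pp_points": 0.051,
                "sh_toi_share": 2.140, "age": -0.186},
    "defence": {"intercept": 4.617, "games": -0.003, "goals_5v5": 0.078,
                "assists_5v5": 0.078, "pp_points": 0.062,
                "sh_toi_share": 6.053, "age": -0.174},
}


def predict_cap_pct(position, stats):
    """Predicted cap hit, in % of the cap, from three-year totals and age."""
    coeffs = COEFFS[position]
    return coeffs["intercept"] + sum(
        coeffs[name] * value for name, value in stats.items()
    )


# Hypothetical 30-year-old forward (three-year totals).
example = {"games": 240, "goals_5v5": 45, "assists_5v5": 60,
           "pp_points": 40, "sh_toi_share": 0.05, "age": 30}

forward_pct = predict_cap_pct("forward", example)
print(f"{forward_pct:.3f}% of the cap")
```

For this made-up stat line the forward model comes out at roughly 6.5% of the cap; multiplying that share by a season’s cap ceiling gives a dollar figure.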

There are a few things to note with the coefficients: first, the directions of most coefficients are intuitive – as goals, points, shorthanded ice-time, etc. increase, salary increases, while as age increases, salary decreases. Second, as games played increases, predicted salary decreases. While this may seem illogical at first, the correct way to consider this stat is in relation to all the other variables: if we hold every other variable constant, and increase our total games played, it means that on a per game basis, our player’s stats have decreased, and thus we should likely decrease the amount we’d be willing to pay him.
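The games-played effect is easier to see with numbers. The sketch below uses the forward coefficient from the table and a hypothetical player whose goal total stays fixed while his games played rises.

```python
GAMES_COEFF_FORWARD = -0.011  # percentage points of cap per game (from the table)


def games_effect(extra_games, coeff=GAMES_COEFF_FORWARD):
    """Change in predicted cap % from extra games, all other totals held fixed."""
    return coeff * extra_games


goals = 45  # hypothetical three-year 5v5 goal total, held constant
for games in (200, 240):
    # Same goal total over more games means a lower per-game rate.
    print(games, "games:", round(goals / games, 4), "goals per game")

delta = games_effect(240 - 200)
print(f"Change in predicted cap share: {delta:.2f} percentage points")
```

Forty extra games with no extra production lowers the forward prediction by 0.44 percentage points of the cap, which matches the per-game reading of the coefficient.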

With our model in hand, the next thing we need to look at is whether it’s actually a good predictor (in other words, is the fit any good?). After all, if all our predictions aren’t even in the ballpark of the actual numbers we can’t really act on the information generated by the model. As it turns out, our model is quite good at making a reasonable guess for most players. If we look at the predicted vs. actual cap hits for forwards in the graph below, we see that the fit of our model is actually fairly strong, with an R^2 of 0.76, meaning that the 6 variables in our model were able to explain roughly 76% of the variance in UFA cap hits.

[Figure: Predicted vs. Actual Cap Hit (Forwards)]
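For readers who want to check a fit themselves, R^2 can be computed directly from predicted and actual values. The numbers below are toy values, not the article’s data.

```python
def r_squared(actual, predicted):
    """Share of the variance in `actual` explained by `predicted`."""
    mean_actual = sum(actual) / len(actual)
    ss_res = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    ss_tot = sum((a - mean_actual) ** 2 for a in actual)
    return 1 - ss_res / ss_tot


# Toy cap percentages for four imaginary players.
actual = [2.0, 4.0, 6.0, 8.0]
predicted = [2.5, 3.5, 6.5, 7.5]

fit = r_squared(actual, predicted)
print(round(fit, 2))
```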

For defencemen we see the same thing, with about 74% of the variance in actual cap hit explained by our 6 variables. While both models obviously have room for improvement, the degree to which we’re able to predict a player’s cap hit with such a simple model is striking, especially given that we made no adjustments for the league minimum salary (you may have noticed that negative salaries were predicted for several players in both graphs).

[Figure: Predicted vs. Actual Cap Hit (Defencemen)]
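The model as published makes no league-minimum adjustment, but one simple way to handle negative predictions would be to floor them at the minimum salary’s share of the cap. This is a sketch of that idea rather than something the article’s model does, and the $575,000 minimum and $71.4MM ceiling are assumed inputs.

```python
def floor_at_minimum(predicted_pct, league_minimum, cap_ceiling):
    """Raise a predicted cap % to the league-minimum share if it falls below it."""
    minimum_pct = 100 * league_minimum / cap_ceiling
    return max(predicted_pct, minimum_pct)


# Assumed inputs for illustration only.
LEAGUE_MINIMUM = 575_000
CAP_CEILING = 71_400_000

floored = floor_at_minimum(-0.4, LEAGUE_MINIMUM, CAP_CEILING)
unchanged = floor_at_minimum(3.2, LEAGUE_MINIMUM, CAP_CEILING)
print(round(floored, 3), unchanged)
```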

Even more encouraging is the fact that our method also seems to hold up out-of-sample. If we run the regressions on a subset of data covering contracts signed between 2008 and 2011, then use our new model to predict contracts from 2011 to 2014, we’re still able to reach nearly the same level of accuracy in our predictions: the correlation for forwards between predicted and actual cap hit is nearly unchanged at 0.85 (the full-sample R^2 of 0.76 corresponds to a correlation of about 0.87), while for defencemen it drops slightly to 0.80 (from about 0.86). In other words, the variables driving General Managers’ valuations have remained fairly constant over time, and we should expect our model to make good predictions going forward.
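The out-of-sample check can be sketched as follows. To keep the example short it fits a one-variable line instead of the full six-variable regression, uses invented data, and assumes 2011 signings belong to the training set (the article does not say which side of the split they fall on).

```python
from math import sqrt


def fit_line(xs, ys):
    """Ordinary least squares for a single predictor: returns (slope, intercept)."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def pearson(a, b):
    """Pearson correlation between two equal-length sequences."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    return cov / sqrt(var_a * var_b)


# Invented rows: (signing year, three-year points, cap % actually signed for).
contracts = [
    (2008, 100, 4.0), (2009, 150, 6.1), (2010, 60, 2.3), (2011, 120, 4.9),
    (2012, 80, 3.0), (2013, 140, 5.8), (2014, 110, 4.2),
]

train = [c for c in contracts if c[0] <= 2011]
test = [c for c in contracts if c[0] > 2011]

# Fit on the early contracts only, then predict the later ones.
slope, intercept = fit_line([c[1] for c in train], [c[2] for c in train])
test_predictions = [intercept + slope * c[1] for c in test]
r_test = pearson(test_predictions, [c[2] for c in test])
print(round(r_test, 3))
```

The key point is that the later contracts play no part in estimating the coefficients, so the correlation on them is a fair test of how the model would have performed at the time.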

So who does our model think is going to get paid this offseason? The top 10 forwards and defencemen in predicted cap hit are given in the table below:

Forward              Expected Cap Hit     Defenceman             Expected Cap Hit
Martin St. Louis     $5,381,581.99        Cody Franson           $6,092,462.37
Mike Ribeiro         $4,774,016.06        Mike Green             $5,689,853.73
Jiri Tlusty          $4,049,089.71        Andrej Sekera          $5,308,534.90
Chris Stewart        $3,837,661.77        Francois Beauchemin    $4,315,901.62
Michael Frolik       $3,709,481.43        Paul Martin            $4,107,451.82
Drew Stafford        $3,746,127.25        Christian Ehrhoff      $4,021,434.41
Mike Fisher          $3,740,451.85        Kimmo Timonen          $3,169,007.37
Justin Williams      $3,743,950.24        Marek Zidlicky         $3,121,393.79
Brad Richards        $3,706,817.45        Chris Butler           $3,048,384.63
Matt Beleskey        $3,256,725.66        Johnny Oduya           $2,888,769.09

Ex-Maple Leaf Cody Franson looks to be the defenceman most likely to be cashing in on July 1, with his predicted cap hit of $6.1MM topping the charts for both forwards and blueliners. Martin St. Louis leads all forwards at roughly $5.4MM, coming in more than half a million above the #2 forward, former Predator Mike Ribeiro (whose reported demands for a deal north of $6MM/year aren’t really in line with historical precedent). A full list of all pending UFA forwards and defencemen is available here, including upper and lower estimates for each prediction to give a better sense of the range of reasonable salaries.

While our model is able to make reasonably good predictions, there are likely other factors that influence a player’s cap hit that we aren’t taking into account. Information on whether a player is re-signing with his current club or moving to a new team would likely help us make better predictions, as would information on whether the contract was signed when the player was a UFA or RFA, although both pieces of data are slightly more complicated to compute. As well, using a different age metric may provide additional information, as the relationship between age and contract value may not be linear.
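As a sketch of what those extra inputs might look like, the snippet below builds a feature row with a squared age term and a re-signing flag. Both are assumptions about how one could encode the ideas above (a quadratic is just one common way to allow a curved age effect); neither is part of the model presented here.

```python
def extended_features(age, re_signed):
    """Hypothetical extra model inputs: a curved age effect and a re-signing flag."""
    return {
        "age": age,
        "age_squared": age ** 2,  # lets the fitted age effect bend instead of staying linear
        "re_signed": 1 if re_signed else 0,  # 1 if the player stays with his current club
    }


row = extended_features(30, True)
print(row)
```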

Another interesting element to consider would be team or GM variables, which might allow us to identify clubs whose player valuation techniques differ from the prevailing market patterns. Again, this would be complicated to implement, but could potentially reveal clues as to the decision-making process for a given franchise or executive.

While each of these enhancements would be interesting to implement, ultimately, they’d be subtle refinements to a methodology that already fares pretty well in predicting player salaries. The goal here isn’t to provide a precise value every time, but to estimate a ballpark for what the market has traditionally paid a player, which could help management avoid overpaying or lowballing a potential acquisition. Dodging such errors is one of the key ways analytics can contribute to teams today: if clubs can avoid overpaying one free agent every few seasons, it should provide a huge long-term advantage in terms of added cap flexibility, and the resources available to retain players developed from within the organization. Similarly, teams that are able to identify talent that the market has undervalued should be able to add key pieces to their roster without sacrificing flexibility. In either case, knowing what the market will do before it actually happens could potentially allow clubs to refine their approach to free agency, and to end up with a roster that’s in good shape both with respect to the cap and on the ice.

Comments on “Predicting Free Agent Salaries”
  1. 94orbust says:

    Interesting article. Did you use some sort of model selection process to arrive at the explanatory variables included in the models?

    • Matt Cane says:

      To be honest, it was more trial and error. I had included some “advanced” stats (CF%, CF60, CF-Rel) but they ended up being insignificant. Given that the model had a pretty good fit to begin with I didn’t push it all that much.

  2. dogtenberg says:

    sick stuff

    im not sure if your dataset will allow this, but did you try adding the year they were signed in? this should account for inflation the market and improve your correlation

    • Matt Cane says:

      I kind of included this – I used the % of cap in the contract year to make sure that the inflation was taken into account. Looking at the early signings I think it might be useful to add in something about the quality of the FA class in a given year, but that might be for next year’s model.
