It’s finally here, the most important day of the year in Canada: World Junior Hockey Day! While the tournament may not have the same appeal in other parts of the world (or any appeal, in some cases), for Canadians the World Junior tournament is serious business, which makes the fact that Canada hasn’t won since 2009 a bit of a sore spot. Canadians always enter the tournament with an often warranted air of confidence, but predicting what will happen at the World Juniors is tough given how little information we have about the teams and players. While most analysts will be aware of the top-level talent on each squad, it’s difficult to know how a third-liner toiling away in the German second division compares to a player in the Swiss Jr. A league. We can say with a reasonable degree of confidence that the Canadians will be better than the Swiss, but how much better is a tough question to answer. To take a more rigorous approach, we have to dig into the data and adjust for varying league strengths, because that German second division player may be just as good as the guy manning the point on Canada’s first power-play unit. With that in mind, let’s get started with the first-ever set of Puck++ World Junior Predictions.
All of the data that we’re going to use was downloaded from Elite Prospects, where they’ve compiled each team’s roster for the tournament along with high-level stats (G, A, GP) for each player, broken down by team, league, and season. While the current rosters aren’t 100% up to date (Germany, for example, still has 24 skaters listed on their team page as of December 24th), they should provide a reasonable basis to get us started with our predictions.
The building block for our model will be NHL Equivalencies (NHLe). NHLes are conversion factors, first described by Gabe Desjardins, that estimate how well a non-NHL player would fare if he were dropped into the NHL at any given point in time. While NHLes are obviously more of a broad estimate than an exact science, they’re extremely useful in tournaments like this one, where the players come from all over the world and we don’t have any data from head-to-head competition to use.
While Gabe described NHLes for most of the major minor and professional leagues, for this tournament we’ll have to come up with a few of our own to supplement his list. The basic rule we’ll use is that every second division league gets 44% of its parent division’s NHLe (mirroring the AHL at 44% of the NHL), while every junior league gets 29% of its country’s top national league (mirroring the CHL at 29% of the NHL). This puts a league like the Swedish Allsvenskan (second division) at 0.34, between the CHL and AHL, while the SuperElit (Swedish junior league) has an NHLe of 0.23.
There are two national leagues we’ll have to estimate from scratch, as neither the Slovak nor the Danish top division has a published NHLe or any data from which to imply one. For these leagues we’ll assume an NHLe of 75% of the lowest-ranked European national league (the Swiss National League at 0.44), putting them at 0.33. We’ll also ignore all international results and tournaments to keep the analysis simple.
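The league rules above reduce to a few multiplications. Here is a minimal Python sketch; the function and constant names are hypothetical, and the 0.78 value for the Swedish top league is not stated in the text, it is roughly the parent value implied by the 0.34 and 0.23 figures.

```python
# Ratios taken from the text: AHL = 44% of the NHL, CHL = 29% of the NHL.
SECOND_DIVISION_RATIO = 0.44
JUNIOR_RATIO = 0.29
# Fallback for national leagues with no published NHLe (Slovakia, Denmark):
# 75% of the lowest-rated European national league (Swiss, 0.44).
UNRATED_FACTOR = 0.75
LOWEST_EURO_NATIONAL = 0.44


def second_division_nhle(parent_nhle: float) -> float:
    """NHLe of a second division, scaled from its parent league."""
    return SECOND_DIVISION_RATIO * parent_nhle


def junior_league_nhle(national_nhle: float) -> float:
    """NHLe of a junior league, scaled from its country's top league."""
    return JUNIOR_RATIO * national_nhle


def unrated_national_nhle() -> float:
    """Guesstimated NHLe for a top league with no data to imply one."""
    return UNRATED_FACTOR * LOWEST_EURO_NATIONAL


if __name__ == "__main__":
    shl = 0.78  # assumed: implied by the Allsvenskan/SuperElit figures
    print(round(second_division_nhle(shl), 2))  # Allsvenskan: 0.34
    print(round(junior_league_nhle(shl), 2))    # SuperElit: 0.23
    print(round(unrated_national_nhle(), 2))    # Slovak/Danish leagues: 0.33
```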
With an NHLe available for each league represented in the tournament, we can calculate an NHLe/GP for each player using data from the 2013/14 and 2014/15 seasons, which gives us a bit more sample to work with. We’ll then sort the players on each team by NHLe/GP to give us an indirect estimate of their “lines”, and weight each player based on which line they end up on. First-line players (i.e. players ranked 1-5 on their team in NHLe/GP) receive a weight of 1, and each subsequent line’s weight is 80% of the line above it (2nd line = 0.8, 3rd line = 0.64, etc.). Weighting by NHLe/GP rank levels the playing field a bit and prevents teams with weaker bottom halves from being underestimated in our model.
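Here is one way the per-player NHLe/GP and the line weighting could be computed. It is a sketch under assumptions: the text doesn’t say exactly how the two seasons are combined, so here each stint’s points (G + A) are multiplied by that league’s NHLe and the total is divided by total games played. The `Stint` type and the league values in the example are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Stint:
    """One player's stats with one team/league in one season."""
    league: str
    gp: int
    goals: int
    assists: int


def nhle_per_gp(stints: list[Stint], nhle: dict[str, float]) -> float:
    """NHL-equivalent points per game, pooled over all stints (assumed method)."""
    total_gp = sum(s.gp for s in stints)
    if total_gp == 0:
        return 0.0
    nhl_points = sum((s.goals + s.assists) * nhle[s.league] for s in stints)
    return nhl_points / total_gp


def weighted_team_nhle(player_values: list[float], line_size: int = 5,
                       decay: float = 0.8) -> float:
    """Weighted average NHLe/GP: sort descending, group into 'lines' of
    line_size players, and weight line n (0-based) by decay ** n."""
    ranked = sorted(player_values, reverse=True)
    weights = [decay ** (i // line_size) for i in range(len(ranked))]
    return sum(w * v for w, v in zip(weights, ranked)) / sum(weights)


if __name__ == "__main__":
    leagues = {"SHL": 0.78, "Allsvenskan": 0.34}  # SHL value is hypothetical
    player = [Stint("SHL", 20, 2, 3), Stint("Allsvenskan", 20, 5, 5)]
    print(nhle_per_gp(player, leagues))
    # Ten skaters: five at 0.50 (line 1, weight 1), five at 0.25 (line 2, weight 0.8)
    print(weighted_team_nhle([0.50] * 5 + [0.25] * 5))
```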
We then need to come up with a way to aggregate each team’s offensive and defensive abilities to estimate how well they’ll fare in a head-to-head matchup against any other team in the tournament.
To calculate each team’s offensive rating, I simply took each team’s weighted average NHLe/GP (wNHLe/GP) described above and divided it by the tournament total wNHLe/GP. For example, Canada has a wNHLe/GP of 0.33, while the sum of wNHLe/GP across all teams in the tournament was 1.69. Therefore Canada’s offensive rating is:
Canada Offensive Rating = 0.33/1.69 = 0.20.
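The offensive rating is just each team’s share of the tournament’s total wNHLe/GP. A minimal sketch (the function name is hypothetical), checked against the Canada example by lumping the rest of the field into one entry:

```python
def offensive_ratings(wnhle: dict[str, float]) -> dict[str, float]:
    """Each team's wNHLe/GP divided by the tournament total."""
    total = sum(wnhle.values())
    return {team: value / total for team, value in wnhle.items()}


if __name__ == "__main__":
    # Canada's 0.33 out of a 1.69 tournament total (1.36 for everyone else)
    ratings = offensive_ratings({"CAN": 0.33, "rest of field": 1.36})
    print(round(ratings["CAN"], 2))  # 0.2
```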
The offensive ratings for each team are given in the table below:
On the defensive side, the aggregation is slightly more complicated, as we don’t have a direct estimate of defensive ability. Instead, we’ll use NHLe as a proxy for defensive skill (after all, if you’re scoring a lot, odds are the puck is in the other team’s end a lot and you’re probably allowing fewer goals), but adjust for the fact that defense is more of a team game and we expect less variance in defensive ability among teams.
We’ll start by calculating each team’s Raw Defensive Rating, which is the inverse of its wNHLe/GP divided by the sum of those inverses across all teams in the tournament:
Raw Defensive Rating = (1 / (wNHLe/GP)) / Sum (1 / (wNHLe/GP))
We’ll then regress each team’s Raw Defensive Rating halfway toward the mean to level the playing field, making each team’s Defensive Rating 50% its raw rating and 50% the average raw rating across all teams:
Defensive Rating = 0.5 * Raw Defensive Rating + 0.5 * Average Raw Defensive Rating
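This gives us significantly less spread in our defensive metrics and allows some of the less star-studded teams to be competitive in our model (as we’d expect). Note that because it is built from the inverse of wNHLe/GP, a higher Defensive Rating means a weaker team defensively: it plays the role of goals against in the win expectancy formula.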
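The two defensive steps can be sketched the same way (names hypothetical). Because the raw ratings sum to 1, their average is simply 1 divided by the number of teams, and the 50% regression halves every team’s distance from that average.

```python
def defensive_ratings(wnhle: dict[str, float],
                      regression: float = 0.5) -> dict[str, float]:
    """Inverse-wNHLe/GP share, regressed toward the average raw rating.

    Higher values mean weaker defence (they play the role of goals against).
    """
    inverse = {team: 1.0 / value for team, value in wnhle.items()}
    total_inverse = sum(inverse.values())
    raw = {team: inv / total_inverse for team, inv in inverse.items()}
    average_raw = sum(raw.values()) / len(raw)
    return {team: (1 - regression) * r + regression * average_raw
            for team, r in raw.items()}


if __name__ == "__main__":
    # Two hypothetical teams: raw ratings 0.25 and 0.75,
    # regressed to about 0.375 and 0.625
    print(defensive_ratings({"strong": 0.30, "weak": 0.10}))
```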
To be honest, we’re not going to include goaltending in our analysis. While I was hoping to compute an estimated Save Percentage modifier for each team, I ended up throwing it aside for 2 reasons: 1) it’s a very short tournament, and predicting goaltending over only 7 games is mostly a futile exercise; and 2) there’s no easy method to compare goalies across leagues. While for skaters we have NHLe numbers that are widely available, there’s been no rigorous analysis to compute the same comparisons for goaltenders, and I didn’t want to involve too much new methodology in what’s supposed to be a simplistic prediction system.
Expected Win Percentage
To calculate each team’s expected neutral game winning percentage (i.e. their probability of winning a game against a “true” 0.500 talent team), we’ll use their offensive and defensive ratings in a simplified Pythagorean win expectancy formula:
Expected Winning Percentage = (Off. Rating ^ 2) / (Off. Rating ^ 2 + Def. Rating ^ 2)
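In code, the simplified Pythagorean formula is a one-liner (the function name is hypothetical). The defensive rating sits in the denominator the way runs allowed do in Bill James’s baseball original, so equal ratings give exactly 0.500.

```python
def pythagorean_win_pct(off_rating: float, def_rating: float,
                        exponent: float = 2.0) -> float:
    """Expected win % against a true .500 team."""
    off_term = off_rating ** exponent
    return off_term / (off_term + def_rating ** exponent)


if __name__ == "__main__":
    print(pythagorean_win_pct(0.10, 0.10))            # 0.5
    print(round(pythagorean_win_pct(0.20, 0.10), 3))  # 0.8
```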
| Team | Expected Win Percentage vs. 0.500 Team |
We can use the Expected Win Percentage to calculate each team’s odds of beating any other team using the odds ratio:
Odds (Team A beats Team B) = Exp. Win % (A) / (1 - Exp. Win % (A)) * (1 - Exp. Win % (B)) / Exp. Win % (B)
Which we can then easily translate into percentage form:
Expected Win % (Team A vs Team B) = Odds (Team A beats Team B) / (1 + Odds (Team A beats Team B))
So if Russia (61.1% Expected Win % vs. a 0.500 team) plays Switzerland (32.8% Expected Win % vs. a 0.500 team), we’d expect Russia to win 76.3% of the time, while against the US we’d expect them to win only 21.4% of the time.
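The odds-ratio step (equivalent to Bill James’s log5 formula) is easy to check against the Russia vs. Switzerland example. The function name below is hypothetical.

```python
def head_to_head_win_pct(p_a: float, p_b: float) -> float:
    """P(team A beats team B), given each team's expected win % vs. a .500 team."""
    odds = (p_a / (1 - p_a)) * ((1 - p_b) / p_b)
    return odds / (1 + odds)


if __name__ == "__main__":
    russia, switzerland = 0.611, 0.328
    print(round(head_to_head_win_pct(russia, switzerland), 3))  # 0.763
```

Two sanity properties fall out of the formula: a team facing a 0.500 opponent wins at exactly its own Expected Win %, and the two sides of any matchup sum to 1.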
Simulating the Tournament
Once we have each team’s Expected Winning Percentage, we can estimate the odds of every team winning the whole tournament by simulating the tournament 1,000,000 times and counting how often each team finishes in each position.
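A stripped-down version of the simulation is sketched below. It is not the real tournament format (which has a group stage, a relegation round, and quarterfinals); it just plays a hypothetical four-team single-elimination bracket using the odds-ratio win probability and counts how often each team wins. Canada’s 88.1%, Russia’s 61.1% and Switzerland’s 32.8% come from the text; the "AVG" team is a hypothetical true 0.500 club, and the example uses 100,000 runs rather than 1,000,000 so it finishes quickly in pure Python.

```python
import random
from collections import Counter


def head_to_head_win_pct(p_a: float, p_b: float) -> float:
    """Odds-ratio win probability for A vs. B."""
    odds = (p_a / (1 - p_a)) * ((1 - p_b) / p_b)
    return odds / (1 + odds)


def play_bracket(bracket: list[str], strength: dict[str, float],
                 rng: random.Random) -> str:
    """Play one single-elimination bracket (length must be a power of 2)."""
    alive = list(bracket)
    while len(alive) > 1:
        next_round = []
        for a, b in zip(alive[0::2], alive[1::2]):
            p_a = head_to_head_win_pct(strength[a], strength[b])
            next_round.append(a if rng.random() < p_a else b)
        alive = next_round
    return alive[0]


def title_odds(bracket: list[str], strength: dict[str, float],
               n_sims: int = 1_000_000, seed: int = 0) -> dict[str, float]:
    """Share of simulated tournaments won by each team."""
    rng = random.Random(seed)
    champions = Counter(play_bracket(bracket, strength, rng)
                        for _ in range(n_sims))
    return {team: champions[team] / n_sims for team in bracket}


if __name__ == "__main__":
    strength = {"CAN": 0.881, "SUI": 0.328, "RUS": 0.611, "AVG": 0.500}
    odds = title_odds(["CAN", "SUI", "RUS", "AVG"], strength, n_sims=100_000)
    for team, share in sorted(odds.items(), key=lambda kv: -kv[1]):
        print(f"{team}: {share:.1%}")
```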
Looking at our predictions, it appears to be the year of the North Americans, with Canada or the US winning the tournament roughly 92% of the time according to our model. This shouldn’t be surprising, as both teams feature a solid mix of first-round picks from previous years and players expected to go high in this year’s draft. The North American squads also feature 12 of the top 15 players in the tournament by NHLe/GP, with projected first and second overall picks Connor McDavid and Jack Eichel leading the way for their respective squads. Canada and the US are so heavily favoured in our model that we’d expect a team other than Canada or the US to appear in the final only 37% of the time.
Further down the list, the Russians appear to be the only other squad with a non-negligible shot at gold, coming in at just under 5% to win the whole tournament. The more likely result for the Russians, however, is the bronze medal, which the model has them landing nearly 1 in 3 times. Last year’s champion Finland and runner-up Sweden have less rosy outlooks this year, with the model giving the Finns only a 17% chance of medalling and the Swedes a slightly better 26%.
The two clubs most at risk of relegation, Denmark and Slovakia, may actually have their odds slightly understated, since we had to guesstimate their league NHLes without any data. The Danes in particular are an interesting squad, with Nikolaj Ehlers and Oliver Bjorkstrand both in the top 10 in the tournament by NHLe/GP. If we move the Danish league up to the same level as the Swiss top division, the Danes actually appear to be the 5th best team at the tournament. While they’re still unlikely to medal, they could be a sleeper team to watch if they get some hot goaltending.
Here are each team’s odds for today’s games. Warning: If you’re a Slovak fan, it ain’t pretty:
| Visitor | Home | Visitor Win % | Home Win % |
Updating Our Predictions
As we move through the tournament, we’ll be able to combine our initial predictions with the results we observe to better estimate each team’s talent. One way to do this is to convert our initial Expected Winning Percentages into Elo ratings and update those ratings as games are played. For example, if Canada were to lose to Slovakia in their first game, our estimate of their single-game Expected Winning Percentage would drop from 88.1% to 81.3%, which would make them underdogs to the US to win the tournament. If they win, however, their rating remains effectively unchanged, reflecting the fact that Canada are huge favourites against the Slovaks.
I’ll be keeping these predictions updated as the tournament continues. I’ve set the K-factor in the Elo ratings (the parameter that controls how quickly the ratings change) fairly high, since it’s such a small tournament and we need the model to adjust quickly as we observe results. I’m also planning to go back and look at past tournaments to see how well the model would have performed; there may be biases in the data or methodology that we can identify by going over previous years’ results. In particular, I’m worried that the methodology may be overestimating the odds of the North American teams. It may be that we really do have two dominant teams this year, but there’s always the possibility that players in Europe are getting less credit than they deserve. The other thing to keep in mind is that this is only a 7-game tournament and anything can happen, especially when you factor in the inherent randomness of goaltending. While Canada may be the statistical favourite, there’s no way to predict when a goalie will shoot the puck into his own net off of his defenceman (obviously, a totally wild hypothetical example). Our stats may help us make better guesses, but they still won’t make it any easier for Canadians to watch their team battle in a one-goal game for their first gold medal in 6 years.
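Converting to Elo fits neatly here: Elo’s expected score, 1 / (1 + 10^((R_B - R_A) / 400)), is exactly the odds-ratio formula once each team’s rating is set to a baseline plus 400 * log10 of its odds against a 0.500 team. The sketch below assumes a 1500 baseline, a hypothetical K of 100 (the text says only that K was set high), and a hypothetical 25% Expected Win % for Slovakia; it is not tuned to reproduce the 88.1% to 81.3% figure.

```python
import math

BASE = 1500.0  # assumed rating of a true .500 team


def pct_to_elo(p: float) -> float:
    """Rating whose expected score against a BASE-rated team is p."""
    return BASE + 400.0 * math.log10(p / (1 - p))


def expected_score(r_a: float, r_b: float) -> float:
    """Standard Elo expectation for A against B."""
    return 1.0 / (1.0 + 10 ** ((r_b - r_a) / 400.0))


def elo_update(r_a: float, r_b: float, a_won: bool,
               k: float) -> tuple[float, float]:
    """Move both ratings by k * (actual - expected); the update is zero-sum."""
    delta = k * ((1.0 if a_won else 0.0) - expected_score(r_a, r_b))
    return r_a + delta, r_b - delta


if __name__ == "__main__":
    k = 100.0  # hypothetical "pretty high" K-factor
    canada = pct_to_elo(0.881)
    slovakia = pct_to_elo(0.25)  # hypothetical Slovakia value
    for a_won in (True, False):
        new_canada, _ = elo_update(canada, slovakia, a_won, k)
        # Canada's updated Expected Win % vs. a .500 team
        print("win" if a_won else "loss",
              round(expected_score(new_canada, BASE), 3))
```

Because Canada is a heavy favourite, a win barely moves their rating while a loss costs them most of K, which is the asymmetry described above.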