This post originally appeared on Hockey Graphs.
It’s the Olympics again, which means it’s time for everyone’s favorite activity: watching Canada underperform at ice hockey! And while Hilary Knight breaking the hearts of Canadians is fun for everybody, the only thing that’s more fun is watching Hilary Knight break the hearts of Canadians while you have a statistical model that predicts each team’s likelihood of winning a medal! That’s right, Hockey Graphs is taking on the challenge of predicting the Women’s Olympic Hockey Tournament results.
While the US enters as the betting favorite, having won 4 consecutive World Championships and the last three 4 Nations Cups, their archrivals from north of the border haven’t lost a game at the Olympics since the 1998 finals. Although pretty much everyone expects the big two to meet again in the finals, the US’ failure to make the finals at the 2006 Winter Games offers proof that nothing is guaranteed in a short tournament.
Ideally, we’d use every Olympian’s past statistical performance to estimate each team’s overall strength; however, because of the scarcity of historical women’s hockey data available online, the model that we’ll build won’t be as detailed as we’d like it to be. We won’t use any individual-level data, which means the model has no idea that Alex Carpenter and Megan Bozek won’t be joining the US squad in PyeongChang this year.
Instead, we’ll build our model using past international results from the Women’s World Championships since 2009, Olympic results since 2002, and the International Friendlies results tracked by scoreboard.com since 2012. We’ll take the regulation outcomes of each game and use those to predict the goals for and against each team based on their past offensive and defensive performance, who they’ve played and whether they’re the home (or host) team.
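To make the structure of that prediction concrete, here is a minimal sketch of how an expected-goals figure could be assembled from team factors. It assumes a log-linear form (goals rate = exp of a sum of effects), which the post does not state; the function name, parameter names, and all numbers are hypothetical and chosen only for illustration.

```python
import math


def expected_goals(intercept, offense, opp_defense, home_advantage=0.0, is_home=False):
    """Expected regulation goals for one team against one opponent.

    ASSUMPTION: a log-linear model, where effects add on the log scale.
    offense      -- scoring team's offensive factor (higher = scores more)
    opp_defense  -- opponent's defensive factor (higher = concedes less)
    """
    log_rate = intercept + offense - opp_defense
    if is_home:
        log_rate += home_advantage
    return math.exp(log_rate)


# Hypothetical factors, not values from the actual model.
neutral = expected_goals(intercept=0.8, offense=0.5, opp_defense=0.2)
hosting = expected_goals(intercept=0.8, offense=0.5, opp_defense=0.2,
                         home_advantage=0.15, is_home=True)
print(round(neutral, 2), round(hosting, 2))
```

Each game contributes two such predictions, one per team, so a country’s offensive factor is estimated from the goals it scores and its defensive factor from the goals it allows.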
We’ll use a mixed effects model, which will shrink the team offensive and defensive factors towards the mean and also help deal with the smaller sample sizes we have for some countries. In order to account for each nation’s changing strength and roster over time, we’ll weight recent results more heavily and reduce the weight for friendly competitions (which may not have featured the best possible rosters each country could’ve put together). We’ll decrease the weight of past games by 70% for each year since they occurred and weight friendlies as being 50% as important as World Championship/Olympic games.
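As a rough illustration of that weighting scheme, the sketch below computes a single game’s weight. It reads “decrease the weight by 70% for each year” as keeping 30% of the weight per elapsed year, and it lets the age of a game be fractional; both are assumptions about the wording, and the function and parameter names are hypothetical.

```python
def game_weight(years_ago, is_friendly, yearly_retention=0.3, friendly_factor=0.5):
    """Weight given to one past game when fitting the model.

    ASSUMPTION: a 70% decrease per year means the weight is multiplied
    by 0.3 for every year since the game was played.
    """
    weight = yearly_retention ** years_ago
    if is_friendly:
        # Friendlies count half as much as World Championship/Olympic games.
        weight *= friendly_factor
    return weight


# A tournament game from this year, a friendly from last year,
# and a tournament game from two years ago.
for years_ago, is_friendly in [(0, False), (1, True), (2, False)]:
    print(years_ago, is_friendly, game_weight(years_ago, is_friendly))
```

Under this reading the weights fall off quickly, so results from more than a few years back contribute very little to the current team estimates.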
So who does our model like this year?
[Embedded tweet: Micah Blake McCurdy (@IneffectiveMath), February 9, 2018]
The Americans enter the tournament as favorites, with a 52% chance of taking home the gold, followed closely by Team Canada at 45%. The US are actually favored to beat Canada 56% of the time in a head-to-head matchup on neutral ice, but their odds are taken down slightly by likely having to play Finland (who our model thinks are the third-strongest club) in the semi-finals.
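For readers curious how predicted goals can turn into a head-to-head probability, here is one possible sketch. It assumes each team’s regulation goals follow independent Poisson distributions, which the post does not confirm, and the scoring rates are hypothetical, so it is not meant to reproduce the 56% figure.

```python
import math


def poisson_pmf(k, rate):
    """Probability of exactly k goals when goals are Poisson with the given rate."""
    return math.exp(-rate) * rate ** k / math.factorial(k)


def regulation_outcomes(rate_a, rate_b, max_goals=25):
    """Return (P(A leads), P(tied), P(B leads)) at the end of regulation.

    ASSUMPTION: the two teams' goal counts are independent Poisson variables.
    Scores above max_goals are ignored; their probability is negligible
    for realistic hockey scoring rates.
    """
    win_a = tie = win_b = 0.0
    for a in range(max_goals + 1):
        p_a = poisson_pmf(a, rate_a)
        for b in range(max_goals + 1):
            p = p_a * poisson_pmf(b, rate_b)
            if a > b:
                win_a += p
            elif a == b:
                tie += p
            else:
                win_b += p
    return win_a, tie, win_b


# Hypothetical expected goals for two closely matched teams.
win_a, tie, win_b = regulation_outcomes(2.6, 2.3)
# ASSUMPTION: overtime and shootouts are treated as a coin flip.
print(round(win_a + 0.5 * tie, 3))
```

Chaining probabilities like these through the group stage and the knockout bracket, for example by simulating the tournament many times, is one way to arrive at medal odds for each team.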
While the model has clear favorites in Division A, Division B is more of a toss-up, with each team having at least a 43% chance of advancing to the quarterfinals. The wildcard may be Korea: with little data on how the unified team will play, the model assumes that the two countries’ individual past results are a good proxy for how the combined team will perform. This may understate their actual ability, however, as the combined team will likely be stronger than either individual squad.
One neat part of our model is that the team strengths will continue to update as the games are played. This means that as the model gets more details about who is playing well this tournament, our estimates for the win probabilities in the games left to be played will also update.
Want to follow along with the fun? Hockey-Graphs alumnus and “Scotch from a shot glass” enthusiast Micah Blake McCurdy will be tweeting out updated predictions before and after every game.
The Women’s Olympic Hockey Tournament should really just be called the Olympic Hockey Tournament, since we all know the Men’s tournament is going to be a snooze.
These weights were determined by training the model with a range of candidate weights on all games before the 2014 Olympics, and choosing the weights that produced the lowest log loss on games after the 2014 Olympics.
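The weight search described in this note can be sketched as follows. This is a simplified illustration: it scores binary outcomes, whereas the actual evaluation may have treated regulation results differently, and `holdout_score` is a hypothetical stand-in for “fit on games before the 2014 Olympics, then compute log loss on games after them.”

```python
import math


def log_loss(predicted_probs, outcomes, eps=1e-15):
    """Mean negative log-likelihood for binary outcomes (1 = event happened)."""
    total = 0.0
    for p, y in zip(predicted_probs, outcomes):
        p = min(max(p, eps), 1 - eps)  # avoid log(0)
        total -= y * math.log(p) + (1 - y) * math.log(1 - p)
    return total / len(outcomes)


def best_weights(candidates, holdout_score):
    """Return the candidate weight pair with the lowest holdout score."""
    return min(candidates, key=holdout_score)


# Candidate (yearly retention, friendly factor) pairs to try.
grid = [(r, f) for r in (0.1, 0.3, 0.5, 0.7) for f in (0.25, 0.5, 1.0)]

# Confident, correct predictions score lower (better) than hedged ones.
print(log_loss([0.9, 0.2], [1, 0]) < log_loss([0.6, 0.4], [1, 0]))
```

In use, `best_weights(grid, holdout_score)` would refit the model once per candidate pair and keep the pair whose predictions for the later games had the lowest log loss.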