Friday, 25 November 2016

Final Table Predictions for the EPL

In a previous post I looked at how the EPL league table evolves over a season, showing that we already have a decent idea of how the final league table will look after just a third of the season.

I’ve now taken that analysis a step further and built a simple model for predicting the total number of points each team will accumulate over the season (and therefore their final rankings). What follows is a short summary of how the model works; I've provided more technical detail at the end.

Season simulations

Each team starts with their current points total. I then work my way through the remaining fixture schedule (currently 260 matches), simulating the outcome of each game. Results are generated based on the Elo scores of each team – which I update after each simulated match – and the benefits of home advantage (scroll down to the last section for more details). At the end of the ‘season’, I tally up the final points totals for each team.

This process is repeated 10,000 times to evaluate the range of points that each team ends up on; I then make a final league table based on their averages. The probability of each team finishing the season as champions, in the top four or bottom three is calculated based on the frequency at which it occurs within the 10,000 runs.
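The loop described above can be sketched roughly as follows. This is a minimal illustration rather than the code behind the table: `simulate_seasons` and the plug-in `simulate_match` function are hypothetical names, ties on points are broken at random (goal difference is ignored), and the Elo updates between simulated matches are left out for brevity.

```python
import random
from collections import Counter


def simulate_seasons(points, fixtures, simulate_match, n_runs=10_000, seed=0):
    """Replay the remaining fixtures n_runs times and summarise the outcomes.

    points         -- dict of team -> current points total
    fixtures       -- list of (home, away) pairs still to be played
    simulate_match -- hypothetical callable (home, away, rng) -> (home_goals, away_goals)
    """
    rng = random.Random(seed)
    teams = list(points)
    total, title, top_four, relegated = Counter(), Counter(), Counter(), Counter()

    for _ in range(n_runs):
        table = dict(points)  # every run starts from the current points totals
        for home, away in fixtures:
            home_goals, away_goals = simulate_match(home, away, rng)
            if home_goals > away_goals:
                table[home] += 3
            elif home_goals < away_goals:
                table[away] += 3
            else:
                table[home] += 1
                table[away] += 1

        # Simplification: teams level on points are ordered at random.
        ranked = sorted(teams, key=lambda t: (table[t], rng.random()), reverse=True)
        title[ranked[0]] += 1
        for team in ranked[:4]:
            top_four[team] += 1
        for team in ranked[-3:]:
            relegated[team] += 1
        for team in teams:
            total[team] += table[team]

    # Probabilities are simply the frequency of each event across the runs.
    return {
        team: {
            "mean_points": total[team] / n_runs,
            "title": title[team] / n_runs,
            "top_four": top_four[team] / n_runs,
            "relegated": relegated[team] / n_runs,
        }
        for team in teams
    }
```

Sorting the returned teams by `mean_points` gives the projected table.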

Final table predictions 

Using all the results to date, the projected EPL table looks like this.

The box plots indicate the distribution of each team's points totals over the 10,000 simulated seasons. The green bars indicate the 25th to 75th percentiles and the dashed lines (‘whiskers’) the 5th to 95th percentiles. For example, in 50% of the simulations Man City finish on between 71 and 81 points and in 90% of the simulations they accumulate between 63 and 89 points. The vertical line in the middle of the green bars shows the median[1]. The numbers to the right of the plot show the probability of each team: 
a) winning the title (Ti);
b) finishing in the Champions League spots (CL);
c) being relegated (rel).
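For readers who want to reproduce the box-plot summaries, the percentiles for one team can be pulled out of its simulated points totals with the standard library. This is only a sketch: `box_plot_summary` is an illustrative name, and the default interpolation method of `statistics.quantiles` is an assumption, not necessarily what was used for the plot.

```python
import statistics


def box_plot_summary(points_totals):
    """Return the percentiles drawn in the box plot for one team's simulated totals."""
    # n=20 splits the data into 5% steps, giving 19 cut points (5th, 10th, ..., 95th).
    cuts = statistics.quantiles(points_totals, n=20)
    return {
        "p5": cuts[0],      # lower whisker
        "p25": cuts[4],     # left edge of the green bar
        "median": cuts[9],  # vertical line inside the bar
        "p75": cuts[14],    # right edge of the green bar
        "p95": cuts[18],    # upper whisker
    }
```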

You can see that the table is bunched into three groups: those with a decent chance of making it into the Champions League, the solidly mid-table teams and the remainder at the bottom. Let’s look at each group in turn.

Top Group: This group contains Man City, Chelsea, Liverpool, Arsenal, Spurs and, if we’re being generous, Man United. These are the teams with a fighting chance of finishing in the top four. City, Chelsea, Liverpool and Arsenal are so tightly bunched they are basically indistinguishable: you can’t really predict which of them will win the league. However, there is a 93% probability that it’ll be one of those four. Spurs go on to be champions in only 6% of the simulations and United in less than 1%. Indeed, United finish in the top four only 17% of the time – roughly a 1 in 6 chance.

Middle Group: This group includes Southampton, Leicester, Everton, Watford and West Brom. The distribution of their points totals indicates that they are likely to collect more than 40 points, but fewer than 60. That makes them reasonably safe from relegation but unlikely to finish in the top four (last season, the 4th placed team – Man City – finished with 66 points). They can afford to really focus on the cup competitions (and for Leicester, the Champions League).

Bottom Group: Finally, we have the remaining nine teams, from Stoke down to Hull. According to my simulations, these teams have at least a 10% chance of being relegated. The bottom 5 in particular collect fewer than 40 points on average and are relegated in at least a third of the simulations, with Sunderland and Hull going down more often than not.

Next Steps

My plan is to update this table (which you can find here) after each round of EPL games. Hopefully, we should see the table beginning to crystallize as the season progresses, with the range of points totals narrowing and thus the final league positions becoming easier to predict.

There is also plenty of information that could be added. The simulations know nothing about injuries and suspensions, future transfers, managerial changes and grudge matches. They also do not take into account fixture congestion and cup participation. I’m going to investigate some of these issues and incorporate anything that reliably adds new predictive information.


Specific Model Details

This section takes a look at what is going on under the hood in a bit more detail.

The core of the calculation is the method for simulating match outcomes. For each match, the number of goals scored by a team is drawn from a Poisson distribution with the mean, μ, given by a simple linear model:

There are two predictors in the model: X1 = ΔElo/400, where ΔElo is the difference between the team's Elo score and their opponent's, and X2, a binary home/away indicator equal to 1 for the home team and -1 for the away team. Note that Elo scores are explicitly designed to be predictive of match outcomes. The initial Elo score for each team is taken from; after each simulated fixture the Elo scores are updated using the procedure described here.

The beta coefficients are determined via linear regression using all matches for the seasons 2011/12 to 2015/16, obtaining values β1 = 0.26, β2 = 0.71, β3 = 0.13. All are highly significant, as is the change in deviance relative to an intercept-only model. Running the regression on earlier seasons obtains similar results. 
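Here is one way the match simulation could look in code. It assumes the linear model acts on the log of the mean, log μ = β1 + β2·X1 + β3·X2, with β1 as the intercept. That is the usual form for a Poisson regression, and it gives plausible scoring rates (roughly 1.5 goals at home and 1.1 away for evenly matched teams), but the exact form used for this post may differ. The function names are illustrative.

```python
import math
import random

# Coefficients quoted in the text: intercept, Elo-difference term, home/away term.
BETA_1, BETA_2, BETA_3 = 0.26, 0.71, 0.13


def expected_goals(elo_team, elo_opponent, at_home):
    """Mean goals for one team, ASSUMING log(mu) = b1 + b2*X1 + b3*X2."""
    x1 = (elo_team - elo_opponent) / 400.0
    x2 = 1.0 if at_home else -1.0
    return math.exp(BETA_1 + BETA_2 * x1 + BETA_3 * x2)


def draw_poisson(mu, rng):
    """Sample a Poisson variate with Knuth's multiplication method (fine for small mu)."""
    limit = math.exp(-mu)
    k, product = 0, rng.random()
    while product > limit:
        k += 1
        product *= rng.random()
    return k


def simulate_match(elo_home, elo_away, rng):
    """Return (home_goals, away_goals) for one simulated match."""
    home_goals = draw_poisson(expected_goals(elo_home, elo_away, True), rng)
    away_goals = draw_poisson(expected_goals(elo_away, elo_home, False), rng)
    return home_goals, away_goals
```

Calling `simulate_match(1850, 1700, random.Random(1))` returns one simulated scoreline for a home side rated 150 Elo points above its visitors.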

How good are the match predictions?

A good way of answering this question is to compare the match outcome forecasts generated by this model with the probabilities implied by bookmakers' betting odds. There are a number of different metrics you can use to compare forecast accuracy; I’ve chosen two: the Brier score and the geometric mean of the probabilities of the actual match outcomes. It turns out the Poisson model and the bookies do equally well: they have identical scores for both metrics (0.61 for the Brier score and 0.36 for the average probability - consistent with what this analysis found).
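Both metrics are simple to compute. The sketch below assumes the multi-category form of the Brier score (squared errors summed over the three outcomes, then averaged over matches), which is consistent with a value around 0.61; the data layout and function names are illustrative.

```python
import math

OUTCOMES = ("home", "draw", "away")


def brier_score(forecasts, results):
    """Mean over matches of the summed squared error across the three outcomes.

    forecasts -- list of dicts mapping each outcome to its forecast probability
    results   -- list of the actual outcomes ("home", "draw" or "away")
    """
    total = 0.0
    for probs, result in zip(forecasts, results):
        total += sum((probs[o] - (1.0 if o == result else 0.0)) ** 2 for o in OUTCOMES)
    return total / len(results)


def geometric_mean_probability(forecasts, results):
    """Geometric mean of the probabilities assigned to the outcomes that occurred."""
    log_sum = sum(math.log(probs[result]) for probs, result in zip(forecasts, results))
    return math.exp(log_sum / len(results))
```

For reference, a forecaster who always assigns one third to each outcome scores 0.667 on this Brier score and 0.333 on the geometric mean, so lower is better for the first metric and higher is better for the second.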

The plot below shows that there is a strong relationship between the predicted probability of home wins, away wins and draws for the EightyFivePoints model and the bookmakers’ forecasts (note that I've 'renormalised' the bookmakers' odds such that the outcome probabilities sum to 1 for any given match). This makes me think that they’re doing something quite similar, with a few extra bells and whistles.

Comparison of probabilities assigned to ‘home win’, ‘away win’ and ‘draw’ by the Poisson model and those implied by bookmakers' odds. All EPL matches from the 2011/12 to 2015/16 seasons are plotted.
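The renormalisation of the bookmakers' odds can be sketched as follows. It assumes decimal odds and a simple proportional rescaling, which is the most common way of removing the bookmaker's margin but not necessarily the one used for the plot; `implied_probabilities` is an illustrative name.

```python
def implied_probabilities(home_odds, draw_odds, away_odds):
    """Convert decimal odds to outcome probabilities that sum to 1."""
    raw = [1.0 / home_odds, 1.0 / draw_odds, 1.0 / away_odds]
    overround = sum(raw)  # exceeds 1 because of the bookmaker's margin
    return tuple(p / overround for p in raw)
```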

One standout feature is that draws are never the favoured outcome. This suggests that one of the keys to improving the accuracy of match outcome predictions is to better identify when draws are the most likely outcome. After all, more than a quarter of games end in draws.

[1] Which happens to be close to the mean, so there isn’t much skew.

Saturday, 12 November 2016

Elo Impact: Who are the EPL’s most effective managers?

Manager rivalry is one of the big themes of the season. Many of Europe’s most successful managers have converged on the EPL, sparking renewed and fierce competition between England’s biggest clubs as they battle on the pitch to achieve domestic superiority.  In the background there is another competition, one of a more individual nature. Guardiola, Mourinho, Conte and Klopp are seeking to establish themselves as the pre-eminent manager of their generation. As touchline galacticos, their rivalry mirrors that of Europe’s top players.

Success is often measured relative to expectation. Second place this season would probably be seen as a good finish for Liverpool, but not Man City. So Klopp and Guardiola will be judged against different standards. If Moyes guides Sunderland to a top ten finish he’ll win manager of the season.

For the same reason, it’s difficult to compare their track records. A manager may have won an armful of medals, but was it the result of years of sustained improvement or a few tweaks to an already excellent team? Can we compare the achievements of Wenger and Pulis, or Ferguson at Aberdeen and Ferguson at Man United?

To answer these questions we need an objective method for comparing the track records of managers over their careers. Not a count of the big cups in their cabinets, but a consistent and transferable measure of how much they actually improved their teams. In this post I’m going to lay out a simple method for measuring the impact managers have made at their clubs. I’ll then use it to compare the careers of some of the EPL’s current crop of talent.

Elo Scores

There is one measure of success that is applicable to all managers: to increase the number of games the team wins. The problem is that it is not easily comparable over time: a manager can move from a small club to a big club, or one league to another, and his win percentage will vary irrespective of the impact he had on each team.  However, there is a neat way of circumventing these issues, and that is to use the Elo score system.

Created by physicist Arpad Elo for ranking chess players, the Elo system has now been applied to a number of different sports, including the NFL and international football teams. The excellent site has adapted it for European club football. You can find all the details there, but here are the essentials: each team has an Elo score which varies over time as they win, draw or lose matches. The difference in scores between two teams is directly related to the probability of each team winning in a direct confrontation.

For example, Man United currently have an Elo score of 1778 and Barcelona 2013; the difference is 235 and under the Elo system this implies that Barcelona would have an 80% chance of winning the game (if played at a neutral venue). The full details of this calculation can be found here.
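The 80% figure follows from the standard Elo expectation formula, sketched below. Strictly speaking the formula gives an expected score, in which a draw counts as half a win; the function name is illustrative.

```python
def elo_expected_score(elo_a, elo_b):
    """Standard Elo expectation for team A against team B at a neutral venue."""
    return 1.0 / (1.0 + 10 ** ((elo_b - elo_a) / 400.0))


# Barcelona (2013) against Man United (1778): a difference of 235 points.
barcelona_expectation = elo_expected_score(2013, 1778)  # about 0.79
```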

After two teams have played they will exchange points, with the exact amount being dependent on two things: the difference in their Elo scores before the game, and the outcome. For example, last weekend Man City drew 1-1 with Middlesbrough. As City were expected to win the game Middlesbrough gained 7.5 points and City lost the same number.
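The points exchange can be sketched with the basic Elo update rule. The weight K = 20 is an assumption here, and the sketch ignores any adjustment for margin of victory; `elo_points_exchanged` is an illustrative name.

```python
def elo_points_exchanged(expected, result, k=20.0):
    """Elo points gained by a team (negative means lost) after one match.

    expected -- the team's pre-match Elo expectation, between 0 and 1
    result   -- 1 for a win, 0.5 for a draw, 0 for a defeat
    k        -- update weight (ASSUMED to be 20 in this sketch)
    """
    return k * (result - expected)
```

Under these assumptions, a 7.5 point loss from a draw corresponds to a pre-match expectation of 0.875 for City (home advantage included), since 20 × (0.5 − 0.875) = −7.5. Middlesbrough gain the same amount.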

So how do we apply the Elo system to measure manager impact?

Manager Impact

We can assess the impact a manager has made by simply tracking the changes to the club’s Elo score since he took charge. I’ll refer to this as the manager’s Elo Impact. The neat part is that we can consistently monitor a manager’s record across multiple clubs by simply summing up all the changes to Elo scores over his career. Unlike win percentage, this works because the number of Elo points a team gains for a win depends on how superior they are relative to their opponent: in the Bundesliga, Bayern Munich receive far fewer points per win than Darmstadt 98.
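In code, the cumulative sum might look like the sketch below. The input layout (one list of club Elo scores per managerial spell, starting with the score on the day the manager took charge) and the function name are assumptions made for illustration.

```python
from itertools import accumulate


def elo_impact(spells):
    """Running Elo Impact across a career.

    spells -- one list of club Elo scores per managerial spell, in date order;
              the first entry of each list is the score when the manager arrived.
    """
    changes = []
    for scores in spells:
        # Only changes within a spell count; the jump between clubs is ignored.
        changes.extend(later - earlier for earlier, later in zip(scores, scores[1:]))
    return list(accumulate(changes))
```

For example, `elo_impact([[1500, 1510, 1505], [1800, 1790]])` returns `[10, 5, -5]`: the move from a club rated 1505 to one rated 1800 adds nothing by itself.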

Let’s look at a couple of examples. The two figures below show the Elo Impact of two managers across their careers: Alex Ferguson and Jose Mourinho (similar plots for Wenger, Guardiola, Klopp and Conte can be found here). For each manager, I’ve only included periods spent at UEFA clubs (omitting Wenger’s time in Japan, for example) and at clubs in the top two divisions of each country.

Figure 1 starts in 1978, when Alex Ferguson took over at Aberdeen, and ends with his retirement in 2013. The red line tracks the cumulative sum of the changes to his Elo score, bridging his move from Aberdeen to Manchester United in 1986.

Figure 1: the Elo Impact of Sir Alex Ferguson from 1978.

The first thing that strikes me is that his peak at Aberdeen – the 1983-84 season, when he won the Scottish league and European cup-winners cup – is almost level with his peak as Man United manager (his second Champions League and 10th EPL title in 2008). This implies that Ferguson’s impacts at Aberdeen and United are comparable achievements. That’s not an unreasonable statement: Ferguson won three of Aberdeen’s total of four Scottish titles and is still the last manager to break the Old Firm hegemony.

The striking thing about Mourinho’s Elo Impact (Figure 2) is that it is so much less volatile than Ferguson’s. Yes, the axis range is broader – Mourinho has had a lot of success in his career and his peak impact (at around 500) is substantially higher than Ferguson’s – but a quick estimate shows that Ferguson’s score fluctuates about 30% more. On closer inspection, this might be because Ferguson’s teams tended to win more of the big games but lose more frequently to weak teams than Mourinho’s (at least, until recently). However, this needs further investigation.

Figure 2: the Elo Impact of Jose Mourinho from 2004.

It’s worth emphasizing that the Elo score does not go up simply because trophies have been won; it does so if the team improves relative to its peers. Jose Mourinho’s time at Inter is a good example of this. Despite winning the treble in his final season in 2010, Mourinho departed Inter having made little improvement to their Elo score. This is because Inter were already the dominant force in Italy when he arrived, having won Serie A in each of the preceding three seasons. Put simply, it’s difficult to significantly improve the Elo score of a team that is already at the top. Guardiola’s time at Bayern Munich is another example.[2]

Who are the most effective managers in the EPL?

We can also use Elo Impact to rank managers. There is a question of how best to do this: by total impact (latest score), average impact over the career (score divided by total number of years in management), or by score this season. I’ve decided to provide all three, but have ranked managers by their total impact. The results are shown in the table below.

Total, average (per year) and 16/17 season Elo Impact scores for current EPL managers.

The top 6 are pretty much what you’d expect, with one very notable exception. Tony Pulis, who has never actually won a major trophy as a manager, leads the table. This is not crazy: Pulis has improved the standing of every major club that he managed (a plot of his career Elo Impact can be found here). In particular, over his two stints as Stoke City manager, he took them from a relegation-threatened Championship team to an established mid-table EPL team.

I think that the example of Tony Pulis demonstrates one of the strengths of the Elo Impact metric – it is fairly agnostic as to where a team finishes in the league, so long as the team has improved. While we are naturally attracted to big shiny silver cups, some of the best work is being done at the smaller clubs. I fully acknowledge that repeatedly saving teams from relegation requires a very different managerial skillset to developing a new philosophy of football at one of the world’s most famous clubs; the point is that Elo Impact at least allows you to put two very different achievements on a similar footing. It’s a results-based metric and cares little for style.[1]

Guardiola is perhaps lower than some might expect, but then he only had a small impact on Bayern Munich’s Elo score during his tenure. A few successful seasons at City and he’ll probably be near the top of this table. Why is Wenger’s average impact so low? As this plot shows, he substantially improved Arsenal during the first half of his tenure, but has essentially flat-lined since the ‘invincibles’ season. Further down the table, Bilic's score has fallen substantially this season as West Ham have had a disappointing campaign so far. 

So what now?

I intend to develop Elo Impact scores for two purposes. First, I’ll follow each manager’s scores over the EPL season to see who has overseen the greatest improvement in their side. I’m happy to provide manager rankings for other leagues or individual clubs on request. Second, as new managers arrive, I’ll look at their Elo track record to gain an insight into whether they’re likely to be a success or not.

It's going to be fascinating to see which manager comes out on top this season.


Thanks to David Shaw for comments.

[1] Although you do gain/lose more points for big victories/losses.
[2] It is difficult to improve, or even just maintain, a team's Elo score once it rises above 2000. Few points are gained for winning games and many are lost for losing them. Basically, the team is already at (or near) the pinnacle of European football. For this reason I've made a slight correction to the Elo Impact measure: when a club's Elo score is greater than 2000 points, I've set the maximum decrease in a manager's Elo Impact to 10 points per game. Once the club's score drops below 2000, the normal rules apply.
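The correction in footnote [2] could be applied per game along these lines. This sketch assumes the club's pre-match Elo score is the one compared against the 2000 threshold; the function and parameter names are illustrative.

```python
def capped_elo_change(elo_before, elo_after, threshold=2000.0, max_drop=10.0):
    """Per-game contribution to Elo Impact with the footnote [2] correction."""
    change = elo_after - elo_before
    # ASSUMPTION: the pre-match score decides whether the cap applies.
    if elo_before > threshold:
        change = max(change, -max_drop)
    return change
```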

Tuesday, 1 November 2016

Wenger's Winter Curse

Halloween may have passed but Arsenal's fans will remain fearful throughout November. This is the month where, historically, Wenger's team have tended to perform significantly below par. Since Wenger took charge in 1997, Arsenal have collected an average of 1.6 points per game in November, compared to a season average of 2 points per game.

In fact, as the figure below demonstrates, Arsenal don't really recover until mid-December. The thin blue line shows the average number of points that Wenger's Arsenal collect in each gameweek of the season; the dashed blue line shows a 3-game moving average. The Nov/Dec curse is clearly visible[1].

For comparison, I've also plotted the same results for Man United under Ferguson. For both teams, I used data from the seasons 97/98-12/13, the period in which the two managers overlap.

Average number of points collected by Arsenal (blue) and Man United (red) over the seasons 97/98-12/13. Solid lines show the average for each game week, dashed lines show a 3-match moving average.
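The two lines in the figure can be computed as sketched below. The input layout (one list of match points per season, in gameweek order) and the choice to average each run of three consecutive gameweeks are assumptions, as are the function names.

```python
def average_points_by_gameweek(seasons):
    """Average points per gameweek across seasons.

    seasons -- one list per season of the points (3, 1 or 0) won in each gameweek.
    """
    return [sum(week) / len(week) for week in zip(*seasons)]


def moving_average(values, window=3):
    """Average of each run of `window` consecutive values."""
    return [sum(values[i:i + window]) / window for i in range(len(values) - window + 1)]
```

Passing the output of `average_points_by_gameweek` to `moving_average` gives the smoothed (dashed) line.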

It's interesting to compare the seasonal performance of the two managers. In the first and final thirds of the season, Wenger's points-per-game closely matches Ferguson's. However, while Ferguson's teams would step up their performance in December (perhaps after the group stage of the Champions League finished), Wenger's seem to struggle in early winter before improving in February.

I have no idea what causes Arsenal's end-of-year blips: injuries, Champions League involvement, fear of the English winter, or excessive bad luck? Whatever it is, we'll all be watching with interest to see if they can overcome it this year.

[1] And significant, in the statistical sense.