Tuesday, 20 December 2016

Does January transfer spending improve results?

Last week the Sunderland chief executive, Martin Bain, warned that only "very limited" funds will be made available to David Moyes in the January transfer window (see here, here and here). Bain said that Sunderland are “not going to be able to spend to get out of trouble” and that "we have reached a point where there has to be a time where you don’t have that short-term hit to plug the holes in the dam".

The implication is that Sunderland have put their long-term financial health at risk in previous seasons by spending substantial sums in January in a last-ditch effort to retain their EPL status. While they have indeed survived their recent flirtations with relegation, is there any compelling evidence that winter spending actually improves results in the second half of the season? By out-spending their rivals, are troubled teams boosting their chances of staying up, or are they just using up precious financial resources that could be invested more carefully in their future? In this blog I’ll try to investigate these questions.

January spending and results improvement

The goal is to establish whether there is any relationship between January transfer spending and an improvement in results in the latter half of the season. For each of the last six seasons, I calculated the gross January expenditure of every EPL team using data taken from transferleague.co.uk[1].  To measure the improvement in results for each team, I calculated the average number of points per game they collected in matches played either before or after January 1st in each season and took the difference (second half of the season minus the first).
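A minimal Python sketch of this split (with toy results standing in for the real fixture data) might look like this:

```python
from datetime import date

def ppg_change(results, split=date(2016, 1, 1)):
    """results: (match_date, points_won) tuples for one team in one season.
    Returns second-half points-per-game minus first-half, split at 1st January."""
    first = [p for d, p in results if d < split]
    second = [p for d, p in results if d >= split]
    return sum(second) / len(second) - sum(first) / len(first)

# toy season: 1.0 PPG before January, 2.0 PPG after, so a change of +1.0
games = [(date(2015, 12, 20), 1), (date(2015, 12, 28), 1),
         (date(2016, 1, 2), 3), (date(2016, 1, 16), 1)]
print(ppg_change(games))  # 1.0
```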

Figure 1 below plots the change in points-per-game versus gross January expenditure for all EPL teams in each of the 2010/11 to the 2015/16 seasons (each point represents a team in one of those six seasons). On average, just under two thirds of EPL teams spent more than £1m in (disclosed) transfer fees in any given January window, with just over a third spending more than £5m and a fifth spending more than £10m. There are four clubs that spent more than £30m in January: Chelsea in 2010/11 and 2013/14, Liverpool in 2010/11 and Man United in 2013/14. The average change in points/game between the two halves of the season is close to zero[2] and there is no significant correlation with the level of spending.

Figure 1: Change in the average points-per-game measured before and after 1st January against total spending in the January transfer window for all EPL teams in each of the last six seasons. 

Not all teams will be looking for an immediate return on their investment in January. Some will be buying back-up to their first team or young players for the future. The teams that will certainly be looking for an immediate impact are those embroiled in the fight to remain in the EPL. In Figure 2 I’ve highlighted the relegation-threatened teams in each season. Specifically, this includes all teams that were in the bottom 6 positions in the table on January 1st, plus those that went on to be relegated at the end of the season (as you’d expect, most relegated teams were also in the bottom 6 in January)[3]. Teams that were relegated are coloured red; those that survived are blue. 

Figure 2: Change in the average points-per-game measured before and after 1st January against total spending in the January transfer window for all EPL teams (grey crosses) in each of the last six seasons. Teams marked by a square were in the bottom six of the table on 1st January; those in red were relegated, those in blue survived.
There are a couple of interesting things about this plot. First -- the majority of relegation-threatened teams see an improvement in their results in the second half of the season. I think this is just mean reversion: teams that underperform in the first half of the season are likely to do better in the second half. For example, over the last six seasons, teams in the bottom half of the table collected an average of 0.2 points/game more in the second half of the season than the first. The opposite is true of teams in the top half of the table: they tended to be an average of 0.2 points/game worse off in the second half of the season.

Second -- there is no significant correlation between spending and improvement in results for relegation-threatened teams. If we split them into two groups, those that spent greater than £5m in January and those that spent less, we find that 38% (6/16) of the high spenders and 55% (12/22) of the low spenders were relegated. This difference is probably not big enough to be significant. Raising the stakes higher – of the four relegation-threatened teams that spent more than £20m in January, three were relegated: Newcastle & Norwich last year, and QPR in 2012/13.
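To put a number on that, we can run a two-sided Fisher exact test on the 2x2 table of spending group against outcome. The choice of test is mine, not the post's; this stdlib-only sketch confirms the gap between 6/16 and 12/22 is far from significant:

```python
from math import comb

def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher exact test on the 2x2 table [[a, b], [c, d]]."""
    row1, col1, n = a + b, a + c, a + b + c + d
    denom = comb(n, row1)

    def pmf(k):  # hypergeometric probability of k successes in row 1
        return comb(col1, k) * comb(n - col1, row1 - k) / denom

    p_obs = pmf(a)
    lo, hi = max(0, row1 - (n - col1)), min(row1, col1)
    return sum(pmf(k) for k in range(lo, hi + 1) if pmf(k) <= p_obs * (1 + 1e-9))

# high spenders: 6 relegated, 10 survived; low spenders: 12 relegated, 10 survived
p = fisher_exact_two_sided(6, 10, 12, 10)
print(round(p, 2))  # comfortably above 0.05
```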

It seems reasonable to conclude that teams should resist the temptation to try to spend their way out of trouble: there is little evidence that it will pay off. It looks like Bain is being prudent in tightening the purse strings.


[1] Note that for some teams it will be an underestimate as the transfer fee was never disclosed.
[2] This doesn’t have to be the case. For instance, there could be more draws in the first or second half of the season.
[3] The results don't change significantly if we instead select relegation-threatened teams as those within a fixed number of points of the relegation zone.

Friday, 2 December 2016

Playing in Europe does affect domestic results in the EPL

There’s recently been a bit of discussion in the media (e.g: Sky, Guardian) on whether participation in European competitions has a negative impact on an EPL club’s domestic performance. This is partly motivated by the significant improvements shown by Liverpool and Chelsea this season: after 13 games they are 10 and 17 points better off than at the same stage last season, respectively. Neither are playing in Europe this year. Leicester are demonstrating a similar trait, albeit in the opposite direction: they are now 15 points worse off than last season. For them, the Champions League seems to have been a significant distraction.

Numerous studies have demonstrated that there is no ‘hangover’ effect from playing in Europe (see here and here). There is no evidence that EPL teams consistently perform worse in league matches that immediately follow a midweek European fixture. But what about the longer-term impact? Perhaps the mental and physical exertion of playing against the best teams in Europe manifests itself gradually over a season, rather than in the immediate aftermath of European games. If this is the case, we should be able to relate variations in an EPL team’s points haul from season to season to the difference in the number of European fixtures it played.

It turns out that there is indeed evidence for a longer-term impact. The scatter plot below shows the difference in the number of European games played by EPL teams in successive seasons against the change in their final points total, over the last 10 seasons. Each point represents a single club over successive seasons. For instance, the right-most point shows Fulham FC from the 08/09 to 09/10 season: in 09/10 they played 15 games in the Europa League (having not played in Europe in 08/09) and collected 7 fewer points in the EPL. Teams are only included in the plot if they played in European competitions in one or both of two successive seasons[1]. The green points indicate the results for this season relative to last (up to game week 13); the potential impact of European football (or the lack of it) on Chelsea, Liverpool, Southampton and Leicester is evident. Chelsea's league performance from 2014/15 to 2015/16 is a clear outlier: they played the same number of Champions League games but ended last season 37 points worse off.
Effect of participation in European competitions on a team's points total in the EPL over successive seasons. Green diamonds show the latest results for this season compared to the same stage last season. Blue dashed line shows results of a linear regression. 

The blue dashed line shows the results of a simple linear regression. Although the relationship is not particularly strong – the r-squared statistic is 0.2 – it’s certainly statistically significant[2]. The slope coefficient of the regression implies that, for each extra game a team plays in Europe, they can expect to lose half a point relative to the previous season. So, if a team plays 12 more games, it will be 6 points worse off (on average) than the previous season.
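The regression itself is simple enough to reproduce in pure Python. Here is an ordinary least squares sketch, with toy data standing in for the real (extra games, points change) pairs:

```python
def linreg(xs, ys):
    """Ordinary least squares fit of y = a + b*x; returns (a, b, r_squared)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    b = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) \
        / sum((x - mx) ** 2 for x in xs)
    a = my - b * mx
    ss_res = sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))
    ss_tot = sum((y - my) ** 2 for y in ys)
    return a, b, 1 - ss_res / ss_tot

# toy data with a slope of -0.5 points per extra European game
a, b, r2 = linreg([0, 4, 8, 12], [0.0, -2.0, -4.0, -6.0])
print(b)  # -0.5
```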

It’s worth noting that the CIES Football Observatory performed a similar analysis in a comprehensive report on this topic published earlier this year. They found no relationship between domestic form and European participation over successive seasons. However, their analysis combined results from 15 different leagues across Europe, so perhaps the effect is more pronounced in the EPL than in other leagues? This recent article in the Guardian, citing work by Omar Chaudhuri, suggests that the effects of playing in Europe may be more pronounced in highly competitive divisions. The lack of a winter break may also be a factor: while teams in Italy, Spain and Germany enjoy several weeks’ rest, EPL teams play four league matches over the Christmas period.

Finally, an obvious question is whether we are simply measuring the effects of playing more games across a season. To test this, we should apply the same analysis to progress in domestic cup competitions. However, I’ll leave that to the next blog.


[1] The points along x=0 are teams that played the same number of European games in successive seasons (and did play in Europe in both seasons). The only two teams omitted are Wigan and Birmingham City, both of whom played in the Europa League while in the Championship. Matches played in preliminary rounds are not counted.
[2] The null hypothesis of no correlation is resoundingly rejected.

Friday, 25 November 2016

Final Table Predictions for the EPL

In a previous post I looked at how the EPL league table evolves over a season, showing that we already have a decent idea of how the final league table will look after just a third of the season.

I’ve now taken that analysis a step further and built a simple model for predicting the total number of points each team will accumulate over the season (and therefore their final rankings). What follows is a short summary of how the model works; I've provided more technical detail at the end.

Season simulations

Each team starts with their current points total. I then work my way through the fixture schedule (currently 260 matches), simulating the outcome of each game. Results are generated based on the Elo rankings of each team – which I update after each simulated match – and the benefits of home advantage (scroll down to the last section for more details). At the end of the ‘season’, I tally up the final points totals for each team.

This process is repeated 10,000 times to evaluate the range of points that each team ends up on; I then make a final league table based on their averages. The probability of each team finishing the season as champions, in the top four or bottom three is calculated based on the frequency at which it occurs within the 10,000 runs.
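The simulation loop can be sketched as follows. This is a simplified illustration with made-up Elo ratings and a flat draw probability, and it omits the in-season Elo updates described above:

```python
import random

# made-up Elo ratings and a toy fixture list of (home, away) pairs
elo = {"A": 1900, "B": 1800, "C": 1700}
fixtures = [("A", "B"), ("B", "C"), ("C", "A")] * 2

def outcome_probs(home, away, home_adv=60.0, p_draw=0.26):
    """Crude win/draw/loss probabilities from the Elo expected score.
    The home advantage and flat draw rate are illustrative assumptions."""
    exp_home = 1 / (1 + 10 ** (-(elo[home] - elo[away] + home_adv) / 400))
    p_home = exp_home * (1 - p_draw)
    return p_home, p_draw, 1 - p_home - p_draw

def simulate_season(n_runs=10_000, seed=42):
    """Monte Carlo over the remaining fixtures; returns average points."""
    rng = random.Random(seed)
    totals = {t: 0 for t in elo}
    for _ in range(n_runs):
        season = {t: 0 for t in elo}
        for home, away in fixtures:
            p_home, p_draw, _ = outcome_probs(home, away)
            r = rng.random()
            if r < p_home:
                season[home] += 3
            elif r < p_home + p_draw:
                season[home] += 1
                season[away] += 1
            else:
                season[away] += 3
        for t in elo:
            totals[t] += season[t]
    return {t: totals[t] / n_runs for t in elo}

print(simulate_season(n_runs=2000))
```

In the real model the outcome of each match comes from simulated Poisson goal counts rather than a fixed draw rate (see the model details section).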

Final table predictions 

Using all the results to date, the projected EPL table looks like this.

The box plots indicate the distribution of each team's points totals over the 10,000 simulated seasons. The green bars indicate the 25th to 75th percentiles and the dashed lines (‘whiskers’) the 5th to 95th percentiles. For example, in 50% of the simulations Man City finish on between 71 and 81 points and in 90% of the simulations they accumulate between 63 and 89 points. The vertical line in the middle of the green bars shows the median[1]. The numbers to the right of the plot show the probability of each team: 
a) winning the title (Ti);
b) finishing in the champions league spots (CL);
c) being relegated (rel).

You can see that the table is bunched into three groups: those with a decent chance of making it into the champions league, the solidly mid-table teams and the remainder at the bottom. Let’s look at each group in turn.

Top Group: This group contains Man City, Chelsea, Liverpool, Arsenal, Spurs and, if we’re being generous, Man United. These are the teams with a fighting chance of finishing in the top four. City, Chelsea, Liverpool and Arsenal are so tightly bunched they are basically indistinguishable: you can’t really predict which of them will win the league. However, there is a 93% probability that it’ll be one of those four. Spurs go on to be champions in only 6% of the simulations and United in less than 1%. Indeed, United finish in the top four only 17% of the time – roughly a 1 in 6 chance.

Middle Group: This group includes Southampton, Leicester, Everton, Watford and West Brom. The distribution of their points totals indicate that they are likely to collect more than 40 points, but less than 60. That makes them reasonably safe from relegation but unlikely to finish in the top four (last season, the 4th placed team – Man City – finished with 66 points). They can afford to really focus on the cup competitions (and for Leicester, the champions league).

Bottom Group: Finally, we have the remaining nine teams, from Stoke down to Hull. According to my simulations, these teams have at least a 10% chance of being relegated. The bottom 5 in particular collect less than 40 points on average and are relegated in at least a third of the simulations, with Sunderland and Hull going down more often than not. 

Next Steps

My plan is to update this table after each round of EPL games (which you can find here). Hopefully, we should see the table beginning to crystallize as the season progresses, with the range of points totals narrowing and thus the final league positions becoming easier to predict.

There is also plenty of information that could be added. The simulations know nothing about injuries and suspensions, future transfers, managerial changes and grudge matches. They also do not take into account fixture congestion and cup participation. I’m going to investigate some of these issues and incorporate anything that reliably adds new predictive information.


Specific Model Details

This section takes a look at what is going on under the hood in a bit more detail.

The core of the calculation is the method for simulating match outcomes. For each match, the number of goals scored by a team is drawn from a Poisson distribution with mean μ given by a log-linear model:

μ = exp(β1 + β2X1 + β3X2)

There are two predictors in the model: X1 = ΔElo/400, the difference between the team's Elo score and their opponents', and X2, a binary home/away indicator equal to 1 for the home team and -1 for the away team. Note that Elo scores are explicitly designed to be predictive of match outcomes. The initial Elo score for each team is taken from clubelo.com; after each simulated fixture the Elo scores are updated using the procedure described here.

The beta coefficients are determined via Poisson regression using all matches from the 2011/12 to 2015/16 seasons, obtaining values β1 = 0.26, β2 = 0.71, β3 = 0.13. All are highly significant, as is the change in deviance relative to an intercept-only model. Running the regression on earlier seasons yields similar results.
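Putting the pieces together, the goal simulation might look like the sketch below. The exponential (log) link is my assumption, consistent with the Poisson regression and deviance language, and Knuth's algorithm stands in for a library Poisson sampler:

```python
import math
import random

# coefficients quoted above; the log link is an assumption
B1, B2, B3 = 0.26, 0.71, 0.13

def goal_mean(elo_team, elo_opp, home):
    """Expected goals for one team in one match."""
    x1 = (elo_team - elo_opp) / 400      # Elo difference predictor
    x2 = 1 if home else -1               # home/away indicator
    return math.exp(B1 + B2 * x1 + B3 * x2)

def sample_poisson(mu, rng):
    """Knuth's algorithm; fine for the small means seen in football."""
    limit, k, p = math.exp(-mu), 0, 1.0
    while True:
        p *= rng.random()
        if p <= limit:
            return k
        k += 1

mu_home = goal_mean(1800, 1700, home=True)
mu_away = goal_mean(1700, 1800, home=False)
print(round(mu_home, 2), round(mu_away, 2))  # 1.76 0.95
```

With these coefficients, a home side 100 Elo points stronger than its opponent expects roughly 1.8 goals against the visitors' 1.0, which is in line with typical EPL scorelines.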

How good are the match predictions?

A good way of answering this question is to compare the match outcome forecasts generated by this model with the probabilities implied by bookmakers' betting odds. There are a number of different metrics you can use to compare forecast accuracy; I’ve chosen two: the Brier score and the geometric mean of the probabilities of the actual match outcomes. It turns out the Poisson model and the bookies do equally well: they have identical scores for both metrics (0.61 for the Brier score and 0.36 for the average probability - consistent with what this analysis found).
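Both metrics are easy to compute. In this sketch the Brier score is summed over the three outcomes before averaging, which is consistent with the 0.61 scale quoted (a convention choice on my part):

```python
import math

def brier(forecasts, outcomes):
    """Mean over matches of the squared error summed across the three
    outcomes (home, draw, away); lower is better."""
    return sum(sum((p - o) ** 2 for p, o in zip(ps, os))
               for ps, os in zip(forecasts, outcomes)) / len(forecasts)

def geo_mean_prob(forecasts, outcomes):
    """Geometric mean of the probability assigned to the actual outcome."""
    logs = [math.log(sum(p * o for p, o in zip(ps, os)))
            for ps, os in zip(forecasts, outcomes)]
    return math.exp(sum(logs) / len(logs))

# two matches: a confident correct forecast, then a uniform forecast
f = [(0.7, 0.2, 0.1), (1/3, 1/3, 1/3)]
o = [(1, 0, 0), (0, 0, 1)]   # home win, then away win
print(round(brier(f, o), 3), round(geo_mean_prob(f, o), 3))  # 0.403 0.483
```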

The plot below shows that there is a strong relationship between the predicted probability of home wins, away wins and draws for the EightyFivePoints model and the bookmakers' forecasts (note that I've 'renormalised' the bookmakers' odds such that the outcome probabilities sum to 1 for any given match). This makes me think that they’re doing something quite similar, with a few extra bells and whistles.

Comparison of probabilities assigned to ‘home win’, ‘away win’ and ‘draw’ by the Poisson model and those implied by bookmakers' odds. All EPL matches from the 2011/12 to 2015/16 seasons are plotted.
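The renormalisation mentioned above is just dividing out the bookmaker's overround. A sketch with hypothetical decimal odds:

```python
def implied_probs(odds):
    """Decimal odds for (home, draw, away) -> probabilities renormalised
    to sum to 1, removing the bookmaker's overround."""
    inv = [1 / o for o in odds]
    total = sum(inv)
    return [p / total for p in inv]

# hypothetical decimal odds for one match
probs = implied_probs([2.0, 3.5, 4.0])
print([round(p, 3) for p in probs])  # [0.483, 0.276, 0.241]
```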

One standout feature is that draws are never the favoured outcome. This suggests that one of the keys to improving the accuracy of match outcome predictions is to better identify when draws are the most likely outcome. After all, more than a quarter of games end in draws.

[1] Which happens to be close to the mean, so there isn’t much skew.

Saturday, 12 November 2016

Elo Impact: Who are the EPL’s most effective managers?

Manager rivalry is one of the big themes of the season. Many of Europe’s most successful managers have converged on the EPL, sparking renewed and fierce competition between England’s biggest clubs as they battle on the pitch to achieve domestic superiority.  In the background there is another competition, one of a more individual nature. Guardiola, Mourinho, Conte and Klopp are seeking to establish themselves as the pre-eminent manager of their generation. As touchline galacticos, their rivalry mirrors that of Europe’s top players.

Success is often measured relative to expectation. Second place this season would probably be seen as a good finish for Liverpool, but not Man City. So Klopp and Guardiola will be judged against different standards. If Moyes guides Sunderland to a top ten finish he’ll win manager of the season.

For the same reason, it’s difficult to compare their track records. A manager may have won an armful of medals, but was it the result of years of sustained improvement or a few tweaks to an already excellent team? Can we compare the achievements of Wenger and Pulis, or Ferguson at Aberdeen and Ferguson at Man United?

To answer these questions we need an objective method for comparing the track records of managers over their careers. Not a count of the big cups in their cabinets, but a consistent and transferable measure of how much they actually improved their teams. In this post I’m going to lay out a simple method for measuring the impact managers have made at their clubs. I’ll then use it to compare the careers of some of the EPL’s current crop of talent.

Elo Scores

There is one measure of success that is applicable to all managers: to increase the number of games the team wins. The problem is that it is not easily comparable over time: a manager can move from a small club to a big club, or one league to another, and his win percentage will vary irrespective of the impact he had on each team.  However, there is a neat way of circumventing these issues, and that is to use the Elo score system.

Created by physicist Arpad Elo for ranking chess players, the Elo system has now been applied to a number of different sports, including the NFL and international football teams. The excellent site clubelo.com has adapted it for European club football. You can find all the details there, but here are the essentials: each team has an Elo score which varies over time as they win, draw or lose matches. The difference in scores between two teams is directly related to the probability of each team winning in a direct confrontation.

For example, Man United currently have an Elo score of 1778 and Barcelona 2013; the difference is 235 and under the Elo system this implies that Barcelona would have an 80% chance of winning the game (if played at a neutral venue). The full details of this calculation can be found here.

After two teams have played they will exchange points, with the exact amount being dependent on two things: the difference in their Elo scores before the game, and the outcome. For example, last weekend Man City drew 1-1 with Middlesbrough. As City were expected to win the game Middlesbrough gained 7.5 points and City lost the same number.
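Both calculations are compact enough to sketch. K = 20 is an assumption for illustration (clubelo's actual weighting differs in the details), and the 338-point difference in the draw example is reverse-engineered to reproduce the 7.5-point swing, so treat it as hypothetical:

```python
def expected_score(elo_diff):
    """Win expectancy implied by an Elo difference (neutral venue)."""
    return 1 / (1 + 10 ** (-elo_diff / 400))

def elo_exchange(elo_diff, result, k=20):
    """Points gained by the higher-rated side for result 1/0.5/0; the
    lower-rated side gains the opposite amount."""
    return k * (result - expected_score(elo_diff))

# Barcelona (2013) vs Man United (1778): a difference of 235 points
print(round(expected_score(235), 2))   # 0.79, close to the 80% quoted

# a draw when the expected score is 0.875 costs 20 * 0.375 = 7.5 points,
# matching the Man City vs Middlesbrough example
print(round(elo_exchange(338, 0.5), 1))  # -7.5
```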

So how do we apply the Elo system to measure manager impact?

Manager Impact

We can assess the impact a manager has made by simply tracking the changes to the club’s Elo score since he took charge. I’ll refer to this as the manager’s Elo Impact. The neat part is that we can consistently monitor a manager’s record across multiple clubs by simply summing up all the changes to Elo scores over his career. Unlike win percentage, this works because the number of Elo points a team gains for a win depends on how superior they are relative to their opponent: in the Bundesliga, Bayern Munich receive far fewer points per win than Darmstadt 98.
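Summing changes across spells is all there is to the bookkeeping. A sketch with hypothetical Elo histories:

```python
def elo_impact(tenures):
    """tenures: one Elo-score history per managerial spell.
    Impact = summed change within each spell, carried across clubs."""
    return sum(spell[-1] - spell[0] for spell in tenures)

# hypothetical career: +150 at a small club, then +220 at a bigger one
print(elo_impact([[1500, 1580, 1650], [1750, 1900, 1970]]))  # 370
```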

Let’s look at a couple of examples. The two figures below show the Elo Impact of two managers across their careers: Alex Ferguson and Jose Mourinho (similar plots for Wenger, Guardiola, Klopp and Conte can be found here). For each manager, I’ve only included periods spent at UEFA clubs (omitting Wenger’s time in Japan, for example) and at clubs in the top two divisions of each country.

Figure 1 starts in 1978, when Alex Ferguson took over at Aberdeen, and ends with his retirement in 2013. The red line tracks the cumulative sum of the changes to his Elo score, bridging his move from Aberdeen to Manchester United in 1986.

Figure 1: the Elo Impact of Sir Alex Ferguson from 1978.

The first thing that strikes me is that his peak at Aberdeen – the 1983-84 season, when he won the Scottish league and the European Cup Winners' Cup – is almost level with his peak as Man United manager (his second Champions League and 10th EPL title in 2008). This implies that Ferguson’s impact at Aberdeen and United are comparable achievements. That’s not an unreasonable statement: Ferguson won three of Aberdeen’s four Scottish titles and is still the last manager to break the Old Firm hegemony.

The striking thing about Mourinho’s Elo Impact (Figure 2) is that it is so much less volatile than Ferguson’s. Yes, the axis range is broader – Mourinho has had a lot of success in his career and his peak impact (at around 500) is substantially higher than Ferguson’s – but a quick estimate shows that Ferguson’s score fluctuates about 30% more. On closer inspection, this might be because Ferguson’s teams tended to win more of the big games but lose more frequently to weak teams than Mourinho’s (at least, until recently). However, this needs further investigation.

Figure 2: the Elo Impact of Jose Mourinho from 2004.

It’s worth emphasizing that the Elo score does not go up simply because trophies have been won; it does so if the team improves relative to its peers. Jose Mourinho’s time at Inter is a good example of this. Despite winning the treble in his final season in 2010, Mourinho departed Inter having made little improvement to their Elo score. This is because Inter were already the dominant force in Italy when he arrived, having won Serie A in each of the preceding three seasons. Put simply, it’s difficult to significantly improve the Elo score of a team that is already at the top. Guardiola’s time at Bayern Munich is another example.[2]

Who are the most effective managers in the EPL?

We can also use Elo Impact to rank managers. There is a question of how best to do this: by total impact (latest score), average impact over the career (score divided by total number of years in management), or by score this season. I’ve decided to provide all three, but have ranked managers by their total impact. The results are shown in the table below.

Total, average (per year) and 16/17 season Elo Impact scores for current EPL managers.

The top 6 are pretty much what you’d expect, with one very notable exception. Tony Pulis, who has never actually won a major trophy as a manager, leads the table. This is not crazy: Pulis has improved the standing of every club that he has managed (a plot of his career Elo Impact can be found here). In particular, over his two stints as Stoke City manager, he took them from a relegation-threatened Championship team to an established mid-table EPL team.

I think that the example of Tony Pulis demonstrates one of the strengths of the Elo Impact metric – it is fairly agnostic as to where a team finishes in the league, so long as the team has improved. While we are naturally attracted to big shiny silver cups, some of the best work is being done at the smaller clubs. I fully acknowledge that repeatedly saving teams from relegation requires a very different managerial skillset to developing a new philosophy of football at one of the world's most famous clubs; the point is that Elo Impact at least allows you to put two very different achievements on a similar footing. It’s a results-based metric and cares little for style.[1]

Guardiola is perhaps lower than some might expect, but then he only had a small impact on Bayern Munich’s Elo score during his tenure. A few successful seasons at City and he’ll probably be near the top of this table. Why is Wenger’s average impact so low? As this plot shows, he substantially improved Arsenal during the first half of his tenure, but has essentially flat-lined since the ‘invincibles’ season. Further down the table, Bilic's score has fallen substantially this season as West Ham have had a disappointing campaign so far. 

So what now?

I intend to develop Elo Impact scores for two purposes. First, I’ll update each manager’s score after every round of EPL games to see who has overseen the greatest improvement in their side. I’m happy to provide manager rankings for other leagues or individual clubs on request. Second, as new managers arrive, I’ll look at their Elo track record to gain an insight into whether they’re likely to be a success or not.

It's going to be fascinating to see which manager comes out on top this season.


Thanks to David Shaw for comments.

[1] Although you do gain/lose more points for big victories/losses.
[2] It is difficult to improve, or even just maintain, a team's Elo score once it rises above 2000. Few points are gained for winning games and many are lost for losing them. Basically, the team is already at (or near) the pinnacle of European football. For this reason I've made a slight correction to the Elo Impact measure: when a club's Elo score is greater than 2000 points, I've set the maximum decrease in a manager's Elo Impact to 10 points per game. Once the club's score drops below 2000, the normal rules apply.

Tuesday, 1 November 2016

Wenger's Winter Curse

Halloween may have passed but Arsenal's fans will remain fearful throughout November. This is the month where, historically, Wenger's team have tended to perform significantly below par. Since Wenger took charge in 1997, Arsenal have collected an average of 1.6 points per game in November, compared to a season average of 2 points per game.

In fact, as the figure below demonstrates, Arsenal don't really recover until mid-December. The thin blue line shows the average number of points that Wenger's Arsenal collect in each gameweek of the season; the dashed blue line shows a 3-game moving average. The Nov/Dec curse is clearly visible[1].

For comparison, I've also plotted the same results for Man United under Ferguson. For both teams, I used data from the seasons 97/98-12/13, the period in which the two managers overlap.

Average number of points collected by Arsenal (blue) and Man United (red) over the seasons 97/98-12/13. Solid lines show the average for each game week, dashed lines show a 3-match moving average.
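The dashed moving-average lines in the figure come from a simple trailing window (the trailing rather than centred window is my assumption):

```python
def moving_average(points, window=3):
    """Trailing moving average over the previous `window` gameweeks."""
    return [sum(points[i - window + 1:i + 1]) / window
            for i in range(window - 1, len(points))]

# toy per-gameweek points for one team
print([round(v, 2) for v in moving_average([3, 0, 3, 1, 1])])  # [2.0, 1.33, 1.67]
```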

It's interesting to compare the seasonal performance of the two managers. In the first and final thirds of the season, Wenger's points-per-game closely matches Ferguson's. However, while Ferguson's teams would step up their performance in December (perhaps after the group stage of the Champions League finished), Wenger's seem to struggle in early winter before improving in February.

I have no idea what causes Arsenal's end-of-year blips: injuries, Champions League involvement, fear of the English winter, or excessive bad luck? Whatever it is, we'll all be watching with interest to see if they can overcome it this year.

[1] And significant, in the statistical sense.

Friday, 28 October 2016

Want to know where your team's gonna finish?

We’re nearly a quarter of the way through the EPL season and the league already has a familiar feel to it. Manchester City are top, Arsenal are above Spurs, and Sunderland anchor the table having failed to win a single game so far. There is clearly a lot of football still to be played, but does the table already resemble how it’ll look come the end of May?

Conventional wisdom tells us that the turn of the year is a crucial period. By the beginning of January we are supposed to have a good idea of how things are shaping up. In 9 of the last 20 EPL seasons, the team that was top in January went on to win the league, and 56% of teams in the bottom three on New Year's Day went on to be relegated. However, you get pretty much the same results if you measure these stats at the beginning of December or the beginning of February, so perhaps we don’t learn that much over the Christmas period after all.

In this post I’m going to look back over the last 20 seasons to investigate how the league table actually evolves over a season and, in particular, when in the season we start to have a reasonable picture of where teams might finish.

Rank Correlations

A good starting point is to measure the correlation between the final league positions and those at some earlier point in the season. Essentially you’re measuring the degree to which the orderings of the teams are the same. If the team rankings were identical, we’d measure a correlation of 1; if they were completely different we’d expect the correlation to be close to zero.
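The post doesn't name the correlation coefficient used, but for rankings without ties Spearman's rank correlation is the natural choice and has a simple closed form:

```python
def spearman(rank_a, rank_b):
    """Spearman rank correlation for two tie-free rankings of n teams."""
    n = len(rank_a)
    d2 = sum((a - b) ** 2 for a, b in zip(rank_a, rank_b))
    return 1 - 6 * d2 / (n * (n ** 2 - 1))

# mid-season vs final positions for five hypothetical teams
print(round(spearman([1, 2, 3, 4, 5], [2, 1, 3, 5, 4]), 2))  # 0.8
```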

Figure 1 shows the correlations between the league rankings after each game week and the rankings at the end of the season, for the last 20 EPL seasons. The grey lines show the correlations for the individual seasons; the red line shows the average.

Figure 1: The correlation between the league rankings after each gameweek and the final rankings at the end of the season. Grey lines show results for each of the last 20 EPL seasons, the red line shows the average correlation for each gameweek.

The most striking thing about this plot is that the correlation rises so quickly at the beginning of the season. You get to an average correlation of 0.8 - which is very high[1] - by the 12th round of games. There’s some variation from season to season, of course, but the general picture is always the same: we learn rapidly in the first 12 or so games, and then at a slower, even pace over the rest of the season.

This implies that we know quite a lot about how the final league rankings will look after just a third of the season. But there’s no mantra that states ‘top at Halloween, champions in May’, so why is the correlation so high so soon, and what does it actually mean?

Leagues in leagues

I think the explanation lies in what are sometimes referred to as ‘mini-leagues’. The idea is that the EPL can be broken down into three sub-leagues: those teams competing to finish in the top four (the champions league places), those struggling at the bottom, and those left in the middle fighting for neither the riches of the champions league nor for their survival.

Figure 2 demonstrates that these mini-leagues are already established early in the season. It shows the probability of each team finishing in the top 4 (red line) or bottom 3 (blue line), based on their ranking after their 12th game. The results were calculated from the last 20 EPL seasons.

Figure 2: The probability of finishing in the top four (red line) or bottom three (blue line) based on league position after 12 games. The red, white and blue shaded regions indicate the three ‘mini-leagues’ within the EPL.

The red-shaded region shows the ‘top’ mini-league: the teams with a high chance of finishing in the champions league places. Teams below 7th place are unlikely to break into this elite group. Similarly, teams placed 14th or above are probably not going to be relegated; therefore, those between 7th and 14th position are in the middle ‘mini-league’. Teams in the last third seem doomed to be fighting relegation at the end of the season: they make up the final mini-league.

The high correlation we observed after twelve games in Figure 1 is a consequence of the mini-leagues. It’s entirely what you’d expect to measure from a table that is already clustered into three groups – top, middle and bottom – but where the final ordering within each group is still to be determined.
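The probabilities in Figure 2 can be estimated empirically: for each league position after 12 games, count how often a team in that position went on to finish in the top four (or bottom three). A minimal sketch follows; the `history` data is invented for illustration, with each team mapped to its (rank after 12 games, final rank) pair.

```python
# Sketch: estimating P(top-four finish | rank after 12 games) from past seasons.
# The `history` data below is invented; the real calculation uses 20 EPL seasons.
from collections import defaultdict

history = {
    "2014/15": {"A": (1, 1), "B": (3, 2), "C": (8, 4), "D": (18, 19)},
    "2015/16": {"A": (2, 3), "B": (5, 1), "C": (9, 10), "D": (17, 20)},
}

counts = defaultdict(lambda: [0, 0])   # rank_after_12 -> [top-four finishes, appearances]
for season in history.values():
    for rank_12, final_rank in season.values():
        counts[rank_12][1] += 1
        if final_rank <= 4:
            counts[rank_12][0] += 1

prob_top4 = {r: hits / n for r, (hits, n) in counts.items()}
```

The same counting, with `final_rank >= 18`, gives the bottom-three (blue) curve.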

I’m not suggesting that membership of the mini-leagues is set in stone – there’s clearly promotion and relegation between them throughout the season (and yo-yo teams) – but by November there is a hierarchy in place. Even at this relatively early stage of the season, most teams will have a reasonable idea of which third of the table they are likely to end up in.

Finally, awareness of this may also explain the recent increase in the number of managers getting sacked early in the season. Last year three EPL managers lost their jobs before the end of October and we've already lost one this season. If Mourinho doesn't find himself in the top eight after the next few games, the pressure may ramp up several notches.


Thanks to David Shaw for comments.

[1] And certainly significant.

Thursday, 13 October 2016

Forecasting Football: a hedgehog amongst the foxes.

“I never make predictions and I never will.” – Paul Gascoigne

For football fans, making predictions is half the fun. It’s also big business: the global sports betting market is estimated to be worth roughly half a trillion pounds, and 70% of the trade is thought to come from football.

We all make predictions, but some of us are better than others. In his book, The Signal and the Noise, Nate Silver describes two categories of predictor: hedgehogs and foxes[1]. Hedgehogs tend to have strong pre-conceived notions of how the future will pan out, a view of the natural order of things that they can be reluctant to change. They are confident and assertive. Foxes, on the other hand, are not anchored to a particular world-view: they are more willing to change their position as new information arrives. Foxes try to weigh up all the available evidence; often their predictions are probabilistic in nature.

In football, TV pundits are hedgehogs: ex-players and managers chosen to provide us with their insider perspective. But who are the Foxes? I think the closest example is the bookmakers, many of whom now rely on statistical models to forecast the outcome of matches and set their odds.

How do pundits and bookmakers measure up? In this post I compare the forecasts of a well-known football pundit with some of the UK’s largest bookmakers. We'll see who comes out on top.

Lawrenson vs the Bookmakers

It’s difficult to collect up a large sample of predictions for pundits. Fortunately, Mark Lawrenson comes to the rescue. For those that are not familiar with him, Lawrenson is a former Liverpool player who now makes regular TV appearances as a pundit for the BBC. For the last six seasons he has also provided them with his predictions for the outcome of almost every EPL game.[2]

So how do we compare Lawrenson’s intuition with the bookmakers’ betting odds? I think the simplest approach is also the most fun: imagine that I had bet on every prediction he made from the 2011/12 season through the 2015/16 season, and see how much money I would have won or lost along the way.

Here’s how the experiment works: for each game I collected the odds offered by four bookmakers on a home win, away win and draw[3]. I identified the bookmaker that offered the best odds for Lawrenson’s predicted outcome and bet £1 on that outcome. I then tracked the cumulative profit/loss over the 1856 games that he attempted to predict (he misses a few each season). The results are shown in Figure 1.
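The bookkeeping for the experiment can be sketched in a few lines. Everything below is invented for illustration: `profit_from_bets` is a hypothetical helper, the bookmaker names and odds are made up, and decimal odds are assumed (a winning £1 bet at odds 2.2 returns £2.20, i.e. £1.20 profit).

```python
# Sketch of the betting experiment: back the predicted outcome at the best
# available odds across the bookmakers, staking £1 per game.

def profit_from_bets(games):
    """Each game: (predicted outcome, actual outcome, {bookmaker: {outcome: decimal odds}})."""
    total = 0.0
    for pick, actual, odds_by_bookie in games:
        best = max(book[pick] for book in odds_by_bookie.values())  # best odds on the pick
        total += (best - 1.0) if pick == actual else -1.0           # £1 staked per game
    return total

# Two invented games: a correct home-win pick, then an incorrect away-win pick.
games = [
    ("H", "H", {"b1": {"H": 2.1, "A": 3.5, "D": 3.2}, "b2": {"H": 2.2, "A": 3.4, "D": 3.1}}),
    ("A", "D", {"b1": {"H": 1.8, "A": 4.0, "D": 3.5}, "b2": {"H": 1.9, "A": 4.2, "D": 3.3}}),
]
```

Shopping for the best odds across several bookmakers matters: it systematically improves the return on every winning bet.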

Figure 1: The cumulative profit/loss generated from betting £1 on each of Mark Lawrenson’s match predictions over the last 5 EPL seasons (nearly 2000 matches).

After a shaky first season, Lawrenson’s predictions do pretty well. If you had bet £1 on each of his predictions since the beginning of the 11/12 season you would have made a £105 profit by the end of last season; this equates to a return of nearly 6% per bet. On a season-by-season basis, you’d make a profit in all but the first season, with 12/13 being the most profitable year. He had particularly good stretches at the beginning of the 12/13 and 14/15 seasons, with runs of 11 and 14 correct outcomes, respectively.

Lawrenson correctly predicted the outcome in 938 of the 1856 matches, a success rate of 51%. To put this in context, 44% of the matches ended in a home win, 31% in an away win and 25% in a draw. If you just bet on a home win in every game you’d be down £51 by the end of the five seasons.

How significant is this? What is the probability that you could be at least £105 in profit after 5 years by pure luck alone? We can answer this with a simulation.

I reran the full 5-year experiment 10,000 times, randomly assigning a prediction of home win, away win or draw for every game. The rate of home wins, away wins and draws in each season were fixed to be the same as in Lawrenson’s forecasts; I basically just shuffled his forecasts around between matches. The result: in only 156 of the 10,000 simulations did the profits exceed £105 (an empirical p-value of roughly 0.016), indicating that it would be very unlikely to make this level of profit by chance. So he demonstrated real skill in his forecasts.
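The shuffle test can be sketched as below. This is a simplification of the post's experiment: the real version shuffles within each season and tracks profit at the best available odds, whereas here I just count correct outcomes, and `shuffle_test` and `hits` are hypothetical helpers with invented inputs.

```python
# Sketch of a permutation (shuffle) test: permute the predictions across matches
# and measure how often a random assignment does at least as well as the observed one.
import random

def shuffle_test(predictions, outcomes, score_fn, n_sims=10_000, seed=42):
    rng = random.Random(seed)
    observed = score_fn(predictions, outcomes)
    at_least_as_good = 0
    for _ in range(n_sims):
        shuffled = predictions[:]          # same mix of H/A/D picks...
        rng.shuffle(shuffled)              # ...assigned to random matches
        if score_fn(shuffled, outcomes) >= observed:
            at_least_as_good += 1
    return at_least_as_good / n_sims       # empirical p-value

# Score a prediction set by its number of correct calls.
hits = lambda preds, outs: sum(p == o for p, o in zip(preds, outs))
```

Because the shuffled predictions keep the same proportions of home wins, away wins and draws, the test isolates whether the *assignment* of picks to matches carries skill.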

Has Lawrenson beaten the bookmakers? I think the answer is yes. That does not necessarily mean that he is a superior predictor though. If you were to perform the experiment again, this time selecting the outcome with the shortest betting odds as your prediction, you’d be right in 53% of matches – a higher success rate than Lawrenson. However, if you had placed bets on those outcomes you would have lost £47[4]. So Lawrenson’s predictions make their money on the less favoured outcomes – those with longer odds. Indeed, he only picks the outcome with the shortest odds about two-thirds of the time.

Finally, I’m keeping track of Lawrenson’s hypothetical P&L for the 16/17 season. You can find it here. If I can get hold of the data, I’ll add other pundits to this, too.

Thanks to David Shaw and Brent Strickland for comments.

[1] This originates in the essay The Hedgehog and the Fox by the philosopher Isaiah Berlin.
[2] These guys look like they have collected data for other pundits, too. I’ll see if I can get hold of it and add it to this analysis.
[3] The four bookmakers are Ladbrokes, William Hill, Bet365 and Bet Victor.
[4] Presumably because bookmakers shorten the odds they offer to generate a profit.

Wednesday, 5 October 2016

The managerial merry-go-round spins ever faster

We’re less than a quarter of the way into the season and the great managerial merry-go-round has already shed its first passengers. Swansea City's former manager Francesco Guidolin was the EPL’s first casualty and five EFL managers have been relieved of their duties. Sam Allardyce, the now former England manager, also terminated his contract last week following a Daily Telegraph investigation into his conduct.

Guidolin was Swansea’s manager for only 259 days. Roberto Di Matteo was removed as Aston Villa’s manager after 121 days. The longest serving of the recently departed was Tony Mowbray, who lasted under two years at Coventry City. Over the course of last season 58 managers were fired; the season before that it was 47.

It certainly feels like managerial tenures are getting shorter and shorter, but is this part of a long-term trend or a recent phenomenon of the money-spinning era? And does it really make much sense to frequently change manager?

Diminishing patience, at all levels

To answer the first question, I put together a dataset containing every manager of a professional English club (i.e. top four divisions) since 1950, and measured the total number of league matches that each of them managed[1]. I then aggregated the data into 10-year blocks and looked at the distribution of the duration of managerial tenures (measured in number of matches) within each block. The results are shown in Figure 1.

You read the plot in the following way: each line in the plot represents a certain percentile of managers leaving their jobs, from 10% (bottom line) to 90% (top line). For example, the solid black line in the middle indicates the number of matches by which half of managers had left their club. The top-most line indicates the number of matches by which 90% of managers had left.
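The percentile curves can be sketched as follows. The tenure data below is invented for illustration (the real dataset covers every manager since 1950, grouped into 10-year blocks), and `tenure_percentiles` is a hypothetical helper using a simple nearest-rank percentile.

```python
# Sketch: the number of matches by which a given percentage of managers had left,
# computed from a list of tenure lengths (matches managed) for one 10-year block.

def tenure_percentiles(tenures, percentiles=(10, 50, 90)):
    ordered = sorted(tenures)
    n = len(ordered)
    out = {}
    for p in percentiles:
        idx = max(0, int(round(p / 100 * n)) - 1)   # nearest-rank index
        out[p] = ordered[idx]
    return out

# Invented tenures for ten managers in a hypothetical 2000s block.
block_2000s = [12, 30, 45, 60, 80, 95, 120, 150, 200, 260]
pct = tenure_percentiles(block_2000s)
```

Here `pct[50]` is the number of matches by which half the managers had gone, and `pct[90]` the point by which all but 10% had left; plotting each percentile across successive 10-year blocks gives the lines in Figure 1.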

Figure 1: The diminishing survival rate of managers in profession English football since the 1950s. Each line represents the number of matches (or seasons, right axis) by which a given percentage of managers, from 10% (bottom line) to 90% (top line), had left their post.

At the beginning of the millennium, 50% of managers would leave their job by the 80th league match since their appointment (roughly two seasons), and 90% had left after 200 matches (5 seasons). Or, to put it another way, only 10% of managers survived to see their 200th game. Go back to 1970, and you see that at least 50% of managers were around long enough to oversee their 130th match in charge of a single club (3 seasons) and more than 10% of managers survived long enough to see their 300th match.

The duration of managerial appointments has steadily declined over time, to the extent that it has basically halved over the last forty years.  These days more than 50% of managers will not see out two seasons; 25% will barely last a single season. Interestingly, there is no evidence that the rate of managerial turnover has increased in the last twenty years. Given that the rewards of success and the costs of failure have been greatly magnified in recent times, I had expected to see that club owners have become increasingly less patient.

I don’t think that there is much evidence that changing manager is likely to lead to any improvement in a club’s fortunes – essentially you’re just rolling the dice again and hoping the next guy does better. In principle there is no problem with that, but in practice there can be a big cost to starting over again.

The costs of changing manager

Football managers have an enormous amount of power at their clubs. Not only do they decide team selection, they oversee training, tactics, scouting, and handpick their own coaching staff. Crucially, they also decide transfer targets.

Clubs spend a vast amount of money on player recruitment. Last summer, EPL clubs spent a total of £1.3 billion on transfer fees and roughly the same the year before. They also spent roughly £130m on agents’ fees. Furthermore, when a new manager arrives at a club he is often promised a sizeable chunk of cash to bring in the players he wants. In the last five years EPL clubs have tended to spend considerably more when they have just hired a new manager.

But here’s the rub: new players are typically brought in on 4 or 5-year contracts. If only 25% of managers survive to the end of their third year, the players the manager brought in will invariably outlast him at the club. A new manager is then hired who will identify and recruit the players that suit his preferred way of playing, most of whom will then outlast him.

Taking this to its logical conclusion implies that clubs that frequently change manager may end up with an incoherent set of players, some of whom may be surplus to requirements under the next manager. Given the increasing cost of agents’ fees – not to mention the costs of buying a manager out of his contract[2] – this seems like an inefficient method of running an organization. If the majority of managers leave by the end of their second season, maybe clubs should be more wary about allowing them to buy and sell as they please.

The key seems to be continuity, not in manager retention but in manager recruitment. Establish a style of play and then consistently bring in managers that will largely adhere to it. Although, as Swansea City are finding out, this is perhaps easier said than done.


While writing this I discovered two other blogs that have discussed the long-term decline of manager tenures (here and here).

[1] I removed caretaker managers though, which I define as any manager that oversaw less than 10 games.
[2] For example, Man Utd paid David Moyes £4.5m to leave, and Louis Van Gaal £8m.

Monday, 26 September 2016

English Hares and Italian Tortoises: When do Goal-scorers peak?

Yesterday, I walked into a New York bar just in time to see Francesco Totti wheel away in celebration. He had just sent Torino's goalkeeper Joe Hart the wrong way from the penalty spot to score his 250th Serie A goal. Totti has scored more Serie A goals than any other player in the last sixty years. He has also now scored in 23 consecutive Serie A seasons.  He turns 40 on Wednesday.

An interesting feature of Totti’s career is that he has scored nearly half of his goals since turning thirty. He didn’t even really get going until his late twenties, a slow burner. Contrast this with Wayne Rooney: 173 EPL goals so far, half of which were achieved by the age of 24. A fixture in the Man Utd team since the age of 18, he is now, at 30, perceived to be a much diminished force.

So when do strikers normally reach their goal scoring peak, and how rapidly do they decline thereafter? Do the hares that establish themselves early in their career tend to burn out faster than the tortoises that have the more sedate start? To investigate, we need to look at the goal-scorer's aging curve.

The goal-scorer's aging curve

The aging curve of footballers – how their ability and effectiveness changes with age – is notoriously difficult to measure. How do you know when a defender has reached the peak of his career? Defending is very much a team responsibility, and great defenders can often play in attack-minded teams that concede quite a few goals. Football is a team sport and it is difficult to isolate the contribution of individual players.

However, the main job of a striker is to score goals, which is a convenient barometer by which to judge the ebb and flow of a career. By looking at how their average goals per game ratio varies over their careers, we can investigate the age at which they hit peak goal scoring ability. Obviously there are lots of factors in play: the number of chances created by teammates, the standard of the opposition, changes in position. But if we take a large enough sample of players some of this should average out, enabling us to measure the effects of aging.

I collected the career statistics of the 50 highest goal scorers in England, Italy and Spain since the 1992/93 season. This gave me a sample of 146 players (there’s a bit of overlap between the countries). In every year of their career, I measured the goals-per-game ratio for each player[1]. After a bit of smoothing and rescaling[2], I calculated the average profile across the 146 strikers. The result is shown in Figure 1.
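The smoothing-and-rescaling step (described in footnote [2]) can be sketched as below. The profile values are invented, and `rescaled_profile` is a hypothetical helper: it applies a centred moving average, then scales the profile so its peak year equals 1.

```python
# Sketch: smooth a player's goals-per-game profile with a moving average,
# then rescale so the peak equals 1 (so every player is weighted equally
# when averaging across the sample). Input values are invented.

def rescaled_profile(goals_per_game, window=3):
    smoothed = []
    for i in range(len(goals_per_game)):
        lo = max(0, i - window // 2)                      # window shrinks at the edges
        hi = min(len(goals_per_game), i + window // 2 + 1)
        smoothed.append(sum(goals_per_game[lo:hi]) / (hi - lo))
    peak = max(smoothed)
    return [v / peak for v in smoothed]

# A hypothetical five-season career: rise, peak, decline.
profile = rescaled_profile([0.2, 0.35, 0.5, 0.45, 0.3])
```

Averaging these unit-peak profiles across all 146 players, aligned by age, gives the curve in Figure 1.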

The plot shows how scoring rate depends on player age for the most proficient goal scorers to have played in England, Spain and Italy in the last 25 years. The curve has been scaled so that the peak – which is at age 26 – is equal to one. The shaded region is indicative of the level of sampling uncertainty.

Figure 1: The goal-scorer's aging profile: the rate at which the goals/game ratio varies over an elite striker’s career. The curve is scaled such that the peak is equal to 1. 

Footballers are generally perceived to peak in their mid-twenties, between the ages of around 24 and 29; the data in Figure 1 supports that notion. However, what surprises me is that the slope of the curve is not steeper as we move away from the peak in either direction. The profile implies that at the ages of 18 and 32 strikers are scoring at 80% of their peak rate, which seems quite high.

Part of the explanation for this is related to sample selection: I’m looking at the top goal scorers over the last 25 years and so their careers at the top level of football were naturally quite long (basically a form of survivorship bias). However, there is another, more interesting reason. Figure 2 shows the ageing profiles for the EPL and Serie A players separately. There is a striking difference (hah!).

Figure 2: The aging profiles for top strikers in the EPL and Serie A. Both curves are scaled such that their peaks are equal to 1. There is a clear preference for youth in the EPL and experience in Serie A. The grey region indicates sample uncertainty.

Players that play in the EPL score a substantial portion of their goals in their early twenties. There is a slow increase in the strike rate up to the peak at age 26 and then a more rapid drop-off. Serie A players are the opposite: they score predominantly in the latter half of their career. While their peak is only a year or two later than EPL players, there is a more gradual decline thereafter. The La Liga players – in case you’re wondering – are between the two.

I interpret this as resulting from the difference in the style of play between the two leagues. The frenetic pace of the EPL lends itself to the energy and potency of youth, while the slower, more cerebral style of Serie A requires the game intelligence which is, generally, the product of experience. 

This hypothesis is supported by player appearances. By the age of 22 the top strikers in the EPL have played 22% more league games than their Serie A counterparts. However, over the age of 30, strikers in Serie A play 18% more games. Young players get more opportunities in the EPL, older players are preferred in Serie A. An age difference is also evident in the national teams: at Euro 2016, the average age of the England squad was 25.4; the average age of the Italian team was 28.4. 

The relentlessness of the EPL may also expedite a player’s decline, while the more forgiving tempo of Serie A slows it. Hence strikers play on for longer in Italy. Totti was the youngest player ever to captain a Serie A team; he is also the oldest player to score in the UEFA Champions League.

In his autobiography, England manager Sam Allardyce suggested that players are at their best for around a decade. If we accept this as true, then perhaps it is no surprise that Rooney’s days as a leading goal scorer appear to be over. Of course, Ryan Giggs was able to remodel his playing style enabling him to play for Man Utd into his forties. Will Rooney also learn how to become a tortoise?

[1] If a player made less than 15 appearances in a given season, I ignored their ratio for that season. If a player played in multiple leagues in his career, I also only consider games played in the top divisions of England, France, Spain, Germany and Italy.
[2] Each profile is rescaled so that the goals/per game ratio is equal to 1 in the peak year. This is because I wanted all players to be weighted equally when taking the average profile over the sample. Before rescaling, I lightly smoothed the profiles with a moving average. 

Friday, 16 September 2016

What's the problem with English players?

The English national team seems to have hit an all-time low. After being humiliated by the mighty Iceland in Euro 2016, England fans must look back wistfully on the days when pundits would forecast that they would be “knocked-out by the first decent team that they meet”; it seems that these days England struggle to progress far enough to meet a decent team. You have to go back to 2002 to find the last time England beat one of the world’s top ten teams in a major international tournament.

A lot has been written about the impact of the EPL, in particular the proliferation of foreign players and the resulting difficulties faced by talented young English players to get playing time. On average, less than a third of the players starting games are English; 25 years ago, in the first year of the EPL, it was more than two thirds. However, while England certainly has the lowest fraction of home-grown players in Europe’s top-5 leagues, Italy and Germany (both around a half) are not much further ahead. Two thirds of the players starting games at the highest level in France and Spain are home-grown.

Another English Trade Deficit

If the competition is so tough at home, why don’t more English players go abroad? Surely they would benefit from the experience: a different style of football, exposure to new training methods and coaching techniques, tactical variations, the fans, weather, scheduling; perhaps even different refereeing standards.

Figure 1 shows the number of English, French, German, Italian and Spanish players that have moved abroad to one of the other big five European leagues in the last fifteen years[1].  For example, since 2000, five English players have moved to Spanish clubs in La Liga; over the same period, 48 Italian, 85 French and 15 German players have transferred to the top level of Spanish football.

Figure 1: The number of English, French, German, Spanish and Italian players that have moved abroad to play in another top-5 European league since the 2000/01 season.

When measured in players, England has a huge trade deficit with the other four countries (and in fact, with all other European countries). More than twice as many Italians have played in the EPL as English players have played in all the other top leagues put together. Spain, Italy and France have all exported more than 100 of their players since 2000; England has exported just 16 – and three of those are David Beckham[2].

Germany has the second fewest players to have moved abroad to one of the other big European leagues: 55 since 2000. But look closely, and you find that half of them played for the German national team while they were abroad; over the same period only four of the 16 English players played international football while playing abroad. It is, therefore, not unreasonable to infer that (at least when it comes to football) German exports are superior to English ones.  

It wasn’t always this way. The table below compares the number of players moving to the top leagues abroad in the periods 1960-1980, 1980-2000 and 2000-present. In the second half of the last century, there were generally fewer players playing abroad, but they were much more evenly distributed between nationalities. For example, more English players played in the top leagues abroad than Spaniards. However, since 2000, the number of Italian, Spanish and French footballers moving abroad has exploded; meanwhile, the number of English players has more than halved.

Table 1: Total number of players moving abroad to play in a top-5 European league since 1960.    

So is there no longer any demand for English players? Are they too expensive? Or are they just reluctant to move abroad? Certainly money must be a big factor - the average salary in the EPL is nearly double that of La Liga (e.g. see here and here), and so many English players would have to take a pay cut to move abroad. I don’t have the data, but I expect that the disparity is even greater for young players.

Where have all the Gaffers gone?

The lack of foreign playing experience may also have repercussions at the coaching and managerial level. This season, only five EPL teams have English managers; last season it was eight. Four EPL teams are currently managed by Italians, two by Spaniards. Holland, Portugal, Argentina, France and Germany each have one representative. 

In stark contrast, more than half of Bundesliga managers are German. Three quarters of Serie A teams are managed by Italians, and the same is true in Spain and France. Even in the Championship less than half the managers are English. There are precisely zero English managers working abroad in Europe’s major leagues. 

Perhaps this is a self-perpetuating cycle. Few English players move abroad, and so miss out on the breadth of experience that might make them better managers and coaches in the future. This in turn has a detrimental effect on the development of young English players, who are then less likely to be sought after by major foreign teams. And so it goes on. 

Gary Neville may have failed during his brief tenure at Valencia, but he gained experience that few other young English coaches have (and his brother is still there). I hope others follow his lead.

[1] Players that transferred to a foreign club pre-2000 are not included.
[2] Beckham is included three times as he played for Real Madrid in Spain, PSG in France and AC Milan in Italy.