Friday, 28 October 2016

Want to know where your team's gonna finish?

We’re nearly a quarter of the way through the EPL season and the league already has a familiar feel to it. Manchester City are top, Arsenal are above Spurs, and Sunderland anchor the table having failed to win a single game so far. There is clearly a lot of football still to be played, but does the table already resemble how it’ll look come the end of May?

Conventional wisdom tells us that the turn of the year is a crucial period. By the beginning of January we are supposed to have a good idea of how things are shaping up. In 9 of the last 20 EPL seasons, the team that was top in January went on to win the league, and 56% of teams in the bottom three on New Year’s Day went on to be relegated. However, you get pretty much the same results if you measure these stats at the beginning of December or the beginning of February, so perhaps we don’t learn that much over the Christmas period after all.

In this post I’m going to look back over the last 20 seasons to investigate how the league table actually evolves over a season and, in particular, when in the season we start to have a reasonable picture of where teams might finish.

Rank Correlations


A good starting point is to measure the correlation between the final league positions and those at some earlier point in the season. Essentially you’re measuring the degree to which the orderings of the teams are the same. If the team rankings were identical, we’d measure a correlation of 1; if they were completely different we’d expect the correlation to be close to zero.
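As a toy illustration, here’s a minimal Python sketch of that calculation. The five-team table and the `spearman_rho` helper are made up for this example; with no tied ranks, Spearman’s closed-form formula applies:

```python
def spearman_rho(rank_a, rank_b):
    """Spearman rank correlation between two rankings of the same teams.

    Assumes both inputs are permutations of 1..n (no ties), so the
    closed-form 1 - 6*sum(d^2) / (n*(n^2 - 1)) applies.
    """
    n = len(rank_a)
    d2 = sum((a - b) ** 2 for a, b in zip(rank_a, rank_b))
    return 1 - 6 * d2 / (n * (n ** 2 - 1))

# Toy five-team league: the mid-season order is close to the final one.
midseason = [1, 2, 3, 4, 5]   # position after an early game week
final     = [1, 3, 2, 4, 5]   # position at the end of the season
print(spearman_rho(midseason, final))   # identical orders would give 1.0
```

Swapping just one pair of neighbours, as above, still gives a correlation of 0.9, which is why a table that has roughly the right teams in roughly the right places scores so highly.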

Figure 1 shows the correlations between the league rankings after each game week and the rankings at the end of the season, for the last 20 EPL seasons. The grey lines show the correlations for the individual seasons; the red line shows the average.

Figure 1: The correlation between the league rankings after each gameweek and the final rankings at the end of the season. Grey lines show results for each of the last 20 EPL seasons, the red line shows the average correlation for each gameweek.

The most striking thing about this plot is that the correlation rises so quickly at the beginning of the season. You get to an average correlation of 0.8 - which is very high[1] - by the 12th round of games. There’s some variation from season to season, of course, but the general picture is always the same: we learn rapidly in the first 12 or so games, and then at a slower, even pace over the rest of the season.

This implies that we know quite a lot about how the final league rankings will look after just a third of the season. But there’s no mantra that states ‘top at Halloween, champions in May’, so why is the correlation so high so soon, and what does it actually mean?

Leagues in leagues


I think the explanation lies in what are sometimes referred to as ‘mini-leagues’. The idea is that the EPL can be broken down into three sub-leagues: those teams competing to finish in the top four (the Champions League places), those struggling at the bottom, and those left in the middle, fighting for neither the riches of the Champions League nor for their survival.

Figure 2 demonstrates that these mini-leagues are already established early in the season. It shows the probability of each team finishing in the top four (red line) or bottom three (blue line), based on their ranking after their 12th game. The results were calculated from the last 20 EPL seasons.

Figure 2: The probability of finishing in the top four (red line) or bottom three (blue line) based on league position after 12 games. The red, white and blue shaded regions indicate the three ‘mini-leagues’ within the EPL.

The red-shaded region shows the ‘top’ mini-league: the teams with a high chance of finishing in the Champions League places. Teams below 7th place are unlikely to break into this elite group. Similarly, teams placed 14th or above are probably not going to be relegated; those between 7th and 14th position therefore make up the middle ‘mini-league’. Teams in the bottom third seem doomed to be fighting relegation at the end of the season: they form the final mini-league.
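The curves in Figure 2 are just empirical frequencies. A minimal sketch of that calculation, using a hypothetical `seasons` structure and a toy two-season dataset rather than the real 20 seasons of tables:

```python
from collections import defaultdict

def finish_probabilities(seasons):
    """Empirical P(top four) and P(bottom three) by rank after 12 games.

    `seasons` is a hypothetical dataset: one dict per season, mapping each
    team to (rank after 12 games, final rank) in a 20-team league.
    """
    counts = defaultdict(lambda: [0, 0, 0])   # rank -> [top-4, bottom-3, total]
    for season in seasons:
        for rank12, final in season.values():
            c = counts[rank12]
            c[0] += final <= 4     # finished in the top four
            c[1] += final >= 18    # finished in the bottom three
            c[2] += 1
    return {r: (t4 / n, b3 / n) for r, (t4, b3, n) in sorted(counts.items())}

# Toy two-season dataset tracking three teams (real data would cover all 20):
seasons = [
    {"A": (1, 1), "B": (8, 10), "C": (20, 19)},
    {"A": (2, 5), "B": (8, 8),  "C": (19, 20)},
]
print(finish_probabilities(seasons))
```

With 20 seasons of history, each rank gets 20 observations, which is what smooths the red and blue curves into the shapes shown in Figure 2.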

The high correlation we observed after twelve games in Figure 1 is a consequence of these mini-leagues. It’s entirely what you’d expect to measure from a table that is already clustered into three groups – top, middle and bottom – but where the final ordering within each group is still to be determined.

I’m not suggesting that membership of the mini-leagues is set in stone – there’s clearly promotion and relegation between them throughout the season (and yo-yo teams) – but by November there is a hierarchy in place. Even at this relatively early stage of the season, most teams will have a reasonable idea of which third of the table they are likely to end up in.

Finally, awareness of this may also explain the recent increase in the number of managers getting sacked early in the season. Last year three EPL managers lost their jobs before the end of October and we've already lost one this season. If Mourinho doesn't find himself in the top eight after the next few games, the pressure may ramp up several notches higher.

--------------------------

Thanks to David Shaw for comments.

[1] And certainly significant.

Thursday, 13 October 2016

Forecasting Football: a hedgehog amongst the foxes.

“I never make predictions and I never will.” – Paul Gascoigne

For football fans, making predictions is half the fun. It’s also big business: the global sports betting market is estimated to be worth roughly half a trillion pounds, and 70% of the trade is thought to come from football.

We all make predictions, but some of us are better than others. In his book, The Signal and the Noise, Nate Silver describes two categories of predictor: hedgehogs and foxes[1]. Hedgehogs tend to have strong pre-conceived notions of how the future will pan out, a view of the natural order of things that they can be reluctant to change. They are confident and assertive. Foxes, on the other hand, are not anchored to a particular world-view: they are more willing to change their position as new information arrives. Foxes try to weigh up all the available evidence; often their predictions are probabilistic in nature.

In football, TV pundits are hedgehogs: ex-players and managers chosen to provide us with their insider perspective. But who are the foxes? I think the closest example is the bookmakers, many of whom now rely on statistical models to forecast the outcome of matches and set their odds.

How do pundits and bookmakers measure up? In this post I compare the forecasts of a well-known football pundit with some of the UK’s largest bookmakers. We'll see who comes out on top.

Lawrenson vs the Bookmakers


It’s difficult to collect a large sample of predictions from pundits. Fortunately, Mark Lawrenson comes to the rescue. For those who are not familiar with him, Lawrenson is a former Liverpool player who now makes regular TV appearances as a pundit for the BBC. For the last six seasons he has also provided them with his predictions for the outcome of almost every EPL game.[2]

So how do we compare Lawrenson’s intuition with the bookmakers’ betting odds? I think the simplest answer is also the most fun: run an experiment in which I imagine having bet on every prediction he made from the 2011/12 season through the 2015/16 season, and see how much money I would have won or lost along the way.

Here’s how the experiment works: for each game I collected the odds offered by four bookmakers on a home win, away win and draw[3]. I identified the bookmaker that offered the best odds for Lawrenson’s predicted outcome and bet £1 on that outcome. I then tracked the cumulative profit/loss over the 1856 games that he attempted to predict (he misses a few each season). The results are shown in Figure 1.
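A minimal sketch of the bookkeeping behind the experiment, with a hypothetical `games` structure (the pundit’s pick, the actual result, and the four bookmakers’ decimal odds on the picked outcome) standing in for the real dataset:

```python
def run_betting_experiment(games, stake=1.0):
    """Cumulative profit/loss from backing a pundit's pick at the best odds.

    `games` is a hypothetical dataset: for each match, the pundit's pick,
    the actual result ('H', 'A' or 'D'), and the decimal odds that each
    bookmaker offered on the picked outcome.
    """
    pnl, history = 0.0, []
    for pick, result, odds_on_pick in games:
        best = max(odds_on_pick)   # back the pick with the most generous bookmaker
        pnl -= stake               # stake the bet
        if pick == result:
            pnl += stake * best    # decimal odds pay out stake * odds
        history.append(pnl)
    return history

# Toy example: one correct pick taken at the best price (2.1), then one loser.
games = [
    ("H", "H", [2.0, 2.1, 1.9, 2.05]),
    ("A", "D", [3.0, 3.2, 3.1, 2.9]),
]
print(run_betting_experiment(games))   # ends roughly £0.10 up
```

Always taking the best available price matters: it is the cheapest way to blunt the margin each individual bookmaker builds into its odds.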

Figure 1: The cumulative profit/loss generated from betting £1 on each of Mark Lawrenson’s match predictions over the last 5 EPL seasons (nearly 2000 matches).

After a shaky first season, Lawrenson’s predictions do pretty well. If you had bet £1 on each of his predictions since the beginning of the 11/12 season you would have made a £105 profit by the end of last season; this equates to an average return of roughly 6% per bet. On a season-by-season basis, you’d have made a profit in all but the first season, with 12/13 being the most profitable year. He had particularly good stretches at the beginning of the 12/13 and 14/15 seasons, with runs of 11 and 14 correct outcomes, respectively.

Lawrenson correctly predicted the outcome in 938 of the 1856 matches, a success rate of 51%. To put this in context, 44% of the matches ended in a home win, 31% in an away win and 25% in a draw. If you just bet on a home win in every game you’d be down £51 by the end of the five seasons.

How significant is this? What is the probability that you could be at least £105 in profit after 5 years by pure luck alone? This is an important question that requires a technical answer.

I reran the full 5-year experiment 10,000 times, randomly assigning a prediction of home win, away win or draw to every game. The rates of home wins, away wins and draws in each season were fixed to be the same as in Lawrenson’s forecasts; I basically just shuffled his forecasts around between matches. The result: in only 156 of the 10,000 simulations did the profits exceed £105 – an empirical p-value of roughly 1.6% – indicating that it would be very unlikely to make this level of profit by chance. So he has demonstrated real skill in his forecasts.
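A minimal sketch of that simulation. For simplicity it shuffles the picks within a single pool rather than season by season, and the `shuffle_test` helper and its toy inputs are made up for illustration:

```python
import random

def shuffle_test(picks, results, odds, observed_profit, n_sims=10_000, seed=0):
    """Estimate how often randomly reassigned predictions beat a given profit.

    Keeps the pundit's overall mix of picks but reassigns them to matches at
    random (pooling all matches rather than shuffling season by season).
    All inputs are hypothetical: parallel lists of picks and results, with
    odds[i] mapping each outcome of match i to the best decimal odds offered.
    """
    rng = random.Random(seed)
    exceed = 0
    for _ in range(n_sims):
        shuffled = picks[:]
        rng.shuffle(shuffled)
        profit = sum(odds[i][p] - 1 if p == results[i] else -1
                     for i, p in enumerate(shuffled))
        if profit > observed_profit:
            exceed += 1
    return exceed / n_sims   # empirical p-value

# Toy example: three matches; the 'observed' profit of 2.25 is what the
# original picks ("H", "A", "D") earn against results ("H", "H", "D").
p_value = shuffle_test(["H", "A", "D"], ["H", "H", "D"],
                       [{"H": 2.0, "A": 3.5, "D": 3.25}] * 3,
                       observed_profit=2.25)
```

On this tiny example no reshuffling can do strictly better than the original picks; on the real 1856-match dataset the analogous count was 156 out of 10,000.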

Has Lawrenson beaten the bookmakers? I think the answer is yes. That does not necessarily mean that he is a superior predictor, though. If you were to perform the experiment again, this time selecting the outcome with the shortest odds as your prediction, you’d be right in 53% of matches – a higher success rate than Lawrenson’s. However, if you had placed bets on those outcomes you would have lost £47[4]. So Lawrenson’s predictions make their money on the less favoured outcomes – those with longer odds. Indeed, he only picks the outcome with the shortest odds about two-thirds of the time.

Finally, I’m keeping track of Lawrenson’s hypothetical P&L for the 16/17 season. You can find it here. If I can get hold of the data, I’ll add other pundits to this, too.


--------
Thanks to David Shaw and Brent Strickland for comments.

[1] This goes back to the essay The Hedgehog and the Fox by the philosopher Isaiah Berlin.
[2] These guys look like they have collected data for other pundits, too. I’ll see if I can get hold of it and add it to this analysis.
[3] The four bookmakers are Ladbrokes, William Hill, Bet365 and Bet Victor.
[4] Presumably because bookmakers shorten the odds they offer to generate a profit.


Wednesday, 5 October 2016

The managerial merry-go-round spins ever faster

We’re less than a quarter of the way into the season and the great managerial merry-go-round has already shed its first passengers. Swansea City's former manager Francesco Guidolin was the EPL’s first casualty and five EFL managers have been relieved of their duties. Sam Allardyce, the now former England manager, also terminated his contract last week following a Daily Telegraph investigation into his conduct.

Guidolin was Swansea’s manager for only 259 days. Roberto Di Matteo was removed as Aston Villa’s manager after 121 days. The longest serving of the recently departed was Tony Mowbray, who lasted under two years at Coventry City. Over the course of last season 58 managers were fired; the season before that it was 47.

It certainly feels like managerial tenures are getting shorter and shorter, but is this part of a long-term trend or a recent phenomenon of the money-spinning era? And does it really make much sense to frequently change manager?

Diminishing patience, at all levels


To answer the first question, I put together a dataset containing every manager of a professional English club (i.e. top four divisions) since 1950, and measured the total number of league matches that each of them managed[1]. I then aggregated the data into 10-year blocks and looked at the distribution of the duration of managerial tenures (measured in number of matches) within each block. The results are shown in Figure 1.

You read the plot in the following way: each line represents a certain percentile of managers leaving their jobs, from 10% (bottom line) to 90% (top line). For example, the solid black line in the middle indicates the number of matches by which half of managers had left their club. The top-most line indicates the number of matches by which 90% of managers had left.

Figure 1: The diminishing survival rate of managers in professional English football since the 1950s. Each line represents the number of matches (or seasons, right axis) by which a given percentage of managers, from 10% (bottom line) to 90% (top line), had left their post.

At the beginning of the millennium, 50% of managers would leave their job by the 80th league match after their appointment (roughly two seasons), and 90% had left after 200 matches (5 seasons). Or, to put it another way, only 10% of managers survived to see their 200th game. Go back to 1970, and you see that at least 50% of managers were around long enough to oversee their 130th match in charge of a single club (3 seasons) and more than 10% of managers survived long enough to see their 300th match.
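The percentile lines in Figure 1 can be computed with a simple nearest-rank rule. A minimal sketch on made-up spell lengths (the real dataset covers every managerial spell since 1950):

```python
def tenure_percentiles(tenures, percentiles=(10, 50, 90)):
    """Number of matches by which a given percentage of managers had left.

    `tenures` is a hypothetical list of total league matches managed, one
    entry per managerial spell within a 10-year block; uses a nearest-rank
    percentile rule.
    """
    ordered = sorted(tenures)
    n = len(ordered)
    out = {}
    for p in percentiles:
        rank = (n * p + 99) // 100        # ceiling of n*p/100, in pure ints
        out[p] = ordered[max(0, rank - 1)]
    return out

# Toy block of ten spells (matches managed):
spells = [12, 25, 40, 55, 80, 95, 130, 160, 210, 300]
print(tenure_percentiles(spells))   # -> {10: 12, 50: 80, 90: 210}
```

Repeating this for each 10-year block and plotting the three values over time gives curves of the kind shown in Figure 1.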

The duration of managerial appointments has steadily declined over time, to the extent that it has basically halved over the last forty years.  These days more than 50% of managers will not see out two seasons; 25% will barely last a single season. Interestingly, there is no evidence that the rate of managerial turnover has increased in the last twenty years. Given that the rewards of success and the costs of failure have been greatly magnified in recent times, I had expected to see that club owners have become increasingly less patient.

I don’t think that there is much evidence that changing manager is likely to lead to any improvement in a club’s fortunes – essentially you’re just rolling the dice again and hoping the next guy does better. In principle there is no problem with that, but in practice there can be a big cost to starting over again.

The costs of changing manager


Football managers have an enormous amount of power at their clubs. Not only do they decide team selection, they oversee training, tactics, scouting, and handpick their own coaching staff. Crucially, they also decide transfer targets.

Clubs spend a vast amount of money on player recruitment. Last summer, EPL clubs spent a total of £1.3 billion on transfer fees, and roughly the same the year before. They also spent roughly £130m on agents’ fees. Furthermore, when a new manager arrives at a club he is often promised a sizeable chunk of cash to bring in the players he wants. In the last five years, EPL clubs have tended to spend considerably more when they have just hired a new manager.

But here’s the rub: new players are typically brought in on 4 or 5-year contracts. If only 25% of managers survive to the end of their third year, the players a manager brings in will invariably outlast him at the club. A new manager is then hired, who will identify and recruit the players that suit his preferred way of playing, most of whom will then outlast him too.

Taking this to its logical conclusion implies that clubs that frequently change manager may end up with an incoherent set of players, some of whom may be surplus to requirements under the next manager. Given the increasing cost of agents’ fees – not to mention the costs of buying a manager out of his contract[2] – this seems like an inefficient way of running an organization. If the majority of managers leave by the end of their second season, maybe clubs should be more wary about allowing them to buy and sell as they please.

The key seems to be continuity, not in manager retention but in manager recruitment. Establish a style of play and then consistently bring in managers that will largely adhere to it. Although, as Swansea City are finding out, this is perhaps easier said than done.


----------

While writing this I discovered two other blogs that have discussed the long-term decline of manager tenures (here and here).

[1] I removed caretaker managers, which I define as any manager who oversaw fewer than 10 games.
[2] For example, Man Utd paid David Moyes £4.5m to leave, and Louis Van Gaal £8m.