Wednesday, 7 June 2017

Tinker, Tailor, Mould a Side: Squad Rotation in the EPL

One of the key features of Chelsea’s title-winning season was the consistency of their starting lineups. After the 3-0 thrashing by Arsenal in September, Conte switched to a 3-4-3 formation and Chelsea embarked on a 13-match winning streak in the league that ultimately propelled them to the title. The foundation of this formation – Luiz, Cahill and Azpilicueta – started each of the next 32 games, and the wing-backs, Moses and Alonso, missed only three between them.

Such consistency is partly due to luck with injuries and suspensions, but Conte also resisted the temptation to tinker with his team. Other managers opted to ring the changes, for tactical purposes or to rest players. In the closing weeks of the season Mourinho was compelled to defend his rotation policy, citing fixture congestion and the need to maximize his chances of Europa League success. However, frequent changes to United’s starting lineup were a feature of their entire season, not just the final few months.

In this article I’m going to take a detailed look at how EPL clubs utilized their squads throughout the season. I’ll compare the rate at which managers ‘rotated’ their teams (which I define simply as the number of changes made to their starting lineup) and the number of players they used in doing so. I’ll investigate some of the factors that may have influenced a manager’s decision to fiddle with his lineup. Finally I’ll discuss whether rotation had an impact on results.

Squad Rotation


Let’s start with a look at squad size and rotation. Figure 1 plots the average number of changes made to the starting lineup against the total number of players used by each EPL club last season.

Clubs on the left-hand side of the plot preferred to maintain the same starting lineup, changing about one player per match. Those plotted towards the right of the plot varied their team more frequently. The vertical axis measures effective squad size – the number of players that started at least one EPL match[1]. Teams plotted towards the bottom of the plot picked their lineups from a relatively small group of players; those plotted nearer the top chose them from a larger pool.

Figure 1: Squad rotation (average number of changes made to the starting lineup) versus effective squad size (number of players that started at least one league match) for all EPL clubs in 2016/17. Uses data provided by Stratagem Technologies.
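
As a concrete illustration of how the two quantities in Figure 1 can be computed, here’s a minimal sketch. The lineup format is an assumption for illustration only, not StrataData’s actual schema:

```python
# Sketch: computing the two axes of Figure 1 from a season's starting lineups.
# Assumes `lineups` is an ordered list of sets, one per match, each holding the
# 11 starters (a hypothetical format, not StrataData's actual schema).

def rotation_and_squad_size(lineups):
    # Rotation: players in each lineup who did not start the previous match,
    # averaged over all match-to-match transitions.
    changes = [len(curr - prev) for prev, curr in zip(lineups, lineups[1:])]
    avg_changes = sum(changes) / len(changes)
    # Effective squad size: players who started at least one league match.
    effective_squad = len(set().union(*lineups))
    return avg_changes, effective_squad

# Toy example with players labelled by shirt number:
lineups = [set(range(1, 12)), set(range(1, 12)), set(range(2, 13))]
print(rotation_and_squad_size(lineups))  # (0.5, 12)
```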

Both quantities plotted in Figure 1 are important. A manager could adopt a highly structured rotation policy in which three players are changed in each match but are chosen from a small squad of only 14 players; this club would appear in the bottom right of the plot. A manager that was struggling to find his best eleven might make the same number of changes per match but from a much larger pool of players; this club would appear near the top right of the plot.

On average, EPL clubs made around two changes per match to their starting lineups, from an effective squad size of twenty-five players. As you might expect, there is clearly a relationship between squad size and rotation: the more frequently a club rotated, the greater the number of players they tended to use. West Brom, Chelsea, Burnley and Liverpool, who made just over one change per game, fielded the most consistent lineups. Along with Spurs, they also used the fewest players[2].

At the other end of the scale are the two Manchester clubs – both of whom made over three changes per game to their starting lineup – followed by Southampton, Middlesbrough, Swansea and Sunderland. Man Utd and Sunderland, along with West Ham and Hull, all used at least 28 players over the season (admittedly, United started 5 players for the first time in their last game of the season).

So there was quite a broad spectrum of squad management styles in the EPL this season, with some clubs rotating more than twice as fast as others and using nearly 50% more players. Why is this? To what extent were team changes enforced, and to what extent a matter of choice? I’ll now review some of the factors that may have influenced team selection.

Injuries


Injuries and suspensions will have forced managers to make changes to the team. According to physioroom.com, Sunderland suffered the most injuries of all EPL clubs last season, 81 in total[3], with Man United, West Ham and Watford all receiving over 70. Chelsea, West Brom and Burnley were much luckier, suffering about half as many. Liverpool were the odd one out: the only other team to suffer over 70 injuries, yet they still fielded one of the most consistent starting lineups. I haven’t found data listing the total number of players suspended at each club last season, but Man City, Sunderland, Hull, West Ham and Watford all received at least four red cards, whereas Liverpool, Chelsea, Spurs, Palace and West Brom received none.

In general, both Transfermarkt’s ‘fair play’ score and physioroom’s injury counts correlate only weakly with the squad rotation metric used in Figure 1. This suggests that while injuries and suspensions will have contributed to squad rotation, they were not the main driver.
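
For those curious what such a check looks like in practice, here is a minimal sketch; the numbers below are placeholders, not the real physioroom or Transfermarkt figures:

```python
# Sketch: measuring how strongly injury counts track rotation rates.
# The values below are placeholders, not the real physioroom figures.
from scipy.stats import pearsonr

injuries = [81, 45, 40, 43, 71, 72, 70, 55, 60, 65]            # injuries per club
rotation = [3.3, 1.2, 1.1, 1.1, 1.3, 3.1, 2.5, 2.0, 2.2, 2.4]  # changes per match

r, p = pearsonr(injuries, rotation)
print(f"r = {r:.2f} (p = {p:.2f})")  # a modest r would match 'weak correlation'
```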

Fixture Volume


Fixture volume over all competitions seems to have influenced some clubs’ selection decisions. EPL teams played an average of 47 matches this season (which is exactly the number that Chelsea played); Man United played 64 – over a third more – and Man City 56. Generally speaking, teams that played more than 50 matches tended to rotate their league teams more frequently than those that played fewer, although Sunderland and Middlesbrough rotated heavily despite playing only 43 matches each. European competition is one of the biggest sources of additional matches; in a previous blog I’ve demonstrated that playing in Europe does affect domestic results in the EPL.

A Settled Defence


A key feature of the clubs that rotated the least was a settled defence. Throughout the season Burnley and Chelsea fielded only 6 unique combinations of players in defence, and their preferred defence started more than 25 league matches. In contrast, most EPL teams tried more than 15 different combinations of players in defence, with their most frequent combination typically starting around 12 matches. The teams that rotated the most – those towards the right of Figure 1 – never really established a first-choice defence.

The following plots emphasize this point. They show the starting lineups of Burnley and Man City in each of their 38 league matches last season. A dot indicates that a player started the match: blue dots mark players retained from the previous starting lineup, red dots those brought into the team. The result of each game is given along the bottom. Similar graphics for all EPL teams can be found here.

The difference between the selections in defence is striking. Both clubs used seven defenders over the course of the season. However, while Burnley’s first-choice back four is obvious, City’s certainly is not. Over the course of the season they tried 21 different combinations of players in defence, which is well over half the total number of unique ways you can select 4 players from 7[4]. City’s most frequent combination in defence was Kolarov, Otamendi, Sagna and Clichy; they started a grand total of 4 matches together.
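
Counting these combinations is straightforward. A small sketch, with the per-match back fours as a hypothetical input:

```python
# Sketch: counting the back fours a team actually fielded against the
# theoretical maximum. `back_fours` is a hypothetical per-match list of sets.
from math import comb

def unique_defences(back_fours):
    return len({frozenset(d) for d in back_fours})

max_combos = comb(7, 4)  # 35 ways to pick a back four from 7 defenders
back_fours = [{"Kolarov", "Otamendi", "Sagna", "Clichy"},
              {"Kolarov", "Stones", "Sagna", "Clichy"}]
print(unique_defences(back_fours), "of", max_combos)  # City managed 21 of 35 (60%)
```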



Indecision


It took some managers several matches at the start of the season to identify the core players around which their team could be constructed. Others took much longer to decide on their strongest lineup, and a few never did.

For example, David Moyes never really figured out his best team as the plot below demonstrates. He deployed 36 unique combinations of players during the season (nearly double that of Chelsea), and there was a lack of consistency in defence and midfield, both in terms of personnel and formation. Jose Mourinho also tried a large number of different combinations in every position, particularly in the second half of the season. While United’s rotation frequency certainly increased in the last couple of months of the season, they were already rotating at over 3 players per game before Christmas.



Does rotation matter?


Is there any evidence that rapid squad rotation influences results? This is a tricky question to answer because we don’t know how a team would have performed had they been more or less consistent in their team selection. Periods of high rotation do seem to have coincided with worse results for many teams (West Ham, Watford, Crystal Palace and Swansea, to name a few). However, there is a bit of a chicken-and-egg issue: poor results may compel a manager to change his team until he finds a winning formula.

I find no significant relationship between squad rotation and final league position last season. However, I would hazard the suggestion that the teams that prioritized stability and a tight-knit squad – those nearest the bottom left corner of Figure 1 – nearly all had successful seasons by their own standards. Crystal Palace are perhaps the exception, but the rate at which they varied their starting lineup dropped significantly (from two changes per game to one) in the second half of the season once Sam Allardyce took charge.

Similarly, those clubs that rotated frequently from a big squad generally had a disappointing year relative to pre-season expectations: City failed to mount a sustained title challenge, United finished sixth, and Hull, Swansea, West Ham, Middlesbrough and Sunderland were either relegated or flirted with relegation.

Perhaps this is just postdiction, but I think it warrants further investigation. It would be interesting to establish whether the performance of a team tends to decline towards the end of a long season if players are not rested. Are big squads problematic if managers are forced to rotate simply to keep their players happy? Does rotation interrupt momentum?

No Europe and a lack of injuries have helped, but the last two EPL seasons have been won by clubs that identified their best 11 players and stuck with them; tailoring and not tinkering. As clubs recruit over the summer we’ll see whether this is a theme that has started to resonate.

Thanks to David Shaw for useful comments. Lineup graphics for all EPL teams can be found here.

This article was written with the aid of StrataData, which is property of Stratagem Technologies. StrataData powers the StrataBet Sports Trading Platform, in addition to StrataTips.


[1] I’ve investigated other measures of squad size, in particular counting the number of players that started at least 2 or 3 games; none of the conclusions would change significantly.

[2] Note also that Chelsea used three players for the first time in their second-to-last game, when they had already won the league.

[3] Injuries are counted from the first weekend of the season. It’s worth noting that this count includes injuries to squad players who weren’t necessarily in the starting lineup.

[4] Furthermore, City also experimented with Fernandinho or Navas as right full backs, so arguably their defence came from an even bigger pool, and was even more erratically selected.

Friday, 21 April 2017

Why do EPL clubs prefer to recruit foreign managers?

A few weeks ago I read a very interesting article by Sean Ingle in the Guardian comparing the performance of foreign managers in the EPL with their British & Irish counterparts. Using data compiled at the University of Bangor in Wales, the headline statistic was that foreign managers have averaged 1.66 points per game since 1992/93, compared to just 1.29 for those from the UK & Ireland. As Ingle points out, this amounts to a whopping 14 extra points over the course of the season.

My first thought was that these results might be misleading because of a potential selection bias. If overseas managers have tended to work for the top EPL clubs then of course they would have a higher average points per game, simply because a larger proportion of them managed big clubs than their domestic counterparts. In that case it’s not a fair comparison. In the first part of this blog I will look at this in more detail: will the result reported in the Guardian stand up to further scrutiny?

Nevertheless, with only seven of the twenty EPL clubs starting this season with a British or Irish manager, it’s clear that clubs are showing a preference for recruiting from overseas. In the second part of this blog I’ll discuss one of the factors that may be motivating EPL clubs to hire foreign managers.

The rise of the foreign manager


Figure 1 shows the breakdown of managers by nationality for each EPL season since 1992/93 (ignoring caretaker managers[1]). The red region represents English managers, blue the rest of the UK (Scotland, Wales and Northern Ireland), green the Republic of Ireland, and grey the rest of the world. The results are presented cumulatively: for example, this season 28% of EPL managers (7) have been English and 12% (3) were from Scotland and Wales; the remaining 60% of managers in the EPL this season have been from continental Europe (13), South America (1) or the US (1).

Figure 1: Stacked line chart showing the proportion of EPL managers by nationality in each season since 1992/93. Current season represented up to the 1st March 2017. The proportion of managers that are English managers has fallen from two-thirds to one-third over the past 24 years.
The figure shows a clear trend: the number of English managers has significantly declined over the last 24 years. Back in 1992, over two-thirds of managers in the EPL were English and 93% were from the UK as a whole. Since then, the proportion of English managers has more than halved, replaced by managers from continental Europe and, more recently, South America[2].
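
The numbers behind a chart like Figure 1 amount to a season-by-region cross-tabulation. Here’s a sketch with toy records standing in for the real list of appointments:

```python
# Sketch: the season-by-region proportions behind Figure 1.
# `managers` is a toy stand-in for the real (season, manager) records.
import pandas as pd

managers = pd.DataFrame({
    "season": ["1992/93"] * 4 + ["2016/17"] * 4,
    "region": ["England", "England", "England", "Scot/Wales/NI",
               "England", "Rest of world", "Rest of world", "Rest of world"],
})

shares = pd.crosstab(managers["season"], managers["region"], normalize="index")
print(shares)
# shares.plot.area(stacked=True) would reproduce the cumulative presentation
```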

Is the trend towards foreign managers driven by their supremacy over domestic rivals?


The table below compares some basic statistics for UK & Irish managers with those of managers from elsewhere. Excluding caretaker managers, there have been 283 managerial appointments in the EPL era, of which over three-quarters have been from the Home Nations or the Republic of Ireland. Of the 66 foreign EPL appointments, nearly half were at one of the following Big6 clubs: Man United, Arsenal, Chelsea, Liverpool, Spurs and Man City[3]. However, only 12% of British or Irish managerial appointments have been at one of these clubs. This is the selection bias I mentioned at the beginning – the top clubs are far more heavily weighted in one sample than the other.


At first glance, foreign managers have performed better: collecting 1.66 points/game compared to 1.29 for their UK & Irish counterparts (reproducing the results published in the Guardian article). However, this difference is entirely driven by the Big6. If you look at performance excluding these clubs it’s a dead heat – foreign managers have performed no better than domestic ones, both averaging 1.2 points per game.
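
The calculation behind this is a grouped average, computed with and without the Big6. A sketch with toy numbers in place of the real appointment records:

```python
# Sketch: points-per-game by manager origin, controlling for the Big6.
# `spells` is toy data standing in for the real managerial records.
import pandas as pd

BIG6 = {"Man United", "Arsenal", "Chelsea", "Liverpool", "Spurs", "Man City"}

spells = pd.DataFrame({
    "nationality": ["Foreign", "Foreign", "UK/Ireland", "UK/Ireland"],
    "club":        ["Chelsea", "Watford", "Man United", "Sunderland"],
    "points":      [180, 90, 160, 80],
    "games":       [80, 76, 80, 76],
})

def ppg_by_origin(df):
    g = df.groupby("nationality")
    return g["points"].sum() / g["games"].sum()

print(ppg_by_origin(spells))                              # all clubs
print(ppg_by_origin(spells[~spells["club"].isin(BIG6)]))  # Big6 excluded
```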

At the Big6 clubs, foreign managers have collected 0.2 points/game more than their UK counterparts. This difference is almost entirely driven by Chelsea and Man City, where foreign managers have collected 0.8 and 0.7 points per game more than UK & Irish managers[4]. But since Abramovich bought Chelsea in 2003, they have not hired a single British or Irish manager[5]. It’s a similar story at Man City: in only one and a half of the nine seasons since the oil money started to flow into Manchester have they had a British manager (Mark Hughes). Both clubs had very different horizons before and after their respective cash injections, and they have hired exclusively from abroad since[6].

So it seems that, when you look closely, you find little convincing evidence that foreign managers have performed better than domestic managers in the EPL era. Why then do clubs prefer to look beyond these shores?

Access to foreign markets


Previous success is clearly a key criterion in manager recruitment, but I wonder if there are specific attributes that give foreign managers an edge over English candidates. In particular, foreign managers have local knowledge and contacts that might give a club the edge over domestic rivals in signing overseas talent. You could argue that Wenger’s initial success at Arsenal was influenced by his ability to identify and sign top French players at a time when France was dominating international football. Rafael Benitez certainly mined his knowledge of Spanish football to successfully bring a number of players to Liverpool.

In hiring foreign managers, do clubs improve their access to transfer markets overseas? As the table above shows, foreign managers sign players from abroad at roughly twice the rate of domestic managers -- an average of 5 per season compared to the 2.6 per season signed by their British or Irish counterparts. The result does not change significantly if you exclude the Big6 clubs, or if you only look at players signed in the last 15 years. 

This doesn’t prove the hypothesis that clubs sign foreign managers to improve access to foreign players, but it does support it. Of course, being British isn’t necessarily a barrier to signing top overseas talent; after all, Dennis Bergkamp, arguably Arsenal’s greatest ever import, was bought by Bruce Rioch. But in an era in which English players come at a premium, it makes sense for clubs to hire managers that will enable them to lure high quality players from the continent.

------------------------

Thanks to David Shaw and Tom Orford for comments.

[1] I define a caretaker manager as one that remained in post for less than 60 days.
[2] The proportion of managers from Scotland, Wales and Northern Ireland has generally remained stable at about 25% (although very recently it has fallen).
[3] The first five are the five best finishers in the EPL era, on average. I decided Man City warranted inclusion because of their two EPL titles.
[4] Of the others, Wenger and Ferguson largely cancel each other out and foreign managers have performed only marginally better at Spurs and Liverpool.
[5] Indeed you have to go all the way back to Glenn Hoddle’s departure in 1996 to find Chelsea’s last British or Irish manager.
[6] Mark Hughes was appointed before the Abu Dhabi group bought Man City. 


Wednesday, 29 March 2017

Keep calm and let them carry on: are mid-season sackings worth it?

It’s February and your club is in trouble. Following a run of poor results, they are hovering just above the bottom three. Fans and pundits alike are writing them off. The remainder of the season is destined to be a grim struggle for the points: a few snatched draws, the odd scrappy win, but mostly meek surrender to mid-table and above teams.

The board panics and fires the manager; it seems the only remaining option. Granted, he did well last season – brought in some good players and promoted others, got them playing attractive football. But now the team needs defibrillation: a new manager with fresh ideas, inspiring players keen to prove themselves to him. A five-game honeymoon period and, come spring, everything will be rosy again. After all, it worked so well for Sunderland last season.

This story seems to play out several times each season, but does it actually make any sense to fire a manager mid-season? A few years ago, Dutch economist Dr Bas ter Weel compared points-per-game won immediately before and after a manager was fired in the Eredivisie. He demonstrated that, while there does tend to be an uptick in results in the following six or so games, it has nothing to do with the change in manager -- it's just mean reversion. Analogous to having rolled six ones in a row, the results were very likely to improve in the next six matches (or rolls) irrespective of whether the manager was fired or not.
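
The dice analogy is easy to verify with a quick simulation: condition on a terrible run and look at what happens next, with no 'manager change' anywhere in sight.

```python
# Sketch: mean reversion with dice. Condition on six consecutive 1s and check
# the average of the next six rolls -- nothing was 'fixed', yet results improve.
import random

random.seed(42)
rolls = [random.randint(1, 6) for _ in range(2_000_000)]

next_six_avgs = [
    sum(rolls[i + 6:i + 12]) / 6
    for i in range(len(rolls) - 12)
    if rolls[i:i + 6] == [1] * 6
]
print(sum(next_six_avgs) / len(next_six_avgs))  # ~3.5, the unconditional mean
```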

In this blog I’m going to focus more on the longer term. Specifically, I’ll look at league rankings, comparing each team’s position at the end of the season against its position at the point when the manager was fired. In the harsh light of data, is there any evidence that clubs that sack their manager before April perform better over the remainder of the season than their closest competitors?

Mid-season sackings in the EPL and Championship


To answer this question, I identified every in-season manager change that has occurred in the EPL since the 1996/97 season, and in the Championship since 2006/07. Discarding outgoing caretaker managers (which I define as managers in position for four weeks or less) gave me a sample of 259 changes: 117 changes in the EPL and 142 in the Championship.

I then classified each manager departure into one of three categories: sacked, resigned, and mutual consent. For example, of the 117 in-season manager departures in the EPL over the last 20 seasons, 71 were fired, 36 resigned and 10 left by mutual consent. In this analysis we’re only interested in those that were forced out, which I will define as either sacked or leaving by mutual consent (the latter typically being a nice way of saying that he was sacked).

Managerial changes occur throughout the season; however, I’m going to focus on those that occur in the middle portion of the season, from November through to March. Manager firings that occur early in the season can be due to reasons other than the team’s recent performance. Likewise, those that occur late in the season tend to be made with an eye to the summer and the following season. Retaining only the mid-season sackings left me with a final sample of 111, just over half of which were at EPL clubs.

Finally, I also identified a sample of clubs that were in a similar league position to those that fired their manager (within 3 points on the date it was announced) but retained the same manager for the entire season. We’ll compare this baseline sample with the manager-change sample and see if the latter did any better.
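
In code, the matching step might look like the following sketch; the standings snapshot and manager-tenure lookup are hypothetical inputs, not a real data source.

```python
# Sketch: building the baseline sample. For each sacking, take clubs within 3
# points in the table on the announcement date that kept their manager all season.

def matched_clubs(sacked_club, standings, kept_manager):
    target = standings[sacked_club]
    return [club for club, pts in standings.items()
            if club != sacked_club
            and abs(pts - target) <= 3
            and kept_manager(club)]

standings = {"A": 20, "B": 22, "C": 30, "D": 18}   # points on the sacking date
kept = {"A": False, "B": True, "C": True, "D": True}
print(matched_clubs("A", standings, kept.get))      # ['B', 'D']
```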

Results


Figure 1 plots the league position on the date the manager was removed (x-axis) against league position at the end of the season (y-axis), for each team in the manager-change sample. The black circles represent EPL clubs; the blue triangles Championship clubs. The red diagonal line indicates the same league position at departure and season end. The shaded regions above and below the line encompass teams that finished 3, 6 or 9 places higher or lower than their position when the manager was sacked.

It’s clear that the majority of mid-season manager firings occur at clubs in the bottom half of the table. Of the EPL firings, 89% were at teams below 10th, and 66% were at teams in the bottom five places. Likewise, in the Championship 82% of sackings were at teams below 12th, and 51% at teams in the bottom 6. Of the 6 sackings that occurred at EPL teams in the top half of the table, 4 were at Chelsea[1].

Figure 1: the league position of EPL and Championship teams on the date their manager was fired (x-axis) against their league position at the end of the season (y-axis). The black circles represent EPL clubs, the blue triangles Championship clubs. The red diagonal line indicates the same position at departure and season end; the shaded regions above and below encompass teams that finished 3, 6 or 9 places higher or lower than their position when the manager was sacked.

There is no evidence that teams gain any kind of advantage by sacking their manager. The median position change is zero, i.e. no change. Specifically: 30% of teams ended in a lower position than when the manager was sacked, 23% saw no change in their position and 48% saw an improvement. If we compare this to the baseline sample -- clubs in similar positions in the table that retained the same manager for the entire season -- we find roughly the same proportions: 38% ended the season in a lower position, 17% saw no change and 45% improved their position.

We can be more specific and look at clubs in the relegation zone when the manager departed. As the table below shows, of those that fired their manager, 34% survived; of those that did not, 39% survived. There is no evidence that firing the manager helps avoid relegation.


But what about Leicester?


Leicester fired Ranieri more than a month ago and have not lost since. They’re currently 2 places above their league position after his last game and seem likely to continue their recovery up the table. Didn’t they benefit from firing their manager?

While Figure 1 demonstrates that, on average, a club’s league position is not expected to improve after their manager is sacked, some individual clubs clearly did go on to significantly improve their league position. For instance, when Brendan Rodgers was fired from Reading in 2009/10 they were in 21st position; under his replacement, Brian McDermott, they went on to finish in 9th. Crystal Palace sacked Neil Warnock just after Christmas in 2014 when they were in 18th position; by the end of the season Alan Pardew had guided them to 10th. 

On the other hand, clubs that do not switch manager also undergo miraculous recoveries. In the 2001/02 season Blackburn Rovers rose from 18th place in mid-March to 10th place by the end of the season. In late November 2008, Doncaster Rovers were rooted at the bottom of the Championship in 24th place; an eight match unbeaten run lifted them up to mid-table and they finished in a respectable 14th place. Both teams retained the same manager for the entire season: Graeme Souness and Sean O'Driscoll, respectively.

There are clearly circumstances that might necessitate a managerial firing in the middle of the season -- Leicester may be an example of this. But to pull the trigger without a clear diagnosis of what has gone wrong is a sign of desperation and poor decision-making. Indeed, over the last twenty seasons, EPL managers appointed during the summer months have, on average, lasted over 100 days longer in their jobs than those appointed during the season. Coupled with the large compensation payments that are often necessary to remove a manager, mid-season changes may actually end up harming the long-term prospects of a club.



--------------------------
[1] Specifically: Gullit in 97/98, Scolari in 08/09, Villas-Boas in 11/12 and Di Matteo in 12/13.

Saturday, 11 February 2017

The Wisdom of Crowds: A Census of EPL Forecasts

Introduction


We're nearly two-thirds of the way through the 2016/17 EPL season, which seems a good time to try to predict what might happen. Chelsea’s nine-point cushion and relentless form make them clear favourites for the title; not since Newcastle in 1996 has a team blown such a lead. Just five points separate second from sixth as the remaining superpowers battle for Champions League places: who will miss out? Perhaps the mantra ‘most competitive EPL season ever’ is best reserved for the relegation fight, though. Six teams, two points and an ever-changing landscape. Amongst them: last season’s heroes, Leicester. Too good to go down?

Most TV pundits are definitive in their predictions; indeed, they are typically paid to be so. Others prefer to let the numbers do the talking. Football analysts around the world build mathematical models to measure team strength and calculate the probability of match outcomes. Rather than saying “team A are likely to beat team B”, they'll say “I estimate that there is an 85% probability that team A will win”.

There is no agreed method for designing a forecast model for football. Consequently, predictions vary from one model to another. However, there is also strength in diversity. Rather than comparing and contrasting predictions, we can also collect and combine them to form a consensus opinion.

Last January, Constantinos Chappas did just that. Following gameweek 20, he collected 15 sets of predictions, averaging them to produce a ‘consensus forecast’ for the outcome of the 2015/16 EPL season. His article was published on StatsBomb here; we’ll return to the success of last year’s predictions at the end. First, I’m going to repeat the exercise for the 2016/17 EPL season. What do the combined predictions say this time around?

Participants


In total there were 15 participants this year, many of whom offered up their predictions in response to my twitter appeal. A big thank you goes out to (in no particular order):

@8Yards8Feet, @petermckeever, @goalprojection, @11tegen11, @cchappas, @SteMc74, @fussbALEXperte, @SquawkaGaming, @EuroClubIndex, @opisthokonta and Sky Sports (via @harrydcarr)

To these, I added forecasts from the FT and FiveThirtyEight; I haven’t been in contact with them personally, but their forecasts are publicly available. I also added a bookmaker’s average, calculated by collecting the odds published on oddschecker.com and averaging the implied probabilities. That’s 14 - the final participant was myself (@eightyfivepoint).

The Predictions


Before we get into the results, a little bit about how they’ll be presented. I’ve followed last year’s article and presented forecasts as box-plots. These are a simple graphical representation of the distribution of forecasts for a particular outcome. The height of the shaded area represents the interquartile range: the 25th to 75th percentiles. By definition, half the forecasts lie within this range -- it provides a decent estimate of the variability of the predictions. The black horizontal line in the middle is the median (50th percentile); I’ll sometimes refer to this as the consensus forecast. The ‘whiskers’ extending out vertically from each box show the 5th to 95th percentiles. All but the highest and lowest forecasts for a given outcome will lie within this range.

On each plot I've also plotted the individual predictions as coloured points. They are identified by the legend on the right.
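
If you’d like to reproduce the presentation, here’s a minimal matplotlib sketch with invented probabilities; the box, median and whisker conventions match those described above.

```python
# Sketch: a forecast box-plot with the conventions used below --
# box = interquartile range, line = median, whiskers = 5th/95th percentiles.
# The probabilities are invented for illustration.
import matplotlib.pyplot as plt

forecasts = {
    "Chelsea":  [0.80, 0.84, 0.87, 0.88, 0.90, 0.93],
    "Spurs":    [0.02, 0.03, 0.05, 0.05, 0.07, 0.09],
    "Man City": [0.02, 0.03, 0.04, 0.04, 0.06, 0.08],
}

fig, ax = plt.subplots()
ax.boxplot(list(forecasts.values()), labels=list(forecasts.keys()), whis=(5, 95))
ax.set_ylabel("Probability of winning the title")
plt.show()
```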

So, without further ado, here are the forecasts for this 16/17 EPL season.

The Champions



Not surprisingly, Chelsea are the clear favourites: the median forecast gives them an 88% chance of winning the league, as do the bookmakers. There’s not a huge amount of variability either, with the forecasts ranging from 80% to 93%. If Chelsea do suffer some kind of meltdown then it’s probably Spurs or City that would catch them, with median predictions of 5% and 4%, respectively. Liverpool and Arsenal are rank outsiders and any of the other teams finishing top would be an enormous surprise.

The Top Four



Now this is where things get a bit more interesting. Chelsea seem almost guaranteed to finish in the Champions League places, which leaves five teams fighting it out for the remaining three. Tottenham and Man City are heavily favoured: both have a median probability of at least 80% and the whiskers on their box-plots do not overlap with those of the next team, Liverpool.

The real fight is between Klopp and Wenger. Statistically they are almost neck-and-neck, with their box-plots indicating that the individual predictions are broadly distributed. Look closely and you see an interesting negative correlation between them: those that are above average for Liverpool tend to be below average for Arsenal (and vice-versa). You can see this more clearly in the scatter plot below. The reason must be methodological; to understand it we’d have to delve into how the individual models assess the teams' relative strength. Note that the bookies are sitting on the fence - they've assigned both Arsenal and Liverpool a 53% chance of finishing in the top four.


Man United are outsiders, but the consensus forecast still gives them about a 1 in 3 chance of sneaking in. Interestingly, the bookmakers’ odds – which imply a 44% chance of United finishing in the Champions League positions – are way above the other predictions. Perhaps their odds are being moved by heavy betting?

The Relegation Candidates



Two weeks ago it looked like Sunderland and Hull were very likely to go down. Since then, the relegation battle has been blown wide open. The first six teams seem set for a nervous run-in and neither Bournemouth nor Burnley will feel safe.

The principal candidates for the drop are Sunderland, Hull and Palace, all of whom have a median predicted relegation probability above 50%. There is clearly a lot of variability in the predictions though, with the Eagles in particular ranging from 38% to 74%. You can certainly envisage any one of them managing to escape.

The next three clubs - Middlesbrough, Swansea and Leicester - are all currently level on 21 points, yet the median predictions imply that Middlesbrough (42%) are nearly twice as likely to go down as Leicester (22%). I suspect that this is because some models are still being influenced by last season’s results (for instance, Leicester's forecasts appear to bunch around either 15% or 30%). The amount of weight, or importance, placed on recent results by each model is likely to be a key driver of variation between the predictions.

What about <insert team’s name here>?


The grid below shows the average probability of every EPL team finishing in each league position. Note that some of the models (such as FiveThirtyEight, Sky Sports and the bookmakers) are excluded from the plot as I wasn’t able to obtain a full probability grid for them. Blank places indicate that the probability of the team finishing in that position is significantly below 1%.

An obvious feature is that Everton seem likely to finish in 7th place. The distribution gets very broad for the mid-table teams: Southampton could conceivably finish anywhere between 7th and 18th.


Last year’s predictions


So how did last year’s predictions pan out? Leicester won the league, but the median forecast gave them only a 4% chance of this happening (compared, for example, to a 40% chance that they would finish outside the Champions League places). However, the top four teams were correctly predicted, with a high probability of finishing there having been assigned to each of Leicester, Arsenal, City and Spurs.

Down at the bottom, both Newcastle and Villa were strongly expected to go down, and they did. Sunderland were predicted to have only a 15% chance of staying up, yet the Black Cats escaped again. Instead, Norwich went down in their place, having been given a 91% chance of staying up. Other surprises were Southampton (7 places higher than expected), Swansea (5 higher) and Crystal Palace (down 7).

How good were last year’s forecasts, overall? This is a tricky question and requires a technical answer. The specific question we should ask is: how likely was the final outcome (the league table) given the predictions that were made? If it was improbable, you could argue that it happened to be just that – an outlier. However, it could also be evidence that the predictions, and the models underlying them, were not particularly consistent with the final table.

We can attempt to answer this question using last season’s prediction grid to calculate something called the log-likelihood function: the sum of the logarithms of the probabilities of each team finishing in their final position. The result you obtain is quite low: simulations indicate that only about 10% of the various outcomes (final rankings) allowed by the predictions would have a lower likelihood. It is certainly not low enough to say that the predictions were bad; it just implies that the final league table was somewhat unlikely given the forecasts. A similar result this time round would provide more evidence that something is missing from the predictions (or perhaps that they are too precise).
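
A sketch of the calculation is below. Note one simplification made for illustration: sampling each team's position independently from its row of the grid ignores the constraint that a league table is a permutation, so this is a crude approximation rather than the exact simulation.

```python
# Sketch: scoring a prediction grid. grid[i][j] = forecast probability that
# team i finishes in position j; final_pos[i] is team i's actual finish.
import numpy as np

def log_likelihood(grid, positions):
    return sum(np.log(grid[i][j]) for i, j in enumerate(positions))

def outcome_percentile(grid, final_pos, n_sims=10_000, seed=0):
    rng = np.random.default_rng(seed)
    ll_actual = log_likelihood(grid, final_pos)
    # Crude approximation: positions sampled independently per team,
    # ignoring the permutation constraint on a real league table.
    sims = [log_likelihood(grid, [rng.choice(len(row), p=row) for row in grid])
            for _ in range(n_sims)]
    return np.mean([ll < ll_actual for ll in sims])  # ~0.1 reported for 2015/16

grid = np.array([[0.7, 0.2, 0.1], [0.2, 0.5, 0.3], [0.1, 0.3, 0.6]])
print(outcome_percentile(grid, [2, 1, 0]))  # an unlikely table scores low
```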

A final caveat...


Having said that – models are only aware of what you tell them. There are plenty of events – injuries, suspensions, and managerial changes – of which they are blissfully unaware but could play a decisive role in determining the outcome of the season. Identifying what information is relevant – and what is just noise – is probably the biggest challenge in making such predictions.

I will continue to collect, compare, combine and publicize forecasts as the season progresses: follow me on twitter (@eightyfivepoint) if you'd like to see how they evolve.


(This is a piece that I wrote for StatsBomb; I've copied it here.)



Wednesday, 18 January 2017

Poor FA Cup crowds erode home advantage

I was struck by the poor attendances at some of the FA Cup 3rd round matches this month. Just 17,632 turned up to watch Sunderland vs Burnley, less than half Sunderland’s average home gate this season. It was a similar story at Cardiff vs Fulham, Norwich vs Southampton and Hull City vs Swansea, all of which saw crowds below 50% of the home side’s league average this season.

An interesting statistic was recently posted on Twitter by Omar Chaudhuri of 21st Club (@OmarChaudhuri). If you take all 181 FA Cup ties that have involved two EPL teams since the 2000/01 season (ignoring replays and matches at a neutral venue), you find that the home team won 46% of the matches and the away team 30%. However, if you look at the equivalent league match between the teams in the same season, you find that the home team won 52% of the matches and the away team 22%. Although the sample size is small, the implication is that home advantage is less important in cup matches.

Lower FA Cup crowds and diminished home advantage - are the two connected? This seems a reasonable hypothesis, but I’ve never seen it demonstrated explicitly. I aim to do so in this post.

Cup Matches vs League Matches


To answer the question I’ll look specifically at cup ties that involved teams from the same division, from League 2 to the EPL, and compare the outcomes to the equivalent matches in the league. This approach isolates the influence of any changes in circumstance between the two games – including lower or higher attendance.

I identified every FA Cup tie, from the third round onwards, that involved two teams from the same division since 2000/01[1], along with the corresponding league match. I then removed all matches at a neutral venue[2]. This left me with a sample of 357 cup matches, and the same number in the league.

I then measured what I’ll refer to as the home team’s attendance ratio -- their average home-tie FA Cup attendance divided by their average home league attendance -- in each of the last 16 seasons. Season-averaged attendance statistics for both league and FA Cup games (3rd round onwards) for every team were taken from www.worldfootball.net. Ideally, you would directly compare the attendance of each FA Cup tie with that of the equivalent league game. However, I don’t have the data for individual games, so instead I used each team’s season averages for cup and league as a proxy (if anyone has this data and is willing to share it, please let me know!).

I used the attendance ratio to divide my sample of matches into three sub-samples: well-attended, mediocre and poorly-attended matches. Well-attended matches are defined as cup ties in which the crowd size was greater than 90% of the home team’s league average; a mediocre attendance as less than 90% but greater than 70% of the league average; and a poorly-attended match as less than 70%. For each group, we’ll look at differences in the fraction of home wins, away wins and draws between the FA Cup ties and the corresponding league matches.
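
As a sketch, the sub-sampling and tallying might look like this, with made-up ties standing in for the real sample:

```python
# Sketch: splitting cup ties by attendance ratio and tallying outcomes.
# `ties` is a hypothetical list of (ratio, result) records; the real analysis
# proxied the ratio with season-average attendances.
from collections import Counter

def outcome_shares(ties):
    bins = {"well (>90%)": Counter(), "mediocre (70-90%)": Counter(),
            "poor (<70%)": Counter()}
    for ratio, result in ties:   # result is 'H', 'A' or 'D'
        key = ("well (>90%)" if ratio > 0.9
               else "mediocre (70-90%)" if ratio > 0.7
               else "poor (<70%)")
        bins[key][result] += 1
    return {k: {res: n / sum(c.values()) for res, n in c.items()}
            for k, c in bins.items() if c}

ties = [(0.95, "H"), (0.95, "D"), (0.65, "A"), (0.62, "A"), (0.80, "H")]
print(outcome_shares(ties))
```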

Table 1 summarizes the results. Let’s look at the first three lines - these give outcomes for cup ties in which the attendance was at least 90% of the league average. There have been 148 such matches in the last 16 seasons: the home team won 56%, the away team 23% and 21% were draws. In the corresponding league matches, the home team won 51%, the away team 24%, and 26% were draws. So there was a small increase in the proportion of home wins relative to the league outcomes, with correspondingly fewer draws. In about a third of these ties the attendance was greater than the league average: the home side may have benefited from more vociferous support.

Table 1

The next set of lines in Table 1 shows the results for the FA Cup matches that had a mediocre attendance – those in which the attendance ratio was between 70% and 90% of the home side’s league average. The home team won 44% of these matches, which is slightly below the home win rate in the corresponding league matches. There is again a fall in the number of draws, but this time the away team benefits, winning 6% more often than in the league matches. The differences are small, but there is some evidence that the away team benefited from the below-average attendance.

However, the increase in away wins becomes much more striking when we look at poorly-attended cup matches: those in which the attendance was less than 70% of the home team's league average. The home team won only 34% of these ties, 14 percentage points below the corresponding league fixtures. The away win percentage increases to 42%, 19 percentage points above the league outcome. Indeed, the away team has won poorly-attended cup matches more frequently than the home team, despite the home team winning roughly twice as often as the away team in the corresponding league fixtures (48% to 23%). The implication is very clear: when the fans don’t show up for an FA Cup tie, the team is more likely to lose. I don’t think I’ve seen any direct evidence for this before[3].

In all three sub-samples, it's worth noting that draws are down around 5 percentage points relative to the corresponding league outcomes (although the beneficiary depends on the attendance). Presumably this is down to the nature of a cup tie: teams are willing to risk pushing for a win in order to avoid a troublesome replay (or a penalty shoot-out during a replay).

So why are some fans not showing up? One obvious explanation is that they are simply unwilling to shell out more money beyond the cost of a season ticket. Maybe clubs should lower their prices for FA Cup matches; I’d be curious to know if any do. There could even be an element of self-fulfilling prophecy: the fans believe that their team have no real chance of winning the cup and so choose not to attend, to the detriment of their team. Perhaps the fans are aware that the cup is simply not a priority – their club may be involved in a relegation battle, for example – and that they are likely to field a weakened team.

The bottom line seems clear enough, though: if clubs want to improve their chances of progressing in the FA Cup they should ensure that they fill their stadium.


--------------------
Thanks to David Shaw, Jim Ebdon and Omar Chaudhuri for comments.

[1] Data was only available from 02/03 for all-Championship ties, from 08/09 for League 1 and from 09/10 for League 2.
[2] Replays were retained, although the outcome of penalty kicks was ignored (i.e., a draw at the end of extra-time was scored as a draw). There are 64 replays in the sample in total, of which 8 went to penalties.
[3] One caveat is that the sample size is pretty small: this analysis could do with being repeated on a larger sample of games (and with the specific match attendances, rather than season averages). However, the increase in the away percentage in the smallest sample (attendance ratio < 0.7) is still highly significant. 

Tuesday, 10 January 2017

The Frequency of Winning Streaks

Thirteen – an unlucky number for some. So it proved for Chelsea: just one win shy of equalling Arsenal’s record, their thirteen-match winning streak was finally ended by an in-form Spurs side. While there may be some temporary disappointment amongst Chelsea fans at having failed to set a new record, their winning run has almost certainly propelled them into the Champions League next season and made them clear favourites for the title.

Sir Alex Ferguson would often refer to momentum as being instrumental to success. A winning streak can sweep teams to the title or snatch survival from the jaws of relegation. What constitutes a good streak is clearly dependent on the team, though.  Manchester United are currently on a five-match winning run: such form would certainly be outstanding for a relegation-threatened team, but is it common for a Champions League contender? This question is itself part of a broader one: what is form and how should we measure it?

In this blog I’m going to take a look at some of the statistics of winning streaks, investigating the characteristic length of winning runs in the EPL and how it varies for teams from the top to the bottom of the table.

How well do teams streak?


I started by taking every completed EPL season since 2000/01 and dividing the teams into bins based on their points total at the end of each season (0-40 points, 40-50, 50-60, and so on)[1]. For each bin, I measured the proportion of teams that completed a winning streak, varying the streak length from 2 to 10 matches. For example, of the 54 sides that have finished on between 50 and 60 points since the 2000/01 season, 17 (31%) completed a winning run of at least 4 matches. Runs were only measured within a single season – they do not bridge successive seasons[2]. The results are summarized in Table 1.
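
Measuring the streaks themselves is a simple run-length calculation; a minimal sketch, assuming results come as a 'W'/'D'/'L' sequence:

```python
# Sketch: a season's longest winning streak from its results sequence.
# Runs never bridge seasons, matching the method in the text.
from itertools import groupby

def longest_run(results, target="W"):
    return max((len(list(g)) for r, g in groupby(results) if r == target),
               default=0)

print(longest_run("WWDLWWWWLW"))       # 4
print(longest_run("WWDLWWWWLW", "L"))  # 1
# Binning teams by final points and asking what share reach streak length k
# then reproduces a row of Table 1.
```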


Table 1: The proportion of teams that complete winning runs of two games or longer in the EPL. Teams are divided into bins based on their final points total in a season, from 0-40 points (top row) to >80 points (bottom row).

The top row gives the results for teams that finished on less than 40 points. The columns show the percentage that managed a winning streak, with the length of the streaks increasing from 2 (left column) to >10 matches (right). Three quarters of the teams in this points bin put together a winning streak of at least two games. However, the proportion drops very rapidly for longer runs: only 14% completed a 3-match winning streak and only 7% a 4-match streak. The only team to complete a 5-match winning streak was Newcastle early in 2014/15 (and this was half of the total number of games they won that season).

As you'd expect, the percentage of teams that achieve a winning streak of a given length increases as you move to higher points bins. Every team that has finished with 60 points or more has completed a 3-match winning streak. However, fewer than a quarter of those that finished with less than 70 points completed a 5-match winning streak. In general, the proportion of teams that achieve a winning streak drops off very rapidly as the length of the streak is increased.

The exception is the title-challenging teams (the bottom row in Table 1): the percentage in this bin falls away more slowly as the length of the winning streak is increased. 27 of the 29 teams that finished with at least 80 points put together a 5-match winning streak, 13 completed an 8-match streak and 5 completed a 10-match winning streak. This is the success-generating momentum that Ferguson habitually referred to.

In his final 13 seasons (from 2000/01 to 2012/13), Man United put together 14 winning streaks lasting 6 matches or more; in the same period Arsenal managed only 5. United won 7 titles to Arsenal’s 2. For both teams, the majority of these streaks occurred in title-winning seasons. The same applies to Chelsea and, more recently, Man City. Only two title-winning teams have failed to complete a 5-match winning streak: Man United in 2010/11 and Chelsea in 2014/15. The median length of the champions’ longest winning streak is between 7 and 8 games.

Leicester’s 4-match winning streak at the end of the 2014/15 season saved them from relegation. It was also an unusually long run for a team finishing on around 40 points - only four other teams have managed it. Was this a harbinger of things to come? A year later, during their title-winning season, their 5-match winning streak in March/April pushed them over the line.

The implications for form


Only the best teams put together extended winning runs: 40% of EPL teams fail to put together a three-game winning streak and 64% fail to win 4 consecutive games. Perhaps momentum - and the belief and confidence it affords - is only really relevant to the top teams? Does the fixture list throw too many obstacles in the path of the smaller teams? Every 3 or 4 games a smaller team will play one of the top-5 sides, a game that they are likely to lose. This may make it more difficult for them to build up a head of steam.

On the other hand, perhaps smaller teams are able to shrug off their defeats away to Arsenal or Liverpool and continue as before. In that case, should we discard games against the ‘big teams’ when attempting to measure their form? And to what extent do draws interrupt, or in some cases boost, a team's momentum? These are all questions that I intend to return to in future blogs.

Unbeaten Runs


Finally, I’ll leave you with the equivalent table for unbeaten runs. While the typical length of unbeaten runs in each bin is about twice that of winning runs, most of the conclusions above still apply.

Table 2: The proportion of teams that complete an unbeaten run of length 2 or longer in the EPL. Teams are divided into bins based on their final points total in a season, from less than 40 points (top row) to more than 80 (bottom).

---------------

Thanks to David Shaw for comments.

[1] The total number of teams across all bins was 320: 16 seasons with 20 teams per season.
[2] Note that the runs are inclusive - if a team achieves a 3-match streak it will also have achieved a 2-match streak.