Friday, 26 August 2016

A History of Home Advantage

Home advantage is a much-studied topic, and many people have offered suggestions as to where it originates: the motivational effect of playing in front of your own fans, the comfort of a familiar environment and match-day routine, the possible influence of home fans on the referee (something I want to come back to in a later post), the wearying effect of travel on the away team, or simply a higher expectation of home-team success. It is present in many sports (see here) and in many countries (see here).

In future posts I’d like to take a more detailed look at some of the factors that might contribute to the home advantage effect, but first let’s start with some basic statistics: how big is the home advantage, how does it compare across the top four divisions in England, and how has it varied since the birth of organised football?

Home Advantage 101

The table below shows the percentage of games won by the home team, won by the away team, and drawn in the last 21 years of the top four divisions of English football.

Table 1: Home win, away win and draw percentage of games played in England’s top four divisions since the 1996/97 season. Percentages may not exactly sum to 100 because of rounding.

The headline message is that across all games played in the English professional leagues in the last few decades, the home team wins 44%, the away team 28% and 27% are drawn. If home advantage didn’t exist, home and away teams would win the same proportion of games. 
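Computing the three shares is a simple count over final scores. The sketch below is a minimal illustration, assuming each match is stored as a (home_goals, away_goals) tuple; the input format, the function name and the sample scores are hypothetical, not the data behind Table 1.

```python
from collections import Counter


def outcome_percentages(results):
    """Return the percentage of home wins (H), draws (D) and away wins (A).

    `results` is assumed to be a list of (home_goals, away_goals) tuples.
    """
    counts = Counter(
        "H" if home > away else "A" if away > home else "D"
        for home, away in results
    )
    n = len(results)
    return {outcome: 100 * counts[outcome] / n for outcome in ("H", "D", "A")}


# Hypothetical final scores, for illustration only.
sample = [(2, 1), (0, 0), (1, 3), (2, 0), (1, 1)]
print(outcome_percentages(sample))
```

Rounding each share separately is why the headline figures need not add up to exactly 100%.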

Table 1 also shows that home advantage declines slightly as we move down the leagues. On average, home teams in the EPL are 70% more likely to win the game than the away team; this falls to 50% in League 2. There may be reasons for this, but they are not immediately apparent.
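The "X% more likely" figures compare the home-win share with the away-win share. As a quick check of the arithmetic, here is a sketch applied to the all-division averages quoted above (44% home, 28% away); the helper name is hypothetical.

```python
def relative_win_likelihood(home_pct, away_pct):
    """Percentage by which the home-win share exceeds the away-win share."""
    return 100 * (home_pct / away_pct - 1)


# All-division headline figures from above.
print(round(relative_win_likelihood(44, 28)))  # → 57
```

On those rounded figures, home teams across all four divisions are roughly 57% more likely to win, which sits between the EPL and League 2 values.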

The decline of home advantage

Figures 1 to 3 below show the home-win, away-win and draw percentages in each of the top four divisions of English football for every season of their existence since 1888, with each coloured circle representing one of the divisions. The grey line shows a smoothed average across all four divisions.
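The post does not say how the grey line was smoothed. One common choice is a centred moving average of the season-by-season values, sketched below under that assumption; the nine-season default window, the function names and the sample numbers are hypothetical.

```python
def centred_moving_average(values, window=9):
    """Smooth a series with a centred moving average.

    Near the ends of the series the window is truncated, so the first and
    last points average over fewer seasons. `window` should be odd.
    """
    half = window // 2
    smoothed = []
    for i in range(len(values)):
        lo, hi = max(0, i - half), min(len(values), i + half + 1)
        chunk = values[lo:hi]
        smoothed.append(sum(chunk) / len(chunk))
    return smoothed


def season_average(division_values):
    """Average one season's percentages over the divisions that existed then."""
    return sum(division_values) / len(division_values)


# Hypothetical home-win percentages: one list of divisions per season.
seasons = [[60.0], [58.0, 62.0], [55.0, 57.0], [54.0, 56.0, 52.0], [50.0]]
print(centred_moving_average([season_average(s) for s in seasons], window=3))
```

Averaging across divisions first, then smoothing over time, keeps seasons with more divisions from being over-weighted in the time smoothing.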

The results are intriguing. Figure 1 demonstrates that the percentage of home wins in English football has certainly decreased over the last 120 years, particularly in the post-war era where there has been a steady decline.

What’s driving this? Figures 2 and 3 show the underlying dynamics are complex. 

In the first twenty years of the Football League the home-win percentage fell from around 60% to the mid-fifties, compensated by an increase in draws. Then, for the fifty years to 1960, the away-win and draw percentages hovered in the low 20s, increasing slightly. However, in the 1960s the draw percentage suddenly jumped to its current level and has flat-lined since. Since the late 1970s, the rate of away victories has steadily increased to around 30%.

It seems reasonable that the increasing away win percentage may initially have been a result of the rule change in 1981 awarding three points for a win rather than two: teams playing away from home had more motivation to go for a win. But then why has the draw percentage not decreased appreciably, and why has the away percentage continued to steadily rise since? 

Some have pointed to the increasing impact of television, suggesting that away team players – knowing their fans can now monitor their performance – try harder. But most televised games are in the top division, so why do we see similar trends in the lower divisions? Importantly, it’s clear that whatever factors drive these changes, they are present in all four divisions.

The increasing professionalization of the sport, including improved training methods, greater financial rewards and better, more consistent officiating, has almost certainly played its part in gradually eroding the natural advantage provided by playing on home turf.

Another interesting proposition is that home advantage derives from a territorial urge to repel interlopers; as a motivational factor, it is reasonable to think that this would be felt most keenly by players who are from the same area as the team they are playing for. Could it be, then, that as teams were increasingly built through buying and selling players rather than through developing home-grown talent, the inbuilt motivational factor of playing at home has gradually weakened? This would be a very interesting theory to test further.

Figure 1: Percentage of home wins in the English top four divisions. The grey line shows a smoothed average over all divisions.
Figure 2: Percentage of away wins in the English top four divisions. The grey line shows a smoothed average over all divisions.
Figure 3: Percentage of draws in the English top four divisions. The grey line shows a smoothed average over all divisions.

Thursday, 18 August 2016

The General Decline of the Underdog

In my first post I argued that Leicester’s astonishing 40-point improvement last season (over the previous season) was an achievement unrivalled not just in the Premier League era, but since the Second World War. However, the achievements of three other teams went unmentioned: Spurs in 1950/51, Ipswich in 1961/62 and Nottingham Forest in 1977/78.

On the face of it, these three teams went one better than Leicester – winning the top division in their first season following promotion from the second tier of English football. I didn’t include them in my analysis as I only looked at points accumulated over two consecutive seasons in the top division; newly promoted teams were therefore not accounted for.

In recent years we’ve become used to a familiar story: newly promoted clubs face an uphill struggle for survival and often make an immediate return to the division below[1]. If a club were to win the EPL in their very first season this would surely represent an even greater shock than Leicester’s triumph. So why were Ipswich and Forest able to achieve this in the 60s and 70s? 

It turns out newly promoted teams didn’t always fare so badly, as I shall now demonstrate.

A Widening Gap

Figure 1 shows the total number of points accumulated by newly promoted clubs in their first season in the top division (and only their first season). For instance, the circled points show the points totals of Spurs, Ipswich and Nottingham Forest when each won the league. The black dashed lines show the average number of points that were needed to finish in the top half of the table in each decade.
Figure 1: Total points accumulated by newly promoted teams in their first season in the top division. Dashed lines show the average number of points that were needed to finish in the top half of the table in each decade.    

I find this plot very revealing. There is a clear downward trend in the points totals (especially in the last thirty years). More importantly, the performance of newly promoted teams clearly degrades relative to the league median. In the first three decades, over 40% of the new teams finished in the top half of the table; in the last decade this has dropped to 6%. Similarly, in the first three decades 7% were relegated in their first season; in the last 10 years this has rocketed to 40%.
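Both the top-half threshold and the top-half and relegation shares can be computed from final league tables. The sketch below assumes each season is stored as a list of (team, points) in finishing order, plus the set of promoted teams and the number of relegation places; this data layout, the function names and the toy season are all hypothetical. Averaging the threshold over the seasons in each decade would give lines like the dashed ones in Figure 1.

```python
def top_half_threshold(table):
    """Points of the lowest team still in the top half (e.g. 10th of 20)."""
    return table[len(table) // 2 - 1][1]


def promoted_team_shares(seasons):
    """Percentage of newly promoted teams finishing in the top half, and
    percentage relegated, across all the seasons supplied."""
    total = top_half = relegated = 0
    for season in seasons:
        order = [team for team, _ in season["table"]]
        n = len(order)
        for team in season["promoted"]:
            position = order.index(team) + 1  # 1 = champions
            total += 1
            if position <= n // 2:
                top_half += 1
            if position > n - season["relegation_places"]:
                relegated += 1
    return 100 * top_half / total, 100 * relegated / total


# A toy six-team season, for illustration only.
toy = {
    "table": [("A", 70), ("B", 60), ("C", 55), ("D", 50), ("E", 40), ("F", 30)],
    "promoted": {"B", "F"},
    "relegation_places": 1,
}
print(top_half_threshold(toy["table"]))  # → 55
print(promoted_team_shares([toy]))       # → (50.0, 50.0)
```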

Up to the mid-1980s, newly promoted sides were about as good as the average top division team; they could expect to finish near the middle of the pack, collecting an average of around 50 points. Many were breaking the 60-point mark.

When they won the league, Spurs collected 35 points more than the new-promotion average, Ipswich 30 more and Nottingham Forest 39. This is comparable with Leicester’s 40-point improvement, though again with the caveat that these teams played 4 more games per season.

The purpose of this analysis is not to denigrate the achievements of Spurs, Ipswich and Nottingham Forest; it’s simply to demonstrate that the gap in quality between the top two divisions – or perhaps, between the top division and the rest – is now much wider. What was once a corridor has become a chasm. So much so that teams relegated from the EPL receive “parachute payments” to cushion their fall from grace.

Of course, this is all down to money. The breakaway of the EPL from the Football League freed its members to negotiate lucrative TV deals far in excess of those available to the clubs in the divisions below. It’s no surprise that newly promoted teams need time to adjust to the rarefied atmosphere.

In the pre-EPL era, newly-promoted sides sometimes had the resources to scale the heights; today, they are struggling just to survive.

[1] Which is currently (and somewhat confusingly) called ‘the Championship’.

Sunday, 14 August 2016

Was Leicester’s achievement last season unprecedented?

On 2nd May 2016, Leicester City were crowned Premier League champions for the first time in their 132-year history. A team that had only just escaped relegation the previous year had achieved a seemingly impossible feat. At the start of last season many bookmakers in the UK were offering odds of up to 5000/1 for Leicester to win the league; it’s very hard to find odds longer than 5000/1 offered on anything.

For me, the most striking statistic was that Leicester accumulated 40 more points in the 2015/16 season than the previous season, enough to take them from nearly relegated to champions. How unlikely was this?

The formation of the EPL in 1992 is generally viewed as a watershed moment for top flight football. Billions of pounds in TV money were pumped in, and the image of the sport altered from that of a male-dominated, largely working class pastime to a more family-friendly pursuit for the middle classes. The grittier, more rugged edges of the sport were smoothed away. Out went the terraces and in came plush all-seater stadiums. And up went the prices. A lot.

This injection of finance and general facelift are oft-cited as having made the sport less egalitarian. The conventional wisdom is that, without a substantial influx of cash from a multi-billionaire owner, only a small number of teams have the resources and financial fire-power to make them realistic contenders for the title.

Prior to 1992, England’s top division is thought to have been a more equitable, competitive league. In the 24 years that preceded the formation of the EPL, 21 different teams were able to finish in the top four places; in the EPL era only 14 have managed this.
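Counting distinct top-four finishers over a span of seasons amounts to taking a set union. A minimal sketch, assuming each season’s table is a list of club names in finishing order; the input layout, the function name and the two toy seasons are hypothetical.

```python
def distinct_top_four(tables):
    """Number of different clubs finishing in the top four across seasons.

    `tables` maps a season label to a list of clubs in finishing order.
    """
    return len({club for order in tables.values() for club in order[:4]})


# Two toy seasons, for illustration only.
toy_tables = {
    "season 1": ["A", "B", "C", "D", "E"],
    "season 2": ["B", "A", "E", "C", "D"],
}
print(distinct_top_four(toy_tables))  # → 5
```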

So, one might think that Leicester’s win was a throwback, an anachronistic return to a bygone era before sporting romanticism was buried under a great big pile of cash. But is this really true? Is Leicester’s win more unusual in this era, or would it have been unusual in any era?

As I have said, for me the key feature of Leicester’s win was their 40-point increase from the previous season. So in order to answer this question, we need to ask another one: in the post-war epoch, on how many occasions has a team managed to improve its previous season’s total by 40 points or more?

Performance improvement over consecutive seasons

To answer this, I looked at the history of the top division in England since league football resumed after the war in 1946. This amounts to 70 seasons and includes a total of 60 different teams.[1]

For every season to 2015/16, I compared the number of points obtained by each team in the league that season with the number they obtained the previous season (assuming they hadn’t just been promoted – I only look at points in the top division). All points totals are calculated on a 3 points for a win basis. Over all seasons and teams in my sample, this provided me with 1290 ‘pairs’ of points.
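The re-scoring and pairing steps can be sketched as follows. This assumes records keyed by (season start year, team) that hold top-division wins and draws only, so a promoted or relegated team simply has no entry for the missing season and contributes no pair; the layout, function names and toy records are hypothetical.

```python
def three_point_total(wins, draws):
    """Re-score a season as 3 points for a win and 1 for a draw."""
    return 3 * wins + draws


def consecutive_pairs(records):
    """Return (points in season t, points in season t+1) for every team that
    played in the top division in both seasons.

    `records` maps (season_start_year, team) -> (wins, draws).
    """
    pairs = []
    for (year, team), (wins, draws) in records.items():
        following = records.get((year + 1, team))
        if following is not None:
            pairs.append((three_point_total(wins, draws),
                          three_point_total(*following)))
    return pairs


toy_records = {
    (2014, "X"): (11, 8),   # hypothetical team X, season 2014/15
    (2015, "X"): (23, 12),  # ... and 2015/16
    (2015, "Y"): (10, 10),  # team Y has no 2016 entry, so no pair
}
print(consecutive_pairs(toy_records))  # → [(41, 81)]
```

Keying on consecutive start years also means no pair is formed across a gap in a team’s top-flight record.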

In Figure 1 I plot these pairs of points. Each blue cross shows the number of points obtained by a single team in consecutive seasons, with the first season on the x-axis and the following season on the y-axis. For example, I’ve circled and labeled the cross that indicates the points obtained by Leicester in the 14/15 & 15/16 seasons.

The central dashed line indicates where the same points total was obtained in successive seasons; the upper/lower dotted lines indicate where teams obtained 10 more/fewer points the following season (which is just under 1 standard deviation).
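The standard deviation behind the dotted lines is that of the season-to-season change in points. A sketch, assuming a list of (first-season points, second-season points) pairs; the function name, the 10-point band default and the toy pairs are hypothetical.

```python
from statistics import mean, stdev


def change_summary(pairs, band=10):
    """Mean and sample standard deviation of the points change between
    consecutive seasons, plus the share of changes larger than `band`."""
    changes = [second - first for first, second in pairs]
    outside = sum(1 for c in changes if abs(c) > band)
    return mean(changes), stdev(changes), outside / len(changes)


# Toy pairs, for illustration only.
toy_pairs = [(40, 50), (50, 45), (60, 60), (30, 52)]
print(change_summary(toy_pairs))
```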
Figure 1: Points obtained in successive seasons by each team in the English top division since 1946.

As you’d expect, there is a strong correlation: better teams consistently accumulate more points than weaker teams, season after season. However, there is a large amount of scatter, with teams frequently obtaining 10 points more or fewer than in the previous season.[2]

Even so, it’s immediately clear that Leicester’s performance between the 2014/15 and 2015/16 seasons is highly unusual. In fact, across the 70-year history only one other team has ever matched their 40-point improvement: Arsenal, from 1969/70 to 1970/71, and they did it in a 22-team league rather than the modern 20-team league (i.e. they played 4, or roughly 10%, more games). In statistical terms, Leicester’s 40-point improvement last season was around a 3.6-sigma event. The probability of an improvement of this magnitude occurring turns out to be roughly 1 in 6000.[3,4]
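Turning the 3.6-sigma figure into a probability uses the upper tail of the normal distribution, under the normality assumption in footnote [3]. The sketch below uses only the standard library’s `math.erfc` and treats the improvement as a one-sided event. The example standard deviation of 11.1 points is an illustrative assumption: it is roughly what a 40-point change at 3.6 sigma implies if the mean change is zero, which fits 10 points being just under one standard deviation.

```python
from math import erfc, sqrt


def z_score(change, mean_change, sigma):
    """How many standard deviations a points change lies from the mean."""
    return (change - mean_change) / sigma


def upper_tail_probability(z):
    """P(Z >= z) for a standard normal random variable Z."""
    return 0.5 * erfc(z / sqrt(2))


z = z_score(40, 0.0, 11.1)  # assumed mean change 0 and sigma 11.1 points
p = upper_tail_probability(3.6)
print(f"z = {z:.2f}; P(Z >= 3.6) = {p:.2e}, about 1 in {1 / p:,.0f}")
```

A one-sided 3.6-sigma tail works out at a little under 1 in 6,300, in line with the rough figure quoted above.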

So Leicester’s improvement last season wasn’t just unprecedented in the Premier League era; it is arguably unprecedented in the post-war era. Viewed this way, what we should find remarkable is not that Leicester – by Premier League standards a relative minnow – won the league, but the speed with which they did it. Other teams have risen from obscurity to prominence, but normally only after being purchased by wealthy owners, and rarely has success been immediate.

Leicester’s transition from relegation fodder to champions is, statistically, one of the greatest rags to riches stories ever told. Will we ever see the like again?

[1] In most professional football leagues in Europe there is promotion and relegation between leagues, with (typically) the 3-4 teams finishing bottom in a season being replaced the following season by the same number of teams finishing at the top of the league below.
[2] Note the y-axis goes to smaller values than the x-axis; this is because teams that were relegated in the first season are not plotted.
[3] Assuming the data is normally distributed: the distribution of the points difference from one year to the next shows that this actually is a reasonable assumption. 
[4] Everton's performance in the opposite direction in 1970/71, when they were a whopping 46 points worse off than the previous season, is even more unlikely. 

Saturday, 13 August 2016

Welcome to my blog!

I've created this blog to combine two things that I spend a substantial amount of my time doing: analyzing data and following football (soccer).

I'm an astrophysicist by training, receiving my PhD from Cambridge University in 2006 and spending six years as a postdoctoral researcher in Canada and the USA, first at McGill University and then Yale. I returned to the UK to work as a quantitative researcher at a London-based hedge fund, before moving to the Fiscal Policy team at HM Treasury.

I've been a Manchester United fan for more than 25 years. Yes, I was born in Manchester. I don't remember the pre-Ferguson years, but I do remember the 1991-92 season very well, and the disappointment of losing out to Leeds United. Fortunately, things quickly began to get better.

I grew up in a household equally split between United and Liverpool (and a football-hating mother). There was consequently much football 'debate', with lots of random football 'statistics' being lobbed around. So I guess in starting this blog I'm returning to my youth in some ways (plus my Liverpool-supporting brother is going to help edit).

As I said at the top, the principal aim is to write a football-orientated data blog. I am going to use whatever data I can get my hands on to try to explore interesting questions or statements, investigate some of football's considerable store of 'conventional wisdom' and generally see if I can find any new or interesting perspectives.

The second goal is to try to keep things simple. Wherever possible I want to extract whatever story the data has to tell without having to write thousands of words explaining how I did it. Same for plots: I'm going to aim for a maximum of one per post.

Anyway, let's see how it goes.