Monday, 26 September 2016

English Hares and Italian Tortoises: When do Goal-scorers peak?

Yesterday, I walked into a New York bar just in time to see Francesco Totti wheel away in celebration. He had just sent Torino's goalkeeper Joe Hart the wrong way from the penalty spot to score his 250th Serie A goal. Totti has scored more Serie A goals than any other player in the last sixty years. He has also now scored in 23 consecutive Serie A seasons. He turns 40 tomorrow.

An interesting feature of Totti’s career is that he has scored nearly half of his goals since turning thirty. He didn’t even really get going until his late twenties – a slow burner. Contrast this with Wayne Rooney: 173 EPL goals so far, half of them scored by the age of 24. A fixture in the Man Utd team since the age of 18, he is now, at 30, perceived to be a much diminished force.

So when do strikers normally reach their goal-scoring peak, and how rapidly do they decline thereafter? Do the hares that establish themselves early in their careers tend to burn out faster than the tortoises that make a more sedate start? To investigate, we need to look at the goal-scorer's aging curve.

The goal-scorer's aging curve

The aging curve of footballers – how their ability and effectiveness change with age – is notoriously difficult to measure. How do you know when a defender has reached the peak of his career? Defending is very much a team responsibility, and great defenders often play in attack-minded teams that concede quite a few goals. Football is a team sport, and it is difficult to isolate the contribution of individual players.

However, the main job of a striker is to score goals, which provides a convenient barometer by which to judge the ebb and flow of a career. By looking at how their goals-per-game ratio varies over their careers, we can investigate the age at which they hit peak goal-scoring ability. Obviously there are lots of factors in play: the number of chances created by teammates, the standard of the opposition, changes in position. But if we take a large enough sample of players some of this should average out, enabling us to measure the effects of aging.

I collected the career statistics of the 50 highest goal scorers in England, Italy and Spain since the 1992/93 season. This gave me a sample of 146 players (there’s a bit of overlap between the countries). In every year of their career, I measured the goals-per-game ratio for each player[1]. After a bit of smoothing and rescaling[2], I calculated the average profile across the 146 strikers. The result is shown in Figure 1.

The plot shows how scoring rate depends on player age for the most proficient goal scorers to have played in England, Spain and Italy in the last 25 years. The curve has been scaled so that the peak – which is at age 26 – is equal to one. The shaded region indicates the level of sampling uncertainty.

Figure 1: The goal-scorer's aging profile: the rate at which the goals/game ratio varies over an elite striker’s career. The curve is scaled such that the peak is equal to 1. 

Footballers are generally perceived to peak in their mid-twenties, between the ages of around 24 and 29; the data in Figure 1 supports that notion. However, what surprises me is that the slope of the curve is not steeper as we move away from the peak in either direction. The profile implies that at the ages of 18 and 32 strikers are scoring at 80% of their peak rate, which seems quite high.

Part of the explanation for this is related to sample selection: I’m looking at the top goal scorers over the last 25 years, and so their careers at the top level of football were naturally quite long (basically a form of survivorship bias). However, there is another, more interesting reason. Figure 2 shows the aging profiles for the EPL and Serie A players separately. There is a striking difference (hah!).

Figure 2: The aging profiles for top strikers in the EPL and Serie A. Both curves are scaled such that their peaks are equal to 1. There is a clear preference for youth in the EPL and experience in Serie A. The grey region indicates sampling uncertainty.

Players that play in the EPL score a substantial portion of their goals in their early twenties. There is a slow increase in the strike rate up to the peak at age 26 and then a more rapid drop-off. Serie A players are the opposite: they score predominantly in the latter half of their career. While their peak is only a year or two later than EPL players, there is a more gradual decline thereafter. The La Liga players – in case you’re wondering – are between the two.

I interpret this as resulting from the difference in the style of play between the two leagues. The frenetic pace of the EPL lends itself to the energy and potency of youth, while the slower, more cerebral style of Serie A requires the game intelligence which is, generally, the product of experience. 

This hypothesis is supported by player appearances. By the age of 22 the top strikers in the EPL have played 22% more league games than their Serie A counterparts. However, over the age of 30, strikers in Serie A play 18% more games. Young players get more opportunities in the EPL, older players are preferred in Serie A. An age difference is also evident in the national teams: at Euro 2016, the average age of the England squad was 25.4; the average age of the Italian team was 28.4. 

The relentlessness of the EPL may also expedite a player’s decline, while the more forgiving tempo of Serie A slows it. Hence strikers play on for longer in Italy. Totti was the youngest player ever to captain a Serie A team; he is also the oldest player to score in the UEFA Champions League.

In his autobiography, England manager Sam Allardyce suggested that players are at their best for around a decade. If we accept this as true, then perhaps it is no surprise that Rooney’s days as a leading goal scorer appear to be over. Of course, Ryan Giggs was able to remodel his playing style enabling him to play for Man Utd into his forties. Will Rooney also learn how to become a tortoise?

[1] If a player made fewer than 15 appearances in a given season, I ignored his ratio for that season. If a player played in multiple leagues over his career, I only considered games played in the top divisions of England, France, Spain, Germany and Italy.
[2] Each profile is rescaled so that the goals-per-game ratio is equal to 1 in the peak year. This is because I wanted all players to be weighted equally when taking the average profile over the sample. Before rescaling, I lightly smoothed the profiles with a moving average.
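For the curious, the smoothing, rescaling and averaging described in [2] can be sketched in a few lines of Python. The input format – one series of goals-per-game ratios indexed by age per player – is my assumption for illustration, not the exact pipeline behind Figure 1:

```python
import pandas as pd

def aging_profile(ratios_by_player, min_profiles=5):
    """Average the rescaled goals-per-game profiles of many players.

    ratios_by_player: dict mapping player name -> pd.Series of
    goals-per-game ratios indexed by age (seasons with fewer than
    15 appearances already excluded).
    """
    rescaled = {}
    for player, ratios in ratios_by_player.items():
        # Light smoothing: a 3-season centred moving average.
        smooth = ratios.rolling(window=3, center=True, min_periods=1).mean()
        # Rescale so the peak season equals 1, weighting all players equally.
        rescaled[player] = smooth / smooth.max()
    profiles = pd.DataFrame(rescaled)
    # Only average over ages where enough players contribute data.
    counts = profiles.count(axis=1)
    return profiles.mean(axis=1)[counts >= min_profiles]
```

The rescaling step is what stops prolific scorers from dominating the average: without it, the profile would mostly reflect the careers of the handful of highest scorers in the sample.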

Friday, 16 September 2016

What's the problem with English players?

The English national team seems to have hit an all-time low. After being humiliated by the mighty Iceland at Euro 2016, England fans must look back wistfully on the days when pundits would forecast that they would be “knocked out by the first decent team that they meet”; it seems that these days England struggle to progress far enough to meet a decent team. You have to go back to 2002 to find the last time England beat one of the world’s top ten teams in a major international tournament.

A lot has been written about the impact of the EPL, in particular the proliferation of foreign players and the resulting difficulties faced by talented young English players to get playing time. On average, less than a third of the players starting games are English; 25 years ago, in the first year of the EPL, it was more than two thirds. However, while England certainly has the lowest fraction of home-grown players in Europe’s top-5 leagues, Italy and Germany (both around a half) are not much further ahead. Two thirds of the players starting games at the highest level in France and Spain are home-grown.

Another English Trade Deficit

If the competition is so tough at home, why don’t more English players go abroad? Surely they would benefit from the experience: a different style of football, exposure to new training methods and coaching techniques, tactical variations, the fans, weather, scheduling; perhaps even different refereeing standards.

Figure 1 shows the number of English, French, German, Italian and Spanish players that have moved abroad to one of the other big five European leagues in the last fifteen years[1].  For example, since 2000, five English players have moved to Spanish clubs in La Liga; over the same period, 48 Italian, 85 French and 15 German players have transferred to the top level of Spanish football.

Figure 1: The number of English, French, German, Spanish and Italian players that have moved abroad to play in another top-5 European league since the 2000/01 season.

When measured in players, England has a huge trade deficit with the other four countries (and in fact, with all other European countries). More than twice as many Italians have played in the EPL as English players have played in all the other top leagues put together. Spain, Italy and France have each exported more than 100 players since 2000; England has exported just 16 – and three of those are David Beckham[2].

Germany has the second fewest players to have moved abroad to one of the other big European leagues: 55 since 2000. But look closely, and you find that half of them played for the German national team while they were abroad; over the same period only four of the 16 English players played international football while playing abroad. It is, therefore, not unreasonable to infer that (at least when it comes to football) German exports are superior to English ones.  

It wasn’t always this way. The table below compares the number of players moving to the top leagues abroad in the periods 1960-1980, 1980-2000 and 2000-present. In the second half of the last century, there were generally fewer players playing abroad, but they were much more evenly distributed between nationalities. For example, more English players played in the top leagues abroad than Spaniards. However, since 2000, the number of Italian, Spanish and French footballers moving abroad has exploded; meanwhile, the number of English players has more than halved.

Table 1: Total number of players moving abroad to play in a top-5 European league since 1960.    

So is there no longer any demand for English players? Are they too expensive? Or are they just reluctant to move abroad? Certainly money must be a big factor – the average salary in the EPL is nearly double that of La Liga (e.g. see here and here) – and so many English players would have to take a pay cut to move abroad. I don’t have the data, but I expect that the disparity is even greater for young players.

Where have all the Gaffers gone?

The lack of foreign playing experience may also have repercussions at the coaching and managerial level. This season, only five EPL teams have English managers; last season it was eight. Four EPL teams are currently managed by Italians, two by Spaniards. Holland, Portugal, Argentina, France and Germany each have one representative. 

In stark contrast, more than half of Bundesliga managers are German. Three quarters of Serie A teams are managed by Italians, and the same is true in Spain and France. Even in the Championship less than half the managers are English. There are precisely zero English managers working abroad in Europe’s major leagues. 

Perhaps this is a self-perpetuating cycle. Few English players move abroad, and so miss out on the breadth of experience that might make them better managers and coaches in the future. This in turn has a detrimental effect on the development of young English players, who are then less likely to be sought after by major foreign teams. And so it goes on. 

Gary Neville may have failed during his brief tenure at Valencia, but he gained experience that few other young English coaches have (and his brother is still there). I hope others follow his lead.

[1] Players that transferred to a foreign club pre-2000 are not included.
[2] Beckham is included three times as he played for Real Madrid in Spain, PSG in France and AC Milan in Italy.

Thursday, 8 September 2016

Who are the EPL's most injury-prone teams? And why?

So Arsene Wenger has finally decided that there is no room for Jack Wilshere in the Arsenal midfield this season and sent him off to Bournemouth on loan. Or maybe he just wanted to free up space in another Arsenal department: their treatment room. With Wilshere having missed more games than he has played in the last few years, Arsenal’s over-burdened medical staff probably feel like they are due a break. 

Many Arsenal fans feel that their team has suffered more than its fair share of injuries, but is it really just a few unlucky seasons – and a few particularly fragile players – that stick in the mind? Or is it true that some EPL teams really are more injury prone than others? And if so, why?

The Treatment Table

To investigate this I looked at the total number of injuries suffered by EPL teams each year since the 2004/5 season (taken from here), providing me with 12 years of data encompassing 37 teams. In this dataset, an injury is defined as any condition that put a player out of action for at least two weeks, so minor niggles are not counted.[1]

Table 1 shows a list of these teams, ordered by the average number of injuries they suffered over the twelve-year period. To cut it down, I’ve only included teams that played in the EPL in at least three of the twelve seasons (if you’re curious, Leicester would be very near the bottom of the table, averaging 15 injuries per season in the two years since they were promoted).

Newcastle are the clear winners (or losers?), suffering a whopping 33 injuries per season. Man Utd bump Arsenal into third place; the Gunners – surprise, surprise – finish just above Spurs. Looking further down, it’s interesting how few injuries Chelsea players get – 19 per season, which is significantly below the EPL average of 23.

Table 1: Average number of injuries EPL teams have suffered in the twelve seasons since 2004/5.

Chronically Crocked

Of course, the number of injuries a team suffers varies from season to season. Arsenal, for example, went from 35 in the 2014/15 season to 24 last season. So did they just go through a particularly unlucky period, a few bad years, or are some EPL teams more injury prone than others?

To answer this, I broke my data set into two non-overlapping 6-year periods. The first period included the seasons from 2004/5 to 2009/10, the second from 2010/11 to 2015/16. I then calculated the average number of injuries suffered by each EPL team during each period. The results are shown in Figure 1.

Figure 1: Average number of injuries suffered by each EPL team during two six-year periods: the 2004/5-2009/10 seasons (x-axis) and the 2010/11-2015/16 seasons (y-axis). Only teams that played in the EPL in both periods are plotted.

The injury records of Arsenal, Manchester United, Newcastle and Spurs aren’t just bad, they’re consistently bad. Chelsea players, on the other hand, seem to be consistently good at avoiding injuries. In general, there is a clear correlation between the average number of injuries teams picked up in the two periods (the Pearson coefficient = 0.65, which is statistically significant).  Some teams are good at avoiding injuries, while others are particularly vulnerable.[2]
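As a sketch, the split-and-correlate check can be written in a few lines of Python. The input format here is hypothetical (one record per team-season); `np.corrcoef` gives the Pearson coefficient quoted above:

```python
import numpy as np

def injury_consistency(injuries, split_season=2010):
    """Correlate each team's average injuries across two eras.

    injuries: list of (team, season_start_year, n_injuries) records,
    e.g. ('Arsenal', 2014, 35). Only teams that appear in both eras
    contribute to the correlation.
    """
    early, late = {}, {}
    for team, season, n in injuries:
        bucket = early if season < split_season else late
        bucket.setdefault(team, []).append(n)
    # Restrict to teams with seasons in both periods.
    teams = sorted(set(early) & set(late))
    x = np.array([np.mean(early[t]) for t in teams])
    y = np.array([np.mean(late[t]) for t in teams])
    # Pearson correlation between the two periods' averages.
    return np.corrcoef(x, y)[0, 1]
```

A coefficient near zero would say each club's injury record is just seasonal noise; a value like 0.65 says the same clubs keep turning up at the top (and bottom) of the injury table.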

It’s difficult to find a clear explanation for this. Many of the EPL’s established teams are found near the top of Table 1 (with the notable exception of Chelsea), while the smaller teams seem to have a better injury record. You could argue that the better teams are more likely to pick up injuries as they tend to control possession in matches, so opposing teams are forced to make more tackles. But then why are Newcastle, Portsmouth and Middlesbrough so high up? 

Alternatively, you could argue that the best teams play in Europe and get further in the cup competitions; they therefore play substantially more games. But again, why are Chelsea so low, and Newcastle – who have neither regularly played in Europe nor done well in the cups – so high? Squad size also doesn’t appear to be a factor.

However, squad age does seem to hold a clue to the puzzle. Arsenal have consistently had one of the youngest squads in the Premier League, with an average age of just over 24 over the last twelve years[3]. Spurs, Man Utd and Newcastle also tend to have young squads; all four teams have a history of promoting youth players to their first team. Generally, most big EPL teams have well-developed youth academies – the average age of teams that have played in all of the last twelve EPL seasons is 25. 

On the other hand, newly promoted clubs often try to consolidate their position in the highest echelon by recruiting experienced EPL players. Therefore, they tend to have older squads – the average age of teams that have played in fewer than six of the last twelve EPL seasons is over 26. 

The obvious implication is that older, more experienced players are better at avoiding injuries. However, there may also be a selection effect: if a young player can’t withstand the rigours of playing regularly in the EPL he won’t make it, no matter how talented he is. Only those that can withstand the constant battering will continue to play at the highest level.

Jack Wilshere may well be a case in point.

[1] I prefer this measure to the total number of days lost through injury – which I also collect – as it isn’t dominated by a small number of serious injuries.
[2] The outlier to the right of the plot is Southampton. In 2004/05, the season in which they were relegated, they picked up 33 separate injuries. In the 4 years since their return to the EPL, they’ve substantially improved their injury record.
[3] Data taken from

Friday, 2 September 2016

Are referees harsher to the away team?

One of the most frequently cited explanations for home advantage is referee bias: referees being influenced by their surroundings, such as a raucous home crowd, to favour the home team. During a match, a referee must make frequent split-second decisions; could it be that, despite their best efforts to remain impartial, they have a tendency towards appeasing (or avoiding abuse from) the home crowd?

As you might expect, referee bias in football has been the subject of numerous studies. In a well-known experiment, forty professional referees were asked to watch highlights of the Liverpool vs Leicester match at Anfield in the 1998/99 season. Half watched the game with sound, the other half without. The authors found that the group that watched the game with sound was less likely to call fouls against the home team than the group that watched it without sound. Other studies have taken more empirical approaches, appearing to find evidence for a home-team bias when referees award penalties or issue yellow and red cards (e.g. here, here and here).

Referee bias cannot be measured purely in terms of fouls conceded or bookings given; if the home team spends the majority of its games defending against superior opponents, we might expect it to give away more fouls and receive more yellow cards. So to detect referee bias we need to separate it from the stronger impact of team superiority.

In the last blog I introduced a simple way of estimating team superiority – how much better one team is than their opponent – called rolling points difference (RPD). To recap quickly: RPD measures the difference in the number of points the home team and the away team have accumulated over their last 38 games (roughly, over the last year). 
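A minimal sketch of the RPD calculation, assuming a chronological list of results (the function and input format are illustrative, not the exact code behind these posts):

```python
from collections import defaultdict, deque

def rolling_points(results, window=38):
    """Rolling points difference (RPD) for each fixture.

    results: chronological list of (home, away, home_goals, away_goals).
    Returns one RPD per fixture: the home team's points total over its
    previous `window` league games minus the away team's.
    """
    # Each team's points from its most recent `window` games.
    recent = defaultdict(lambda: deque(maxlen=window))
    rpds = []
    for home, away, hg, ag in results:
        # RPD is computed *before* the match is played.
        rpds.append(sum(recent[home]) - sum(recent[away]))
        # Standard league scoring: win 3, draw 1, loss 0.
        h_pts, a_pts = (3, 0) if hg > ag else (0, 3) if ag > hg else (1, 1)
        recent[home].append(h_pts)
        recent[away].append(a_pts)
    return rpds
```

The `deque(maxlen=38)` does the "rolling" part for free: once a team has played 38 games, each new result pushes the oldest one out of its points window.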

I’m now going to use RPD to separate the impact of home advantage from team superiority and see if we can find any evidence of bias in refereeing decisions. 

Do away teams receive more cards?

While yellow and red cards don’t normally affect the score directly, they can certainly sway a game. Players on a yellow card become more wary of making tackles, and a red card puts a team at a numerical disadvantage. If referees have a propensity to be harsher on the away team it will affect the outcome of the game and could explain home advantage.

I’m going to search for evidence of biased refereeing by comparing fouls committed and yellow/red cards issued. Why compare the two? Surely they are highly correlated: a team that commits more fouls will receive more yellow cards. The difference is that a referee has only a fraction of a second to blow the whistle and call a foul; there isn’t much time to be influenced by anything other than what he has seen. Taking disciplinary action is different: the referee has time to decide whether to show the offending player a card or let him off, and this is where he could be influenced – albeit unwittingly – by his surroundings.

In Figure 1 I plot the percentage of all the fouls in a game (as called by the referee) that are committed by the away team (red line), and the percentage of all the yellow and red cards issued by the referee that are received by the away team (blue line). Both are plotted as a function of rolling points difference, separating team superiority from home advantage. The gray region shows the 95% confidence region around the ‘fouls committed’ line (red). To make the plot, I use all EPL results since the 2000/01 season. 
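The binning behind Figure 1 can be sketched as follows. The input format – one row per match carrying its RPD plus foul and card counts for each side – is an assumption on my part:

```python
import numpy as np

def away_share_by_rpd(matches, bins):
    """Away team's share of fouls and cards, binned by RPD.

    matches: rows of (rpd, home_fouls, away_fouls, home_cards, away_cards).
    bins: RPD bin edges. Returns, per bin, the percentage of all fouls
    and of all cards attributed to the away team.
    """
    m = np.asarray(matches, dtype=float)
    which = np.digitize(m[:, 0], bins)
    out = []
    for b in range(1, len(bins)):
        sel = m[which == b]
        if len(sel) == 0:
            out.append((np.nan, np.nan))
            continue
        # Pool fouls/cards over all matches in the bin, then take shares.
        fouls = 100 * sel[:, 2].sum() / (sel[:, 1].sum() + sel[:, 2].sum())
        cards = 100 * sel[:, 4].sum() / (sel[:, 3].sum() + sel[:, 4].sum())
        out.append((fouls, cards))
    return out
```

Plotting the foul share and card share against the bin centres would reproduce the shape of Figure 1: if referees were even-handed, the two lines should sit on top of each other.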

Figure 1: the percentage of fouls committed (red line) and cards received (blue line) by the away team as a function of team superiority (rolling points difference). The grey region shows the 95% confidence region around the fouls committed line.

This figure seems to tell a clear story: away teams appear to be systematically more heavily punished by the referee than the home team.

How does the figure show this?

Let’s start in the middle, where RPD indicates that the teams are evenly matched. The home and away teams commit almost exactly the same number of fouls: around 50% each. However, on average the away team receives significantly more cards than the home team (57% of cards issued to 43%). Put another way, for every three yellow cards the home team receives, the away team receives four, even though they committed the same number of fouls.

The effect actually becomes bigger when one team is superior to the other. When the home team is much better than the away team (to the right of the figure), it receives just one card for every two that the away team receives. But when the away team is superior, cards tend to be split evenly. 

Unless the away team is systematically committing more serious fouls than the home team, this looks like evidence of refereeing bias to me (whether it is conscious or ‘unconscious’ bias is a different matter). These results agree with those published in this analysis, which also shows referee bias towards the home side even when the teams are evenly matched (albeit using a different methodology and data).

So, referees don’t seem to give preferential treatment to the home team when calling fouls (awarding free kicks) but do seem to punish the away team more severely when issuing yellow and red cards. This suggests that, in the time that elapses between the foul being given and a card issued, the referee is – consciously or otherwise – influenced by external factors. 

As I said in my first post on home advantage, I think there is more to home advantage than officiating bias, but the data indicates that it is an important part of the puzzle.