EPL Forecasts

(Last updated February 17th, 2017)

Based on the results to date, here are my latest projections for the final 2016/17 EPL table. A description of the simulations used to make these projections can be found below.

Teams are ordered by the mean number of points they accumulate this season across 10,000 simulations. The box plots indicate the distribution of each team's points totals over the simulations: the green bars indicate the 25th to 75th percentiles and the dashed lines (‘whiskers’) the 5th to 95th percentiles. For example, in 50% of the simulations Man City finish on between 71 and 81 points and in 90% of the simulations they accumulate between 63 and 89 points. The vertical line in the middle of the green bars shows the median.
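As a rough illustration of how box-plot summaries like these can be read off a set of simulated points totals, here is a minimal Python sketch. The function name `summarise_points` and the use of the standard library's `statistics.quantiles` are choices made for this example only; they are not necessarily how the plot above was produced, and different percentile conventions can shift the values slightly.

```python
import statistics

def summarise_points(simulated_points):
    """Summarise one team's simulated final points totals.

    Returns the 5th, 25th, 50th, 75th and 95th percentiles, i.e. the
    whisker ends, box edges and median line of the box plot.
    """
    # n=20 gives 19 cut points at 5%, 10%, ..., 95% of the distribution.
    cuts = statistics.quantiles(simulated_points, n=20)
    return {
        "p5": cuts[0],
        "p25": cuts[4],
        "median": cuts[9],
        "p75": cuts[14],
        "p95": cuts[18],
    }
```

Half of the simulated totals fall between `p25` and `p75`, and 90% fall between `p5` and `p95`, which is how the Man City ranges quoted above should be read.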

The blue dots show the final points total each team would achieve if they continued to accumulate points at their current rate. For example, after their 14th game Chelsea had 34 points; this equates to 2.4 points/game. At that rate they would have 92 points by the end of the 38-game season.
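The arithmetic behind the blue dots is simple enough to check by hand. A minimal sketch, where `project_points` is a name made up for this example and the 38-game season length is the only fixed assumption:

```python
def project_points(points_so_far, games_played, season_games=38):
    """Extrapolate a final points total from the current points-per-game rate."""
    if games_played <= 0:
        raise ValueError("games_played must be positive")
    return points_so_far / games_played * season_games

# Chelsea example from the text: 34 points after 14 games.
print(round(project_points(34, 14)))  # 92
```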

The numbers to the right of the plot show the probability of each team:
     a) winning the title (Ti);
     b) finishing in the Champions League spots (CL);
     c) being relegated (red).

The grid below is another way of presenting the predictions. It shows the probability of each team finishing in a given position, with the colour intensity proportional to the probability. For example, I currently predict Chelsea have an 86% chance of winning the league, a 10% chance of finishing second and a 3% chance of finishing third. Blank squares indicate that there is less than a 1% chance of a team finishing in that position. The red squares indicate each team's current position. Note that the probabilities might not exactly sum to 100% because of rounding.
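A grid like this is just a count of how often each team lands in each position across the simulated seasons. The sketch below shows one way to build it; `position_probabilities` is a hypothetical helper, and ties on points are broken alphabetically purely for simplicity (the real league uses goal difference).

```python
from collections import Counter, defaultdict

def position_probabilities(simulated_tables):
    """Estimate P(team finishes in position k) from simulated seasons.

    simulated_tables: list of dicts mapping team name -> final points,
    one dict per simulated season.
    """
    counts = defaultdict(Counter)
    for table in simulated_tables:
        # Highest points first; alphabetical tie-break is an assumption.
        ranked = sorted(table, key=lambda team: (-table[team], team))
        for position, team in enumerate(ranked, start=1):
            counts[team][position] += 1
    n_runs = len(simulated_tables)
    return {
        team: {pos: c / n_runs for pos, c in sorted(by_pos.items())}
        for team, by_pos in counts.items()
    }
```

Positions that never occur for a team are simply absent from its dictionary, which corresponds to the blank squares in the grid (although there the cut-off is 1% rather than zero).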

Here is the same information in table format.

Finally, here are the model predictions for the next round of EPL games.



Here's some technical detail on how these predictions are produced. Further information can also be found in my original blog post on this here.

Season simulations

Each team starts with their current points total. I then work my way through the fixture schedule (currently 260 matches), simulating the outcome of each game. Results are generated based on the Elo scores of each team – which I update after each simulated match – and the benefits of home advantage (see below for more details). At the end of the ‘season’, I tally up the final points totals for each team.

This process is repeated 10,000 times to evaluate the range of points that each team ends up on; I then make a final league table based on their averages. The probability of each team finishing the season as champions, in the top four or bottom three is calculated based on the frequency at which it occurs within the 10,000 runs.
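The overall loop can be sketched in a few lines of Python. This is an outline under stated assumptions rather than the code actually used: `simulate_match` is a hypothetical callable that takes the two team names and a random number generator and returns a scoreline, the Elo update after each match is left out for brevity, and ties for first place are broken alphabetically.

```python
import random
from collections import defaultdict

def simulate_season(current_points, fixtures, simulate_match, rng):
    """Play out the remaining fixtures once and return final points totals."""
    points = dict(current_points)  # start from the current table
    for home, away in fixtures:
        home_goals, away_goals = simulate_match(home, away, rng)
        if home_goals > away_goals:
            points[home] += 3
        elif home_goals < away_goals:
            points[away] += 3
        else:
            points[home] += 1
            points[away] += 1
    return points

def run_simulations(current_points, fixtures, simulate_match,
                    n_runs=10_000, seed=0):
    """Repeat the season n_runs times; return mean points and title odds."""
    rng = random.Random(seed)
    totals = defaultdict(float)
    titles = defaultdict(int)
    for _ in range(n_runs):
        final = simulate_season(current_points, fixtures, simulate_match, rng)
        for team, pts in final.items():
            totals[team] += pts
        # Tie-break by name is an assumption made for this sketch.
        champion = max(final, key=lambda team: (final[team], team))
        titles[champion] += 1
    mean_points = {team: totals[team] / n_runs for team in current_points}
    title_prob = {team: titles[team] / n_runs for team in current_points}
    return mean_points, title_prob
```

The top-four and bottom-three probabilities follow from the same kind of counting, applied to the ranked table from each run instead of just the champion.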

Specific Model Details

This section takes a look at what is going on under the hood in a bit more detail.

The core of the calculation is the method for simulating match outcomes. For each match, the number of goals scored by a team is drawn from a Poisson distribution with the mean, μ, given by a simple linear model:

There are two predictors in the model: X1 = ΔElo/400, where ΔElo is the difference between the team's Elo score and their opponent's, and X2, a binary home/away indicator equal to 1 for the home team and -1 for the away team. Note that Elo scores are explicitly designed to be predictive of match outcomes. The initial Elo score for each team is taken from clubelo.com; after each simulated fixture the Elo scores are updated using the procedure described here.
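For reference, Poisson models of this kind are usually fitted on the log scale, and the size of the coefficients quoted in this section only makes sense under that convention. Assuming a log link, with β1 as the intercept, β2 multiplying X1 and β3 multiplying X2 (an assumed reading of the coefficient ordering), the model would be written as:

```latex
\log \mu = \beta_1 + \beta_2 X_1 + \beta_3 X_2,
\qquad
\text{goals} \sim \mathrm{Poisson}(\mu)
```

Under that assumption, two evenly matched teams (X1 = 0) would have μ = exp(0.26 + 0.13) ≈ 1.48 goals for the home side and μ = exp(0.26 - 0.13) ≈ 1.14 goals for the away side.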

The beta coefficients are determined via linear regression using all matches for the seasons 2011/12 to 2015/16, obtaining values β1 = 0.26, β2 = 0.71, β3 = 0.13. All are highly significant, as is the change in deviance relative to an intercept-only model. Running the regression on earlier seasons gives similar results.
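To make the match simulation concrete, here is a minimal Python sketch of a single simulated scoreline. It rests on two assumptions that are not spelled out above: that the model uses a log link, and that β1 is the intercept, β2 the Elo coefficient and β3 the home/away coefficient. The function names are made up for this example, and the Poisson draw uses Knuth's multiplication method so that only the standard library is needed.

```python
import math
import random

# Coefficients quoted in the text. Their mapping to intercept, Elo and
# home/away terms, and the log link, are assumptions of this sketch.
BETA_INTERCEPT, BETA_ELO, BETA_HOME = 0.26, 0.71, 0.13

def expected_goals(team_elo, opponent_elo, is_home):
    """Mean goals for one team under the assumed log-linear model."""
    x1 = (team_elo - opponent_elo) / 400   # scaled Elo difference
    x2 = 1 if is_home else -1              # home/away indicator
    return math.exp(BETA_INTERCEPT + BETA_ELO * x1 + BETA_HOME * x2)

def draw_goals(mu, rng):
    """Draw a Poisson(mu) count using Knuth's multiplication method."""
    limit = math.exp(-mu)
    goals = 0
    product = rng.random()
    while product > limit:
        goals += 1
        product *= rng.random()
    return goals

def simulate_scoreline(home_elo, away_elo, rng):
    """Simulate one match; returns (home_goals, away_goals)."""
    home_goals = draw_goals(expected_goals(home_elo, away_elo, True), rng)
    away_goals = draw_goals(expected_goals(away_elo, home_elo, False), rng)
    return home_goals, away_goals
```

Calling `simulate_scoreline(1900, 1750, random.Random(1))` would give one simulated result for a home side rated 150 Elo points above its visitors; the Elo values here are placeholders, not real ratings.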
