Monday, December 7, 2009

EPL Rankings Update - With New Stat-Based Ranking

The league got a bit more interesting with Man City getting a big win over Chelsea. With their own big win over West Ham, Manchester United, despite a completely decimated backline, are back to just 2 points off the lead. Arsenal are still vaguely in the race, now 8 back with that match in hand. Chelsea are clearly still the favorites to win the league, but at least now we have something to talk about when it comes to the title race.

I've been working on a new rankings and prediction system and I'm ready to unveil it now. Unlike the previous work, this new system uses stats from the match: specifically, shots, shots on target, fouls, corners and bookings. I'll add time of possession and offside calls as well at some point, but I have to do some work recording those. Using those stats and a logit model, I estimate the probability of each possible outcome if the teams played a new full season at the level shown by all the stats. With those probabilities I get the expected number of league points. So the right column is the average number of points the team would get if they played a new season at the level shown by their results and stats so far.
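To make the last step concrete, here is a rough sketch of how outcome probabilities turn into expected points. It is not the actual model: the ordered-logit form, the single `rating` per club, the `draw_width` parameter and the ratings themselves are all assumptions made up for illustration, and home advantage is ignored. The only part taken from the description above is the final step, where each match is worth 3 points times the win probability plus 1 point times the draw probability.

```python
import math


def logistic(x):
    """Standard logistic function."""
    return 1.0 / (1.0 + math.exp(-x))


def outcome_probs(rating_gap, draw_width=0.5):
    """Return (win, draw, loss) probabilities for one match.

    Assumed ordered-logit form: rating_gap is the team's (hypothetical)
    rating minus the opponent's, and draw_width sets how wide the band
    of "roughly even" results counted as draws is.
    """
    p_loss = logistic(-draw_width - rating_gap)
    p_win = 1.0 - logistic(draw_width - rating_gap)
    p_draw = 1.0 - p_win - p_loss
    return p_win, p_draw, p_loss


def expected_season_points(team, ratings, draw_width=0.5):
    """Expected points if the team plays every other club twice."""
    total = 0.0
    for opponent, opp_rating in ratings.items():
        if opponent == team:
            continue
        p_win, p_draw, _ = outcome_probs(ratings[team] - opp_rating, draw_width)
        # 3 points for a win, 1 for a draw, two fixtures per opponent.
        total += 2 * (3 * p_win + 1 * p_draw)
    return total


# Made-up ratings for a three-club league, purely for illustration.
demo_ratings = {"Club A": 0.8, "Club B": 0.0, "Club C": -0.8}
demo_points = {club: expected_season_points(club, demo_ratings) for club in demo_ratings}
```

In a real 20-club season each team plays 38 matches, so the expected points always fall between 0 and 114.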

Rank Club Expected Points
1 Chelsea 87.5
2 Man United 84.3
3 Arsenal 80.3
4 Liverpool 75.9
5 Tottenham 68.2
6 Man City 65.0
7 Aston Villa 53.0
8 Fulham 51.2
9 Everton 47.6
10 Portsmouth 45.4
11 Sunderland 45.2
12 Blackburn 44.4
13 West Ham 43.8
14 Birmingham 42.6
15 Burnley 41.1
16 Wigan 40.6
17 Wolves 37.6
18 Stoke 37.4
19 Bolton 35.4
20 Hull 31.3

Here are the rankings using the previous system that just uses goals for each side.



The biggest difference is Stoke. I was quite surprised to see the Potters so low in my new rankings. Looking at the stats, I can now understand why. They are 18th in the league in shot differential (shots taken minus shots allowed) and last in the league in shots-on-target differential, fouls differential (times fouled minus fouls committed) and corner differential. I'll get there with the stat series eventually, but there is somewhere between a fair and a strong correlation between all those stats and goal differential. It looks like Stoke have gotten lucky in terms of how many goals they've scored and allowed when compared to the stats from their matches.
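For anyone who wants to reproduce the league positions quoted for each differential, a minimal sketch follows. The club names and totals are placeholders rather than real data, and `rank_clubs` is just an illustrative helper name.

```python
def rank_clubs(totals):
    """Rank clubs by differential, best first.

    totals maps club -> (stat_for, stat_against), for example
    (shots taken, shots allowed) or (times fouled, fouls committed).
    """
    diffs = {club: stat_for - stat_against
             for club, (stat_for, stat_against) in totals.items()}
    return sorted(diffs, key=diffs.get, reverse=True)


# Placeholder season totals for shots (taken, allowed); not real figures.
demo_shots = {
    "Club A": (210, 150),
    "Club B": (160, 190),
    "Club C": (180, 175),
}
demo_ranking = rank_clubs(demo_shots)
```

The same function works for shots on target, fouls and corners; only the pair of totals changes.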

Another club that has a big difference is Portsmouth. I was surprised to discover that they are 5th in shot differential and 6th in shots-on-target differential! They are 15th in fouls, 12th best in corners and 14th in goal differential, which the model also uses. So they appear to be on the other end - they've gotten unlucky when it comes to scoring and also the timing of those goals.

4 comments:

  1. Your model is interesting, but like you said, "if they played a new season at the same level". I was wondering whether it would be more accurate and realistic to calculate the expected points from the remaining matches and add them to the current points that each club has. I mean, the past is the past; what has happened (good luck/bad luck) has happened.

    That's because, as the season progresses, I think there will be a point where the current points that a team has already exceed the expected points from your model, due to variability, luck, etc., I guess.

    To match your model, I'm not sure Liverpool can get 52 points from 23 matches. Likewise, Portsmouth will need to get 35 more points (which I think is rather difficult), whereas Aston Villa only need 23 more points (another bad run)? And with Stoke already having 20 points from the first 15 games, I don't think they will finish on 37 points either.

  2. re. the previous comment, it would be interesting to compare the predictions from a fresh season run to the current point with new model with the reality to this point, there should be reasonable alignment given that the data to date has informed the predictions.

  3. Why bother including fouls in your new system when you have found that fouls tell you nothing about which team was better in any given match?

  4. "re. the previous comment, it would be interesting to compare the predictions from a fresh season run to the current point with new model with the reality to this point, there should be reasonable alignment given that the data to date has informed the predictions."

    This is a great suggestion. I'll write a future article where I do this. It's a nice test for the model.

    "Why bother including fouls in your new system when you have found that fouls tell you nothing about which team was better in any given match?"

    Good question. Corners and fouls are similar. There is basically no difference when it comes to how well you do if you get a lot of corners or get fouled a lot. Having said that, good teams tend to get more corners and get fouled more often than they themselves foul over a season. So how many times a team has fouled/been fouled over the season so far is an indicator of how good they are. It is a weak indicator, and has very little weight in the model, but it is something.
