Tuesday, December 29, 2009

On Holiday

I am currently on a trip with my family and will probably not be able to post anything until early next week. I may have access in the next few days, in which case I'll probably post on the Spanish league since they come back this weekend and the other leagues have more time off. When I get back home, I'll post a half-time report on each of the top leagues. Next week I hope to get back on track with more general articles about the game. I plan to continue the stats series and am working on analyzing derbies to determine if they are any different than regular matches in terms of predictability and in factors such as home-ground advantage.

Thursday, December 24, 2009

EPL Boxing Day to 2010 Preview

As usual, the last week of the year is filled with English football. This season the fixtures list is as well organized as Germany sitting on a 1-0 lead. Each club plays one match at home and one away. Most clubs play on Boxing Day and then again two days later. There are, however, two matches on the 27th, 29th and 30th so after Christmas you are covered until New Year's Eve.

I'm going to make this a long preview, going through each club starting at the top of the table.

Chelsea
at Birmingham the 26th
vs Fulham the 28th

Chelsea are relatively injury free; they should only be missing Essien and Bosingwa. Essien, who was hurt two weeks ago in a meaningless Champions League match, is a huge loss, but their injury situation is not nearly as bad as that of their two main rivals for the title. Chelsea have two fairly tough matches. While they are favored in each match, they are underdogs to win both of them.

The model says:
Expected points: 4.44
6 points: 45.3% chance
4 points: 27%
3 points: 17.5%
2 points: 3.8%
1 point: 4.9%
0 points: 1.5%
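For anyone curious how a two-match breakdown like the one above can be put together, here is a minimal sketch. It assumes the two matches are independent and uses made-up win/draw/loss probabilities; the function names and the inputs are illustrative only and are not the model's actual numbers.

```python
from collections import defaultdict
from itertools import product

# League points awarded for each single-match outcome.
POINTS = {"win": 3, "draw": 1, "loss": 0}


def two_match_points(first, second):
    """Combine two win/draw/loss distributions into a distribution of total points.

    Assumption: the two matches are independent of each other.
    """
    totals = defaultdict(float)
    for (r1, p1), (r2, p2) in product(first.items(), second.items()):
        totals[POINTS[r1] + POINTS[r2]] += p1 * p2
    return dict(totals)


def expected_points(distribution):
    """Probability-weighted average of the points totals."""
    return sum(points * prob for points, prob in distribution.items())


# Hypothetical probabilities, for illustration only.
away_match = {"win": 0.60, "draw": 0.25, "loss": 0.15}
home_match = {"win": 0.75, "draw": 0.16, "loss": 0.09}

dist = two_match_points(away_match, home_match)
for pts in sorted(dist, reverse=True):
    print(f"{pts} points: {dist[pts]:.1%}")
print(f"Expected points: {expected_points(dist):.2f}")
```

Note that 5 points never shows up, since no combination of two results adds up to it.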

Manchester United
at Hull the 27th
vs Wigan the 30th

It seems like the Red Devils are missing just about everybody. Certainly out for both matches are van der Sar, O'Shea, Evans, Ferdinand and Hargreaves. Nani should be out for both as well. Probably out for the first, and maybe out for the second as well are Vidic, Giggs, Neville and Brown. The good news if you're a United fan is that both matches are against weak opponents. In fact, Hull and Wigan are the worst in the league in goal differential.

The model says:
Expected points: 4.96
6 points: 60.3% chance
4 points: 22.5%
3 points: 12.8%
2 points: 1.8%
1 point: 2%
0 points: 0.6%

Arsenal
vs Aston Villa the 27th
at Portsmouth the 30th

Arsenal are also in bad shape as far as injuries go. Van Persie, Rosicky, Gibbs, Clichy and Bendtner will be out for both matches. Traore and, most importantly, Fabregas are questionable for the first match against Villa. That match should be a great one that is big for both sides. I think the Portsmouth match will be tougher than it appears as well. Arsenal should be OK, and playing later than most teams should help them slightly since it gives Cesc an extra day to heal. Plus they get two days off in between instead of the more common one.

The model says:
Expected points: 4.14
6 points: 37.5% chance
4 points: 28.0%
3 points: 20.3%
2 points: 4.9%
1 point: 6.9%
0 points: 2.4%

Aston Villa
at Arsenal the 27th
vs Liverpool the 29th

If you are a neutral and have to watch just one club this week it should probably be Villa as they play what look like the two best fixtures. Heskey went down with a groin strain last weekend and is questionable for both matches; otherwise they are healthy. That's good news as they'll need to put full effort into both matches against potential rivals for European spots. Their second match may be extra tough since they only have one day off in between whilst Liverpool will have two.

Expected points: 1.84
6 points: 4.0%
4 points: 9.9%
3 points: 27.1%
2 points: 5.7%
1 point: 27.3%
0 points: 26%

Tottenham Hotspur
at Fulham the 26th
vs West Ham the 28th

Two fairly tough matches await Spurs who are hoping to take advantage of recent form. Nothing new on the injury front. They will still be without Woodgate as well as the out-of-favo(u)r Bentley. Luka Modric, returning from breaking his leg last August, came on as a sub on the 12th against Wolves but did not play against Man City. He may not start both matches but it seems like he should play in one or both matches. Given Aston Villa's two matches, this would be a great opportunity for Spurs to slide into the fourth Champions League spot, at least temporarily.

Expected points: 3.72
6 points: 25.3%
4 points: 27.3%
3 points: 31.5%
2 points: 4.4%
1 point: 8.3%
0 points: 3.2%

Man City
vs Stoke City on the 26th
at Wolverhampton Wanderers the 28th
Expected Points: 3.98

vs Chelsea on the 26th
at Stoke City on the 28th
Expected Points: 1.91

vs Wolves the 26th
at Aston Villa the 29th
Expected Points: 3.88

vs Tottenham the 26th
at Chelsea the 28th
Expected Points: 1.9

vs Everton the 26th
at Blackburn the 28th
Expected Points: 2.65

vs Bolton the 26th
at Everton the 28th
Expected Points: 2.6

at Sunderland the 26th
vs Burnley the 28th
Expected Points: 3.09

vs Blackburn the 26th
at Manchester United the 30th
Expected Points: 1.94

vs Manchester United on the 27th
at Bolton the 29th
Expected Points: 1.28

at Burnley the 26th
vs Hull City the 29th
Expected Points: 2.91

West Ham
vs Portsmouth the 26th
at Tottenham the 28th
Expected Points: 1.98

at West Ham the 26th
vs Arsenal the 30th
Expected Points: 1.99

Wednesday, December 23, 2009

EPL Pre Boxing Day Rankings Update

In a week that featured a midweek matchday, there were a lot of interesting results. Most surprising was probably Fulham getting a 3-0 home win against Manchester United. Fulham getting a win over Man U at Craven Cottage isn't too surprising by itself but the scoreline is. Granted, United have had a lot of injury issues lately, and the score was unfair but the result is still eye-opening. The other big upset I suppose was Liverpool losing 2-0 to Portsmouth. It's similar to Fulham - Man United in that it's not the most amazing result ever but still unexpected. Liverpool have been hemorrhaging points lately, and in my view Portsmouth aren't as bad as their points or status at the bottom of the table indicate.

The big winners of the week were probably Spurs and Aston Villa. While all the other clubs near the top faltered, both got the job done. Spurs ran through Manchester City in the midweek with an impressive 3-0 win. Having watched this one I think the scoreline was fair; Tottenham dominated for most of the match. They wrapped things up with a 0-2 win at Blackburn. Villa's two wins weren't particularly impressive, 0-2 at Sunderland and 1-0 against Stoke City, but they count all the same and the other teams around them, Spurs excepted, all slipped up.

Here are the updated rankings. I'm including more info on the first chart. These are the rankings based only on the number of goals in each match:

EGF - expected goals for
EGA - expected goals against
EGD - expected goal differential
Change - change in expected goal differential from last rankings (14 Dec)
Expected goals for/against are the average number of goals scored/conceded by the club if they all played a new season at the level shown so far by all results.

Here are the rankings based on stats as well as results:

EP is the expected number of league points if they played a new season at the level shown so far. Change is the change in expected points since I posted the last rankings on 14 December.

One of the nice features of adding stats is that because there is more information they are less prone to move based on one or two good or bad matches. You can see that above, though keep in mind that a difference of 1 goal in goal differential is a change of about 2/3 of a point. Even taking that into account the average difference is about 3 times as large in the goals-only model compared to the one that also uses match stats.

Tuesday, December 22, 2009

Effect of Boxing Day Glut of Matches

The English Premier League is unique among top flights in that they play a lot of matches between Christmas and the first few days of the year while most leagues are completely off. Most seasons a team will play on Boxing Day and then again just two days later. Some seasons there will be another match just 2 or 3 days after that so they are playing 3 league matches in 6 or 7 days. This year there are matches every day from the 26th to the 30th of December. Most teams play on the 26th and then on the 28th.

Does this sequence of fixtures help any type of team? It seems like it could go either way. The good teams tend to be deeper in talent so the short amount of rest could help them as they can start fresh players that are still at a high level. On the other hand, fatigue could add some randomness to the short-rest matches. That is probably good for the bad teams since it makes upsets more likely. Perhaps the most sensible guess is that it doesn't really matter; both teams face the same strain.

To test this, I came up with a simple model with just one input for all matches and another for the matches played on short rest during the last week of the year or the first few days of the new year depending on the schedule. The data is all matches since the 1995-1996 season. The input is the difference in average goal differential for the home and away teams in all matches other than the one in question. So instead of the predictive model, I'm actually using data from after a given match took place. For example, if the match is Arsenal at home against Everton two seasons ago I'll take Arsenal's goal differential for the season, subtract off their goal differential in that match, and divide by 37. I then do the same for Everton and the input value is the difference. I then include another variable which takes on 0 for all matches other than those with short rest at the end or beginning of the calendar year. For the matches we are interested in, the value is the same as the first input: the difference in average goal differential between the home and away team in all other matches. I then ran those through an ordered logit model to see if there was a difference between the short-rest matches just after Boxing Day and the regular ones.
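The construction of the two inputs described above can be sketched roughly as follows. This is only an illustration of the arithmetic, under the assumption of a 38-match season; the function names and the example numbers are hypothetical, and the ordered logit fit itself is not shown.

```python
def avg_gd_excluding_match(season_gd, match_gd, matches_played=38):
    """Average goal differential per match over the season, leaving out one match."""
    return (season_gd - match_gd) / (matches_played - 1)


def model_inputs(home_season_gd, away_season_gd, home_goals, away_goals, short_rest):
    """Return (strength_diff, short_rest_diff) for a single match.

    strength_diff   : home minus away average goal differential in all other matches
    short_rest_diff : the same value for short-rest holiday matches, otherwise 0
    """
    match_gd = home_goals - away_goals
    home_avg = avg_gd_excluding_match(home_season_gd, match_gd)
    # From the away side's point of view the match goal differential flips sign.
    away_avg = avg_gd_excluding_match(away_season_gd, -match_gd)
    strength_diff = home_avg - away_avg
    short_rest_diff = strength_diff if short_rest else 0.0
    return strength_diff, short_rest_diff


# Hypothetical example: home side +43 for the season, away side +22, home win 1-0.
regular = model_inputs(43, 22, 1, 0, short_rest=False)
holiday = model_inputs(43, 22, 1, 0, short_rest=True)
print(regular)
print(holiday)
```

In the regular match the second input is zero, so any extra effect of short rest can only show up through the coefficient on the second variable.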

In case that didn't make sense, the only thing you need to know is that if the coefficient on the short-rest variable is significant and positive, then that means that the shortened rest favors good teams. If it is significantly negative then that means that the schedule favors bad teams making them more likely to get a result against better sides. If it is very close to zero either way then that indicates that the difference in schedule from the rest of the season doesn't matter either way.

As it turns out, there is no evidence of an effect either way. The coefficient was 0.12 with a standard error of 0.167. The standard error being larger than the value of the coefficient means it is very likely that the difference is just due to randomness. The p-value is 0.472, meaning that there is about a 47% chance of values this extreme if the actual value of the coefficient is 0. That's quite high. There is no evidence of any difference between the post-Christmas group of matches and the rest of the season.
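As a quick check on the numbers above, the p-value can be approximated from the coefficient and its standard error. This is a minimal sketch that assumes a two-sided test with a normal approximation; the function name is my own, and the inputs are the rounded figures quoted in the text.

```python
import math


def two_sided_p_value(coefficient, standard_error):
    """Two-sided p-value for the hypothesis that the true coefficient is 0.

    Uses a normal approximation: p = erfc(|z| / sqrt(2)), where z = coef / se.
    """
    z = coefficient / standard_error
    return math.erfc(abs(z) / math.sqrt(2))


z_score = 0.12 / 0.167
p_value = two_sided_p_value(0.12, 0.167)
print(f"z = {z_score:.2f}, p = {p_value:.3f}")
```

With the rounded inputs this lands at roughly 0.47, in line with the figure reported.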

There are reasons to dislike the scheduling (I'm sure players don't like playing so frequently in such a short period of time), but it appears that fairness isn't a factor as it doesn't overly benefit or punish good or bad teams.

Friday, December 18, 2009

EPL Predictions

To be honest there aren't any big matches this weekend so I'll just give you the predictions of the stat-based model.

Portsmouth - Liverpool
Portsmouth - 21%
Liverpool - 53%
Draw - 26%

Aston Villa - Stoke City
Aston Villa - 63%
Stoke - 15%
Draw - 22%

Blackburn - Tottenham
Blackburn - 25%
Tottenham - 47%
Draw - 28%

Fulham - Manchester United
Fulham - 22%
Man United - 51%
Draw - 27%

Manchester City - Sunderland
Man City - 67%
Sunderland - 13%
Draw - 20%

Arsenal - Hull
Arsenal - 90%
Hull - 3%
Draw - 7%

Wolverhampton - Burnley
Wolves - 42%
Burnley - 29%
Draw - 29%

Everton - Birmingham
Everton - 53%
Birmingham - 21%
Draw - 26%

West Ham - Chelsea
West Ham - 9%
Chelsea - 75%
Draw - 16%

Monday, December 14, 2009

Bundesliga Update

It's been a while since I wrote about the Bundesliga. In my last post on the subject, I noted that Leverkusen were at the top of the table and had played a very easy schedule. Based on that, I predicted that they would extend their lead which was then at 3 points. That hasn't happened. After drawing in Munich and crushing Stuttgart at home they have two disappointing draws - at Hannover and then last Friday against Hertha Berlin. I still believe that they are the best team, but the gap is small and I could certainly see them falling.

On the other side, Bayern Munich seem to have found a rhythm. They won their last three matches, albeit against bottom half of the table opponents, by a combined 8 goals. They also destroyed Juventus in Torino in a match that was crucial for both sides. With the draws by Leverkusen, Bayern find themselves just two points out. That's a far cry from where they were when I last wrote. I'm not ready to make them favorites but I definitely think they have a better shot now than I thought they had before.

Here are the results-based rankings:

GFR - Ranking by goals for, adjusted for schedule
GAR - Ranking by goals against, adjusted for schedule
EGD - expected goal difference if they played a new season starting today at the level shown by results.

Here are the rankings using the model that incorporates stats:

Epts is the expected points if they played a new season starting today at the level shown by results and stats from all matches thus far.

Both models have Leverkusen top, as does the league table. Near the top, Werder Bremen are two spots higher when stats are taken into account. Bayern and Hamburg go the other way. The biggest movers overall were Mainz, who drop 5 places, and Stuttgart, who moved up 4 spots. While the stats model is still pretty new and I'm working on it, as a general rule the teams that are higher in that model than the goals-only one have usually gotten unlucky and run below expectation in goals given how their matches have gone. On the flip side, teams better in the goals-only ranking have been lucky and scored more goals than expected. I believe that the stats-based model is better when it comes to being predictive.

La Liga Rankings

Real Madrid picked up a big win at the Mestalla; otherwise it was as expected near the top. The biggest story is probably Pepe being out for the rest of the year for Real Madrid.

Here are the rankings using the system that only takes goals into account:

GFR - Goals for rank
GAR - Goals against rank
EGD - expected goal difference if they played a new season at the level shown by the scorelines thus far.

Here are the rankings going by the system that uses stats:

EP is the number of expected points if they played a new season at the level shown so far by the scores and stats from all the matches thus far. The big positive changes between the two rankings systems are Getafe and Zaragoza. Getafe are 4 spots higher in the system that uses stats and Zaragoza 6. Going the other way, including stats dropped Sporting de Gijon 5 spots and Osasuna, Racing de Santander and Almeria 4 spots each. I'm still looking into the new stats model, but that's likely an indicator that Getafe and Zaragoza have been unlucky and run below expectation while Sporting, Osasuna, Racing and Almeria have run above expectation.

EPL Midweek Predictions

Here are the predictions for the midweek. There is a not unusual mix of some matches that should be close and others that should be less than competitive. Last weekend was full of surprises, we'll see what this week brings:

Sunderland - Aston Villa
Sunderland: 36%
Aston Villa: 34%
Draw: 29%

Birmingham - Blackburn
Birmingham: 43%
Blackburn: 29%
Draw: 29%

Bolton - West Ham
Bolton: 38%
West Ham: 33%
Draw: 29%

Manchester United - Wolverhampton
Man United: 83%
Wolves: 6%
Draw: 11%

Burnley - Arsenal
Burnley: 12%
Arsenal: 69%
Draw: 19%

Chelsea - Portsmouth
Chelsea: 83%
Portsmouth: 6%
Draw: 11%

Liverpool - Wigan
Liverpool: 81%
Wigan: 7%
Draw: 13%

Tottenham - Manchester City
Spurs: 48%
Man City: 24%
Draw: 24%

Updated EPL Rankings

What a crazy week in the EPL. Every top team other than Arsenal disappointed. Arsenal are now back in the race as they will find themselves down just 3 if they can win their match in hand. Manchester United fans can look at it either way as they lost at home to a pretty good side in Aston Villa but only lost a point as Everton managed to get a draw at Stamford Bridge. Just when it was looking like a runaway at the top, it seems now like it could be pretty open.

Here is the ranking using the system that only takes goals into account:

GD is the expected goal differential if they played a new season at the level shown by the number of goals scored so far. GF and GA are the ranking of the team in that category.

Here is the ranking according to the new system that takes both goals and several stats into account:

Epts is the expected number of league points they would get if they played a new season at the level shown so far by the number of goals for and against as well as the stats from the matches. The last column is the change from last week.

I remain baffled by Stoke City. To be honest, I am hoping that they stay up rather than slide because they are an interesting case. I am going to write an article just about them because they did the same last season. This week was no exception with them putting both of their shots on target into the goal.

Stay tuned for a preview of matches taking place tomorrow and Wednesday.

Friday, December 11, 2009

EPL Weekend Preview

This weekend has some pretty interesting matches. The highlight is Sunday's matchup between Liverpool and Arsenal at Anfield. Those teams both had title hopes at the beginning of the season. Liverpool would need an incredible run and for several teams to fade to claw their way back into it. Arsenal are not quite in the same situation but are well out of it at the moment, especially with such a brutal injury situation.

Here are the model predictions for each match:

Stoke - Wigan
Stoke: 40%
Wigan: 31%
Draw: 29%

Birmingham - West Ham
Birmingham: 43%
West Ham: 28%

Bolton - Man City
Bolton: 19%
Man City: 56%
Draw: 25%

Burnley - Fulham
Burnley: 38%
Fulham: 33%
Draw: 29%

Chelsea - Everton
Chelsea: 85%
Everton: 5%
Draw: 10%

Hull - Blackburn
Hull: 34%
Blackburn: 37%
Draw: 29%

Sunderland - Portsmouth
Sunderland: 46%
Portsmouth: 26%
Draw: 28%

Tottenham - Wolverhampton
Tottenham: 79%
Wolverhampton: 7%
Draw: 14%

Manchester United - Aston Villa
United: 76%
Villa: 9%
Draw: 15%

Liverpool - Arsenal
Liverpool: 43%
Arsenal: 29%
Draw: 28%

Monday, December 7, 2009

EPL Rankings Update - With New Stat-Based Ranking

The league got a bit more interesting with Man City getting a big win over Chelsea. With their big win over West Ham, Manchester United, despite a completely decimated backline, are back to just 2 points off. Arsenal are still vaguely in the race, now 8 back with that match in hand. Chelsea are clearly still the favorites to win the league, but at least now we have something to talk about when it comes to the league race.

I've been working on a new rankings and prediction system and I'm ready to unveil it now. Unlike the previous work, this new system uses stats from the match. More specifically, shots, shots on target, fouls, corners and bookings. I'll add time of possession and offside calls as well at some point but I have to do some work recording those. Using those stats and a logit model, I estimate the probability of each possible outcome if the teams played a new full season at the level shown by all the stats. With those probabilities I get the expected number of league points. So the right column is the average number of points the team would get if they played a new season at the level shown by the results and stats shown before.

Rank Club Expected Points
1 Chelsea 87.5
2 Man United 84.3
3 Arsenal 80.3
4 Liverpool 75.9
5 Tottenham 68.2
6 Man City 65.0
7 Aston Villa 53.0
8 Fulham 51.2
9 Everton 47.6
10 Portsmouth 45.4
11 Sunderland 45.2
12 Blackburn 44.4
13 West Ham 43.8
14 Birmingham 42.6
15 Burnley 41.1
16 Wigan 40.6
17 Wolves 37.6
18 Stoke 37.4
19 Bolton 35.4
20 Hull 31.3
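The step from outcome probabilities to an expected-points figure like those in the table above can be sketched as follows. This is a minimal illustration, not the actual model: the probabilities are made up, the function names are my own, and the logit model that would produce the probabilities from shots, fouls, corners and bookings is not shown.

```python
def expected_match_points(p_win, p_draw):
    """Expected league points from one match: 3 for a win, 1 for a draw, 0 for a loss."""
    return 3 * p_win + 1 * p_draw


def expected_season_points(match_probs):
    """Sum the expected points over a list of (p_win, p_draw, p_loss) tuples."""
    return sum(expected_match_points(p_win, p_draw) for p_win, p_draw, _ in match_probs)


# Hypothetical season: the same probabilities in all 38 matches, for illustration only.
season = [(0.50, 0.25, 0.25)] * 38
print(f"Expected points: {expected_season_points(season):.1f}")
```

In practice each of the 38 fixtures would have its own probabilities, depending on the opponent and on whether the match is home or away.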

Here are the rankings using the previous system that just uses goals for each side.

The biggest difference is Stoke. I was quite surprised to see the Potters so low in my new rankings. Looking at the stats, I can now understand why. They are 18th in the league in shot differential (shots taken minus shots allowed) and last in the league in shots-on-target differential, fouls differential (times fouled minus fouls) and corner differential. I'll get there with the stat series eventually, but for shots there is a fair to strong correlation between those stats and goal differential. It looks like Stoke have gotten lucky in terms of how many goals they've scored and allowed when compared to the stats from their matches.

Another club that has a big difference is Portsmouth. I was surprised to discover that they are 5th in shot differential and 6th in shots-on-target differential! They are 15th in fouls, 12th best in corners and 14th in goal differential, which the model also uses. So they appear to be on the other end - they've gotten unlucky when it comes to scoring and also the timing of those goals.

Friday, December 4, 2009

World Cup Draw

I will be writing a lot more on the World Cup over time, but here are my initial thoughts on the draw.


United States - It was close to a dream draw for the Yanks. They missed out on South Africa, but England is one of the easier seeded teams to draw. The other two teams are the worst teams in their pots. There weren't many possible draws that have the US as second-best in the group, but they got one. Getting out of the group still won't be easy, but it's a lot more likely now than it was yesterday.

England - Similar story to the above. Depending on your point of view, the USA was the best or second best behind Mexico in pot 2 so that was unlucky, but again the other two teams were the best draws. England should go through top.

France - Whether it was due to the handball incident or something else, France weren't seeded because FIFA changed the way they did the seeding. With the draw they effectively became the seeded team in the group. Drawing South Africa not only meant the weakest seeded team, but it also meant not having to play one of the big African teams. To make things even better, should France advance they will play one of Argentina, Nigeria, South Korea and Greece. Those teams aren't bad, but as I'll point out later, it could have been a lot worse. The only thing not perfect is that they drew one of the two big countries from pot 2 in Mexico.

Italy - The defending champs had by far the best draw. With New Zealand and Slovakia they got countries that were certainly in the bottom two in their pot. Paraguay are tougher to rate, but I think most everyone would put them in the bottom half of their pot and some second worst. Their draw for the round of 16 is favorable as well as they are very likely to get the best of Denmark, Japan and Cameroon.


Spain - Others are likely to say that Spain had a good to great draw. I think they were one of the most unlucky countries. The reason for that is that from most any group they would be significant favorites to advance, even to advance first. The group draw isn't very relevant for the kings of Europe. The knockout draw will lead to what should be the best round of 16 match ever for the neutrals but it is absolutely brutal. Spain will play one of Brazil, Côte d'Ivoire and Portugal in the round of 16. That is astounding. The two runaway favorites in the tournament, at least before the draw, were Brazil and Spain. Portugal and especially the Côte d'Ivoire are both considered to be in the second tier of teams that definitely have a shot. There is more than a 99% chance that Spain will play one of those teams. Looking forward to the round of 8 is tougher, but if they can get out of that their reward is likely to be Italy, who match up well with them. En route to winning Euro 2008, Spain had to best Italy on penalties after neither team scored a goal in 120 minutes.

Brazil, Côte d'Ivoire, Portugal - Not a lot to say here. Absolutely brutal for these countries. Not only is one of the best teams in the competition going to be eliminated in the group stage, but one of the advancing teams, probably the one finishing second, has to face Spain in the first knockout round.

Mixed Bag:

The Netherlands - It's tough to argue that the Dutch group draw was great. Denmark and Japan are both pretty good teams that can give teams problems. Cameroon are a very talented side with the much discussed advantage of playing on their home continent. That's a tougher than average group. Having said that, the Dutch should finish top of the group as they are pretty easily the best side. Their reward in the round of 16 if they do that is most likely Paraguay or Slovakia, so that part is great. On the other hand, their round of 8 match is most likely whoever wins the group with Brazil, Côte d'Ivoire and Portugal, with Brazil most likely. There aren't usually a lot of easy wins in the last 8 but it could obviously be a lot better than that. So some good and bad news.

Germany - I thought of putting Germany in the winners category, but decided to put them here instead. Australia, Serbia and Ghana is a tough draw as all three were among the best teams in their pot. Germany still should get out of that in first. If they do, they will likely play the United States, Slovenia or Algeria. Any of those would be below average for the round of 16. Looking further ahead, their round of 8 match is against the best of Argentina, Nigeria, South Korea and Greece or the second best of France, Mexico, Uruguay and South Africa. So barring a big upset, if they can advance first from the group stage Germany will have one of the easier round of 16 opponents and the easiest in the quarterfinal. Even if you take into account their pretty tough group, for me the group of death because all four teams are good, no other team has as clear a path to the semifinal.

Thursday, December 3, 2009

Stat Series: Fouls

I'll now continue the stat series by looking at fouls. Like corners and time of possession, about which I'll write in the future, fouls are often used as an indicator that one team was a lot better than the other in a match. A defender usually fouls a player with the ball because he cannot otherwise make a tackle. It is often because the attacker is making a run past the defender or receiving the ball in a dangerous area. Other than a few exceptions like fouling the keeper on a corner, fouls tend to be called on you when your team is getting outplayed, at least for the couple seconds before they occur. But are fouls really a strong indicator of performance in a match?

If that is the case then we should see a link between the number of fouls committed by each team and the match result. If getting fouled a lot more often than fouling your opponents indicates that you are playing better, and it's a given that playing better leads to winning, then getting fouled more often should be well correlated with getting results.

Looking at the data, I was quite surprised to find that there is practically no connection between the number of fouls called for each team and the results of matches. In other words, foul statistics seem to not indicate anything at all in terms of which team was better on a given day. My data set is all matches from the previous two seasons in the English, Spanish, Italian, German and French top flight. I may expand the dataset for this study but frankly I didn't see the point after seeing what the results were for the last two years.

I looked at it a couple different ways. The first was just to look at the correlation between foul differential (home fouls minus away fouls) and goal differential (home goals minus away goals). A positive correlation would indicate that fouling more often than your opponent tends to lead to outscoring them. A negative correlation would be more in line with what you would expect; if you get fouled more often than your opponent you will tend to outscore them. If the correlation coefficient, which varies between -1 and 1, is very close to zero then that indicates that there is little relationship between the two. As it turned out that was the case. The correlation coefficient was -0.01278. In other words, the data indicates that there is essentially no link between number of fouls committed by each team and the goal difference in a match.
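For readers who want to try the same check on their own data, here is a minimal sketch of the correlation between foul differential and goal differential. The handful of matches below is invented purely for illustration, and the function name is my own; the real study used two seasons from five leagues.

```python
import math


def pearson_correlation(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


# Hypothetical matches: (home fouls, away fouls, home goals, away goals).
matches = [
    (12, 15, 2, 1),
    (18, 11, 0, 0),
    (9, 14, 1, 3),
    (16, 16, 2, 2),
    (13, 10, 3, 0),
]

foul_diff = [hf - af for hf, af, _, _ in matches]  # home fouls minus away fouls
goal_diff = [hg - ag for _, _, hg, ag in matches]  # home goals minus away goals

r = pearson_correlation(foul_diff, goal_diff)
print(f"Correlation between foul and goal differential: {r:.3f}")
```

A value near zero, as in the real data, means the foul differential tells you next to nothing about the goal differential.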

To look into this further, I broke the matches up into three groups: those where the home team committed more fouls than the away team, those where the away team fouled more and those where the two teams committed the same number of fouls. In the sample there were 1,582 matches where the home side committed more fouls. In those, the home team won 729 times, got a draw in 415 and lost 438. When the home team committed more fouls they averaged 1.64 points per match. The away team fouled more in 1,824 matches. In those, the home team went 853-484-487 for an average of 1.66 points per match. There were 246 matches where the two teams were called for the same number of fouls. In those the home side went 122-55-69 for 1.71 points per match.

For the fellow nerds out there, there is no statistical significance when comparing 1.66 points per match and 1.64. More importantly though, there is no actual significance! "Winning the fouls battle" was only worth 2 hundredths of a league point. That is absolutely nothing. If you gave that up over an entire season, it would still cost you less than one point on average: 0.76 for a 20-team league. In a previous article on goal differential and points, I showed that there is a very strong link between goal differential over a season and points. Using the formula from that article, giving up .02 points per match every match for the entire season is about the equivalent of conceding one more goal overall. The stats indicate that the difference in points from getting fouled more or fewer times in a match almost certainly is due to random noise and not a real effect. My point is that even if it were due to something real, the difference is so small that it isn't relevant.
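The points-per-match figures and the season-long cost above are simple arithmetic, sketched below. The win-draw-loss records are the ones quoted in the text; the function name and the dictionary labels are my own, and the 38-match season is the length of a 20-team league.

```python
def points_per_match(wins, draws, losses):
    """Average league points per match for a win-draw-loss record (3 for a win, 1 for a draw)."""
    return (3 * wins + draws) / (wins + draws + losses)


# Home-team records from the three groups of matches described above.
records = {
    "home team fouled more": (729, 415, 438),
    "away team fouled more": (853, 484, 487),
    "equal number of fouls": (122, 55, 69),
}

for label, record in records.items():
    print(f"{label}: {points_per_match(*record):.3f} points per match")

# Cost of giving up 0.02 points per match over a 38-match season.
season_cost = 0.02 * 38
print(f"Season cost: {season_cost:.2f} points")
```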

To put the final nail in the coffin, I looked only at matches where one team or the other committed a lot more fouls. In those where the away team committed 5 or more fouls more than the home team, the home side went 442-273-260 for an average of 1.64 points. When the home side committed 5+ fouls more than the away team they went 330-196-208 for an average of 1.62 points per match. So the difference is the same as in the other case. I find it curious, though it doesn't matter at all, that the home teams did better in matches where the numbers of fouls committed by the two teams were close than at either extreme. So even in the extreme case, it's extremely likely that the difference is just due to noise, and even if it isn't, the difference is so small that it doesn't matter.

There is no evidence whatsoever of a connection between the number of fouls committed by each team in a match and the result. If in the future you see me using the difference in fouls to make the case that one team was better than the other despite the result, please yell at me in the comments.

In the future I will look at full-season information for different teams as well as different leagues. It is possible that there is a link between fouls and results in individual matches, but it is restricted to certain leagues due to style of play or officiating.

Wednesday, December 2, 2009

FIFA Announces Pots for Draw

FIFA has announced the pots for the World Cup draw, with one major surprise. Instead of using the same type of formula they used last time, which I used in a past article to calculate the 7 seeded teams along with South Africa, they just used the FIFA rankings from last month. This meant that France dropped out and the Netherlands became seeded. That's a pretty big break from tradition as they've used roughly the same seeding formula for the last handful of World Cups.

Here are the pots:

South Africa

Pot 2:
North Korea
South Korea
New Zealand
United States

Pot 3:
Côte d’Ivoire

Pot 4:

Here are a few thoughts:
- The second pot contains no teams from the same confederation as the seeded pot. This means that for those teams, each of the seeded teams is equally likely.

- If you are like most people and think that South Africa is by far the easiest draw from the seeded pot then the unseeded South American sides have an edge because each has a 1 in 3 chance of playing in South Africa's group. This point and the last one are because no two teams from the same confederation other than UEFA can be put into the same group.

- If you're French or a fan you obviously aren't happy. If you are Dutch then I'm sure you are. For the rest of the world, the only change as far as the seeding goes is that France replaced the Netherlands as the big team you'd like to avoid. In my view, and probably that of most others, France, Portugal and the Ivory Coast are clearly the best three teams that aren't seeded. I think it's a lock that one of those three will be in what is considered the group of death.

- The United States and Mexico are also very likely to be in the group of death. Keep in mind that this doesn't mean the draw will be tougher. The US and Honduras have the exact same probability of drawing any given set of opponents, but because the US and Mexico are the best two teams in their pot they are likely to be in the group of death.

- Perhaps because of South Africa or maybe just some kind of recency bias, it seems like there is more on the line in terms of how tough the groups could be. A possible group is Brazil, the United States, Ivory Coast and France. Another is South Africa, North Korea, Uruguay and Slovenia. For countries like the US, Denmark, Chile and Ghana there are possible groups where it would be very unlikely to get out, maybe a 10% chance, and others where it would be a near certainty.

Tuesday, December 1, 2009

Spanish League Post-Clasico Update

Barcelona went back on top by virtue of their 1-0 win over Real Madrid. To be honest, Madrid far exceeded my expectations in the match. They actually outplayed Barcelona for long stretches, especially in the first half, and Barça were fortunate in that Zlatan put away their only clear chance of the match. I expected it to be like last year when Barça played Real Madrid off the park both times. Elsewhere, Sevilla got a disappointing result with a draw versus their Andalusian rivals Malaga. The result didn't feel particularly fair as Sevilla dominated, including the first half, even though the score didn't indicate that; they went into the break down 0-2. Valencia also failed to win at home, picking up just a draw against fellow Catalan speakers Mallorca. Atletico de Madrid are perhaps getting it back as they crushed Espanyol 4-0.

Here are the rankings:

The last column is the expected goal differential if the teams played a new season at the level the results so far indicate. O and D are offensive and defensive ranks which are based on how many goals scored and conceded as well as the opponent strength.

Barcelona remain best at both ends of the pitch. Sevilla slipped a bit due to giving up two goals at home to Malaga, who are pretty weak when it comes to scoring.

Monday, November 30, 2009

EPL Update - Chelsea Clear Top

With the big win over Arsenal at the Emirates, Chelsea cemented their place at the top. The Blues remain 5 points clear of Manchester United and are now 11 points ahead of the Gunners, with Arsenal having a match in hand. Looking ahead to December, Chelsea have Man City next and then a pretty comfortable list of fixtures: Everton, Portsmouth, West Ham, Birmingham and Fulham. They have a great shot at being in a dominant position come the break. They will have to be good though, since Manchester United have a similarly easy schedule for that stretch. I thought it would be different but at this point it's tough to see anybody but those two winning it, with Chelsea the favorite at the moment.

Here are the updated rankings:

Again, the last column is the expected goal differential if these teams played out a full season at the level shown so far.

Someone asked me when I expect these numbers to start to converge. I don't have a good answer to that and it's something I'll be looking into. They will certainly converge at the end of the season which is a nice feature of the model. Most teams have played 14 matches, which by the way is right around the number of matches for a World Cup qualifying campaign. I think things should be pretty good, but because right now the model only uses goals and no other statistics, it is still subject to randomness.

When I was checking on the numbers, I discovered something interesting. I've said before that teams like Arsenal and even Chelsea are overrated when it comes to scoring as they've likely been running over expectation. The reason for that is that the model predicts an expected goals of 108 for Arsenal and 97 for Chelsea when both clubs scored 68 goals last season with similar rosters. The model accounts for schedule so it's a bit different, but both of those clubs are on pace for about as many goals as the model says. When I compared other clubs' expected goals with last season, I discovered that 14 of the 17 teams that played last season in the EPL have a higher expected goals scored.

I thought this might be a problem with the model, but for all matches thus far the average number of combined goals is 3.04. Last season it was 2.21. If scoring stays at this pace, there will be about 318 more goals scored this year than last year in the entire league. I have no explanation for why this would be the case, but scoring seems to be up in a big way. I'm going to write another article or two for the stats series but this raised another interesting question which I'll write about. I wonder if scoring is often higher in matches in the early part of the season compared to later.

Saturday, November 28, 2009

EPL Mini Preview

Sorry for the lack of posts this week. I've been visiting family for Thanksgiving, a major holiday in the US. So as not to leave you completely hanging, here are the model predictions for the EPL this week. I should have a more-detailed preview for Arsenal - Chelsea and the Clasico tomorrow. I know it's late but perhaps you can read it the morning of the matches to get in the mood.


Blackburn - Stoke
Blackburn: 43.5%
Stoke: 27.4%
Draw: 29.1%

Fulham - Bolton
Fulham: 65.9%
Bolton: 11.7%
Draw: 22.4%

Man City - Hull City
Man City: 75.6%
Hull: 7.1%
Draw: 17.3%

Portsmouth - Manchester United
Portsmouth: 22.5%
Man United: 47.9%

West Ham - Burnley
West Ham: 56.5%
Burnley: 18.2%
Draw: 25.2%

Wigan - Sunderland
Wigan: 29.8%
Sunderland: 42.1%
Draw: 28.1%

Aston Villa - Tottenham
Aston Villa: 39.8%
Tottenham: 32.3%
Draw: 27.9%


Wolves - Birmingham
Wolves: 36.9%
Birmingham: 31.9%
Draw: 31.1%

Everton - Liverpool
Everton: 26.1%
Liverpool: 47.5%
Draw: 26.4%

Arsenal - Chelsea
Arsenal: 49.1%
Chelsea: 25.7%
Draw: 25.2%

Monday, November 23, 2009

Liga Update

This week is obviously a big week for the Spanish League with the Clásico coming up on Sunday. The two biggest stories there are that Barcelona were held to a draw, putting Real Madrid on top going into their match, and that Cristiano Ronaldo is expected to play in the Champions League in the midweek and then in the Clásico. Other than Athletic de Bilbao holding Barcelona to a 1-1 draw, there weren't any surprises.

Here is the rankings table:

Premier League Rankings Update

Tottenham threw a nice wrench into things with a 9-1 win. That's unusual enough that I suspect my models will overrate Spurs by a bit for the rest of the year. That will be reduced over time as the total number of matches goes up but I don't think it will go away entirely. After getting just a draw, Liverpool have just about completely left the title race. I thought the big four would all be in it 6 weeks from now so I'll just call that prediction dead. With Arsenal hosting Chelsea this weekend, it could go anywhere from a wide-open three-club race to Chelsea in great position well clear.

Thursday, November 19, 2009

A(nother) Request for Feedback and Suggestions

This blog has gotten a decent number of new readers so I thought I'd once again ask for feedback and suggestions.

Do you like the weekly rankings articles? Hate them? Want me to write something on a league I haven't yet?

How about the more general articles like the stats series? The weekly previews?

Any feedback, positive or negative would be appreciated. Please leave a comment below.

Thoughts on the World Cup Draw

Finally, and in typically controversial fashion, we have our 32 teams. FIFA have yet to announce what the pots will be for the draw taking place December 4th. Last time around they announced them on December 6th and had the draw December 9th so I don't think we can expect that for a fortnight or so.

I have written before that I believe that the seeded teams will be Brazil, Italy, Spain, England, Germany, Argentina, France and South Africa. This is based on the formula that they used last time. They may change the formula (they threw a wrench into the UEFA playoffs by making the draw seeded shortly before it happened), but my guess is that those will be the seeded teams.

That leaves these teams:


Ivory Coast

North Korea
South Korea

United States

New Zealand


Unless they completely blow up the format they've used for the last several World Cups, two pots will be the seeded teams and the non-seeded UEFA countries. For the other two, they have a few different ways to go and they've done them all before. They could go:
Pot 3: Africa and North America
Pot 4: Asia, South America and New Zealand

Pot 3: Asia, North America and New Zealand
Pot 4: Africa and South America

Pot 3: North America, South America, New Zealand
Pot 4: Africa and Asia

That last option creates uneven pots, but that can be dealt with in the draw and has happened before. In my view, the toughest group of countries by far when you take out the seeded teams and Europe, both of which have their own pots, is those from Africa. I think that's often true anyway, but even more so with the finals taking place in South Africa. So if you are a fan of one of the above countries then you want whatever configuration puts Africa in your country's pot. It may be possible to draw a team from your pot because of the rule that two countries from the same confederation can't be in the same group, but you are more likely to draw one from the other.

Getting back to the seeded teams, I personally put them into 4 groups. That's a big number since there are 8 of them but here's how I rate them:

Favorites: Brazil and Spain
2nd Tier: Germany, Italy, England
Your Guess Is as Good as Mine: France, Argentina
Please: South Africa

I have them sorted best to worst within. The first two groups are pretty self explanatory. For the third group, I think both of those countries are in an interesting situation where they have the talent to be up there, but due to bad management or something else they are playing far enough below their potential that they are not very good right now. The World Cup is still over 200 days away (and Hiddink is available!) so a lot can happen. If France and Argentina played to their potential I would put them up there with Germany, perhaps those three in the 2nd tier and drop England and Italy to their own. I may get heat from English supporters and possibly be underrating them, but I feel they lack the quality and especially the depth of the six other big teams that are likely to be seeded. I would put them ahead of France and Argentina right now, obviously. You will hear shouts of joy from the unseeded countries if they get drawn with South Africa who challenge the United States in '94 for worst host ever.

Looking at the unseeded teams, the countries to avoid are the Netherlands, Portugal and Ivory Coast. I think that these teams have by far the best chance of the unseeded teams to win it, as good or better a shot than England. Other than that, the teams to avoid depend on what the pots are. Along with those three, Mexico, the United States and South Korea are all likely teams to be in the group of death simply because they are likely to be better than the other teams in their pot. Again this depends on the format of the draw; it's especially true if the North American and/or Asian teams are put into a different pot than the African teams. Keep in mind that this doesn't mean they'll get a tougher draw. For example, Honduras is quite unlikely to be in the group of death because they aren't very good. However, the US and Mexico have the same chance of drawing any given 3 opponents as Honduras.

Tuesday, November 17, 2009

UEFA Playoff Predictions

I'm working on a couple of other things so I'll keep this short. Here are the predictions for the playoffs tomorrow. These are probabilities of advancing according to my models. As I said before, the logit version is less sensitive to the quality of the two teams. It seems too much so, but I'm not sure, so I'm giving both the logit and Poisson versions of the averages model. I'll list the logit version first and the Poisson second.

France - Ireland:
France - 83.3%, 92.2%
Ireland - 16.7%, 7.8%

Bosnia-Herzegovina - Portugal:
BH - 22.6%, 17.9%
Portugal - 77.4%, 82.1%

Ukraine - Greece:
Ukraine - 49.2%, 48.2%
Greece - 50.8%, 51.8%

Slovenia - Russia:
Slovenia - 30.8%, 26.2%
Russia - 69.2%, 73.8%

Thursday, November 12, 2009

WQC UEFA Playoff Preview: Ireland - France

Will Irish eyes be smiling? Will the French come out of next week with their joie de vivre? Is that a cheesy enough intro? No? In an epic struggle between Guinness and champagne, eleven footballers from each of these two great republics will battle it out to see who books a trip to South Africa and who weeps in front of the television next summer. Will France be able to overcome the luck o' the Irish or will Ireland best France despite them having a certain je ne sais quoi? That should do.

How they did in the group stage.

Making the playoff always means the same thing. You did pretty well in the group stage, but not well enough. Unlike Ireland, France were expected to win their group. They went 6-3-1 with a goal differential of +9. The other teams in their group were Serbia, Austria, Lithuania, Romania and the Faroe Islands. I'd say that's an average to a bit above average group. Serbia bested them despite France getting a win and a draw against the Serbs. The French gave away points with an early loss at Austria and two draws against Romania, the last of which was two months ago in Saint-Denis.

Looking at results, Ireland are probably the most interesting team remaining. They are the only country in the playoffs that didn't lose in the group stage; their problem is that they got far too many draws. Their group mates were Italy, Bulgaria, Cyprus, Montenegro and Georgia. Again I'd rate that as a pretty average group. Playing Italy to two draws is certainly nothing to be ashamed of. Less impressive are the two 0-0 draws against Montenegro and 1-1 draws against Bulgaria. In terms of goal differential, Ireland weren't impressive. Their four wins over lowly Georgia and Cyprus were all by a single goal. I don't recall another team coming out of a group stage like that with every match either a draw or just with one goal in it. Based on that we can expect a couple of close matches.

Likely Lineups

France have some big injuries. Franck Ribery will be out with a knee injury. Jeremy Toulalan is also expected to miss out on one or both of the matches with a tweaked adductor muscle. In addition, Abou Diaby will probably be ready but has had injury issues. On the other side, Ireland have few injuries. They will be without backup forwards Shane Long, Noel Hunt and Caleb Folan.

One issue many have with Domenech is that the squad has lacked consistency in both lineups and tactics. France have run a 4-4-2, 4-3-3 and the somewhat in-between 4-2-3-1. Ireland on the other hand have kept to the traditional 4-4-2.

I think the lineups will look something like:

Finnan ...O'Shea ...Dunne ...Kilbane

Keogh...Whelan ...Andrews ...McGeady

............Keane... Doyle

Henry .........................Anelka



edit: I've changed the France lineup from my initial idea. For some reason I had Gallas out of the starting lineup despite him being the most regular center back in qualifying. I also apparently got the replacement for Toulalan wrong. According to an article in L'Équipe, Alou Diarra is the likely starter instead of Sissoko and the center-back pairing will be Abidal and Gallas. At least that's what my quite limited knowledge of French tells me the article says. They also list Gignac up top instead of Benzema. I could see that going either way and I think whoever doesn't start is a likely sub for whoever does.

How they rate.

I applied my new rankings system to all UEFA countries. Fully weighting all matches in qualifying and the finals for Euro 2008 and this qualifying campaign, France came out as the 15th best scoring team and 13th best defensively. I was surprised they were that low to be honest. Overall they were 10th indicating that they weren't far below the teams ahead of them in either attack or defense. Ireland came out 26th best overall. They rate 18th best defensively but only 30th best at scoring.


I'm still unsure which model is best. I'm using the new averages model to come up with the coefficients and then either using Poisson or ordered logit to get the estimates for how likely each possible result is. As I've written before, the logit model is less sensitive to the scoring and defensive factors of the two teams. The results of each model are somewhat close but using Poisson gives a more extreme result.
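To give an idea of how the Poisson half works, here is a rough sketch of turning expected goals for each side into win, draw and loss probabilities. It assumes the two teams' goals are independent Poisson variables, which is a simplification, and the expected-goal inputs in the usage lines are made-up numbers, not the ones my model produced for this tie.

```python
from math import exp, factorial


def poisson_pmf(goals, mean):
    """Probability of scoring exactly `goals` when the expected number is `mean`."""
    return exp(-mean) * mean ** goals / factorial(goals)


def outcome_probabilities(home_mean, away_mean, max_goals=10):
    """Return (home win, draw, away win) probabilities for a single match.

    Sums over every scoreline up to max_goals per side, then renormalizes
    to account for the tiny probability mass beyond that cutoff.
    """
    home_win = draw = away_win = 0.0
    for h in range(max_goals + 1):
        for a in range(max_goals + 1):
            p = poisson_pmf(h, home_mean) * poisson_pmf(a, away_mean)
            if h > a:
                home_win += p
            elif h == a:
                draw += p
            else:
                away_win += p
    total = home_win + draw + away_win
    return home_win / total, draw / total, away_win / total


# Hypothetical expected goals, for illustration only.
home_p, draw_p, away_p = outcome_probabilities(1.2, 1.1)
print(f"home {home_p:.1%}, draw {draw_p:.1%}, away {away_p:.1%}")
```

The ordered logit version replaces this last step with a fitted curve on the difference in expected goals, which is why it reacts less sharply to the gap between the two teams.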

For the tie as a whole, the logit model predicts that France will advance 62.3% of the time and Ireland 37.7%. The Poisson model gives France a better chance with a 73.1% shot at playing in South Africa, 26.9% for Ireland. For the first leg alone, the averages-logit model has it extremely even with France having a 37.4% chance of winning, Ireland 35.8% with the remaining 26.8% going to a draw. The Poisson model gives France a 40.5% chance of winning outright, Ireland a 30.3% chance of pleasing the home fans and them leaving it all to play for with 29.2% probability. For both models the most likely outcome for the first leg is a 1-1 draw.

Personally, I expect it to be very close. I think Ireland definitely have a chance. Part of that is that I am not a fan of Domenech. My suggested lineup above is what I think they will probably run based on what they have done in the past. That lineup should get the job done, but if I were French I'd be worried about both who and what he'll put out there. Ireland should be a tough opponent and I don't see France playing them off the park in either leg, especially the first. Despite all that, I think France definitely have an edge.

Monday, November 9, 2009

German Bundesliga Rankings

I'll keep this short because I don't know a lot about German teams other than what I've read and what the rankings say:

Bayern Munich have been far from great. Maybe not as bad as their 8th place in the table suggests, but van Gaal has a ways to go to get them back in the title race. Leverkusen look like by far the best team so far.

edit: I just looked at Leverkusen's schedule because I found their huge margin at the top odd. They have played a very tough set of fixtures thus far. Looking at the table, they sit top and have played all teams from 2nd - 7th. Their five matches to round out the first half of the season are against teams that are currently 8th, 10th, 14th, 15th and bottom. While I have yet to see them play, I think it's pretty safe to say that they are likely to extend their 3 point lead if they can just play at something close to the level they have thus far.

French Ligue 1 Rankings and Analysis

I have to admit that I have not seen a single match from the French Ligue this season. Having said that I have seen several matches involving French clubs in European competitions. I watch their top clubs in Europe a lot because I like their style of play and find them entertaining. Other than Spanish, I think French teams are my favorite to watch. I definitely plan on seeing more domestic matches as well as the season moves along.

England and Spain look to have 3 teams with a decent shot at competing for the league. Italy really has one but I'll be generous to Juventus and say it's a two-horse race. In contrast, the Ligue 1 is wide open. 12 matchdays into the season there are 4 clubs within 3 points of the leaders and 4 more within another 3 points. That is amazing and there's nothing close in the other top leagues. Other than lowly Grenoble, the league is pretty competitive top to bottom.

Here are the rankings:

O Rank - rank by goal-scoring ability
D Rank - rank by ability to prevent goals
EGD - expected goal differential if they played a new season and all teams played at level shown by results thus far

Bordeaux sitting third in my rankings shows why it is handy to have a model around, especially this early in the season. They sit on top of the table and have the highest goal differential. My rankings are based on goal differential, but, importantly, take schedule into account. Bordeaux have played an incredibly easy schedule. Just looking at the table, in just 12 matches they have played every team in the bottom 7. Adding in their match with Nice, two-thirds of their matches have been against teams in the bottom half of the table. I'm not sure a team has ever played an easier schedule this far into the season when you look at where their opponents are in the table.

Other than that, a look at the third column confirms my suspicion that there isn't a lot in it at the top. There are 7 other teams within 10 goals in the rating system! Things are wide open. Other than Bordeaux which I discussed above, the other standout in terms of where they are in my rankings compared to the table is Paris Saint-Germain. I think they are 12th in the table for a few reasons. The obvious one is that they still have a match in hand. They also have played a somewhat tougher schedule than average (incredibly tough compared to what Bordeaux have dealt with). In addition to all of that they have had slightly below-expectation luck when it comes to getting points for a team with their goal differential. I expect Les Parisiens to make a surge and at least get back into the fight for the European positions.

Italian Serie A Rankings

This season I've seriously neglected the Serie A. This is somewhat for good reason because the league is shaping up to be a one-horse race. Inter have just looked far and away the best club. Whether you look at points, they have 29 of 36, or goal differential, +19 for an edge of over 1.5 goals per match, they have been impressive while other teams thought to be contenders have been inconsistent. Behind them, Juventus have gotten decent results but seem too shaky, particularly at the back, to put too much pressure on Mourinho's boys. AC Milan have recovered from their early struggles and now sit third, but other than their Champions League match in the Bernabeu they've looked a lot more like old AC Milan than the AC Milan of old. Napoli were considered by many, including me, to potentially fight for the Scudetto but they have struggled and are midtable 11 points off the lead.

It's early still, too early to declare things over but I've seen nothing in watching matches or analyzing results to suggest anything other than Inter running away with it.

Here are the rankings:

O rank - rank by goal-scoring ability
D rank - rank by goal-conceding ability
EGD - expected goal difference if all teams played a full season at the level the results thus far have shown

Other than Inter rating nearly 30 goals better than any other club, the first thing I noticed is that I have AC Milan seventh despite them being third in the table. I believe this to be due to them having had better than average luck when it comes to close-match results. In contrast, Bari are 2 goals better in goal differential but 4 points further back.

Spanish Liga Update and Rankings

Unfortunately I managed to miss last week with the Spanish league update. Last week was more interesting than this week with Barcelona held to a draw against Osasuna. All went to plan this week. I still think the league looks a lot like last season, but it's looking more open than before. Last season at this point Barcelona had 25 points, Real Madrid 23 and Sevilla 20. Currently Barcelona sit top with 26, Real Madrid have 25 and Sevilla 22. So it's a bit closer. Depor and Valencia continue to look very good as well. A team continuing to not look good is Atletico de Madrid. Atleti find themselves in the relegation zone and already eliminated from the Champions League with two matchdays to go. Their losses in the derbi and the week before in Bilbao did little to inspire confidence and bring calm to the chaos.

Here are the rankings, using my new ranking system:

O Rank - rank by goal-scoring strength
D Rank - rank by goal-conceding strength
EGD - expected goal difference if they played a full season at the level the results so far indicate

Barcelona rates about five goals better than Real Madrid when it comes to scoring and about half a goal behind Sevilla defensively. In other words the results so far indicate that they are as dominant as ever. Real Madrid to their credit are looking better than usual defensively. Looking closer to the bottom, Atletico de Madrid are looking awful with an expected goal differential under -24. These numbers indicate that Villarreal and Malaga have been running below expectation in luck since their places in the table are significantly lower than what I have here.

EPL Rankings and Update - 9 November

Chelsea picked up a gritty win at home against Manchester United to go five clear, though Arsenal still have a match in hand. Earlier today Liverpool only managed a draw at Anfield against Birmingham. I am going to concede defeat in my prediction that the big four would stay in the race through the end of the year. I was a full two months off as Liverpool are now 11 points back and out of it already. Man City also got a disappointing result with a 3-3 draw at home against Burnley, the Clarets' first points away from home this season. Villa and Spurs both picked up expected wins to continue their fight for a spot in Europe.

Here are the rankings, using my new averages model:

The Poisson rankings have a few differences. In offensive ranking, the Poisson model flips Liverpool and Chelsea. Defensively it flips Arsenal and Man City, leaving Spurs in between. Overall the Poisson model puts Man City one spot below Tottenham instead of above. Nearer the bottom of the list, the Poisson model has Burnley 15th, Blackburn 16th, Portsmouth 17th and Bolton 18th. Otherwise they are the same in terms of ordering.

New Rankings and Predictions System

I've been working on a new model which I think helps with some of the problems the Poisson model has. It is based on work done at Smart Football Rankings, an effort to develop a rankings and prediction system for college (American) football. I'll call this the averages model.

The idea is this: instead of making assumptions about the distribution of goals, let's just look at how well each team does at scoring and conceding goals compared to their opponents. There are two stages. In the first, I calculate a scoring and defensive factor by taking the difference between a team's goals for (also goals against but I'll just talk about the scoring half for this) in each match and how many goals on average were conceded by the opponent in their matches against other teams. To account for home-ground advantage, I adjust the average up or down for the match based on whether the team is at home or away. The adjustment is simply the difference between the league average goals and the average goals scored by home teams only. For example, Manchester United had given up 1 goal per match before facing Chelsea. Home teams on average score roughly 0.29 more goals per match than the league average. Chelsea scored one goal at home, so their goals-for score from their match with Manchester United is 1 - (1 + 0.29) = -0.29. For Manchester United, Chelsea had conceded 8 goals in 11 matches for an average of 0.727 goals per match. Since United were playing away and failed to score, they would get 0 - (0.727 - 0.29) = -0.437. For the first step, this would be done for every team in each match. A team's scoring factor is then the average of these for each match played.
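The first step can be sketched in a few lines. This is only an illustration of the arithmetic in the Chelsea - Manchester United example, not my actual code; the function names are made up, and the 0.29 home adjustment is the rough figure quoted above.

```python
HOME_ADJUSTMENT = 0.29  # rough gap between home-team and league-average goals per match


def step_one_score(goals_scored, opp_avg_conceded, at_home):
    """Goals scored minus what the opponent usually concedes, adjusted for venue."""
    adjustment = HOME_ADJUSTMENT if at_home else -HOME_ADJUSTMENT
    return goals_scored - (opp_avg_conceded + adjustment)


def scoring_factor(match_scores):
    """A team's step-one scoring factor: the average of its per-match scores."""
    return sum(match_scores) / len(match_scores)


# Chelsea at home scored 1 against a United side conceding 1 per match.
chelsea_score = step_one_score(1, 1.0, at_home=True)
# United away scored 0 against a Chelsea side conceding 8 in 11 matches.
united_score = step_one_score(0, 8 / 11, at_home=False)

print(round(chelsea_score, 3))  # -0.29
print(round(united_score, 3))   # -0.437
```

The defensive factor works the same way with goals conceded and the opponent's average goals scored.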

The second step adds a level. The first step compares your team's scoring to that of the average opponent of your opponent. If the teams you play have played an easy schedule themselves then your extra goals look less impressive. A way to control for this is, instead of using average goals against, to use the scores calculated from step one along with the average goals for and against for the league as a whole. Getting back to the Chelsea - Manchester United example, Manchester United's defensive score from the first step is -0.689. In other words, they've given up about 2/3 of a goal per match less on average compared to what their opponents have scored against other opponents. The average for the league as a whole is 1.52 goals per match, and for home teams it is 1.80. Chelsea scored 1 goal against United so they get 1 - (-0.689 + 1.80) = -0.111. Chelsea's defensive score from step 1 is -0.83 and away teams average 1.23 goals per match so Manchester United got 0 - (-0.83 + 1.23) = -0.4. Each team's scoring and defensive factor is the average of these for all matches played.
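Here is the same kind of sketch for the second step, again just reproducing the worked numbers. The names are hypothetical; the home and away averages of 1.80 and 1.23 are the ones quoted in the paragraph above.

```python
HOME_AVG_GOALS = 1.80  # average goals per match scored by home teams
AWAY_AVG_GOALS = 1.23  # average goals per match scored by away teams


def step_two_score(goals_scored, opp_step_one_defence, at_home):
    """Goals scored minus the goals expected against this particular defence.

    The expectation is the venue average shifted by the opponent's
    step-one defensive score (negative means a better-than-average defence).
    """
    venue_avg = HOME_AVG_GOALS if at_home else AWAY_AVG_GOALS
    return goals_scored - (opp_step_one_defence + venue_avg)


# Chelsea at home scored 1; United's step-one defensive score is -0.689.
chelsea_step_two = step_two_score(1, -0.689, at_home=True)
# United away scored 0; Chelsea's step-one defensive score is -0.83.
united_step_two = step_two_score(0, -0.83, at_home=False)

print(round(chelsea_step_two, 3))  # -0.111
print(round(united_step_two, 3))   # -0.4
```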

There are two reasons I like this system better than the Poisson. The first is that it has gotten better results in some testing. I've done something very similar to the PLM where I use ordered logit on the expected goals for each team according to the model and its predictions have been better. If there is interest I'll post more detailed work there but I used two different scoring systems. One just looks at squared difference between predicted probabilities and what actually happened, assigning 1 if the outcome (home win, away win, draw) happened. The other was a betting system where I looked at how much would be made by the PLM using odds given by the estimates of the averages model and the other way around. The averages model outperformed the PLM in both of these tests. Beyond just these things, I think the rankings make more sense when I look at where it rates teams.
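As a small illustration of the first scoring system, here is a sketch of the squared-difference score for a single match. It assumes the squared differences are summed over the three outcomes (averaging instead would only rescale it), and it uses the Liverpool - Birmingham numbers from the weekend predictions (62% / 24% / 14%), a match that ended in a draw. Lower is better.

```python
def squared_error_score(predicted, outcome):
    """Sum of squared gaps between predicted probabilities and what happened.

    `predicted` maps 'home', 'draw' and 'away' to probabilities;
    the outcome that occurred counts as 1 and the other two as 0.
    """
    return sum(
        (prob - (1.0 if result == outcome else 0.0)) ** 2
        for result, prob in predicted.items()
    )


liverpool_birmingham = {"home": 0.62, "draw": 0.24, "away": 0.14}
liverpool_score = squared_error_score(liverpool_birmingham, "draw")

# Baseline: a forecast that gives every outcome the same chance.
uniform = {"home": 1 / 3, "draw": 1 / 3, "away": 1 / 3}
uniform_score = squared_error_score(uniform, "draw")

print(round(liverpool_score, 4))  # 0.9816
print(round(uniform_score, 4))    # 0.6667
```

Comparing two models then just means adding up this score over the same set of matches for each and seeing which total is lower.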

The second reason I like the averages model better is that I think using sums is more accurate than using products. In the Poisson model, home-ground advantage is given by multiplying the expected goals for the home team by a number between 1.1 and 1.5 for most competitions. Similarly, the expected goals in a match for a team is their scoring factor times the opponent's defensive factor. A result of that is that high-scoring teams are more sensitive to the quality of their opponents and to playing at home than low-scoring teams. This doesn't seem to be reflected in reality. I'll write more on this later, but for better teams playing at home tends to be a bit less important. The averages model assumes it is equally important for all teams so that's an improvement. Note that other than including home-ground advantage, a major difference between mine and the methodology used by Smart Football Rankings is that they actually use products instead of sums.
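A quick sketch shows the difference between the two approaches. The home factor of 1.3 is just a value picked from the 1.1 to 1.5 range mentioned above, the 0.29 bonus is the rough home adjustment from the averages model, and the base scoring rates of 1 and 2 goals per match are hypothetical.

```python
HOME_FACTOR = 1.3  # hypothetical multiplier within the 1.1-1.5 range
HOME_BONUS = 0.29  # additive home adjustment, in goals per match


def home_boost_product(base_goals):
    """Extra expected goals at home when home advantage is a multiplier."""
    return base_goals * HOME_FACTOR - base_goals


def home_boost_sum(base_goals):
    """Extra expected goals at home when home advantage is a fixed bonus."""
    return (base_goals + HOME_BONUS) - base_goals


for base in (1.0, 2.0):
    print(
        base,
        round(home_boost_product(base), 2),
        round(home_boost_sum(base), 2),
    )
```

With the product, the team that scores 2 goals a match gains twice as many goals from playing at home as the team that scores 1, while with the sum both gain the same amount.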

For at least the next few weeks I'll include both Poisson/PLM and the averages model when giving rankings and predictions. Because I believe the averages model, and its logit, to be superior I'm using it as my main model until I work out a better one.

Friday, November 6, 2009

Other EPL Predictions

Here are the numbers for the other matches. Keep in mind that the model does not take into account injuries and only uses goals so these numbers are just a rough guide. As I've said several times, I think the model overrates Arsenal particularly as they've been running white hot when it comes to scoring goals.

Aston Villa - Bolton
Villa - 66%
Draw - 22%
Bolton - 12%

Blackburn - Portsmouth
Blackburn - 48%
Draw - 29%
Portsmouth - 23%

Man City - Burnley
Man City - 74%
Draw - 18%
Burnley - 8%

Tottenham Hotspur - Sunderland
Spurs - 63%
Draw - 22%
Sunderland - 15%

Wolverhampton - Arsenal
Wolves - 11%
Draw - 20%
Arsenal - 69%

Hull City - Stoke City
Hull - 34%
Draw - 31%
Stoke - 35%

West Ham - Everton
West Ham - 57%
Draw - 25%
Everton - 18%

Wigan - Fulham
Wigan - 38%
Draw - 30%
Fulham - 32%

Liverpool - Birmingham
Liverpool - 62%
Draw - 24%
Birmingham - 14%

Weekend Preview: Chelsea - Manchester United

The match between the top two teams in the Premiership kicks off at 16:00 local. For the Americans, it can be found at 11 AM Eastern on the Fox Soccer Channel.


These two teams have won the last 5 English Premier League titles. In the last six seasons both have been in the top 3 and three of those times they finished in first and second. While Arsenal and to a lesser extent Liverpool and Man City are thought to be contenders this season, the champion this year will most likely be one of these two clubs.

Looking at their head-to-head results, going further back is even more pointless than usual due to the recent influx of quality at Chelsea when Abramovich took over. I'll give it to you anyway: in league play Chelsea have beaten Manchester United 37 times, Man U have bettered Chelsea 56 times and 41 times they played to a draw. Since Abramovich took over in the summer of 2003, Chelsea have 5 wins, 4 draws and 3 losses against Manchester United in league play. At Stamford Bridge United have not won since the Abramovich takeover; Chelsea have 4 wins and 2 draws in league matches, throw on an extra win and a draw if you want to include cup play. The last time United won at Stamford Bridge was the 2001-2002 season.


Usually when these two clubs play it's a given that they've won 4 or 5 of their last 5 matches, but that's not the case this week. Chelsea are 3-0-2 in their last 5 and Manchester United are 3-1-1. Chelsea's losses were 3-1 at Wigan and 2-1 at Aston Villa. You don't expect any team to win them all, but Chelsea fans are surely disappointed with those results. The loss to Villa isn't so bad, as they'd probably feel OK with a draw there, but losing by two goals to a team that figures to be midtable at best and probably in the relegation fight is. Manchester United's loss was 2-0 at Anfield and their draw was 2-2 at home against Sunderland, where they equalized in stoppage time on an own goal.

An interesting thing is that there is a big divide between home and away form for these teams. Chelsea have won all 5 of their home matches so far this season. After edging out Hull 2-1 in their opener, they have been on fire beating Burnley then Spurs by 3 goals, Liverpool by 2 and then Blackburn 5-0. They have scored 15 goals and only conceded 1 in their 5 home games. Away from home Chelsea have 4 wins and those 2 losses mentioned above. Similarly, Manchester United are 4-1-0 at home with the only blemish that 2-2 draw versus Sunderland. On the road they are a less impressive 3-0-2 with losses to Liverpool and Burnley. We're talking about quite small samples of 5 and 6 matches, but if home and away form mean anything it points to an edge for Chelsea.


Chelsea are relatively free of injury. Mikel and Zhirkov are expected to play but have ankle and knee injuries respectively. Bosingwa is the only player likely to be unavailable. The same cannot be said for Manchester United, who have several players that either can't play or won't be fully fit. In the first group, it appears that Rio Ferdinand will not play due to a nagging calf injury. Park Ji-Sung and Hargreaves are also likely to miss out. Someone we know won't play is Gary Neville due to suspension. On the brighter side, Vidic is expected to be able to play. Fletcher still has an ankle injury but says he can play with an injection.

Model Rankings

In my rankings Chelsea are second and Manchester United third. Despite being in positions next to each other, the model actually has them a fair bit apart at nearly 22 goals of goal differential. Most of that difference is at the attacking end. Not to the same extent as Arsenal, but Chelsea seem to have been running hot at scoring so far. They are on pace for 96 goals; the model says they would score about 91 at this pace because they've played a slightly easier than average schedule. Last season they only scored 68. I think that Manchester United have also been running above their scoring expectation, as the model says they'll average around 76 goals playing at this level and they also only scored 68 last year. I think both will cool off and it's too early to say for sure, but Chelsea have certainly looked better in attack than Manchester United. This shouldn't be a huge surprise given that United sold their best attacking player over the summer.

On the defensive side of things, the model puts Chelsea on pace to concede 28 and Manchester United 35. Last year they both conceded 24. Manchester United have had some surprising defensive lapses this season. I've said repeatedly that over the last couple years I think they have had the best defense in football but they've not looked like it this year. Something I wonder is how much of it has to do with losing Ronaldo. Ronaldo didn't defend much, but his ability to go after the other team certainly made it more risky to attack the United goal. That doesn't cause things like Rio Ferdinand handing Man City an equalizer in the 90th minute, but I think it does play some role.


Like before, I'm using the PLM to give the result prediction and the Poisson model to say the most likely scoreline. The models give Chelsea a surprisingly big edge, especially when you consider that they don't take injury into account. They say the Blues win just over 55% of the time, United 18% of the time with the remaining 27% being a draw. The most likely scoreline is 1-0 with a 13.5% chance. Next is 2-0 (11.9%), 1-1 (10.8%) and 2-1 (9.6%). I think the injuries become too much to overcome and Chelsea win 2-0. We'll see if I get the exact scoreline for the first time.
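For readers curious how a Poisson model turns expected goals into scoreline chances, here is a minimal Python sketch. It assumes the two teams' goals are independent Poisson counts, which is the simplest version and may not match the model used here exactly. The rates 1.76 and 0.80 are not published figures; they are back-solved from the scoreline percentages quoted above, so treat them as assumptions.

```python
from math import exp, factorial


def poisson_pmf(k, rate):
    """Probability of exactly k goals when the expected number is `rate`."""
    return exp(-rate) * rate ** k / factorial(k)


def scoreline_probabilities(home_rate, away_rate, max_goals=10):
    """Probability of each (home goals, away goals) pair, assuming independence."""
    return {
        (h, a): poisson_pmf(h, home_rate) * poisson_pmf(a, away_rate)
        for h in range(max_goals + 1)
        for a in range(max_goals + 1)
    }


HOME_RATE = 1.76  # assumed expected goals for Chelsea (back-solved)
AWAY_RATE = 0.80  # assumed expected goals for Manchester United (back-solved)

probs = scoreline_probabilities(HOME_RATE, AWAY_RATE)
top_four = sorted(probs, key=probs.get, reverse=True)[:4]
for h, a in top_four:
    print(f"{h}-{a}: {probs[(h, a)]:.1%}")
```

With those assumed rates the four most likely scorelines come out in the same order as in the preview (1-0, 2-0, 1-1, 2-1), with probabilities within a few tenths of a percentage point of the quoted ones.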

Monday, November 2, 2009

More on Corners (Stats Series)

In my previous article I looked at corner kicks. I was surprised to find that there was little to no correlation between the difference between the number of corners for the home and away side in a match and the goal difference in the match. In fact, there was some evidence that there might even be a reverse effect because in matches where the home side won, they got fewer corners on average.

A lesson in variance

Looking into it deeper, there was a problem with some of the results in the previous article: I was not careful enough when looking at variance. I assumed that with those sample sizes the standard deviations would be pretty low so things would look statistically different. That didn't turn out to be the case for one of the results. The reason for this is that the standard deviation for corners is much higher than goals. If you think about it, this makes sense. Some matches your favorite team will get no corners and they might get well over 10 the next time out. Goals are much tighter - 0 to 3 for most matches with the odd 5 or 6-goal performance thrown in there.

As a result of this, the table near the bottom (just above "What is going on here?") is effectively meaningless. There is no statistical difference between the corner differential when the home team wins by 2 and when they win by 1. In other words, while the averages indicate that teams get fewer corners the more goals they win by, there is too strong a chance that this just happened in the sample due to randomness, so we can't say that the relationship holds.

Having said that, other surprising results are valid. Firstly, the home team on average gets more corners than the away team in any type of match (home win, draw, away win). Furthermore, in matches where the home team wins, the difference in corners is smaller than it is in matches where there is either a draw or the home team loses. So from this we can conclude that when the home side wins, they tend to get fewer corners relative to their opponent than they do otherwise. The difference is roughly three quarters of one corner. One claim that can't be verified without getting more data is that the home side gets significantly more corners compared to the away side in matches that end in a draw than in those that end in an away win.

In summary, while the other results do not meet the statistical significance test, we can conclude that home teams tend to get more corners than away teams no matter the result of the match and that the difference in home and away corners is smaller in matches where the home team wins than those where the away side gets a result.

Comparing Teams

I decided to look further into it by taking a look at how different teams do when it comes to corners. I would not have found the results surprising before writing the first article. The short of it is that conventional wisdom seems to hold and good teams get more corners over a season than bad teams.

The data is from the last four seasons in the English, Spanish and Italian top flight. For each team I calculated their final tallies in wins, draws, losses, goals for, goals against, goal differential, corners for, corners against and corner differential. I also calculated the average number of corners for and against in matches where the given team won, drew or lost.

I'll start with overall correlations. Looking at goal differential and corner differential, the correlation between the two is 0.58, a strong positive correlation. In other words, teams that won more corners than their opponents over the course of the season tended to also score more goals than they conceded. The correlations between goals scored and corners won, and between goals conceded and corners conceded, are similar. Both are right around 0.45. So we have what we would expect - teams that get more corners tend to get more goals and those giving away more corners also tend to concede more goals.
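For anyone who wants to repeat this kind of check on their own numbers, here is a minimal Python sketch of the standard Pearson correlation. The season totals in it are placeholders invented for illustration, not the data set behind the 0.58 figure, so the value it prints says nothing about real leagues.

```python
from math import sqrt


def pearson_correlation(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of the same length (at least 2)")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = sum((x - mean_x) ** 2 for x in xs)
    spread_y = sum((y - mean_y) ** 2 for y in ys)
    return covariance / sqrt(spread_x * spread_y)


# Placeholder team-season totals (one entry per team), purely hypothetical.
corner_differential = [-80, -40, -10, 0, 25, 60, 90]
goal_differential = [-30, -25, 5, -8, 10, 12, 40]

r = pearson_correlation(corner_differential, goal_differential)
print(round(r, 2))
```

A value near 1 means the two differentials rise together almost in lockstep, a value near 0 means there is no linear relationship, and a negative value means one tends to fall as the other rises.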

Here's a scatter plot with corner differential and goal differential along with the linear regression line.

As you can see, there is definitely a positive relationship. It's not as strong as goal differential and points, but it's certainly there. There are two decent outliers. The one in the upper left is Real Madrid two seasons ago. The Madridistas won the Spanish league and had the league's best goal differential with 48 more goals scored than allowed. Despite that, they won 164 corners and gave up 237. The one on the bottom is Derby County from that same season. The Rams finished with an "impressive" record of 1 win, 8 draws and 29 losses. They scored 20 goals and conceded 89 for a goal differential of -69. Despite that, they "only" allowed 79 more corners than they got. For comparison, Manchester City that season gave up 88 more corners than they got and had a goal differential of -8.

Looking at the graph, it seems to curve up toward the end as far more observations are above the regression line than below. To improve the fit, I ran a regression including a term that is the square of corner differential and got a much better fit as you can see:

Using these results, it depends on where a team is, but an extra corner is worth about an extra quarter of a goal. For better teams it's even more valuable. This is because teams that score more goals get fewer corners per goal. Here's a plot of that:

I find this interesting. It's far from perfect, but corners are a decent representation of attacking chances. Thought of in that way, I would argue that this relationship suggests that teams that score a lot of goals do so not only because they get more attacking opportunities, but that they also convert a higher percentage of those chances. That's not too surprising, strikers certainly get paid to both create and convert on goal-scoring opportunities. I think it's interesting though that the data supports the idea that good attacking teams are more efficient at taking advantage of chances and that it's not simply getting more that leads to more goals.

What about defense? I won't post the scatter plot, but it is essentially the same for goals conceded; teams that concede a lot of goals give up more goals per corner conceded. I would argue that this suggests that teams that are bad defensively not only allow more chances, but they also allow chances that are better on average.

Again, corners aren't a perfect representation of scoring chances. Something like "times with the ball in the attacking third" would be better but it isn't recorded. Sometimes "scoring chances" is given on air during a match, but as far as I know it's never listed as a stat. A problem with scoring chances in general is that it's subjective. The use of the term "half chance" is common and one guy's chance is another guy's half chance and vice-versa. In my view, corner kicks are the best objective method available to measure this.

Viewed thusly, the stats suggest a nice synergy between defense, midfield and attacking players. Team strength is usually pretty similar in all areas. If you'll forgive me for simplifying, midfielders are responsible for both creating attacking opportunities for the team and preventing them for their opponents. Forwards are responsible for converting those chances and defenders for keeping their opponents from doing the same. Good teams tend to have midfielders that create a lot of opportunities for their forwards. These chances will tend to be better than those created by worse teams as well and as a double whammy the forwards on these good teams are better at putting them away. Similarly, good defensive teams have strong midfielders that don't allow a lot of opportunities to score and the defenders take care of business by allowing just a small percentage of these opportunities to be put in.

What about the previous article?

The previous article suggested that there was little to no relationship between the scoreline of a match and the number of corners for each team. This article suggests that good teams get more corners than bad teams. How can that be so? I think the reason gets back to variance. In a single match, anything can happen. That's true for results, I don't need to list big upsets. For corners the variance is even larger. So from one match we can't really conclude much of anything from corner kicks, but over a season there is enough time for things to even out.

As for home losses producing more home corners relative to away corners than other results, my best guess is that it's a selection effect. Home wins and draws are going to have a lot more cases where an inferior team is ahead or tied and playing 11 men behind the ball against a superior opponent. That situation probably leads to more corners than any other. As long as the better team keeps getting unlucky they're likely to tally a lot of them, making the corner difference very small. That is the only explanation I have come up with. I'd love to hear your idea if you have another, so please leave a comment.

Conclusion and Future Work

In my previous work on goal differential, I made the case that conventional wisdom is wrong - there is no evidence that performance in close matches is itself a skill apart from the ability to score and prevent goals and some evidence that it all comes down to luck. In this case though, using full-season data for each team I'm arguing for the common view that good teams are not only better at scoring and defending but more efficient in doing so as they convert a higher percentage of their opportunities and concede on a lower percentage of opportunities they allow their opponents to have.

In the future I may try to use this idea of corners as a proxy for chances to assess goalkeepers and forwards. It's one data point, but I think the Real Madrid outlier above is evidence for the fantastic play of Iker Casillas. I certainly don't think it's the be-all-end-all of stats but corners/goal scored or conceded serve as some measure of how well a team's forwards or defenders and goalkeeper played.

Premier League Rankings - 2 November

I'm going to go quite short this week and mainly just post the rankings. I'm working on a follow-up article on corners and I want to focus on that.

Not a lot of changes this week as the outcomes weren't far off from the expectations of the model. Liverpool dropped a spot while Fulham moved up 2 when Fulham got the 3-1 win at Craven Cottage. Liverpool are certainly looking like the favorite to throw off my prediction that the big four would all be in the race at the new-year break. At the other end of the table, the biggest mover was Portsmouth moving up 5 spots and 15 expected goals in goal difference due to stuffing Wigan 4-0. The (wait, do Portsmouth have a nickname?) are now just 3 points from getting out of the relegation zone and that battle looks like it could be good this year. With Portsmouth's attack moving out of the cellar, Hull have a stranglehold on the bottom position in the rankings as they rate the worst both scoring and defending. It's pretty telling that Burnley moved down in the rankings after beating them 2-0. In fairness, that's due to other results since Burnley's expected goal differential actually went up slightly.

Friday, October 30, 2009

North London Derby Preview

The biggest London derby kicks off at 12:45 local. If you are in the US you can see it on ESPN2 at 8:30 AM Eastern.


Though they played before, this derby really got going in 1913 when Arsenal moved from Plumstead to Highbury. That put the teams four miles from each other. Unlike the East Lancashire Derby, these two teams have met nearly every year. Since 1950 they've been in the same division every season but one. In fact they have met 144 times in league play. In those matches, Arsenal have had the upper hand winning 59. Tottenham have won 45 and 40 were draws.

Recent history is significantly worse for Spurs. They haven't beaten Arsenal in league play in nearly 10 years, last doing so at White Hart Lane on November 7th, 1999. The last time Spurs won at Highbury in league play was in 1993. Over the last 10 years at home in the derby Arsenal have 8 wins and 2 draws. Those draws came last season and 4 seasons ago so I suppose you could argue that Spurs are doing better in this fixture if you like.


The two clubs are level on points though Arsenal have a match in hand. Despite that and the season still being pretty young, their form is pretty different. Spurs started out on fire with four wins. They have since gone 2-1-3. To be fair, two of their three losses were at Chelsea and against Manchester United so schedule plays a role. They can't feel good about their loss last week though as it was at home against Stoke City, the first win for the Potters away from home this season. Since losing 4-2 at Manchester City, the end of two straight matchdays of losing in Manchester, Arsenal have won 4 and last week got a draw at West Ham. Expanding beyond league play, both teams won at home in the midweek against opponents from Liverpool. Spurs beat Everton 2-0 while Arsenal took care of Liverpool 2-1. The previous week Arsenal also had a disappointing 1-1 draw at AZ Alkmaar, giving up the equalizer in stoppage time in the second half.

Injuries and Suspensions

Arsenal have several injuries. Rosicky is questionable with a knee injury and Denilson, Djourou, Wilshere and Walcott will all be out. Goalkeeper Lukasz Fabianski picked up a thigh injury in the league cup so he should be out as well. It's unlucky for Fabianski, who was playing in his first match of the year due to a knee injury.

For Spurs, Jermain Defoe is still out suspended and Modric out with a broken leg. Other concerns are Aaron Lennon and Giovani Dos Santos. Both have ankle injuries and may or may not be ready. Jonathan Woodgate and Ledley King will probably be available but have had injury issues as well.

Scoring and Conceding

Arsenal have been the best scoring team this season. They have scored 5 more goals than any other club despite playing one fewer match than most. For the Gunners, the problems look to be at the defensive end. They have conceded 13 goals, 5 more than Chelsea, 2 more than Manchester United and the same as Liverpool. Again though, that's playing one fewer match than those title contenders. It is inevitable that their goal-scoring rate will slow down as they are on an unsustainable pace. They rate as the best team at scoring but only 10th best at defending in my EPL ranking.

Looking at goal differential, Tottenham have been running a bit above expectation. The teams around them have significantly higher goal differentials. My model rates them as the 4th best scoring team, 9th best defensively and 6th best overall.


I'm a bit in between models right now. I'm using the PLM to predict the result and the Poisson model to give me the most likely scorelines. Both models have Arsenal as huge favorites. The PLM gives them a 67% chance, Spurs just a 14% shot with the other 19% the likelihood of a draw. The Poisson model gives 3-1, 3-2, 4-1, and 4-2 all about the same chance as the most likely scorelines, though each has only around a 5% chance. This is one of those matches where I don't fully trust the model. The problem is that Arsenal have been running at a ridiculous pace, almost certainly scoring more goals than expected given their skill level. They are on pace for 122 goals for the season while last year they scored only 68. I think Arsenal are still favorites but not to that extent. Something more like 50% for an Arsenal win and 25% each for Spurs and the draw seems about right to me.

I'm making my prediction 3-1 Arsenal.