Tuesday, December 29, 2009

On Holiday

I am currently on a trip with my family and will probably not be able to post anything until early next week. I may have access in the next few days, in which case I'll probably post on the Spanish league since they come back this weekend and the other leagues have more time off. When I get back home, I'll post a half-time report on each of the top leagues. Next week I hope to get back on track with more general articles about the game. I plan to continue the stats series and am working on analyzing derbies to determine if they are any different than regular matches in terms of predictability and in factors such as home-ground advantage.

Thursday, December 24, 2009

EPL Boxing Day to 2010 Preview

As usual, the last week of the year is filled with English football. This season the fixtures list is as well organized as Germany sitting on a 1-0 lead. Each club plays one match at home and one away. Most clubs play on Boxing Day and then again two days later. There are, however, two matches on each of the 27th, 29th and 30th, so after Christmas you are covered until New Year's Eve.

I'm going to make this a long preview, going through each club starting at the top of the table.

Chelsea
at Birmingham the 26th
vs Fulham the 28th

Chelsea are relatively injury free; they should only be missing Essien and Bosingwa. Essien, who was hurt two weeks ago in a meaningless Champions League match, is a huge loss, but their injury situation is not nearly as bad as that of their two main rivals for the title. Chelsea have two fairly tough matches. While they are favored in each match, they are underdogs to win both of them.

The model says:
Expected points: 4.44
6 points: 45.3% chance
4 points: 27%
3 points: 17.5%
2 points: 3.8%
1 point: 4.9%
0 points: 1.5%
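
For anyone who wants to check the arithmetic, here is a rough sketch of how a two-match points distribution and its expected value fit together. The Chelsea percentages are the ones listed above. The per-match win/draw/loss probabilities in the second part are hypothetical, and treating the two matches as independent is my simplifying assumption for the sketch, not necessarily how the model combines them.

```python
from itertools import product

POINTS = {"win": 3, "draw": 1, "loss": 0}

def two_match_distribution(match_a, match_b):
    """Combine two matches (assumed independent) into total-points probabilities."""
    totals = {}
    for (res_a, p_a), (res_b, p_b) in product(match_a.items(), match_b.items()):
        total = POINTS[res_a] + POINTS[res_b]
        totals[total] = totals.get(total, 0.0) + p_a * p_b
    return totals

def expected_points(distribution):
    """Probability-weighted average of the possible point totals."""
    return sum(points * prob for points, prob in distribution.items())

# Chelsea's distribution as listed above, converted to probabilities.
chelsea = {6: 0.453, 4: 0.27, 3: 0.175, 2: 0.038, 1: 0.049, 0: 0.015}
chelsea_expected = expected_points(chelsea)

# Hypothetical per-match probabilities, for illustration only.
example = two_match_distribution(
    {"win": 0.60, "draw": 0.25, "loss": 0.15},
    {"win": 0.70, "draw": 0.20, "loss": 0.10},
)

print(round(chelsea_expected, 2))
print({pts: round(p, 3) for pts, p in sorted(example.items(), reverse=True)})
```

The listed percentages are rounded, so the recomputed expected value can differ from the quoted one in the second decimal place. Note that 5 points never appears, since no pair of results adds up to it.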

Manchester United
at Hull the 27th
vs Wigan the 30th

It seems like the Red Devils are missing just about everybody. Certainly out for both matches are van der Sar, O'Shea, Evans, Ferdinand and Hargreaves. Nani should be out for both as well. Probably out for the first, and maybe out for the second as well are Vidic, Giggs, Neville and Brown. The good news if you're a United fan is that both matches are against weak opponents. In fact, Hull and Wigan are the worst in the league in goal differential.

The model says:
Expected points: 4.96
6 points: 60.3% chance
4 points: 22.5%
3 points: 12.8%
2 points: 1.8%
1 point: 2%
0 points: 0.6%

Arsenal
vs Aston Villa the 27th
at Portsmouth the 30th

Arsenal are also in bad shape as far as injuries go. Van Persie, Rosicky, Gibbs, Clichy and Bendtner will be out for both matches. Traore and, most importantly, Fabregas are questionable for the first match against Villa. That match should be a great one that is big for both sides. I think the Portsmouth match will be tougher than it appears as well. Arsenal should be OK, and playing later than most teams should help them slightly since it gives Cesc an extra day to heal. Plus they get two days off in between instead of the more common one.

The model says:
Expected points: 4.14
6 points: 37.5% chance
4 points: 28.0%
3 points: 20.3%
2 points: 4.9%
1 point: 6.9%
0 points: 2.4%

Aston Villa
at Arsenal the 27th
vs Liverpool the 29th

If you are a neutral and have to watch just one club this week it should probably be Villa as they play what look like the two best fixtures. Heskey went down with a groin strain last weekend and is questionable for both matches, otherwise they are healthy. That's good news as they'll need to put full effort into both matches against potential rivals for European spots. Their second match may be extra tough since they only have one day off in between whilst Liverpool will have two.

Expected points: 1.84
6 points: 4.0%
4 points: 9.9%
3 points: 27.1%
2 points: 5.7%
1 point: 27.3%
0 points: 26%

Tottenham Hotspur
at Fulham the 26th
vs West Ham the 28th

Two fairly tough matches await Spurs who are hoping to take advantage of recent form. Nothing new on the injury front. They will still be without Woodgate as well as the out-of-favo(u)r Bentley. Luka Modric, returning from breaking his leg last August, came on as a sub on the 12th against Wolves but did not play against Man City. He may not start both matches but it seems like he should play in one or both matches. Given Aston Villa's two matches, this would be a great opportunity for Spurs to slide into the fourth Champions League spot, at least temporarily.

Expected points: 3.72
6 points: 25.3%
4 points: 27.3%
3 points: 31.5%
2 points: 4.4%
1 point: 8.3%
0 points: 3.2%

Man City
vs Stoke City on the 26th
at Wolverhampton Wanderers the 28th
Expected Points: 3.98

Birmingham
vs Chelsea on the 26th
at Stoke City on the 28th
Expected Points: 1.91

Liverpool
vs Wolves the 26th
at Aston Villa the 29th
Expected Points: 3.88

Fulham
vs Tottenham the 26th
at Chelsea the 28th
Expected Points: 1.9

Sunderland
vs Everton the 26th
at Blackburn the 28th
Expected Points: 2.65

Burnley
vs Bolton the 26th
at Everton the 28th
Expected Points: 2.6

Everton
at Sunderland the 26th
vs Burnley the 28th
Expected Points: 3.09

Wigan
vs Blackburn the 26th
at Manchester United the 30th
Expected Points: 1.94

Hull
vs Manchester United on the 27th
at Bolton the 29th
Expected Points: 1.28

Bolton
at Burnley the 26th
vs Hull City the 29th
Expected Points: 2.91

West Ham
vs Portsmouth the 26th
at Tottenham the 28th
Expected Points: 1.98

Portsmouth
at West Ham the 26th
vs Arsenal the 30th
Expected Points: 1.99

Wednesday, December 23, 2009

EPL Pre Boxing Day Rankings Update

In a week that featured a midweek matchday, there were a lot of interesting results. Most surprising was probably Fulham getting a 3-0 home win against Manchester United. Fulham getting a win over Man U at Craven Cottage isn't too surprising by itself but the scoreline is. Granted, United have had a lot of injury issues lately, and the score was unfair but the result is still eye-opening. The other big upset I suppose was Liverpool losing 2-0 to Portsmouth. It's similar to Fulham - Man United in that it's not the most amazing result ever but still unexpected. Liverpool have been hemorrhaging points lately, and in my view Portsmouth aren't as bad as their points or status at the bottom of the table indicate.

The big winners of the week were probably Spurs and Aston Villa. While all the other clubs near the top faltered, both got the job done. Spurs ran through Manchester City in the midweek with an impressive 3-0 win. Having watched this one I think the scoreline was fair; Tottenham dominated for most of the match. They wrapped things up with a 0-2 win at Blackburn. Villa's two wins weren't particularly impressive (0-2 at Sunderland and 1-0 against Stoke City), but they count all the same and the other teams around them, Spurs excepted, all slipped up.

Here are the updated rankings. I'm including more info on the first chart. These are the rankings based only on the number of goals in each match:

EGF - expected goals for
EGA - expected goals against
EGD - expected goal differential
Change - change in expected goal differential from last rankings (14 Dec)
Expected goals for/against are the average number of goals scored/conceded by the club if they all played a new season at the level shown so far by all results.

Here are the rankings based on stats as well as results:

EP is the expected number of league points if they played a new season at the level shown so far. Change is the change in expected points since I posted the last rankings on 14 December.

One of the nice features of adding stats is that because there is more information they are less prone to move based on one or two good or bad matches. You can see that above, though keep in mind that a difference of 1 goal in goal differential is a change of about 2/3 of a point. Even taking that into account the average difference is about 3 times as large in the goals-only model compared to the one that also uses match stats.

Tuesday, December 22, 2009

Effect of Boxing Day Glut of Matches

The English Premier League is unique among top flights in that they play a lot of matches between Christmas and the first few days of the year while most leagues are completely off. Most seasons a team will play on Boxing Day and then again just two days later. Some seasons there will be another match just 2 or 3 days after that so they are playing 3 league matches in 6 or 7 days. This year there are matches every day from the 26th to the 30th of December. Most teams play on the 26th and then on the 28th.

Does this sequence of fixtures help any type of team? It seems like it could go either way. The good teams tend to be deeper in talent so the short amount of rest could help them as they can start fresh players that are still at a high level. On the other hand, fatigue could add some randomness to the short-rest matches. That is probably good for the bad teams since it makes upsets more likely. Perhaps the most sensible guess is that it doesn't really matter; both teams face the same strain.

To test this, I came up with a simple model with just one input for all matches and another for the matches played on short rest during the last week of the year or the first few days of the new year, depending on the schedule. The data is all matches since the 1995-1996 season. The input is the difference in average goal differential for the home and away teams in all matches other than the one in question. So instead of the predictive model, I'm actually using data from after a given match took place. For example, if the match is Arsenal at home against Everton two seasons ago, I'll take Arsenal's goal differential for the season, subtract off their goal differential in that match, and divide by 37. I then do the same for Everton, and the input value is the difference. I then include another variable which takes on 0 for all matches other than those with short rest at the end or beginning of the calendar year. For the matches we are interested in, its value is the same as the first input: the difference in average goal differential between the home and away team in all other matches. I then ran those through an ordered logit model to see if there was a difference between the short-rest matches just after Boxing Day and the regular ones.
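
Here is a minimal sketch of how the two inputs described above could be built for a single match. The function names and the season totals in the example are hypothetical; only the recipe (remove the match in question, divide by 37, take home minus away, and zero out the second variable for normal-rest matches) comes from the description. Fitting the ordered logit itself is not shown.

```python
def avg_gd_excluding(season_gd, match_gd, matches_in_season=38):
    """Average goal differential per match over all matches except this one."""
    return (season_gd - match_gd) / (matches_in_season - 1)

def model_inputs(home_season_gd, away_season_gd, home_goals, away_goals, short_rest):
    """Return (strength_gap, short_rest_gap) for one match."""
    match_gd_home = home_goals - away_goals
    home_avg = avg_gd_excluding(home_season_gd, match_gd_home)
    # The away side's differential in this match is the mirror image.
    away_avg = avg_gd_excluding(away_season_gd, -match_gd_home)
    strength_gap = home_avg - away_avg
    # Second variable: same value on short rest, zero otherwise.
    short_rest_gap = strength_gap if short_rest else 0.0
    return strength_gap, short_rest_gap

# Hypothetical season totals: home side +43, away side +22, home win 2-0.
gap, short_gap = model_inputs(43, 22, 2, 0, short_rest=True)
print(round(gap, 3), round(short_gap, 3))
```

With this setup, the coefficient on the second variable measures how much more (or less) the strength gap matters on short rest than in a regular match.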

In case that didn't make sense, the only thing you need to know is that if the coefficient on the short-rest variable is significant and positive, then that means that the shortened rest favors good teams. If it is significantly negative then that means that the schedule favors bad teams making them more likely to get a result against better sides. If it is very close to zero either way then that indicates that the difference in schedule from the rest of the season doesn't matter either way.

As it turns out, there is no evidence of an effect either way. The coefficient was 0.12 with a standard error of 0.167. The standard error being larger than the value of the coefficient means it is very likely that the difference is just due to randomness. The p-value is 0.472, meaning that there is about a 47% chance of values this extreme if the actual value of the coefficient is 0. That's quite high. There is no evidence of any difference between the post-Christmas group of matches and the rest of the season.
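
For the fellow nerds, the p-value can be roughly reproduced from the coefficient and its standard error. This sketch assumes a two-sided test using the normal approximation (a Wald z-test), which is my assumption about how the figure was obtained; the function name is just for illustration.

```python
from math import erfc, sqrt

def two_sided_p_value(coefficient, standard_error):
    """Two-sided p-value for H0: coefficient = 0, using the normal approximation."""
    z = coefficient / standard_error
    # erfc(|z| / sqrt(2)) equals 2 * (1 - Phi(|z|)) for the standard normal CDF Phi.
    return erfc(abs(z) / sqrt(2))

p = two_sided_p_value(0.12, 0.167)
print(round(p, 3))
```

Because the coefficient and standard error quoted above are rounded, the result should land at roughly 0.47 rather than match to every decimal.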

There are reasons to dislike the scheduling. I'm sure players don't like playing so frequently in such a short period of time, but it appears that fairness isn't a factor as it doesn't overly benefit or punish good or bad teams.

Friday, December 18, 2009

EPL Predictions

To be honest there aren't any big matches this weekend so I'll just give you the predictions of the stat-based model.

Portsmouth - Liverpool
Portsmouth - 21%
Liverpool - 53%
Draw - 26%

Aston Villa - Stoke City
Aston Villa - 63%
Stoke - 15%
Draw - 22%

Blackburn - Tottenham
Blackburn - 25%
Tottenham - 47%
Draw - 28%

Fulham - Manchester United
Fulham - 22%
Man United - 51%
Draw - 27%

Manchester City - Sunderland
Man City - 67%
Sunderland - 13%
Draw - 20%

Arsenal - Hull
Arsenal - 90%
Hull - 3%
Draw - 7%

Wolverhampton - Burnley
Wolves - 42%
Burnley - 29%
Draw - 29%

Everton - Birmingham
Everton - 53%
Birmingham - 21%
Draw - 26%

West Ham - Chelsea
West Ham - 9%
Chelsea - 75%
Draw - 16%

Monday, December 14, 2009

Bundesliga Update

It's been a while since I wrote about the Bundesliga. In my last post on the subject, I noted that Leverkusen were at the top of the table and had played a very easy schedule. Based on that, I predicted that they would extend their lead which was then at 3 points. That hasn't happened. After drawing in Munich and crushing Stuttgart at home they have two disappointing draws - at Hannover and then last Friday against Hertha Berlin. I still believe that they are the best team, but the gap is small and I could certainly see them falling.

On the other side, Bayern Munich seem to have found a rhythm. They won their last three matches, albeit against bottom half of the table opponents, by a combined 8 goals. They also destroyed Juventus in Torino in a match that was crucial for both sides. With the draws by Leverkusen, Bayern find themselves just two points out. That's a far cry from where they were last I wrote. I'm not ready to make them favorites but I definitely think they have a better shot now than they did before.

Here are the results-based rankings:

GFR - Ranking by goals for, adjusted for schedule
GAR - Ranking by goals against, adjusted for schedule
EGD - expected goal difference if they played a new season starting today at the level shown by results.

Here are the rankings using the model that incorporates stats:

Epts is the expected points if they played a new season starting today at the level shown by results and stats from all matches thus far.

Both models have Leverkusen top, as does the league table. Near the top, Werder Bremen are two spots higher when stats are taken into account. Bayern and Hamburg go the other way. The biggest movers overall were Mainz, who drop 5 places, and Stuttgart, who moved up 4 spots. While the stats model is still pretty new and I'm working on it, as a general rule the teams that are higher in that model than the goals-only one have usually gotten unlucky and run below expectation in goals given how their matches have gone. On the flip side, teams better in the goals-only ranking have been lucky and scored more goals than expected. I believe that the stats-based model is better when it comes to being predictive.

La Liga Rankings

Real Madrid picked up a big win at the Mestalla; otherwise it was as expected near the top. The biggest story is probably Pepe being out for the rest of the year for Real Madrid.

Here are the rankings using the system that only takes goals into account:

GFR - Goals for rank
GAR - Goals against rank
EGD - expected goal differential if they played a new season at the level shown by the scorelines thus far.

Here are the rankings going by the system that uses stats:

EP is the number of expected points if they played a new season at the level shown so far by the scores and stats from all the matches thus far. The big positive changes between the two rankings systems are Getafe and Zaragoza. Getafe are 4 spots higher in the system that uses stats and Zaragoza 6. Going the other way, including stats dropped Sporting de Gijon 5 spots and Osasuna, Racing de Santander and Almeria all 4 spots. I'm still looking into the new stats model, but that's likely an indicator that Getafe and Zaragoza have been unlucky and run below expectation while Sporting, Osasuna, Racing and Almeria have run above expectation.

EPL Midweek Predictions

Here are the predictions for the midweek. There is a not unusual mix of some matches that should be close and others that should be less than competitive. Last weekend was full of surprises; we'll see what this week brings:

Sunderland - Aston Villa
Sunderland: 36%
Aston Villa: 34%
Draw: 29%

Birmingham - Blackburn
Birmingham: 43%
Blackburn: 29%
Draw: 29%

Bolton - West Ham
Bolton: 38%
West Ham: 33%
Draw: 29%

Manchester United - Wolverhampton
Man United: 83%
Wolves: 6%
Draw: 11%

Burnley - Arsenal
Burnley: 12%
Arsenal: 69%
Draw: 19%

Chelsea - Portsmouth
Chelsea: 83%
Portsmouth: 6%
Draw: 11%

Liverpool - Wigan
Liverpool: 81%
Wigan: 7%
Draw: 13%

Tottenham - Manchester City
Spurs: 48%
Man City: 24%
Draw: 24%

Updated EPL Rankings

What a crazy week in the EPL. Every top team other than Arsenal disappointed. Arsenal are now back in the race as they will find themselves down just 3 if they can win their match in hand. Manchester United fans can look at it either way as they lost at home to a pretty good side in Aston Villa but only lost a point as Everton managed to get a draw at Stamford Bridge. Just when it was looking like a runaway at the top, it seems now like it could be pretty open.

Here is the ranking using the system that only takes goals into account:

GD is the expected goal differential if they played a new season at the level shown by the number of goals scored so far. GF and GA are the ranking of the team in that category.

Here is the ranking according to the new system that takes both goals and several stats into account:

Epts is the expected number of league points they would get if they played a new season at the level shown so far by the number of goals for and against as well as the stats from the matches. The last column is the change from last week.

I remain baffled by Stoke City. To be honest, I am hoping that they don't slide and stay up because they are an interesting case. I am going to write an article just about them because they did the same last season. This week was no exception with them putting both of their shots on target into the goal.

Stay tuned for a preview of matches taking place tomorrow and Wednesday.

Friday, December 11, 2009

EPL Weekend Preview

This weekend has some pretty interesting matches. The highlight is Sunday's matchup between Liverpool and Arsenal at Anfield. Those teams both had title hopes at the beginning of the season. Liverpool would need an incredible run and for several teams to fade to claw their way back into it. Arsenal are not in quite the same situation but are well out of it at the moment, especially with such a brutal injury situation.

Here are the model predictions for each match:

Stoke - Wigan
Stoke: 40%
Wigan: 31%
Draw: 29%

Birmingham - West Ham
Birmingham: 43%
West Ham: 28%

Bolton - Man City
Bolton: 19%
Man City: 56%
Draw: 25%

Burnley - Fulham
Burnley: 38%
Fulham: 33%
Draw: 29%

Chelsea - Everton
Chelsea: 85%
Everton: 5%
Draw: 10%

Hull - Blackburn
Hull: 34%
Blackburn: 37%
Draw: 29%

Sunderland - Portsmouth
Sunderland: 46%
Portsmouth: 26%
Draw: 28%

Tottenham - Wolverhampton
Tottenham: 79%
Wolverhampton: 7%
Draw: 14%

Manchester United - Aston Villa
United: 76%
Villa: 9%
Draw: 15%

Liverpool - Arsenal
Liverpool: 43%
Arsenal: 29%
Draw: 28%

Monday, December 7, 2009

EPL Rankings Update - With New Stat-Based Ranking

The league got a bit more interesting with Man City getting a big win over Chelsea. With their big win over West Ham, Manchester United, despite a completely decimated backline, are back to just 2 points off. Arsenal are still vaguely in the race, now 8 back with that match in hand. Chelsea are clearly still the favorites to win the league, but at least now we have something to talk about when it comes to the league race.

I've been working on a new rankings and prediction system and I'm ready to unveil it now. Unlike the previous work, this new system uses stats from the match: specifically, shots, shots on target, fouls, corners and bookings. I'll add time of possession and offside calls as well at some point but I have to do some work recording those. Using those stats and a logit model, I estimate the probability of each possible outcome if the teams played a new full season at the level shown by all the stats. With those probabilities I get the expected number of league points. So the right column is the average number of points the team would get if they played a new season at the level shown by the results and stats so far.

Rank Club Expected Points
1 Chelsea 87.5
2 Man United 84.3
3 Arsenal 80.3
4 Liverpool 75.9
5 Tottenham 68.2
6 Man City 65.0
7 Aston Villa 53.0
8 Fulham 51.2
9 Everton 47.6
10 Portsmouth 45.4
11 Sunderland 45.2
12 Blackburn 44.4
13 West Ham 43.8
14 Birmingham 42.6
15 Burnley 41.1
16 Wigan 40.6
17 Wolves 37.6
18 Stoke 37.4
19 Bolton 35.4
20 Hull 31.3
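
As a rough illustration of the last step, going from outcome probabilities to the expected points in the table, here is a small sketch. The function names and the fixture probabilities are hypothetical; the only thing taken as given is the usual 3 points for a win and 1 for a draw. How the logit model produces the probabilities is not shown.

```python
def expected_match_points(p_win, p_draw):
    """Expected league points from one match: 3 for a win, 1 for a draw."""
    return 3 * p_win + 1 * p_draw

def expected_season_points(fixtures):
    """Sum expected points over a list of (p_win, p_draw) pairs."""
    return sum(expected_match_points(p_win, p_draw) for p_win, p_draw in fixtures)

# Hypothetical 38-match season: 19 home fixtures and 19 away fixtures.
fixtures = [(0.60, 0.25)] * 19 + [(0.40, 0.30)] * 19
season_total = expected_season_points(fixtures)
print(round(season_total, 2))
```

In practice each fixture would have its own probabilities depending on the opponent and venue; the flat values here just keep the example short.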

Here are the rankings using the previous system that just uses goals for each side.

The biggest difference is Stoke. I was quite surprised to see the Potters so low in my new rankings. Looking at the stats, I can now understand why. They are 18th in the league in shot differential (shots taken minus shots allowed) and last in the league in shots-on-target differential, fouls differential (times fouled minus fouls) and corner differential. I'll get there with the stat series eventually, but there is a fair to strong correlation between each of those stats and goal differential. It looks like Stoke have gotten lucky in terms of how many goals they've scored and allowed when compared to the stats from their matches.

Another club that has a big difference is Portsmouth. I was surprised to discover that they are 5th in shot differential and 6th in shots-on-target differential! They are 15th in fouls, 12th best in corners and 14th in goal differential, which the model also uses. So they appear to be on the other end - they've gotten unlucky when it comes to scoring and also the timing of those goals.

Friday, December 4, 2009

World Cup Draw

I will be writing a lot more on the World Cup over time, but here are my initial thoughts on the draw.

Winners:

United States - It was close to a dream draw for the Yanks. They missed out on South Africa, but England is one of the easier seeded teams to draw. The other two teams are the worst teams in their pots. There weren't many possible draws that have the US as second-best in the group, but they got one. Qualifying still won't be easy, but it's a lot more likely now than it was yesterday.

England - Similar story to the above. Depending on your point of view, the USA was the best or second best behind Mexico in pot 2 so that was unlucky, but again the other two teams were the best draws. England should go through top.

France - Whether it was due to the handball incident or something else, France weren't seeded because FIFA changed the way they did the seeding. With the draw they effectively became the seeded team in the group. Drawing South Africa not only meant the weakest seeded team, but it also meant not having to play one of the big African teams. To make things even better, should France advance they will play one of Argentina, Nigeria, South Korea and Greece. Those teams aren't bad, but as I'll point out later, it could have been a lot worse. The only thing not perfect is that they drew one of the two big countries from pot 2 with Mexico.

Italy - The defending champs had by far the best draw. With New Zealand and Slovakia they got countries that were certainly in the bottom two in their pot. Paraguay are tougher to rate, but I think most everyone would put them in the bottom half of their pot and some would rate them second worst. Their draw for the round of 16 is favorable as well as they are very likely to get the best of Denmark, Japan and Cameroon.

Losers:

Spain - Others are likely to say that Spain had a good to great draw. I think they were one of the most unlucky countries. The reason for that is that from most any group they would be significant favorites to advance, even to advance first. The group draw isn't very relevant for the kings of Europe. The knockout draw will lead to what should be the best round of 16 match ever for the neutrals but it is absolutely brutal. Spain will play one of Brazil, Côte d'Ivoire and Portugal in the round of 16. That is astounding. The two runaway favorites in the tournament, at least before the draw, were Brazil and Spain. Portugal and especially Côte d'Ivoire are both considered to be in the second tier of teams that definitely have a shot. There is more than a 99% chance that Spain will play one of those teams. Looking forward to the round of 8 is tougher, but if they can get out of that their reward is likely to be Italy, who match up well with them. En route to winning Euro 2008, Spain had to best Italy on penalties after neither team scored a goal in 120 minutes.

Brazil, Côte d'Ivoire, Portugal - Not a lot to say here. Absolutely brutal for these countries. Not only is one of the best teams in the competition going to be eliminated in the group stage, but one of the advancing teams, probably the one finishing second, has to face Spain in the first knockout round.

Mixed Bag:

The Netherlands - It's tough to argue that the Dutch group draw was great. Denmark and Japan are both pretty good teams that can give teams problems. Cameroon are a very talented side with the much discussed advantage of playing on their home continent. That's a tougher than average group. Having said that, the Dutch should finish top of the group as they are pretty easily the best side. Their reward in the round of 16 if they do that is most likely Paraguay or Slovakia, so that part is great. On the other hand, their round of 8 match is most likely whoever wins the group with Brazil, Côte d'Ivoire and Portugal, with Brazil most likely. There aren't usually a lot of easy wins in the last 8 but it could obviously be a lot better than that. So some good and bad news.

Germany - I thought of putting Germany in the winners category, but decided to put them here instead. Australia, Serbia and Ghana is a tough draw as all three were among the best teams in their pot. Germany still should get out of that in first. If they do, they will likely play the United States, Slovenia or Algeria. Any of those would be below average for the round of 16. Looking further ahead, their round of 8 match is against the best of Argentina, Nigeria, South Korea and Greece or the second best of France, Mexico, Uruguay and South Africa. So barring a big upset, if they can advance first from the group stage Germany will have one of the easier round of 16 opponents and the easiest in the quarterfinal. Even if you take into account their pretty tough group, for me the group of death because all four teams are good, no other team has as clear a path to the semifinal.

Thursday, December 3, 2009

Stat Series: Fouls

I'll now continue the stat series by looking at fouls. Like corners and time of possession, about which I'll write in the future, fouls are often used as an indicator that one team was a lot better than the other in a match. A defender usually fouls a player with the ball because he cannot otherwise make a tackle. It is often because the attacker is making a run past the defender or receiving the ball in a dangerous area. Other than a few exceptions like fouling the keeper on a corner, fouls tend to be called on you when your team is getting outplayed, at least for the couple seconds before they occur. But are fouls really a strong indicator of performance in a match?

If that is the case then we should see a link between the number of fouls committed by each team and the match result. If getting fouled a lot more often than fouling your opponents indicates that you are playing better, and it's a given that playing better leads to winning, then getting fouled more often should be well correlated with getting results.

Looking at the data, I was quite surprised to find that there is practically no connection between the number of fouls called for each team and the results of matches. In other words, foul statistics seem to not indicate anything at all in terms of which team was better on a given day. My data set is all matches from the previous two seasons in the English, Spanish, Italian, German and French top flight. I may expand the dataset for this study but frankly I didn't see the point after seeing what the results were for the last two years.

I looked at it a couple different ways. The first was just to look at the correlation between foul differential (home fouls minus away fouls) and goal differential (home goals minus away goals). A positive correlation would indicate that fouling more often than your opponent tends to lead to outscoring them. A negative correlation would be more in line with what you would expect; if you get fouled more often than your opponent you will tend to outscore them. If the correlation coefficient, which varies between -1 and 1, is very close to zero then that indicates that there is little relationship between the two. As it turned out that was the case. The correlation coefficient was -0.01278. In other words, the data indicates that there is essentially no link between number of fouls committed by each team and the goal difference in a match.
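
For those who want to run the same check on their own data, here is a bare-bones sketch of the correlation calculation. The four matches in the example are made up purely to show the mechanics and the tuple layout is my own; the real study used two seasons from five leagues.

```python
from math import sqrt

def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)

# Hypothetical matches: (home fouls, away fouls, home goals, away goals).
matches = [(12, 15, 2, 1), (18, 11, 0, 0), (9, 14, 1, 3), (16, 16, 2, 0)]

foul_diff = [hf - af for hf, af, _, _ in matches]  # home fouls minus away fouls
goal_diff = [hg - ag for _, _, hg, ag in matches]  # home goals minus away goals

print(round(pearson(foul_diff, goal_diff), 3))
```

A value near zero on a large sample is what "no link" looks like; on a tiny made-up sample like this one the number means nothing.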

To look into this further, I broke the matches up into three groups: those where the home team committed more fouls than the away team, those where the away team fouled more and those where the two teams committed the same number of fouls. In the sample there were 1,582 matches where the home side committed more fouls. In those, the home team won 729 times, got a draw in 415 and lost 438. When the home team committed more fouls they averaged 1.64 points per match. The away team fouled more in 1,824 matches. In those, the home team went 853-484-487 for an average of 1.66 points per match. There were 246 matches where the two teams were called for the same number of fouls. In those the home side went 122-55-69 for 1.71 points per match.

For the fellow nerds out there, there is no statistical significance when comparing 1.66 points per match and 1.64. More importantly though, there is no actual significance! "Winning the fouls battle" was only worth 2 hundredths of a league point. That is absolutely nothing. If you gave that up over an entire season, it would still cost you less than one point on average: 0.76 for a 20-team league. In a previous article on goal differential and points, I showed that there is a very strong link between goal differential over a season and points. Using the formula from that article, giving up .02 points per match every match for the entire season is about the equivalent of conceding one more goal overall. The stats indicate that the difference in points from getting fouled more or fewer times in a match almost certainly is due to random noise and not a real effect. My point is that even if it were due to something real, the difference is so small that it isn't relevant.

To put the final nail in the coffin, I looked only at matches where one team or the other committed a lot more fouls. In those where the away team committed 5 or more fouls more than the home team, the home side went 442-273-260 for an average of 1.64 points. When the home side committed 5+ fouls more than the away team they went 330-196-208 for an average of 1.62 points per match. So the difference is the same as in the other case. I find it curious, though it doesn't matter at all, that the home teams did better in matches where the number of fouls committed by the two teams was close than in either extreme. So even in the extreme case, it's extremely likely that the difference is just due to noise and even if it isn't the difference is so small that it doesn't matter.
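
The records quoted in the last few paragraphs can be turned into points per match with a few lines. This sketch assumes every record is written as wins-draws-losses from the home side's point of view (the first one is spelled out that way; the rest are my reading), and the labels are just descriptive strings.

```python
def points_per_match(wins, draws, losses):
    """Average league points per match: 3 for a win, 1 for a draw."""
    matches = wins + draws + losses
    return (3 * wins + draws) / matches

# Home-side records as quoted above, assumed to be wins-draws-losses.
records = {
    "home side fouled more": (729, 415, 438),
    "away side fouled more": (853, 484, 487),
    "equal fouls": (122, 55, 69),
    "away side fouled 5+ more": (442, 273, 260),
    "home side fouled 5+ more": (330, 196, 208),
}

for label, record in records.items():
    print(f"{label}: {points_per_match(*record):.3f} points per match")

# A 0.02 points-per-match edge over a 38-match season (20-team league).
season_cost = 0.02 * 38
print(round(season_cost, 2))
```

Recomputed values can differ from the quoted figures in the last decimal place depending on how they were rounded, which doesn't change the conclusion.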

There is no evidence whatsoever of a connection between the number of fouls committed by each team in a match and the result. If in the future you see me using the difference in fouls to make the case that one team was better than the other despite the result, please yell at me in the comments.

In the future I will look at full-season information for different teams as well as different leagues. It is possible that there is a link between fouls and results in individual matches, but it is restricted to certain leagues due to style of play or officiating.

Wednesday, December 2, 2009

FIFA Announces Pots for Draw

FIFA has announced the pots for the World Cup draw, with one major surprise. Instead of using the same type of formula they used last time, which I used in a past article to calculate the 7 seeded teams along with South Africa, they just used the FIFA rankings from last month. This meant that France dropped out and the Netherlands became seeded. That's a pretty big break from tradition as they've used roughly the same seeding formula for the last handful of World Cups.

Here are the pots:

Pot 1:
South Africa

Pot 2:
North Korea
South Korea
New Zealand
United States

Pot 3:
Côte d'Ivoire

Pot 4:

Here are a few thoughts:
- The second pot contains no teams from the same confederation as the seeded pot. This means that for those teams, each of the seeded teams is equally likely.

- If you are like most people and think that South Africa is by far the easiest draw from the seeded pot then the unseeded South American sides have an edge because each has a 1 in 3 chance of playing in South Africa's group. This point and the last one are because no two teams from the same confederation other than UEFA can be put into the same group.

- If you're French or a fan you obviously aren't happy. If you are Dutch then I'm sure you are. For the rest of the world, the only change as far as the seeding goes is that France replaced the Netherlands as the big team you'd like to avoid. In my view, and probably that of most others, France, Portugal and the Ivory Coast are clearly the best three teams that aren't seeded. I think it's a lock that one of those three will be in what is considered the group of death.

- The United States and Mexico are also very likely to be in the group of death. Keep in mind that this doesn't mean the draw will be tougher. The US and Honduras have the exact same probability of drawing any given set of opponents, but because the US and Mexico are the best two teams in their pot they are likely to be in the group of death.

- Perhaps because of South Africa or maybe just some kind of recency bias, it seems like there is more on the line in terms of how tough the groups could be. A possible group is Brazil, the United States, Ivory Coast and France. Another is South Africa, North Korea, Uruguay and Slovenia. For countries like the US, Denmark, Chile and Ghana there are possible groups where it would be very unlikely to get out, maybe a 10% chance, and others where it would be a near certainty.

Tuesday, December 1, 2009

Spanish League Post-Clasico Update

Barcelona went back on top by virtue of their 1-0 win over Real Madrid. To be honest, Madrid far exceeded my expectations in the match. They actually outplayed Barcelona for long stretches, especially in the first half and Barça were fortunate in that Zlatan put away their only clear chance of the match. I expected it to be like last year when Barça played Real Madrid off the park both times. Elsewhere, Sevilla got a disappointing result with a draw versus their Andalusia rival Malaga. The result didn't feel particularly fair as Sevilla dominated, including the first half even though the score didn't indicate that; they went into the break down 0-2. Valencia also failed to win at home, picking up just a draw against fellow Catalan speakers Mallorca. Atletico de Madrid perhaps are getting it back as they crushed Espanyol 4-0.

Here are the rankings:

The last column is the expected goal differential if the teams played a new season at the level the results so far indicate. O and D are offensive and defensive ranks, which are based on how many goals each team has scored and conceded as well as the opponent strength.

Barcelona remain best at both ends of the pitch. Sevilla slipped a bit due to giving up two goals at home to Malaga, who are pretty weak when it comes to scoring.