Monday, November 30, 2009

EPL Update - Chelsea Clear Top

With the big win over Arsenal at the Emirates, Chelsea cemented their place at the top. The Blues remain 5 points clear of Manchester United and are now 11 points ahead of the Gunners, with Arsenal having a match in hand. Looking ahead to December, Chelsea have Man City next and then a pretty comfortable list of fixtures: Everton, Portsmouth, West Ham, Birmingham and Fulham. They have a great shot at being in a dominant position come the break. They will have to be good, though, since Manchester United have a similarly easy schedule for that stretch. I thought it would be different, but at this point it's tough to see anybody but those two winning it, with Chelsea the favorite at the moment.

Here are the updated rankings:

Again, the last column is the expected goal differential if these teams played out a full season at the level shown so far.

Someone asked me when I expect these numbers to start to converge. I don't have a good answer to that and it's something I'll be looking into. They will certainly converge at the end of the season which is a nice feature of the model. Most teams have played 14 matches, which by the way is right around the number of matches for a World Cup qualifying campaign. I think things should be pretty good, but because right now the model only uses goals and no other statistics, it is still subject to randomness.

When I was checking on the numbers, I discovered something interesting. I've said before that teams like Arsenal and even Chelsea are overrated when it comes to scoring as they've likely been running over expectation. The reason for that is that the model predicts an expected goals of 108 for Arsenal and 97 for Chelsea when both clubs scored 68 goals last season with similar rosters. The model accounts for schedule so it's a bit different, but both of those clubs are on pace for about as many goals as the model says. When I compared other clubs' expected goals with last season, I discovered that 14 of the 17 teams that played last season in the EPL have a higher expected goals scored.

I thought this might be a problem with the model, but for all matches thus far the average number of combined goals is 3.04. Last season it was 2.21. If scoring stays at this pace, there will be about 318 more goals scored this year than last year in the entire league. I have no explanation for why this would be the case, but scoring seems to be up in a big way. I'm going to write another article or two for the stats series but this raised another interesting question which I'll write about. I wonder if scoring is often higher in matches in the early part of the season compared to later.
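As a rough check on that estimate, the arithmetic can be sketched as below. It assumes a 20-team league where every club hosts every other club once (380 matches) and uses the rounded per-match averages quoted above, so it lands near, not exactly on, the figure in the text, which presumably came from unrounded averages.

```python
# Rough check of the extra-goals estimate, using the rounded averages quoted above.
TEAMS = 20
matches_per_season = TEAMS * (TEAMS - 1)  # each club hosts every other club once

goals_per_match_now = 3.04   # combined goals per match so far this season
goals_per_match_last = 2.21  # combined goals per match last season, as quoted

extra_goals = matches_per_season * (goals_per_match_now - goals_per_match_last)
print(round(extra_goals, 1))  # roughly 315 with these rounded inputs
```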

Saturday, November 28, 2009

EPL Mini Preview

Sorry for the lack of posts this week. I've been visiting family for Thanksgiving, a major holiday in the US. So as not to leave you completely hanging, here are the model predictions for the EPL this week. I should have a more-detailed preview for Arsenal - Chelsea and the Clasico tomorrow. I know it's late but perhaps you can read it the morning of the matches to get in the mood.

Saturday:

Blackburn - Stoke
Blackburn: 43.5%
Stoke: 27.4%
Draw: 29.1%

Fulham - Bolton
Fulham: 65.9%
Bolton: 11.7%
Draw: 22.4%

Man City - Hull City
Man City: 75.6%
Hull: 7.1%
Draw: 17.3%

Portsmouth - Manchester United
Portsmouth: 22.5%
Man United: 47.9%
Draw: 29.6%

West Ham - Burnley
West Ham: 56.5%
Burnley: 18.2%
Draw: 25.2%

Wigan - Sunderland
Wigan: 29.8%
Sunderland: 42.1%
Draw: 28.1%

Aston Villa - Tottenham
Aston Villa: 39.8%
Tottenham: 32.3%
Draw: 27.9%


Sunday:

Wolves - Birmingham
Wolves: 36.9%
Birmingham: 31.9%
Draw: 31.1%

Everton - Liverpool
Everton: 26.1%
Liverpool: 47.5%
Draw: 26.4%

Arsenal - Chelsea
Arsenal: 49.1%
Chelsea: 25.7%
Draw: 25.2%

Monday, November 23, 2009

Liga Update

This week is obviously a big week for the Spanish League with the Clásico coming up on Sunday. The two biggest stories there are that Barcelona were held to a draw, putting Real Madrid on top going into their match, and that Cristiano Ronaldo is expected to play in the Champions League in the midweek and then in the Clásico. Other than Athletic de Bilbao holding Barcelona to a 1-1 draw, there weren't any surprises.

Here is the rankings table:

Premier League Rankings Update

Tottenham threw a nice wrench into things with a 9-goal win. That's unusual enough that I suspect my models will overrate Spurs by a bit for the rest of the year. That will be reduced over time as the total number of matches goes up but I don't think it will go away entirely. After getting just a draw, Liverpool have just about completely left the title race. I thought the big four would all be in it 6 weeks from now so I'll just call that prediction dead. With Arsenal hosting Chelsea this weekend, it could go anywhere from a wide-open three-club race to Chelsea in great position well clear.

Thursday, November 19, 2009

A(nother) Request for Feedback and Suggestions

This blog has gotten a decent number of new readers so I thought I'd once again ask for feedback and suggestions.

Do you like the weekly rankings articles? Hate them? Want me to write something on a league I haven't yet?

How about the more general articles like the stats series? The weekly previews?

Any feedback, positive or negative would be appreciated. Please leave a comment below.

Thoughts on the World Cup Draw

Finally, and in typically controversial fashion, we have our 32 teams. FIFA have yet to announce what the pots will be for the draw taking place December 4th. Last time around they announced them on December 6th and had the draw December 9th so I don't think we can expect that for a fortnight or so.

I have written before that I believe that the seeded teams will be Brazil, Italy, Spain, England, Germany, Argentina, France and South Africa. This is based on the formula that they used last time. They may change the formula (they threw a wrench into the UEFA playoffs by making the draw seeded shortly before it happened), but my guess is that those will be the seeded teams.

That leaves these teams:

UEFA:
Denmark
Greece
Netherlands
Portugal
Serbia
Slovakia
Slovenia
Switzerland

CAF:
Algeria
Cameroon
Ghana
Ivory Coast
Nigeria

AFC:
Australia
Japan
North Korea
South Korea

CONCACAF:
Honduras
Mexico
United States

OCEANIA:
New Zealand

CONMEBOL:
Chile
Paraguay
Uruguay

Unless they completely blow up the format they've used for the last several World Cups, two pots will be the seeded teams and the non-seeded UEFA countries. For the other two, they have a few different ways to go and they've done them all before. They could go:
Pot 3: Africa and North America
Pot 4: Asia, South America and New Zealand

Pot 3: Asia, North America and New Zealand
Pot 4: Africa and South America

Pot 3: North America, South America, New Zealand
Pot 4: Africa and Asia

That last option creates uneven pots, but that can be dealt with in the draw and has happened before. In my view, the toughest group of countries by far when you take out the seeded teams and Europe, both of which have their own pots, is those from Africa. I think that's often true anyway, but even more so with the finals taking place in South Africa. So if you are a fan of one of the above countries then you want whatever configuration puts Africa in your country's pot. It may be possible to draw a team from your pot because of the rule that two countries from the same confederation can't be in the same group, but you are more likely to draw one from the other.

Getting back to the seeded teams, I personally put them into 4 groups. That's a big number since there are 8 of them but here's how I rate them:

Favorites: Brazil and Spain
2nd Tier: Germany, Italy, England
Your Guess Is as Good as Mine: France, Argentina
Please: South Africa

I have them sorted best to worst within each group. The first two groups are pretty self-explanatory. For the third group, I think both of those countries are in an interesting situation where they have the talent to be up there, but due to bad management or something else they are playing far enough below their potential that they are not very good right now. The World Cup is still over 200 days away (and Hiddink is available!) so a lot can happen. If France and Argentina played to their potential I would put them up there with Germany, perhaps with those three in the 2nd tier and England and Italy dropped to a tier of their own. I may get heat from English supporters and may be underrating them, but I feel they lack the quality and especially the depth of the six other big teams that are likely to be seeded. I would put them ahead of France and Argentina right now, obviously. You will hear shouts of joy from the unseeded countries if they get drawn with South Africa, who challenge the United States of '94 for worst host ever.

Looking at the unseeded teams, the countries to avoid are the Netherlands, Portugal and Ivory Coast. I think that these teams have by far the best chance of the unseeded teams to win it, as good a shot as England or better. Other than that, the teams to avoid depend on what the pots are. Along with those three, Mexico, the United States and South Korea are all likely teams to be in the group of death simply because they are likely to be better than the other teams in their pot. Again this depends on the format of the draw; it's especially true if the North American and/or Asian teams are put into a different pot than the African teams. Keep in mind that this doesn't mean they'll get a tougher draw. For example, Honduras are quite unlikely to be in the group of death because they aren't very good. However, the US and Mexico have the same chance of drawing any given 3 opponents as Honduras.

Tuesday, November 17, 2009

UEFA Playoff Predictions

I'm working on a couple of other things so I'll keep this short. Here are the predictions for the playoffs tomorrow. These are probabilities of advancing according to my models. As I said before, the logit version is less sensitive to the quality of the two teams. It seems too insensitive, but I'm not sure, so I'm giving both the logit and Poisson versions of the averages model. I'll list the logit version first and the Poisson second.

France - Ireland:
France - 83.3%, 92.2%
Ireland - 16.7%, 7.8%

Bosnia-Herzegovina - Portugal:
BH - 22.6%, 17.9%
Portugal - 77.4%, 82.1%

Ukraine - Greece:
Ukraine - 49.2%, 48.2%
Greece - 50.8%, 51.8%

Slovenia - Russia:
Slovenia - 30.8%, 26.2%
Russia - 69.2%, 73.8%

Thursday, November 12, 2009

WQC UEFA Playoff Preview: Ireland - France

Will Irish eyes be smiling? Will the French come out of next week with their joie de vivre? Is that a cheesy enough intro? No? In an epic struggle between Guinness and champagne, eleven footballers from each of these two great republics will battle it out to see who books a trip to South Africa and who weeps in front of the television next summer. Will France be able to overcome the luck o' the Irish or will Ireland best France despite them having a certain je ne sais quoi? That should do.

How they did in the group stage.

Making the playoff always means the same thing. You did pretty well in the group stage, but not well enough. Unlike Ireland, France were expected to win their group. They went 6-3-1 with a goal differential of +9. The other teams in their group were Serbia, Austria, Lithuania, Romania and the Faroe Islands. I'd say that's an average to a bit above average group. Serbia bested them despite France getting a win and a draw against the Serbs. The French gave away points with an early loss at Austria and two draws against Romania, the last of which was two months ago in Saint-Denis.

Looking at results, Ireland are probably the most interesting team remaining. They are the only country in the playoffs that didn't lose in the group stage; their problem is that they got far too many draws. Their group mates were Italy, Bulgaria, Cyprus, Montenegro and Georgia. Again I'd rate that as a pretty average group. Playing Italy to two draws is certainly nothing to be ashamed of. Less impressive are the two 0-0 draws against Montenegro and the two 1-1 draws against Bulgaria. In terms of goal differential, Ireland weren't impressive. Their four wins over lowly Georgia and Cyprus were all by a single goal. I don't recall another team coming out of a group stage like that, with every match either a draw or decided by a single goal. Based on that we can expect a couple of close matches.

Likely Lineups

France have some big injuries. Franck Ribery will be out with a knee injury. Jeremy Toulalan is also expected to miss out on one or both of the matches with a tweaked adductor muscle. In addition, Abou Diaby will probably be ready but has had injury issues. On the other side, Ireland have few injuries. They will be without backup forwards Shane Long, Noel Hunt and Caleb Folan.

One issue many have with Domenech is that the squad has lacked consistency in both lineups and tactics. France have run a 4-4-2, 4-3-3 and the somewhat in-between 4-2-3-1. Ireland on the other hand have kept to the traditional 4-4-2.

I think the lineups will look something like:
Ireland
...................Given

Finnan ...O'Shea ...Dunne ...Kilbane

Keogh...Whelan ...Andrews ...McGeady

............Keane... Doyle

................Gignac
Henry .........................Anelka
...............Gourcuff
...........Alou....Lass

Evra.....Abidal...Gallas...Sagna

.................Lloris

edit: I've changed the France lineup from my initial idea. For some reason I had Gallas out of the starting lineup despite him being the most regular center back in qualifying. I also apparently got the replacement for Toulalan wrong. According to an article in L'Équipe, Alou Diarra is the likely starter instead of Sissoko and the center-back pairing will be Abidal and Gallas. At least that's what my quite limited knowledge of French tells me the article says. They also list Gignac up top instead of Benzema. I could see that going either way and I think whoever doesn't start is a likely sub for whoever does.

How they rate.

I applied my new rankings system to all UEFA countries. Fully weighting all matches in qualifying and the finals for Euro 2008 and this qualifying campaign, France came out as the 15th best scoring team and 13th best defensively. I was surprised they were that low to be honest. Overall they were 10th indicating that they weren't far below the teams ahead of them in either attack or defense. Ireland came out 26th best overall. They rate 18th best defensively but only 30th best at scoring.

Predictions

I'm still unsure which model is best. I'm using the new averages model to come up with the coefficients and then either using Poisson or ordered logit to get the estimates for how likely each possible result is. As I've written before, the logit model is less sensitive to the scoring and defensive factors of the two teams. The results of each model are somewhat close but using Poisson gives a more extreme result.

For the tie as a whole, the logit model predicts that France will advance 62.3% of the time and Ireland 37.7%. The Poisson model gives France a better chance with a 73.1% shot at playing in South Africa, 26.9% for Ireland. For the first leg alone, the averages-logit model has it extremely even, with France having a 37.4% chance of winning, Ireland 35.8% and the remaining 26.8% going to a draw. The Poisson model gives France a 40.5% chance of winning outright, Ireland a 30.3% chance of pleasing the home fans, and a 29.2% chance of a draw that leaves it all to play for. For both models the most likely outcome for the first leg is a 1-1 draw.
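To illustrate how a Poisson approach turns expected goals into win/draw/loss probabilities for a single match, here is a minimal sketch. It assumes the two teams' goal counts are independent Poisson variables, and the expected-goals inputs (1.2 and 1.3) are hypothetical values picked for illustration, not the coefficients from my model. The function names are my own as well.

```python
from math import exp, factorial


def poisson_pmf(k, mean):
    """Probability of exactly k goals when goals follow a Poisson with this mean."""
    return exp(-mean) * mean ** k / factorial(k)


def outcome_probabilities(home_mean, away_mean, max_goals=10):
    """Return (home win, draw, away win), summing over scorelines up to max_goals each."""
    home_win = draw = away_win = 0.0
    for h in range(max_goals + 1):
        for a in range(max_goals + 1):
            p = poisson_pmf(h, home_mean) * poisson_pmf(a, away_mean)
            if h > a:
                home_win += p
            elif h == a:
                draw += p
            else:
                away_win += p
    return home_win, draw, away_win


def most_likely_score(home_mean, away_mean, max_goals=10):
    """Scoreline (home goals, away goals) with the highest probability."""
    scores = [(h, a) for h in range(max_goals + 1) for a in range(max_goals + 1)]
    return max(scores, key=lambda s: poisson_pmf(s[0], home_mean) * poisson_pmf(s[1], away_mean))


# Hypothetical expected goals for the home and away sides.
home_win, draw, away_win = outcome_probabilities(1.2, 1.3)
print(round(home_win, 3), round(draw, 3), round(away_win, 3))
print(most_likely_score(1.2, 1.3))
```

With two low, evenly matched expected-goals figures like these, the single most likely scoreline is 1-1 even though a draw is not the most likely result overall.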

Personally, I expect it to be very close. I think Ireland definitely have a chance. Part of that is that I am not a fan of Domenech. My suggested lineup above is what I think they will probably run based on what they have done in the past. That lineup should get the job done, but if I were French I'd be worried about both who and what he'll put out there. Ireland should be a tough opponent and I don't see France playing them off the park in either leg, especially the first. Despite all that, I think France definitely have an edge.

Monday, November 9, 2009

German Bundesliga Rankings

I'll keep this short because I don't know a lot about German teams other than what I've read and what the rankings say:



Bayern Munich have been far from great. Maybe not as bad as their 8th place in the table suggests, but van Gaal has a ways to go to get them back in the title race. Leverkusen look like by far the best team so far.

edit: I just looked at Leverkusen's schedule because I found their huge margin at the top odd. They have played a very tough set of fixtures thus far. Looking at the table, they sit top and have played all teams from 2nd - 7th. Their five matches to round out the first half of the season are against teams that are currently 8th, 10th, 14th, 15th and bottom. While I have yet to see them play, I think it's pretty safe to say that they are likely to extend their 3 point lead if they can just play at something close to the level they have thus far.

French Ligue 1 Rankings and Analysis

I have to admit that I have not seen a single match from the French Ligue this season. Having said that I have seen several matches involving French clubs in European competitions. I watch their top clubs in Europe a lot because I like their style of play and find them entertaining. Other than Spanish, I think French teams are my favorite to watch. I definitely plan on seeing more domestic matches as well as the season moves along.

England and Spain look to have 3 teams with a decent shot at competing for the league. Italy really has one but I'll be generous to Juventus and say it's a two-horse race. In contrast, the Ligue 1 is wide open. 12 matchdays into the season there are 4 clubs within 3 points of the leaders and 4 more within another 3 points. That is amazing and there's nothing close in the other top leagues. Other than lowly Grenoble, the league is pretty competitive top to bottom.

Here are the rankings:


O Rank - rank by goal-scoring ability
D Rank - rank by ability to prevent goals
EGD - expected goal differential if they played a new season and all teams played at level shown by results thus far

Bordeaux being third shows why, especially this early in the season, it is handy to have a model around. They sit on top of the table and have the highest goal differential. My rankings are based on goal differential, but, importantly, take schedule into account. Bordeaux have played an incredibly easy schedule. Just looking at the table, in just 12 matches they have played every team in the bottom 7. Adding in their match with Nice, two-thirds of their matches have been against teams in the bottom half of the table. I'm not sure a team has ever played an easier schedule this far into the season when you look at where their opponents are in the table.

Other than that, a look at the third column confirms my suspicion that there isn't a lot in it at the top. There are 7 other teams within 10 goals in the rating system! Things are wide open. Other than Bordeaux which I discussed above, the other standout in terms of where they are in my rankings compared to the table is Paris Saint-Germain. I think they are 12th in the table for a few reasons. The obvious one is that they still have a match in hand. They also have played a somewhat tougher schedule than average (incredibly tough compared to what Bordeaux have dealt with). In addition to all of that they have had slightly below-expectation luck when it comes to getting points for a team with their goal differential. I expect Les Parisiens to make a surge and at least get back into the fight for the European positions.

Italian Serie A Rankings

This season I've seriously neglected the Serie A. This is somewhat for good reason because the league is shaping up to be a one-horse race. Inter have just looked far and away the best club. Whether you look at points, they have 29 of 36, or goal differential, +19 for an edge of over 1.5 goals per match, they have been impressive while other teams thought to be contenders have been inconsistent. Behind them, Juventus have gotten decent results but seem too shaky, particularly at the back, to put too much pressure on Mourinho's boys. AC Milan have recovered from their early struggles and now sit third, but other than their Champions League match in the Bernabeu they've looked a lot more like old AC Milan than the AC Milan of old. Napoli were considered by many, including me, to potentially fight for the Scudetto but they have struggled and are midtable 11 points off the lead.

It's early still, too early to declare things over but I've seen nothing in watching matches or analyzing results to suggest anything other than Inter running away with it.

Here are the rankings:


O rank - rank by goal-scoring ability
D rank - rank by goal-conceding ability
EGD - expected goal difference if all teams played a full season at the level the results thus far have shown

Other than Inter rating nearly 30 goals better than any other club, the first thing I noticed is that I have AC Milan seventh despite them being third in the table. I believe this to be due to them having had better than average luck when it comes to close-match results. In contrast, Bari are 2 goals better in goal differential but 4 points further back.

Spanish Liga Update and Rankings

Unfortunately I managed to miss last week with the Spanish league update. Last week was more interesting than this week with Barcelona held to a draw against Osasuna. All went to plan this week. I still think the league looks a lot like last season, but it's looking more open than before. Last season at this point Barcelona had 25 points, Real Madrid 23 and Sevilla 20. Currently Barcelona sit top with 26, Real Madrid have 25 and Sevilla 22. So it's a bit closer. Depor and Valencia continue to look very good as well. A team continuing to not look good is Atletico de Madrid. Atleti find themselves in the relegation zone and already eliminated from the Champions League with two matchdays to go. Their losses in the derbi and the week before in Bilbao did little to inspire confidence and bring calm to the chaos.

Here are the rankings, using my new ranking system:


O Rank - rank by goal-scoring strength
D Rank - rank by goal-conceding strength
EGD - expected goal difference if they played a full season at the level the results so far indicate

Barcelona rates about five goals better than Real Madrid when it comes to scoring and about half a goal behind Sevilla defensively. In other words the results so far indicate that they are as dominant as ever. Real Madrid to their credit are looking better than usual defensively. Looking closer to the bottom, Atletico de Madrid are looking awful with an expected goal differential under -24. These numbers indicate that Villarreal and Malaga have been running below expectation in luck since their places in the table are significantly lower than what I have here.

EPL Rankings and Update - 9 November

Chelsea picked up a gritty win at home against Manchester United to go five clear, though Arsenal still have a match in hand. Earlier today Liverpool only managed a draw at Anfield against Birmingham. I am going to concede defeat in my prediction that the big four would stay in the race through the end of the year. I was a full two months off as Liverpool are now 11 points back and out of it already. Man City also got a disappointing result with a 3-3 draw at home against Burnley, the Clarets' first points away from home this season. Villa and Spurs both picked up expected wins to continue their fight for a spot in Europe.

Here are the rankings, using my new averages model:



The Poisson rankings have a few differences. In offensive ranking, the Poisson model flips Liverpool and Chelsea. Defensively it flips Arsenal and Man City, leaving Spurs in between. Overall the Poisson model puts Man City one spot below Tottenham instead of above. Nearer the bottom of the list, the Poisson model has Burnley 15th, Blackburn 16th, Portsmouth 17th and Bolton 18th. Otherwise they are the same in terms of ordering.

New Rankings and Predictions System

I've been working on a new model which I think helps with some of the problems the Poisson model has. It is based on work done at Smart Football Rankings, an effort to develop a rankings and prediction system for college (American) football. I'll call this the averages model.

The idea is this: instead of making assumptions about the distribution of goals, let's just look at how well each team does at scoring and conceding goals compared to their opponents. There are two stages. In the first, I calculate a scoring and defensive factor by taking the difference between a team's goals for (also goals against, but I'll just talk about the scoring half here) in each match and how many goals on average were conceded by the opponent in their matches against other teams. To account for home-ground advantage, I adjust the average up or down for the match based on whether the team is at home or away. The adjustment is simply the difference between the league average goals and the average goals scored by home teams only. For example, Manchester United had given up 1 goal per match before facing Chelsea. Home teams on average score roughly 0.29 more goals per match than the league average. Chelsea scored one goal at home, so their goals-for score from their match with Manchester United is 1 - (1 + 0.29) = -0.29. For Manchester United, Chelsea had conceded 8 goals in 11 matches for an average of 0.727 goals per match. Since United were playing away and failed to score, they would get 0 - (0.727 - 0.29) = -0.437. For the first step, this is done for every team in each match. A team's scoring factor is then the average of these scores over all matches played.
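The first step can be sketched in a few lines. This is only an illustration of the per-match arithmetic using the Chelsea - Manchester United numbers from the example; the function and variable names are my own inventions for this sketch, and the 0.29 adjustment is the rounded figure quoted above.

```python
def step_one_score(goals_scored, opp_avg_conceded, home_adjustment, at_home):
    """Goals scored minus what this opponent usually concedes, shifted for venue.

    The opponent's average is raised for a home team and lowered for an away team.
    """
    shift = home_adjustment if at_home else -home_adjustment
    expected = opp_avg_conceded + shift
    return goals_scored - expected


HOME_ADJ = 0.29  # home-team average goals minus league average goals (rounded)

# Chelsea at home: United had conceded 1 goal per match, Chelsea scored 1.
chelsea_score = step_one_score(1, 1.0, HOME_ADJ, at_home=True)

# United away: Chelsea had conceded 8 goals in 11 matches, United scored 0.
united_score = step_one_score(0, 8 / 11, HOME_ADJ, at_home=False)

print(round(chelsea_score, 3), round(united_score, 3))
```

Both scores come out negative here: Chelsea scored slightly less than a home team would be expected to against United's defence, and United, having failed to score, fall short of even the lowered away expectation. A team's scoring factor would then be the plain average of these per-match scores.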

The second step adds a level. The first step compares your team's scoring to that of the average opponent of your opponent. If the teams you play have played an easy schedule themselves then your extra goals look less impressive. A way to control for this is, instead of using average goals against, to use the scores calculated from step one along with the average goals for and against for the league as a whole. Getting back to the Chelsea - Manchester United example, Manchester United's defensive score from the first step is -0.689. In other words, they've given up about 2/3 of a goal per match less on average than what their opponents have scored against other opponents. The average for the league as a whole is 1.52 goals per team per match, and for home teams it is 1.80. Chelsea scored 1 goal against United so they get 1 - (-0.689 + 1.80) = -0.111. Chelsea's defensive score from step 1 is -0.83 and away teams average 1.23 goals per match so Manchester United get 0 - (-0.83 + 1.23) = -0.4. Each team's scoring and defensive factor is the average of these for all matches played.
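The second-step arithmetic for the same match can be sketched as follows. Again this is just an illustration with names I made up for the sketch; the inputs are the step-one defensive scores and the home and away averages quoted in the example.

```python
def step_two_score(goals_scored, opp_step_one_defence, venue_avg_goals):
    """Goals scored minus (opponent's step-one defensive score + average goals at this venue)."""
    expected = opp_step_one_defence + venue_avg_goals
    return goals_scored - expected


HOME_AVG = 1.80  # average goals scored by home teams
AWAY_AVG = 1.23  # average goals scored by away teams

# Chelsea at home against United's step-one defensive score of -0.689.
chelsea_score = step_two_score(1, -0.689, HOME_AVG)

# United away against Chelsea's step-one defensive score of -0.83.
united_score = step_two_score(0, -0.83, AWAY_AVG)

print(round(chelsea_score, 3), round(united_score, 3))
```

Averaging these per-match scores over all of a team's matches gives its final scoring factor, and the same calculation on goals conceded gives the defensive factor.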

There are two reasons I like this system better than the Poisson. The first is that it has gotten better results in some testing. I've done something very similar to the PLM where I use ordered logit on the expected goals for each team according to the model and its predictions have been better. If there is interest I'll post more detailed work there but I used two different scoring systems. One just looks at squared difference between predicted probabilities and what actually happened, assigning 1 if the outcome (home win, away win, draw) happened. The other was a betting system where I looked at how much would be made by the PLM using odds given by the estimates of the averages model and the other way around. The averages model outperformed the PLM in both of these tests. Beyond just these things, I think the rankings make more sense when I look at where it rates teams.
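The squared-difference score for a single match can be sketched like this. It is one plausible reading of the description above (sum the squared gaps over the three outcomes; lower is better), not necessarily the exact form I used in testing. The probabilities are the Arsenal - Chelsea prediction from the mini preview, and the function name is made up for the sketch.

```python
def squared_error(probs, actual):
    """Sum of squared differences between predicted probabilities and the 0/1 outcome.

    probs maps each outcome label to its predicted probability; actual is the
    label of the outcome that happened (it gets 1, the others get 0).
    """
    return sum((p - (1.0 if label == actual else 0.0)) ** 2 for label, p in probs.items())


# Arsenal - Chelsea prediction from the preview; Chelsea (the away side) won.
prediction = {"home": 0.491, "away": 0.257, "draw": 0.252}
score = squared_error(prediction, "away")
print(round(score, 3))
```

A perfect prediction scores 0 and a completely wrong one scores 2, so comparing two models means comparing their totals (or averages) of this score over the same set of matches.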

The second reason I like the averages model better is that I think using sums is more accurate than using products. In the Poisson model, home-ground advantage is given by multiplying the expected goals for the home team by a number between 1.1 and 1.5 for most competitions. Similarly, the expected goals in a match for a team is their scoring factor times the opponent's defensive factor. A result of that is that high-scoring teams are more sensitive to the quality of their opponents and to playing at home than low-scoring teams. This doesn't seem to be reflected in reality. I'll write more on this later, but for better teams playing at home tends to be a bit less important. The averages model assumes it is equally important for all teams, so that's an improvement. Note that other than including home-ground advantage, a major difference between my methodology and the one used by Smart Football Rankings is that they actually use products instead of sums.
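A small sketch makes the sums-versus-products difference concrete. The base expected goals (1.0 and 2.0) and the multiplier of 1.3 are hypothetical values chosen for illustration; 1.3 simply sits inside the 1.1 to 1.5 range mentioned above, and 0.29 is the rounded additive adjustment from the averages model example.

```python
def home_boost_product(base_goals, home_factor=1.3):
    """Extra expected goals at home when home advantage is a multiplier."""
    return base_goals * home_factor - base_goals


def home_boost_sum(base_goals, home_adjustment=0.29):
    """Extra expected goals at home when home advantage is a fixed addition."""
    return (base_goals + home_adjustment) - base_goals


# Hypothetical low-scoring and high-scoring teams.
for base in (1.0, 2.0):
    print(base, round(home_boost_product(base), 2), round(home_boost_sum(base), 2))
```

With a multiplier, the high-scoring team gains twice as many goals from playing at home as the low-scoring team does; with an addition, both gain the same amount.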

For at least the next few weeks I'll include both Poisson/PLM and the averages model when giving rankings and predictions. Because I believe the averages model, and its logit, to be superior I'm using it as my main model until I work out a better one.

Friday, November 6, 2009

Other EPL Predictions

Here are the numbers for the other matches. Keep in mind that the model does not take into account injuries and only uses goals so these numbers are just a rough guide. As I've said several times, I think the model overrates Arsenal particularly as they've been running white hot when it comes to scoring goals.

Aston Villa - Bolton
Villa - 66%
Draw - 22%
Bolton - 12%

Blackburn - Portsmouth
Blackburn - 48%
Draw - 29%
Portsmouth - 23%

Man City - Burnley
Man City - 74%
Draw - 18%
Burnley - 8%

Tottenham Hotspur - Sunderland
Spurs - 63%
Draw - 22%
Sunderland - 15%

Wolverhampton - Arsenal
Wolves - 11%
Draw - 20%
Arsenal - 69%

Hull City - Stoke City
Hull - 34%
Draw - 31%
Stoke - 35%

West Ham - Everton
West Ham - 57%
Draw - 25%
Everton - 18%

Wigan - Fulham
Wigan - 38%
Draw - 30%
Fulham - 32%

Liverpool - Birmingham
Liverpool - 62%
Draw - 24%
Birmingham - 14%

Weekend Preview: Chelsea - Manchester United

The match between the top two teams in the Premiership kicks off at 16:00 local. For the Americans, it can be found at 11 AM Eastern on the Fox Soccer Channel.

History

These two teams have won the last 5 English Premier League titles. In the last six seasons both have been in the top 3 and three of those times they finished in first and second. While Arsenal and to a lesser extent Liverpool and Man City are thought to be contenders this season, the champion this year will most likely be one of these two clubs.

Looking at their head-to-head results, going further back is even more pointless than usual due to the recent influx of quality at Chelsea when Abramovich took over. I'll give it to you anyway: in league play Chelsea have beaten Manchester United 37 times, Man U have bettered Chelsea 56 times and 41 times they played to a draw. Since Abramovich took over in the summer of 2003, Chelsea have 5 wins, 4 draws and 3 losses against Manchester United in league play. At Stamford Bridge United have not won since the Abramovich takeover; Chelsea have 4 wins and 2 draws in league matches, plus an extra win and a draw if you want to include cup play. The last time United won at Stamford Bridge was the 2001-2002 season.

Form

Usually when these two clubs play it's a given that they've won 4 or 5 of their last 5 matches but that's not the case this week. Chelsea are 3-0-2 in their last 5 and Manchester United are 3-1-1. Chelsea's losses were 3-1 at Wigan and 2-1 at Aston Villa. You don't expect any team to win them all, but Chelsea fans are surely disappointed with those results. The loss to Villa isn't so bad (they'd probably feel ok with a draw there), but losing by two goals to a team that figures to be midtable at best and probably in the relegation fight is. Manchester United's loss was 2-0 at Anfield and their draw was 2-2 at home against Sunderland, where they equalized in stoppage time on an own goal.

An interesting thing is that there is a big divide between home and away form for these teams. Chelsea have won all 5 of their home matches so far this season. After edging out Hull 2-1 in their opener, they have been on fire beating Burnley then Spurs by 3 goals, Liverpool by 2 and then Blackburn 5-0. They have scored 15 goals and only conceded 1 in their 5 home games. Away from home Chelsea have 4 wins and those 2 losses mentioned above. Similarly, Manchester United are 4-1-0 at home with the only blemish that 2-2 draw versus Sunderland. On the road they are a less impressive 3-0-2 with losses to Liverpool and Burnley. We're talking about quite small samples of 5 and 6 matches, but if home and away form mean anything it points to an edge for Chelsea.

Injuries

Chelsea are relatively free of injury. Mikel and Zhirkov are expected to play but have ankle and knee injuries respectively. Bosingwa is the only player likely to be unavailable. The same cannot be said for Manchester United who have several players that either can't play or won't be fully fit. In the first group, it appears that Rio Ferdinand will not play due to a nagging calf injury. Park Ji-Sung and Hargreaves are also likely to miss out. Someone we know won't play is Gary Neville due to suspension. On the brighter side, Vidic is expected to be able to play. Fletcher still has an ankle injury but says he can play with an injection.

Model Rankings

In my rankings Chelsea are second and Manchester United third. Despite being in positions next to each other, the model actually has them a fair bit apart at nearly 22 goals of goal differential. Most of that difference is at the attacking end. Not to the same extent as Arsenal, but Chelsea seem to have been running hot at scoring so far. They are on pace for 96 goals; the model says they would score about 91 at this level because they've played a slightly easier than average schedule. Last season they only scored 68. I think that Manchester United have also been running above their scoring expectation as the model says they'll average around 76 goals playing at this level and they also only scored 68 last year. I think both will cool off and it's too early to say for sure, but Chelsea have certainly looked better in attack than Manchester United. This shouldn't be a huge surprise given that United sold their best attacking player over the summer.

On the defensive side of things, the model puts Chelsea on pace to concede 28 and Manchester United 35. Last year they both conceded 24. Manchester United have had some surprising defensive lapses this season. I've said repeatedly that over the last couple years I think they have had the best defense in football but they've not looked like it this year. Something I wonder is how much of it has to do with losing Ronaldo. Ronaldo didn't defend much, but his ability to go after the other team certainly made it more risky to attack the United goal. That doesn't cause things like Rio Ferdinand handing Man City an equalizer in the 90th minute, but I think it does play some role.

Predictions

Like before, I'm using the PLM to give the result prediction and the Poisson model to say the most likely scoreline. The models give Chelsea a surprisingly big edge, especially when you consider that they don't take injury into account. They say the Blues win just over 55% of the time, United 18% of the time with the remaining 27% being a draw. The most likely scoreline is 1-0 with a 13.5% chance. Next is 2-0 (11.9%), 1-1 (10.8%) and 2-1 (9.6%). I think the injuries become too much to overcome and Chelsea win 2-0. We'll see if I get the exact scoreline for the first time.
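For anyone curious how the scoreline part can work, here is a rough sketch. It assumes each side's goals follow an independent Poisson distribution, which may be simpler than the model behind the numbers above. The two scoring rates are my own back-solved guesses: they are chosen so that the sketch roughly reproduces the four percentages quoted in this post, and they are not the model's actual parameters.

```python
import math

def poisson_pmf(k, rate):
    """Probability of exactly k goals when goals follow a Poisson distribution."""
    return math.exp(-rate) * rate ** k / math.factorial(k)

def scoreline_probs(home_rate, away_rate, max_goals=8):
    """Probability of each (home, away) scoreline, treating the two sides as independent."""
    return {
        (h, a): poisson_pmf(h, home_rate) * poisson_pmf(a, away_rate)
        for h in range(max_goals + 1)
        for a in range(max_goals + 1)
    }

# Assumed rates, back-solved from the percentages quoted above.
# They are illustrative guesses, not the model's real parameters.
CHELSEA_RATE = 1.76
UNITED_RATE = 0.80

probs = scoreline_probs(CHELSEA_RATE, UNITED_RATE)
top_four = sorted(probs, key=probs.get, reverse=True)[:4]
for home, away in top_four:
    print(f"{home}-{away}: {probs[(home, away)]:.1%}")
```

With these assumed rates the ordering comes out as 1-0, 2-0, 1-1, 2-1, matching the list above. The win, draw and loss percentages come from the PLM, so this sketch does not try to reproduce them.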

Monday, November 2, 2009

More on Corners (Stats Series)

In my previous article I looked at corner kicks. I was surprised to find that there was little to no correlation between the difference between the number of corners for the home and away side in a match and the goal difference in the match. In fact, there was some evidence that there might even be a reverse effect because in matches where the home side won, they got fewer corners on average.

A lesson in variance

Looking into it deeper, there was a problem with some of the results in the previous article: I was not careful enough when looking at variance. I assumed that with those sample sizes the standard deviations would be pretty low so things would look statistically different. That didn't turn out to be the case for one of the results. The reason for this is that the standard deviation for corners is much higher than for goals. If you think about it, this makes sense. Some matches your favorite team will get no corners and they might get well over 10 the next time out. Goals are much tighter - 0 to 3 for most matches with the odd 5 or 6-goal performance thrown in there.

As a result of this, the table near the bottom (just above "What is going on here?") is effectively meaningless. There is no statistical difference between the corner differential when the home team wins by 2 and when they win by 1. In other words, while the averages indicate that teams get fewer corners the more goals they win by, there is too strong a chance that this just happened in the sample due to randomness, so we can't say that the relationship holds.
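To show the kind of check I mean, here is a small sketch using Welch's t statistic, which is one common way to compare two averages when the spread is large. I'm not claiming this is exactly the test behind the results above. The numbers are made up for illustration (the 4-corner spread and the 0.3-corner gap are assumptions), and they are not the data from the previous article.

```python
import math
import random
import statistics

def welch_t(sample_a, sample_b):
    """Welch's t statistic for the difference in means of two samples."""
    var_a = statistics.variance(sample_a)
    var_b = statistics.variance(sample_b)
    std_err = math.sqrt(var_a / len(sample_a) + var_b / len(sample_b))
    return (statistics.mean(sample_a) - statistics.mean(sample_b)) / std_err

random.seed(0)
# Made-up corner differentials (home minus away), NOT the article's data.
# Both groups share a large spread; their true means differ by 0.3 corners.
win_by_one = [random.gauss(1.0, 4.0) for _ in range(200)]
win_by_two = [random.gauss(0.7, 4.0) for _ in range(200)]

t_stat = welch_t(win_by_one, win_by_two)
# As a rule of thumb, |t| below about 2 means the gap could easily be noise.
print(f"t = {t_stat:.2f}")
```

The point is that the standard error in the denominator grows with the spread of the data, so a small gap in averages between two noisy groups is hard to tell apart from chance unless the samples are very large.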

Having said that, other surprising results are valid. Firstly, the home team on average gets more corners than the away team in any type of match (home win, draw, away win). Furthermore, in matches where the home team wins, the difference in corners is smaller than it is in matches where there is either a draw or the home team loses. So from this we can conclude that when the home side wins, their corner advantage over their opponent tends to be smaller. The difference is roughly three quarters of one corner. One claim that can't be verified without getting more data is that the home side gets significantly more corners compared to the away side in matches that end in a draw than those that end in an away win.

In summary, while the other results do not meet the statistical significance test, we can conclude that home teams tend to get more corners than away teams no matter the result of the match and that the difference in home and away corners is smaller in matches where the home team wins than those where the away side gets a result.

Comparing Teams

I decided to look further into it by taking a look at how different teams do when it comes to corners. I would not have found the results surprising before writing the first article. The short of it is that conventional wisdom seems to hold and good teams get more corners over a season than bad teams.

The data is from the last four seasons in the English, Spanish and Italian top flight. For each team I calculated their final tallies in wins, draws, losses, goals for, goals against, goal differential, corners for, corners against and corner differential. I also calculated the average number of corners for and against in matches where the given team won, drew or lost.

I'll start with overall correlations. Looking at goal differential and corner differential, the correlation between the two is 0.58, a strong positive correlation. In other words, teams that won more corners than their opponents over the course of the season tended to also score more goals than they conceded. The correlations for goals scored and goals conceded and their corresponding corner stats are similar. Both are right around 0.45. So we have what we would expect - teams that get more corners tend to get more goals and those giving away more corners also tend to concede more goals.
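If you want to run this sort of check yourself, the correlation here is the usual Pearson coefficient. Below is a minimal sketch. The five teams are hypothetical season totals I invented for illustration, not rows from my data set.

```python
import math

def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (spread_x * spread_y)

# Hypothetical season totals for five teams, for illustration only.
corner_diff = [60, 25, 0, -30, -55]
goal_diff = [35, 5, 10, -20, -30]

print(round(pearson(corner_diff, goal_diff), 2))
```

A value near 1 means the two differentials rise and fall together, a value near 0 means no linear relationship, and a value near -1 means they move in opposite directions.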

Here's a scatter plot with corner differential and goal differential along with the linear regression line.



As you can see, there is definitely a positive relationship. It's not as strong as goal differential and points, but it's certainly there. There are two decent outliers. The one in the upper left is Real Madrid two seasons ago. The Madridistas won the Spanish league and had the league's best goal differential with 48 more goals scored than allowed. Despite that, they won 164 corners and gave up 237. The one on the bottom is Derby County from that same season. The Rams finished with an "impressive" record of 1 win, 8 draws and 29 losses. They scored 20 goals and conceded 89 for a goal differential of -69. Despite that, they "only" allowed 79 more corners than they got. For comparison, Manchester City that season gave up 88 more corners than they got and had a goal differential of -8.

Looking at the graph, it seems to curve up toward the end as far more observations are above the regression line than below. To improve the fit, I ran a regression including a term that is the square of corner differential and got a much better fit as you can see:



Using these results, it depends on where a team is, but an extra corner is worth about an extra quarter of a goal. For better teams it's even more valuable. This is because teams that score more goals get fewer corners per goal. Here's a plot of that:



I find this interesting. It's far from perfect, but corners are a decent representation of attacking chances. Thought of in that way, I would argue that this relationship suggests that teams that score a lot of goals do so not only because they get more attacking opportunities, but that they also convert a higher percentage of those chances. That's not too surprising, strikers certainly get paid to both create and convert on goal-scoring opportunities. I think it's interesting though that the data supports the idea that good attacking teams are more efficient at taking advantage of chances and that it's not simply getting more that leads to more goals.
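Going back to the regression with the squared term, here is a sketch of how such a fit can be done and why an extra corner is worth more to a better team. It fits goal differential as a + b*x + c*x^2, where x is corner differential, so the value of one more corner is the slope b + 2*c*x. The data points are invented: I generated them from a curve with b = 0.25 (echoing the quarter of a goal above) and a made-up c = 0.001. These are not my fitted coefficients.

```python
def fit_quadratic(xs, ys):
    """Least-squares fit of y = a + b*x + c*x**2; returns (a, b, c)."""
    # Power sums that make up the normal equations.
    s = [sum(x ** k for x in xs) for k in range(5)]
    t = [sum(y * x ** k for x, y in zip(xs, ys)) for k in range(3)]
    # Augmented 3x3 system: row i is [s_i, s_(i+1), s_(i+2) | t_i].
    rows = [[s[i + j] for j in range(3)] + [t[i]] for i in range(3)]
    # Gauss-Jordan elimination with partial pivoting.
    for col in range(3):
        pivot = max(range(col, 3), key=lambda r: abs(rows[r][col]))
        rows[col], rows[pivot] = rows[pivot], rows[col]
        for r in range(3):
            if r != col:
                factor = rows[r][col] / rows[col][col]
                rows[r] = [v - factor * p for v, p in zip(rows[r], rows[col])]
    return tuple(rows[i][3] / rows[i][i] for i in range(3))

def marginal_value(b, c, corner_diff):
    """Slope of the fitted curve: extra goal difference per extra corner."""
    return b + 2 * c * corner_diff

# Invented data lying exactly on a curve with assumed coefficients.
corner_diffs = list(range(-80, 101, 20))
goal_diffs = [0.25 * x + 0.001 * x ** 2 for x in corner_diffs]

a, b, c = fit_quadratic(corner_diffs, goal_diffs)
print("value of a corner for an average team:", round(marginal_value(b, c, 0), 2))
print("value of a corner at +100 corners:", round(marginal_value(b, c, 100), 2))
```

With a positive squared term the slope keeps rising as corner differential rises, which is the "even more valuable for better teams" effect.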

What about defense? I won't post the scatter plot, but it is essentially the same for goals conceded; teams that concede a lot of goals give up more goals per corner conceded. I would argue that this suggests that teams that are bad defensively not only allow more chances, but they also allow chances that are better on average.

Again, corners aren't a perfect representation of scoring chances. Something like "times with the ball in the attacking third" would be better but it isn't recorded. Sometimes "scoring chances" is given on air during a match, but as far as I know it's never listed as a stat. A problem with scoring chances in general is that it's subjective. The use of the term "half chance" is common and one guy's chance is another guy's half chance and vice-versa. In my view, corner kicks are the best objective method available to measure this.

Viewed thusly, the stats suggest a nice synergy between defense, midfield and attacking players. Team strength is usually pretty similar in all areas. If you'll forgive me for simplifying, midfielders are responsible for both creating attacking opportunities for the team and preventing them for their opponents. Forwards are responsible for converting those chances and defenders for keeping their opponents from doing the same. Good teams tend to have midfielders that create a lot of opportunities for their forwards. These chances will tend to be better than those created by worse teams as well and as a double whammy the forwards on these good teams are better at putting them away. Similarly, good defensive teams have strong midfielders that don't allow a lot of opportunities to score and the defenders take care of business by allowing just a small percentage of these opportunities to be put in.

What about the previous article?

The previous article suggested that there was little to no relationship between the scoreline of a match and the number of corners for each team. This article suggests that good teams get more corners than bad teams. How can that be so? I think the reason gets back to variance. In a single match, anything can happen. That's true for results, I don't need to list big upsets. For corners the variance is even larger. So from one match we can't really conclude much of anything from corner kicks, but over a season there is enough time for things to even out.
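Here's a quick simulation sketch of that evening out. It assumes a team has a fixed true corner edge per match with a lot of match-to-match noise, and that matches are independent. Both numbers are made up. Under those assumptions the noise in a 38-match season average should shrink by a factor of about the square root of 38, or roughly 6.

```python
import random
import statistics

random.seed(1)

MATCHES_PER_SEASON = 38
TRUE_EDGE = 1.0    # assumed average corner advantage per match (made up)
MATCH_NOISE = 4.0  # assumed match-to-match standard deviation (made up)

def simulate_match():
    """Corner differential in one match: the true edge plus random noise."""
    return random.gauss(TRUE_EDGE, MATCH_NOISE)

def simulate_season_average():
    """Average corner differential per match over one simulated season."""
    return statistics.mean(simulate_match() for _ in range(MATCHES_PER_SEASON))

single_matches = [simulate_match() for _ in range(2000)]
season_averages = [simulate_season_average() for _ in range(2000)]

print("spread across single matches:", round(statistics.stdev(single_matches), 2))
print("spread across season averages:", round(statistics.stdev(season_averages), 2))
```

A one-corner edge is invisible next to the noise in any single match, but it stands out clearly once the per-match spread has been averaged down over a season.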

As far as home losses leading to more home corners compared to away corners than other results, my best guess is that it's selection. Home wins and draws are going to have a lot more cases where an inferior team is ahead or tied and playing 11 men behind the ball against a superior opponent. That situation probably leads to more corners than any other. As long as the better team keeps getting unlucky they're likely to tally a lot of them, making the corner difference very small. That is the only explanation I have come up with; I'd love to hear your idea if you have another. Please leave a comment.

Conclusion and Future Work

In my previous work on goal differential, I made the case that conventional wisdom is wrong - there is no evidence that performance in close matches is itself a skill apart from the ability to score and prevent goals and some evidence that it all comes down to luck. In this case though, using full-season data for each team I'm arguing for the common view that good teams are not only better at scoring and defending but more efficient in doing so as they convert a higher percentage of their opportunities and concede on a lower percentage of opportunities they allow their opponents to have.

In the future I may try to use this idea of corners as a proxy for chances to assess goalkeepers and forwards. It's one data point, but I think the Real Madrid outlier above is evidence for the fantastic play of Iker Casillas. I certainly don't think it's the be-all-end-all of stats but corners/goal scored or conceded serve as some measure of how well a team's forwards or defenders and goalkeeper played.

Premier League Rankings - 2 November

I'm going to go quite short this week and mainly just post the rankings. I'm working on a follow-up article on corners and I want to focus on that.



Not a lot of changes this week as the outcomes weren't far off from the expectations of the model. Liverpool dropped a spot while Fulham moved up 2 after Fulham beat them 3-1 at Craven Cottage. Liverpool are certainly looking like the favorite to throw off my prediction that the big four would all be in the race at the new-year break. At the other end of the table, the biggest mover was Portsmouth moving up 5 spots and 15 expected goals in goal difference due to stuffing Wigan 4-0. The (wait, do Portsmouth have a nickname?) are now just 3 points from getting out of the relegation zone and that battle looks like it could be good this year. With Portsmouth's attack moving out of the cellar, Hull have a stranglehold on the bottom position in the rankings as they rate the worst both scoring and defending. It's pretty telling that Burnley moved down in the rankings after beating them 2-0. In fairness, that's due to other results since Burnley's expected goal differential actually went up slightly.