Monday, August 31, 2009

World Cup Qualifying - CONMEBOL

How It Works

All 10 teams play in a league format. Each team plays every other team twice, once in each country. The top four teams automatically advance to the finals in South Africa; the fifth-place team has to win a playoff against the fourth-place team from CONCACAF.


Where the Group Is Now




Brazil are sitting pretty at the top, 7 points clear of the playoff spot. Chile are doing quite well just one point behind. Paraguay are also in great shape, four points clear of fifth. Argentina do not look as good as usual and sit just two points ahead of fifth-place Ecuador going into their match with Brazil this weekend. At the bottom, Bolivia and Peru are effectively out of it while Colombia and Venezuela both have a shot but are running out of time with only four matches to go. Uruguay should challenge Ecuador for that fifth spot and could even catch Argentina.

Peru and Bolivia have been significantly worse this qualifying campaign than the bottom teams have historically been - with the exception of Venezuela before France '98 with just 3 points. Four years ago at this point both were the bottom two teams as they are now, but in the other order. Bolivia had 13 points and Peru 14. As a result, it looks like more points will be required this year to make the finals. Four years ago at this point, there was a 3-way tie for fifth. Colombia, Chile and Uruguay all had 17 points. Right now fifth is a full 3 points higher. Argentina haven't looked nearly as good as usual and Chile have given us a much better showing than they historically have. Otherwise there's nothing too strange going on.


Poisson Predictions


I put the qualifying results into the Poisson model. I also ran a version where I included the matches from Copa America, which took place in 2007. It didn't change things much. Putting those results in made Ecuador look a bit better and Uruguay a bit worse. In qualification chances, it bumped Ecuador up about 7%, decreased Uruguay 6.2% and also reduced Colombia's chances 0.7%. Other teams changed only slightly.

Here is the chart with just the qualification results:



The format is similar to what I used in the CONCACAF article. The first numbered column gives the percentage chance of finishing in the top 4, which automatically qualifies you for South Africa 2010. The second column gives the chance of finishing in fifth place, which puts you in a playoff against the 4th-best team from CONCACAF. The out column gives the chance of finishing 6th or worse - this eliminates you from the competition. The last column gives the chance of qualification if a potential playoff with the CONCACAF team would be a 50/50 proposition. As I said in the earlier article, I think this underestimates the chances a bit because whichever team finishes fifth here is probably a small favorite over the CONCACAF team finishing fourth.
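To make the last column concrete, here is a minimal sketch of the arithmetic: the overall chance is the top-4 chance plus the fifth-place chance times the chance of winning the playoff. The function name and the inputs are illustrative assumptions. The 78% top-4 figure matches the Argentina win scenario discussed later in this post, the 20% chance of fifth is inferred from the 78% and 88% figures there rather than quoted directly, and the 75% playoff chance is an assumed value that happens to reproduce the 93% figure.

```python
def overall_qualification_chance(p_top4, p_fifth, p_playoff_win=0.5):
    """Chance of reaching the finals: automatic spot, or fifth place plus a playoff win."""
    return p_top4 + p_fifth * p_playoff_win


# Illustrative inputs (see the lead-in): 78% top four, 20% fifth.
even = overall_qualification_chance(0.78, 0.20)
# Assumed 75% playoff chance for a team that would be a solid favorite.
favored = overall_qualification_chance(0.78, 0.20, p_playoff_win=0.75)

print(f"{even:.2f} {favored:.2f}")  # prints "0.88 0.93"
```

Raising the playoff probability above 0.5 only matters for teams with a real chance of finishing fifth, which is why the 50/50 assumption barely moves the numbers for the teams at the top or bottom.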

As you can see, Brazil, Chile and Paraguay are effectively in and Bolivia and Peru are effectively out. In 100,000 simulations, Brazil finished in 5th 8 times and the other 99,992 times they were in the top four. I'd say they're pretty safe. Similarly, Peru didn't finish in the top 5 for even one of the 100,000 simulations.
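For readers curious what a simulation like this can look like, here is a rough sketch, not the actual model behind the chart. Everything in it is an assumption made for illustration: the six team names, their points, the remaining fixtures, the per-match goal rates, and the random tie-break on points (the real competition uses goal difference and other criteria).

```python
import math
import random
from collections import Counter


def sample_poisson(rate, rng):
    """Draw one Poisson-distributed goal count (Knuth's multiplication method)."""
    limit = math.exp(-rate)
    goals, product = 0, rng.random()
    while product > limit:
        goals += 1
        product *= rng.random()
    return goals


def simulate_finishes(points, fixtures, n_sims=10_000, seed=0):
    """Estimate how often each team finishes top four, fifth, or lower.

    points   -- current points per team
    fixtures -- (home, away, home_goal_rate, away_goal_rate) per remaining match
    """
    rng = random.Random(seed)
    tallies = {team: Counter() for team in points}
    for _ in range(n_sims):
        table = dict(points)
        for home, away, home_rate, away_rate in fixtures:
            home_goals = sample_poisson(home_rate, rng)
            away_goals = sample_poisson(away_rate, rng)
            if home_goals > away_goals:
                table[home] += 3
            elif home_goals < away_goals:
                table[away] += 3
            else:
                table[home] += 1
                table[away] += 1
        # Assumption: ties on points are broken at random, not by goal difference.
        ranking = sorted(table, key=lambda t: (table[t], rng.random()), reverse=True)
        for place, team in enumerate(ranking, start=1):
            outcome = "top4" if place <= 4 else "fifth" if place == 5 else "out"
            tallies[team][outcome] += 1
    return {
        team: {key: counts[key] / n_sims for key in ("top4", "fifth", "out")}
        for team, counts in tallies.items()
    }


# Hypothetical table and fixtures, purely for illustration.
points = {"A": 27, "B": 26, "C": 24, "D": 22, "E": 20, "F": 18}
fixtures = [
    ("D", "A", 1.2, 1.6),
    ("E", "F", 1.5, 1.0),
    ("B", "C", 1.4, 1.1),
    ("F", "D", 1.3, 1.3),
    ("A", "E", 1.8, 0.9),
]
for team, probs in simulate_finishes(points, fixtures, n_sims=2000, seed=1).items():
    print(team, probs)
```

With only a few thousand runs the rare outcomes are noisy, which is one reason to run something closer to 100,000 simulations when events like "Brazil finish fifth" are of interest.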

Things are much more interesting in the middle. Argentina are still favored to get in, but they're certainly no lock. Of the four other contenders, Uruguay rates the best. This is likely due to their schedule - they play at Peru this weekend and face rivals Colombia and Argentina at home. Their toughest test is a trip to Ecuador. Meanwhile, Argentina have Brazil left on their schedule and Ecuador wrap things up with a trip to Chile. That likely means that the model will underrate Ecuador's chances because most likely Chile will have nothing to play for in that match and a win for Ecuador wouldn't be unlikely in the event that they need one, even though they are the worse team and playing away.

Saturday's Matches


Argentina - Brazil


Importance for Argentina:
A win would be a huge boost to their chances, and even getting a single point would help. With a win, Argentina go from the 53% chance they currently have to automatically qualify to 78%. Their overall chances of making it to South Africa go up to 88% if you assume a playoff would be 50/50, but it's probably more like 93% since they'd be a solid favorite in a playoff. So a win gives them about a 10% better chance of making it. A draw leaves their numbers pretty much unchanged. A loss would be bad though, and the likelihood that they finish in the top 4 would drop 14 percentage points to 39%. They would lose about 10% for their overall qualification chances.

Importance for Brazil:
Effectively none. They're in either way.

Colombia - Ecuador

Importance for Colombia:
Colombia's chances are quite slim either way. A win makes them about twice as likely to finish in fifth, which about doubles their chances of qualifying to a little over 4%. If they lose they are effectively out. A draw roughly cuts their chances in half.

Importance for Ecuador:
This match for Ecuador is pretty important. A loss reduces their chances of finishing in the top 4 by about 5.5% and their overall chances by about 10%. A draw keeps them pretty much where they are; they only lose a couple of percentage points. A win would be a huge boost. If they win then their chances of a top 4 finish roughly triple to about 21%. Overall they would be about twice as likely to qualify, going from 19% to 38%.

Peru - Uruguay

Importance for Peru:
None. They're out.

Importance for Uruguay:
This is close to a must-win for Uruguay. With a win they are in pretty good shape - probably in fifth place, possibly just a point behind Argentina in fourth. Their chances of finishing in the top 4 go up 10 points to 49% and they'd become about 2:1 favorites overall to qualify with a win. A draw would reduce their chances to 27% for the top 4 and about 44% overall. A loss would be devastating and they would drop to just a 15% shot at going automatically and 32% overall.

Paraguay - Bolivia
Chile - Venezuela

Both of these matches feature a team that is virtually certain to make it and one that is just as likely to finish outside the top 5. Chile and Paraguay would be pretty much unaffected even by a loss. If Venezuela pull the upset then they would go from practically no shot to having a small chance - about 8%.

La Liga "Preview"

I had intended to start this blog a bit sooner and include a preview of the Spanish league. Unfortunately, it started without me. Since Barcelona and Sporting Gijon don't play until tonight, I'll still call it a preview.

On Looking Backward with the Poisson Model

Something I'm doing in this article and will likely do in the future is applying the Poisson model to past results to analyze the probability of various events having happened in the past. This is a bit weird because in some sense the probability of what actually happened is 1 and the probability of everything that didn't happen is 0. What I'm doing is taking the results (when using the full season, this essentially boils down to the goals for and against for each team) and asking "If this league were run millions of times in a parallel universe where each team plays at the level reflected in their number of goals scored and conceded, how often would they finish this way?". Put another way, I'm looking at how often a team scoring X goals in a season and allowing Y goals would finish in a certain spot in a league where the other teams have their given number of goals scored and allowed.

This is related to what is called Pythagorean Expectation in baseball and other sports. The idea there is to derive a formula for win percentage based on the number of runs a team scores and allows. In those sports, this stat is used to analyze how lucky a team was. The argument is that in baseball, while luck plays a part, scoring runs and preventing runs from being scored are largely based on skill - at least if you have a sufficiently large number of games played. On the other hand, scoring them at convenient times is much more based on luck than skill. So if a team tends to win a lot of blowouts and lose a lot of close games then this is unlucky and they will have a record that is worse than it should be given their skill level. There is ample evidence in baseball, and I believe other sports, that run differential (runs scored minus runs allowed, think of it like goal difference) is a better indicator than record of how a team will do the next season. This is yet another thing I want to examine in the future, but I suspect it will be similar in football.
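For reference, the classic baseball version of the formula estimates win fraction as runs scored squared divided by the sum of runs scored squared and runs allowed squared. A small sketch follows; the function name and the 800/700 run totals are made up for illustration, and since the formula has no notion of draws it should not be read as a ready-made football statistic.

```python
def pythagorean_expectation(scored, allowed, exponent=2.0):
    """Expected win fraction from runs scored and runs allowed.

    exponent=2 is the classic baseball choice; other sports and later
    refinements use different exponents.
    """
    s = scored ** exponent
    a = allowed ** exponent
    return s / (s + a)


# Hypothetical baseball team: 800 runs scored, 700 allowed.
win_fraction = pythagorean_expectation(800, 700)
print(f"expected win fraction: {win_fraction:.3f}")  # about 0.566
```

A team whose actual win fraction is well below this number would be the "unlucky" kind described above: plenty of blowout wins and close losses.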

I ran the 2008-2009 results through the Poisson model and simulated 100,000 seasons. The Poisson Expected Points is the average number of points the team got in those simulated seasons. Again, you can think of this as how many league points they would get in an average season where all teams play at the level reflected in the goals scored and allowed by each team last year. Along with Poisson Expected Points, I also recorded how often they finished in each league position at the end of the simulated seasons.
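As a rough illustration of where expected points come from, the sketch below computes them for a single match directly from two goal rates, by summing Poisson probabilities over scorelines instead of simulating. This is an analytic stand-in, not the simulation described above, and the goal rates (1.8 for the home side, 1.0 for the visitors) are invented for the example. It also assumes the two teams' goal counts are independent.

```python
import math


def poisson_pmf(k, rate):
    """Probability of exactly k goals when goals are Poisson with the given rate."""
    return math.exp(-rate) * rate ** k / math.factorial(k)


def match_outcome_probs(home_rate, away_rate, max_goals=15):
    """Return P(home win), P(draw), P(away win) for independent Poisson goal counts.

    Scorelines above max_goals are ignored; for typical football rates the
    probability lost this way is negligible.
    """
    home_win = draw = away_win = 0.0
    for h in range(max_goals + 1):
        for a in range(max_goals + 1):
            p = poisson_pmf(h, home_rate) * poisson_pmf(a, away_rate)
            if h > a:
                home_win += p
            elif h == a:
                draw += p
            else:
                away_win += p
    return home_win, draw, away_win


def expected_points(home_rate, away_rate):
    """Expected league points (3 for a win, 1 for a draw) for home and away sides."""
    home_win, draw, away_win = match_outcome_probs(home_rate, away_rate)
    return 3 * home_win + draw, 3 * away_win + draw


# Hypothetical goal rates for one match.
home_pts, away_pts = expected_points(1.8, 1.0)
print(f"home: {home_pts:.2f}  away: {away_pts:.2f}")
```

Summing these per-match values over a team's 38 fixtures gives a season total of the same kind as Poisson Expected Points, though the finishing-position frequencies still need simulation.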

A Brief Review of 2008-2009

2008-2009 was the year of FC Barcelona. They won the 'triple', taking down the Liga, the Copa del Rey and of course the Champions League. The league they won in convincing fashion, having effectively won it in the 34th matchday by destroying 2nd-place Real Madrid 2-6 in the Bernabeu. They went on to finish 9 points clear. Though the champion was obvious, there were some interesting battles the last few weeks of the season. In the second-to-last week of the season, Sevilla secured third place, guaranteeing them a spot in the group stage of the Champions League. Fourth place, and a spot in the Champions League playoff round, was in doubt to the end. It was secured by Atletico de Madrid with their 3-0 win over Almeria. This win dropped Villarreal into the UEFA Cup. At the bottom of the table, Huelva and Numancia were relegated with a couple of matches to go. The battle for what the Spanish call permanence was fought by Gijon, Osasuna, Valladolid, Getafe and Betis. In the end Betis came up on the short end, only managing a 1-1 draw at home against Valladolid the last week of the season.

Here's a preview of the top six, ordered by their finish last year.

Fútbol Club Barcelona

2008-2009:
Record: 27-6-5, 87 points (1st)
Home Record: 14-3-2, 45 points (1st)
Away Record: 13-3-3, 42 points (1st)
Goals Scored: 105 (1st)
Goals Conceded: 35 (1st)
Poisson Expected Points: 90.2 (1st)
Poisson probability of winning league: 80%

Players Out:
Eto'o - 34 starts, 2 sub appearances, 85 minutes per game played, 30 goals
Gudjohnsen - 11 starts, 13 sub apps, 46 minutes per game played, 3 goals
Sylvinho - 10 starts, 5 sub apps, 62 minutes per game played
Hleb - 8 starts, 11 sub apps, 41 minutes per game played
Víctor Sánchez - 3 starts, 4 sub apps, 38 minutes per game played

Players In:
Zlatan Ibrahimovic - F Inter Milan
Dmytro Chygrynskiy - CB Shakhtar Donetsk
Maxwell - LB Inter Milan

Barcelona made one high profile change at the top, swapping out Eto'o for Ibrahimovic. Otherwise they lost a few players that came on regularly as subs and added Chygrynskiy and Maxwell to supplement their back line. Maxwell is a solid option at left back. Chygrynskiy, at 22, is quite capable of contributing now and has the potential to become an elite center back in a few years. Also important given their style of play, Chygrynskiy is very good at passing the ball for his position.

The key is that Barcelona should be very similar this season to what they were last year, a scary proposition for the other 19 clubs in the Primera and the other 31 teams in the Champions League for that matter.

They'll likely start like this when Iniesta is back in full health:



No real surprises there.

My Prediction: Despite the action at Real Madrid, the blaugrana should still be the best club in the Liga and remain favorites to finish top.

Real Madrid Club de Fútbol

2008-2009:
Record: 25-3-10, 78 points (2nd)
Home Record: 14-2-3, 44 points (2nd)
Away Record: 11-1-7, 34 points (3rd)
Goals Scored: 83 (2nd)
Goals Conceded: 52 (7th)
Poisson Expected Points: 70.9 (2nd)
Poisson probability of winning the league: 10%

Players Out:
Cannavaro - 29 starts, 88 minutes per match played
Robben - 25 starts, 4 sub appearances, 78 mpm, 7 goals
Heinze - 24 starts, 90 mpm, 2 goals
Sneijder - 18 starts, 4 sub apps, 64 mpm, 2 goals
Huntelaar - 13 starts, 7 sub apps, 59 mpm, 8 goals
Michel Salgado - 6 starts, 3 sub apps, 76 mpm
Javi Garcia - 3 starts, 12 sub apps, 35 mpm
Saviola - 1 start, 7 sub apps, 26 mpm, 1 goal
Parejo - 5 sub apps, 18 mpm
Bueno - 3 sub apps, 12 mpm

Players In:
Cristiano Ronaldo - AMF/W/F Manchester United
Kaká - AMF/W/F AC Milan
Benzema - F Olympique Lyonnais
Xabi Alonso - MF Liverpool
Raul Albiol - D Valencia
Ezequiel Garay - D Racing de Santander (returning from loan)
Esteban Granero - MF Getafe
Alvaro Arbeloa - D Liverpool

Needless to say, it was a pretty crazy summer in Chamartin. Florentino Perez and his big-spending ways are once again in charge and they have entered the second Galactico era. At the start, I was quite skeptical because in my view Real Madrid last season did not have a problem scoring goals. While Barcelona scored 22 goals more than they did, 83 goals is still a big number and strong enough to compete for the league title. Where they had problems was at the defensive end, where they were only 7th best in goals conceded. I will write more about whether scoring or strong defense is more important, but I think it's clear that they have to be better defensively this season than they were last year in order to have a reasonable chance of sitting on Cibeles's lap next spring.

Defensively, they got a lot better when Lassana Diarra arrived during the winter break. Looking just at league results, they conceded better than a goal per game less in the 22 matches after his arrival than the 16 matches before and got 10 clean sheets in those 22 games compared to just 3 in the 16 before his arrival. If anything, this understates things because their last three matches weren't important after their loss to Barcelona all but guaranteed that the league title would be out of reach. They allowed 8 goals in these three matches. They were particularly strong defensively when they played with two defensive-minded central midfielders in "Lass" and Fernando Gago. In the 12 matches that both started, they allowed just 11 goals - 6 of those in a 2-6 loss against Barcelona! Even including that, their goals allowed per game was just 0.92, and they got 7 clean sheets in the 12 matches. When those two didn't start, they allowed 1.58 goals per game, with 2.13 per game conceded if you look only at the matches from the start of the season until Lass made his first appearance.

With that in mind, I thought in the early days of the summer that the new buys would seriously hurt their chances because they would not be able to get all of that galactico power out there without losing a defensive midfielder. Bringing in Xabi Alonso changed all that for me. Lass has come through and now seems popular enough that Pellegrini won't get much pressure to take him off in favor of an attacking player (especially after his game-winning goal in the season opener), and Xabi Alonso is a superstar in his own right and he won't likely be relegated to the bench any time soon. They now have the freedom to play a winning style - something they could not do in the first Galactico era once Perez ran Makelele out of town. Given that and the improvement defensively that they should get from adding Albiol, Arbeloa and even Garay, I expect Real Madrid to be more competitive for the league title than they were a season ago.

Should they win it then you can expect Ronaldo and Kaka to get all the attention, but for my money Xabi Alonso was their most important pickup and it will be the improved defense that is the difference between going to Cibeles and begrudgingly admitting that 'el Barcelona ha sido mejor'.

Here is how I see them lining up:



A big question mark is what they should do with Raul. In my somewhat biased opinion, he does not deserve a spot in the starting lineup. I think they have a couple of better options, most notably Higuain. Because of his status in the dressing room, Raul makes the usual controversies that happen at a club full of stars, some of whom have to sit, even worse. If you ask me, the sooner they rip that band-aid off and go through the blow-up from him not being a regular starter, the better off they'll be.

My prediction: I think Real Madrid will mount a better challenge than last season, when they made a strong run late but never seemed likely to dig out of the early deficit. If they start out strong and things don't blow up quickly on them, they definitely have a shot at winning the Primera. Even if things do blow up, I don't expect them to finish outside the top 3.

Sevilla Fútbol Club

2008-2009:
Record: 21-7-10, 70 points (3rd)
Home Record: 11-2-6, 35 points (8th)
Away Record: 10-5-4, 35 points (2nd)
Goals For: 54 (7th)
Goals Against: 39 (2nd)
Poisson Expected Points: 62.7 (4th)
Poisson probability of winning league: 3%

Players Out:
Mosquera - 16 starts, 1 sub appearance, 88 minutes per appearance
Maresca - 14 starts, 7 sub apps, 58 minutes per, 2 goals
David Prieto - 13 starts, 87 minutes per match

Players In:
Negredo - F Almeria via Real Madrid
Sergio Sanchez - D Racing de Santander via Espanyol
Zokora - DMF Tottenham Hotspur

They weren't busy in Nervion this summer. They sold off a few role players and bought Zokora as well as a couple of young players who will likely be subs for now but have the potential to become great players in the future. Given that it's a young team, that's not necessarily a bad thing, but unfortunately it's tough to imagine Sevilla competing with the two giants for the league.

Looking at the results from last year there are two things that strike me as odd. Firstly, Sevilla finished with the same number of points in home and away matches. Without digging up historical records, my feeling is that Sevilla historically have done significantly better at home than on the road, relative to the average team. They had a quite long streak of European matches without suffering a defeat. Fans of both teams in the city are famously passionate - in recent times, the national team has played in Seville more often by far than anywhere else. Weather is often a factor as it is significantly hotter there than most of the rest of the country at the beginning and end of the season. Nonetheless, Sevilla were only the 8th best team when playing at home. Oddly enough, Betis, the other team in Sevilla, were the only team in the league that got fewer points at home than away. Sevilla were very good away, finishing second-best overall in that category.

The other thing is that Sevilla did not score a lot of goals in the 2008-2009 campaign. The squad was very similar to the year before. Then they scored 75 goals, some 21 more than last season. They also allowed 10 more goals that year, 49 compared to just 39 last year, so a possible explanation is a more defensive style. Perhaps this was due to Manolo Jimenez taking charge in full for 2008-2009. He started when Juande Ramos left the club in October of 2007. While he was on for most of the 2007-2008 season, maybe being able to fully set up his system and how he wanted the guys to play during preseason made a difference. From watching them, it's not obvious to me that they played a more defensive style last year than the year before, but the results certainly indicate something to that effect. This sort of thing is a reason that I think it's important to look at objective statistics - watching the matches often doesn't tell the whole story, particularly when you have a rooting interest.

Having said that, I anticipate them scoring more goals this year. They have a lot of young, strong, attacking talent. Capel and Jesus Navas should continue to get better. Romaric looks a lot better physically than he did at this point a year ago and I expect him to have a better year and be able to contribute more at both ends. Kanoute doesn't look like he's slowing down any and Negredo should fill in nicely if and when he is gone to play the African Cup of Nations.

Here's a likely lineup:



Sevilla have a lot of depth and the above is just an example. Other than up top and in goal, every position will see a lot of players rotating through.

Prediction: Sevilla will fight for third, bowing out of the title race all too early. It will be pretty similar to last year and I expect them to finish third, potentially solidifying themselves as the third club as Valencia did in the early part of this decade. Having said that, fourth place and the Champions League playoff would be a fine result, and there is a decent chance that they'll fall into a Europa League spot. Anything less would be an unlikely disaster.

Club Atlético de Madrid

2008-2009:
Record: 20-7-11, 67 points (4th)
Home Record: 13-1-5, 40 points (Tied for 3rd)
Away Record: 7-6-6, 27 points (4th)
Goals Scored: 80 (3rd)
Goals Conceded: 57 (Tied for 11th)
Poisson Expected Points: 66.5 (3rd)
Poisson Probability of Winning League: 4%

Players Out:
Leo Franco - GK 32 starts, allowed 48 goals
Seitaridis - 10 starts, 4 sub appearances, 75 minutes per match played
Luis Garcia - 5 starts, 14 sub apps, 30 minutes per match played
Coupet - GK 6 starts, allowed 11 goals

Players In:
Asenjo - GK Valladolid
Juanito - CB Betis
Jurado - AM Mallorca (returning from loan)
Reyes - AM Benfica (returning from loan)
Valera - D Racing (returning from loan)

As you can see, not much action from Atleti this summer. Last season they played an entertaining style of football and we can expect more of that this year. Last year they were an elite attacking team and thoroughly mediocre defensively. I can see no major reason to believe that will change this time around. There are rumors that Heitinga may be making a last-minute move to Everton, which would weaken them further at the back for this season.

Likely lineup:


Prediction: I think Atleti will have enough defensive problems to keep them well out of the title hunt. Expect them to fight with Sevilla, Valencia, and Villarreal for the two Champions League spots. If you pinned me down and forced me to guess which place they finish in I'd say 5th, putting them in the Europa League.

Villarreal Club de Fútbol

2008-2009:
Record: 18-11-9, 65 points (5th)
Home Record: 12-3-4, 39 points (5th)
Away Record: 6-8-5, 26 points (5th)
Goals Scored: 61 (5th)
Goals Conceded: 54 (Tied for 8th)
Poisson Expected Points: 57.5 (6th)
Poisson Probability of Winning League: 1%

Players Out:
Nihat - 8 starts, 11 sub appearances, 43 minutes per match played
Mati Fernandez - 5 starts, 16 sub appearances, 33 minutes per, 3 goals
Guillermo Franco - 5 starts, 13 sub apps, 38 minutes per match
Cygan - 4 starts, 4 sub apps, 77 mins per match
Altidore - 2 starts, 4 sub apps, 46 minutes per match, 1 goal

Players In:
Nilmar - F Internacional (Brazil)
David Fuster - MF Elche
Marcano - D Racing de Santander
Pereira - F Racing de Santander (returning from loan)

Another team that didn't make any huge moves over the summer, Villarreal should be similar to last year; solid, though unspectacular at both ends of the pitch. I am surprised that Cazorla, Capdevila and Senna didn't get more attention in the summer from the bigger clubs as all three are class players.

Projected Lineup:



As usual, expect a lot of rotating. I'm sure Llorente will get a lot of play up top, for example.

Prediction: I'm going boring. I think Villarreal will finish this season in a Europa League spot. I can't see them finishing second as they did two years ago and it's less likely that they fall out of the European spots. They'll certainly put up a fight for third or fourth but I don't think they are as good as the three obvious rivals and in the end will fall out of the Champions League spots.

Valencia Club de Fútbol

2008-2009:
Record: 18-8-12, 62 points (6th)
Home Record: 12-4-3, 40 points (Tied for 3rd)
Away Record: 6-4-9, 22 points (Tied for 8th)
Goals Scored: 68 (4th)
Goals Conceded: 54 (Tied for 8th)
Poisson Expected Points: 61.78 (5th)
Poisson Probability of League Title: 2%

Players Out:
Albiol - 33 starts, 1 sub appearance, 90 minutes per match, 2 goals
Moretti - 25 starts, 3 sub apps, 80 minutes per match
Renan - GK, 19 matches, 25 goals allowed
Edu - 9 starts, 12 sub apps, 44 minutes per match, 1 goal
Vicente - 6 starts, 21 sub apps, 34 minutes per match, 6 goals
Morientes - 7 starts, 13 sub apps, 44 minutes per match, 1 goal
Angulo - 5 starts, 6 sub apps, 51 minutes per match
Carleto - 1 sub app, played 81 minutes

Players In:
Moya - GK Mallorca
Mathieu - D Toulouse
Dealbert - D Castellon
Bruno - D Almeria
Miku - F Salamanca (returning from loan)
Zigic - F Racing (returning from loan)

If you read Marca about once a week on the wrong day, you'd think that Valencia had a crazy summer in which all of their top players left for Real Madrid or Barcelona. As usual with transfer rumors, things didn't turn out nearly as crazy as it seemed they would. I think Valencia is a bit of a combination of the above teams. Like Sevilla, I expect them to improve in attacking from last year even with the same cast of characters. Mata is an elite winger, and at 21 I'd expect him to have an even better year than he did in 2008-2009 even though he was excellent then. He and the also-elite forwards seem to be well in sync and could be even more dangerous than last year. Pablo Hernandez also seems ready to take over for Joaquin, who has not been able to fulfill the promise he showed while at Betis. Like Atletico de Madrid, I think Valencia are not nearly good enough defensively to mount any serious title challenge. The loss of Albiol shouldn't help matters.

Here's how I see them lining up:



Prediction: I think they will almost certainly be in a four-team fight for the last two Champions League spots, with the two losers likely finishing in the Europa League positions. As I said above, I think there is a good chance that they are even more dangerous going forward next year. If I had to guess, I'd say the Che will finish 4th, behind Sevilla and ahead of Atletico de Madrid.

The Other 11

While I have focused on the 6 top finishers last season, it's certainly possible that one or more of the remaining teams get in there and edge them out for a spot in Europe. Of the others, Deportivo de la Coruña are my pick for most likely team to make a run at the Europa League. Nonetheless, I don't rate their chances very highly. Espanyol is another team that could get up there, but with the tragic death of Dani Jarque it's tough to say how things will turn out.

I perhaps shouldn't add this until I've analyzed how well results from the previous season can be used to predict the results the next season, but just for fun here are a few predictions based mainly on what happened last season:
- Getafe, who finished tied with relegated Betis last season, will turn things around and finish near the middle of the table than the
- Osasuna, in relegation danger themselves finishing just a point clear, will not be as high but will be in a much safer position
- Almeria, who finished 4 points clear of relegation last season, will be relegated or finish even closer to the danger zone
- Sporting Gijon will be playing in the Segunda a year from now

The Promoted 3

To be perfectly honest, I don't know much of anything about the three promoted teams. More importantly, I don't have any great ideas on how to rate them based on previous results or signings or something along those lines. That is yet another thing to add to the list of topics to hopefully be covered later. I suspect that it's really hard to predict where teams that get promoted or relegated will finish because the rosters are often completely different. I do not even wish to make a guess at where they'll finish or which is the most likely to stay up.

Friday, August 28, 2009

CONCACAF World Cup Qualification - take 2

If you haven't read the original article on CONCACAF World Cup qualification, I would recommend reading that first as this is a follow-up.

A common issue when using results to create rankings or make predictions is which results to include. Obviously I should include all results from the current qualification campaign. Including all rounds there have been 97 matches involving 35 teams. It's less clear what to do with qualifying matches for the 2006 World Cup, qualifying and the finals for the 2007 and 2009 Gold Cup and friendlies. I will be exploring this more scientifically in the future. For now I'll present the qualification probabilities for three versions - just using qualification results, including all 2007 and 2009 Gold Cup results, and including them but giving them only 50% weight.
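One simple way to picture "50% weight" is a weighted average of goals, where each Gold Cup match counts as half a qualifying match. The sketch below estimates a single scoring rate that way. This is a deliberate simplification with made-up goal counts and a hypothetical function name; the actual model presumably accounts for opponents and home advantage. For a single Poisson rate, the weighted average is also the value that maximizes the weighted likelihood.

```python
def weighted_goal_rate(matches):
    """Weighted average goals per match.

    matches -- list of (goals, weight) pairs; weight 1.0 counts a match in
               full, 0.5 counts it as half a match, 0.0 drops it.
    """
    total_weight = sum(weight for _, weight in matches)
    if total_weight == 0:
        raise ValueError("at least one match needs a positive weight")
    return sum(goals * weight for goals, weight in matches) / total_weight


# Hypothetical goals scored by one team.
qualifiers = [1, 0, 2]
gold_cup = [5, 3]


def with_gold_cup_weight(weight):
    """Combine the two sets of matches, giving Gold Cup matches the given weight."""
    return [(g, 1.0) for g in qualifiers] + [(g, weight) for g in gold_cup]


no_gc = weighted_goal_rate(with_gold_cup_weight(0.0))
gc_half = weighted_goal_rate(with_gold_cup_weight(0.5))
gc_full = weighted_goal_rate(with_gold_cup_weight(1.0))
print(no_gc, gc_half, gc_full)  # prints "1.0 1.75 2.2"
```

In this toy example the team scored far more freely in the Gold Cup than in qualifying, so its estimated rate climbs as the Gold Cup weight goes up.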

The main difference between the predictions using just the qualification and either version including Gold Cup results is Mexico. Compared to the Gold Cups, or previous World Cup qualification campaigns for that matter, Mexico have performed quite poorly thus far in World Cup qualifying. In the two Gold Cups they reached the final, including a 0-5 thrashing of the United States at Giants Stadium. In qualifying, however, they have been mediocre and sit with 3 wins, all at home, and 3 losses, all on the road. The other five teams have had pretty similar results when you compare the Gold Cup and World Cup qualifying. So including the Gold Cup causes Mexico to look a lot better and hence be more likely to qualify.

Here are two tables. The first gives the percentage chance of each team finishing in the top 3 of the hex in three versions: the first column uses just the qualification matches, the second includes the Gold Cup matches but with only 50% weight, and the last column includes them in full. The second table gives the percentages for each team of qualifying overall if you assume that the CONCACAF team in the playoff has a 50/50 shot at advancing against the 5th team from CONMEBOL. I think that's a bit generous as the South American team should be a small favorite over most any of these teams, but it's a decent estimate for the top four teams; El Salvador and T&T would both be significant underdogs should they happen to make the playoff.





As you can see, Mexico is assigned a significantly better chance overall if you include the Gold Cup results in the model. Doing so doesn't affect Costa Rica much, probably because they are three points clear of Mexico. It effectively eliminates El Salvador and Trinidad and Tobago since the most plausible scenario that puts them through is edging out the Mexicans for fourth place in the group. For the US and Honduras, it reduces their chances of a top-3 finish by around 10% and a little under 5% for qualification overall. I think that Honduras is less impacted by the change because they are more likely to finish ahead of the US given that they are even on points and the US plays in Honduras October 10th. Also, home-field advantage has been very important in qualification, but it was not as big in Gold Cup qualifying, or for the US in the finals. Thus, including those results puts less importance on playing at home when predicting what will happen in the last four games. Even though Mexico looks stronger on paper when you include the Gold Cup results, the smaller benefit for the home team makes Honduras's match in Azteca look less daunting.

I don't have a perfect answer for what should be included. It boils down to how indicative you think results are from the Gold Cup of how well a national side will play in future World Cup qualifying matches. The problem is that most teams did not bring their best teams to the Gold Cup, especially in the 2009 version. For example, if you compare the lineups from the 2009 Gold Cup final, played July 26th, and the qualifying match, played August 8th, the US had only one player start both and Mexico six. The US opted to send their best team to the Confederations Cup and a B team to the Gold Cup. Brian Ching (IMUA!), who did not make the Confederations Cup squad due to injury, was the only national-team regular to play in the Gold Cup. The Mexican squad was more mixed, but it was far from their best and they clearly had objectives that went beyond winning the Gold Cup.

The Gold Cup happens every other year and seems to cycle so that the year after the World Cup it is more highly contested. This is likely because the winner there gets sent to the Confederations Cup, which is not true for the edition the year before the World Cup. The 2007 version was no exception and the national sides in it were more competitive. Thus, results from that version should be pretty indicative of qualification results. On the other hand, it took place two years ago, with some qualification matches taking place in 2006. I am not including the results from qualifying for Germany ’06 because I think that teams have changed enough in four years that those results don't say much about today, and it's going to be pretty similar looking back two or three years. I do think it is more reasonable to include the 2007 Gold Cup because the World Cup often serves as a cutoff with coaching changes and players retiring from international football frequently happening shortly after the World Cup ends.

If I had to guess, I'd say that the predictions excluding all Gold Cup results are probably the closest. There is good reason to exclude the results from either Gold Cup and I think even giving each half weight is probably too much. If I were the gambling type, I'd probably use the percentages somewhere between the first and second columns of the above tables, going closer to the No GC numbers than those in the GC-half column. Looking at Mexico in particular, my subjective view is that they have not looked very good in qualifying and I think those results are more indicative of their ability to get results going forward than those from either Gold Cup. Having said that, one of the major points of this blog is to objectively look at results, which is why I have presented the probabilities for the other two versions.

World Cup Qualification - CONCACAF

How It Works

After 3 preliminary rounds, the final round (where we are now) follows a league format with six teams, commonly referred to as the hex. All teams play each other twice, once in each country. As usual, three points are given for a win, one for a draw and none for a loss. The top three teams advance to the World Cup finals in South Africa. The fourth-place team has a playoff with the fifth-place team from CONMEBOL (South America).

Breaking Ties

Should teams be equal on points, the first two tiebreakers are goal difference (goals scored minus goals conceded) and goals scored in all matches. If two or more teams are still level on points, goal difference and goals scored, they are separated by looking only at the matches played among those teams: first points, then if necessary goal difference, and then goals scored.
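To make the ordering concrete, here is a minimal Python sketch of those tiebreakers. The function names (`table`, `rank`) and the four teams A to D with their results are invented for illustration; they are not real hex results. Matches are written home team first, as everywhere else on this blog.

```python
from collections import defaultdict

def table(matches, only=None):
    """Build {team: (points, goal_difference, goals_scored)}.

    matches: list of (home, away, home_goals, away_goals).
    only: optional set of teams; if given, count only matches between them.
    """
    stats = defaultdict(lambda: [0, 0, 0])
    for home, away, hg, ag in matches:
        if only is not None and not (home in only and away in only):
            continue
        for team, gf, ga in ((home, hg, ag), (away, ag, hg)):
            stats[team][0] += 3 if gf > ga else 1 if gf == ga else 0
            stats[team][1] += gf - ga
            stats[team][2] += gf
    return {team: tuple(values) for team, values in stats.items()}

def rank(matches):
    """Order teams by points, goal difference and goals scored in all
    matches, then by the same three numbers among the tied teams only."""
    overall = table(matches)
    groups = defaultdict(list)
    for team, key in overall.items():
        groups[key].append(team)
    ordered = []
    for key in sorted(groups, reverse=True):
        tied = groups[key]
        if len(tied) > 1:
            head_to_head = table(matches, only=set(tied))
            tied.sort(key=lambda t: head_to_head.get(t, (0, 0, 0)), reverse=True)
        ordered.extend(tied)
    return ordered

# Hypothetical results: A and B end level on points, goal difference and
# goals scored, so A's win over B decides their order.
example = [("B", "D", 1, 0), ("A", "B", 1, 0), ("C", "A", 1, 0)]
print(rank(example))
```

Teams that remain level even on the head-to-head numbers simply keep their existing order in this sketch; the text above does not say what happens in that case.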

Where the Group Is Now

The current table, with all teams having played 6 out of 10 matches, looks like this:



Costa Rica are two points clear at the top with 12 points, Honduras and the USA are even with 10, Mexico have 9 and El Salvador and Trinidad and Tobago have some work to do with 5 points each. Each team has played every other team at least once. Curiously, the top pair of teams, the middle two teams and bottom two teams have played each other twice so broadly speaking the schedule favors the teams at the top.

The first thing that jumps out at me is that there is a lot more parity than usual. Only 7 points separate the top team in the group from the bottom and you can't really argue that one or two teams are dominating. For comparison, here is the table this many matches into qualifying for Germany 2006:



(note: I'm using the current tiebreakers, which are different from those used in 2006. In 2006, Costa Rica were ahead of Guatemala because they beat Guatemala in their only meeting to that point)

Four years ago the US and Mexico dominated the hex. At this stage, the US had lost 2-1 to Mexico in Azteca but won their other 5 matches. Including the win over the US, Mexico won 5 out of their first 6 matches and got a draw in the other. Oddly enough, Mexico's only blemish was against Panama, by far the weakest team in the group. Panama were terrible and went on to lose their last four matches, finishing with two draws and eight losses in the final qualifying round. This year there are no teams that have gotten similar results to Mexico, the US or, at the other end of the spectrum, Panama four years ago. Only 7 points separate the top and bottom compared to 14 points last time around at this stage. Panama was effectively eliminated four years ago. While it isn't likely that either goes through (see below), it's certainly possible for El Salvador or Trinidad and Tobago to make it. At the top, the US and Mexico were so dominant in qualification for the 2006 finals that if you kept the 2-0 win over Mexico by the US that would come later and gave both of them a one-goal loss in all of their other remaining matches then both teams still would have gone through! This year even Costa Rica at the top of the table aren't completely safe.

Looking not at the table but the results, another thing that sticks out is that playing at home has been huge. Thus far in the final round, the home team has won 14 matches, drawn 3 and only once has the away team gotten a win. That honor belongs to Costa Rica for their 2-3 win over Trinidad and Tobago. This means that home teams are averaging an astounding 2.5 points per match with away teams eking out a scant one-third of a point. This again is different from qualifying for the 2006 finals. Then home teams went 20-4-6 (W-D-L) and averaged 2.13 points per match while away teams averaged 0.73 points in their matches. Naturally, scoring follows a similar pattern. A total of 37 goals have been scored by home teams and 14 by the visitors. That averages out to 2.06 goals for the home team and 0.78 for the away team. Four years ago, home teams scored at a rate of 1.9 goals per game and conceded 0.87. This goes hand in hand with the previous paragraph - weaker teams have been much better able this time around to get good results even against stronger teams, especially at home.
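For anyone who wants to check the averages, the arithmetic is short enough to write out. This is just a sketch of the calculation using the records and goal totals quoted above; the helper name `per_match` is made up for the example.

```python
def per_match(home_wins, draws, away_wins):
    """Average points per match for home and away sides (3 for a win, 1 for a draw)."""
    matches = home_wins + draws + away_wins
    home_points = 3 * home_wins + draws
    away_points = 3 * away_wins + draws
    return home_points / matches, away_points / matches

current = per_match(14, 3, 1)    # this hex so far: 18 matches
previous = per_match(20, 4, 6)   # 2006 hex: all 30 matches

print(round(current[0], 2), round(current[1], 2))    # 2.5 0.33
print(round(previous[0], 2), round(previous[1], 2))  # 2.13 0.73

# Goals per match this time around: 37 for home teams, 14 for visitors.
print(round(37 / 18, 2), round(14 / 18, 2))          # 2.06 0.78
```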

Poisson Predictions

I plugged all the results from the current CONCACAF World Cup qualifying campaign into the Poisson model. Using these results it comes up with a scoring and conceding stat for each team as well as a number that represents the benefit of playing at home. These can be used to simulate results for matches that haven't taken place to determine the likelihood of qualification for each team. In the next article, I will discuss adding in results from the 2007 and 2009 Gold Cups. Doing this changes the predictions and I don't want to muddy this article with that discussion.

Based only on 2010 qualifying results, here is a chart giving the percentage chance of advancing for each team.



The first numbered column gives the percentage chance of qualifying by being in the top 3 of the group. For example, it indicates that the US has about an 83% chance of finishing in first, second or third in the group. Finishing in the top three automatically sends a team to South Africa next summer. The next column gives the percentage chance of finishing in fourth place. Finishing there forces the team to win a playoff against the fifth-place team from South America. The last column is the chance that the team finishes in fifth or sixth, completely eliminating them from the competition.
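For readers curious how percentages like these can come out of a simulation, here is a rough Python sketch. It is not the exact code behind the chart. The points are the current standings, and the fixtures are only the remaining matches mentioned in this article (the full schedule has more). The attack, defence and home-advantage numbers are placeholders I made up for the example, not the fitted values, and ties on points are broken at random rather than by goal difference, so the output should not be read as a prediction.

```python
import math
import random

def poisson_sample(lam, rng):
    """Draw one Poisson-distributed goal count (Knuth's multiplication method)."""
    limit = math.exp(-lam)
    k, p = 0, rng.random()
    while p > limit:
        k += 1
        p *= rng.random()
    return k

def simulate(points, fixtures, attack, defence, home_adv, runs=2000, seed=0):
    """Play out the fixtures many times and count finishing positions."""
    rng = random.Random(seed)
    counts = {t: {"top3": 0, "fourth": 0, "out": 0} for t in points}
    for _ in range(runs):
        pts = dict(points)
        for home, away in fixtures:
            hg = poisson_sample(attack[home] * defence[away] * home_adv, rng)
            ag = poisson_sample(attack[away] * defence[home], rng)
            if hg > ag:
                pts[home] += 3
            elif hg < ag:
                pts[away] += 3
            else:
                pts[home] += 1
                pts[away] += 1
        # Simplification: teams level on points are ordered at random.
        order = sorted(pts, key=lambda t: (pts[t], rng.random()), reverse=True)
        for position, team in enumerate(order):
            bucket = "top3" if position < 3 else "fourth" if position == 3 else "out"
            counts[team][bucket] += 1
    return {t: {k: v / runs for k, v in c.items()} for t, c in counts.items()}

points = {"Costa Rica": 12, "Honduras": 10, "USA": 10, "Mexico": 9,
          "El Salvador": 5, "Trinidad and Tobago": 5}
# Partial list: only the remaining matches named in this article.
fixtures = [("Costa Rica", "Mexico"), ("Honduras", "Trinidad and Tobago"),
            ("USA", "El Salvador"), ("Mexico", "Honduras"),
            ("Trinidad and Tobago", "USA"), ("Honduras", "USA")]
# Placeholder ratings: every team identical, with an invented home factor.
attack = {t: 1.0 for t in points}
defence = {t: 1.0 for t in points}
shares = simulate(points, fixtures, attack, defence, home_adv=1.5)
for team, share in shares.items():
    print(team, share)
```

With real fitted factors and the complete fixture list, the three shares for each team would correspond to the three columns of the chart.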

Here is a quick breakdown for each team.

Costa Rica

The Ticos have performed well and find themselves in great shape, three points clear of Mexico in fourth place. According to the model they are the best attacking team but second-worst defensively in the hex. It would take a fair collapse to fall out of the top three and finishing in the bottom two is extremely unlikely. I give them about a 90% chance to finish in the top three and a 10% chance of finishing fourth, putting their chance of qualification in the 94-95% range. The key for their next match, September 5th at home against Mexico, is not to lose. A Mexico win would put the two teams even on points, most likely a point behind both Honduras and the United States. If Costa Rica win then they are essentially in - they would have a 98% chance of a top-three finish according to the model. A draw is fine, but it does lower their chance of qualifying automatically to 84% and leaves them with about a 15% chance of finishing in fourth. So their overall chances drop, but by just a few percentage points. A home loss to Mexico on the other hand lowers their chances drastically - 64% for the top 3 and 34% for fourth. That means roughly an 80% shot of making it to South Africa, a drop of about 15 percentage points. Preventing Mexico from gaining on them is pretty important here.

Honduras

Honduras just edges out the US as the second-most-likely team from CONCACAF to reach the World Cup finals. According to the model they are the best team defensively and third best at scoring. They have about an 84% chance of finishing in the top three and a 15% chance of finishing fourth for roughly a 90-91% chance of making it to South Africa. Their next match is against Trinidad and Tobago at home. There aren't many great chances to get three points and this is pretty easily their best shot to do so in their last four matches. Their odds reflect this: if they win then they go through in the top three 88% of the time, while a draw lowers this to 70%. If they somehow lose then their chance of a top-3 finish falls all the way to 57%. The schedule probably adds even more importance to getting a win than the Poisson model suggests. If they just get a draw on September 5th against T&T then four days later they will go into Azteca stadium at most two points clear and possibly in fourth place if Mexico manage a win in San Jose. It's tough enough to play in Mexico City as it is, but going in feeling like you need a result is a very tough task.

United States

The model has the US as the second-best team at both scoring and defending. While the results have not been as strong as usual, the Americans find themselves in great position to go through with about an 83% chance of qualifying automatically and a 15% chance of finishing fourth and playing against a South American team for a spot. Like Honduras, that means they have around a 90 or 91% chance to play in South Africa next summer. The similarities with Honduras don’t end there. The next US match is September 5th against El Salvador in Utah. It is vital that they do not squander their best remaining opportunity for three points. A win against El Salvador puts them at 90% to finish in the top 3 with the other 10% being their chances of finishing fourth - about a 95% chance of qualifying. Getting just a point drops the US to a 71% chance of a top 3 finish and a 27% chance of finishing fourth - about an 84% chance of making it to South Africa for a drop of over 10 percent. A loss would not only be embarrassing but it would lower the Americans' chances to 56% for a top 3 finish and 34% of finishing fourth - good for roughly a 72% chance of reaching the finals. In other words, a win and they are in fantastic shape and anything less leaves things a bit murky. On September 9th they travel to Trinidad and Tobago, a match that is also winnable. With a win against El Salvador, the US should be able to pretty much wrap things up with a win in Port of Spain.

Given the US-Mexico rivalry, a natural question is how much of a blow the 2-1 loss to Mexico was. The answer is not all that much - it cost them about 12% equity for a top 3 finish and a little under 7% overall. In other words, the model currently predicts that the US will qualify automatically 83.3% of the time and finish in fourth 15% for about a 91% chance to qualify. Had the US held on for the 1-1 draw they would be sitting pretty with a 95.7% chance of a top-three finish and a 4% chance of finishing fourth for a qualification percentage of roughly 98%. The difference between the two is 12.4% for automatic qualification and 7% overall. Giving Mexico 3 points pulled them that much closer, making it a fair bit more likely that they finish over the US, forcing the Americans to play against a South American team for qualification. Even so, the US is still in great shape to finish in the top 3 and will most likely finish above Mexico so it wasn’t all that important.

Mexico

Mexico have been pretty disappointing by historical standards and downright mediocre in their qualifying matches. They rate as the fourth best team in both attacking and defending. They narrowly beat Trinidad and Tobago 2-1 at home, lost at El Salvador 2-1 and lost by two goals away against both Honduras (3-1) and the United States (2-0). Despite these lackluster results, they find themselves in decent shape to qualify - although they likely will need to best the fifth-place South American team to do so. The model claims that they have a 41% chance of finishing in the top 3 and a 50% chance of finishing fourth. That puts them somewhere around 65% to make it to South Africa. Getting a result in their match on September 5th at Costa Rica would be quite beneficial. If they get a draw then their chance of a top-3 finish goes up to 47% and their overall chance of making it goes up 5% to 70%. A loss on the other hand drops them to a 29.5% shot at automatic qualification and about 58.5% overall. Should they get a win then they are in great shape with a 78% chance of a top-3 finish and about an 88% chance overall. It is a key match at this stage and given likely wins by the US and Honduras, a loss by Mexico would put them 4 points out of third place. There isn’t a lot of room for them to slip up, and they most likely will have to win out if they lose in San Jose.

The win over the US was a lot more important to Mexico than the loss was for the US. Had they failed to break the 1-1 deadlock, they would have only about a 17% chance of a top 3 finish to go with a 60% chance of finishing fourth. Giving them even odds to win a playoff, that would put them at a 47% chance to make it. With the win against the US they increased their chances to about 41% to go through automatically and 50% of a playoff for a 66% chance of qualifying overall. In short, getting all three points against the US raised Mexico's chances by about 24 percentage points for automatic qualification and 19 overall. So you could say that while the loss wasn't very bad for the US, they missed out on a great opportunity to seriously damage Mexico's chances by not getting a draw.
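The overall figures for Mexico come from adding the top-3 chance to the fourth-place chance multiplied by the chance of winning the playoff. A small sketch of that arithmetic, with the playoff treated as a coin flip as described above (the function name and the `playoff_win` parameter are just labels for this example):

```python
def overall_chance(top3, fourth, playoff_win=0.5):
    """Chance of reaching the finals, in percent.

    top3 and fourth are percentages; playoff_win is the assumed
    probability of winning the playoff against the South American side.
    """
    return top3 + playoff_win * fourth

with_win = overall_chance(41, 50)   # Mexico after beating the US
with_draw = overall_chance(17, 60)  # Mexico had the match ended 1-1
print(with_win, with_draw, with_win - with_draw)  # 66.0 47.0 19.0
```

Changing `playoff_win` shows how much the overall number leans on that even-odds assumption for a team as likely to finish fourth as Mexico.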

El Salvador and Trinidad and Tobago

I'll cover them together because they are in very similar situations. Both are really up against it. It would definitely be a surprise if either made it, but stranger things have happened. Both have about a 1% chance of finishing in the top 3. El Salvador is rated a bit higher with close to a 7% chance of finishing fourth while the islanders are looking at just under 4%. Given that both would be significant underdogs in a playoff against the 5th South American team, I'd say each of these teams qualifies two or three times out of a hundred. El Salvador plays at the United States while Trinidad and Tobago plays at Honduras. If either can somehow get an upset win they would increase their chance of qualification to close to 15%. On the other hand if they get the expected result and lose then their already slim chances would be cut in half.

Conclusion

We essentially have the top 4 teams battling it out to determine which three go through to the finals automatically with the odd man out forced to play a home-and-home with a South American team. Based on the results of this qualification round, Mexico is the favorite for that fourth spot, getting it about half the time. The next matchday is very important for all four teams, with Mexico traveling to Costa Rica and the US and Honduras playing weak opponents at home. If Mexico can manage a win or the US and/or Honduras fail to win then things will change fairly drastically.

The Poisson Model

I will frequently use what I call the Poisson model, particularly in the near future. The Poisson model takes as inputs the goals scored and goals conceded for each team, as well as the schedules, and spits out an attacking and defending factor for each team and, in the version I will use, a home-field-advantage factor. I can then plug these back into the Poisson formula for any pair of teams and it gives me the probability of each scoreline if they play against each other in a given location. In other words, it provides a way to translate current and/or past results into how likely each possible outcome is in potential future matches.
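As a rough illustration of that last step, here is a short Python sketch that turns an expected number of goals for each side into scoreline and result probabilities, treating the two goal counts as independent Poisson variables. The function names are invented for the example, and the inputs 2.06 and 0.78 are simply the average home and away goals per match in the current CONCACAF final round, not the model's numbers for any particular pair of teams.

```python
import math

def poisson_pmf(k, lam):
    """Probability of exactly k goals when lam goals are expected."""
    return math.exp(-lam) * lam ** k / math.factorial(k)

def scoreline_probs(lam_home, lam_away, max_goals=10):
    """Probability of each scoreline (home, away) up to max_goals per side."""
    return {(h, a): poisson_pmf(h, lam_home) * poisson_pmf(a, lam_away)
            for h in range(max_goals + 1) for a in range(max_goals + 1)}

def outcome_probs(lam_home, lam_away, max_goals=10):
    """Return (home win, draw, away win) probabilities."""
    grid = scoreline_probs(lam_home, lam_away, max_goals)
    home = sum(p for (h, a), p in grid.items() if h > a)
    draw = sum(p for (h, a), p in grid.items() if h == a)
    away = sum(p for (h, a), p in grid.items() if h < a)
    return home, draw, away

probs = outcome_probs(2.06, 0.78)
print([round(p, 3) for p in probs])
```

Scorelines above ten goals for either side are cut off, so the three numbers add up to very slightly less than one.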

The Poisson distribution is used in many fields to describe the frequency of various occurrences. This includes things like the number of typos on a page, the number of radium atoms that decay in a given period of time, and the number of accidents at an intersection in a month. In order to give my objective predictions and other analyses, I will assume that the number of goals in a match follows this distribution. I further assume that if A plays B at home then the expected number of goals team A scores against B is A’s attacking factor multiplied by B’s defending factor, times the home-field-advantage factor. B’s expected number of goals is B’s attacking factor multiplied by A’s defending factor, without the home-field-advantage bonus. If you aren’t familiar with probability and statistics, you can think of the expected number of goals as what the average would be if they played each other thousands of times in some parallel universe. Using these assumptions and something called the Maximum Likelihood Estimator, I derive estimates for the attacking, defending and home-field-advantage factors. Once those have been estimated, they can be plugged into the Poisson distribution formula to get the probability of each outcome if any two teams were to play each other. I will use these predictions not only to give previews each week for various leagues, but also to analyze how likely teams are to finish in certain spots in their league, avoid relegation, win the title and so forth.
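One way to get maximum likelihood estimates for a model of this form is to update the attacking factors, the defending factors and the home factor in turn until they settle down. The Python sketch below does that; it is one possible approach under the assumptions stated above, not necessarily the procedure used for the numbers on this blog. The function name `fit_poisson` and the three teams X, Y and Z with their results are invented for the example, and the code assumes every team has scored and conceded at least once.

```python
def fit_poisson(matches, iterations=200):
    """Estimate attack, defence and home-advantage factors.

    matches: list of (home, away, home_goals, away_goals).
    Expected home goals = attack[home] * defence[away] * home_adv;
    expected away goals = attack[away] * defence[home].
    """
    teams = sorted({t for m in matches for t in m[:2]})
    attack = {t: 1.0 for t in teams}
    defence = {t: 1.0 for t in teams}
    home_adv = 1.0
    scored = {t: 0 for t in teams}
    conceded = {t: 0 for t in teams}
    for h, a, hg, ag in matches:
        scored[h] += hg
        conceded[h] += ag
        scored[a] += ag
        conceded[a] += hg
    total_home_goals = sum(m[2] for m in matches)

    for _ in range(iterations):
        # Attack: goals scored divided by what an attack of 1.0 would expect.
        denom = {t: 0.0 for t in teams}
        for h, a, _hg, _ag in matches:
            denom[h] += defence[a] * home_adv
            denom[a] += defence[h]
        attack = {t: scored[t] / denom[t] for t in teams}
        # Defence: goals conceded divided by what a defence of 1.0 would expect.
        denom = {t: 0.0 for t in teams}
        for h, a, _hg, _ag in matches:
            denom[h] += attack[a]
            denom[a] += attack[h] * home_adv
        defence = {t: conceded[t] / denom[t] for t in teams}
        # Home factor: actual home goals over expected home goals without it.
        home_adv = total_home_goals / sum(
            attack[h] * defence[a] for h, a, _hg, _ag in matches)
    return attack, defence, home_adv

# Hypothetical double round robin between three made-up teams.
results = [("X", "Y", 2, 0), ("Y", "X", 1, 1), ("X", "Z", 3, 1),
           ("Z", "X", 0, 2), ("Y", "Z", 2, 1), ("Z", "Y", 1, 0)]
attack, defence, home_adv = fit_poisson(results)
print(attack, defence, home_adv)
```

Only the products of the factors matter: doubling every attacking factor and halving every defending factor gives the same expected goals, so the individual numbers are meaningful only relative to one another.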

The Poisson model is not perfect. Without getting too technical, it makes a number of simplifying assumptions that most any football fan would not believe to be true. The main one is that the chance of a goal in any period of time, let’s say a minute, does not depend on the score or how much time is left in the game. In other words, it is assuming that a team down 1-0 is as likely to score as they would be if the score was tied 0-0. Because teams change tactics based on the score, it seems that this may not be the case. Also, in extreme situations such as a team being up 5-0, they likely will call off the dogs and be less likely to score than if they were only up 2-0. As a result of this, the model overestimates the likelihood of certain scorelines and underestimates the chances of others. I will be writing at least one article in the future taking an in-depth look at this.

Despite these problems, the Poisson model does a reasonable job of predicting match outcomes. A huge benefit is that it is simple. People have developed more complicated models that are dynamic. In other words, applying them to football, the probability of a goal would depend on how much time is left in the game, what the score is and perhaps when the last goal was scored and by whom. While these would be more accurate, they are quite cumbersome. Essentially the Poisson model makes assumptions that aren’t quite valid, but drastically simplify things. It is easy enough to use that I can quickly apply it to any league or competition where data is available should a reader want me to analyze some competition I haven’t discussed.

I am working on a better model, which would be more accurate and hopefully not much more complicated, but do not have everything worked out. If and when that happens I’ll start using it, but in the meantime the Poisson model is a solid place to start.

Welcome to the Analytical Football Blog!

What is this blog?
The goal of this blog is to present analysis of association football. Ideally it will be mostly objective, analyzing statistics that come from the matches themselves. At times I will give opinions, but will try to clarify that I’m just giving my opinion when that happens.

Why this blog?

Compared to traditional American sports like baseball, basketball and American football, little work has been done to analyze soccer statistics and results. I feel that I can make a contribution in this area.

What should the reader expect?

The first series of articles uses what is called the Poisson model to analyze past results and use them to predict future ones. I will use this to look into how likely each team is to make the World Cup finals. The first two articles, which you can find above, go through CONCACAF (North and Central America and the Caribbean) qualifying. Using the Poisson model as well as taking a look at transfers I will give a preview of the coming season in the Primera Division of Spain. I also hope to do the same for the Premier League, Serie A and Bundesliga before too much of the season has gone. Either way, once the season gets rolling and there are enough results to use, I will apply the model to make predictions about the season and give a weekly preview for each of the major leagues and any smaller league by reader request.

In addition to that, I will be writing articles on topics such as the usefulness of current team stats (corners, time of possession, fouls etc.), how one can separate a player’s contribution to a team effort, which leagues are the best, whether offense or defense is more important, if some teams are better suited for certain league or cup competitions, and whether the timing of a goal matters. I am working on new individual and team stats and will be posting about those as things develop.

Who are you?

I am currently a graduate student of Economics at the University of Pittsburgh. I am hoping and planning to finish my research and get my Ph.D. next summer. I grew up and currently live in Oregon, on the West Coast of the United States. Since I was a child, soccer has been my favorite sport. I played through high school (grades 9-12) at which point being slow and unskilled became more problematic. While I haven’t done it for 6 years, I also have a lot of experience as a referee. This was primarily for youth soccer, but I also did high school games and local adult leagues when I was an undergrad at the University of Oregon.

Which teams do you support?

As an undergrad, I did a study abroad in Seville, Spain. I was there from January through May of 2000, and saw both Sevilla and Betis get relegated that season. They didn’t win a single game I went to, but I became a fan of Sevilla Fútbol Club, something I will be hasta la muerte. Other than that, I am not very biased in favor of any club. I will admit that there are a few big clubs (plus a certain side in the Segunda from Heliopolis) that I dislike, particularly Real Madrid, Manchester United and to some extent Chelsea. Despite my opinion, I know that these are very popular clubs and to the extent that I have readers many will support one of them. I will certainly write about them and will try to be as objective as possible. Few things bother me more than very biased sports announcing and reporting and I hope to not be a hypocrite when it comes to that. Fortunately, much of what I will do will rely on objective statistics.

When it comes to national teams, I am one of apparently few avid fans of the sport from the US who is not crazy about the national team. I certainly don’t dislike the US national team, and prefer for them to win. It’s just that I’m not very patriotic and don’t care all that much. I don’t wish to get into them here, but there are definitely things that have cooled my emotions for the USMNT that have a lot to do with ESPN and ABC. I’m sure I will discuss that more later when World Cup qualifying picks up. Despite being neither Spanish nor of Spanish ancestry, I have similar positive but tepid feelings about the Spanish national team as a result of having lived there. I suppose I should also say that I support sides such as Mali and Brazil that have Sevillistas playing for them. Going the other way, there are no national sides that I dislike. I can confidently say that when it comes to international football my opinions don’t cloud my judgment.

You talk funny!

Like many Americans, a large portion of what I have read about the game has been from sources in Great Britain or elsewhere in Europe that use British English. I am also a big sports fan in general, so I read and discuss American sports a lot. You can expect me to use a possibly annoying mix of phrases from either side of the Atlantic. I usually refer to the sport as football, though at times calling it soccer for which no apology should be warranted. To avoid confusion, if I am talking about the sport in which the Pittsburgh Steelers are Super Bowl Champions, then I will call it American football, adjective and all. I freely use British terms such as match and pitch as well as their American counterparts like game and field. In my experience this is pretty common for American fans of the beautiful game, but if you are an American that is new to the sport or from elsewhere and you haven’t interacted much with Americans then you may find it odd.

In terms of formatting, I will list things the usual way they are done internationally - the home team will always be listed first. When giving records I will use the win-draw-loss format. Both are pretty standard but again some American readers might be a little thrown off at first so I thought it worth mentioning.

Something that would certainly bother my high school English teachers if they happened across this blog is that when talking about football I tend to take the Commonwealth approach and treat collective nouns as plural instead of singular. Expect me to say “Stoke City are still in the Premier League” instead of “Stoke City is still in the Premier League”. I have no idea why I do this or when it started, but I do it naturally, only occasionally using the American singular form. I humbly apologize to anyone bothered by this either way and I hope you can overlook it.