Friday, June 11, 2010

World Cup Preview: Spain part 3

This article will cover the tactics of the current seleccion española. For a more concise summary, which also covers the defensive setup, see the zonalmarking article on Spain.

The Back 4 and Goalkeeper

I'll start with the back five to get them out of the way. There are some questions about the lineup at those positions, but they don't affect the style of play or the formation Spain will use.

Spain are very deep at keeper. Iker Casillas will be the number one, with Pepe Reina as backup and Victor Valdés available if something happens to the other two. All three are widely considered world class. Without looking deeply into it, I'd guess that Valdés would start for 20-25 sides in the competition, yet he doesn't figure to play even if there is nothing at stake in the third match of the group stage.

At center back, Del Bosque will choose two of Marchena, Puyol, Piqué and Albiol. I don't think there is much in it. I'd probably go with Marchena and Piqué, but it doesn't really matter. Capdevila (pro tip: pronounce the L in Capdevila like an L in English. The last two syllables are VEE-lah) has a solid hold on the left-back spot. On the right, Sergio Ramos is probably going to start over his Real Madrid teammate Álvaro Arbeloa.

The Front 6

The front six is where it gets very interesting. I break the players into 5 groups, listed below from front to back. Within each group, I've sorted the players from most likely to least likely to start, as I see it.

Strikers - David Villa, Fernando Torres*
True Wingers - Jesús Navas, Juan Manuel Mata
All-Around Attackers - David Silva, Andrés Iniesta, Cesc Fàbregas
Xavi - Xavi
Defensive Midfielders/Deep-Lying Playmakers - Xabi Alonso, Sergio Busquets

* may flip if and when Torres is fully fit.

Navas and Mata should start on the bench in all the matches that matter. I'll discuss them a bit more later because I think they will be important coming off the bench, especially Navas. The only player in the front six that is guaranteed to start all matches with something at stake is Xavi. For the others, it depends heavily on the formation Del Bosque chooses as well as small tweaks he might make.

Formations

A lot of the focus has been on whether to run one striker or two. I don't think that's an important division, for reasons that should become clear later. Instead I will break them down into formations where both Xabi Alonso and Sergio Busquets are starting, and those which feature only one of them. Del Bosque seems inclined to start with both, so I'll look at those first.

4-3-3



This seems very likely to be the formation Del Bosque will use in the opener against Switzerland on the 16th. Here are the roles of each group:

Forwards - Villa plays a pretty standard center-forward position. Of the two on the wings, Silva will be the flashier, but Iniesta's job is more interesting for the tactical enthusiast. Much more so than at Euro 2008, Silva's position could legitimately be described as winger, albeit one with A LOT of freedom. He plays on the right side, but is wont to make runs across the pitch, back to the midfield and to goal. Like the other attacking players, he is great at finding holes in the defense and making runs into them. Iniesta provides width on the left. If the ball is in the middle or on the right, you can expect him to be very close to the left touchline, outside of the opposing right back. He moves off that line when Spain get the ball forward on the left, especially when it's played to him. At that point, the attack becomes more fluid. Like Silva, he takes advantage of his knack for finding holes in the defense and his incredible vision to find teammates making runs. A recent example of this was the second goal against Poland.

Midfielders - I didn't bother to draw the arrows for Xavi, but they would be in every direction. The same is true for Alonso to a lesser extent. The two of them move all around the middle of the park, exchanging positions often. Xavi is typically the more advanced of the two, but when Spain have the ball Xabi Alonso goes forward to support the attack as well. Busquets will also move up on occasion, but not often. He has a more defensive role, playing deeper and getting fewer touches.

Defenders - The center backs play pretty typically. The only thing worth commenting on is that they will slide out pretty wide when the fullback on their side makes a forward run. Both fullbacks push forward. When the ball is on the other side, they provide midfield width. For the most part, they only make runs into the attacking areas when the ball is on their side. They also rarely make surging forward runs with the ball, like you'd typically see Brazilian fullbacks do.

Here is a short clip showing what happens on the pitch. It's short, but there is a lot to see, so I'd recommend watching it a few times.


It's set to loop, and the start is frozen for a few seconds to show the names. Just before this, the ball was played by Casillas to Arbeloa, who took a couple dribbles forward before playing it over to Busquets. Busquets took one touch and is about to pass the ball forward to Silva, as you'll see.

The first thing to notice is where Iniesta is. The ball has just moved from the right back to a midfielder and yet he is within a couple yards of the left touchline. In previous formations, he would have been somewhere around a couple yards deeper than Villa's position. Xavi's positioning and movement are pretty typical as well. He starts in a fairly advanced position for him, in position to receive a pass from Arbeloa. Once Arbeloa plays it in to Busquets he waits for Silva to finish his run before moving into the hole created by the defender following Silva out.

Silva's positioning and skill both on and especially off the ball are on full display. He has dropped back and into a more central location at the start, before making a short run even deeper to find some space in the middle. Never stopping his run, he is able to two-touch it right to Alonso's feet. He then turns and makes another run up and to the left, again finding space. His ability to make these sorts of runs in the middle, as here, and further up the pitch is a huge part of Spain's success.

Notice that for Poland all six midfielders and forwards are defending in the area and they are still getting overrun by four Spaniards because of the passing skill and, as importantly, movement. In this case it was Silva and Xavi, but it could easily have been Xabi Alonso, Iniesta or Fàbregas. You can see why they are capable of playing their most defensive formation without having much of a dropoff in scoring. The most impressive thing about this clip is that something like this probably happens 50 times a match.


4-3-3, Villa on the Wing



I think this is the most likely formation that involves both Villa and Torres being on the pitch together. That doesn't make the debate about having just one or both of them moot, but it does change it. There are still some benefits and drawbacks, but instead of discussing whether it's better to have two center forwards or just one it really comes down to Villa vs. Iniesta on the wing. As a sidenote, this will likely be a very important issue for Barça, especially if they don't sell Zlatan.

The other players have roles similar to those in the 4-3-3 above. The only one that is different is Capdevila, whom we can expect to be even more aggressive in his forward runs. Look for him to overlap when Villa gets the ball. Without the ball, Villa has the same job as Iniesta - provide width. When it comes to his side, he's going to be more direct, looking for space in scoring areas. He is also capable of setting up teammates.

Obviously Villa is much more of a scoring threat than Iniesta. When the ball is on the right or in the middle, this presents the other team with a Sophie's choice that wouldn't be nearly as tough with Iniesta. If the defense stays compact then Spain can play it across or diagonally to Villa, who may find himself with the ball and only one defender near him - scrambling to get in position. That's pretty devastating. However, if they stay spread out then that leaves a lot of room for Silva, Torres, Sergio Ramos and whichever of Xavi and Xabi Alonso are in position to make a forward run. When the ball is on Villa's side it's still an issue. He often proves too much for the fullback. The defense has to decide whether to have a center back in position to help, which opens up space for Silva, Torres et al in the middle. If they don't then you can expect Villa to have a big night.

To summarize, putting Villa on the wing makes Spain both more direct and more dangerous in the attacking third. With Iniesta they apply pressure and wait to exploit a defensive mistake; with Villa they force the issue. The drawback is that they will control the ball just a bit less - Villa is not as good as Iniesta at moving the ball. They will still dominate possession, since Silva, Xavi and Xabi Alonso are still on the pitch. The drop-off is also smaller than it sounds because Iniesta, in the super-wide role, isn't helping with the short-passing game on the right anyway.

Here is a zoomed-out shot of how it looks on the pitch and a clip that shows how tough it can be to defend.


Sorry for the poor quality. Just before the clip, it's a fairly standard buildup. After knocking the ball around on the right, a center back has played it up to Silva from just ahead of the center line. Silva is cutting in with the ball. Torres fills the space left behind him while Xavi makes a forward run into the middle of the defense. You can see that David Villa has nobody within a dozen yards of him and has both arms out calling for the ball. When he gets it, Capdevila makes a strong overlapping run which pulls the center back out, giving Villa more than enough space for the nice finish. Once Silva made his run the defense was left to pick its poison. Had they been in better position to deal with Villa then Torres, Xavi or Silva himself would have had a great chance to score.

To be honest, I started out with the view that playing only one of Torres and Villa is the way to go but after watching matches and giving it some thought I think this is the best formation if Torres is fit.

Formations with Only One of Busquets and Xabi Alonso

4-2-3-1

This is the quickest one to cover - it's a pretty standard fluid 4-2-3-1. Xavi joins whichever of Xabi Alonso and Busquets starts in the central midfield. It's basically the same as when they used it in Euro 2008. Iniesta, Fàbregas and Silva move around in the attacking midfield. Other than getting all three of these great players involved, the strength of this formation is that it's unpredictable. Any of them could make that forward run and, of course, the defense has Villa or Torres to worry about up top plus Xavi and the fullbacks coming forward.

4-3-3/4-4-2 Hybrid



This formation is pretty interesting. I call it a hybrid of a 4-4-2 and 4-3-3. The three midfielders, Xavi, Alonso and Silva in the diagram above, play more or less like a normal 4-3-3 midfield while the left winger and forwards, Iniesta, Villa and Torres, play how you would expect them to in a traditional 4-4-2. You could call it a 4-4-2 with a tucked in right midfielder playing deeper than typical, I suppose. In this formation, they don't completely abandon the right side - Silva or Xavi often go out there and the fullback provides support as well. Ramos tends to go further forward, including some forward runs looking for a diagonal ball from the other side of the pitch. However, there is very rarely someone up near the defensive line giving the opposition left fullback something to think about. In contrast, the left winger is consistently very wide much like I described Iniesta in the first formation above.

This asymmetry leads to an interesting dynamic where their attack is very compact on one side, but they spread the defense out on the other.

Here are two shots of it in action against the United States in the Confederations Cup.


In the top shot, you can see their locations when the ball is on the left-hand side. Cesc, listed as the right midfielder, is just to the left of center. The only player on the right half of the pitch is Sergio Ramos. For that matter, he is the only player for either team more than a couple yards to that side of center. In the bottom shot, the ball has just been played to Capdevila, I believe from Alonso. The players haven't had a chance to move, so this captures their position when attacking the right side. Riera is not only on the other side of center, he appears to be on the other side of the 18-yard box. Notice how much more spread out the defense is in the bottom shot.

To be honest, I'm not sure how much I like this formation. Since there is an extra man on the left, that is the side they tend to attack. It is also the side where the defense is pretty tight. That's common to all of these formations because Silva is given the free role. However, it's worse in this case because both Torres and Villa are in there taking up space. When there is only one of them, or Villa is on the wing, there is more room to make runs with and without the ball. Also, the results aren't great. This is what they used in the Confederations Cup. The Confederations Cup isn't all that valued, but much more so than a friendly. I think we can take something from it and Spain had some pretty bad results. Obviously there's the loss to the US, but they also beat South Africa 2-0 and Iraq only 1-0. Crushing New Zealand 5-0 was the only match of the four that went as well as expected.

The formations with both Xabi Alonso and Busquets give them more protection from counterattacks and with the 4-2-3-1 they have a more free-flowing attack. Given Torres's fitness concerns and Jesús Navas emerging as a great sub option and even potential starter, I don't think we'll see this formation.

Substitutes

Who the subs are depends a lot on the starting formation. I strongly suspect that Del Bosque will use one of the first two as his basic formation. That leaves one or both of Iniesta and Cesc starting out on the bench. This seems crazy, but it's tough to find an alternative that is as balanced. I think Xabi Alonso will be the key because he is essentially replacing one of those great players. If he can provide the defensive support expected of him and help to push the attack alongside Xavi then it will all work out well. If he struggles with either of those then Spain will struggle and either he or Busquets will likely come off at halftime.

I'm biased since he's my favorite player and plays for my favorite club, but I think Jesús Navas could potentially be the most important sub for any team in the competition. His pace, energy and runs with the ball bring something the starters don't have. Especially against the kind of disciplined, patient sides that give Spain problems his spark coming in off the bench could be just what they need to turn a game around or break the deadlock. The same is largely true of Mata, but Navas seems to be in better form at the moment and the more likely of the two to be brought on.

Tuesday, June 8, 2010

World Cup Preview: Spain part 2

On Spanish Tactics

Firstly, I would be remiss if I didn't mention two fairly recent discoveries. The first is the book by Jonathan Wilson called Inverting the Pyramid. It is best described by its subtitle "The History of Football Tactics". The other is the fantastic blog zonalmarking. Reading them has changed the way I watch the game, for the better. For those familiar, the influence should be pretty obvious in what I'm writing here.

Two Questions

There are two important questions, assuming all relevant players are fit. Should Spain start one or both of Torres and Villa? Should they start one or both of Busquets and Xabi Alonso? I'll address the second question in the next part, coming shortly.

For the first question, the prevailing opinion among those not in charge is that Spain are better with just one. I both agree and disagree with this - I think Spain under Aragonés were much better with just one of Villa and Torres in the lineup, but under Del Bosque it doesn't matter as much. I'll discuss this in detail later in this article and in part 3.

What Does the Data Say?

Firstly, lumping all their matches together, the data give some support to them being better with a lone striker. To test this I did something similar to the analysis of how valuable Cristiano Ronaldo was to Real Madrid. Essentially, it's comparing the results when Spain start two of Torres, Villa, Güiza and Morientes with the results when they start only one of them. As usual, especially with national-team results, the sample sizes involved are very small - don't take this as proof of anything, but rather as a bit of evidence supporting the claim. In qualifying for South Africa and in the European Championships, both qualification and the finals, Spain have played 28 matches. In 15 of those, two of the strikers started and played at least the first half. In 11, they started with only one of those four strikers. The remaining two matches are perhaps the most interesting - they started with both Villa and Torres but Fàbregas came on as a sub fairly early in the first half. In both of these, their goals all came with only one striker on the pitch so I'm counting them in the one-striker group.

That gives us a sample of 15 matches with two strikers and 13 matches with just one. I'm excluding friendlies for obvious reasons. The Confederations Cup would be more reasonable to include, but I think it's impossible to accurately rank national teams from different confederations in a model, as I will do later, so I exclude it. Because their results there were subpar and they played with both Torres and Villa in every match, including the Confederations Cup would make the result even stronger.

In the matches with two strikers, Spain scored an average of 2.4 goals per match and conceded an average of 2 goals every three matches, for an average goal differential of about 1.73. When they played with just one of those four strikers, they averaged 2.08 goals per match and conceded a scant 0.46 per match. That gives them an average goal differential of 1.62. At first glance, it comes out about how you might expect - with one striker they both scored and conceded fewer goals on average. Their average goal differential was actually higher in matches where they had two strikers, and yet I'm saying the data suggest that they are better with one. The reason is that their opponents were much better on average in the matches where they played only one up top. In qualifying for the World Cup, for example, they played with only one against Bosnia-Herzegovina both times and Belgium, Turkey and Armenia once. They played two strikers in both matches against Estonia as well as one of them against Belgium, Turkey and Armenia.
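
The per-match averages above can be reproduced with a few lines of arithmetic. The sketch below is only an illustration. The goal totals (36 scored and 10 conceded in the 15 two-striker matches, 27 and 6 in the 13 one-striker matches) are not listed in this article; they are back-calculated from the averages quoted, so treat them as assumptions. The names `Sample` and `per_match` are made up for the example.

```python
from dataclasses import dataclass


@dataclass
class Sample:
    """Goal totals for one group of matches."""
    matches: int
    goals_for: int
    goals_against: int


def per_match(sample: Sample) -> tuple[float, float, float]:
    """Return (scored, conceded, goal differential) per match."""
    scored = sample.goals_for / sample.matches
    conceded = sample.goals_against / sample.matches
    return scored, conceded, scored - conceded


# Assumed totals, back-calculated from the averages quoted in the text.
two_strikers = Sample(matches=15, goals_for=36, goals_against=10)
one_striker = Sample(matches=13, goals_for=27, goals_against=6)

for label, sample in (("two strikers", two_strikers), ("one striker", one_striker)):
    scored, conceded, diff = per_match(sample)
    print(f"{label}: {scored:.2f} scored, {conceded:.2f} conceded, {diff:+.2f} per match")
```

These are raw averages only. They say nothing about who the opponents were or where the matches were played, which is why the raw gap is misleading on its own.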

The averages model, which takes into account where the matches were played (home/neutral/away) and opponent strength, spit out that with one striker they are just slightly better at scoring - roughly 1 more goal every 11 matches - and much better defensively - over half a goal fewer conceded on average per match. I think that defensive number is exaggerated - they did better over that small sample of matches than they would if they played those teams in a million parallel universes. As I said, it's not conclusive, but there is some evidence that Spain are better with just one striker.

Del Bosque's Spain and Aragonés's

As I hinted at above, Spain under Del Bosque is different from Spain under Aragonés. The difference mainly comes down to attacking width. For the rest of this article, I will discuss Spain's tactics from Euro 2008. While I think this is interesting and relevant to the discussion of what Spain should do tactically, it is largely background information. If you are only interested in a discussion of the formations Spain will use in this World Cup, go ahead and skip this part and go to part 3 (coming soon) which will discuss that.

Aragonés's "4-4-2"

Here is how Spain played in Euro 2008 when both Torres and Villa were on the pitch:



This is different than how they lined up at the start and how most described the lineup. Actually, I don't recall anyone describing it this way, but to me the formation was more like a Brazilian 4-2-2-2 than a 4-4-2. Silva started and was listed on the left, but played on the right side of Iniesta most of the time. Both of them drifted from wing to wing and tended to actually be close together. Neither came close to playing in a traditional winger role that you'd typically see in outside midfielders in a 4-4-2. While Villa and especially Torres made occasional runs out wide, they both stayed pretty narrow for the most part.

I think Jonathan Wilson described the system well in Inverting the Pyramid, though he was talking about 1982 Brazil: "The formation was thus a 4-2-2-2, with a strong central column flanked by two marauding fullbacks in Leandro and Júnior. In a European context, it would have been perceived as lacking width, but this was a team of such fluency and poise in possession that they created it with their movement." (Inverting the Pyramid, page 263)

When someone says something like "Jesus Navas provides width", they typically mean that he plays in a wide attacking role. He stays out there hoping to receive the ball in a dangerous area with the defense out of position because they are more concerned with the other side of the pitch where the main attack is coming from. Because this would be devastating for the opposition, his positioning has the benefit of pulling the defender wider creating more space for the attacking players in the middle. Spain under Aragonés didn't really do that, and not at all in this formation. Despite that, it's a bit misleading to say that they were lacking width.

Spain's mission was to methodically break the other team down using possession and ball movement. They would attack on one side, putting pressure on the defense, poised to take advantage of a mistake. When nothing opened up, they would drop it back and switch the play, moving the attack to the other side or perhaps giving right down the middle a go. Switching the field, even slowly through a series of short passes many of which were backward, forced the defense to rotate. This potentially opened up some holes that Spain could exploit. In other words, they used width in the midfield area instead of using a winger to pull defenders wide or be in a great attacking position if they sagged too far in.

Here are a couple pictures of how it actually looks on the pitch.



In the top one the ball has just been passed up from one of the center backs to Senna, who has just played it over to Xavi. Note how close to the middle Iniesta and Silva are. In this shot, Villa has moved out wide. He did that rarely, though Torres did it on occasion. In the bottom frame, Xavi has just played a free kick to Ramos after a foul about 5 yards back from where he is. You can see that both Iniesta and Silva are on that side and there isn't a single Spaniard on the opposite side within 45 yards of goal.


Aragonés's 4-2-3-1


When Fàbregas came on for Villa, Spain played a pretty standard fluid 4-2-3-1. Cesc, Iniesta and Silva moved around a lot and could be anywhere from either flank to up next to Torres to in a 10 position in the hole to back in the central midfield alongside Xavi, who at times jumped up into the attack as well. Keeping with the width theme, this made them much more spread out in attack. When they attacked down the left, one of the midfielders would be where you would expect the left midfielder to be in a real 4-4-2 - maybe not right out by the touchline but at least as far out as the fullback on that side.

Here is a screenshot shortly after the substitution.



I may have Torres and Silva reversed; it's hard to tell in the video I have. Just before this, Fàbregas made a pass to Sergio Ramos from where Xavi is, Ramos took a couple touches forward and has played it back to Xavi. As you can see, Iniesta is in a much wider position than he was before. He could afford to be because Senna, Xavi and, in this case, Fàbregas dominated the midfield. Silva dropped back shortly after this as well.

The benefit of going with only one striker over the formation above with 2 is that they controlled the play even better and had more (some) width in the attacking third. They were also less predictable in attack than with two center forwards. The drawback was obviously that Silva, Iniesta and Fàbregas don't have the positioning or finishing skills of Villa and Torres so the conversion rate was lower on the chances they created.

Under Del Bosque things are different because they have used at least one dedicated wide player - usually on the left wing. That changes the equation, fixing the width issue but creating others. I will discuss the different formations Spain have used in qualifying in part 3.

World Cup Preview: Spain part 1

I'm back. I'll get back to my predictions and some of the old stuff later. I was happy with some of it, but was horribly wrong on other stuff - most notably in failing to predict the collapse of Bordeaux, Leverkusen and Juventus. Right now I want to get some World Cup previews out there while they can still be called previews. As you'll see I'm going to go for depth over breadth. I have a lot to write about Spain, so I'm covering them first.

Form or Recent Results

It's always tough to talk about form with national teams because they play so rarely together, but Spain are coming in off the best results of any side in the competition. The only blemish in the last 2 years is the Confederations Cup semifinal exit at the hands of the United States in a 0-2 upset. In qualification for the World Cup, Spain's group was about average for UEFA with Bosnia and Herzegovina, Turkey, Belgium, Estonia and Armenia. They completely dominated, with a perfect 10-0-0 record and a goal differential of +2.3 per match. Excluding stoppage time, they led for 437 minutes, spent 397 minutes even with their opponents and trailed for a total of just 66 minutes. The core of the current squad goes back to Euro 2008, which they won in impressive fashion.
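
As a quick check on the game-state numbers above, the three figures add up to 900 minutes, which is exactly ten 90-minute matches. A small sketch (the variable names are just for illustration) turns them into shares of playing time:

```python
# Minutes Spain spent in each game state during World Cup qualifying,
# excluding stoppage time (figures quoted in the paragraph above).
minutes = {"leading": 437, "level": 397, "trailing": 66}

total = sum(minutes.values())
# Sanity check: 10 qualifiers x 90 minutes each.
assert total == 10 * 90

# Share of total playing time spent in each state.
shares = {state: mins / total for state, mins in minutes.items()}
for state, share in shares.items():
    print(f"{state}: {share:.1%}")
```

So Spain were ahead for a little under half of the time and behind for only about 7% of it.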

In the European Championship, both qualifying and the finals, and World Cup qualifying Spain carry a record of 25-1-2 or 24-2-2 depending on whether you prefer to count the win over Italy on penalties as a win or draw. Either way it's quite impressive, as is their +1.67 goal differential per match. Obviously it's a different format and team strengths and so on, but to give you an idea, in the Primera, Serie A, English Premier League, Bundesliga and Ligue 1 the only clubs that had a higher average goal differential this season were Barcelona (+1.95 per match), Chelsea (+1.87) and Real Madrid (+1.76).

Squad

Spain will start every match with the second best midfield in the competition… on their bench. They have an unbelievable pool of ball-moving midfielders that allows them to dominate possession and control the action no matter whom they are up against. Unlike recent selecciones, they have two great wingers, Jesús Navas and Mata, available to come off the bench to change things up. Villa and Torres give them two world-class strikers and they have arguably the best keeper in the world in Casillas. Their defense is the obvious weakness, but that's because it's the only group of players that doesn't jump out as incredibly strong - all defenders in the squad are perfectly adequate. In any case, they don't have to do much defending because of the way they dominate possession. I will discuss specific players and how I see them, or more accurately would like to see them, lining up in the next article, which will be an in-depth discussion of their tactical issues.


The Draw


As I said in the post I made after the draw, in my view Spain were one of the biggest losers in the draw. This may seem shocking because they don't have an especially tough group. Chile did quite well in qualifying but are lacking in World Cup experience, Switzerland are one of the weaker sides from UEFA and Honduras are just happy to be there. This all ignores the most important factor - Spain's goal is not to get out of the group but to win the World Cup. Given that, the group is relatively unimportant. The important matches are in the knockout stage and that's where Spain has it tough.

Assuming they win the group, the Spaniards will face the second best out of Brazil, Côte d'Ivoire and Portugal, which is most likely to be the toughest second-place team to advance. In the quarterfinal, the most likely opponent is Italy. Considering overall quality, Italy should probably be around average for the last eight, but they match up very well with Spain - more on that later. Speculating on the semifinal opponent is ridiculously premature, but eyeballing the groups and bracket for the knockout rounds it looks like Spain would likely face the weakest or second-weakest semifinal team unless there is a big upset somewhere.

Good and Bad Matchups

From the last few years we have a pretty decent idea what it takes to beat Spain. The most obvious place to look is the loss to the United States in the Confederations Cup. The only other blemish in the last 2 years was Italy holding them scoreless for 120 minutes. Thinking outside the box, by far the most similar side to Spain is Barça. There is obviously a lot of overlap in the squads and the two play a pretty similar style. The sides that have had success against Spain and Barça have a lot of similar qualities. They are disciplined, well organized, and comfortable playing very defensive football if that is required. Something that needs to be said is that they also generally had a lot of luck, had a very high conversion rate on their own chances and some amazing play from their goalkeeper and center backs. It's not easy or even likely, but that seems to be the formula. In contrast, teams that are accustomed to open, attacking play and aren't the most solid defensively have tended to get destroyed. Examples include Russia at Euro 2008, Arsenal against Barça in the Champions League and Real Madrid during the 2008-2009 season.

In this competition, let's compare the Netherlands and Italy. These are two potential quarterfinal opponents if the group stage works out the right way. The betting markets favor the Dutch more heavily than Italy, and the Dutch were much more impressive in qualifying. If the 32 teams played out a league, I think the Dutch would be heavy favorites to finish above Italy. However, Spain would much prefer to play the Netherlands in the knockout round because they match up so much worse against Spain than Italy do.

The Group Stage

Looking at their group, the first opponent will be a good early test. Switzerland should be completely outmatched, but they are just the type of team that match up relatively well against Spain. Expect them to keep a lot of guys back. Honduras on the second matchday should present no problems whatsoever. Going into the third match, Spain should have already won the group so we should see mostly substitutes. That match is against the most interesting opponent because Chile play a fairly unique 3-3-1-3 formation.

The next article will discuss the issue of starting two strikers or one, going back to Euro 2008. Here is the link.

Wednesday, February 10, 2010

This Blog Is on Hold

If you've checked in regularly, you have probably noticed the lack of posts. I thought it proper to make an announcement and give an explanation as well as some parting thoughts in case things become more permanent. For now, I am putting on hold all work on the blog for at least the next couple months. The short of it is that I now need to put full effort into writing my dissertation so I can finally finish grad school. After moving back to my home state a year and a half ago, I have come back to school for this semester to get the work done and finish up. Both personally and professionally, the stakes are high and I need to produce.

As far as the blog itself, I'm quite happy with the way it has turned out. I didn't have an overwhelming number of readers, but far more than I expected given that I didn't do much to promote it. I was encouraged that traffic seemed to be growing up until early-to-mid January when I stopped posting frequently. Something else I liked was that I had readers from a variety of countries all around the world. I am confident that there is a market out there for a website with a more analytical approach than what you'd see at soccernet, goal.com or the website of your favorite European sports newspaper. The same goes for developing, or even just putting up, player and team stats. The audience for that will naturally be smaller than the mainstream stuff, but football being the most popular sport in the world I think there are easily enough thinking fans out there to support a full-blown website. I think the same would be true of a general book about the sport or even a detailed season preview each year like you see in other sports like baseball and American football. It's still just a pipe dream, and I'll be busy with other stuff for a while, but I still have some hope that in the next year or two I'll be able to expand from this blog to a full site and maybe work on the publishing stuff as well.

A stereotype of Americans is that we're overly obsessed with stats. While I could be accused of fitting into that, I would like to think I did a pretty good job of using them judiciously. The nice thing about stats is that they show you how useful they are if you put forth the effort to look. For example, I was surprised to discover that there is little relationship between the number of corners for each side in a match and the outcome. However, looking at a whole season, the numbers of corners won and conceded by a team are solid indicators of how well they did. Several times people whose opinions about football I highly respect told me that my articles on a league they follow closely were accurate and insightful. For some of these, I wrote the article without having seen any matches from that league, only a couple Champions League or Europa League matches with clubs from there. That is a very strong sign that stats can tell us a lot more than most fans think and that they provide a great deal of insight. I don't want to go too far. I'm aware of the challenges that football presents compared to other sports, especially those that are more popular in the US. I don't think it's possible to become an expert on the game without ever watching or playing it. However, it does seem clear that statistical analysis can improve your understanding of the game, no matter how much of an expert you are or how many matches you watch.

I won't be writing about football for at least the next couple months. For that matter, I won't be doing anything football related other than watching Sevilla (COPA DEL REY FINAL, WOOOT! PALOP QUE GRANDE ERES) when I can. While I lost steam the last couple months, I still have several ideas and topics that I want to write about. For example, I have some ideas related to +/- in other sports for trying to tease out a player's contribution to his team. Before deciding to come here for school I was also planning a several-article series on shooting stats, both player and team. While I am pleased with my work, there is obviously a hell of a lot more that could be done. Hopefully all goes well and I finish my work for grad school, find a job teaching and get back to writing articles here in my free time. As I said above, I have still not given up on the dream of building a proper website and maybe even publishing. I wouldn't say it's likely. It would take a lot of work and good fortune, but it is well within the realm of possibility.

Huge thanks go out to all you readers, especially those who gave me feedback here and elsewhere. I learned a lot about the game, data analysis and myself in the course of writing the 100 or so articles. I am not a great writer and the content of my work is far more technical than most anything written about the sport. Me being American but for some reason using a few British words and grammar rules probably made it tougher to read no matter where you're from or which version of English you learned as a second language. I very much appreciate you putting in the required extra effort. I hope you were able to get something out of this blog and that you'll come back if I start writing again in two months' time or further on up the road. Mil gracias.

If you wish to contact me, you can use the email address I set up for this blog. It's analyticalfootball at gmail dot com.


Until we meet again,

Jared

Friday, January 29, 2010

Half-Time Report: La Liga Española

The Spanish league has finally reached the halfway point.

Most Impressive

This is pretty obvious, and it's FC Barcelona. They are in my view the best team in the world and it's not close. While the league as a whole seems tougher, and their rivals in Madrid in particular appear to be, they keep rolling along. The Blaugrana are 5 points clear at the top of the points table and 9 goals better in goal differential. They are top two in pretty much every statistical category. In the less-important stats they are 5th best in foul differential (times fouled - fouls committed) and 2nd in corner differential. While they are second behind Real Madrid in shot differential, more importantly Barça are best at shot-on-target differential, averaging nearly four more shots on target than their opponents in each match. To make matters worse for the rest of the league, they have the highest ratio of goals to shots-on-target (shooting percentage) in the league to go along with having the most shots. They have been doing well defensively and in goal as well, with the second lowest shooting percentage against.

Last season at this stage they had 50 points and a GD of 46, so they are one point and 7 goals behind last year's pace. They did slow down a bit in the second half of last season; if they can keep up their current pace they will break last year's record of 87 points and a goal differential of +70. About the only bad thing you can say about them is that they won't win the treble this year as they were eliminated from the Copa del Rey by Sevilla, largely thanks to the heroics of Sevilla goalkeeper Andrés Palop.
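The pace claim above is simple extrapolation, and a minimal sketch makes the arithmetic explicit. It assumes the halfway point is 19 of 38 matches and that being "one point and 7 goals behind" last year's halfway totals puts Barça at 49 points and a GD of +39. The function name is mine for illustration and is not part of either model used on this blog.

```python
def project_full_season(total_so_far, matches_played, season_length=38):
    """Linearly extrapolate a running total to a full season."""
    return total_so_far * season_length / matches_played

# Halfway totals implied by the text (assumption: 19 matches played).
points_now = 50 - 1   # one point behind last year's 50
gd_now = 46 - 7       # seven goals behind last year's +46

projected_points = project_full_season(points_now, 19)
projected_gd = project_full_season(gd_now, 19)

# Last season's final marks quoted in the text.
record_points, record_gd = 87, 70

print(projected_points, projected_gd)
print(projected_points > record_points and projected_gd > record_gd)
```

Straight-line pace ignores schedule strength and any second-half slowdown, so treat the projection as a ceiling on what "keeping up the pace" means rather than a forecast.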

Biggest Disappointment

Here it has to be Atlético de Madrid. There has been much turmoil in the boardroom at the Calderón and that seems to have worked its way onto the pitch. Atleti last season managed to pip Villarreal and Valencia for the fourth spot and they qualified for the Champions League at the start of the current campaign. Unfortunately, they were a failure in that competition, having busted out fourth in their group, not even qualifying for the Europa League. In the league they have actually been decent over the last month or so and that has pulled them from the relegation zone up to midtable. They are still well out of the fight for a spot in Europe. Their best hope is in the Copa del Rey as they've made the semifinals.

Real Madrid

They don't fit either of the above, but I believe it's illegal to write an article on the Liga without spending time on Real Madrid. Actually, I think there is a lot of interesting stuff to say about them, so they'll get more space than Barça despite not being as good.

I have to give the Madridistas credit as things appear to have gone much more smoothly than I thought they would. They have pretty much sorted out the Raul situation: his role is as a sub or as cover for injured players, as he will likely be this weekend. They also have managed to play a reasonably balanced lineup, often featuring both Lass and Xabi Alonso in the midfield. Comparing this team to last year is like night and day, as it probably should be since they shelled out so much coin. Last season at this point they had 38 points and a goal differential of +14. Right now they are at 44 points and a GD of +30. Last season they averaged 2.18 goals per match; so far they are at 2.32 this season. I mentioned in the preview for the season that I thought their issues were on the defensive side of things and they have greatly improved there, going from 1.37 goals against per match last year to just 0.74 in the first half of this season. They have taken 0.8 more shots-on-target per match than last year and allowed their opponents 0.9 fewer. In rankings terms, they have the 5th fewest shots-on-target allowed, a drastic improvement from last year when they were only 13th best. Any season other than this one and the last they would be clear favorites at the top of the table with this level of play.

Something often brought up in the Spanish media is that Real Madrid are dependent on whoever their biggest star is. A few years ago it was Zidane and that has obviously now shifted to Ronaldo. I personally viewed this as fairly ridiculous given the surplus in attacking talent, but this year at least there seems to actually be evidence of this. The season thus far is a pretty good time to look at this because he has missed several matches due to the ankle injury he suffered playing for Portugal. He has played in 12 matches and missed 7 in the league. 7 is a small number but it at least gives something to work with; normally in the first half of a season the top players play nearly every match. Ronaldo missed the matches against Atletico de Madrid, Getafe, Racing, Sevilla, Sporting, Valencia and Valladolid. Those are tougher opponents on average than those he played against. While they got about the same number of points per match, 2.3, whether he played or not, the stats indicate that Real Madrid were far better when he was in the lineup. They average half a goal more per match when he plays and concede about two thirds of a goal per match fewer. When Ronaldo played they averaged 1.8 more shots on target per match and allowed their opponents an impressive 2.6 fewer shots-on-target per match. While he's far from a great defensive player this makes sense due to balance. If he makes them far more dangerous in attack, both when they are able to build it and on the counterattack, then that forces the opponents to be more defensive and you have fewer shots coming at Casillas.

The stats and results-only models both allow me to take into account the difference in opponent strength. Using them it is striking how much better Real Madrid were in the 12 matches featuring CR9 than the 7 matches without him. Using the stats model, Real Madrid with Ronaldo are just behind Barcelona with 88.6 expected points*. Without him they are 5th best with 65 expected points. Going off the results model, Real Madrid with Ronaldo would still be behind Barcelona but it would be close. They have an expected goal differential of 70 according to the model. Without Ronaldo that number drops to 39. Again I'll caution that the sample sizes here aren't large enough to say for sure, but going by that it seems that Ronaldo is worth about 31 goals in goal differential to the Merengues, which corresponds to about 20 league points. It'll be interesting to see how they fare in the next two matches, as he has been suspended for breaking Patrick Mtiliga's nose in their match against Malaga last week.
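For readers who want to see the with/without gaps side by side, here is a small sketch that just restates the quoted model outputs and takes differences. The variable names are mine, and the points-per-goal figure is only the ratio implied by the "31 goals, about 20 points" statement above, not a coefficient taken from either model.

```python
# Full-season figures quoted in the text.
expected_points_with = 88.6      # stats model, Ronaldo in the lineup
expected_points_without = 65.0   # stats model, Ronaldo out
expected_gd_with = 70            # results model, Ronaldo in the lineup
expected_gd_without = 39         # results model, Ronaldo out

points_swing = expected_points_with - expected_points_without
gd_swing = expected_gd_with - expected_gd_without

# Implied conversion rate if 31 goals of differential is worth about 20 points.
implied_points_per_goal = 20 / gd_swing

print(round(points_swing, 1), gd_swing, round(implied_points_per_goal, 2))
```

The two models give swings in the same neighborhood (roughly 24 points from the stats model against about 20 from the results model), which is some reassurance given that the split rests on only 12 and 7 matches.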

Luck

As I have said in previous articles, luck plays a role in a lot of ways and the two I write a lot about are the relationship between points and goal differential as well as performance in front of goal. The former appears to be essentially all luck, while converting shots into goals, or keeping your opponent from doing the same, are certainly reliant on both skill and luck. I'll write about this more in future.

When it comes to performance in close matches, which is really what getting a lot of points out of goal differential is all about, Deportivo, Tenerife and Athletic de Bilbao seem to be above expectation while Malaga is the lone unlucky standout. Going off the regression formula in my previous article on goal differential and points, the average team with Deportivo's goal differential would have 28 or 29 points, while they have 34. They currently sit fifth, so that is the difference between being in the thick of the race for a European spot and sitting midtable. Tenerife have been bad so far with a goal differential of -20. Going by past years, teams around there at this stage of the season usually have around 13 points so they are about 4 points closer to staying up than they should be. Athletic de Bilbao are similarly about 4 points over expectation. On the other end, Malaga are between 5 and 6 points below where they should be. Historically, teams with a goal differential of -5 at this point have had an average of nearly 23 points, 6 clear of the relegation zone, but they find themselves even on points with Tenerife in 18th. Last season, much as it pains me to say, Betis were very unlucky in this regard and got relegated largely because of it. Their fellow Andalusians could potentially go that same way.

In terms of performance in front of goal, there are a few teams near the top that have struggled at one end or the other. Sevilla, my favorite club, are 4th best according to the stats model but only 6th best in the table. That is largely due to only being 14th best at shooting percentage allowed. Perhaps I'm being optimistic, but I think that's mostly due to luck. Palop has been in good form lately and with him in goal last year Sevilla were 3rd best at shooting percentage against. Villarreal have similarly struggled - they are only 17th best at shooting percentage against compared to 8th last season. I believe largely because of that they are in 9th instead of a few spots up where the stats model puts them. Getafe are currently 7th in the table, just outside of the Europa League. The stats model puts them 5th. It seems their issues have been at the attacking end as they are 15th in shooting percentage. With them I'm far less convinced that it's bad luck though, as they were 19th in the same category last season.

In the relegation battle, the stats model rates Zaragoza 5 spots higher than their current position in 19th. They rate dead last when it comes to shooting percentage against, allowing their opponents to score on a dreadful 44% of their shots-on-target. They were promoted from the second division, so they almost certainly should be near the bottom, but with numbers that bad I'm sure they've been unlucky as well. On the other hand, Racing and Espanyol should be in the relegation zone according to the stats model. This surprised me since they are 7 and 16 goals better in goal differential than the best team in the bottom three. For Espanyol it appears to be a combination of a bit of good luck when it comes to close matches, they are about 2.5 points above expectation there, and they are doing a bit better at stopping shots than one would expect from a team in their position. They are 13th best though so it's not too extreme. Racing is a different story as they are 3rd best in shooting percentage and 9th best in shooting percentage allowed. A big factor there is the play of this year's big revelation Sergio Canales. The talented 18-year-old has scored 5 goals on just 6 shots on target. Otherwise their lineup does not feature guys you would expect to be near the top when it comes to putting away chances, so they'll likely cool down.

Predictions

Before going into new predictions, I want to just go over a few I made for the season just after the season started.

Firstly, I predicted that the top 6 would be the same as last year. While predicting the top 6 is never easy, it looks like I'm off there as Atleti don't look like climbing back in. I still think there's a good chance that both Sevilla and Villarreal move up, which would put 5 of the 6 in there. I predicted that Getafe would do much better than last season and that is clearly the case. Last year they ended even on points with Betis, who were relegated. Right now they are 7th, 3 points out of the Europa League. Halfway through the season they are only 12 points short of where they were at the end of last year. Osasuna I predicted would be in a safer position, as they finished just one point clear of relegation. They are currently 6 clear and looking significantly better than several teams below them. Last season Almeria finished 4 clear of relegation; I picked them for the drop or to at least be even closer this year than that. Right now they are only 1 point clear so it's up in the air. Finally, I picked Sporting Gijon to be relegated. They were incredibly fortunate not to be last year despite finishing 14th; they had the worst goal differential in the league at -32. They were particularly bad in defense, having conceded 79 goals. This season they have turned things around and are a legit midtable team. Overall I think I did pretty well with the predictions, but there were definitely a couple misses in there.

Here are the new predictions:

Barcelona will win the league. This shouldn't surprise anyone. Real Madrid still have some chance certainly, but Barcelona are better and have a five-point cushion. I don't expect any surprises here.

Real Madrid and Valencia will get the other two automatic Champions League spots. The way I see the league, and the models agree, there is Barcelona, then a fair drop down to Real Madrid, then a similar drop to Valencia. After that there is another gap to the next group of teams like Sevilla and Mallorca. Right now I think Real Madrid are close to a lock to finish in the top 2 and Valencia are almost there when it comes to the third spot. Right now they have a 6 point lead on Sevilla. If they can get a point in the Sanchez-Pizjuan this weekend they'll be in fantastic shape for that important third spot.

Sevilla will finish fourth. I'm a bit wary of making this prediction given that I'm biased and it goes in the direction of that bias. Right now Sevilla are just a point back of Mallorca and Mallorca are also 4 goals better in goal differential. Nonetheless, I think Sevilla have had well more than their fair share of injury problems and are better than the numbers indicate for the first half.

Mallorca and Villarreal edge out Depor and Getafe for the two Europa League spots. Right now Villarreal are sitting 9th, 7 points out of the Europa League. I think they'll claw their way up, jumping over Athletic de Bilbao who will drop down to the middle of the table. That leaves a four-club race for two spots which I think will come right down to the last week of the season. I give the edge to Mallorca and Villarreal, but I could see any of those four playing in Europe next year.

Tenerife will stay up, Almeria will drop. To be honest, I don't really believe this. As with the previous half-time reports, I'm trying to pick a team that is currently in the relegation zone to stay up and a team that is currently clear to drop. I think these are the most likely two sides to flip, but I think Tenerife are more likely to get relegated than Almeria.



*this is what they would average if they played a large number of seasons at the level shown by the stats and results from this year so far. This model appears to overestimate the chances of a result for bad teams against good teams so differences at the top are probably larger in reality.

Monday, January 18, 2010

EPL Halftime Report (long)

I meant to do this last week, but better late than never as they say.

Most Impressive

I suppose it's Chelsea. The blues are three clear in points, one ahead of Manchester United but with a match in hand, and two goals better in differential than Arsenal, who sit second in that category. I'm not sure if others share it but the feeling I have from watching the league is that the top clubs are less dominant this year than in the last few seasons. Just glancing at the table though the opposite appears to be true. Chelsea are on pace for a goal differential of around 60 this year, Arsenal a few less and Man United at around 50. Over the last several seasons the club with the best goal differential has had somewhere between 50 and 60 more goals than their opponents. It looks like the big clubs are cruising along just fine.

Going forward Chelsea's pace will likely slow down due to the knee injury suffered by Michael Essien. Essien is in my view not only the best and most important player at Chelsea, but he is probably the best player in the league. Looking at the season thus far, he has played in 14 of Chelsea's 21 Premier League matches. While the sample sizes of 14 and 7 are very small, comparing the blues' results with and without him in the lineup gives interesting results. When he played, Chelsea outscored their opponents by 1.79 goals per match. When he didn't, they only scored an average of 1.29 more than their opponents. 1.29 would be behind both Arsenal (1.52) and Manchester United (1.36). The most important team stat is shots-on-target differential (SOTD). In the matches where Essien played, Chelsea averaged 11.3 shots-on-target and 4.2 shots-on-target against for a SOTD of 7.1 per match. When he didn't play they slipped to 7.0 shots-on-target and 2.9 for their opponents for an average difference of 4.1. So when he played they were about half a goal and 3 shots on target better than when he didn't. Adjusting for the strength of opponents gave similar results. It'll be interesting to see how things actually shake out.
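The with/without comparison above can be reproduced directly from the quoted per-match figures. This is a minimal sketch with names of my own choosing; the match-weighted blend at the end is an extra sanity check I've added, not something taken from the models, and it ignores opponent strength.

```python
# Per-match figures quoted in the text for Chelsea's 21 league matches.
with_essien = {"matches": 14, "gd": 1.79, "sot_for": 11.3, "sot_against": 4.2}
without_essien = {"matches": 7, "gd": 1.29, "sot_for": 7.0, "sot_against": 2.9}

def sotd(split):
    """Shots-on-target differential per match for one split."""
    return split["sot_for"] - split["sot_against"]

sotd_with = round(sotd(with_essien), 1)
sotd_without = round(sotd(without_essien), 1)

# Gaps between the two splits.
gd_gap = round(with_essien["gd"] - without_essien["gd"], 2)
sotd_gap = round(sotd_with - sotd_without, 1)

# Match-weighted average of the two splits gives the season-to-date rate.
total_matches = with_essien["matches"] + without_essien["matches"]
season_gd_per_match = (
    with_essien["gd"] * with_essien["matches"]
    + without_essien["gd"] * without_essien["matches"]
) / total_matches

print(sotd_with, sotd_without, gd_gap, sotd_gap)
print(round(season_gd_per_match, 2), round(season_gd_per_match * 38, 1))
```

The blended rate works out to a little over 1.6 goals per match, which over 38 matches lands close to the "around 60" pace mentioned in the previous section, so the split figures hang together with the season totals.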

Most Disappointing

This just has to be Liverpool. Despite selling Xabi Alonso to Real Madrid in the summer, the reds had high hopes for the season. They finished second last year, with the highest goal differential, and their rivals in Manchester sold off the (second?) biggest star in the game. This season little has gone right and their dreams have certainly been tossed and blown. They find themselves out of the Champions League having picked up only one point against the two decent sides in their group, Fiorentina and Olympique Lyon. They lost to both at Anfield, something the commentators likely will ignore next time and every time in the future that they play at home in the Champions League. They are surely a dog to make it into Europe at all this year as Spurs, Man City and Villa all look pretty good and there are only two spots going to those four. It's hard to see Rafa Benitez lasting long; maybe a shakeup is what they need.

Luck

Normally I open the luck section of the report by discussing goal differential and how that is turned into points. Something I have admittedly glossed over with other leagues is injuries. For the English Premier League this season injuries have played a very large role for several clubs. Liverpool were without their three best players for several matches as Gerrard and Mascherano each missed 5 and Torres was out for 7. That may seem pretty bad until you consider Arsenal and Manchester United. Essentially every attacking player for the gunners and defensive player for United that most fans have heard of has been out for a significant number of matches. It is quite impressive that both of those clubs have the depth to get results despite all of that. At the top Chelsea had been in relatively good shape, though Essien had missed some matches with a hamstring injury. Unfortunately with a new and likely long-term injury to such an important player, I don't think we can consider them lucky at this point.

Moving on to goal differential, as I've said repeatedly (and will continue to do so) there is a very strong link between goal differential and points over a season and teams doing well or poorly in points compared to their goal differential seems to be only due to luck. This year there is one result that makes me question that a bit, which is Tottenham's 9-1 win over Wigan. The work I've done on goal differential essentially said that big results will, for the most part, even out over the course of a season. So a team's skill level is best represented by their goal differential but they could do well or poorly in close matches due to luck, and that would mean more or fewer league points. In the case of such a big win, I don't think this holds. Going up to a 4 or even 5 goal margin, I think the extra goals are informative; if two teams have otherwise identical records but A beat four different opponents by 4 goals each and B won those same matches by 5 then I think B is most likely better than A. In going from 6, 7 or 8 goals to 9, I don't think the same thing holds. In other words, I think the usual criticism of goal differential - that it overemphasizes big results - isn't all that great in normal circumstances but with that one extreme result it is valid for those two teams.

Looking at the league, the most unlucky team in this regard are Portsmouth. They are at the bottom of the table but are 15th in goal differential per match. Adjusting for schedule and applying the formula from a previous article on goal differential and points, they are running about 5 points below expectation. Next unluckiest are West Ham who are also running about 5 points low. To make matters worse for these clubs, their relegation rivals appear to be fortunate. Wigan rate as the luckiest. They are running at 7.5 above expectation leaving the huge loss to Spurs alone. Converting it into a 6-1 or even a 5-1 loss makes them still the most lucky with over 5 points more than the average team with their (adjusted) goal differential has had. Wolves, Hull, Blackburn and Burnley are all running between 4 and 5 points above expectation according to the model.

Looking near the top of the table there is nothing too extreme. In a shocking twist, Manchester United appear to have been unlucky despite what you might think due to a certain 6th-minute-of-stoppage-time goal and, well, them being Manchester United. Not to worry though, they are only a point or two below where they should be and there is a lot of season left for Alex Ferguson to turn on his luckbox. Chelsea are two or three points below expectation. Arsenal are the top club that is worst off when it comes to this with between 4 and 5 points fewer than they should have according to their goal differential. Combining those two, if the top three had as close to average luck as possible when it comes to goal differential and points then Chelsea would have 50 or 51 points, Arsenal 49 or 50 points and Manchester United 48 or 49 points. That's pretty similar to where we are now.

Efficiency

Another way I measure luck is to compare each club's spot on the table with where they are according to my stats-based model. Unlike the previous section, there is a major skill component. A big reason for this difference is efficiency in front of goal. I'm actually adjusting the model to account for shooting percentage and shooting percentage against and will probably have the new version ready next week.

The first club that jumps out is again Portsmouth. In my stats model they are actually 10th in the league! My first thought when I saw this is that there must be something seriously wrong with the model. That may be true, as I said I'm adjusting it, but there is good reason to think that Pompey are an average team instead of one of the worst. The main argument is that they are 7th best in both shot and shot-on-target differential. I was quite surprised to see that they have taken more shots than their opponents. They are averaging right around 1 more shot and shot-on-target per match than they allow. Given that they have been outscored by 14 goals in 20 matches, it's clear that something is going wrong in front of one or both goals. It is emphatically both. While it is likely the case that the shots they are taking are less dangerous and maybe they are allowing tremendous opportunities for their opponents, the Portsmouth midfielders shouldn't be too happy with their teammates. They are dead last in scoring percentage, putting in an impressively low 13.7% of their shots-on-target*. That is only about 60% of the league average which is 23.2%. At the other end they are only slightly better at second worst. They have allowed their opponents to score on 28.6% of the shots that make it on target. If shooting was about average for both sides in their matches, they would have a goal differential of +4, some 18 goals better than where they are now. To summarize their season thus far, Portsmouth have been put into such bad financial shape that they are struggling to pay their players, they've been both bad and very unlucky in front of goal at both ends and on top of that they are running really bad when it comes to getting league points for a team with their goal differential while their rivals in the relegation battle have been fortunate. Good times.
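To make the "average shooting at both ends" counterfactual concrete, here is a rough sketch. The three percentages are the ones quoted above. The shots-on-target counts are NOT from the data: they are illustrative values I back-solved so that they roughly reproduce the quoted goal differentials, and the function names are hypothetical.

```python
LEAGUE_AVG = 0.232       # league-average shooting percentage quoted in the text
POMPEY_FOR = 0.137       # Portsmouth's shooting percentage
POMPEY_AGAINST = 0.286   # shooting percentage Portsmouth have allowed

def goal_differential(sot_for, sot_against, pct_for, pct_against):
    """Goals scored minus goals conceded implied by shot counts and rates."""
    return sot_for * pct_for - sot_against * pct_against

# Illustrative shots-on-target totals over 20 matches (assumed, not sourced).
sot_for, sot_against = 127, 110

actual_gd = goal_differential(sot_for, sot_against, POMPEY_FOR, POMPEY_AGAINST)
average_gd = goal_differential(sot_for, sot_against, LEAGUE_AVG, LEAGUE_AVG)

# How far below the league norm Portsmouth's finishing sits.
relative_finishing = POMPEY_FOR / LEAGUE_AVG

print(round(actual_gd), round(average_gd), round(relative_finishing, 2))
```

With these inputs the actual figure comes out near -14 and the league-average counterfactual near +4, matching the 18-goal swing described above. The calculation treats every shot on target as equally dangerous, which is exactly the caveat raised in the paragraph about shot quality.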

Another interesting side is Wigan. They are 14th in the table, 12th if you go by points per match to account for the differences in number played. The model puts them 13th best so they are basically right around where they should be if you go by that. They have been nearly as bad as Portsmouth at both ends of the pitch. They are 18th best at converting shots-on-target into goals and the worst at shooting percentage against. Like Portsmouth, they have been some combination of very bad and really unlucky in front of both goals but unlike them Wigan have been fortunate in close matches. That all washes out and it looks like they are about where they should be. If they want to stay up though, they'll need better shooting and goalkeeping from Kirkland.

Stoke City are on the opposite side of the luck wall. They are the worst side in the Premiership when it comes to shot-on-target differential. It's not very close either. Stoke have allowed their opponent to take an average of 3.7 more shots-on-target than they do in a match compared to 2.9 for second-worst Hull. This is hardly new territory for the Potters. Last season they allowed their opponents 155 more shots-on-target than they got. Next worst was Middlesbrough whose opponents took 82 more shots than they did. That is a pretty ridiculous difference and if I only gave you that info you'd be wise to think that Stoke were relegated last year, probably bottom of the table. Both last year and this season thus far the problem has been at the attacking end. Last season Stoke took just 137 shots-on-target and no other team was below 200. This season they are on pace for 134 and the next worst, Hull, will have 181 if they keep up their current rate.

In 2008-2009 Stoke got by due to very good finishing. They had the best shooting percentage in the league. Given the very low number of shots I wonder if this might somehow be part of their strategy - if they are more patient than the other teams and wait for a good scoring opportunity to shoot then that would lead to this pattern. This season they haven't been quite as good but they are 6th best in the league. They are also fourth best at shooting percentage against. Considering that they don't have the talent of the top clubs, these numbers are better than what one would expect from them. I'm sure they have gotten lucky, especially in scoring the number of goals they have the last season and a half on so few shots, but maybe there is something more going on. I personally hope they stay up for at least another couple years because it'll provide more data to see if it is just variance or something deeper.

At the risk of boring the readers that support bigger clubs, I want to also mention Birmingham City. The blues are strong in 8th position, which is better than most would have given them credit for after being newly promoted to the top flight. They are 8th despite being 15th in shot-on-target differential because of luck and apparently good play by their young goalkeeper Joe Hart. They have allowed their opponents a goal on only 13.3% of shots-on-target, the best in the league. In just over half a season he's put up the kind of shot-stopping performance that should get Capello's attention. Maybe not though since David James has been bad in this regard for Portsmouth and is considered the England #1. He also wasn't great last year as Portsmouth finished with the 16th highest shooting percentage against. There is more to keeping than stopping shots, but David James doesn't exactly have a great reputation in those areas.

Moving to the big boys, nothing is far off from expected. The stats model has the top three in the same order the table does right now with Manchester United just a bit better than Arsenal. Chelsea have been good but not great at both ends. They are 7th best at converting shots into goals and 6th best at preventing their opponents from doing so. Arsenal have the best shooting percentage in the league, scoring on 34.4% of their shots-on-target. It seems that Almunia is the player Arsenal fans complain about the most and perhaps that's justified. They are 10th best at shooting percentage against. That's not terrible, but it's easily the worst of the top 3 clubs. Manchester United are 4th best at shooting and have allowed the 5th lowest shooting percentage.

United fans may be interested to know how their performance at the attacking end compares to last season now that they have 100% less Ronaldo. Firstly, they are on pace for 84 or 85 goals this season. Even if they slow down they will likely pass the 68 they scored last season. I should point out that scoring league wide appears to be up. I'm not sure why that is other than that shooting percentages seem higher so maybe the goalies aren't doing as well. Returning to Man U, they are averaging about a shot and a half more per match this season compared to last but they are behind in shots-on-target by that same margin. They are scoring on a much higher percentage of their shots-on-target: 26.3% this year compared to 18.1% last season. Because there is a decent amount of noise in shooting % (expect an article on this shortly), I would say that the data point to a club that is not quite as strong in attack as last season but not far off at all.

Predictions

It will remain a three-horse race for some time. I'm a bit nervous about this since I was flat out wrong in my prediction that Liverpool would still be around at this stage. Arsenal and Man United are working their way out of the injury problems they've had for much of the season. Chelsea, even without Essien, have looked at least as strong as those two and currently have a cushion. It's tough to see any of the three dropping out of the race. It seems likely that at least two of them will still be fighting the last couple weeks. At this point I think Chelsea remain favorites, though I'm not willing to stick my neck out. Any of them could win it.

Spurs, Man City and Liverpool will fight for the two remaining European spots until the final few weeks of the season. I'm a bit on the fence as far as putting Villa in with this group. I think they're easily the most likely of the four to fall off, though more chaos at Liverpool could prove me wrong there. Right now if I had to pick two I'd take them in the order they are now - Tottenham for the Champions League playoff spot and Man City in the Europa League. Liverpool are far from out of it but they are now four points behind and going the wrong way. I think the Reds are a dog to qualify for either competition at this point and are in serious trouble for the Champions League.

Portsmouth will stay up. I'm going to make this my bold pick. While it doesn't seem like a good idea to pick a team that is bottom of the table and in such a bad spot off the pitch, there is a good amount of evidence that suggests that Pompey aren't as bad as their table position. I think at the very least they will climb out of the cellar and fight for another year in the top flight until late in the season. If you disagree with me, and even I might, you aren't alone - Portsmouth are currently the team with the shortest odds in the relegation betting market.

Burnley will be relegated, Bolton will stay up. I'm going to leave the Portsmouth prediction out of it and make Burnley and Bolton my teams to flip. At around the quarter mark of the season, I predicted that Burnley would fall from the middle of the table which has happened. They started out strong with early wins over Manchester United and Everton, but have struggled overall. I like them to drop along with Hull and Wolves. Bolton aren't a great side and I think they will probably be in the fight all season, but they are better than those three.

Finally, I want to follow up on my other prediction from the quarter-time report, which was that West Ham would get out of the relegation zone and be well clear of it by the end of the season. So far so good. I think they'll keep moving up under new ownership. Other than picking Liverpool to stay in the title race, my predictions from then are looking good.


* again I'll say that this might be slightly off due to own goals (likely) not being counted as a shot-on-target. For simplicity of language I ignore this. Due to the rarity of an own goal and that they tend to come on good scoring chances I don't think it changes the implications of the analysis at all.

Thursday, January 14, 2010

Derbies: Can We Really Throw Out the Records?

One of the most common clichés in football is that in a derby the records of the teams don't matter. They are thought to be drastically more unpredictable matches in which anything can happen. Is the cliché accurate? Do results from the rest of the season really matter less in a derby?

Another question about them is how important home-ground advantage is. It seems that it could go either way. On one hand, the atmosphere is much more intense and hostile toward the away team. On the other, the travel is greatly diminished because, by definition, they are matches between clubs that are near each other.

Data and Methodology

Fortunately, these questions can both be answered by looking at actual results. To do so, I am using a relatively simple ordered-logit model that is similar to the one I used to look at results from Boxing Day and beyond. First, for each match, I calculate average goal differential for the home and away team in all their other matches. The main variable is the difference. For example, suppose for a match this is 0.5. This means that in all their other matches, the home team's average goal differential is half a goal better than the away side's. In a 20-team league, this would be a difference of 18 or 19 in goal differential. To check whether home-ground advantage is different in a derby I add what is called a dummy variable. It takes on a value of 0 if the match is not a derby and 1 if the match is. To check for the importance of other match results in a derby I include a variable that takes on a value of 0 if the match is not a derby and the difference in average goal differential in other matches if it is. I also added controls for different countries.
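As a rough illustration of the three variables described above, here is a minimal Python sketch. The function and variable names are my own and hypothetical, the toy numbers are made up, and the country controls are left out, so treat it as a sketch of the setup rather than the actual code behind the regression.

```python
def avg_gd_in_other_matches(season_gd, matches_played, this_match_gd):
    """Average goal differential per match, excluding the match being modelled."""
    return (season_gd - this_match_gd) / (matches_played - 1)


def derby_features(home_avg_gd, away_avg_gd, is_derby):
    """Return (strength_diff, derby, derby_x_diff) for one match."""
    # Main variable: home side's average goal differential minus the away side's.
    strength_diff = home_avg_gd - away_avg_gd
    # Dummy variable: 1 for a derby, 0 otherwise (shifts home-ground advantage).
    derby = 1 if is_derby else 0
    # Interaction: equals strength_diff in a derby and 0 otherwise
    # (tests whether records matter less in a derby).
    derby_x_diff = derby * strength_diff
    return strength_diff, derby, derby_x_diff


# Toy numbers (made up): 38-match season, home side won this match 1-0.
home_avg = avg_gd_in_other_matches(season_gd=10, matches_played=38, this_match_gd=1)
away_avg = avg_gd_in_other_matches(season_gd=-10, matches_played=38, this_match_gd=-1)

print(derby_features(home_avg, away_avg, is_derby=True))
print(derby_features(home_avg, away_avg, is_derby=False))
```

In the toy case the two sides are 18 goals apart over their other 37 matches, which works out to a difference of just under 0.5 per match.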

The benefit of this approach is that it is able to use aggregate results of other matches. Not only is that necessary to test whether the cliché is true, since it directly references results, but it gives a pretty accurate indicator of the relative strengths of the two teams. Not controlling for this would be even more problematic in a derby since the clubs involved aren't typical. Because they are usually from larger cities, in most of these rivalries one of the clubs is huge, a perennial favorite for the league title. The other tends to be mediocre for a top-flight club.

The data consist of every match for the last 10 seasons (1999-2000 through 2008-2009) in the English Premier League, Spanish Primera Division, Italian Serie A, French Ligue 1 and German Bundesliga 1. To determine which matches were derbies I simply went with those listed at footballderbies.com as either city derbies or local derbies. I left out rivalry matches such as Real Madrid - Barcelona because they do not have the geographical component. I feel pretty comfortable with what they listed for those with which I was familiar. There were 372 such matches.

Home-Ground Advantage in a Derby

I'll start with the advantage of playing at home. Due to the atmosphere and short traveling distance, it's not clear whether playing at home is more beneficial or less so in a derby. This question addresses not just derbies but home-ground advantage in general. It's clear that home sides have a big advantage but it's not clear why since both teams play on the same pitch under the same rules. There are several explanations that all likely play some role: the wear of travel, less familiarity with the pitch and surroundings, a push from the local crowd for the home team and perhaps the away team tightening up for the same reason, the ref being influenced by the crowd etc. In a derby, the familiarity bit is the same as for any other match, reasons based on the crowd are much stronger and those based on travel are much less present.

In the regression, if the coefficient is positive and significantly different from 0 that would indicate that home-ground advantage is bigger in a derby than a regular match. The opposite would be the case if it is negative and significantly different from 0. If the coefficient is close to zero then that would indicate that there is no evidence that a derby is different from a regular match when it comes to the benefit of playing at home. As it turns out the value was -0.322 with a standard deviation of 0.098, making it very strongly significant. From this it's clear that home-ground advantage is less important in a derby. I think it's safe to conclude that the burden of travel for the away team plays a far more important role in home teams having an edge than the difficulties that come from playing against a hostile crowd.

Because the model is linear in the log-odds rather than in the probabilities themselves, there is unfortunately not a direct linear connection between the coefficient and the result. In other words, I couldn't say something like "if it is a derby then the home team is 10% less likely to win". The reason for this is that the change depends on the relative strengths of the teams involved. For now I'll leave out the other question and give a graph of results showing only the difference in the advantage the home side gets in a derby compared to a regular match, leaving out any potential "throw the records out" effect where weaker teams perhaps do better than usual. This graph is for the English Premier League but it would look very similar for other leagues:
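To see why a single coefficient does not translate into a fixed percentage, here is a minimal Python sketch of how an ordered logit turns its linear predictor into home win, draw and away win probabilities. The two cutpoints are not the fitted values from the regression. I have backed them out from the even-match probabilities quoted later in this post (47.6% home win, 28.9% draw, 23.5% away win), and the eta values fed in are arbitrary points on the model's internal scale. Only the -0.322 derby coefficient comes from the results above.

```python
import math


def logistic(x):
    return 1.0 / (1.0 + math.exp(-x))


def logit(p):
    return math.log(p / (1.0 - p))


# Assumed cutpoints, backed out from the regular-match, equal-strength figures.
CUT_AWAY = logit(0.235)          # P(away win) = 23.5%
CUT_DRAW = logit(0.235 + 0.289)  # P(away win or draw) = 52.4%
DERBY_COEF = -0.322              # reported derby coefficient


def outcome_probs(eta):
    """Return (home win, draw, away win) for linear predictor eta."""
    p_away = logistic(CUT_AWAY - eta)
    p_away_or_draw = logistic(CUT_DRAW - eta)
    return 1.0 - p_away_or_draw, p_away_or_draw - p_away, p_away


# Same coefficient, different starting points: the change in the
# home-win probability is not constant.
for eta in (-1.5, 0.0, 1.5):
    regular_home = outcome_probs(eta)[0]
    derby_home = outcome_probs(eta + DERBY_COEF)[0]
    print(f"eta={eta:+.1f}: home win {regular_home:.3f} -> {derby_home:.3f} "
          f"(change {derby_home - regular_home:+.3f})")
```

The same -0.322 shift costs the home side the most when the teams are level and less when one side is a heavy favorite, which is why the effect has to be shown as a graph rather than quoted as one number.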



The horizontal axis gives the difference in average goal differential for all other matches between the home and away teams. To get an idea of scale, a difference of 2 would be a team near the top of the table playing a team near the bottom. So +2 would be the club near the top at home, -2 would be the one near the bottom playing in a familiar ground. A difference of 1 is about right for a match between a team near the middle of the table and one near the top or bottom. Obviously it varies year to year but that should give you a general idea. As you can see, the effect is largest when the two sides are closest together. The vertical axis is the expected (average) points that the home team would get from the match. The biggest difference in home-ground advantage between a derby and a regular match is when the away team is just a bit better, 4 goals or so on the season. That leads to a difference of 0.22 expected points per match. For most matches it is between 0.15 and 0.22 expected points. In extreme cases where one team is a lot better than the other the difference is about a tenth of a point in expectation.

To put this in perspective, let's compare the edge for the two types of matches when the teams are equally skilled. In this case, according to the model the home team would be expected to win in a regular match 47.6% of the time, the away team 23.5% and there is a 28.9% chance of a draw. This makes the expected points for the home side 1.716 and away 0.995 for a difference of 0.722 points. In a derby with two evenly matched opponents, according to the regression, there is a 39.7% chance of a home win, 29.8% of an away win and 30.5% of a draw. That makes the expected points 1.497 for the home team and 1.198 for the away team and a difference of 0.298. So going from a regular match to a derby reduces the home team's advantage from 0.722 points to 0.298 or by about 59%. That's a pretty big difference and much more than I was expecting.
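The arithmetic in the paragraph above can be checked with a few lines of Python. This is only a sketch using the rounded percentages quoted in the text and the usual 3 points for a win and 1 for a draw, so the third decimal place can differ slightly from the figures above, which presumably came from unrounded model output. The function names are my own.

```python
def expected_points(p_win, p_draw):
    """Expected league points: 3 for a win, 1 for a draw, 0 for a loss."""
    return 3.0 * p_win + 1.0 * p_draw


def home_edge(p_home, p_draw, p_away):
    """Home side's expected points minus the away side's."""
    return expected_points(p_home, p_draw) - expected_points(p_away, p_draw)


# Evenly matched teams, probabilities as quoted in the text.
regular_edge = home_edge(p_home=0.476, p_draw=0.289, p_away=0.235)
derby_edge = home_edge(p_home=0.397, p_draw=0.305, p_away=0.298)

# Share of the home side's advantage that disappears in a derby.
reduction = 1.0 - derby_edge / regular_edge

print(f"regular edge: {regular_edge:.3f}")
print(f"derby edge:   {derby_edge:.3f}")
print(f"reduction:    {reduction:.0%}")
```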

Can we really throw the records out?

So derbies are different in that the home side has less of an advantage. But can we really throw out other results when two rivals collide?

Not surprisingly, it isn't close to true. The coefficients for the difference in goal differential for all matches and for derbies are not anywhere near enough together in size for one to conclude that the strength of the two teams makes no difference. In fairness, people making this claim (hopefully) aren't being literal but instead are arguing that upsets are more likely in a derby than a regular match. Is that the case?

As it turns out there isn't evidence that this is true either. In the derby matches studied, inferior teams did get better than expected results but they were well within the range that could be chalked up to randomness. For the fellow nerds, the value of the coefficient was -0.062 with a standard deviation of 0.134. Assuming there is nothing special about derbies in terms of bad teams getting better results, there is just over a 64% chance that the outcomes are this extreme due to variance alone. Again, if derbies are as predictable as regular matches then it is more likely than not that the results would be similar to these or that the underdogs would do even better. The standard cutoff varies by discipline but in the social sciences it tends to be a 5% chance to assume that something is statistically different from 0. With a p-value of 64% the data present no evidence whatsoever that a club's record in other matches is any less predictive in derbies.
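For anyone who wants to check the 64% figure, here is a minimal sketch of the calculation. It assumes the usual normal approximation (a two-sided Wald test) and treats the "standard deviation" I quote as the standard error of the coefficient. The function name is my own.

```python
import math


def two_sided_p(coef, std_err):
    """Two-sided p-value for coef / std_err under a standard normal."""
    z = coef / std_err
    return math.erfc(abs(z) / math.sqrt(2.0))


# "Throw the records out" interaction term.
p_records = two_sided_p(-0.062, 0.134)

# Derby home-ground dummy from the earlier section, for comparison.
p_home = two_sided_p(-0.322, 0.098)

print(f"records interaction: p = {p_records:.3f}")
print(f"derby home dummy:    p = {p_home:.4f}")
```

The interaction term comes out at just over 0.64, while the home-ground dummy is well under the 5% cutoff.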

Let's step back for a second. Statistical significance is great for publishing in an academic journal, but this is a damn football blog. Suppose it is significant or, more accurately, nobody cares if it is or not. How big is the implied benefit to the bad team?

Not much. In the most extreme matches, the difference in goal differential between the best and worst clubs is usually around 90 goals for the leagues with 20 teams. This would correspond to about 2.35 per match. Let's go a bit more extreme to a nice round 2.5. In this extreme case, the model says that the home team in a derby will go from an 87.4% chance of winning down to 85.7%. The chance of a draw goes up from 8.7% in such a regular match to 9.9% in a derby. The horrible away side goes from a 3.9% chance in a regular match to 4.5% in a derby. So the dominant team's chances of winning go down by less than 2 percentage points and they are only about half a percentage point more likely to lose. This is a difference of only 0.04 expected points. Remember that this is the most extreme situation. Most seasons there aren't two teams with goal differentials that far off and I suspect that in recent years there hasn't been a derby that featured that kind of difference in quality. Despite all that, the worst team only does just a little bit better. So even if nerds like me were satisfied that it passed statistical tests, the difference between a derby and a regular match when it comes to the likelihood of an upset isn't actually significant.

Conclusion

To me, the most interesting thing is the first bit on home-ground advantage. I was surprised that the results indicate that once you account for the quality of the teams it is less important in a derby to be playing at home than a regular match. I thought it would be about the same or even a bit stronger. Perhaps this is projection because the atmosphere is very impressive and hostile and it's hard for those of us who aren't professional footballers to imagine being able to play well in front of 50,000 rabid fans that hate you for the shirt you're wearing. According to the last 10 years of results, it appears that the crowd doesn't matter nearly as much as the burden of traveling. I didn't find the second part very surprising. I think this is simply a case of people remembering big upsets and forgetting the others where the favorite won or just focusing on crazy stuff that happened on the pitch instead of the fact that the better team won.