Thursday, January 14, 2010

Derbies: Can We Really Throw Out the Records?

One of the most common clichés in football is that in a derby the records of the teams don't matter. Derbies are thought to be drastically more unpredictable matches in which anything can happen. Is the cliché accurate? Do results from the rest of the season really matter less in a derby?

Another question about them is how important home-ground advantage is. It seems that it could go either way. On one hand, the atmosphere is much more intense and hostile toward the away team. On the other, the travel is greatly diminished because, by definition, they are matches between clubs that are near each other.

Data and Methodology

Fortunately, these questions can both be answered by looking at actual results. To do so, I am using a relatively simple ordered-logit model that is similar to the one I used to look at results from Boxing Day and beyond. First, for each match, I calculate the average goal differential for the home and away team in all their other matches. The main variable is the difference between the two. For example, suppose for a match this is 0.5. This means that in all their other matches, the home team's average goal differential is half a goal better than the away side's. In a 20-team league, where each club has 37 other matches, this would be a difference of 18 or 19 in goal differential. To check home-ground advantage I add what is called a dummy variable. It takes on a value of 0 if the match is not a derby and 1 if the match is. To check for the importance of other match results in a derby I include a variable that takes on a value of 0 if the match is not a derby and the difference in average goal differential in other matches if it is. I also added controls for different countries.
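For anyone who wants to see how those three variables could be built, here is a minimal Python sketch. It is not the code behind the results in this post: the match records, the field names and the function `build_features` are all hypothetical, it assumes "other matches" means other matches in the same league season, and it leaves out the country controls.

```python
from collections import defaultdict

def build_features(matches):
    """Build regression inputs for one league season (hypothetical record layout)."""
    # Season totals of goal differential and matches played for every team.
    total_gd = defaultdict(int)
    played = defaultdict(int)
    for m in matches:
        margin = m["home_goals"] - m["away_goals"]
        total_gd[m["home"]] += margin
        total_gd[m["away"]] -= margin
        played[m["home"]] += 1
        played[m["away"]] += 1

    rows = []
    for m in matches:
        margin = m["home_goals"] - m["away_goals"]
        # Average goal differential in all OTHER matches: take this match back out.
        # (Assumes every team has played at least two matches.)
        home_avg = (total_gd[m["home"]] - margin) / (played[m["home"]] - 1)
        away_avg = (total_gd[m["away"]] + margin) / (played[m["away"]] - 1)
        gd_diff = home_avg - away_avg
        derby = 1 if m["is_derby"] else 0
        rows.append({
            "gd_diff": gd_diff,                # main strength variable
            "derby": derby,                    # dummy: shifts home advantage in derbies
            "derby_x_gd_diff": derby * gd_diff,  # 0 unless the match is a derby
            "outcome": 2 if margin > 0 else (1 if margin == 0 else 0),  # home win / draw / away win
        })
    return rows

# Made-up mini season with three clubs; A and B are local rivals.
example = [
    {"home": "A", "away": "B", "home_goals": 2, "away_goals": 0, "is_derby": True},
    {"home": "B", "away": "A", "home_goals": 1, "away_goals": 1, "is_derby": True},
    {"home": "A", "away": "C", "home_goals": 3, "away_goals": 1, "is_derby": False},
    {"home": "C", "away": "A", "home_goals": 0, "away_goals": 0, "is_derby": False},
    {"home": "B", "away": "C", "home_goals": 1, "away_goals": 0, "is_derby": False},
    {"home": "C", "away": "B", "home_goals": 2, "away_goals": 2, "is_derby": False},
]
rows = build_features(example)
for row in rows:
    print(row)
```

The `outcome` column is the ordered dependent variable, and the three remaining columns are what an ordered-logit routine would take as regressors.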

The benefit of this approach is that it is able to use aggregate results of other matches. Not only is that necessary to test whether the cliché is true, since it directly references results, but it gives a pretty accurate indicator of the relative strengths of the two teams. Not controlling for this would be even more problematic in a derby since the clubs involved aren't typical. Because they are usually from larger cities, in most of these rivalries one of the clubs is huge, a perennial favorite for the league title. The other tends to be mediocre for a top-flight club.

The data consist of every match for the last 10 seasons (1999-2000 through 2008-2009) in the English Premier League, Spanish Primera Division, Italian Serie A, French Ligue 1 and German Bundesliga 1. To determine which matches were derbies I simply went with those listed at as either city derbies or local derbies. I left out rivalry matches such as Real Madrid - Barcelona because they do not have the geographical component. I feel pretty comfortable with what they listed for those with which I was familiar. There were 372 such matches.

Home-Ground Advantage in a Derby

I'll start with the advantage of playing at home. Due to the atmosphere and short traveling distance, it's not clear whether playing at home is more beneficial or less so in a derby. This question addresses not just derbies but home-ground advantage in general. It's clear that home sides have a big advantage but it's not clear why since both teams play on the same pitch under the same rules. There are several explanations that all likely play some role: the wear of travel, less familiarity with the pitch and surroundings, a push from the local crowd for the home team and perhaps the away team tightening up for the same reason, the ref being influenced by the crowd etc. In a derby, the familiarity bit is the same as for any other match, reasons based on the crowd are much stronger and those based on travel are much less present.

In the regression, if the coefficient is positive and significantly different from 0 that would indicate that home-ground advantage is bigger in a derby than in a regular match. The opposite would be the case if it is negative and significantly different from 0. If the coefficient is close to zero then that would indicate that there is no evidence that a derby is different from a regular match when it comes to the benefit of playing at home. As it turns out the value was -0.322 with a standard deviation of 0.098 making it very strongly significant. From this it's clear that home-ground advantage is less important in a derby. I think it's safe to conclude that the burden of travel for the away team plays a far more important role in home teams having an edge than the difficulties that come from playing against a hostile crowd.

Because the model is linear in the log-odds of the outcomes rather than in their probabilities, there is unfortunately not a direct linear connection between the coefficient and the result. In other words, I couldn't say something like "if it is a derby then the home team is 10% less likely to win". The reason for this is that the change depends on the relative strengths of the teams involved. For now I'll leave out the other question and give a graph of results showing only the difference in the advantage the home side gets in a derby compared to a regular match, leaving out any potential "throw the records out" effect where weaker teams perhaps do better than usual. This graph is for the English Premier League but it would look very similar for other leagues:
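To make the "it depends on the teams" point concrete, here is a small Python sketch. It assumes the usual reading of an ordered logit, in which the derby coefficient of -0.322 is a fixed shift in the log-odds of a home win. The three baseline probabilities are made up for illustration and are not model output.

```python
import math

DERBY_COEF = -0.322  # derby dummy coefficient reported in this post

def shifted_home_win_prob(baseline, coef=DERBY_COEF):
    """Shift the log-odds of a home win by `coef` and convert back to a probability."""
    log_odds = math.log(baseline / (1 - baseline))
    return 1 / (1 + math.exp(-(log_odds + coef)))

# Hypothetical regular-match home-win probabilities: weak, even and strong home sides.
changes = {}
for baseline in (0.2, 0.5, 0.8):
    derby = shifted_home_win_prob(baseline)
    changes[baseline] = derby - baseline
    print(f"regular {baseline:.0%} -> derby {derby:.1%} ({changes[baseline] * 100:+.1f} points)")
```

The same coefficient costs the home side the most when the match is close to a coin flip and less when one side is a heavy favorite, which is why no single "X% less likely" figure exists.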

The horizontal axis gives the difference in average goal differential for all other matches between the home and away teams. To get an idea of scale, a difference of 2 would be a team near the top of the table playing a team near the bottom. So +2 would be the club near the top at home, -2 would be the one near the bottom playing in a familiar ground. A difference of 1 is about right for a match between a team near the middle of the table and one near the top or bottom. Obviously it varies year to year but that should give you a general idea. As you can see, the effect is largest when the two sides are closest together. The vertical axis is the expected (average) points that the home team would get from the match. The biggest difference in home-ground advantage between a derby and a regular match is when the away team is just a bit better, 4 goals or so on the season. That leads to a difference of 0.22 expected points per match. For most matches it is between 0.15 and 0.22 expected points. In extreme cases where one team is a lot better than the other the difference is about a tenth of a point in expectation.

To put this in perspective, let's compare the edge for the two types of matches when the teams are equally skilled. In this case, according to the model the home team would be expected to win in a regular match 47.6% of the time, the away team 23.5% and there is a 28.9% chance of a draw. This makes the expected points for the home side 1.716 and away 0.995 for a difference of 0.722 points. In a derby with two evenly matched opponents, according to the regression, there is a 39.7% chance of a home win, 29.8% of an away win and 30.5% of a draw. That makes the expected points 1.497 for the home team and 1.198 for the away team and a difference of 0.298. So going from a regular match to a derby reduces the home team's advantage from 0.722 points to 0.298 or by about 59%. That's a pretty big difference and much more than I was expecting.
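The arithmetic in that paragraph can be checked with a short Python sketch. The regular-match probabilities and the -0.322 coefficient are the ones reported here. Treating the coefficient as a plain shift in the cumulative log-odds is the textbook proportional-odds reading and is my assumption about the exact specification. The function names are illustrative.

```python
import math

def logit(p):
    return math.log(p / (1 - p))

def logistic(x):
    return 1 / (1 + math.exp(-x))

def shift_outcome_probs(p_home, p_draw, p_away, shift):
    """Add `shift` to the home side's latent index in an ordered logit."""
    # Cumulative probabilities with outcomes ordered away win < draw < home win.
    cum_away = p_away
    cum_draw = p_away + p_draw
    # A lower index for the home side raises each cumulative log-odds by -shift.
    new_cum_away = logistic(logit(cum_away) - shift)
    new_cum_draw = logistic(logit(cum_draw) - shift)
    return 1 - new_cum_draw, new_cum_draw - new_cum_away, new_cum_away

def expected_points(p_win, p_draw):
    # Three points for a win, one for a draw.
    return 3 * p_win + p_draw

def home_edge(p_home, p_draw, p_away):
    return expected_points(p_home, p_draw) - expected_points(p_away, p_draw)

regular = (0.476, 0.289, 0.235)  # home win, draw, away win for evenly matched teams
derby = shift_outcome_probs(*regular, shift=-0.322)
reduction = 1 - home_edge(*derby) / home_edge(*regular)

print("derby probabilities:", [round(p, 3) for p in derby])
print("regular edge:", round(home_edge(*regular), 3))
print("derby edge:", round(home_edge(*derby), 3))
print("reduction in home edge:", round(reduction, 3))
```

Because the inputs are rounded to three decimals, the output agrees with the figures in the text only to within rounding.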

Can we really throw the records out?

So derbies are different in that the home side has less of an advantage. But can we really throw out other results when two rivals collide?

Not surprisingly, it isn't close to true. For the records to be irrelevant in a derby, the coefficient for the difference in goal differential in derbies would have to cancel out the one for all matches, and the two are nowhere near close enough in size for one to conclude that the strength of the two teams makes no difference. In fairness, people making this claim (hopefully) aren't being literal but instead are arguing that upsets are more likely in a derby than a regular match. Is that the case?

As it turns out there isn't evidence that this is true either. In the derby matches studied, inferior teams did get better than expected results but they were well within the range that could be chalked up to randomness. For the fellow nerds, the value of the coefficient was -0.062 with a standard deviation of 0.134. Assuming there is nothing special about derbies in terms of bad teams getting better results, there is just over a 64% chance that the outcomes are this extreme due to variance alone. Again, if derbies are as predictable as regular matches then it is more likely than not that the results would be similar to these or that the underdogs would do even better. The standard cutoff varies by discipline but in the social sciences it tends to be a 5% chance to assume that something is statistically different from 0. With a p-value of 64% the data present no evidence whatsoever that a club's record in other matches is any less predictive in derbies.
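For the fellow nerds who want to reproduce that 64%, here is a minimal Python sketch. It assumes the p-value comes from a two-sided test under a normal approximation (coefficient divided by its standard deviation), which may differ slightly from what the statistical software reports. The function name is illustrative.

```python
import math

def two_sided_p_value(coef, std_err):
    """Two-sided p-value for coef / std_err under a standard normal approximation."""
    z = coef / std_err
    # For a standard normal Z, P(|Z| > |z|) equals erfc(|z| / sqrt(2)).
    return math.erfc(abs(z) / math.sqrt(2))

p_records = two_sided_p_value(-0.062, 0.134)  # "throw out the records" coefficient
p_home = two_sided_p_value(-0.322, 0.098)     # derby home-ground coefficient

print(f"records effect: p = {p_records:.3f}")
print(f"home-ground effect: p = {p_home:.4f}")
```

The first value lands just above 0.64, while the second is far below the 5% cutoff, in line with the two conclusions drawn in this post.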

Let's step back for a second. Statistical significance is great for publishing in an academic journal, but this is a damn football blog. Suppose it is significant or, more accurately, nobody cares if it is or not. How big is the implied benefit to the bad team?

Not much. In the most extreme matches, the difference in goal differential between the best and worst clubs is usually around 90 goals for the leagues with 20 teams. This would correspond to about 2.35 per match. Let's go a bit more extreme to a nice round 2.5. In this extreme case, the model says that the home team in a derby will go from an 87.4% chance of winning down to 85.7%. The chance of a draw goes up from 8.7% in such a regular match to 9.9% in a derby. The horrible away side goes from a 3.9% chance in a regular match to 4.5% in a derby. So the dominant team's chances of winning go down by less than 2 percentage points and they are only about half a percentage point more likely to lose. This is a difference of only 0.04 expected points. Remember that this is the most extreme situation. Most seasons there aren't two teams with goal differentials that far off and I suspect that in recent years there hasn't been a derby that featured that kind of difference in quality. Despite all that, the worst team only does just a little bit better. So even if nerds like me were satisfied that it passed statistical tests, the difference between a derby and a regular match when it comes to the likelihood of an upset isn't actually significant.


To me, the most interesting thing is the first bit on home-ground advantage. I was surprised that the results indicate that once you account for the quality of the teams it is less important in a derby to be playing at home than a regular match. I thought it would be about the same or even a bit stronger. Perhaps this is projection because the atmosphere is very impressive and hostile and it's hard for those of us who aren't professional footballers to imagine being able to play well in front of 50,000 rabid fans that hate you for the shirt you're wearing. According to the last 10 years of results, it appears that the crowd doesn't matter nearly as much as the burden of traveling. I didn't find the second part very surprising. I think this is simply a case of people remembering big upsets and forgetting the others where the favorite won or just focusing on crazy stuff that happened on the pitch instead of the fact that the better team won.
