Friday, August 28, 2009

The Poisson Model

I will frequently use what I call the Poisson model, particularly in the near future. The Poisson model takes as inputs the goals scored and goals conceded for each team, as well as the schedules, and spits out an attacking and defending factor for each team and, in the version I will use, a home-field-advantage factor. I can then plug these back into the Poisson formula for any pair of teams and it gives me the probability of each scoreline if they play against each other in a given location. In other words, it provides a way to translate current and/or past results into how likely each possible outcome is in potential future matches.
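As a rough sketch of that last step, here is how expected goals can be turned into scoreline probabilities. It is an illustration under stated assumptions, not necessarily the code behind the numbers on this blog: the function names are made up, the expected-goal values 1.6 and 1.1 are invented, and the two teams' goal counts are treated as independent of each other.

```python
import math


def poisson_pmf(goals, expected):
    """Probability of exactly `goals` goals when the expected number is `expected`."""
    return math.exp(-expected) * expected ** goals / math.factorial(goals)


def scoreline_probabilities(home_expected, away_expected, max_goals=10):
    """Map each (home_goals, away_goals) scoreline to its probability.

    Assumption for this sketch: the two teams' goal counts are independent,
    so the probability of a scoreline is the product of two Poisson terms.
    Scorelines with more than `max_goals` for either side are ignored.
    """
    return {
        (h, a): poisson_pmf(h, home_expected) * poisson_pmf(a, away_expected)
        for h in range(max_goals + 1)
        for a in range(max_goals + 1)
    }


def outcome_probabilities(grid):
    """Collapse a scoreline grid into (home win, draw, away win) probabilities."""
    home_win = sum(p for (h, a), p in grid.items() if h > a)
    draw = sum(p for (h, a), p in grid.items() if h == a)
    away_win = sum(p for (h, a), p in grid.items() if h < a)
    return home_win, draw, away_win


# Made-up expected goals for a home side and an away side.
grid = scoreline_probabilities(1.6, 1.1)
home_win, draw, away_win = outcome_probabilities(grid)
most_likely = max(grid, key=grid.get)

print("Most likely scoreline:", most_likely)
print("Home win / draw / away win:", home_win, draw, away_win)
```

Cutting the grid off at ten goals per side loses only a tiny sliver of probability for realistic expected-goal values, which is why the three outcome probabilities add up to very nearly one.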

The Poisson distribution is used in many fields to describe how often various events occur: how many typos appear on a page, how many atoms of radium decay in a given period of time, or how many accidents happen at an intersection in a month. In order to give my objective predictions and other analyses, I will assume that the number of goals each team scores in a match follows this distribution. I further assume that if A plays B at home, then the expected number of goals A scores against B is A’s attacking factor multiplied by B’s defending factor, times the home-field-advantage factor. B’s expected number of goals is B’s attacking factor multiplied by A’s defending factor, without the home-field-advantage bonus. If you aren’t familiar with probability and statistics, you can think of the expected number of goals as what the average would be if they played each other thousands of times in some parallel universe.

Using these assumptions and a technique called maximum likelihood estimation, which picks the values that make the observed results most probable, I derive estimates for the attacking, defending and home-field-advantage factors. Once those have been estimated, they can be plugged into the Poisson distribution formula to get the probability of each outcome if any two teams were to play each other. I will use these predictions not only to give previews each week for various leagues, but also to analyze how likely teams are to finish in certain spots in their league, avoid relegation, win the title and so forth.
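For the curious, here is one way the estimation could be carried out. Setting the derivative of the Poisson log-likelihood to zero for each factor gives a simple rule: a team's attacking factor is its goals scored divided by the sum of what it "faced" (opponents' defending factors, times the home factor in its home games), and similarly for the defending and home factors. Repeating those updates until they settle down is a standard way to reach the maximum likelihood estimates. The sketch below is a minimal illustration of that idea, not necessarily the procedure used for this blog; the function name, the team names and the results are all invented, and scaling the attacking factors to average 1 is a convention chosen here because only the product of attack and defence is pinned down by the data.

```python
def fit_poisson_model(matches, iterations=200):
    """Estimate attack, defence and home-advantage factors by maximum likelihood.

    `matches` is a list of (home_team, away_team, home_goals, away_goals).
    Assumes every team has scored and conceded at least one goal overall;
    otherwise a factor collapses to zero and a later division can fail.
    """
    teams = sorted({team for match in matches for team in match[:2]})
    attack = {t: 1.0 for t in teams}
    defence = {t: 1.0 for t in teams}
    home_adv = 1.0

    for _ in range(iterations):
        # Attack: goals scored divided by the defensive strength faced.
        for t in teams:
            scored, faced = 0, 0.0
            for home, away, hg, ag in matches:
                if home == t:
                    scored += hg
                    faced += defence[away] * home_adv
                elif away == t:
                    scored += ag
                    faced += defence[home]
            attack[t] = scored / faced

        # Defence: goals conceded divided by the attacking strength faced.
        for t in teams:
            conceded, faced = 0, 0.0
            for home, away, hg, ag in matches:
                if home == t:
                    conceded += ag
                    faced += attack[away]
                elif away == t:
                    conceded += hg
                    faced += attack[home] * home_adv
            defence[t] = conceded / faced

        # Home advantage: actual home goals over home goals expected without it.
        total_home_goals = sum(hg for _, _, hg, _ in matches)
        home_adv = total_home_goals / sum(
            attack[home] * defence[away] for home, away, _, _ in matches
        )

        # Convention for this sketch: rescale so the average attack factor is 1.
        # Attack-times-defence products, and so all predictions, are unchanged.
        scale = sum(attack.values()) / len(teams)
        attack = {t: v / scale for t, v in attack.items()}
        defence = {t: v * scale for t, v in defence.items()}

    return attack, defence, home_adv


# Invented results for three invented teams playing home and away.
matches = [
    ("Reds", "Blues", 2, 1),
    ("Reds", "Greens", 3, 0),
    ("Blues", "Reds", 1, 1),
    ("Blues", "Greens", 2, 0),
    ("Greens", "Reds", 0, 2),
    ("Greens", "Blues", 1, 1),
]
attack, defence, home_adv = fit_poisson_model(matches)

# Expected goals if the Reds host the Blues, following the formula above.
reds_expected = attack["Reds"] * defence["Blues"] * home_adv
blues_expected = attack["Blues"] * defence["Reds"]
print(reds_expected, blues_expected)
```

A handy sanity check on any fit of this kind is that the model's expected goals for each team, summed over its actual schedule, should match the goals it really scored. A general-purpose numerical optimizer would work just as well as these hand-rolled updates; the updates are used here only because they need no extra libraries.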

The Poisson model is not perfect. Without getting too technical, it makes a number of simplifying assumptions that almost no football fan would believe to be true. The main one is that the chance of a goal in any period of time, let’s say a minute, does not depend on the score or how much time is left in the game. In other words, it assumes that a team down 1-0 is as likely to score as it would be if the score were 0-0. Because teams change tactics based on the score, this is probably not the case. Also, in extreme situations such as a team being up 5-0, they will likely call off the dogs and be less likely to score than if they were only up 2-0. As a result, the model overestimates the likelihood of certain scorelines and underestimates the chances of others. I will be writing at least one article in the future taking an in-depth look at this.
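To see why the constant-chance-per-minute assumption and the Poisson distribution go hand in hand, here is a small sketch. It supposes, purely for illustration, that a team has the same small chance of scoring in each of 90 minutes, never scores twice in one minute, and that the minutes are unaffected by the score. Counting goals under those rules gives a binomial distribution, and the code compares it with the Poisson formula. The expected-goals value of 1.5 and the function names are made up.

```python
import math


def poisson_pmf(goals, expected):
    """Poisson probability of exactly `goals` goals."""
    return math.exp(-expected) * expected ** goals / math.factorial(goals)


def per_minute_pmf(goals, expected, minutes=90):
    """Probability of `goals` goals when each minute independently produces
    a goal with the same chance, expected / minutes (a binomial count)."""
    p = expected / minutes
    return math.comb(minutes, goals) * p ** goals * (1 - p) ** (minutes - goals)


expected = 1.5  # made-up expected goals for one team in one match

for k in range(6):
    print(k, per_minute_pmf(k, expected), poisson_pmf(k, expected))

# Largest difference between the two sets of probabilities for 0 to 7 goals.
largest_gap = max(
    abs(per_minute_pmf(k, expected) - poisson_pmf(k, expected)) for k in range(8)
)
print("Largest gap:", largest_gap)
```

The two columns agree closely, which is the point: if the per-minute chance really were constant, the Poisson formula would be nearly exact. Where real matches break that assumption, the Poisson probabilities drift from reality.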

Despite these problems, the Poisson model does a reasonable job of predicting match outcomes. A huge benefit is that it is simple. People have developed more complicated models that are dynamic. Applied to football, that means the probability of a goal would depend on how much time is left in the game, what the score is and perhaps when the last goal was scored and by whom. While these would be more accurate, they are quite cumbersome. Essentially, the Poisson model makes assumptions that aren’t quite valid but that drastically simplify things. It is easy enough to use that I can quickly apply it to any league or competition where data is available, should a reader want me to analyze some competition I haven’t discussed.

I am working on a better model, which would be more accurate and hopefully not much more complicated, but I do not have everything worked out yet. If and when that happens I’ll start using it, but in the meantime the Poisson model is a solid place to start.
