Monday, November 9, 2009

New Rankings and Predictions System

I've been working on a new model which I think helps with some of the problems the Poisson model has. It is based on work done at Smart Football Rankings, an effort to develop a rankings and prediction system for college (American) football. I'll call this the averages model.

The idea is this: instead of making assumptions about the distribution of goals, let's just look at how well each team does at scoring and conceding goals compared to its opponents. There are two stages. In the first, I calculate a scoring factor and a defensive factor for each team. For each match I take the difference between the team's goals for and the average number of goals the opponent has conceded in its matches against other teams. (The same is done with goals against, but I'll just talk about the scoring half here.) To account for home-ground advantage, I adjust that average up or down depending on whether the team is at home or away. The adjustment is simply the difference between the average goals scored by home teams only and the league average goals.

For example, Manchester United had given up 1 goal per match before facing Chelsea. Home teams on average score roughly 0.29 more goals per match than the league average. Chelsea, playing at home, scored one goal, so their goals-for score from the match with Manchester United is 1 - (1 + 0.29) = -0.29. For Manchester United, Chelsea had conceded 8 goals in 11 matches for an average of 0.727 goals per match. Since United were playing away and failed to score, they get 0 - (0.727 - 0.29) = -0.437. In the first step, this is done for every team in each match. A team's scoring factor is then the average of these scores over all matches played.
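The step-one arithmetic can be sketched in a few lines of Python. This is only an illustration using the figures from the example above; the function names (`step_one_score`, `scoring_factor`) and the way the home adjustment is passed in are my own assumptions, not code from the actual model.

```python
def step_one_score(goals_scored, opp_avg_conceded, home_adjustment, at_home):
    """Goals-for score for one match (hypothetical helper).

    Actual goals minus the opponent's average goals conceded against
    other teams, shifted up for a home side and down for an away side.
    """
    shift = home_adjustment if at_home else -home_adjustment
    return goals_scored - (opp_avg_conceded + shift)


def scoring_factor(match_scores):
    """A team's scoring factor: the plain average of its per-match scores."""
    return sum(match_scores) / len(match_scores)


# Figures from the Chelsea v Manchester United example.
HOME_ADJUSTMENT = 0.29

chelsea = step_one_score(1, 1.0, HOME_ADJUSTMENT, at_home=True)
united = step_one_score(0, 8 / 11, HOME_ADJUSTMENT, at_home=False)

print(round(chelsea, 3))  # -0.29
print(round(united, 3))   # -0.437
```

Both sides come out negative here: each scored fewer goals than the venue-adjusted average their opponent had been conceding.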

The second step adds a level. The first step compares your team's scoring to that of the average opponent of your opponent. If the teams you play have played an easy schedule themselves, then your extra goals look less impressive. A way to control for this is, instead of using average goals against, to use the scores calculated in step one along with the league-wide average goals for and against. Getting back to the Chelsea - Manchester United example, Manchester United's defensive score from the first step is -0.689. In other words, they've given up about 2/3 of a goal per match less than their opponents have scored against other teams. The league average is 1.52 goals per match, and home teams average 1.80. Chelsea scored 1 goal against United, so they get 1 - (-0.689 + 1.80) = -0.111. Chelsea's defensive score from step one is -0.83 and away teams average 1.23 goals per match, so Manchester United get 0 - (-0.83 + 1.23) = -0.4. Each team's scoring and defensive factor is the average of these over all matches played.
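The second step follows the same pattern, with the opponent's step-one defensive score plus the venue average standing in for the raw goals-against average. A minimal sketch, again with the numbers from the example; `step_two_score` and the constant names are hypothetical.

```python
def step_two_score(goals_scored, opp_defensive_score, venue_avg_goals):
    """Goals-for score for one match in the second step (hypothetical helper).

    opp_defensive_score: the opponent's defensive score from step one.
    venue_avg_goals: league average goals for home sides if the team
    was at home, or for away sides if it was away.
    """
    return goals_scored - (opp_defensive_score + venue_avg_goals)


# League averages quoted in the post.
HOME_AVG = 1.80
AWAY_AVG = 1.23

chelsea = step_two_score(1, -0.689, HOME_AVG)  # Chelsea at home
united = step_two_score(0, -0.83, AWAY_AVG)    # United away

print(round(chelsea, 3))  # -0.111
print(round(united, 3))   # -0.4
```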

There are two reasons I like this system better than the Poisson. The first is that it has gotten better results in some testing. I've done something very similar to the PLM, using ordered logit on the expected goals for each team according to the model, and its predictions have been better. If there is interest I'll post more detailed work on that, but in short I used two different scoring systems. One just looks at the squared difference between the predicted probabilities and what actually happened, assigning 1 to the outcome (home win, away win, draw) that happened and 0 to the others. The other was a betting system where I looked at how much the PLM would make betting at odds given by the estimates of the averages model, and the other way around. The averages model outperformed the PLM in both of these tests. Beyond these tests, I think the rankings make more sense when I look at where it rates teams.
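The first scoring system can be sketched as below. This is my reading of the description (sum the squared differences over the three outcomes, lower is better), and the probabilities are invented purely for illustration, not output from either model.

```python
OUTCOMES = ("home", "draw", "away")


def squared_error(probs, actual):
    """Sum of squared differences between predicted probabilities and
    the result, where the actual outcome counts as 1 and the rest as 0."""
    return sum(
        (probs[outcome] - (1.0 if outcome == actual else 0.0)) ** 2
        for outcome in OUTCOMES
    )


# Hypothetical prediction for a single match that ended in a home win.
prediction = {"home": 0.5, "draw": 0.3, "away": 0.2}
score = squared_error(prediction, "home")

print(round(score, 2))  # 0.38
```

Averaging this over many matches gives one number per model, which is how two sets of predictions could be compared.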

The second reason I like the averages model better is that I think using sums is more accurate than using products. In the Poisson model, home-ground advantage is given by multiplying the expected goals for the home team by a number between 1.1 and 1.5 for most competitions. Similarly, the expected goals in a match for a team is their scoring factor times the opponent's defensive factor. A result of that is that high-scoring teams are more sensitive to the quality of their opponents and to playing at home than low-scoring teams. This doesn't seem to be reflected in reality. I'll write more on this later, but for better teams playing at home tends to be a bit less important. The averages model assumes it is equally important for all teams, so that's an improvement. Note that other than including home-ground advantage, a major difference between my approach and the methodology used by Smart Football Rankings is that they actually use products instead of sums.
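A toy comparison shows the difference between the two approaches. The multiplier of 1.3 is just a value picked from the 1.1 to 1.5 range mentioned above, the 0.29 bump is the home adjustment from the earlier example, and the base expected goals of 1.0 and 2.0 are made up.

```python
def multiplicative_home(expected_goals, factor=1.3):
    """Product-style home advantage: scale expected goals by a factor."""
    return expected_goals * factor


def additive_home(expected_goals, bump=0.29):
    """Sum-style home advantage: add a fixed number of goals."""
    return expected_goals + bump


# Compare a low-scoring and a high-scoring side (illustrative values).
for base in (1.0, 2.0):
    mult_gain = multiplicative_home(base) - base
    add_gain = additive_home(base) - base
    print(f"base {base:.1f}: product adds {mult_gain:.2f}, sum adds {add_gain:.2f}")
```

With products the high-scoring side gains twice as many goals from playing at home as the low-scoring side, while with sums both gain the same amount.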

For at least the next few weeks I'll include both Poisson/PLM and the averages model when giving rankings and predictions. Because I believe the averages model, and its logit, to be superior I'm using it as my main model until I work out a better one.
