Stats: MLS Offensive Rates at Different Scorelines
Last week local soccer stats blogger Zach Slaton wrote an article on some work he did that showed that single-game shot differential in the EPL is actually negatively correlated with match outcomes. Given the work we've done here related to evaluating teams based on shot differential, this is obviously of some interest. Of course data from one league can't necessarily be applied to another, and this especially applies to MLS, which exhibits far more parity than any other major domestic league. But I wanted to take his findings and do some investigation on MLS numbers.
The most sensible theory for why shots might be negatively correlated with points is not that they're causative (i.e. that shots are bad to take) but that teams that are behind (and thus most likely to lose points) take a lot of shots to try to get back into the match. This is (as a commenter on Zach's site pointed out and as immediately occurred to me as well) much like running stats in the NFL. It's pretty well known that winning teams in the NFL rush a lot and accumulate a lot of rushing yards. The traditional interpretation of this is that having a good running game is more important than a good passing game, but in fact it's mostly just a result of the fact that teams rush a lot more when they have a lead (to bleed the clock down) and so teams that are winning will accumulate rushing yards.
So one way of examining this would be to check how offensive rate stats change as the scoreline changes. If our theory of the mechanism at work is correct, then we should see accuracy rates decrease as teams get further behind (and therefore, in theory, take worse shots). If this is the case, we can refine our shot-based analytics to take the score into account when crediting players for shots. For example, you could give less credit to a player for being on the field when his team generates a shot while behind, since those shots are theoretically of lower quality.
So here are our standard offensive rate stats from the 2010 MLS season, split by scoreline. Note that I clustered leads of 3 goals or more into a single category, since there were only 111 minutes last season in which a team had a lead of 4 goals or more, and the sample size at that point would be too small. (Interesting fact: Seattle was the only team to have both a 4-goal lead and a 4-goal deficit at some point last season. Thanks, LA Galaxy and Columbus Crew.)
Stats Glossary
- Shot Rate - Shots / Minutes
- Accuracy - Shots on Goal / Shots
- Conversion - Goals / Shots on Goal
- Strike Rate - Goals / Shots (or Accuracy * Conversion)
| Score Diff | Minutes | Shots | SOG | Goals | ShotRate | Accuracy | Conversion | StrikeRate |
|---|---|---|---|---|---|---|---|---|
| +3+ | 790 | 66 | 34 | 8 | 0.084 | 0.515 | 0.235 | 0.121 |
| +2 | 2434 | 278 | 126 | 36 | 0.114 | 0.453 | 0.286 | 0.129 |
| +1 | 8250 | 967 | 407 | 96 | 0.117 | 0.421 | 0.236 | 0.099 |
| Even | 21814 | 2281 | 1050 | 286 | 0.105 | 0.460 | 0.272 | 0.125 |
| -1 | 8250 | 731 | 353 | 118 | 0.089 | 0.483 | 0.334 | 0.161 |
| -2 | 2434 | 235 | 107 | 38 | 0.097 | 0.455 | 0.355 | 0.162 |
| -3+ | 790 | 76 | 38 | 8 | 0.096 | 0.500 | 0.211 | 0.105 |
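The rate columns follow directly from the raw counts and the glossary definitions. Here is a minimal Python sketch that recomputes them. The `rates` function and the `COUNTS` layout are names made up for illustration, not part of any library.

```python
# Raw counts copied from the table: (minutes, shots, shots on goal, goals).
COUNTS = {
    "+3+": (790, 66, 34, 8),
    "+2": (2434, 278, 126, 36),
    "+1": (8250, 967, 407, 96),
    "Even": (21814, 2281, 1050, 286),
    "-1": (8250, 731, 353, 118),
    "-2": (2434, 235, 107, 38),
    "-3+": (790, 76, 38, 8),
}


def rates(minutes, shots, sog, goals):
    """Return the four glossary stats for one scoreline bucket."""
    return {
        "shot_rate": shots / minutes,    # Shots / Minutes
        "accuracy": sog / shots,         # Shots on Goal / Shots
        "conversion": goals / sog,       # Goals / Shots on Goal
        "strike_rate": goals / shots,    # Goals / Shots
    }


for diff, row in COUNTS.items():
    stats = rates(*row)
    line = "  ".join(f"{name}={value:.3f}" for name, value in stats.items())
    print(f"{diff:>4}  {line}")
```

Since Strike Rate is just Accuracy times Conversion, the last column is a handy sanity check on the other two.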
So if our previous theory about the mechanism at work is true, then we should see teams taking fewer shots once they have a lead. But that's not the case. The scoreline with the highest shot rate is +1. Of course, there are two conflicting factors at work regarding the shot rate. On the one hand, a winning team will bunker, reducing shots. But on the other hand, a better team will just generate more shots. It seems in this MLS data that the second factor is winning out and winning teams are continuing to outshoot opponents.
What about the quality of the shots? Things get more interesting there. Setting aside the small -3+ sample, the lowest Strike Rate occurs at a +1 scoreline (0.099), meaning the quality of shots by teams at +1 is substantially lower than at any other well-sampled scoreline. Shots by teams at -1 and -2 are roughly 60 percent more likely to result in a goal than a shot by a team at +1 (about 0.16 versus 0.099). More work would need to be done on exploring the mechanism behind this, but it's an interesting result for those who worry about the number of ties in soccer. Not only are there a lot of ties because scoring is, in general, low, but the number of ties is increased even beyond that because it looks much more likely that a team at -1 will score to tie the game than that a team at +1 will score to extend the lead. As you can see in the data, last season teams a goal down scored 118 times in 8,250 minutes of play, while teams a goal ahead scored only 96 times in the same minutes, despite significantly outshooting the trailing teams (967 shots to 731). There's an argument that teams really do lose their cutting edge when they have a cushion.
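To put a number on the gap between one goal up and one goal down, here is a rough Python sketch using only the counts from the table. The pooled two-proportion z statistic is my own addition, not something the original analysis ran, and it assumes every shot is an independent trial, which real shots almost certainly are not. Treat it as a first look at whether the gap is bigger than noise, not a verdict.

```python
import math

# Counts from the 2010 table for the +1 and -1 scorelines.
SHOTS_UP, GOALS_UP = 967, 96        # team leading by one
SHOTS_DOWN, GOALS_DOWN = 731, 118   # team trailing by one
MINUTES = 8250                      # identical for both sides by construction


def two_proportion_z(goals_a, shots_a, goals_b, shots_b):
    """Pooled two-proportion z statistic for strike rate A minus strike rate B.

    Assumption: shots are independent Bernoulli trials.
    """
    p_a = goals_a / shots_a
    p_b = goals_b / shots_b
    pooled = (goals_a + goals_b) / (shots_a + shots_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / shots_a + 1 / shots_b))
    return (p_a - p_b) / se


# How many times more likely a -1 shot is to score than a +1 shot.
ratio = (GOALS_DOWN / SHOTS_DOWN) / (GOALS_UP / SHOTS_UP)
z = two_proportion_z(GOALS_DOWN, SHOTS_DOWN, GOALS_UP, SHOTS_UP)

print(f"strike rate ratio, -1 vs +1: {ratio:.2f}")
print(f"goals per 90 minutes at +1: {90 * GOALS_UP / MINUTES:.2f}")
print(f"goals per 90 minutes at -1: {90 * GOALS_DOWN / MINUTES:.2f}")
print(f"pooled z statistic: {z:.2f}")
```

The same function can be pointed at any other pair of rows, though the 790-minute buckets have so few goals that the normal approximation behind it gets shaky.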
But while that result is interesting, it also contradicts Zach's results to some degree. His EPL results were that Shots were negatively correlated with points but that Shots on Goal were positively correlated. That would imply that shots get less accurate (a lower ratio of shots on goal to shots) as teams fall behind, but we show the opposite result.
One possibility is, as I wrote previously, that EPL and MLS data just don't share the same tendencies given the different quality of the teams. But I think the more likely possibility is a methodological problem that someone with a few more semesters of Statistics under their belt than I have would have to explore.
Love it
Lots of puzzles to be explored. Great job raising these issues. So what the data show is that the odds of being scored on increase when you’re up by a goal, and the odds of scoring increase when you’re down a goal. But these dynamics change when a team is up two or more?
what does it look like broken down by team?
I’d like to see what the chart looks like when broken down by individual team. Or if there aren’t enough minutes for a single team’s data I’d like to see the top/middle/bottom three teams by season goal differential bucketed together. Since more of the minutes in the top half of the table will be played by better teams and better teams will probably have higher strike rates we’re probably underestimating the drop in strike rate that’s correlated with being ahead.
Home and Away
I’m also interested in which team scores first – home or away. I wonder how the home team going behind would change the likelihood of pushing for the equalizer. Or vice versa. I’d also just be interested in the overall timing of the goals (for the home and away teams) throughout the league over a long period of time. Home team score first? Mostly second half goals? My coach always said ten around the start and end of the halves is when the goals happen….
by DW Sounder on Apr 27, 2011 11:24 PM PDT via mobile
as an utterly tangential aside...
i have not once been the least bit worried about the number of ties in soccer…
...that's MISTER Keller to you!!!
Testing for significance
The problem with statistics is it can be a tricky thing to draw conclusions unless you are careful. Much of the discussion has centered on whether one factor or another hasn’t been considered in the analysis. Another issue is the variance of whatever parameter is considered and whether the variance is so large that the noise swamps the signal. It would be good if the parameters listed included a standard deviation to look at this issue. Ultimately you could take all the parameters people have suggested (team ranking, home and away, when the goals were scored, etc.) and do a multiple regression on each of the stats listed including a test for the significance of the determined coefficients. The significance tests would take into account issues of small sample size.
