Stats: The Role of Skill and Luck In MLS Shot Metrics
The Role of Shots in Goal Scoring
One of the factors that distinguishes soccer from nearly every other sport is the relative rarity of scoring events. This has significant implications in many different contexts, including tactical decisions, player evaluation, tournament organization, and so on. And in particular it has implications for statistical analysis of the game. Because the goal of a team is to win and winning requires outscoring the opponent, team and player analysis must necessarily be rooted in the ability to produce goals. But because goals are relatively rare, we run into significant sample size problems. Say, for example, a team lost two consecutive games by 0 goals to 1. Is that a bad team? Does it have problems finishing? Or was it just unlucky? Does it matter if the team outshot its opponents 20-5 in those games? The answers to those questions are at the core of soccer analytics.
One step towards reducing the sample size problem would be to rely on stats other than goals. Shots, for example, happen at a much greater rate than goals, and if you can evaluate players or teams in terms of shots you have a much larger statistical base to look at. But if you look only at shots you are ignoring information about how particular teams or players turn shots into goals. For example, 10 shots from a team that converts 1/5 of its shots into goals are worth much more than 10 shots from a team that converts only 1/10 of its shots into goals.
You can imagine a team or player's rate of goal scoring as being the product of a number of different rates. First there is the rate at which shots are taken (shots/min), which I'll conveniently call Shot Rate. Then there is the rate at which those shots are on goal (shots on goal / shots), which in the tradition of Chris Anderson (and I assume others) at Soccer By the Numbers and Soccer Analysts I'll call Accuracy. Then there is the rate at which shots on goal go in the net (goals / shots on goal) which in the same tradition we'll call Conversion. A team's goal-scoring rate is the product of these three rates.
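The decomposition above can be checked with a few lines of arithmetic. This is a minimal sketch: the function name `goal_rates` and the season totals are made up for illustration and do not describe any real player.

```python
def goal_rates(shots, shots_on_goal, goals, minutes):
    """Split a goals-per-minute rate into the three component rates."""
    shot_rate = shots / minutes          # Shot Rate: shots per minute
    accuracy = shots_on_goal / shots     # Accuracy: share of shots that are on goal
    conversion = goals / shots_on_goal   # Conversion: share of shots on goal that score
    return shot_rate, accuracy, conversion


# Hypothetical totals (not real data): 60 shots, 30 on goal, 10 goals in 2000 minutes.
shot_rate, accuracy, conversion = goal_rates(60, 30, 10, 2000)
goals_per_minute = shot_rate * accuracy * conversion
print(shot_rate, accuracy, conversion, goals_per_minute)
```

The intermediate terms cancel, so the product always collapses back to goals / minutes. Each factor can be examined on its own without losing the link to scoring.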
So one important step in understanding goal scoring is isolating these rates and examining how teams and players perform on them. In particular, if we can demonstrate that Conversion rates are highly erratic or random, it will bolster the argument for using shots on goal for evaluation. And similarly if we can demonstrate that Accuracy rates are also erratic or random, it will bolster the argument for using shots.
"Randomness" and Skill
Keep in mind that when I say 'randomness' in outcomes, I don't literally mean randomness, as if someone were rolling a die every time a shot was taken. Instead I mean factors that are both extremely unpredictable and largely out of the shooting player's control. For example, when a player takes as hard a shot as possible, extremely small variations in the locations of the laces, moisture at the point of contact, wind, the player's momentum, stability of the plant foot, etc., can add up to significant changes in the direction of the ball, which can easily mean the difference between a goal and a shot off the bar. These factors will almost certainly be forever beyond the scope of statistical analysis (and even if they weren't, sample size issues would render them useless) and so instead we group these factors together (along with similar factors affecting a goalkeeper or defender's ability to intervene) and call it Randomness.
One method we can use to determine how much a particular statistic is governed by randomness is to see how it changes over time. Obviously many stats are heavily affected by opportunity and tactics (for example, forwards will naturally take more shots than defenders, regardless of skill), but if we take a look at a player's stats from one half of a season to the next, we should eliminate most of those factors. Player movement from team to team or position to position during a season is relatively rare, and we don't expect a player to improve or deteriorate significantly skillwise during a single season, so the theory is that any changes in a statistic can be attributed in large part to random factors.
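The half-to-half comparison described above can be sketched in a few lines. This is only an illustration, not the code behind the plots in this post: the `pearson` helper is written out by hand, and the per-player rates are invented numbers, not real data.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation between two equal-length lists of numbers."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


# Hypothetical rates for five players: first half of the season, then second half.
first_half = [0.020, 0.035, 0.050, 0.028, 0.041]
second_half = [0.022, 0.031, 0.047, 0.030, 0.039]
r = pearson(first_half, second_half)
print(round(r, 2))
```

A value near 1 means the first half tells you a lot about the second half. A value near 0 means it tells you almost nothing.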
Baselines
Let's take a couple of stats from other sports as examples. One statistic that we can be pretty confident is primarily governed by skill is the rate of point scoring in basketball. Obviously the outcome of any single shot in basketball is not much different than the outcome in soccer and can be quite random, but there are so many more shots in a game of basketball that the random factors will average out in the course of a single game. We can be pretty confident that a player who scores around a point every 2 minutes in the first half of the season will continue to score at that rate in the second. To illustrate that, here's a plot that correlates first half point-per-minute rates of the top 50 scorers in the league to second half rates for the same player this season:
You can see that the cluster is pretty close to a nice line pointing up at a 45 degree angle. I've highlighted a couple of outliers. Jason Richardson has had a pretty significant drop off from a ppm rate of 0.552 in the first half to about 0.415 in the second half. But in general the results correlate nicely. The Pearson correlation is 0.78, and we'll use that as a baseline for a stat governed largely by skill.
In contrast, we can turn our attention to baseball and look at pitcher BABIP. That's short for Batting Average on Balls In Play, and it refers to the batting average on balls that are hit somewhere in the field of play (not home runs, not fouls, etc.). It's pretty well understood in the baseball Sabermetrics crowd these days that a pitcher's BABIP is largely random. Not completely random, as was originally theorized by Voros McCracken in 2001 (groundball pitchers and flyball pitchers have slightly different rates, for example), but it's mostly random. A pitcher with an unusually high BABIP will almost certainly regress to the mean, which is a great tool in evaluating and predicting performance. Here's a plot of pitcher BABIP from one half to the next last season:
That's a pretty striking contrast. This is what the professionals call a 'blob'. Values are clustered roughly around the mean without any line to indicate correlation and the outliers radiate out in a circle. The Pearson correlation has a magnitude of 0.10 (the value is actually negative, but the sign is meaningless in this case).
So this gives us an idea of the range of correlation rates of stats. A soccer stat that's governed mostly by the skill of the player should have a strong intraseason correlation and one governed mostly by randomness will have a weak one. Another way of putting it is this: suppose you're asked to predict the second half stats of a player for whom you have the first half stats. If it's a stat governed by skill, your best bet is to guess whatever the first half stat was. If it's a stat governed by randomness, your best bet is to guess the league average.
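One way to put numbers on that rule of thumb is to slide between the two guesses in proportion to the correlation. The sketch below assumes the standard linear regression-to-the-mean formula, with a similar spread of values in both halves. The function name `predict_second_half`, the first-half value of 0.80, and the league average of 0.50 are illustrative assumptions, not figures from the data.

```python
def predict_second_half(first_half, league_average, correlation):
    """Shrink a first-half value toward the league average.

    A correlation of 1 returns the first-half value unchanged;
    a correlation of 0 returns the league average.
    """
    return league_average + correlation * (first_half - league_average)


# Hypothetical player at 0.80 in a league averaging 0.50 (made-up numbers).
for r in (0.78, 0.10):
    print(r, round(predict_second_half(0.80, 0.50, r), 3))
```

With a correlation like the basketball baseline of 0.78, the guess stays near the first-half value (about 0.73). With one like the BABIP baseline of 0.10, it falls almost all the way back to the league average (about 0.53).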
Shots
So now let's look at our three rates. For my population I took every MLS player last season who accumulated at least 10 shots on goal in each half to weed out sample size problems. Also keep in mind that I eliminated penalties from the analysis completely. Penalties are obviously their own special beast and would just corrupt any analysis done on the run of play. Here's the plot of the Shot Rate (that's shots / minute) for those players from one half to the next:
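For anyone who wants to apply the same filter to their own data, here is a minimal sketch. It assumes one record per shot with hypothetical field names (`player`, `half`, `on_goal`, `is_penalty`). It also assumes that penalties are dropped before the 10-shots-on-goal threshold is applied, which the text above does not spell out.

```python
from collections import defaultdict

MIN_SOG_PER_HALF = 10  # threshold stated in the text


def eligible_players(shots):
    """Players with enough non-penalty shots on goal in both halves of the season."""
    sog = defaultdict(lambda: {1: 0, 2: 0})
    for shot in shots:
        if shot["is_penalty"]:
            continue  # penalties are excluded from the analysis entirely
        if shot["on_goal"]:
            sog[shot["player"]][shot["half"]] += 1
    return sorted(
        player
        for player, counts in sog.items()
        if counts[1] >= MIN_SOG_PER_HALF and counts[2] >= MIN_SOG_PER_HALF
    )


def make_shots(player, half, count, is_penalty=False):
    """Build `count` hypothetical on-goal shot records for one player and half."""
    return [
        {"player": player, "half": half, "on_goal": True, "is_penalty": is_penalty}
        for _ in range(count)
    ]


# Made-up data: player B only reaches 10 in the second half by counting a penalty.
shots = (
    make_shots("A", 1, 10) + make_shots("A", 2, 12)
    + make_shots("B", 1, 11) + make_shots("B", 2, 9)
    + make_shots("B", 2, 1, is_penalty=True)
)
print(eligible_players(shots))
```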
That's a good looking correlation. We see a well clustered diagonal line that indicates that first half Shot Rate is correlated with second half Shot Rate. That point way over at the top right of the graph is Edson Buddle, by the way. He took a lot of shots. The Pearson correlation is 0.70. So we can say in general that Shot Rate is not affected significantly by randomness. Players who shoot a lot will continue to shoot a lot.
Accuracy
Now let's look at Accuracy (that's Shots on Goal / Shots). If it's true that there are players who are significantly more or less accurate than others, that should show up as a good correlation here:
But we see the blob again. The Pearson correlation is 0.11, almost as low as pitcher BABIP (!). As an example, in the first half Ryan Johnson had an Accuracy rate of 0.81. In the second half it was 0.47. So either Johnson worsened tremendously as a player in the course of a single season or his Accuracy rate was affected significantly by factors outside his individual skill. In contrast Kheli Dube had a first half Accuracy rate of 0.375. Terrible shooter? His second half accuracy rate was 0.667.
What accounts for this? Well, there are certainly sample size issues as we reduce our population to only shots on goal. But I think the sample is large enough, the number of players is large enough, and the resulting correlation is so low that there's really no denying that luck plays a major role in Accuracy.
But let me be clear that this doesn't mean that there's no inherent difference in skill in players. Fredy Montero can obviously put a ball on frame from 30 yards out much more often than most other players in the league. But when you take into account all of the factors that constrain that action in a real game: the fact that players will only take a shot that has a decent chance of going in, the fact that players will want to hit the ball with power (and therefore lose accuracy), the fact that defenders will contest nearly every shot, etc., then in the run of play those skill differences get averaged out and accuracy seems to come down largely to factors other than skill.
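To get a feel for the sample size issue mentioned above, it helps to see how much an Accuracy rate can wobble from chance alone. The sketch below uses the textbook standard error of a proportion. The "true" accuracy of 0.5 and the shot counts are hypothetical, since the post does not give shot totals for Johnson or Dube.

```python
from math import sqrt


def accuracy_standard_error(true_accuracy, shots):
    """Standard error of an observed on-goal share over `shots` independent attempts."""
    return sqrt(true_accuracy * (1 - true_accuracy) / shots)


# Hypothetical player whose underlying accuracy is exactly 0.5.
for shots in (10, 25, 50, 100):
    print(shots, round(accuracy_standard_error(0.5, shots), 3))
```

At 25 shots the standard error is 0.1, so an observed rate anywhere from roughly 0.3 to 0.7 is unremarkable for a player whose underlying rate never changed.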
Conversion
What about Conversion (that's Goals / Shots on Goal)? This will be of particular interest to Seattle Sounders fans who watched their team pepper the goal frame in three consecutive games only to see one goal come out of it (on a back-post tap in of all things). Here's the same plot for Conversion:
Hopefully not a surprise to anyone at this point, but Conversion rates are even less correlated. A player who has a good first-half conversion rate is just as likely to have a terrible second half as a good one. The Pearson correlation is 0.06, which is effectively nothing. Again this isn't a denial of the existence of skill, but an acknowledgement that the skill comes mostly in getting space for a shot, and conditions, defenders, goalkeepers, and so on have a larger impact on whether the shot goes in. If you don't believe that, take a look at the work Anderson has done on Conversion rates in the EPL this season. It's just not true that the good or most talented teams are the ones with the best Conversion rates. Blackburn leads the league just alongside Newcastle. Tottenham, Arsenal, and Chelsea are all in the bottom half.
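A small simulation shows how a real but modest skill difference can still produce a near-zero half-to-half correlation when each player has only a handful of shots on goal. Everything here is assumed for illustration: the range of "true" conversion rates (0.25 to 0.35), the shots-on-goal counts, and the player count. It makes no claim about how the MLS numbers actually arose.

```python
import random
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation between two equal-length lists of numbers."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


def split_half_correlation(n_players, sog_per_half, low, high, seed=0):
    """Correlate simulated first-half and second-half conversion rates."""
    rng = random.Random(seed)
    first, second = [], []
    for _ in range(n_players):
        skill = rng.uniform(low, high)  # hypothetical fixed "true" conversion rate
        for half in (first, second):
            goals = sum(rng.random() < skill for _ in range(sog_per_half))
            half.append(goals / sog_per_half)
    return pearson(first, second)


# Same skill spread, very different numbers of shots on goal per half.
small = split_half_correlation(2000, 15, 0.25, 0.35)
large = split_half_correlation(2000, 600, 0.25, 0.35)
print(round(small, 2), round(large, 2))
```

With few shots per half, the correlation sits close to zero even though every simulated player has a fixed skill level. With an unrealistically large number of shots, the same skill spread shows up clearly.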
Conclusion
So what does all of that mean? Obviously a great deal more research has to be done. I suspect that more detailed information about shot quality (location, distance, nearness of defenders, etc) will reveal more significant correlations between skill and results, but in the meantime — if the stats you have to work with are shots, shots on goal, and goals — then in the short run looking just at goals will reveal almost nothing about individual player skill. Instead what we see is that good players get in positions to take dangerous shots and that dangerous shots will, on average, eventually go in.
This is what coaches are talking about when they evaluate their team based on whether they 'created chances'. If you're taking shots (and you're not shooting just for the sake of it, but taking real shots), then a certain proportion of them will eventually go in.
It also means that we're justified in looking at shot data to evaluate players (for example looking at net shots when a player is on the field, which we'll look at more later this season). And it means we're justified in saying the Sounders have been tremendously unlucky in the first three games of the season and that more than anything has determined their results.
Comments
Do goals always indicate a shot on goal?
What about own-goals? Doesn’t count as a shot on goal for the attacking team, does it? Should be a minor point.
For a player's stats, yes
My understanding is a team can get a goal without a shot (via an own goal, as you say), but it won’t show up on a player’s stats. Especially since I limit the domain to frequent goal-scorers, which will exclude most defenders, who score most own goals. After those filters, own goals don’t add up to much.
Nos Audietis
Individual vs. Team
I’m a little surprised that there wasn’t more correlation for accuracy. I’ve always labored under the belief that teams that put shots on goal are more likely to have success. Do you know if the pattern is different on the team level rather than on an individual level?
It's pretty much the same on the team level
I’ve done the work for that and should have a post up on it next week.
To clarify, teams that put a lot of shots on goal are definitely more likely to have success, but they do that by having a lot of shots, not by having a distinctly higher percentage of shots be on goal or a distinctly higher percentage of SOGs turn into goals.
Nos Audietis
by sidereal on Mar 30, 2011 4:05 PM PDT
Conversion I can understand.
Anyone who has watched a beautifully knuckling shot take an abrupt turn at the last moment and duck inside the post (see e.g. Juninho in week 1) understands that there’s an element of luck there. I did not expect to see similar results with respect to accuracy though. I’m looking forward to learning a little more about that. Great piece.
Part of the problem is that Shots on Goal just isn't a very good statistic
A slow dribbler to the keeper is ‘on goal’ but represents very little danger. A wicked bender that misses by a foot when the keeper is off his line is not ‘on goal’ but is substantially more dangerous (in the sense that that same shot kicked 100 times will go in more often than the slow dribbler).
So since many ‘shots on goal’ won’t actually be goal dangerous and since players will be incented to take shots that are goal dangerous but not necessarily ‘on goal’ (like a tap around the keeper far post that may roll wide rather than just kicking it in the keeper’s chest), the correlations to actual scoring will suffer.
Nos Audietis
I wonder if this will require a FootballOutsiders approach
Where people just watch tape / tivo of games and differentiate a “dangerous” SoG from a “non-dangerous” SoG.
Then I’d wonder if there is a skill element to having more dangerous-shots / shots and more goals / dangerous-shots, or if it is still random.
Yes, a tape-based (subjective) classification would help
as would some idea of the location from which the shot was taken. The NHL tracks that data, but it seems that a lot of the data in football’s proprietary. SOMEONE has this, but it’s not public.
The location thing is tough because it dramatically reduces sample size – more buckets, fewer data points in each bucket. But I worry that lumping them together gets you a lot of noise; conversion rate is pretty random, but is that because ‘finishing touch’ is mostly luck, or because the distribution of chances (really good ones/speculative half-chances) is mostly luck? Is a real skill getting obscured by noise in the other component? In some sense, it shouldn’t matter, because bad luck evens out in either case.
I would like to see
The stats for goals scored/shots taken for the top ten scorers in the league. It should show that Wondo and Buddle take the most shots.
by DiehardSoundersFan on Mar 30, 2011 5:58 PM PDT via mobile
Buddle took
A ridiculous number
"But who would listen to Little old me anyway?"
-by thehemogoblin
by Little old me on Mar 30, 2011 7:23 PM PDT
What about the question from the other side of the line?
Nice work, sidereal. I eat this kind of stuff up. I’ve been designing a stat-based soccer board game (think of something like Strat-o-Matic or Pursue the Pennant for the Beautiful Game, if those terms mean anything to you), so analytical research like this is very helpful.
Here’s my question: if the assumption is that there’s little correlation with a striker’s accuracy or conversion rates, do keepers display the same traits? That is, are there some keepers who consistently outperform or underperform in terms of allowing SOGs to score?
by The King of Norway on Mar 31, 2011 5:51 AM PDT
Those names do indeed mean something to me
I still have a copy of the ’95 NBA, MLB, and NHL Stratomatics in my basement. A lot of my early interest in Sabermetric stuff was built on the theory behind designing those cards.
On keepers, I can look into that in the future. My assumption would be that the keeper is not entirely responsible for stopping SOGs (since defenders can also get in the way) but is primarily responsible, so you should see some variation.
Nos Audietis
I may be looking for playtesters in the very near future.
If so, maybe Dave will let me put up a fanshot on the S@H board.
by The King of Norway on Mar 31, 2011 11:33 AM PDT
Absolutely
You, as well as anyone, should feel welcomed to post as many fanposts and fanshots as possible.
Editor/writer at Sounder at Heart, North American soccer editor SB Nation and of course follow me on Twitter
by Jeremiah Oshan on Mar 31, 2011 11:44 AM PDT
The King of Norway doesn't need to ask
He does need to show me first!
But just him…
I am not a Supporter | I am not a Fan | I am a Sounder
Sounder At Heart
Enjoyed the read
terrific piece!
Writer for SB Nation's Manchester United blog, 'The Busby Babe'
http://twitter.com/#!/Tui11BRoy3
"ROOOONEY!.... It defies description. How about spectacular?...How about superb?"
