Stats: Another Expected Goals Model For MLS

Nick Turchiaro-USA TODAY Sports

Today we look at three years of shot data to build a player and team evaluation model

The soccer analytics community has produced a tidal wave of Expected Goals analysis in the last couple of months. Okay, maybe a handful of bloggers and analysts can't quite manage a tidal wave. But still. A wave of some kind.

This post on Cartilage Free Captain is a great summation of the various efforts, though I'm not entirely convinced it will "blow the soccer analytics game wide open". Briefly, an expected goals model is an attempt to determine how many goals a team or player should be expected to score based on the characteristics of the shots they take. In theory you can go even further and get expected goals based on other events (like passing), which would bring Expected Goals fairly close to Runs Created for those familiar with baseball sabermetrics. But for soccer, shots are clearly the right place to start.

An expected goals number has a few uses. First, if a team or player is significantly outscoring their expected number of goals, you can take that as evidence that they're uncommonly skilled (in that they convert equivalent shots at a higher rate) or that they've been uncommonly lucky (and can be expected to regress). Second, an accurate expected goals model can indicate which team is more dangerous over the short term (even within a single game), even if no goals have been scored yet.

Michael Caley, the CFC author, has developed a model for both the Premier League and MLS (and he keeps updated advanced MLS statistics here). At its base, his model assigns shots to various zones, then adds adjustments for key pass type, whether the shot was a header, etc. Martin Eastwood has also developed a model using only distance, which gets about 85% of the way to an accurate expected goals number for the Premier League.

So I thought I'd try an analysis similar to Martin's, but using MLS shot data. I expect it will come up with results similar to Michael's, but hey, let's find out!

Datums

The data set is a collection of 23,902 shots taken in MLS in the last three years, including the location on the pitch from where the shot was taken. Using those locations we can calculate the conversion rate of shots at various distances (in yards), thusly:

[Figure: shot conversion rate by distance (yards)]
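The per-yard conversion rates in this chart can be computed along these lines. This is a minimal sketch under assumptions: the record layout (x and y in yards, measured from the centre of the goal), the helper names, and rounding each shot to the nearest whole yard are all hypothetical, not the actual format of the shot data set.

```python
import math
from collections import defaultdict


def shot_distance(x_yards: float, y_yards: float) -> float:
    """Straight-line distance (yards) from the shot location to the centre of the goal."""
    return math.hypot(x_yards, y_yards)


def conversion_by_distance(shots):
    """Return {whole-yard distance: share of shots scored} for (x, y, scored) records.

    Hypothetical record layout: x is yards out from the goal line, y is yards
    left or right of the centre of the goal, scored is True for a goal.
    """
    attempts = defaultdict(int)
    goals = defaultdict(int)
    for x, y, scored in shots:
        yard = round(shot_distance(x, y))  # bin to the nearest whole yard
        attempts[yard] += 1
        goals[yard] += int(scored)
    return {yard: goals[yard] / attempts[yard] for yard in sorted(attempts)}
```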

As you can see, there's a pretty smooth curve: a dramatic dropoff from the absurdly high rates at tap-in distances, which starts to level out at about 7-8 yards. This looks a lot like exponential decay, meaning that taking the logarithm of the values should yield something close to a straight line (again, this is all consistent with what Martin found in the EPL data). Here's the same chart, but taking the log (base 10) of the conversion rates:

[Figure: log (base 10) of shot conversion rate by distance (yards)]

That's a pretty good line, though there's a little curve at the beginning that looks more systematic than random. And it obviously gets a little ragged at the end. But for our purposes those aren't particularly important areas. Though the regression treats each data point equally, in fact a vast, vast majority of shots in MLS are taken from 4 to about 30 yards away, and in that range the line fits quite well. Even with the outliers at the end, the overall R-squared is 0.936, which means the line accounts for about 94% of the variation in (log) conversion rate across distances. In other words, given a large sample of shots from a certain distance, I can get most of the way to their conversion rate without knowing anything about the shooter, the assisting pass, the defense, etc.
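A minimal sketch of that fit, assuming the per-yard rates are held in a dict keyed by distance (the function name is hypothetical). It is ordinary least squares of log10(rate) on distance, with every bin weighted equally as the text describes, plus the usual R-squared.

```python
import math


def fit_log10_line(rates):
    """Least-squares fit of log10(conversion rate) against distance.

    rates maps distance in yards -> conversion rate. Bins with a rate of zero
    are skipped, since log10(0) is undefined. Every remaining bin counts
    equally, however many shots it holds. Returns (slope, intercept, r_squared).
    """
    points = [(d, math.log10(r)) for d, r in rates.items() if r > 0]
    n = len(points)
    mean_d = sum(d for d, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    s_dd = sum((d - mean_d) ** 2 for d, _ in points)
    s_dy = sum((d - mean_d) * (y - mean_y) for d, y in points)
    slope = s_dy / s_dd
    intercept = mean_y - slope * mean_d
    ss_res = sum((y - (slope * d + intercept)) ** 2 for d, y in points)
    ss_tot = sum((y - mean_y) ** 2 for _, y in points)
    return slope, intercept, 1 - ss_res / ss_tot
```

Because every bin counts equally, the sparse long-range bins pull on the line as much as the crowded 4-30 yard range; weighting each bin by its shot count would be one alternative.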

So if we accept that line, the conversion rate for a shot from d yards out is 10^(-0.0474d - 0.3302). That looks a little imposing, but computers are doing all the work anyway, so who cares.
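The formula drops straight into a one-line function. A minimal Python sketch; the function name is hypothetical, and the coefficients are the fitted values quoted above.

```python
def expected_goal(distance_yards: float) -> float:
    """Model v0.1: probability that a shot from distance_yards out is scored.

    Uses the fitted line log10(rate) = -0.0474 * d - 0.3302.
    """
    return 10 ** (-0.0474 * distance_yards - 0.3302)
```

With these coefficients, a 10-yard shot comes out at roughly 0.157 and a 20-yard shot at roughly 0.053.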

So that's a dramatically simple Expected Goals model, and yet... and yet. If I apply it to three years of Seattle Sounders shooting, the expected number of (non-penalty, non-own) goals I get is 130.8. The actual number of goals is... 131. So over the course of more than a thousand shots, I can get within 0.2 goals of the correct number knowing nothing but how far away the shots were taken.
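Applying the model to a team is just a sum: each shot contributes its conversion probability, and the total is compared with the goals actually scored. A minimal sketch, assuming a hypothetical list of (distance in yards, scored) pairs for one team's non-penalty shots; penalties are left out to match the numbers above, and the function names are illustrative.

```python
def expected_goal(distance_yards: float) -> float:
    # Model v0.1: the fitted line log10(rate) = -0.0474 * d - 0.3302.
    return 10 ** (-0.0474 * distance_yards - 0.3302)


def team_summary(shots):
    """Return (actual goals, expected goals, actual minus expected).

    shots: iterable of (distance_yards, scored) pairs for one team's
    non-penalty shots (hypothetical input format).
    """
    actual = 0
    expected = 0.0
    for distance, scored in shots:
        actual += int(scored)
        expected += expected_goal(distance)
    return actual, expected, actual - expected
```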

Not all teams fare so well, and I expect that stylistic tendencies of some teams (particularly how much they lean on headers) will require adjusting the model to stay close to that level of accuracy. Here's the 2013 season in actual goals (again, excluding penalty and own goals), expected goals based on what we'll call Model v0.1, and the difference.

Team             Goals  Expected  Difference
New York            53     43.53       +9.47
Real Salt Lake      50     39.44      +10.56
Portland            50     40.16       +9.84
Vancouver           47     36.76      +10.24
Montreal            46     40.15       +5.85
LA Galaxy           46     49.19       -3.19
New England         44     36.18       +7.82
FC Dallas           43     43.70       -0.70
Chicago             42     44.96       -2.96
Philadelphia        41     47.06       -6.06
Sporting KC         40     44.89       -4.89
Colorado            40     43.74       -3.74
Houston             37     48.46      -11.46
Seattle             36     41.09       -5.09
Columbus            35     39.35       -4.35
San Jose            32     46.64      -14.64
Chivas USA          27     28.33       -1.33
Toronto FC          26     31.91       -5.91
DC United           19     31.58      -12.58

There are some big outliers here, well beyond what we could reasonably expect from luck. The teams that fell furthest short of their expected goal numbers were San Jose, Houston, and DC United. Notably, two of those teams rely much more on headed shots than a typical MLS team. And there was all kinds of evidence that DC was very unlucky as well as very bad (and you need both to post a historically low number of wins), so it's not surprising to see them there. At the other end, Real Salt Lake and Vancouver scored quite a few more goals than expected.
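One rough way to put a number on "beyond what we could reasonably expect from luck" is to treat every shot as an independent coin flip weighted by its expected-goal value. Total goals then have a variance equal to the sum of p(1 - p) over all shots, and the gap between actual and expected goals can be expressed in standard deviations. This is a sketch under that independence assumption, which is an illustration rather than part of the model above; the function name is hypothetical.

```python
import math


def luck_z_score(shot_probs, actual_goals):
    """Standard deviations between actual and expected goals.

    Assumes each shot is an independent Bernoulli trial whose success
    probability is its expected-goal value, so the variance of the total
    is the sum of p * (1 - p).
    """
    expected = sum(shot_probs)
    variance = sum(p * (1 - p) for p in shot_probs)
    return (actual_goals - expected) / math.sqrt(variance)
```

For a hypothetical team with 400 shots each worth 0.1 expected goals, the expected total is 40 with a standard deviation of 6 (the square root of 400 × 0.09), so scoring 52 would sit 2 standard deviations above expectation.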

One test to see whether a difference is due to skill or luck is to look at a team over time. RSL, for example, was +11 last year. But in 2012 they were +2. And in 2011 -3. So if they've cracked the code of shot conversion, it was a very recent development. Only one team has consistently shown an ability to beat their expected goal total year to year, and I'll discuss them in a future installment.

We can also look at expected goals at the individual level by looking at the players taking the shots. Here are the top 10 players ordered by how much they outperformed their expected goals:

Player                2013 Expected  Actual  Difference
Marco Di Vaio                 11.08      20       +8.92
Camilo Sanvezzo                9.17      17       +7.83
Diego Fagundez                 6.64      13       +6.36
Landon Donovan                 4.20      10       +5.80
Mike Magee                    11.54      17       +5.46
Robbie Keane                   5.69      11       +5.31
Dominic Oduro                  8.04      13       +4.96
Blas Perez                     6.63      11       +4.37
Darlington Nagbe               4.73       9       +4.27
Juan Agudelo                   3.92       8       +4.08
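Player-level lists like this come from the same per-shot values grouped by shooter. A minimal sketch, assuming hypothetical (player, distance in yards, scored) records; the function and parameter names are illustrative, and sorting in ascending order instead gives a bottom-10 list.

```python
from collections import defaultdict


def expected_goal(distance_yards: float) -> float:
    # Model v0.1: the fitted line log10(rate) = -0.0474 * d - 0.3302.
    return 10 ** (-0.0474 * distance_yards - 0.3302)


def rank_players(shots, top_n=10, best_first=True):
    """Rank shooters by actual minus expected goals.

    shots: iterable of (player, distance_yards, scored) records.
    Returns up to top_n (player, expected, actual, difference) tuples,
    largest difference first unless best_first is False.
    """
    expected = defaultdict(float)
    actual = defaultdict(int)
    for player, distance, scored in shots:
        expected[player] += expected_goal(distance)
        actual[player] += int(scored)
    rows = [(p, expected[p], actual[p], actual[p] - expected[p]) for p in expected]
    rows.sort(key=lambda row: row[3], reverse=best_first)
    return rows[:top_n]
```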

Di Vaio is an interesting player to top the list. Of course, he scored a ton of goals. But he also did it with a very distinctive style, hugging the defensive line so tightly that he led the league in offside calls by enough to lap the rest of the field. When the through balls and balls over the top to him were onside, though, he was totally free of defenders, which would help account for his higher finishing rate. Opta actually tracks breakaways and through balls as well, so it should theoretically be possible to account for those shots.

There's another argument for a stylistic ability to beat the expected goals number. Nagbe sits in 9th place, Diego Valeri is 12th (at +3.65), and Will Johnson is 14th (at +3.61). Having three Portland players in the top 15 suggests that Caleb Porter's possession-and-through-ball style may consistently lead to higher-percentage chances.

Now here are the bottom 10 players, who significantly missed their expected goal targets:

Player                2013 Expected  Actual  Difference
Gyasi Zardes                   9.22       4       -5.22
Will Bruin                    12.86       8       -4.86
Chris Wondolowski             13.87      10       -3.87
Chad Marshall                  3.96       1       -2.96
Fabian Castillo                4.92       2       -2.92
Chris Pontius                  3.58       1       -2.58
Omar Gonzalez                  3.56       1       -2.56
Bobby Boswell                  2.49       0       -2.49
Jairo Arrieta                  5.41       3       -2.41
Juan Luis Anangono             4.25       2       -2.25

I'll let Galaxy fans explain the many ways in which Zardes' shooting was awry last season. Suffice it to say he was a disappointment. The list includes a significant number of players who primarily shoot with their heads, which doesn't surprise me, since the model doesn't score headers any differently. To expand on the theory: a shot from 5 yards out with your head is probably a run-of-the-mill header off a set piece and doesn't have a particularly high chance of going in, since it's almost certainly heavily contested and headers are hard to control. A shot with your foot from 5 yards out, on the other hand, is a gift, and those tend to go in at a high rate. I expect that explains Omar's, Chad's, and (to some extent) Bruin's presence on the list, and in the next iteration I'll break out headed conversion rates separately.

As I said, more to come. But for now I think it's a pretty impressive, easy-to-calculate model, given that it needs only a single piece of data for each shot.
