As our own malcontentjake and countless others have demonstrated repeatedly, the numerical shorthand for soccer formations which we're all accustomed to (e.g. 4-4-2) is almost completely useless. Whether you have a 'band of 4' defenders or a 'band of 5' is much less important than what those defenders are doing at any given time. And the inherent assumption of symmetry doesn't map well to a world in which one fullback might be more attacking than another, one winger might cut to the middle more than the other, or one forward might play in front of — rather than next to — another.
Fundamentally, a formation is just a shorthand description of where the players tend to be on the field. So what better way to determine the formation than to simply keep track of where the players are? Thanks to the detailed statistics that Opta is collecting for MLS matches, we can do just that. Though the data doesn't include every player's location at every moment, it's granular enough to include the location of every pass, tackle, shot, interception, and other on-ball event. While there is a lot to playing a position that doesn't include any of these events, I think it's fair to say that if a player spends any significant time in an area of the field, he will tend to engage in those activities there as well.
So if we take every tracked event that a player was involved in and map it onto the field, we can see where that player played. This is exactly how the heat maps on the league site are generated. And if we plot the mean location of those events and call that location the player's 'formational position' we can see where the entire team is playing, on average. Here is an example from the final Seattle Sounders regular season game last season — versus Chivas USA:
Here each individual event is marked with a small dot and the mean location for each player is marked with a larger dot. And what we see is a pretty good match to what you'd intuitively expect the Seattle formation to be. The fullbacks are pushed ahead of the central defenders, the wingers largely stay wide, and Osvaldo Alonso is positioned centrally as a defensive midfielder behind the more offensive Brad Evans. It's worth noting that Mauro Rosales didn't last very long in this game after his knee was re-injured, and yet his few events were still enough to locate him accurately in the formation.
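For anyone who wants to try the averaging step themselves, here is a minimal sketch of the idea in Python. It is not the code behind these images, and the event records and the 0-to-100 coordinate scale are made up for illustration rather than taken from the Opta feed. The function name formational_positions is likewise just a label I've chosen here.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical on-ball events as (player, x, y). The 0-100 scale on both
# axes is an assumption for this sketch, not the real data format.
events = [
    ("Alonso", 40.0, 50.0),
    ("Alonso", 44.0, 46.0),
    ("Alonso", 42.0, 54.0),
    ("Evans", 60.0, 52.0),
    ("Evans", 64.0, 48.0),
]

def formational_positions(events):
    """Return each player's mean event location as an (x, y) tuple."""
    by_player = defaultdict(list)
    for player, x, y in events:
        by_player[player].append((x, y))
    return {
        player: (mean(x for x, _ in pts), mean(y for _, y in pts))
        for player, pts in by_player.items()
    }

positions = formational_positions(events)
for player, (x, y) in positions.items():
    print(f"{player}: x={x:.1f}, y={y:.1f}")
```

With these invented numbers the defensive midfielder's average lands behind the attacking midfielder's, which is the same kind of relationship the larger dots show.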
Something of a surprise (at least to me) is that forwards Fredy Montero and Mike Fucito are paired side-by-side. We generally think of Montero as a more withdrawn forward (or a trequartista) behind an advanced target man. But Fucito's an unlikely target man given his stature and speed. Here's another example, from the first match of the season against the Los Angeles Galaxy:
Now we see Montero playing behind a more advanced O'Brian White, who more closely fits the mold of a target man. This version also shows Steve Zakuani before his season-ending injury. It seems to suggest that he cuts into the middle often enough to make himself a central midfielder as much as a winger. But while it's true that he does cut into the middle frequently as an inverted winger, his central location here is partly due to the fact that he frequently switches wings during a game. Here's a view of the isolated Zakuani data from that match:
You can see some events in the middle of the field, but much of the central skew is due to a number of events on the complete opposite flank. That's a weakness of this model, in that it doesn't differentiate between a player moving within the current formation and the team moving into a different formation by switching the sides of the wing players or otherwise moving a player into a new position. That's something I hope to work on, but I expect we'll be incorporating these quick formation images into our game previews and recaps during the coming season.
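To make the wing-switching problem concrete, here is a small sketch with invented numbers (not Zakuani's actual events). It assumes a lateral coordinate where 0 is one touchline and 100 is the other, and it shows how a player who splits his time between the two flanks ends up with an average position in the middle of the field, even though he never touched the ball there.

```python
from statistics import mean

# Made-up lateral coordinates (assumed scale: 0 = left touchline,
# 100 = right touchline) for a winger who switches sides mid-game.
left_spell = [8.0, 12.0, 10.0, 14.0]
right_spell = [90.0, 88.0, 92.0, 86.0]
all_events = left_spell + right_spell

# The single "formational position" across the width of the field.
avg_y = mean(all_events)

# Distance from that average to the nearest actual event.
closest = min(abs(y - avg_y) for y in all_events)

print(f"average lateral position: {avg_y:.1f}")
print(f"nearest event is {closest:.1f} units away from that average")
```

In this toy case the average sits dead center while every event is out near a touchline, which is the same effect, in exaggerated form, that pulls a side-switching winger toward the middle of the formation image.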