In the first of our views from Academia series, I was joined by Laurie Shaw.
We've included a shortened written version below for those who haven't jumped on the video movement:
Laurie studied Astrophysics at Cambridge University and has enjoyed a diverse career working as a quant researcher at Winton Capital, Head of Statistics and Model Development in the Fiscal Policy group at the UK Treasury and most recently as a research fellow with the Sports Analytics Lab in the Statistics Department at Harvard University.
How established is data science within the football (soccer) world?
"Up until about the mid-90s the statistics available to analyze a match were limited to about 10 numbers".
You had the scoreline, the number of shots, yellow and red cards etc. In the mid-90s companies like OPTA started collecting event-based data. Event data consists of a manual tagging of every on-ball event that took place during the game: each pass, shot, interception.
This data is now being collected routinely in leagues around the world by several different data providers. Once this got into the hands of people who had an interest in analyzing data, it spurred the growth of analytics in football. Many of those people were in the amateur blogging community. Many of the established concepts in football analytics had their origin in work done by people in their spare time because they love the game. Many of these people are now working in data science teams at major clubs.
You published a paper recently on measuring and classifying team formation - can you give a quick overview of the paper and methodology?
The motivation for this paper was to use data to try to quantify and evaluate team strategy. This project was made possible because of the advent of tracking data. In the last 10-15 years...
"The data has been supplemented by optical tracking data, or tracking data collected by other means (sometimes GPS), which gives you samples of the positions of all the players on the field 10 to 25 times a second throughout the match, as well as event data which tells you what's going on with the ball".
Tracking data tells you where all the players are, irrespective of whether they are interacting with the ball or not. This allows you to start looking at things like team strategy in more detail: what players were doing during the majority of the game when they weren't on the ball, what the formation looked like in different phases of the game, and so on.
From a sports trader perspective, can you get a live feed of this data?
I think there are companies working on providing it as quickly as they can and obviously there's an interest from clubs to be able to have that data stream directly to the bench during a game. The data that we're working with at Harvard is historical -- we don't have access to data as it's being collected.
Using this tracking data what were your findings on the formations?
The motivation for this paper was to use data to try to quantify and evaluate team strategy. By ‘team strategy’ I mean the set of instructions that a manager gives his players both as a team, a group and as individuals. This project was made possible because of the advent of tracking data.
"I developed a simple algorithm for measuring team formations and classifying them, considering different phases of the game separately".
First, I developed a simple algorithm for measuring team formations and classifying them, considering different phases of the game separately. Analysts often talk about the different phases of possession: transition, consolidating possession, progressing the ball up the pitch, and creating attacking opportunities. Managers often talk about how they plan for those stages separately; the teams have different instructions for each of those phases, and likewise when defending.
The methodology relies on the insight that teams will typically move around the field in a coherent structure that only encompasses a small fraction of the field. So formations are not defined by the absolute player positions on the field; they're defined by the players' positions relative to each other.
In practice, we track the vectors from one player to all the others and then average them over time to gain an idea of what the overall lattice structure, or formation, of the team looks like in that period. And we can do this in different attacking and defensive phases separately.
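To make the idea concrete, here is a minimal sketch in Python of averaging player-to-player vectors over time. It is an illustration under stated assumptions, not the algorithm from the paper: the array layout `(frames, players, 2)`, the function names and the optional `frame_mask` for selecting a phase of play are all hypothetical, and collapsing the averaged vectors into positions around the team centroid is a simplification chosen for this example.

```python
import numpy as np


def average_relative_vectors(positions, frame_mask=None):
    """Average the vector from every player to every other player over time.

    positions  : array of shape (T, N, 2), x/y of N players in T frames
                 (assumed layout, not a real provider's format).
    frame_mask : optional boolean array of length T selecting the frames
                 that belong to one phase of play (hypothetical input).
    Returns an (N, N, 2) array where entry [i, j] is the mean vector
    pointing from player i to player j.
    """
    positions = np.asarray(positions, dtype=float)
    if frame_mask is not None:
        positions = positions[np.asarray(frame_mask, dtype=bool)]
    # rel[t, i, j] = positions[t, j] - positions[t, i]
    rel = positions[:, None, :, :] - positions[:, :, None, :]
    return rel.mean(axis=0)


def formation_from_vectors(mean_rel):
    """Collapse the averaged vectors into one (x, y) per player.

    Averaging over the reference player i places each player j relative
    to the team centroid, so the result does not depend on where on the
    field the team happened to be.
    """
    return mean_rel.mean(axis=0)


# Toy example: a rigid three-player shape drifting up the field.
shape = np.array([[0.0, 0.0], [10.0, 0.0], [0.0, 20.0]])
drift = np.array([[5.0 * t, 2.0 * t] for t in range(5)])
frames = shape[None, :, :] + drift[:, None, :]

formation = formation_from_vectors(average_relative_vectors(frames))
print(formation)
```

Because only relative vectors are used, the drifting team in the toy example yields the same shape as a stationary one. Passing a `frame_mask` that marks, say, only the frames where the team is defending would give a separate formation for that phase.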
How big an impact does formation have on the probability of a result?
It's a question that people ask a lot, and it's difficult to answer because you have to control for many things. When one team plays against another there are lots of factors that determine the result. How strong is one team relative to another? How much of a role does home advantage play? How important was the game? What result would either team settle for? So you have to control for all these things, and after that maybe you can answer the question.
One thing we can clearly see in the data, though, is teams changing their formations to adapt to an opponent. For example, a team might use a certain formation in the majority of games but switch to a different one when playing one of the top teams in the league, because they know their opponent is particularly effective at playing in a certain way and they try to counter it.
"One of the most interesting things is our algorithm which automatically detects changes of formation during a game".
When you look over a larger number of matches, you can observe how certain managers react to situations and which strategies they like to use in those situations. You can attempt to anticipate what a manager will do based on what they've done in the past.
If you are interested in finding out more on this topic you can visit Laurie’s blog at http://eightyfivepoints.blogspot.com/ or the Harvard Sports Analytics Lab https://sportsanalytics.stat.harvard.edu/