I promise you that at some point I will stop creating new metrics or presenting you with player scores. But I just love them. The reason is that event data allows us to play around and build our own metrics. You can tailor the data to your needs, but you have to stay vigilant: a custom metric is highly subjective and carries your own biases. Data also has to fit into a narrative, and that’s something I hope to give you all if you find these articles interesting to read.
Sometimes I wonder what I really love to write about. In all honesty, I don’t usually write about the stuff that truly fascinates me. My mind goes to the audience, you, and to what I think will do well with that audience. That’s a good way of approaching the market. Pieces on attacking numbers or attacking players always do well, but there are plenty of those and far fewer on defensive data. So this time I want to look at data that truly fascinates me, and that is defensive data.
For this article, I’m going to have a look at long passes or long balls. The aim is to create a metric via a score that judges not the length of the passes or the end location, but rather looks at the starting location. I have always been fascinated with central defenders who dribble up the pitch or defensive midfielders who distribute from deep, so with this, we can measure how they contribute to their team’s attack.
Data
The data I’m going to use for this specific research is event data from Opta. This was collected from the 2024–2025 season and focuses on the Austrian Bundesliga. I won’t make any distinction for position here, because in terms of average position it will most likely be central defenders dribbling in, defensive midfielders dropping or fullbacks/wingbacks from the half spaces.
Event data will be manipulated so it can be used as match data. This means we will go from XY data to metrics, which are more usable in visualisations like tables and scatterplots.
Methodology
The aim is to go from event data to results expressed as totals or per-90 metrics. I have previously built metrics from Opta event data, using standard qualifiers to determine long passes, and that will be helpful here.
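To make the step from events to totals and per-90 values concrete, here is a minimal sketch. The event dictionaries, the `minutes` lookup and the player names are hypothetical and much simpler than real Opta data; it only illustrates the idea, not the code I actually use.

```python
from collections import Counter

# Hypothetical, simplified events (not the real Opta schema).
events = [
    {"player": "Player A", "type": "long_pass"},
    {"player": "Player A", "type": "long_pass"},
    {"player": "Player A", "type": "long_pass"},
    {"player": "Player A", "type": "long_pass"},
    {"player": "Player A", "type": "tackle"},
    {"player": "Player B", "type": "long_pass"},
    {"player": "Player B", "type": "long_pass"},
    {"player": "Player B", "type": "long_pass"},
]

# Hypothetical minutes played per player.
minutes = {"Player A": 180, "Player B": 90}


def per_90(total, minutes_played):
    """Scale a total to a per-90-minutes rate (0.0 if no minutes)."""
    return total * 90 / minutes_played if minutes_played else 0.0


# Count long passes per player, then add the per-90 rate.
totals = Counter(e["player"] for e in events if e["type"] == "long_pass")
table = {
    player: {"total": n, "per_90": per_90(n, minutes[player])}
    for player, n in totals.items()
}
```

The resulting `table` has one row per player, which is the shape you need for tables and scatterplots.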
So what’s a long pass? That’s the first question we have to ask ourselves in this regard. We can determine a long pass from the total passes when:
- A pass is a ground pass longer than 45 metres
- A pass is a high pass longer than 25 metres
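The two rules above can be sketched in a few lines of Python. This assumes coordinates on a 0 to 100 scale for both axes and a 105 by 68 metre pitch; the pitch size, the function names and the `is_high` flag are my own assumptions for illustration, not Opta field names.

```python
import math

PITCH_LENGTH_M = 105.0  # assumed pitch length in metres
PITCH_WIDTH_M = 68.0    # assumed pitch width in metres

GROUND_THRESHOLD_M = 45.0  # ground pass must be longer than this
HIGH_THRESHOLD_M = 25.0    # high pass must be longer than this


def pass_length_m(x, y, end_x, end_y):
    """Pass length in metres from 0-100 scaled start and end coordinates."""
    dx = (end_x - x) / 100 * PITCH_LENGTH_M
    dy = (end_y - y) / 100 * PITCH_WIDTH_M
    return math.hypot(dx, dy)


def is_long_pass(length_m, is_high):
    """Apply the two thresholds: over 25 m if high, over 45 m if on the ground."""
    threshold = HIGH_THRESHOLD_M if is_high else GROUND_THRESHOLD_M
    return length_m > threshold
```

With these thresholds, a 30 metre pass counts as long only when it is played in the air.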
From that we get a total number of passes based on those qualifiers. The next step is to filter those passes further. I could apply that filter in the first step, but since I already have the long-pass data I will do it afterwards. I will define the start location area to make sure I’m getting the right passes.
I will calculate that through Python where I put the event data through a set of calculations. Without limitations we get this:
Of course, the individual passes are impossible to distinguish here, and that doesn’t tell us much about what we are trying to achieve. We want to look at start locations in the middle third. Filtering on a start location in the middle third, we can see the total long passes in the Austrian Bundesliga so far.
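A minimal sketch of that middle-third filter, assuming the start x-coordinate runs from 0 (own goal line) to 100 (opposition goal line) in the direction of attack. The sample passes and the function name are hypothetical.

```python
# Middle third boundaries on an assumed 0-100 x scale.
MIDDLE_THIRD = (100 / 3, 200 / 3)


def starts_in_middle_third(x):
    """True if the start x-coordinate falls inside the middle third."""
    low, high = MIDDLE_THIRD
    return low <= x <= high


# Hypothetical long passes with only the fields this filter needs.
passes = [
    {"x": 20.0, "end_x": 70.0},
    {"x": 35.0, "end_x": 80.0},
    {"x": 50.0, "end_x": 30.0},
    {"x": 66.0, "end_x": 95.0},
    {"x": 80.0, "end_x": 98.0},
]

middle_third_passes = [p for p in passes if starts_in_middle_third(p["x"])]
```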
In the image above we can clearly see all passes that fit the start location, but we also see that many of them have the right start location without a progressive end location. Many of the passes go backwards and that’s not what we want, so we have to change that.
In the image above you can see the corrected version. Now all the passes go forward and end beyond the halfway line. That already gives us a better idea of progressive long passes, but we are not quite there yet. We still have to filter out unsuccessful passes.
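The direction filter can be sketched like this, again assuming a 0 to 100 x scale in the direction of attack so that 50 is the halfway line. The function name and the sample passes are hypothetical.

```python
def is_progressive(x, end_x, halfway=50.0):
    """True if the pass moves forward and ends beyond the halfway line."""
    return end_x > x and end_x > halfway


# Hypothetical middle-third passes as (start x, end x) pairs.
passes = [
    (40.0, 75.0),  # forward, ends in the opposition half
    (60.0, 45.0),  # backwards
    (35.0, 48.0),  # forward, but ends before the halfway line
    (55.0, 90.0),  # forward, ends in the opposition half
]

progressive = [p for p in passes if is_progressive(*p)]
```

Only the first and last pass survive the filter, which matches the idea of keeping passes that go forward and end beyond the halfway line.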
Now we have all the passes that we need to make the metric and create the score to see which players are doing the best in this metric.
Progressive Long Pass Score
So first we have a look at progressive long passes and the success rate of those passes. This gives us an idea of how successful players are when conducting this kind of pass.
In this scatterplot we see the relation between the total number of progressive long balls and their success rate. I have filtered for players with 5 or more long balls so that small samples don’t skew the data when we are looking at metrics and scores. The next step is to convert this to a score.
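As a rough sketch of how the success rate and the minimum-volume filter fit together, assume each progressive long pass is reduced to a `(player, successful)` pair. That input format and the function name are assumptions for illustration.

```python
from collections import defaultdict

MIN_PASSES = 5  # minimum progressive long balls to be included


def success_rates(passes, min_passes=MIN_PASSES):
    """Return attempts and success rate per player, dropping small samples."""
    attempts = defaultdict(int)
    completed = defaultdict(int)
    for player, successful in passes:
        attempts[player] += 1
        completed[player] += int(successful)
    return {
        player: {"attempts": n, "success_rate": completed[player] / n}
        for player, n in attempts.items()
        if n >= min_passes
    }


# Hypothetical data: Player A has 6 attempts, Player B only 2.
passes = (
    [("Player A", True)] * 3
    + [("Player A", False)] * 3
    + [("Player B", True)] * 2
)
rates = success_rates(passes)
```

Player B is dropped here despite a perfect record, which is exactly the kind of small sample the threshold is meant to keep out of the scatterplot.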
We will look at the metrics and give them new weights. The success rate is leading, but the total number of progressive long balls plays a part too. The success rate becomes more important in the score as the volume of long balls gets higher.
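One possible way to express that weighting is sketched below. This is not the exact formula behind the ranking: the base weight of 0.6, the extra 0.3 that scales with volume, the normalisation by the highest volume and the 0 to 100 scale are all assumptions chosen to illustrate the idea.

```python
BASE_WEIGHT = 0.6   # assumed weight of success rate at zero volume
EXTRA_WEIGHT = 0.3  # assumed extra weight gained at the highest volume


def long_pass_score(attempts, success_rate, max_attempts):
    """Blend success rate and volume; success rate weighs more at high volume."""
    volume = attempts / max_attempts  # normalised to the 0-1 range
    w_success = BASE_WEIGHT + EXTRA_WEIGHT * volume
    w_volume = 1.0 - w_success
    return 100 * (w_success * success_rate + w_volume * volume)


# Hypothetical players: (attempts, success rate).
players = {"Player A": (20, 0.5), "Player B": (10, 0.8)}
max_attempts = max(n for n, _ in players.values())

scores = {
    name: long_pass_score(n, rate, max_attempts)
    for name, (n, rate) in players.items()
}
ranking = sorted(scores, key=scores.get, reverse=True)
```

In this sketch the success rate always carries at least 60% of the score, and its share grows towards 90% for the player with the most progressive long balls.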
Once we run the code, we get the following score and rank for the Austrian Bundesliga in terms of Progressive Long Ball Score.
If you want to know more about the Python code, you can subscribe to my Patreon here:
https://www.patreon.com/c/outswingerfc
I’ve included the data and Python code on my Patreon 🙂