Goalkeeper Sweeper Pass Score: Measuring how a sweeping action can contribute to progression

It’s been a few months now since I started looking at data in a different way. Instead of focusing on the ready-made metrics, I’ve been looking more at the raw data and creating my own metrics. Part of the reason is that I’m never completely satisfied with the metrics given by the different data providers, especially when it comes to goalkeeper data.

In this article I’ll walk you through how you can use event data to create your own metric and develop a score that gives us an understanding not only of performance, but also of a role. I will explain the new metric that I have created: Goalkeeper Sweeper Pass Score (GSPS).

What is GSPS?

Goalkeeper Sweeper Pass Score is a metric that I’ve created to see the importance of the action that follows a sweeping action. I want to measure whether a sweeping action can lead to a successful pass, thus progressing the attack from a defensive action.

The metric is measured on a scale from 0 to 100: the higher the number, the more sweeping actions a goalkeeper made AND the more successful passes followed those actions. Scores are measured in relation to the other goalkeepers in my specific database.

Why do we need GSPS?

I’ve said this on different occasions, but I think we have very few metrics for goalkeepers, and the ones we have are almost all related to shot-stopping quality. That’s of course a huge part of a goalkeeper’s game, but so is progressing the game. That’s why I wanted to see whether a specific goalkeeping role (the sweeper keeper) could contribute to the attack with those sweeping actions, and whether I could capture that specific action in a data metric.

This will help us measure how important the sweeper actions are in relation to the progression of the game and whether goalkeepers actively engage in those actions or not.

Data provider

The data used to create the metric is all from Opta. It’s not from FBRef, although that’s a very good source. More specifically, it comes from Opta’s event data (also called x-y data). That is the starting point for everything in this metric.

This is what the event data looks like. From this database, I will calculate what I need: the sweeper events, the next action from that sweeper event and the success of that action.

In general, it’s good to know that there are many ready-to-use metrics from sources like Wyscout, FBRef, Statsbomb and Opta, but having access to event data gives you a lot of freedom to make your own things. And that’s exactly what I have been doing.

Data explainer

So, we have the provider we need. The next step is to look at the data and select the metrics we want to use. We want to look at what danger the delivery can give us, and there are a few metrics that I want to generate for this analysis:

  • The sweeper actions: in the data, you can filter for sweeper actions which will only select goalkeepers
  • Pass locations: The locations of the sweeper actions which are passes
  • Passes: both the quantity of the passes as well as the quality: success rate.

Methodology

So how do we go from raw positional data to a score? There are four important steps to take in which many things need to be thought of, otherwise, it won’t grasp what we are trying to do.

The first step is to generate what we need from the event data to make metrics. First of all, we need to make sure everything is filtered for sweeping actions. We also need to calculate the next action after that sweeping action + the passes. When we have done that we have all the metrics we need.
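As a rough illustration of this first step, here is a minimal sketch in plain Python. The event rows and field names (`match_id`, `seq`, `player`, `type`, `outcome`) are made up for the example and are not Opta’s actual schema. It also assumes that the "next action" is simply the next event in the same match by the same goalkeeper, which is one possible reading of the step described above.

```python
from collections import defaultdict

# Hypothetical, simplified event rows. Field names and values are
# illustrative assumptions, not the real Opta event format.
events = [
    {"match_id": 1, "seq": 1, "player": "GK A", "type": "sweeper", "outcome": 1},
    {"match_id": 1, "seq": 2, "player": "GK A", "type": "pass", "outcome": 1},
    {"match_id": 1, "seq": 3, "player": "MF X", "type": "pass", "outcome": 0},
    {"match_id": 1, "seq": 4, "player": "GK B", "type": "sweeper", "outcome": 1},
    {"match_id": 1, "seq": 5, "player": "GK B", "type": "pass", "outcome": 0},
    {"match_id": 2, "seq": 1, "player": "GK A", "type": "sweeper", "outcome": 1},
    {"match_id": 2, "seq": 2, "player": "GK A", "type": "clearance", "outcome": 1},
]


def sweeper_pass_counts(events):
    """Count sweeper actions, follow-up passes and successful follow-up passes per keeper."""
    # Order events chronologically within each match.
    ordered = sorted(events, key=lambda e: (e["match_id"], e["seq"]))
    counts = defaultdict(
        lambda: {"sweeper_actions": 0, "passes": 0, "successful_passes": 0}
    )
    for i, current in enumerate(ordered):
        if current["type"] != "sweeper":
            continue
        keeper = counts[current["player"]]
        keeper["sweeper_actions"] += 1
        # Look at the very next event, if there is one.
        nxt = ordered[i + 1] if i + 1 < len(ordered) else None
        if (
            nxt is not None
            and nxt["match_id"] == current["match_id"]
            and nxt["player"] == current["player"]
            and nxt["type"] == "pass"
        ):
            keeper["passes"] += 1
            keeper["successful_passes"] += nxt["outcome"]  # 1 = successful, 0 = not
    return {player: dict(c) for player, c in counts.items()}


counts = sweeper_pass_counts(events)
for player, c in counts.items():
    print(player, c)
```

With real event data you would swap the filter on `type` for whatever the provider uses to label sweeper actions and passes.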

The second step is to grab the metrics that we use and put them into the same kind of variables so we can calculate a score. This is needed because every variable has its different numerical value.

To create a score that goes from 0–1 or 0–100, I have to make sure all the variables are on the same kind of scale. I was looking for ways to do that and figured a measure based on deviation from the mean would be best. Often we think about percentile ranks, but this isn’t the best in terms of what we are looking for because we don’t want outliers to have a big effect on total numbers.

I’ve taken z-scores because I think seeing how far a player sits from the mean, measured in standard deviations, will help us better in judging the quality of said player. It also gives a good tool to bring every metric onto the same numerical scale so we can calculate our score later on.

Z-scores vs other scores. Source: Wikipedia

We are looking at the mean, which is 0. The deviations to the negative are players that score under the mean, and the deviations to the positive are players that score above the mean. The latter are the players we are going to focus on in terms of wanting to see the quality. By calculating the z-scores for every metric, we have a solid ground to calculate our score.
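To make the z-score step concrete, here is a small sketch using only Python’s standard library. The goalkeeper names and numbers are invented for the example, and the use of the population standard deviation (`pstdev`) rather than the sample version is an assumption on my side.

```python
from statistics import mean, pstdev


def z_scores(values):
    """Return (value - mean) / standard deviation for every value."""
    mu = mean(values)
    sigma = pstdev(values)  # assumption: population standard deviation
    if sigma == 0:
        # Every player has the same value, so nobody deviates from the mean.
        return [0.0 for _ in values]
    return [(v - mu) / sigma for v in values]


# Made-up sweeper action counts for three goalkeepers.
sweeper_actions = {"GK A": 12, "GK B": 8, "GK C": 4}

z = dict(zip(sweeper_actions, z_scores(list(sweeper_actions.values()))))
for player, score in z.items():
    print(f"{player}: {score:+.2f}")
```

In this toy data GK B sits exactly on the mean (a z-score of 0), GK A is above it and GK C is below it.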

The third step is to calculate the GSPS.

We talk about harmonic, arithmetic and geometric means when looking to create a score, but what are they?

The difference between Arithmetic mean, Geometric mean and Harmonic Mean

As Ben describes, harmonic and arithmetic means are both good ways of averaging the metrics I’m using.
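For readers who want to see the three means side by side, here is a short sketch with Python’s `statistics` module. The input values are arbitrary and only serve the illustration.

```python
from statistics import geometric_mean, harmonic_mean, mean

values = [2, 4, 8]  # arbitrary positive numbers for illustration

arithmetic = mean(values)           # sum of the values divided by their count
geometric = geometric_mean(values)  # n-th root of the product of the values
harmonic = harmonic_mean(values)    # count divided by the sum of reciprocals

print(f"arithmetic: {arithmetic:.3f}")
print(f"geometric:  {geometric:.3f}")
print(f"harmonic:   {harmonic:.3f}")
```

For positive numbers the harmonic mean is never larger than the geometric mean, which in turn is never larger than the arithmetic mean. One practical point worth keeping in mind: the geometric and harmonic means are only defined for positive values, while z-scores can be negative, so they cannot be applied to raw z-scores without some kind of transformation.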

So there are two different options for me. I either use filters and choose the arithmetic mean as that’s the best way to do it, or I need to alter my complete calculation to find the mean. In this case, I’ve chosen to filter and then create the arithmetic mean. I’ve converted the different metrics: sweeper actions, passes and successful passes to z-scores and then calculated the mean.

Using the arithmetic mean leads to what I want to get out of it: a score from 0 to 100 that gives the level of progression from sweeping actions.
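Putting the pieces together, the sketch below averages the z-scores of the three metrics per goalkeeper and rescales the result to 0–100. The numbers are invented, and the rescaling is an assumption: I use simple min-max scaling here because the exact conversion from the mean z-score to the 0–100 range is not spelled out above.

```python
from statistics import mean, pstdev

METRICS = ["sweeper_actions", "passes", "successful_passes"]


def z_scores(values):
    """Standardise values: (value - mean) / population standard deviation."""
    mu, sigma = mean(values), pstdev(values)
    if sigma == 0:
        return [0.0 for _ in values]
    return [(v - mu) / sigma for v in values]


def gsps(per_keeper):
    """Arithmetic mean of metric z-scores, min-max scaled to 0-100 (assumed scaling)."""
    names = list(per_keeper)
    z_by_metric = {
        m: z_scores([per_keeper[name][m] for name in names]) for m in METRICS
    }
    # Arithmetic mean of the three z-scores for each goalkeeper.
    raw = [mean([z_by_metric[m][i] for m in METRICS]) for i in range(len(names))]
    lo, hi = min(raw), max(raw)
    if hi == lo:
        # No spread between goalkeepers: place everyone in the middle.
        return {name: 50.0 for name in names}
    return {name: round(100 * (r - lo) / (hi - lo), 1) for name, r in zip(names, raw)}


# Made-up totals per goalkeeper.
keepers = {
    "GK A": {"sweeper_actions": 12, "passes": 9, "successful_passes": 7},
    "GK B": {"sweeper_actions": 8, "passes": 5, "successful_passes": 3},
    "GK C": {"sweeper_actions": 4, "passes": 2, "successful_passes": 1},
}

scores = gsps(keepers)
for name, score in sorted(scores.items(), key=lambda kv: kv[1], reverse=True):
    print(name, score)
```

With this scaling, the goalkeeper with the highest mean z-score gets 100 and the one with the lowest gets 0, so the score is always relative to the goalkeepers in the dataset.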

Example: J1 League

In the table above you can see the scores of the goalkeepers who are most successful from sweeping passes. The score goes from 0–100 and is based on the goalkeepers that actually engage in sweeping actions, with 100 being the score where the player scores highest in every metric included in the score.

These players have the highest success from sweeping passes and thus create the most progression from a sweeping action as a goalkeeper. This also paints the picture that they have the foresight to look ahead and move the ball forward.
