Fluidity in Football: Quantifying Relationism Through Spatial Data

My eyes haven’t been scrolling social media as much as they used to a few years back. Partly because I don’t have the time anymore to do so with such frequency, and partly because many of my algorithms have become a cesspool of negativity and hate. Having said that, something I do tend to follow is the way teams play. And, I think, it comes as no surprise when I say that relationism as a style of play has been wandering around many feeds.

I’m not going to pretend I’m the expert on the coaching aspect of it and how to implement it. Neither do I think this article is going to bring forth groundbreaking results or theories. My aim with this article is to use event data to identify teams that are hybrid positional-relational or have a strong dominance of relationism in their style of play. Is it something that can only be captured by the eye and by a stroke of culture/emotion? Or can we use event data to recognise patterns and find teams that play that way?

Contents

  1. Why use event data to try and capture the playing style?
  2. Theoretical Framework
  3. Data & Methodology
  4. Results
  5. Final thoughts

Why use event data to try and capture the playing style?

Football, at its essence, is a living tapestry of player interactions constantly evolving around the central object of desire: the ball. As players move and respond to one another, distinct patterns emerge in their collective actions, particularly visible in the intricate networks formed through passing sequences.

Though traditional event data captures only moments when players directly engage with the ball, these touchpoints nonetheless reveal profound relational qualities. We can measure these qualities through various lenses: the diversity of passing choices (entropy), the formation of interconnected player clusters, and the spatial coordination that emerges as players position themselves in relation to teammates.

This approach to understanding football resonates deeply with relationist philosophy. From this perspective, the game’s meaning doesn’t reside in static positions or isolated actions, but rather in the dynamic, ever-shifting relationships between players as the match unfolds. What matters is not where individual players stand, but how they move and interact relative to one another, creating a fluid system of meaning that continuously transforms throughout the ninety minutes.

Theoretical Framework

Football style through a relationist lens isn’t defined by predetermined positions but emerges organically from player interactions. This approach, which is founded on spontaneity, spatial intelligence, and fluid connectivity, stands in contrast to positional play’s structured framework of designated zones and tactical discipline.

In relational systems, players coordinate through intuitive responses to teammates, opponents, and the ball’s context. The tactical framework materialises through the play itself rather than being imposed beforehand.

On the pitch, this manifests as continuously reforming passing triangles, compact and diverse passes, constant support near the ball, and freedom from positional constraints. Players gravitate toward the ball, creating local numerical advantages and dynamic combinations. Creative responsibility is distributed, shifting naturally with each possession phase, while team structure becomes fluid and contextual, adapting to the evolving match situation.

Analytically, traditional metrics like zone occupation or average positions presume stability and structure that relational play defies. Effective analysis requires shifting from static measurements to interaction-based indicators.

This research introduces metrics derived from event data that correspond to relational principles: clustering coefficients quantify local interaction density, pass entropy measures improvisational variety, and the support proximity index tracks teammate closeness to the ball, enabling dynamic identification of relational phases throughout matches.

Data and methodology

This study uses a quantitative methodology to identify and measure relational play in football through structured event data. The dataset includes match records from the 2024–25 Eredivisie Women’s league. Each event log contains details such as player and team identifiers, event type (e.g. pass, duel, shot), spatial coordinates (x and y values on a normalised 100×68 pitch), and timestamp. Only pass events are used in the analysis, since passing is the most frequent and structurally revealing action in football. The data comes from Opta/StatsPerform and was collected on May 1st, 2025, for the Dutch Eredivisie Women.

To capture long-term relational behaviour, each match is segmented into 45-minute windows. Each window is treated independently and analysed for signs of relational play using three custom-built metrics:

  1. Clustering Coefficient measures triangle formation frequency in passing networks, where players are nodes and passes are directed edges. A player’s coefficient is calculated by dividing their actual triangle involvement by their potential maximum. The team’s average value indicates local connectivity density—a fundamental characteristic of relational play.
  2. Pass Entropy quantifies passing variety. By calculating the probability distribution of each player’s passes to teammates, we derive their Shannon entropy score. Higher entropy indicates more diverse passing choices, reflecting improvisational play rather than predictable patterns. The team value averages individual entropies, excluding players with minimal passing involvement.
  3. Support Proximity Index evaluates teammate availability. For each pass, we count teammates within a 15-meter radius of the passer. The average across all passes reveals how consistently the team maintains close support around the ball—a defining principle of relational football that enables spontaneous combinations and fluid progression.
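To make these definitions concrete, below is a minimal sketch of how the three metrics could be computed for a single window of pass events. The column names (passer, receiver, x, y), the teammate-position proxy and the minimum-involvement cut-off are my assumptions for illustration, not the exact implementation used here.

```python
import numpy as np
import pandas as pd
import networkx as nx

def relational_metrics(passes: pd.DataFrame, radius: float = 15.0) -> dict:
    """Clustering, pass entropy and support proximity for one window of passes.

    Assumed columns: passer, receiver, x, y (pass origin). Teammate positions
    are proxied by each player's most recent pass origin in the window.
    """
    # 1. Clustering coefficient: players as nodes, passes as edges
    #    (undirected projection, which is what triangle counting needs)
    G = nx.Graph()
    G.add_edges_from(zip(passes["passer"], passes["receiver"]))
    clustering = float(np.mean(list(nx.clustering(G).values())))

    # 2. Pass entropy: Shannon entropy of each player's distribution of targets
    entropies = []
    for _, targets in passes.groupby("passer")["receiver"]:
        if len(targets) < 5:                 # skip minimal passing involvement
            continue
        p = targets.value_counts(normalize=True).to_numpy()
        entropies.append(float(-(p * np.log2(p)).sum()))
    entropy = float(np.mean(entropies)) if entropies else 0.0

    # 3. Support proximity: teammates within `radius` metres of each passer
    last_pos = passes.groupby("passer")[["x", "y"]].last()
    counts = []
    for _, row in passes.iterrows():
        d = np.hypot(last_pos["x"] - row["x"], last_pos["y"] - row["y"])
        counts.append(int((d <= radius).sum()) - 1)   # minus the passer themself
    proximity = float(np.mean(counts))

    return {"clustering": clustering, "entropy": entropy, "proximity": proximity}
```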

To combine these three metrics into one unified measure, we normalise each one using min-max scaling so they fall between 0 and 1. The resulting Relational Index (RI) is then calculated using the formula:

RI = 0.4 × Clustering + 0.3 × Proximity + 0.3 × Entropy

These weights reflect the greater theoretical importance of triangle-based interaction (clustering), followed by support around the ball and variability in pass choices.

A window is labelled as relational if its RI exceeds 0.5. For each team in each match, we compute the percentage of their 2-minute windows that meet this criterion. This gives us the team’s Relational Time Percentage, which acts as a proxy for how often the team plays relationally during a match. When averaged across multiple matches, this percentage becomes a stable tactical signature of that team’s playing style.

Results

Applying the relational framework to matches from the 2024–25 Eredivisie Women’s league revealed that relational play, as defined by the Relational Index (RI), occurs infrequently but measurably. Using 45-minute windows and a threshold of RI > 0.5, most teams displayed relational behaviour in less than 10% of total match time.

Across all matches analysed, the league-wide average was 8.3%, with few teams exceeding 15%. Based on these distributions, the study proposes classification thresholds: below 10% as “structured,” 10–25% as “relational tendencies,” and above 25% as “highly relational.” Visual inspections of high-RI segments showed dense passing networks, triangular combinations, and compact support near the ball, consistent with tactical descriptions of relational football.

In the bar chart above we can see the Eredivisie Women 2024–2025 teams and how relational their style of play is. It measures how much of the time they have played according to relational principles. I use two thresholds:

  • 40% is the threshold for moderate relationism; teams above it can be said to play relational football or a hybrid style that favours relationism
  • 50% marks highly relational football. Above that percentage, we can say a team is truly relational in its style of play.

Now, as you can see, there are quite a few teams in the moderate band, but truly relational football is only played – according to this data – by Ajax and Feyenoord.

As you can see in this violin plot, most teams spend most of their time within the moderate threshold, meaning they have tendencies of relationism in their play without being fully there. Now, if we look at one team, we can see something different about how they play throughout the season. We are going with FC Twente, the best team of the season and arguably the best team of the past decade.

This grouped bar chart visualises FC Twente’s average Relational Index in the first and second halves of each match, using a 0.5 threshold to indicate relational play. By comparing the two bars per match, we can see whether Twente sustains, increases, or declines in relational behavior after halftime. The visualisation reveals how tactical fluidity evolves throughout matches, highlighting consistency or contrast between halves. Matches where both bars are above 0.5 suggest sustained relational intent, while large gaps may indicate halftime adjustments or fatigue. This provides insight into Twente’s game management and stylistic adherence across different phases of play.

Final thoughts

This study demonstrates that relational football—a style characterised by adaptive coordination, dense passing, and ball-near support—can be meaningfully identified using structured event data. Through a composite Relational Index, short relational phases were detected across matches, though their overall frequency was low, suggesting such play is rare or context-dependent. The model proved sensitive to fluctuations in team behaviour, offering a new lens for analysing tactical identity and match dynamics.

However, limitations include reliance on on-ball data, which excludes off-ball positioning, and the use of fixed two-minute windows that may overlook brief relational episodes. Additionally, the index’s threshold and normalisation methods, while effective, introduce subjectivity and restrict cross-match comparison. The current framework also lacks contextual variables like scoreline or pressing intensity. Despite these constraints, the findings support the claim that relational football, though abstract, leaves identifiable statistical traces, offering a scalable method for tactical profiling and a foundation for future model refinement.

Interquartile Ranges and Boxplots in Football Analysis

By writing regularly, I have concluded that I like discussing data from a sporting perspective: explaining data methodology through the lens of sport, football in particular. I have always set out to work in professional football, and I am very lucky to have reached that, but I want to keep creating, and that is why my content has become increasingly about how we use data rather than what players/teams are good/bad.

I spoke about the importance of looking at players who act differently. Well, their data behaves differently: it sits outside of the average or the mean. Previously, I have spoken about outliers and anomalies; those were result-based articles. But what if we zoom in on the methodology and look at the way we calculate those outliers or anomalies? Today, I want to talk about Interquartile Ranges in football data.

Data collection

Before I look into that, I want to shed light on the data that I am using. The data focuses on the Brazilian Serie A 2025. Of course, I know that it is very early in the season and that the sample has its limitations. But we can still draw meaningful insights from it.

The data comes from Opta/StatsPerform and was collected on April 22nd, 2025. The xG data comes from my own model, which is generated through R. The expected goals values were generated on April 26th, 2025.

Interquartile Ranges

The interquartile range (IQR) is a key measure of statistical dispersion that describes the spread of the central 50% of a dataset. It is calculated by subtracting the first quartile (Q1) from the third quartile (Q3):

IQR = Q3 − Q1

To understand this, consider that when a dataset is ordered from smallest to largest, Q1 represents the 25th percentile (the value below which 25% of the data falls), and Q3 represents the 75th percentile (the value below which 75% of the data falls). The IQR therefore captures the range in which the “middle half” of the data lies, excluding the extreme 25% on either end.

Interquartile Range explainer

The IQR is widely used because it is resistant to outliers. Unlike the full range (maximum minus minimum), which can be skewed by one unusually high or low value, the IQR reflects the typical spread of the data. This makes it particularly useful in datasets where anomalies or extreme values are expected, such as football statistics, where a single match can significantly distort an average.

A small IQR indicates that the data is tightly clustered around the median, suggesting consistency or low variability. A large IQR implies more variation, indicating that values are more spread out. In data analysis, comparing IQRs across different groups helps identify where variability lies and whether certain segments are more stable or volatile than others.
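As a quick illustration, the IQR and the conventional 1.5 × IQR outlier fences can be computed in a few lines; the per-match xG values below are made up for the example.

```python
import numpy as np

# Made-up per-match xG values for one player
xg = np.array([0.10, 0.30, 0.20, 0.80, 0.40, 0.15, 0.25, 1.90])

q1, q3 = np.percentile(xg, [25, 75])
iqr = q3 - q1                                # spread of the middle 50%

# Conventional outlier fences at 1.5 * IQR beyond the quartiles
lower, upper = q1 - 1.5 * iqr, q3 + 1.5 * iqr
outliers = xg[(xg < lower) | (xg > upper)]

print(f"Q1={q1:.2f}, Q3={q3:.2f}, IQR={iqr:.2f}, outliers={outliers}")
```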

Box plot

A boxplot (or box-and-whisker plot) is a compact, visual summary of a dataset’s distribution, built around five key statistics: the minimum, first quartile (Q1), median (Q2), third quartile (Q3), and maximum. It is one of the most efficient ways to display the central tendency, spread, and potential outliers in a single view.

At the core of a boxplot is the box, which spans from Q1 to Q3 — the interquartile range (IQR). This box contains the middle 50% of the data. A horizontal line inside the box represents the median (Q2), showing where the center of the data lies. The whiskers extend from the box to show the range of the data that falls within 1.5 times the IQR from Q1 and Q3. Any data points outside of that range are plotted as individual dots or asterisks, and are considered outliers.

Boxplots are particularly useful for comparing distributions across multiple categories or groups. In football analytics, for example, you can use boxplots to compare metrics like interceptions, shot accuracy, or pass completion rates across different player roles or leagues. This makes it easy to identify players who consistently perform above or below the norm, assess the spread of values, and detect skewness.
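As a minimal sketch, such a comparison could be drawn with matplotlib as follows; the shot-level xG values and the second player are invented purely for illustration.

```python
import matplotlib.pyplot as plt
import pandas as pd

# Invented shot-level xG values for two players, one row per shot
shots = pd.DataFrame({
    "player": ["Pedro Raul"] * 6 + ["Player B"] * 6,
    "xG": [0.05, 0.12, 0.40, 0.08, 0.70, 0.22,
           0.10, 0.15, 0.20, 0.09, 0.30, 0.12],
})

# One box per player: the box spans Q1-Q3, whiskers reach 1.5 * IQR,
# anything beyond the whiskers is drawn as an individual outlier point
data = [grp["xG"].to_numpy() for _, grp in shots.groupby("player")]
names = [name for name, _ in shots.groupby("player")]

fig, ax = plt.subplots(figsize=(6, 4))
ax.boxplot(data, whis=1.5, showfliers=True)
ax.set_xticks(range(1, len(names) + 1))
ax.set_xticklabels(names)
ax.set_ylabel("xG per shot")
ax.set_title("Shot quality distribution per player")
plt.tight_layout()
plt.show()
```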

An important advantage of boxplots is their resistance to distortion by extreme values, thanks to their reliance on medians and quartiles rather than means and standard deviations. However, boxplots do not reveal the full shape of a distribution (e.g., multimodality or subtle clusters), so they are best used alongside other tools when deeper analysis is needed.

Analysis

As described under the data section, I will use expected goals data from the Brazilian Serie A 2025. Using interquartile ranges, we can see which players are in the middle 50% of the selected metric.

In short, this is what we can conclude: In the 2025 Brazilian league season, Pedro Raul stood out as the top player by expected goals (xG), showing his strong attacking threat. While there is a competitive cluster behind him, his advantage highlights his key role in creating high-quality scoring opportunities.

This shows us the top performers in expected goals accumulated since the beginning of the season in Brazil. But if you want to delve deeper, you can look for outliers. We do that by using the interquartile range and finding the middle 50%. If there are deviations away from that middle 50%, we can state that players are over- or underperforming. Or, in a more extreme form: they are outliers.

I’m quite interested in their distribution: do they have many shots of low xG value? Or rather a few with high xG values? I want to see whether their totals are driven by outliers or whether, in general, they simply take more high-xG shots.

But how can we visualise that? By looking at box plots.

Each boxplot delineates the statistical spread of shot quality, with the median value indicating the central tendency of xG per attempt, while the interquartile range (IQR) represents the middle 50% of observations, effectively illustrating the consistency of shot selection.

The median xG value serves as a primary indicator of a player’s typical shot quality, with higher values suggesting systematic access to superior scoring opportunities, often from proximal locations to the goal or advantageous tactical positions. The width of the IQR provides insight into shot selection variability — narrower distributions indicate methodological consistency in opportunity type, while broader distributions suggest greater diversity in shot characteristics.

Final thoughts

Interquartile ranges and boxplots offer robust analytical tools for examining footballers’ shot quality distributions. These methods efficiently highlight the central 50% of data, filtering outliers whilst emphasising typical performance patterns.

Boxplot visualisations concisely present multiple statistical parameters — median values, quartile ranges, and outlier identification — enabling immediate cross-player comparison. This approach reveals crucial differences in shooting behaviours, including central tendency variations, distributional width differences, and asymmetric patterns that may reflect tactical specialisation.

Despite their utility, these visualisations possess inherent limitations. They necessarily obscure underlying distributional morphology and provide no indication of sample size adequacy — a critical consideration in sports analytics where performance metric reliability depends on observation volume. A player with minimal shot attempts may produce a boxplot visually similar to one with extensive data, despite significantly reduced statistical reliability.

Calculating triangular third-man runs with event data: how reliable are the results?

Data metrics and development always feel or look innovative. About 40% of the time, they are. It is very satisfying to create something that completely suits your needs, but one question always remains: how necessary is it to do so? My reasoning is that some metrics already exist, and it isn’t always necessary to create everything yourself for your analysis. It’s very time-consuming, and time is not something we analysts often have.

To me, there is one clear exception: when you don’t have the right resources. Often, we can look at event data and create our metrics, indexes and models from there. However, sometimes we want to look at off-ball data, and we need tracking data to generate positional data and off-ball runs, for example. I’m quite fortunate to work with clubs that have access to that data, but what if you don’t have that? Then you want to get creative with event data and use it to approximate what tracking data would tell you.

This concept might sound familiar to you. We do love pressing data, but not all providers have it, so we have a metric that approximates pressing intensity: Passes Per Defensive Action (PPDA). It calculates the number of passes allowed by a team’s opponent before the pressing team makes a defensive action (like a tackle, interception, or foul) in the opponent’s defensive two-thirds of the pitch. A low PPDA means aggressive pressing, and a higher PPDA means a more passive pressing approach.
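As a rough sketch of that idea (the zone definition is left to the caller and the counts below are made up):

```python
def ppda(opponent_passes: int, defensive_actions: int) -> float:
    """Passes allowed per defensive action within the chosen pressing zone.

    opponent_passes: passes by the opponent in that zone.
    defensive_actions: tackles, interceptions and fouls by the pressing team
    in the same zone. A lower PPDA indicates more aggressive pressing.
    """
    return opponent_passes / max(defensive_actions, 1)   # avoid division by zero

# Example: 180 opponent passes against 20 defensive actions -> PPDA of 9.0
print(ppda(180, 20))
```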

In this article, I want to calculate a way to get third-man runs data with event data. After that, I want to evaluate how sound this method is and how reliable the results are.

Contents

  1. Why this research?
  2. Data collection
  3. Methodology
  4. Calculation
  5. Analysis
  6. Checks and evaluation

Why this research

This research is twofold, actually. My main aim is to see if I can find a creative way to capture third-man runs with event data. I think it can give us an indication of off-ball movement through some out-of-the-box thinking. We can then use this for off-ball scouting, for analysis of players and teams, and of course, we can go even further and build models that predict a third-man run.

The second reason is that I feel I can do better at evaluating the things I make: checking myself, checking the validity of the models and how representative the data is. If I build this more into my public work, it will lead to a better understanding of the data engineering process, which is often frustrating and takes a long time to get right.

Data collection

The data collection is almost the same as in all other projects. The data I’m using is event data from Opta/StatsPerform, which means it contains the XY-values of every on-ball event as collected by this specific provider.

The data has been collected on Saturday, 12th of April 2025. The data concentrates on one league specifically, and that is the Argentinian League 2025, as I’m branching out to South American football content only on this platform.

I will also aim to create a few new metrics of my own design. These are based on Opta/StatsPerform data and will be available on my GitHub page.

Methodology

Third man runs are one of football’s most elegant attacking patterns — subtle, intelligent, and often decisive. At their core, these movements involve three players: the first (Player A) plays a pass into a teammate (Player B), who quickly connects with a third player (Player C) arriving from a different area. It’s C who truly benefits, receiving the ball in space created by the initial pass-and-move. This movement structure mirrors the concept of triadic closure from network theory, where three connected nodes form a triangle — a configuration known to create stability and flow in complex systems. In football, this triangle is a strategic weapon: it preserves possession while simultaneously generating overloads and disorganizing defensive shapes.

To detect these patterns in event data, we treat each pass as a directed edge in a dynamic graph, with time providing the sequence. The detection algorithm follows a constrained path: A→B followed by B→C, where A, B, and C are distinct teammates. The off-ball movement from A’s pass to C’s reception is measured using Euclidean distance — a direct application of spatial geometry to quantify run intensity. But there’s more at play than just movement. From an information-theoretic perspective, third man runs are low-probability, high-reward decisions. Using Shannon entropy, we can frame each player’s passing options as a probability distribution: the higher the entropy, the more unpredictable and creative the decision. Third man runs often emerge in moments of lower entropy, where players bypass obvious choices in favor of coordinated, rehearsed sequences.

Over time, these passing sequences can be modeled as a Markov chain, where each player’s action (state) depends only on the previous action in the chain. While simple possession patterns often result in high state recurrence (e.g., passing back and forth), third man runs introduce state transitions that break the chain’s memoryless monotony. This injects volatility and forward momentum into the system — qualities typically associated with higher goal probability. By combining network topology, geometric analysis, and probabilistic modeling, we build not just a detector but a lens into one of football’s most intelligent tactical tools. And with a value-scoring mechanism grounded in normalization and vector calculus, we begin to quantify what coaches have always known: the most dangerous player is often the one who never touched the ball until the moment it mattered.
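A minimal sketch of that detection logic over a time-ordered list of completed passes could look like the following; the column names, the time window and the distance proxy are my assumptions rather than the exact implementation.

```python
import numpy as np
import pandas as pd

def detect_third_man_runs(passes: pd.DataFrame, max_gap: float = 40.0) -> pd.DataFrame:
    """Detect A -> B followed by B -> C sequences for the same team.

    `passes` is assumed to be time-ordered with columns:
    team, passer, receiver, x, y, endX, endY, timestamp (seconds).
    """
    runs = []
    for i in range(len(passes) - 1):
        p1, p2 = passes.iloc[i], passes.iloc[i + 1]
        same_team = p1["team"] == p2["team"]
        chained = p1["receiver"] == p2["passer"]          # B receives, then releases
        distinct = len({p1["passer"], p1["receiver"], p2["receiver"]}) == 3
        in_time = 0 < p2["timestamp"] - p1["timestamp"] <= max_gap
        if same_team and chained and distinct and in_time:
            # C's off-ball movement, proxied by the distance from A's pass
            # origin to C's reception point
            run_distance = float(np.hypot(p2["endX"] - p1["x"], p2["endY"] - p1["y"]))
            runs.append({
                "player_a": p1["passer"],
                "player_b": p1["receiver"],
                "player_c": p2["receiver"],
                "run_distance": run_distance,
                "duration": p2["timestamp"] - p1["timestamp"],
            })
    return pd.DataFrame(runs)
```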

Calculations

Let’s walk through how we identify and evaluate a third-man run using real match data. Imagine Player A, a midfielder, passes the ball from position (37.5, 20.2) on the pitch. Player B receives it and quickly lays it off to Player C, who arrives in space and receives the ball at (71.8, 40.3). By checking the event timestamps, we know this sequence happened over 35 seconds — enough time for Player C to make a significant off-ball movement. The key here is that Player A never passed directly to Player C; the move relies on coordination and timing, a perfect example of a third man run.

We start by calculating the Euclidean distance between A’s pass location and C’s reception point, which comes out to about 39.75 meters. Dividing that by the time difference gives us a run speed of 1.14 meters per second. We also look at how much progress the ball made toward the goal, called the vertical gain, which in this case is around 0.33 (when normalized to a standard 105-meter pitch). Each of these factors — distance, vertical gain, and speed — is normalized and plugged into a weighted scoring formula. For this example, the resulting value score is 0.468, indicating a moderately valuable third man run.
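The arithmetic above can be reproduced directly. Note that the normalisation bounds below are assumptions for illustration, so the final score only matches the 0.468 from the text under the original scaling.

```python
import numpy as np

# Worked example from the text
a_x, a_y = 37.5, 20.2       # Player A's pass origin
c_x, c_y = 71.8, 40.3       # Player C's reception point
dt = 35.0                   # seconds between the two passes

distance = float(np.hypot(c_x - a_x, c_y - a_y))    # ~39.75 m
speed = distance / dt                               # ~1.14 m/s
vertical_gain = (c_x - a_x) / 105.0                 # ~0.33 on a 105 m pitch

# Assumed normalisation bounds (0-80 m for distance, 0-8 m/s for speed)
norm = lambda v, lo, hi: (v - lo) / (hi - lo)
value_score = (0.5 * norm(distance, 0, 80)
               + 0.3 * vertical_gain
               + 0.2 * norm(speed, 0, 8))

print(round(distance, 2), round(speed, 2), round(vertical_gain, 2), round(value_score, 3))
```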

This process helps quantify the kind of off-ball intelligence that often goes unnoticed. Instead of just knowing who passed to whom, we begin to understand how space is created and exploited. With just a few lines of code and some well-defined logic, we turn tactical nuance into measurable insight — bridging the gap between data and the game’s deeper layers.

Analysis

By running the Python code, I get the results of the third-man runs and every player involved. I also calculated a score that gives value to the third man run as to how useful it is. The Excel file that came out of it looks like this:

First, let us have a look at the teams with the most third-man runs during these games we have collected:

As we can conclude from the bar graph above, Argentinos Juniors, Estudiantes and Boca Juniors have the most third-man runs. Having said that, Newell’s Old Boys, Belgrano and Deportivo Riestra have the fewest third man runs.

Of course, these are just volume numbers. Let’s compare them to the average value:

What we see is quite interesting. The majority of the teams have an average value between 0.3 and 0.5 per third-man run. The more they participate in third-man runs, the more their average value evens out. As you can see, the teams with the fewest runs have more positive and negative anomalies.

Finally, I want to find the players who are player C, as they are the ones doing the third-man run. I also want to see their value score on average, so we can see how dangerous their runs are.

As we can see, there is a similar pattern in how the value is assigned. More third-man runs also means a value closer to that of their peers. This is very interesting.

We can see the best players in terms of the value they give with their runs. I have selected only players with at least 5 runs to make it more representative. It’s also important to stress that the league hasn’t been concluded and the data will be different when the season is over.

Now, this is one way to create the metric and analyse it, but how valid is this data actually?

Checks and evaluation

Now I’m going to check whether this is all worth the calculation or whether I need to make drastic changes.

First, I will look at the features. The composite score is based on the following: distance (50%), vertical gain toward the goal (30%) and speed (20%). This seems like a reasonable weighting, but I could increase the weight of vertical gain.

Second, looking at the output, I need to ask myself the following questions:

  • Are top-scoring actions meaningful? (e.g. fast break runs, line-breaking passes)
  • Are short, back-passes scoring low?
  • Are there any nonsensical values like 0 distance or negative speed?

Following that, looking at the distribution validation:

The distribution of normalized value scores for third-man runs appears healthy overall, showing a positive skew with most scores clustered between 0.2 and 0.5. This shape aligns with expectations in football, where high-value tactical patterns — like truly effective third-man runs — should be relatively rare. The presence of a single peak at 1.0 suggests a standout moment, potentially representing a high-speed, long-distance run that created significant forward momentum. Importantly, the model avoids overinflating value, as there’s no unnatural cluster near the top end of the scale, which supports its reliability in distinguishing impactful actions from routine ones.

However, the tight cluster around the 0.2–0.3 range may point to a limited scoring spread, potentially reducing how well the metric separates moderately good actions from low-value ones. This could be a result of either low variability in the input features (like speed or vertical gain) or an overemphasis on distance within the weighted formula. If the score distribution stays compressed, it may make ranking or comparative use of the metric less insightful. Adjusting the weighting or introducing non-linear scaling to amplify score separation could help refine the index for better tactical and scouting utility.

If you liked reading this, you can always follow me on X and BlueSky.

Finding overperforming strikers using contextual anomalies

I was reminiscing early this week about the days that I just spent hours in Tableau with Wyscout data and making scatterplots for the life of it. I think Twitter specifically saw more scatterplots than ever before from football data enthusiasts. Then, it made me think: how did we try to find the best performing players?

Immediately, my mind turns toward outliers. It’s a way to look for the players that stand out in certain metrics. One of my favourite pieces on outliers is this one by Andrew Rowlinson:

Finding Unusual Football Players – update 2024 – numberstorm blog: outlier detection for football player recruitment, an update (andrewrowlinson.github.io)

In this article, I will focus on a few things:

  1. What are outliers in data?
  2. Anomalies: contextual anomaly
  3. Data
  4. Methodology
  5. Exploratory data visualisation
  6. Final thoughts

Outliers

Outliers are data points that significantly deviate from most of the dataset, such that they are considerably distant from the central cluster of values. They can be caused by data variability, errors during experimentation or simply uncommon phenomena which occur naturally within a set of data. Statistical measures based on interquartile range (IQR) or deviations from the mean, i.e. standard deviation, are used to identify outliers.

In a dataset, an outlier is commonly defined as a value lying more than 1.5 times the IQR below the first quartile or above the third quartile, or more than three standard deviations away from the mean. These extreme figures may distort analysis and produce false statistical conclusions, thereby affecting the accuracy of machine learning models.

Outliers require careful treatment since they can indicate important anomalies worth further investigation or simply result from incorrect data collection. Depending on context, they can be eliminated, altered or handled algorithmically using certain techniques to minimise their effects. In sum, outliers are a crucial component of data analysis, requiring accurate identification and proper handling to make sure the results obtained are robust and dependable.

Anomalies

Anomalies in data refer to data points or patterns that deviate significantly from the expected or normal behavior, often signaling rare or exceptional occurrences. They are distinct from outliers in that anomalies often refer to unusual patterns that may not be isolated to a single data point but can represent a broader trend or event that warrants closer examination. These anomalies can arise due to various reasons, including rare events, changes in underlying systems, or flaws in data collection or processing methods.

Detecting anomalies is a critical task in many domains, as they can uncover important insights or highlight errors that could distort analysis. Statistical methods such as clustering, classification, or even machine learning techniques like anomaly detection algorithms are often employed to identify such deviations. In addition to traditional methods, time series analysis or unsupervised learning approaches can be used to detect shifts in patterns over time, further enhancing the detection of anomalies in dynamic datasets.

Anomalies are often indicators of something noteworthy, whether it’s a significant business event, a potential fraud case, a technical failure, or an unexpected change in behavior. Therefore, while they can sometimes represent data errors or noise that need to be cleaned or corrected, they can also reveal valuable insights if analysed properly. Just like outliers, anomalies require careful handling to ensure that they are properly addressed, whether that means investigating the cause, adjusting the data, or utilising algorithms designed to deal with them in the context of the larger dataset.

Contextual anomaly

There are three main types of anomalies: point anomalies, contextual anomalies, and collective anomalies. Point anomalies, also known as outliers, are individual data points that deviate significantly from the rest of the dataset, often signaling errors or rare events. Contextual anomalies are data points that are unusual in a specific context but may appear normal in another. These anomalies are context-dependent and are often seen in time-series data, where a value might be expected under certain conditions but not others. Collective anomalies, on the other hand, occur when a collection of related data points deviates collectively from the norm, even if individual points within the group do not seem anomalous on their own.

For my research, I will focus on contextual anomalies. A contextual anomaly is when a data point is unusual only in a specific context but may not be an outlier in a general sense. As we have already written and spoken about outliers, I will focus on contextual anomaly in this article.

Data

The data I’m using for this part of data scouting comes from Wyscout/Hudl. It was collected on March 23rd, 2025. It covers 127 leagues across three different seasons: 2024, 2024–2025 and, where there are enough minutes, the 2025 season. The data is downloaded with all stats to have the most complete database.

I will filter for position as I’m only interested in strikers, so I will look at every player whose position is CF or who has CF as one of their positions. Next to that, I will look at strikers who have played at least 500 minutes throughout the season, as that gives us a big enough sample over that particular season and makes the data reasonably representative.

Methodology

Before we go into the actual calculation for what we are looking for, it’s important to get the right data from our database. First, we need to define the context and anomaly framework:

  • The context variable: In this example, we use xG per 90
  • The target variable: what do we want to know? Whether a player overperforms or underperforms their xG, so we set Goals per 90 as the target variable
  • Contextual anomaly: When Goals per 90 > xG per 90 + threshold, but only when xG per 90 is low

How does this look in code for Python, R and Julia?

Python:

# Calculate Anomaly Score
df_analysis['Anomaly Score'] = df_analysis['Goals per 90'] - df_analysis['xG per 90']

# Define contextual anomaly thresholds
anomaly_threshold = 0.25
low_xg_threshold = 0.2

# Flag contextual anomalies
anomalies = df_analysis[
    (df_analysis['Anomaly Score'] > anomaly_threshold) &
    (df_analysis['xG per 90'] < low_xg_threshold)
]

R:

# Calculate Anomaly Score
df_analysis <- df_analysis %>%
  mutate(Anomaly_Score = `Goals per 90` - `xG per 90`)

# Define thresholds
anomaly_threshold <- 0.25
low_xg_threshold <- 0.2

# Flag contextual anomalies
anomalies <- df_analysis %>%
  filter(Anomaly_Score > anomaly_threshold & `xG per 90` < low_xg_threshold)

Julia:

# Calculate Anomaly Score
df_analysis.Anomaly_Score = df_analysis."Goals per 90" .- df_analysis."xG per 90"

# Define thresholds
anomaly_threshold = 0.25
low_xg_threshold = 0.2

# Flag contextual anomalies
anomalies = filter(row -> row.Anomaly_Score > anomaly_threshold && row."xG per 90" < low_xg_threshold, df_analysis)

What I do here is set the low xG threshold at 0.2. You can alter that, of course, but I find that if you put it higher, it will give many more positive anomalies than might be useful for your research.

You can also do it statistically and work with z-scores. You then specify how many standard deviations from the mean will be classified as an anomaly. This is similar to my approach with outliers:

Using the standard deviation, we look at the best-scoring classic wingers in the Championship and compare them to their age. The outliers are calculated as being more than +2 standard deviations from the mean and are marked in red.

As we can see in our scatterplot, Carvalho, Swift, Keane and Vardy are outliers in our calculation for the goalscoring striker role score. They all score more than +2 standard deviations above the mean.
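A minimal sketch of that z-score approach, with invented scores, would be:

```python
import numpy as np

# Invented role scores for a group of wingers, plus two exceptional ones
rng = np.random.default_rng(42)
scores = np.append(rng.normal(loc=60, scale=8, size=50), [92, 95])

z = (scores - scores.mean()) / scores.std()

# Flag players sitting more than +2 standard deviations above the mean;
# the two appended scores clearly exceed that fence
outliers = scores[z > 2]
print(np.round(outliers, 1))
```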

Okay, back to anomalies! Now we have our dataframe for all strikers in our database that have played at least 500 minutes with the labels: other players and anomalies. We save this to an Excel, JSON or CSV file — so it’s easier to work with.

Data visualisation

In essence an anomaly in this context is when a player has low(er) xG but has significantly higher Goals. We can show what this looks like through two visuals:

In the bar chart above, you can see the top 20 players based on anomaly index. John Kennedy, for example, has the highest anomaly index, meaning he has the biggest overperformance of goals relative to a low xG. This allows us to visualise who the top players are in this metric.

In this scatterplot, you can find all players that fall within our criteria. As you can see, the blue dots are non-anomaly players and the orange ones are the anomalies. What we can see here is that, for a given xG per 90, the anomalies have a higher Goals per 90.

In this way, we can see how anomalies are interesting to track for scouting, but it is important to ask yourself a critical question: how sustainable is their overperformance?

Final thoughts

Anomaly data scouting could be used more. It’s all about finding those players who really stand out from the crowd when it comes to their performance stats. By diving into sophisticated statistical models and getting a little help from machine learning, scouts can spot these hidden gems — players who might be flying under the radar but are killing it.

This way of doing things gives clubs a solid foundation to make smarter decisions, helping them zero in on players who are either outperforming what folks expect or maybe just having a streak of good luck that won’t last. Anomaly detection is especially handy when it comes to scouting for new signings, figuring out who should be in the starting lineup, and even sizing up the competition. But don’t forget, context is key — stuff like the team’s playing style, strategies, and other outside factors really need to be taken into account along with the data.

At the end of the day, anomaly detection isn’t some magic wand that’ll solve everything, but it’s definitely a powerful tool.

Introducing the Probability Even Strength Index (PESI): how can we rank expected performances under even strength conditions?

I can’t be the only one who feels like this from time to time. I have hit the wall in terms of creating or developing completely new things. Whether it’s a new article, metric or model, or a new way to analyse games with the eye — we all get fatigued if we overexpose ourselves to the game. It’s fair to say that I have been feeling like this for quite a while, actually. I love to innovate and challenge myself, but sometimes we need to ask ourselves: do we need something completely new?

In that light, I have been looking at work that has already been done and used by so many (Expected goals) and tried to give my own twist to already existing research. 

What I want to do is to use this concept and work it out. I want to delve deeper into it and make some meaningful analysis based on data. My aim has two sides:

  1. I want to understand more about this kind of data concept and how we can work with it
  2. I want to see which teams perform best in even-strength conditions based on their shot data.

Data

The data I have used in this research is quite straightforward and in line with my other articles. It comes from Opta/StatsPerform and is event data. This means that it looks at the xy-locations of on-ball events during specific games or seasons. The data was collected on March 7th, 2025, and any further updates haven’t been taken into account.

The expected goal data is generated through my own xG model, which has been built through machine learning. From this specific model, I get xG, PsxG, xA, and PsxA. These are different and less detailed/complicated compared to the other providers out there, but the margin for error or difference is quite low, so I’m going to work with my own data here.

Mean Even Strength Football

In ice hockey, “even strength” refers to gameplay when both teams have the same number of skaters, typically five skaters plus a goaltender for each team (5-on-5). This is considered the standard playing condition during most of the game unless one or both teams are serving penalties, which leads to power play or penalty kill situations. Even strength is crucial for evaluating a team’s overall performance since it excludes the influence of special teams.

Several key statistical variables are associated with even strength play. One of the most important is ESG (Even Strength Goals), which measures the number of goals a team scores under these conditions. We can also look at this from an expected angle, focusing on the expected goals and expected assists under even strength standards.

With the selected data I have, I will calculate the filters I’m using:

  • Gamestate == Draw
  • xG
  • From RegularPlay

As you can see, I only selected RegularPlay, or open play; I have made sure not to include set pieces. They are not nearly the same as power plays, of course, but they are not standard and can be infrequent as well, so that’s why I made that decision.

If we have that data, we are going to calculate the mean. What is the mean?

The mean is the mathematical average of a set of two or more numbers. It provides a quick way to understand the “typical” value in a data set, but it can be sensitive to extreme outliers.
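A minimal sketch of that filtering and averaging step, assuming shot-level rows with player, team, xG, game_state and play_type columns (names are my assumptions), could look like this:

```python
import pandas as pd

def mean_even_strength(shots: pd.DataFrame) -> pd.DataFrame:
    """Mean xG per player from open-play shots taken while the game is tied."""
    even = shots[(shots["game_state"] == "Draw") &
                 (shots["play_type"] == "RegularPlay")]
    if even.empty:                      # fall back to the full dataset
        even = shots
    return (even.groupby(["player", "team"])["xG"]
                .mean()
                .sort_values(ascending=False)
                .reset_index(name="MESF"))
```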

When we look at the MESF for Premier League 2024–2025, we can see who the top performers are for players:

The primary objective was to extract key performance metrics from the dataset, including expected goals (xG), post-shot expected goals (PsxG), and expected assisted goals (xAG), while structuring the data based on individual players and their respective teams.

Probability Even Strength Index

To make the analysis more accurate, the data was filtered based on game state and type of play. Only moments when the game was tied and classified as regular play were considered. This ensured that MESH calculations were based on stable, even-strength conditions. If no records matched these criteria, the entire dataset was used as a backup to prevent data loss.

The MESH Total Score was calculated by adding up all xG values that met these conditions. Additionally, the mean and standard deviation of xG were measured to get a sense of consistency. From these values, the coefficient of variation (CV) was determined, providing insight into how much a player’s xG fluctuated. To account for assists, the MESH Assists score was created by summing up xG values where a player was involved in setting up a goal under the same even-strength conditions.

To better capture a player’s overall impact, the MESH Index was introduced. This was calculated as the sum of a player’s MESH Total and MESH Assists, divided by their total xG and xAG, plus a tiny number (1e-6) to prevent errors from dividing by zero.

Finally, to summarise a player’s contribution in a single number, the MESH Strength Score was developed. This score uses a weighted formula that balances different aspects of performance, ensuring that goal-scoring, assists, consistency, and overall impact are all taken into account. The formula is:

where w1, w2, w3, w4, w5, and w6 are adjustable weights that allow fine-tuning of the score to reflect different performance characteristics.
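Since the weighted formula itself is not reproduced here, the sketch below only illustrates the quantities described above (MESH Total, coefficient of variation, MESH Assists, MESH Index) and uses placeholder weights for the strength score; the column names and weights are my assumptions, not the original values.

```python
import pandas as pd

def mesh_scores(shots: pd.DataFrame,
                w=(0.3, 0.2, 0.2, 0.1, 0.1, 0.1)) -> pd.DataFrame:
    """Per-player MESH quantities from even-strength, open-play shot data.

    Assumed columns: player, xG, xAG and a boolean `assisted` flag marking
    rows where the player set up the chance rather than took the shot.
    """
    rows = []
    for player, grp in shots.groupby("player"):
        total = grp["xG"].sum()                         # MESH Total
        mean, std = grp["xG"].mean(), grp["xG"].std(ddof=0)
        cv = std / (mean + 1e-6)                        # consistency of xG
        assists = grp.loc[grp["assisted"], "xG"].sum()  # MESH Assists
        index = (total + assists) / (grp["xG"].sum() + grp["xAG"].sum() + 1e-6)
        strength = (w[0] * total + w[1] * assists + w[2] * index
                    + w[3] * mean + w[4] * (1 - cv) + w[5] * grp["xAG"].sum())
        rows.append({"player": player, "MESH_Total": total, "CV": cv,
                     "MESH_Assists": assists, "MESH_Index": index,
                     "MESH_Strength": strength})
    return pd.DataFrame(rows)
```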

ELO Rating System Implementation

To measure team performance over time, an ELO rating system was used, with updates based on the MESH Strength Score. The dataset was sorted by Date to make sure ratings were updated in the correct order. Every team started with a default ELO rating of 1500, which acted as the baseline for future changes.

On each match date, all teams that had recorded data were identified. If a team appeared for the first time, it was automatically given the starting 1500 rating. From there, ELO updates were made by comparing a team’s MESH Strength Score to the average score of its opponents. The expected performance score of each team was then calculated using the standard ELO formula:

E_team = 1 / (1 + 10^((R_opponent − R_team) / 400))

In this formula, E_team represents the probability of a team performing better than its opponents, R_team is the team’s current ELO rating, and R_opponent is the average ELO rating of all opposing teams.

The actual score (S_team) was determined by comparing a team’s MESH Strength Score to the average score of its opponents. If a team’s MESH Strength Score was higher than the opponent average, it was given a score of 1. If it was the same, the score was 0.5, and if it was lower, the score was 0.

With both the expected and actual scores calculated, the ELO rating was updated using the formula:

R_team(new) = R_team + K × (S_team − E_team)

where K=32 is the K-factor, which controls the rate of ELO adjustments. A higher K-value results in greater rating fluctuations, while a lower value stabilizes rankings over time.
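Put together, one match-day update as described above can be sketched as follows (a minimal illustration under the definitions in this section, not the full implementation):

```python
def expected_score(r_team: float, r_opponent: float) -> float:
    """Standard ELO expectation of outperforming the opponent."""
    return 1.0 / (1.0 + 10 ** ((r_opponent - r_team) / 400))

def update_elo(rating: float, opponent_rating: float,
               team_score: float, opponent_avg_score: float, k: float = 32.0) -> float:
    """One ELO step driven by MESH Strength Scores instead of match results."""
    expected = expected_score(rating, opponent_rating)
    if team_score > opponent_avg_score:
        actual = 1.0          # outperformed the opponent average
    elif team_score == opponent_avg_score:
        actual = 0.5          # level with the opponent average
    else:
        actual = 0.0          # underperformed
    return rating + k * (actual - expected)

# Example: a team at the 1500 baseline outscores its opponents' average
print(update_elo(1500, 1500, team_score=0.72, opponent_avg_score=0.55))  # 1516.0
```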

Analysis

With our data on a team level, I want to see what the index does in terms of rating on an ELO-based ranking. The PESI is used for every match day, and we follow the trajectory of a team.

In the line graph above, we see all Premier League teams in the 2024–2025 season with their PESI per matchday. We have made our focus team Tottenham Hotspur red to see what the focus team is doing in comparison to the rest of the league. Tottenham started with an ELO ranking of 1484, and as of March 5th 2025, their ELO ranking is 1355.

Let’s have a look at Liverpool, the current leaders of the Premier League. How are they doing in even-strength situations?

In the line graph above, we see all Premier League teams in the 2024–2025 season with their PESI per matchday. We have made our focus team Liverpool blue to see what the focus team is doing in comparison to the rest of the league. Liverpool started with an ELO ranking of 1484, and as of March 5th 2025, their ELO ranking is 1366.

We see two different teams with the same start point who are quite similar in the way they look with their rating right now. What’s the actual ranking right now?

When you look at the table above, you can see that Arsenal has the highest ELO, followed by Southampton and Crystal Palace. The lowest are Brighton, West Ham United and Wolverhampton Wanderers. This doesn’t mean they are the best or the worst; it just means that they score high or low on creating/generating expected goals in even-strength situations.

Challenges

There are two big challenges for me, and something I need to address the next time:

  1. Even Strength is an interesting concept, but I am only looking at two main variables: game state and regular play, dismissing winning/losing states, set pieces and counter-attacks. However, this only looks at shots; I forgot to include numerical even strength. Red cards are given, and teams sometimes run out of substitutions after injuries, leaving them with fewer players. That’s something to address in the next model.
  2. A vital part of the concept is that well-performing teams often lead and, therefore, spend less time creating/generating xG in even-strength situations. Bad teams have the same issue, as they are often in losing situations; this is something that needs addressing in the next model too.

Final thoughts

This model blends MESH Strength Score and ELO ratings to track player and team performance in a more meaningful way. MESH goes beyond basic data like goals and assists, factoring in consistency and overall influence in different game situations. Meanwhile, ELO provides a way to rank teams dynamically based on how they perform against their opponents. Together, they create a more complete picture of performance rather than just relying on wins, losses, or raw numbers.

Since ELO adjusts based on competition strength, it also makes comparing teams over a season more reliable. This method could even be useful for forecasting match results based on past performances. In the future, Glicko-2 could be a strong alternative, as it builds on ELO by incorporating rating volatility, meaning teams with inconsistent performances would see their ratings fluctuate more, while stable teams would have steadier rankings. This could provide an even clearer picture of team strength and performance trends over time.

That said, it’s not without challenges. The accuracy depends on solid data, the right balance of weighted factors, and accounting for outside influences like injuries or tactics. With refinements, though, it could become a valuable tool for deeper football analysis.

Chess strategies in football: designing tactical passing styles based on chess ⚽️♟️

Football tactics is what originally swayed me in the direction of football analysis. I’m a sucker for patterns and causal relationships, so when I was introduced to football tactics, I was completely immersed and sucked into the world of tactical analysis.

A few years later, I ran into a quite common problem: data vs. eyes. I have mostly focused on data lately, and that’s what I work with. In other words, right now, I work as a data engineer at a professional football club, where I don’t really focus on any aspect of video. The problem arises that for tactics, we mostly use our eyes because the data can’t tell us why something happens at a specific time on the pitch.

This got me thinking. Which sport has decision-making and tactical approaches, yet can be analysed with data? My eyes — pun intended — turned towards chess and the openings/defences you find there. In this article, I aim to translate and convert chess strategies and make them actionable for football analysis.

This article will describe a few chess strategies and how we can look into them with data and create playing styles/tactics from them. The methodology will be a vital part of this analysis.

Contents

  1. Introduction: Why compare chess and football?
  2. Data representation and sources
  3. Chess strategies
  4. Defining the framework: converting chess into football tactics
  5. Methodology
  6. Analysis
  7. Challenges and difficulties
  8. Final thoughts

Introduction: Why compare chess and football?

First of all, I love comparisons with other disciplines. They give us an idea of where we can learn from other sports and/or influence them. Until now, I have focused mostly on basketball and ice hockey, but there are other sports I want to have a look at: rugby, American football and baseball.

Truth be told, do I know a lot about chess? Probably not. I’m a fairly decent chess player, and I know the basic theory of strategies in chess, but you won’t find an ELO rating to be proud of in my house. But that’s beside the point. The point is that I love chess for its tactics and strategies. It’s so prevalent in our language that we often call close-knit games a game of chess.

I wanted to make a comparison between chess strategies and moves and certain actions and patterns in football. For this specific piece of research, and to get to the tactics, I am going to look at chess moves and relate them to passing in football. This will become clearer in the rest of the article.

Data representation and sources

This remains a blog or website where I love to showcase data and what you can do with it. This article is no different.

The data I’m using for this is data gained from Opta/StatsPerform and focuses on the event data of the English Premier League 2024–2025. The data was collected on March 1st 2025.

We focus on all players and teams in the league, but for our chain events and scores, we focus on players that have played over 450 minutes or 5 games in the regular season. In that way we can make the data and, subsequently the results, more representative to work with.

Chess strategies

There are 11 chess strategies that I found suitable for analysis. Our analysis focuses on passing styles based on these chess openings.

Caro-Kann Defense (Counterattacking, Defensive)

  • Moves: 1. e4 c6 2. d4 d5
  • A solid and resilient defense against 1. e4. Black delays piece development in favor of a strong pawn structure. Often leads to closed, positional battles where Black seeks long-term counterplay.

Scotch Game (Attacking)

  • Moves: 1. e4 e5 2. Nf3 Nc6 3. d4. White aggressively challenges the center early. Leads to open positions with fast piece development and tactical opportunities. White often aims for a kingside attack or central dominance.

Nimzo-Indian Defense (Counterattacking, Positional)

  • Moves: 1. d4 Nf6 2. c4 e6 3. Nc3 Bb4. Black immediately pins White’s knight on c3, controlling the center indirectly. Focuses on long-term strategic play rather than immediate counterattacks. Offers deep positional ideas, such as doubled pawns and bishop pair imbalances.

Sicilian Defense (Counterattacking, Attacking)

  • Moves: 1. e4 c5. Black avoids symmetrical pawn structures, leading to sharp, double-edged play. Common aggressive variations: Najdorf, Dragon, Sveshnikov, Scheveningen. White often plays the Open Sicilian (2. Nf3 followed by d4) to create attacking chances.

King’s Indian Defense (Counterattacking)

  • Moves: 1. d4 Nf6 2. c4 g6 3. Nc3 Bg7. Black allows White to occupy the center with pawns, then strikes back with …e5 or …c5. Leads to sharp middle games where Black attacks on the kingside and White on the queenside.

Ruy-Lopez (Attacking, Positional)

  • Moves: 1. e4 e5 2. Nf3 Nc6 3. Bb5. White applies early pressure on Black’s knight, planning long-term positional gains. Leads to rich, strategic play with both attacking and defensive options. Popular variations: Closed Ruy-Lopez, Open Ruy-Lopez, Berlin Defense.

Queen’s Gambit (Attacking, Positional)

  • Moves: 1. d4 d5 2. c4. White offers a pawn to gain strong central control and initiative. If Black accepts (Queen’s Gambit Accepted), White gains rapid development. If Black declines (Queen’s Gambit Declined), a long-term strategic battle ensues.

French Defense (Defensive, Counterattacking)

  • Moves: 1. e4 e6. Black invites White to control the center but plans to challenge it with …d5. Often leads to closed, slow-paced games where maneuvering is key. White may attack on the kingside, while Black plays for counterplay in the center or queenside.

Alekhine Defense (Counterattacking)

  • Moves: 1. e4 Nf6. Black provokes White into overextending in the center, planning a counterattack. Leads to unbalanced positions with both positional and tactical play. It can transpose into hypermodern setups, where Black undermines White’s center.

Grünfeld Defense (Counterattacking, Positional)

  • Moves: 1. d4 Nf6 2. c4 g6 3. Nc3 d5. Black allows White to build a strong center, then attacks it with pieces rather than pawns. Leads to open, sharp positions where Black seeks dynamic counterplay.

Pirc Defense (Defensive, Counterattacking)

  • Moves: 1. e4 d6 2. d4 Nf6 3. Nc3 g6. Black fianchettos the dark-square bishop, delaying direct confrontation in the center. Leads to flexible, maneuvering play, often followed by a counterattack. White can opt for aggressive setups like the Austrian Attack.

Defining the framework: converting chess into football tactics

A Caro-Kann Defense in chess is built on strong defensive structure and gradual counterplay, mirroring a possession-oriented style in football, where teams maintain the ball, carefully build their attacks, and avoid risky passes. On the other hand, the Scotch Game, an aggressive opening that prioritises rapid piece development and early control, aligns with high-tempo vertical passing. Teams using this style move the ball forward quickly, looking to exploit spaces between the lines and catch opponents off guard.

Some openings in chess focus on inviting pressure to counterattack, a principle widely used in football. The Sicilian Defense allows White to attack first, only for Black to strike back with powerful counterplay. This is akin to teams that sit deep and absorb pressure before launching devastating transitions. Similarly, the King’s Indian Defense concedes space early before unleashing an aggressive kingside attack, much like a team that defends deep and then launches precise, rapid counterattacks.

Certain chess openings focus on compact positional play and indirect control, mirroring football teams that overload key areas of the pitch without necessarily dominating possession. The Nimzo-Indian Defense, for instance, does not immediately fight for central space but instead restricts the opponent’s development, much like sides whose tight defensive structure and midfield control dictate the game. Likewise, the French Defense prioritises a solid defensive structure and controlled build-up, with possession carefully circulated before breaking forward.

Teams that thrive on wide play and overlapping full-backs resemble chess openings that emphasise control of the board’s edges. The Grünfeld Defense allows an opponent to take central space before striking from the flanks. In contrast, teams that bait opponents into pressing only to bypass them with quick passes follow the logic of the Alekhine Defense, which provokes aggressive moves from White and counters efficiently.

The flexibility of the Pirc Defense, an opening that adapts to an opponent’s approach before deciding on a course of action, can be likened to teams that switch between possession play and direct football depending on the game situation. The adaptability of this approach makes it unpredictable and difficult to counter.

Methodology

So now we have the chess strategies we are going to use for the analysis, and we know which football tactics they resemble. The next step is to look at the existing data and derive the appropriate metrics to design something new in football.

From the event data we need a few things to start the calculation:

  • Timestamps
  • x, y
  • playerId and playerName
  • team
  • typeId
  • outcome
  • KeyPass
  • endX, endY
  • passlength

First, individual match data is processed to extract essential passing attributes such as pass coordinates, pass lengths, and event types. For each pass, metrics like forward progression, lateral movement, and entries into the final third are computed, forming the building blocks of the tactical analysis.

Once these basic measures are derived, seven distinct metrics are calculated from the passing events: progressive passes, risk-taking passes, lateral switches, final third entries, passes made under pressure, high-value key passes, and crosses into the box. Each metric captures a specific aspect of passing behavior, reflecting how aggressively or defensively a team approaches building an attack.

For every tactical archetype — each modeled after a corresponding chess opening — a unique set of weights is assigned to these metrics. The overall strategy score for a team in a given match is then computed by multiplying each metric by its respective weight and summing the results. This weighted sum provides a single numerical value that encapsulates the team’s tendency towards a particular passing style.

Strategy Score = w₁(progressive) + w₂(risk-taking) + w₃(lateral switch) + w₄(final third) + w₅(under pressure) + w₆(high-value key) + w₇(crosses)

You can find the Python code here.
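To give a rough idea of what that computation looks like in practice, here is a minimal sketch of the weighted sum; the column names and weights below are hypothetical placeholders, not the ones used in the linked code.

```python
import pandas as pd

# Illustrative weights for one archetype (e.g. a "Scotch Game"-style profile).
# These values are hypothetical; each chess-opening archetype gets its own set.
scotch_weights = {
    "progressive": 0.30,
    "risk_taking": 0.20,
    "lateral_switch": 0.05,
    "final_third": 0.20,
    "under_pressure": 0.10,
    "high_value_key": 0.10,
    "crosses": 0.05,
}

def strategy_score(team_match_metrics: pd.Series, weights: dict) -> float:
    """Weighted sum of the seven passing metrics for one team in one match."""
    return sum(team_match_metrics[m] * w for m, w in weights.items())

# Hypothetical per-match metric counts for one team.
example = pd.Series({
    "progressive": 42, "risk_taking": 11, "lateral_switch": 8, "final_third": 19,
    "under_pressure": 14, "high_value_key": 4, "crosses": 12,
})
print(strategy_score(example, scotch_weights))
```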

Analysis

Now, with that data, we can do a lot of things. We can, for example, look at percentile ranks and see what a team’s intent is regarding a specific style:

We can see how well Liverpool performs in the chess-derived strategies that lead to a shot. This shows that the French Defense and the Caro-Kann are the two strategies in which Liverpool scores best relative to the rest of the league.
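Percentile ranks like these are straightforward to produce once every team has a score per strategy; below is a minimal pandas sketch with made-up numbers, purely to show the mechanics.

```python
import pandas as pd

# One row per team, one column per chess-derived strategy (dummy values).
league_scores = pd.DataFrame(
    {"Caro-Kann": [1.8, 2.4, 2.1], "French Defense": [2.6, 1.9, 1.2]},
    index=["Liverpool", "Arsenal", "Manchester City"],
)

# Percentile rank of each team within the league, per strategy.
percentiles = league_scores.rank(pct=True) * 100
print(percentiles.loc["Liverpool"])
```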

The next thing we can do is see how well Liverpool does when we compare two different metrics/strategies.

In this scatterplot we look at two different strategies. The aim is to look at how well Liverpool scores in both metrics, but also to look at the correlation between the two metrics.

Liverpool perform around the upper end of the average for both metrics, while Nottingham Forest and Manchester City are outliers for these two strategies, meaning they create an above-average number of shots from them.

Another way of visualising how well teams create shots from certain tactical styles adopted from chess is a beeswarm plot.

In the visual above you can see the z-scores per strategy and where Liverpool sits in terms of shot volume. As you can see, they score around the mean, or slightly above or below it, with small standard deviations. What’s important here is that they don’t appear to be outliers in either the positive or the negative direction.

Challenges

  • Chess moves are clearly categorised, but football actions depend on multiple moving elements and real-time decisions.
  • Assigning numerical values to football tactics is complex because the same play can have different outcomes.
  • In football, opponents react dynamically, unlike in chess where the opponent’s possible responses are limited.
  • A single football tactic can have multiple different outcomes based on execution and opponent adaptation.
  • There is no single equivalent to chess move evaluation in football, as every play depends on multiple contextual factors.

Final thoughts

Chess and football might seem worlds apart — one is rigid and turn-based, the other a chaotic dance of movement and reaction. Chess moves have clear evaluations, while football tactics shift with every pass, press, and positioning change. Concepts like forks and gambits exist in spirit but lack the structured predictability that chess offers. And while chess follows a finite game tree, football is a web of endless possibilities, shaped by human intuition and external forces.

Bridging this gap means bringing structure to football’s fluidity. Value-based models like Expected Possession Value (EPV), VAEP, and Expected Threat (xT) can quantify decisions much like a chess engine evaluates moves. Reinforcement learning and tactical decision trees add another layer, helping teams optimize play in real-time. Football will never be as predictable as chess, but with the right models, it can become more strategic, measurable, and refined — a game of decisions, not just moments.

Space control and occupation metrics with average positions

Space and zonal control. These words and concepts are used quite often when we talk about football in a tactical sense. How do we control the spaces, zones and areas on the pitch and ensure we dominate the game? These are very interesting questions, which we can capture on video with telestration programs.

However, how do we make sure that we can capture this with data? The most obvious solution to that is to have a look at tracking data — and believe you me, I should write about tracking data more often — but not everyone has the opportunity to use tracking data. Furthermore, out-of-possession data is not as prevalent as we would like.

In this article, I want to use on-ball event data to create new metrics for space control and space occupation. I will do that by focusing on average positions of players while on the ball during a specific game or season.

Data

The data I’m using for this research is raw event data from Opta/StatsPerform, but the same approach works with event data from Hudl/StatsBomb, SkillCorner or any other provider.

The data was collected on Friday 14 February 2025 and focuses on Ligue 1 2024–2025. While the metrics can be created on an individual player level, I will keep my attention on the team level as it can give us some interesting insights.

Methodology

As we want to look at space control using on-ball data, we need a methodology that works for this. I honestly had to find the approach that would work best, or fail the least. First, I started by looking at bins.

Yes, I’m very much aware this bin doesn’t overlay the pitch properly. However, this shows the zone control of Team A (blue) and Team B (red). This control is based on the x and y coordinates of all on-ball touches, which doesn’t necessarily mean that every touch is relevant.

Then I moved over to visualising average player positions.

Still, I wasn’t very convinced by how this visual looked and how it conveyed control or occupation of space. There are two reasons for that:

  • Some areas are more purple, but don’t really show whether that’s a mixed-control zone or not
  • This plots all average positions for all players featured in a match. All players are needed for the total control, but without making a distinction for substitutions, it could lead to overcrowding and misleading data.

I like the idea of average positions though, and I kept going back to a post I wrote earlier about the Off-Ball Impact Score (OBIS).

In passing networks, average positions are calculated at the begin location of a pass: where the pass starts. In other words, they represent the average locations from which each player passes in that specific match or season. The networks are also usually restricted to the period up to the first substitution.

It gives us a good idea of where, on average, specific players made their passes during that game, as you can see in the image below.

Passing Network Heerenveen and Ajax with Expected Possession Value (EPV), Outswinger FC 2025
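For reference, a minimal sketch of how those average pass-origin positions can be derived from event data; the typeId values and column names below are placeholders that depend on the provider’s specification.

```python
import pandas as pd

PASS_TYPE = 1         # placeholder typeId for a pass; check your provider's spec
PLAYER_OFF_TYPE = 18  # placeholder typeId for a player being substituted off

def average_pass_positions(events: pd.DataFrame, team: str) -> pd.DataFrame:
    """Average pass-origin (x, y) per player, cut off at the first substitution."""
    team_events = events[events["team"] == team]
    passes = team_events[team_events["typeId"] == PASS_TYPE]
    subs = team_events[team_events["typeId"] == PLAYER_OFF_TYPE]
    if not subs.empty:
        passes = passes[passes["timeStamp"] < subs["timeStamp"].min()]
    return passes.groupby("playerName")[["x", "y"]].mean()
```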

What if we used that logic of passing networks and calculated new things from those networks? Of course, we have already done that a little with OBIS:

  • In-degree centrality: The total weight of incoming edges to a player (i.e., the number of passes they received).
  • Out-degree centrality: The total weight of outgoing edges from a player (i.e., the number of passes they made).
  • Betweenness centrality: Measures how often a player lies on the shortest path between other players in the network.
  • Closeness centrality: The average shortest path from a player to all other players in the network.
  • Eigenvector centrality: Measures the influence of a player in the network, taking into account not just the number of connections they have but also the importance of the players they are connected to.
  • Clustering coefficient: Measures the tendency of a player to be part of passing triangles or localized groups (i.e., whether their connections form closed loops).
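These network measures can all be computed with networkx once the passes are turned into a weighted, directed graph; a minimal sketch, assuming a simple list of (passer, receiver) pairs.

```python
import networkx as nx
from collections import Counter

# Hypothetical (passer, receiver) pairs pulled from event data.
passes = [("A", "B"), ("B", "C"), ("C", "A"), ("A", "B"), ("B", "A"), ("C", "B")]

# Edge weight = number of passes; distance = 1/weight so frequent links count as "close".
G = nx.DiGraph()
for (u, v), n in Counter(passes).items():
    G.add_edge(u, v, weight=n, distance=1 / n)

in_degree = dict(G.in_degree(weight="weight"))        # passes received
out_degree = dict(G.out_degree(weight="weight"))      # passes made
betweenness = nx.betweenness_centrality(G, weight="distance")
closeness = nx.closeness_centrality(G, distance="distance")
eigenvector = nx.eigenvector_centrality(G, weight="weight", max_iter=1000)
clustering = nx.clustering(G.to_undirected(), weight="weight")
```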

These measure many things, but I want to focus more on control and occupation than pure off-ball impact.

In addition to the already calculated metrics, I wanted to propose some new metrics which we can calculate with average positions based on the begin location of a pass:

  • Team Control (%): The percentage of the field controlled by each team.
  • Overlap (%): The percentage of the field that is controlled by both teams.
  • Convex Hull Area: The area of the convex hull for the team (shows how compact the team is).
  • Vertical Compactness: The range (peak-to-peak) of player positions in the y (vertical) direction.
  • Horizontal Compactness: The range (peak-to-peak) of player positions in the x (horizontal) direction.
  • Player Density: The average number of players per unit area on the field.
  • Centroid X and Y: The average position (center of mass) of the team’s players.
  • Horizontal Spread: The maximum distance between players in the horizontal direction.
  • Vertical Spread: The maximum distance between players in the vertical direction.
  • Circularity: A measure of the team’s shape, with 1 being a perfect circle (indicating high compactness).

With these new metrics, we can generate new insights into space control and occupation for individual games, whole seasons and individual player analysis.
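Below is a minimal sketch of how several of these metrics could be computed from the average pass-origin positions with numpy and scipy. The radius-based interpretation of Team Control and Overlap is an assumption on my part, one of several reasonable definitions, and the pitch is taken as 105 by 68 metres.

```python
import numpy as np
from scipy.spatial import ConvexHull

def occupation_metrics(team_xy: np.ndarray) -> dict:
    """Shape and occupation metrics from average pass-origin positions."""
    hull = ConvexHull(team_xy)
    x, y = team_xy[:, 0], team_xy[:, 1]
    area, perimeter = hull.volume, hull.area   # for 2D points: .volume = area, .area = perimeter
    return {
        "convex_hull_area": area,
        "vertical_compactness": np.ptp(y),
        "horizontal_compactness": np.ptp(x),
        "player_density": len(team_xy) / area,
        "centroid": (x.mean(), y.mean()),
        "circularity": 4 * np.pi * area / perimeter ** 2,   # 1 = perfect circle
    }

def control_percentages(team_a, team_b, radius=12.0, cells=(105, 68)):
    """Grid-based control: a cell is 'controlled' if an average position lies
    within `radius` metres of it; overlap = cells reachable by both teams."""
    gx, gy = np.meshgrid(np.linspace(0, 105, cells[0]), np.linspace(0, 68, cells[1]))
    grid = np.column_stack([gx.ravel(), gy.ravel()])

    def near(team):
        dist = np.linalg.norm(grid[:, None] - np.asarray(team)[None], axis=2)
        return (dist <= radius).any(axis=1)

    a, b = near(team_a), near(team_b)
    return {"team_a_%": 100 * a.mean(), "team_b_%": 100 * b.mean(), "overlap_%": 100 * (a & b).mean()}
```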

Analysis

When we continue with our metrics, we can look at them in two ways. The first is from an individual game perspective:

I have had a look at the game between PSG and Monaco, after which I calculated these metrics for both teams.

  • PSG had 40,06% control of the pitch, while Monaco had 40,85% control of the pitch
  • The overlap percentage is the same for both teams
  • PSG had a Convex Hull Area of 2261,02 and Monaco of 1814,81. This means that Monaco covered a smaller playing area than PSG in this game.
  • Looking at vertical and horizontal compactness, we see that PSG is more compact vertically, while Monaco is more compact horizontally
  • In terms of player density, there are more players per area for PSG than for Monaco
  • In terms of circularity, PSG’s shape is closer to a perfect circle than Monaco’s, indicating higher compactness.

This is for a single game, but we can also look at all teams across a complete season. We can compare them in several ways, the first being percentile ranks:

Here you can see how PSG scores on all the metrics we just calculated compared to the rest of Ligue 1. What’s interesting is how high they score in player density and horizontal spread.

And here are Monaco’s percentile ranks. While PSG has more outliers in the high and low regions, Monaco is much steadier and consistently scores above average for every metric. What’s interesting here is that they score highest on the percentage of pitch control across the games they played.

Final thoughts

Integrating average position metrics with passing data offers a deeper understanding of a team’s playing style and tactical approach. By mapping the average positions of players during a match, it becomes easier to identify areas of strength and vulnerability in possession. Teams that focus on short, quick passes tend to have more compact positioning with high-density zones in central areas, promoting controlled build-up play.

However, it does feel unfinished or incomplete. This is because we only look at the locations of passes and there are so many more types of touches to be considered. That’s something to alter for version 2.0.

From Ice Hockey to Football: Mean Even Strength (MESF)

From time to time I like to look at other sports and see what we — football enthusiasts — can learn from other disciplines in elite sport. By doing that we can pick up innovative ideas to bring into football, but also recognise that some data models are already working very well elsewhere.

In December 2024, I wrote about average attention draw and defensive entropy, defensive models and metrics from Basketball.

In this article, however, I will look at a different sport: ice hockey. And yes, I call it ice hockey, because where I live “hockey” refers to what is officially field hockey. Are you with me? Anyway, I want to use Mean Even Strength in Hockey (MESH) to see how we can translate it to football, emulate it, and perhaps improve on it.

Mean Even Strength in Hockey (MESH)

In hockey, “even strength” refers to gameplay when both teams have the same number of skaters on the ice, typically five skaters plus a goaltender for each team (5-on-5). This is considered the standard playing condition during most of the game unless one or both teams are serving penalties, which leads to power play or penalty kill situations. Even strength is crucial for evaluating a team’s overall performance since it excludes the influence of special teams.

Several key statistical variables are associated with even strength play. One of the most important is ESG (Even Strength Goals), which measures the number of goals a team scores under these conditions. ESGF (Even Strength Goals For) tracks goals scored by a specific team, while ESGA (Even Strength Goals Against) counts goals allowed by that team at even strength.

Data collection

Before we look at how we can translate the data and metric into football analysis, let’s have a look at the data we need for this specific little piece of research.

We are using shot data for this particular article. The data comes partly from Opta and partly from my own model and was collected on Thursday, February 6th, 2025. The data focuses on La Liga 2024–2025 with emphasis on players who have played in over 5 games, to make it more representative.

The expected goals model is of my own making and therefore differs from those of data providers such as Opta, StatsBomb, Wyscout and others.

Translating data

Football, of course, doesn’t have power plays, so we have to find something we can use as an even-strength state. I have chosen gamestate. The gamestate can be winning, drawing or losing for the specific team or player we are looking at. By using the drawing gamestate, we get an even-strength equivalent.

We can look at different sorts of data, but I’m going to use expected goals because I want to focus on shooting. You can apply the same gamestate filter to other metrics as well, but this is what I set out to do.

Methodology

With the selected data, I will calculate the metric using the following filters:

  • Gamestate == Draw
  • xG
  • From RegularPlay

As you can see, I also filter for RegularPlay, or open play, which excludes set pieces. Set pieces are not nearly the same as power plays, of course, but they are non-standard and can be infrequent, which is why I made that decision.

If we have that data, we are going to calculate the mean. What is the mean?

The mean is the mathematical average of a set of two or more numbers. It provides a quick way to understand the “typical” value in a data set, but it can be sensitive to extreme outliers.

I calculate the mean of different variables in Python and Julia, which you can find on my GitHub. I then get the average xG and PsxG for the players and teams in La Liga under even strength. The new metric is called Mean Even Strength in Football (MESF).
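As an indication of how simple that calculation is, here is a minimal pandas sketch with assumed column names (gamestate, pattern_of_play, xG); the full versions are the ones on GitHub.

```python
import pandas as pd

# One row per shot; column names are assumptions and may differ per pipeline.
shots = pd.read_csv("laliga_shots_2024_2025.csv")   # placeholder file name

# Even strength: drawing gamestate, open play only.
even = shots[(shots["gamestate"] == "Draw") & (shots["pattern_of_play"] == "RegularPlay")]

# MESF: mean xG per shot under even strength, per team and per player.
mesf_team = even.groupby("team")["xG"].mean().sort_values(ascending=False)
mesf_player = (
    even.groupby("playerName")["xG"]
    .agg(mesf="mean", shots="count")
    .query("shots >= 10")            # arbitrary minimum for stability
    .sort_values("mesf", ascending=False)
)
print(mesf_team.head())
```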

Analysis

As you can see in the bar graph above, I have the mean strength value for the xG per shot in La Liga 2024–2025 so far. We can see a few interesting things:

  • Barcelona has the highest xG
  • Sevilla and Getafe have the lowest xG
  • The most common value is 0,11 xG per shot (6 times)
  • Real Madrid surprisingly has 0,10 xG per shot

When we look at the players we can see a few interesting things too for xG per shot for MESF values:

  • Hugo Duro, T. Douvikas and Borja Iglesias all score above 0,3 xG
  • After KoundĂŠ, the xG stabilises, so the top 4 are outliers

Final thoughts

Even strength can show us a more even playing field — literally — and can measure how impactful a player or team is when conditions are equal. That means players are less likely to benefit from particularly difficult or easy situations.

In the future I will look at the even strength and expand it to other areas of play like passing.

Introducing Expected Shots from Cross (xCross): measuring the probability that a shot occurs from a cross

How many more expected models do we need? That’s surely a question I have asked myself numerous times while researching the article I am presenting today. I think it all depends on the angles you present your research with and how you approach the research: what’s the aim and what do you want to get out of it?

For me, it’s important to create something that adds something to a conversation about expected value models when I or others make a different model. This can be done by creating a completely new model or metric or recreating a model with enhancements. A combination of the two is also possible, of course.

In this article, I want to talk a little bit more about crosses. It is not so much about cross delivery or cross completion percentages, but about what the successful follow-up action entails: shots from crosses and the expected models around them.

First I will talk about the data and how I collected it, then the methodology, followed by the analysis, and finally my conclusions.

Data

For this research, I have used raw event data from Opta/StatsPerform. The event data was collected on Thursday, 30 January 2025, focusing on the 2024–2025 season of the Belgian Pro League, the first tier of Belgian football.

The data will be manipulated to produce the metrics we need for this calculation. The players featured have played a minimum of 500 minutes throughout the season.

For this research, we won’t focus on any expected goal metrics, as we are not looking for the probability of a goal being scored.

Why this metric?

This is a question I ask myself every time I set out to make a new model or metric. And sometimes I really don’t know the answer. The fundamental question remains: do we need it? I guess that’s a question of semantics, but no — I don’t think we need it. However, I believe it can give us some interesting insights into how shots come to be.

My starting point is to understand what expected assists or expected goals assisted tell us. These metrics say something about the probability of a pass leading to a goal, with the key difference being whether they consider all passes or only passes leading to shots. I love the idea, but it is very much focused on the outcome of the shots and on expected goals.

I want to do something different. Yes, the outcome will be a probability, but it focuses on the probability of a shot being taken rather than a shot ending up as a goal. Furthermore, I want to look at the qualitative nature of the crosses and whether we can assess something about the delivery taker. In other words, does the quality of the cross lead to more or fewer shots under similar circumstances?

Cross: a definition

What is a cross? If we look at Hudl’s definition, the following constitutes a cross: a ball played from the offensive flanks aimed towards a teammate in the area in front of the opponent’s goal.

In this instance a flank is the outermost 23 metres of a 68-metre-wide pitch. This means that any pass from the left or right flank into the central area can be considered a cross.

Credit: Glossary Wyscout

As we are using Opta/StatsPerform data in our research, let’s see what their definition is: A ball played from a wide position targeting a teammate(s) in a central area within proximity to the Goal. The delivery must have an element of lateral movement from a wider position to more central area in front of Goal.
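Purely as an illustration of the flank-based definition above, the check can be expressed geometrically; the attacking-third and box-depth thresholds below are my own assumptions (providers tag crosses with far more nuance), and coordinates are in metres on a 105 by 68 pitch.

```python
def looks_like_cross(x, y, end_x, end_y, pitch_length=105.0, pitch_width=68.0):
    """Rough geometric check: starts wide (outer 23 m of a 68 m pitch) in the
    attacking third, ends centrally near the goal. Illustrative only."""
    flank = 23.0
    starts_wide = y <= flank or y >= pitch_width - flank
    starts_high = x >= 2 * pitch_length / 3              # attacking third (assumption)
    ends_central = flank < end_y < pitch_width - flank
    ends_near_goal = end_x >= pitch_length - 16.5        # roughly the box depth (assumption)
    return starts_wide and starts_high and ends_central and ends_near_goal
```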

If we take some random data for crosses, we can see where crosses come from and what metrics we can pull from them. This is essential for our research into a model, as we need to understand what we are working with.

In the image below you can see a pitch map with crosses visualised.

We visualise the crosses coming from open play, so we filter out the set pieces and make a distinction between successful and unsuccessful passes. On the right side we see some calculations on the crosses, which we can work with further. EPV is expected possession value and xT is expected threat.

Methodology

The idea is to create a model from the crosses we visualised: one that calculates, for every cross, the probability of it turning into a shot. There are a few things we need to take from the event data:

  • Cross origin (location on the field)
  • Receiving player’s position (inside/outside the box, near/far post, penalty spot based on endlocation of the cross)
  • Game context (match minute, scoreline, opposition quality)

The first step involves organising the dataset by sorting events chronologically using timestamps. Then, the model identifies cross attempts (Cross == 1) and assigns a binary target variable (leads_to_shot) by checking whether the next three recorded events include a shot attempt (typeId in [13, 14, 15, 16], i.e. a missed shot, a shot hitting the post, a saved shot or a goal). This ensures that the model captures sequences where a cross directly results in a shot, preventing the influence of unrelated play sequences.
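A minimal sketch of that labelling step is shown below; the match_id and timeStamp column names are assumptions, and the typeId values follow the list above.

```python
import pandas as pd

events = pd.read_csv("pro_league_events.csv")               # placeholder file name
events = events.sort_values(["match_id", "timeStamp"]).reset_index(drop=True)

SHOT_TYPES = {13, 14, 15, 16}   # missed shot, post, saved, goal

def label_crosses(df: pd.DataFrame, window: int = 3) -> pd.DataFrame:
    """Flag each cross with leads_to_shot = 1 if any of the next `window`
    events in the same match is a shot attempt."""
    df = df.copy()
    df["leads_to_shot"] = 0
    for i in df.index[df["Cross"] == 1]:
        nxt = df.loc[i + 1 : i + window]
        nxt = nxt[nxt["match_id"] == df.at[i, "match_id"]]  # stay within the match
        if nxt["typeId"].isin(SHOT_TYPES).any():
            df.at[i, "leads_to_shot"] = 1
    return df

crosses = label_crosses(events)
crosses = crosses[crosses["Cross"] == 1]
```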

After defining the target variable, feature engineering is applied to improve model performance. Several factors influence the probability of a cross leading to a shot, such as the location of the cross (x, y), its target area (endX, endY), and the total time elapsed in the match (totalTime).

The dataset is then split into training (80%) and testing (20%) sets, ensuring that the distribution of positive and negative samples is preserved using stratification.

To estimate the probability that a cross leads to a shot, machine learning models are applied. A Logistic Regression model is trained to predict a probability score for each cross, making it an interpretable baseline model.

In the context of xCross, the goal of the model is to predict whether a cross will lead to a shot attempt (leads_to_shot = 1) or not (leads_to_shot = 0).

Additionally, a Random Forest Classifier is trained to capture non-linear relationships between crossing characteristics and shot generation likelihood. Both models are evaluated using accuracy, ROC AUC (Receiver Operating Characteristic — Area Under Curve), and classification reports, ensuring their ability to distinguish between successful and unsuccessful crosses in terms of shot creation.
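Continuing from the labelled crosses above, here is a sketch of the modelling step with scikit-learn; the feature set follows the columns named in the text and the hyperparameters are illustrative.

```python
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import classification_report, roc_auc_score
from sklearn.model_selection import train_test_split

features = ["x", "y", "endX", "endY", "totalTime"]
X, y = crosses[features], crosses["leads_to_shot"]

# Stratified 80/20 split preserves the share of shot-producing crosses.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=42
)

log_reg = LogisticRegression(max_iter=1000).fit(X_train, y_train)        # interpretable baseline
forest = RandomForestClassifier(n_estimators=300, random_state=42).fit(X_train, y_train)

for name, model in [("logistic regression", log_reg), ("random forest", forest)]:
    proba = model.predict_proba(X_test)[:, 1]                            # xCross: P(shot | cross)
    print(name, "ROC AUC:", roc_auc_score(y_test, proba))
    print(classification_report(y_test, model.predict(X_test)))
```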

Analysis

Now we have an Excel file with the results for every cross in our dataset, containing the probability of it leading to a shot within the next three actions. With that, we can start analysing the data.

First we can look at the players who have the highest xCross numbers in the 2024–2025 season so far.

As you can see in the bar graph above, these are the top 15 players most likely to deliver a cross that leads to a shot. Looking at Stassin, 82% of the crosses he takes are expected to lead to a shot within the next three actions.

In the scatterplot below you can see the total number of crosses with the crosses leading to shots in the next 3 actions.

What I want to do is find the correlation between actual shots from crosses and the predicted probability of shots coming from crosses. That’s what we can see in the correlation matrix.

As you can see, the correlation is very high, at 0,99, between xCross and shots from crosses. There is a strong positive relationship, and that’s something we need to reflect on.

Final thoughts

Looking ahead, further improvements could include incorporating player movement data, defensive positioning, and match context to refine shot prediction accuracy. Testing more advanced models, such as XGBoost or deep learning, could help capture complex interactions between crossing characteristics and shot outcomes. Additionally, fine-tuning the Random Forest hyperparameters could further optimise performance. Ultimately, these refinements can provide deeper tactical insights.

P-values in football: assessing the validity of correlation in scatterplots

Scatterplots. Football analysts in the public space absolutely love them, and I’m one of them. I have been using them for years to put two metrics in one graph and show any form of correlation. In my opinion, they have always been insightful and meaningful; however, the feedback has often been: why put these metrics against each other? I wanted to see if I could use calculations to check that.

In this article, I will use a mathematical concept called “P-values” to focus on the validity of the correlation and whether it has been a good measurement for the data.

Scatterplots

A scatter plot (or scatter diagram) is a type of graph used to represent the relationship between two continuous variables. It uses Cartesian coordinates, where one variable is plotted along the horizontal axis (x-axis) and the other variable along the vertical axis (y-axis). Each observation in the dataset is displayed as a single point (dot) on the graph.

By revealing relationships, trends, and outliers, scatter plots are indispensable tools in data analysis, making them one of the most widely used visualizations in statistics and research.

In the scatterplot above you can see two different metrics combined in one graph, as we often do. However, we also want to understand what the correlation does and means: how do these metrics relate to each other, and does that make this a good data visualisation?

Correlation

Correlation is a statistical measure expressing the degree to which two variables are related. It quantifies the strength and direction of the relationship between variables, typically represented using a single numerical value called the correlation coefficient. Correlation does not imply causation — it merely indicates that a relationship exists between variables.

So we look at positive, negative or no correlation. We can show that in a scatterplot with a regression line. The correlation is strong — negative or positive — when the dots lie close to the regression line. You can see this in the plot below, which is an example of a positive correlation.

P-value

We have explored correlation and regression to understand the relationship between two metrics. These methods provide valuable insights into the strength, direction, and predictive nature of the relationship. However, they do not address whether the metrics are conceptually or practically appropriate to compare in the first place. Determining if the comparison is valid requires examining the theoretical relevance of the metrics, their validity in measuring related phenomena, and whether initial patterns suggest a meaningful connection.

The next step is to assess the significance of the observed relationship using the p-value. The p-value tells us whether the relationship is statistically significant, helping us determine if it is likely due to chance. However, it does not evaluate whether the metrics were appropriate to compare — it only validates the significance of the comparison. Before relying on the p-value, it is crucial to ensure the comparison is grounded in logical, domain-specific reasoning and that the metrics are suitable for analysis.

A p-value measures the probability of obtaining the observed results, assuming that the null hypothesis is true. The lower the p-value, the greater the statistical significance of the observed difference. A p-value of 0.05 or lower is generally considered statistically significant.
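In practice the correlation coefficient and its p-value come out of the same test; here is a minimal sketch with dummy data using scipy.

```python
import numpy as np
from scipy.stats import pearsonr

rng = np.random.default_rng(0)
metric_a = rng.normal(size=40)                                  # e.g. one metric per team (dummy)
metric_b = 0.6 * metric_a + rng.normal(scale=0.8, size=40)      # a related second metric (dummy)

r, p = pearsonr(metric_a, metric_b)
print(f"r = {r:.2f}, p = {p:.3f}, significant at 5%: {p < 0.05}")
```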

So our idea is to measure whether the relationship between the two metrics we compare is statistically significant. In the graph below you see two different examples.

In the left graph you see a p-value of 0,0, which is lower than 0,05 and therefore considered statistically significant. These metrics have a relationship we are better equipped to work with. In the right graph, the p-value is 0,2, which is larger than 0,05, so the relationship between these metrics is not statistically significant.

Final thoughts

Scatterplots and p-values are valuable tools for understanding relationships between variables, but they play different roles. A scatterplot gives you a clear visual of the data, showing trends, patterns, or outliers that might influence the relationship. It lets you see if the relationship looks linear or if something more complex is going on. The correlation coefficient adds to this by giving a single number that tells you how strong and in what direction the relationship is moving, but it’s the scatterplot that helps you interpret this number in context.

The p-value, on the other hand, helps answer a different question: is the relationship you’re seeing in the scatterplot real, or could it just be random chance? A small p-value (like p<0.05) means it’s unlikely that the relationship happened by accident, giving you confidence that the connection is statistically significant. However, the p-value doesn’t tell you how strong or meaningful the relationship is — it only confirms that it’s not random. By combining scatterplots, correlation, and p-values, you get a fuller picture of what’s happening in your data, balancing visual insights with statistical validation.