At what level of player would xG = goals? I’m still trying to understand these data models 🥹 Am I right in saying a human analyst is looking at every game and making their own xG % guess on each shot, based on their interpretation of ‘the probability of an average player scoring’? If so, is any data comparable between two games?
I believe that most xG models are based on large numbers of observations, so the models are empirical rather than theoretical.
My question is more about the data collection: is it a human review of a shot, and then, based on where the ball was when the shot was taken etc., a number/goal probability is manually attached to that action by the reviewer? As Alan does for each Celtic game?
As I understand it, the model produces a formula or look-up table that assigns a value of xG to a shot based on a set of parameters that will typically include range and angle to goal, number of defenders between the shooter and the goal and whether the shot is with the foot or the head. More sophisticated models will include more parameters. The analyst observes the shot and determines the parameters, and the model produces the xG.
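To make that concrete, here is a rough sketch of what such a formula could look like, using the four parameters mentioned above. It assumes a simple logistic form, and every coefficient is invented purely for illustration; none of this is taken from Opta, StatsBomb or any other provider, whose actual models I don't know in detail.

```python
import math

# Hypothetical coefficients, invented for illustration only. A real provider
# would fit values like these to a large set of recorded shots.
INTERCEPT = -0.5
COEF_DISTANCE_M = -0.12  # assumed: xG falls as range to goal increases
COEF_ANGLE_RAD = 1.4     # assumed: xG rises with a wider view of the goal
COEF_DEFENDERS = -0.35   # assumed: each defender in the way lowers xG
COEF_HEADER = -0.9       # assumed: headers are harder than shots with the foot


def shot_xg(distance_m, angle_rad, defenders_between, is_header):
    """Return an illustrative xG (a probability between 0 and 1) for one shot.

    The analyst supplies the four parameters; the formula does the rest.
    """
    z = (
        INTERCEPT
        + COEF_DISTANCE_M * distance_m
        + COEF_ANGLE_RAD * angle_rad
        + COEF_DEFENDERS * defenders_between
        + (COEF_HEADER if is_header else 0.0)
    )
    # The logistic function squeezes any score into the range 0 to 1.
    return 1.0 / (1.0 + math.exp(-z))


# Same analyst-recorded parameters always give the same xG.
close_range = shot_xg(distance_m=8, angle_rad=0.8, defenders_between=0, is_header=False)
long_range = shot_xg(distance_m=25, angle_rad=0.3, defenders_between=2, is_header=False)
print(f"close range: {close_range:.2f}, long range: {long_range:.2f}")
```

The point is only that the human records the parameters and the number comes out of the formula, so two analysts who agree on the parameters get the same xG.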
Well put, Stephen, and thank you. Different providers develop their own models. StatsBomb’s is, as far as I am aware, the most sophisticated, accounting for the speed and height of the cross and shot, for example. In the early days, specific models were developed for specific leagues. Not sure that is still the case, e.g. with Opta. As we keep saying, this is about accuracy, not precision. And single-game or single-shot xG is interesting, but not directional.
Thanks Stephen, this makes sense. I guess the statisticians are happy with the aggregate data, but my mind still strays to comparative data, and to whether there is a league or team benchmark that comes close to xG = goals over a season or an appropriate number of games. E.g. might we expect goals to exceed xG in the Champions League but lag xG in Scotland, if the xG models include empirical data from many levels of football? I feel myself getting drawn into understanding the data better - my wife keeps asking ‘what are you going to do when you retire’ …
Calibration of xG models is an issue. I don’t think you can use the same model universally. That’s why I only discussed SPFL matches in the article.
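One simple way to check calibration for a given competition is to add up the goals and the xG over a large set of shots and see how close the ratio is to 1. A minimal sketch, assuming the shots are available as (xG, scored) pairs; the function name and the six shots below are placeholders I made up so the code runs, not real match data.

```python
def calibration_ratio(shots):
    """Total goals divided by total xG for a collection of shots.

    shots is an iterable of (xg, scored) pairs. A value near 1.0 suggests the
    model is well calibrated for that collection; values well above or below
    1.0 suggest it is under- or over-estimating chances there.
    """
    shots = list(shots)
    total_xg = sum(xg for xg, _ in shots)
    total_goals = sum(1 for _, scored in shots if scored)
    if total_xg <= 0:
        raise ValueError("total xG must be positive")
    return total_goals / total_xg


# Placeholder shots, invented so the example runs. A real check would need
# a full season or more of shots from the league in question.
placeholder_shots = [
    (0.10, False),
    (0.45, True),
    (0.05, False),
    (0.30, False),
    (0.76, True),
    (0.34, False),
]

ratio = calibration_ratio(placeholder_shots)
print(f"goals / xG = {ratio:.2f}")
```

Six shots prove nothing, of course; the ratio only means something over a large sample.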
Great post and really good to see contributions from the community on the site. If I find anything useful to say I will try and contribute in coming months.
That will be most welcome, Michael.
Thanks a lot Stephen. Your focus on G/xG is very interesting. Any data-focused content I have consumed (including HB) usually suggests out- or under-performing xG is basically luck, unless you're Messi or Haaland. Do you think there is more to consider in terms of quality of shot, rather than just quality of chance? Sounds like you do, but interested in thoughts.
Cheers
I came at this from two directions, Liam. The first was the pretty strong feeling that, in the clearest circumstance, a one-on-one between the striker and the goalie, there are some strikers who you’d always trust to score, and some that have you clenching everything until you see the outcome. The other viewpoint was simply that if a mathematical model describes the average behaviour of a population, then it won’t fit so well for cases that are far from that mean. Footballers come in a very wide range of capabilities, so I expected to see deviations from the model behaviour for players and teams that are notably above or below average.
G/xG seems to me a very natural way to assess this, as it eliminates confounding factors such as volume of chances (e.g. a player scoring +2 over an xG of 10 is doing better than one scoring +2 over an xG of 30), and is relatively easy to interpret as being, say, 12% better than average or 30% worse.
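The arithmetic behind that +2 example, as a short sketch. The helper names are mine, and the goal totals are just the two hypothetical players from the paragraph above (12 goals from 10 xG, and 32 goals from 30 xG).

```python
def g_over_xg(goals, xg):
    """Ratio of goals scored to expected goals; 1.0 means exactly average."""
    if xg <= 0:
        raise ValueError("xG must be positive")
    return goals / xg


def percent_vs_average(ratio):
    """Express a G/xG ratio as a signed percentage above or below average."""
    return f"{(ratio - 1.0) * 100:+.1f}%"


# Both players are +2 on xG, but on very different volumes of chances.
low_volume = g_over_xg(goals=12, xg=10)
high_volume = g_over_xg(goals=32, xg=30)

print(percent_vs_average(low_volume))   # +20.0%
print(percent_vs_average(high_volume))  # +6.7%
```

Same raw difference of +2, but the first player is finishing about 20% above the model's average and the second under 7% above it.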
Since this sort of data has become available, I’ve kept an eye on G/xG. You mentioned Messi, and he was indeed the first player I ever looked at. It’ll come as no surprise that he outguns xG every time. Of course he does; his balance and technique are superhuman. Messi is exceptional in every way, but think about all the goals James Forrest has scored from just inside the penalty area and then consider all the fullbacks who have shot from the same place and fairly reliably put the ball into the stand. Finishing is a critical skill and it’s real. Players like Harry Kane and Robert Lewandowski tend to have G/xG>1 and they still attract a huge premium well into their thirties. It’s not the only thing about being a striker, but it’s certainly important.
If you look at G/xG for teams in leagues with good data collection (e.g. the Big 5 leagues on Opta), the teams at the top of the table tend to have G/xG>1 and the ones at the bottom usually have G/xG<1. Of course, there are exceptions, and that’s interesting too.
So, xG is noisy and you have to be careful with it, but I think there’s plenty of evidence that we can relate G/xG to performance and use it as a differentiating metric.
Thanks a lot Stephen 😊
Would love Alan and James's views on this, as I'm not sure quality of finishing has ever been given much credence.
We'll take a look on the pod being recorded this evening.
🐄 🪕😄