On Thursday 28th March I appeared on the A Celtic State of Mind Podcast with Paul John Dykes to lay out the data behind refereeing decisions over the last few seasons insofar as it impacts the top of the SPFL Premiership. Also, the data presented is detailed and written up in a series of articles on this blog.
The podcast and exercise are about holding the Scottish Football Association (SFA) to account, because the Fourth Estate refuses to do it despite the availability of the public data and the clear public interest. They won’t report for “commercial considerations”. No, me neither. We saw the same pattern when Rangers went bust, so it isn’t a surprise.
Despite the obvious target, and the conclusions leading to a call for refereeing reform within the SFA, the supporters of one club, who every Saturday sing “F*** the SFA” when they don’t get a throw-in, are agitated about this material. Whilst this chorus does give everyone a break from what Alistair McCoist calls “Hate Crimes”, and it is rather sweet some folk will stick up for the beleaguered SFA and their referees, it is all very perplexing.
I am always happy to debate and discuss methods, data and conclusions providing folk come at it in good faith. On the X platform, there have been a couple of The Rangers supporters who I am willing to accept at face value have some concerns to raise. Debating on that platform is reduced to sewage after about two tweets, so I will reply here.
First up we have Xander Flatt. He is a fan of The Rangers, here is his bio.
Xander’s comments are in bold italics, and my responses are in the house font.
So, as someone who works in analysis and has an interest in football, I was sent this. It speaks to wider issues in Scottish football and highlights a lot of the reasons our game struggles to move forward.
A short thread
Firstly, a couple of caveats.
The author has had incidents reviewed by an independent referee. I believe this was done in good faith. HOWEVER, many of the calls are pretty subjective so it's a little dishonest to portray one ref's opinion as objective fact.
I agree most calls other than offside under VAR carry a degree of subjectivity.
Using the same ref, especially a non-Scottish one, ensures objectivity. I am confident in that assertion. Why? The proof of that is in the pudding. Read the decisions he makes on the blog – give me examples where any bias is displayed in the context of Scottish football? Also, read the blog comments – he manages to infuriate fans of all sides.
I believe he DOES bring his own unconscious bias to this. I would characterise his unconscious bias as he will do his best to support the referee's decision and needs a lot of persuasion to change a decision. That seems human to me. It also serves to downplay the impact of poor calls benefitting The Rangers given ALL the other trends. Them’s the breaks.
However, the major advantage of using the same referee is consistency. He tends to see handballs for example, in a consistent way, unlike a random selection of e.g. 12 other referees.
Secondly, I believe the author has put a lot of effort into this but has made fundamental errors at the collection and analysis stages. The conclusions may have come from an honest place but they are flawed as a result of a flawed process.
This will be exciting then! (Spoiler alert: it isn’t).
Lastly, football is a game of random events. Some events become more probable but it's all random.
Randomness doesn't happen evenly. The image is areas bombed during WW2, it's random but people assumed deliberate intent because people like patterns. We like things to make sense
Great point. If I can replay that to you in the context of refereeing – randomness (for example, as one would get with incompetence) has no pattern. Spot on. That is my starting point. My hypothesis, if you will, is that refereeing inconsistencies are random and affect every team broadly equally over a large sample.
As I say in the pod and writing, this is the expected outcome – that there should be no statistically significant impact of refereeing decisions whether it impacts Celtic or The Rangers. Why would heavily dominant Celtic and The Rangers see wildly different outcomes if refereeing errors were random? You wouldn’t, so if those are not the patterns (they are not) something else must be at play. What is it – I don’t know. We should ask some experts.
1. The data presented to the independent ref is hugely selective. Added to the first caveat, this sets us off on the wrong foot.
2. Some points in this Ran Vs Cel analysis use H2H data and others don't. This appears to be based on what fits the author's assumption
1. The data presented to the Yorkshire Whistler, which is one section of the work not all of it, is dependent on big calls – that is sendings off, penalties and goals allowed/disallowed, being reported and video replays available either on the SPFL YouTube channel or the BBC Sportscene highlights. A fallback is what “incidents” are trending on X after the game. Believe me, neither support is slow to make a loud song and dance about any grievance for either side.
So not “hugely selective”, but I am sure that, occasionally, video highlights have been difficult to source, in which case I have nothing for the referee to review. I can assure you that it has fallen both ways and doesn’t happen often, so it is not “hugely” an issue.
Secondly, the rest of the analysis uses the same public data for Celtic and The Rangers over multiple years and therefore that blanket statement is “hugely” incorrect.
2. You didn’t ask about dates but I’ll address that anyway. ALL the analysis covers the 2020-21 season to this season. The reason for that is those are the seasons The Rangers became competitive AND those were the seasons the end-of-campaign prizes became much more highly charged. Specifically, 2020-21 was about either winning or losing 10 in a row. And in the subsequent seasons the title carried guaranteed Champions League money at the end of it. Huge money in the context of Scottish football.
ALL the analysis covers those periods with the following caveats:
The Yorkshire Whistler was only engaged in the 2021-22 season and I wasn’t going to go back and ask him to retro-review a year's worth of data.
The penalty count data is not my data – it was provided by @ScotlandsCoefficient to show penalty trends this century.
The analysis of penalty differentials across the SPFL was not my data (@JBLuvsCeltic) and was reproduced as provided from 2018-19 to 2023-24.
The Impact of Red Cards and Penalties differential between Celtic and The Rangers went back to 2016-17 when The Rangers were promoted for the first time because it was necessary to show that before 2020-21 there was no discernible pattern in impacts. Something changed in 2020-21, and this is used to emphasize that point.
The top two are hugely (that word again) dominant over the rest of the league. Therefore, comparing them together is an issue how? The hypothesis is that there should be no difference in how the top two are treated as regards refereeing.
I am comparing two teams with relatively few points between them against teams with a huge points drop to the third-placed team. In general, I’ve used the league for trends and the top two for impacts because it is those two who are fighting for the title and for whom the points matter more in this context.
Are you suggesting it is inappropriate to compare the two teams or that we should observe anything but similar outcomes?
The key question is “are both clubs treated the same as regards the models” and the answer to that is yes, every time.
3. The word "pattern" assumes non-randomness. Starting with the assumption that a data pattern means deliberate intent is flawed. This is an honest, easily made mistake. But a mistake nonetheless.
That is quite the semantic leap. We’ve agreed randomness has no pattern and that is what we expect to see. You then leap to claim the existence of a pattern means “deliberate intent”.
Where do I claim that? Where do I get into “Why” there are patterns?
My objective here is to establish “does Scottish football have a problem with refereeing?”. I believe the answer is yes given the statistical significance of ALL the trends and the same team being the outlier.
Why is that? We could both speculate but I’d suggest that sociologists, behavioural scientists, and historians are best placed to answer.
4. xPts dropped do not translate into actual points dropped. Assuming 1.7 pts are dropped for an opposition pen might be fair, but if you're 5-0 up at that point, it won't affect the outcome. The author shows potential impact but implies actual impact.
You are quite close, but not really – it is an estimate based on a model that considers the time in the match and the game state (i.e. the score in terms of whether the teams are drawing or one team or the other is +1, +2, +3 goals ahead etc).
What you then go on to state is exactly how the model calculates the xPts. I have never said anything other than it is an estimate.
Nowhere do I use a value of 1.7 xPts for a penalty in any case. That is a fundamental misunderstanding of the xPts model which is clearly explained in the relevant blog posts.
1.7 would be the maximum estimated xPts should the scores be level and a team scores a goal (not a penalty) in the 90th minute. If that effort were a penalty, you would adjust the xPts accordingly by multiplying by 0.77 which is the xG value of a penalty. Therefore, the maximum xPts for a last-minute penalty when the scores are level would be 1.31 xPts.
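The arithmetic above can be sketched in a few lines. This is only an illustration of the adjustment described in the text, not the model itself: the function name and its parameters are made up for the example, and the only figures used are the two quoted above (1.7 xPts for a last-minute goal when level, and 0.77 as the xG value of a penalty).

```python
# Illustrative sketch only: names are hypothetical, figures are the two
# quoted in the text (1.7 goal xPts at level scores in the 90th minute,
# 0.77 xG for a penalty).

PENALTY_XG = 0.77  # probability-style weighting applied to a penalty


def penalty_xpts(goal_xpts: float, penalty_xg: float = PENALTY_XG) -> float:
    """Scale the xPts of a goal in a given game state by the penalty xG."""
    return goal_xpts * penalty_xg


max_goal_xpts = 1.7  # level scores, goal scored in the 90th minute
print(round(penalty_xpts(max_goal_xpts), 2))  # 1.31
```

Any other game state would start from a smaller goal xPts figure, so the penalty value can only be lower than this ceiling.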
Since the start of the 2020-21 season, there have been six penalties that fit that profile. Two involved Celtic or The Rangers (i.e. one each). So, not only is this nit-picking in the extreme, but sadly you’re failing to understand the xPts model.
5. The figure of 1.7 xPts comes from data from a different league and assumes all penalties are equal, however in real life, some are more equal than others.
Again, you are fixating on a value that cannot happen for a penalty, and the scenario has happened two times in the period under review. AND you don’t understand it!
You are spot on to say “in real life, some (penalties) are more equal than others”. That is EXACTLY the point of looking at the time and game state when penalties are awarded and calculating the xPts accordingly.
And the results show that penalties for The Rangers are hugely more impactful. Surely it should be random?
So, I think you agree with me (hurrah) but I’m not convinced you understand why (boo).
6. Red cards. Again, this assumes all have the same value and the same impact. It also assumes that red cards have a lesser impact than penalties. That depends on when in a game they occur and what the score was at the time
You have described precisely the analysis I have done and presented it here as if it is the analysis I SHOULD have done but did not! Mind-bending.
By the way, red cards DO have less impact ON AVERAGE (i.e. not always) than penalties depending on the game state and time.
I worked out the time and score at every red card event then calculated the impact that had on the final score, and then modelled the result.
Again, I am so glad I have done what you would recommend and so sorry you failed to understand that.
7. A universal score for all pens and red cards assumes all penalties are scored and all red cards result in a different result. This is obviously not true
There is an xPts value for each penalty and red card. The penalty xPts used a goal xPts model and then adjusted each xPts by 0.77 as this is the probability of the penalty being scored. So, it does not assume all penalties are scored – quite the opposite.
The Red Card xPts model was built using real SPFL data as described above. It is the AVERAGE of all the REAL impacts across REAL SPFL games – that is how models work.
So, there is no “universal” score for all pens and red cards. The xPts value is an estimate, produced by the same model for both clubs (with the xG adjustment for penalties), and is calculated based on the time in the match and the game state.
Again, I am so glad I have done what you would recommend and so sorry you failed to understand that.
8. There is no definitive way to show that a red card changes a result because there's no definitive result to start with. It's a hypothetical comparison. Giving it a numerical value doesn't hold up.
I analysed the actual score at the moment each of the 178 red cards in the SPFL since the start of the 2020-21 season was awarded, compared it to the final score, and summed the impacts by time segment to get the AVERAGE impact across the 90 minutes.
Turned out it didn’t matter whether you were home or away.
I logged whether draws became wins, wins became draws, wins became defeats etc. That gives you an average points difference between the points at the time of the card and the final allocation of points in the match. That then becomes xPts in the red card model.
It is a model, therefore it is an estimate based on the average impacts that happened in this league in real life.
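For anyone who wants to picture the mechanics, here is a minimal sketch of that kind of calculation. It is not the actual model: the 15-minute segment width, the choice to measure from the carded team's side, the function names and the three example events are all assumptions invented for illustration, not real SPFL data.

```python
from collections import defaultdict


def points(goals_for: int, goals_against: int) -> int:
    """League points implied by a scoreline: 3 win, 1 draw, 0 defeat."""
    if goals_for > goals_against:
        return 3
    if goals_for == goals_against:
        return 1
    return 0


def red_card_impact(events, segment_minutes=15):
    """Average points change per time segment, from the carded team's side.

    Each event is a dict with the minute of the card, the score when the
    card was shown and the final score, both as (for, against) tuples.
    The segment width is an assumption made for this sketch.
    """
    last_segment = 90 // segment_minutes - 1
    changes = defaultdict(list)
    for event in events:
        before = points(*event["score_at_card"])
        after = points(*event["final_score"])
        # Stoppage-time cards are folded into the final segment.
        segment = min((event["minute"] - 1) // segment_minutes, last_segment)
        changes[segment].append(after - before)
    return {seg: sum(vals) / len(vals) for seg, vals in sorted(changes.items())}


# Hypothetical events, invented purely to show the shape of the input.
example_events = [
    {"minute": 20, "score_at_card": (0, 0), "final_score": (0, 2)},  # draw -> defeat
    {"minute": 25, "score_at_card": (1, 0), "final_score": (1, 1)},  # win -> draw
    {"minute": 80, "score_at_card": (1, 1), "final_score": (1, 1)},  # draw -> draw
]
print(red_card_impact(example_events))
```

The output is one average per time segment, which is the sense in which a red card xPts value is an estimate built from what actually happened rather than a fixed number per card.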
9. There are other issues but I don't need to labour the point. Essentially, this is what happens when you decide what you believe to be true and set out to prove it rather than objectively using data to identify if a problem exists.
I'm sorry, I'm going to have to be a bit sharp now. You have shown you failed to understand the analysis undertaken. You can’t even be bothered to list all the “other issues”.
Therefore, save me the lecture on MY poor methodology and your accusation of bad faith when you can’t even be bothered to understand the data and method presented. Not really on, is it, Xander?
So why does this speak to wider issues holding back Scottish football?
Because everyone is so insular, too many people have blinkers on. Standards will not improve while every team believes that anyone not for them must be actively against.
Each data set has been tested for the statistical significance of the result using the Z-Score and there are clear patterns of assistance for one team.
Standards will not improve until Scottish football clubs recognise this, and the fans force the clubs to force the SFA to modernise.
As a fan of The Rangers, I accept this is too painful for you to believe.
Look at referee announcements: often both teams will see them and both be claiming "he hates us, he won't give us anything".
True. Hence the need for public data, subject to transparent analysis to show that this thinking is incorrect and that there is an issue, just not an “Old Firm” or “Big Club” issue as per the lazy tropes.
While the data shows no evidence of incorrect decisions benefiting one team over another, let alone intent, it wouldn't surprise me that if the SPFL and SFA set themselves the task of ensuring the dominance of one team, they'd be so useless at it, it would have no impact
No one was trying to prove or disprove “intent” – that is you assigning something there is no evidence of in my posts or on the pod. That’s really poor of you.
Each data set has been tested for the statistical significance of the result using the Z-Score and there are clear patterns of assistance for one team. It is the same team that is the significant outlier across all the analyses.
You have not queried or debated the actual data or results other than to deny the existence of the statistically significant results which are presented. All the data is public, and these results could be reproduced if you so wish.
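For readers who want to try reproducing that kind of check, the sketch below shows one common way to compute a Z-Score per team: how many standard deviations a team's figure sits from the league mean. It is an assumption that this matches the exact test used in the posts, the 1.96 threshold is simply the conventional 5% two-sided cut-off, and the team labels and numbers are placeholders, not real data.

```python
from statistics import mean, pstdev


def z_scores(values_by_team):
    """Z-Score of each team's value against the mean and spread of all teams."""
    values = list(values_by_team.values())
    mu = mean(values)
    sigma = pstdev(values)  # population standard deviation across the league
    return {team: (value - mu) / sigma for team, value in values_by_team.items()}


def outliers(values_by_team, threshold=1.96):
    """Teams whose absolute Z-Score exceeds the chosen threshold."""
    scores = z_scores(values_by_team)
    return [team for team, z in scores.items() if abs(z) > threshold]


# Placeholder figures only: nine teams on 0.0 and one on 10.0.
placeholder = {f"Team {i}": 0.0 for i in range(1, 10)}
placeholder["Team 10"] = 10.0

print(z_scores(placeholder)["Team 10"])
print(outliers(placeholder))
```

If refereeing errors fell randomly, no team would be expected to appear in the outlier list for measure after measure.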
We agree the governance bodies are useless so that is common ground.
So, there we have it. The big rebuttal. Make up your own mind, dear reader.
Genuine thanks to Xander. I am sure all the thoughts are his and his alone and he is not being played in the background by bad-faith actors. Anyone suggesting such is also acting in bad faith in my book. Harrumph.