The calibration engine is at the heart of SquashLevels, churning through all the new results every night and generating levels for every player for every match across the system.
The following sections attempt to divide a complex engine into its constituent parts and give you an insight into how it all works. Many of the details are left out for obvious reasons but, if you read this FAQ, you will have a pretty good idea of what goes on under the calibration hood. Anoraks on...
This is the most obvious part of the engine: the system assesses the level a player is playing at and assigns a level value to them. For every match, the system compares the actual result with the expected result against their opponent; the player goes up a bit if they play better than expected and down a bit if they play less well. All ranking systems do this, though SquashLevels makes a point of using points scores for accuracy. The algorithm itself is based on:
- Maths - PAR scoring is easy (11-5 is about twice as good); English scoring less so. We use a combination of points scores and games scores to assess the result. The overall goal is that if you are twice as good as your opponent then your level will be double theirs. This works all the way up from beginner (<50) to top pro (>50,000)
- Weighting - the more important the match (e.g. a tournament) the greater the weighting. This allows you to play a box match without having too much impact on your league standings. See the FAQ on match type weightings below.
- Behavioural modelling - as it turns out, not everyone puts 100% effort in every match and that’s down to behaviour. There are many other cases too where player behaviour defies the maths and, based on the analysis of 1.6 million results on the system, we’ve built an extensive behavioural model that allows us to predict and make use of these behaviours. This is a critical part of the engine because it has a very significant effect on player levels. It also results in uneven level changes where one player can go up more than their opponent goes down and that causes ‘drift’! For more on our behavioural modelling and how we combat drift, see the relevant FAQ sections below.
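The core idea above can be sketched in a few lines. This is a hypothetical illustration, not the actual SquashLevels algorithm: the function names, the `weight` parameter and the simple ratio comparison are all assumptions, and the real engine's behavioural modelling means the two players' changes are not symmetric as they are here.

```python
def update_levels(level_a: float, level_b: float,
                  points_a: int, points_b: int,
                  weight: float = 0.2) -> tuple[float, float]:
    """Nudge both levels toward what the result implies.

    Assumes PAR scoring, where the ratio of points won roughly tracks
    the ratio of playing levels (11-5 is about twice as good). `weight`
    stands in for the match-type weighting: a tournament match would
    use a higher value than a club box match.
    """
    # Expected ratio of A's points to B's, given their current levels.
    expected = level_a / level_b
    # Observed ratio of points won across the match.
    actual = max(points_a, 1) / max(points_b, 1)
    # If A did better than expected, A's level rises and B's falls.
    adjustment = (actual / expected) ** weight
    return level_a * adjustment, level_b / adjustment
```

For example, two 1000-level players where one wins 33-15 across three games would see the winner's level rise and the loser's fall by the same factor; a result exactly matching expectations leaves both levels unchanged.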
We can work with game scores only, making assumptions about the average 3-0 result (based on our analysis of real 3-0 match results), but because we can only use averages it takes a lot more results for the levels to become accurate. Not all 3-0 results are the same, obviously.
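A games-only result can be handled by substituting an assumed "average" points split for each games score. The splits below are illustrative guesses only, not the real figures from SquashLevels' analysis:

```python
# Assumed average points splits for games-only results (PAR-11).
# These numbers are made up for illustration.
ASSUMED_POINTS = {
    (3, 0): (33, 18),
    (3, 1): (40, 30),
    (3, 2): (49, 42),
}

def estimated_points(games_winner: int, games_loser: int) -> tuple[int, int]:
    """Return an assumed points split for a games-only result, which can
    then be fed into the usual points-based level update."""
    return ASSUMED_POINTS[(games_winner, games_loser)]
```

Because every 3-0 is treated identically, each individual result carries less information, which is why more results are needed before the levels settle.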
Applying player calibration to a set of players who play each other over a period of time naturally calibrates all those players over time. The more they play and the more opponents they play, the quicker the calibration. They are effectively a pool of players such as those from a club or a county league.
This is a natural effect and doesn’t require anything specific from the calibration engine. All calibration engines therefore provide intra-pool calibration.
A player’s level doesn’t mean much unless you can compare it with other players across the system, i.e. a 1000-level player in Surrey should be playing at the same standard as a 1000-level player in Yorkshire, or Calgary for that matter.
This is ‘pool calibration’ where players in a pool are treated ‘as one’ and then compared with other pools such that their respective pool levels are equivalent. The comparisons are made by analysing the results of those players who play in more than one pool but, as ever, you have to be careful which results you use and how you use them. Behavioural modelling is really important for this.
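One simple way to align two pools from cross-pool players is to average the ratios between each player's level in each pool. This is only a sketch of the idea: the function is hypothetical, and the real engine's use of behavioural modelling to filter which results count is omitted entirely.

```python
from math import exp, log

def pool_offset(cross_pool_levels: list[tuple[float, float]]) -> float:
    """Estimate the factor to multiply pool B's levels by so the two
    pools line up.

    `cross_pool_levels` holds (level_in_pool_a, level_in_pool_b) pairs
    for players who play in both pools. The geometric mean of the
    ratios is used so that over- and under-estimates cancel
    multiplicatively, matching a ratio-based level scale.
    """
    ratios = [log(a / b) for a, b in cross_pool_levels]
    return exp(sum(ratios) / len(ratios))
```

For instance, if every shared player's level in pool A is double their level in pool B, the offset comes out as 2.0 and pool B's levels would be doubled to match.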
There are different types of pools and they behave slightly differently: there are geographical pools like Yorkshire and Surrey, and then there are club boxes, tournaments, ladders, tours and so on. Club boxes are interesting as the top players are usually more associated with their county pools, so the actual pool boundary sits further down the boxes. Tricky!
Just to add to the challenge, some pools are a subset of other pools (e.g. a tournament series in a club) and others are made up of subsets from a number of other pools such as regional events and leagues. We refer to these as derivative pools. They might appear to need calibrating but they can’t be adjusted!
As long as there is a group of players who play each other at least a few times over the course of a season then they can be considered a pool. There’s a good deal of complexity around automatically identifying the pools and where the pool boundaries are.
Another goal is that a player’s level is also equivalent over time, i.e. if your playing level is now twice what it was two years ago then you should be able to see that in the charts, and also make predictions and set goals for yourself in the future. It’s also good fun to compare ‘best levels’ from different players at different times and see who might have come out on top had they played each other in their prime.
Drift is the big problem for time calibration and, as mentioned before, the use of behavioural modelling causes drift. There are some interesting factors that affect and cause drift, all of which have to be taken into account over hundreds of thousands of results over, literally, decades. If we’re out by even a small amount, the drift really adds up and you just can’t compare levels more than a few years apart.
These are some of the most common causes of drift:
- Those players who start low, end high and then stop playing. That’s level lost!
- All the irregularities from behavioural modelling
- Gaps in player history - especially juniors who get (a lot) better without apparently playing (missing results) and masters… who don’t…
- Pros who suddenly stop at stratospheric levels and then reappear 5 years later in the leagues at a much lower level.
- We all get better the more we play - but by how much? It’s hard to take this into account in a fundamentally comparative algorithm but, if we don’t, the whole system actually drifts down over time.
It would be nice if it all cancelled out but it doesn’t. The analysis we have done has allowed us to quantify all of these drift effects and we’ve been able to factor them into the algorithm. The results now look pretty good - even over a 20 year period.
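To put a number on how drift "adds up", here is a quick compounding calculation. The 1% figure is purely illustrative, not a measured SquashLevels value:

```python
# A small systematic bias compounds season on season: a 1% upward
# drift per season inflates levels by roughly 22% over 20 seasons.
drift_per_season = 1.01
seasons = 20
inflation = drift_per_season ** seasons
print(f"{inflation:.2f}")  # ~1.22, i.e. about 22% inflation
```

This is why even small uncorrected effects make level comparisons across decades meaningless unless each drift cause is quantified and factored back in.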
The whole system is about player levels but one of its goals is to be fully inclusive so that beginners can compare with club players who can compare with league players and so on all the way up to the top pros. It’s great fun to run the predictor between ourselves, keen amateurs, and the top pros or anyone else on the system and actually get a predicted result.
Clearly, the pros don’t play the beginners so we rely on the results of beginners playing leisure players, leisure players playing box players etc., all the way up to the top pros. There are actually 8 distinct levels that we need to calibrate across (levels approximate!):
- Beginners (< 100)
- Leisure players (50 - 300)
- Club boxes players (200 - 2000)
- County league players (500 - 3000)
- Top county league players (3000 - 10,000)
- PSL (10,000 - 30,000)
- Satellite PSA (20,000 - 40,000)
- Top PSA (30,000+)
This doesn’t leave much margin for error as, with a comparison-based system, any errors are exaggerated at the extremes.
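What makes the chain work is that each band overlaps (or at least touches) the next, so levels can propagate from beginners all the way to top pros through shared opponents. A quick sketch checking that property, using the band figures from the list above (the data structure and function are hypothetical):

```python
# Level bands copied from the list above: (name, low, high).
BANDS = [
    ("Beginners", 0, 100),
    ("Leisure players", 50, 300),
    ("Club boxes players", 200, 2000),
    ("County league players", 500, 3000),
    ("Top county league players", 3000, 10000),
    ("PSL", 10000, 30000),
    ("Satellite PSA", 20000, 40000),
    ("Top PSA", 30000, float("inf")),
]

def bands_chain(bands) -> bool:
    """True if every band overlaps or touches the next, so calibration
    can propagate across the whole range via shared players."""
    return all(hi >= next_lo
               for (_, _, hi), (_, next_lo, _) in zip(bands, bands[1:]))
```

If any adjacent pair failed to overlap there would be no shared results to calibrate through, and the two halves of the system would float independently of each other.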