Are people satisfied with the current rating system?
-
- Site Admin
- Posts: 93
- Joined: Tue Aug 14, 2007 11:58 am
- What do you like about checkers?: Winning
- Location: Kitchener, ON, Canada
Are people satisfied with the current rating system?
Since I'm working on adding a tournament import / ratings processor feature into the ACF site, I thought now would be a good time to ask what people think of the current rating system to determine if any changes should be considered in the future.
Is the system perfect as is?
Could it be improved?
In my personal opinion, here are some of the main flaws with the current system:
Provisional games status lasts for only 20 games for new players.
When you play a player in provisional status, your rating is not affected, but theirs is. This 20-game limit is a problem because it takes most players many more than 20 games to reach their true rating, and the ACF has never prevented players from entering a division based on their rating. Looking at top players like Borghetti, Scarpetta, and others who started playing after the rollout of the new rating system, it took anywhere from 75 to 100 games to establish something close to their true rating.
As a result of the current limit, players near the top are seeing their ratings unfairly reduced while the new players very slowly climb to their true rating level. This can clearly be seen in the following tournament (http://www.usacheckers.com/ratings/tour ... mentid=405), where 3 new master players who had just passed provisional status, or did so during the tournament, saw their ratings increase by a combined 392 points, while the rest of the field saw their ratings reduced by a combined 209 points. Basically, the field is being punished for playing players whose ratings are far lower than they should be. I've seen similar issues in qualifier tournaments, where many of the players are nowhere near their true ratings.
There are a number of ways we can solve this problem:
1) Increase the provisional games # to something higher, such as 75
2) Increase the K-factor for players in provisional status. The K-factor is part of the Elo rating formula and controls how much each win, draw or loss changes your rating: the higher the K-factor, the faster a player converges to their true rating. Currently the K-factor for new players is 32, so we could increase this to something like 64 while they are in provisional status. Doing this would get new players to their true ratings twice as quickly.
3) Use performance ratings to rate new players after their first tournament. This would be my preferred option, and I believe it is the way the ACF used to rate new players.
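For what it's worth, here's a sketch of how option 3 might work, using the common linear approximation of a performance rating (average opponent rating plus 400 times the net wins per game). Whether this matches the formula the ACF used before is an assumption on my part, so treat it only as an illustration:

```python
def performance_rating(opponent_ratings, score):
    """Linear-approximation performance rating: average opponent
    rating plus 400 * (wins - losses) / games, where score counts
    win = 1, draw = 0.5, loss = 0."""
    games = len(opponent_ratings)
    average_opponent = sum(opponent_ratings) / games
    # wins - losses == 2 * score - games under 1 / 0.5 / 0 scoring
    return average_opponent + 400 * (2 * score - games) / games

# A newcomer scoring 6.5/10 against opponents averaging 2000 would
# enter the list around 2120 instead of starting from a fixed default:
print(performance_rating([2000] * 10, 6.5))  # -> 2120.0
```

The appeal is that one tournament's worth of games already lands a new player near the strength of the field they actually played, instead of dragging everyone else's ratings around for 20+ games.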
K-Factor rules are no longer relevant given the reduction in the rating ranges since implementing the new system
The current ratings formula uses the following rules to determine which k-factor to use when calculating ratings for a given player:
- Rating > 2400? Use K-factor of 16
- Rating > 2000? Use K-Factor of 24
- Else use K-Factor of 32
As you can see from the ratings list, there are currently no active players with a rating > 2400, so these rules are no longer relevant. I would propose that we switch to a new system which assigns different K-Factors based on the number of games played, similar to chess. I also think we should ignore any games played prior to the implementation of the latest rating system so that players with old (pre 2008) inflated ratings move more quickly to their new rating after their first few tournaments. Here's an idea of what this might look like:
- Games played < 20? Use K-Factor of 64 (unless we use the performance ratings to rate new players as mentioned above)
- Games played < 40? Use K-Factor of 32
- Games played < 80? Use K-Factor of 24
- Games played < 120? Use K-Factor of 16
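In code, the proposed schedule and the resulting one-game update might look like this (what happens at 120+ games isn't specified above, so I've simply kept the lowest K there):

```python
def k_factor(games_played):
    """Proposed games-based K-factor tiers (a sketch; the 120+ case
    just keeps the lowest K, which the proposal doesn't specify)."""
    if games_played < 20:
        return 64
    elif games_played < 40:
        return 32
    elif games_played < 80:
        return 24
    else:
        return 16  # 80-119 per the proposal; 120+ treated the same here

def expected_score(rating, opp_rating):
    """Standard Elo expectation for one game (400-point scale)."""
    return 1 / (1 + 10 ** ((opp_rating - rating) / 400))

def updated_rating(rating, opp_rating, score, games_played):
    """One-game Elo update: new rating = old + K * (score - expected)."""
    return rating + k_factor(games_played) * (score - expected_score(rating, opp_rating))

# A 10-game newcomer who beats an equally rated opponent gains 32
# points (K=64 * 0.5), versus 8 points for a 150-game veteran (K=16):
print(updated_rating(1800, 1800, 1, 10))   # -> 1832.0
print(updated_rating(1800, 1800, 1, 150))  # -> 1808.0
```

The effect is that new players move fast while their rating is still mostly guesswork, and established ratings stay stable.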
One rating list for all styles of checkers
The ratings formula assumes that player A has the same chance of beating player B regardless of the style of checkers being played. That's simply not true, for two reasons: some players are better at one style of checkers, and GAYP is measurably more draw-heavy. In the master division of the last 4 GAYP nationals, 55.0% of rounds were drawn. In the last 4 3-Move nationals, 34.8% of rounds were drawn. 11-Man Ballot has similar results to 3-Move, with 33.3% drawn rounds. I doubt I'll get much support for this suggestion, but I think we should have one rating list for GAYP and one for the other styles. Chess has 3 rating lists, one for each of the 3 different time controls used in chess tournaments (standard, rapid, blitz).
What are your thoughts or ideas?
-
- Posts: 101
- Joined: Fri Mar 30, 2012 4:35 am
- What do you like about checkers?: I like complicated games and draw endgames with one man down
Re: Are people satisfied with the current rating system?
Hello Clint,
You did a good job, but I still have some doubts about my rating. I see that on the 'Ratings' page (http://www.usacheckers.com/ratings/index.php) my rating is 2265 and my last tournament is the 2015 Old English Club Open. But my last tournament, according to the list, is the 2017 ACF 11-Man Ballot – Masters, and my rating after that tournament is 2303.
Are there any mistakes, by any chance?
Best regards
Sergio
-
- Site Admin
- Posts: 93
- Joined: Tue Aug 14, 2007 11:58 am
- What do you like about checkers?: Winning
- Location: Kitchener, ON, Canada
Re: Are people satisfied with the current rating system?
Hi Sergio,
The problem here is that your personal rating page lists the tournaments you played in according to the order that they were played:
http://www.usacheckers.com/ratings/play ... erid=12658
However, ratings are processed in whatever order the tournament crosstables are received in. If the tournament organizers failed to submit the crosstables to the ratings processor for 2 years, then those older tournaments are going to be processed last. If you look at the Rated Tournaments page, which lists tournaments in the order that they were processed, you can see that the 2015 Irish Open Masters (2303->2271) and the 2015 Old English Club Open (2271->2265) that you played in were the most recent of your tournaments to be processed for ratings, which is why 2265 is your current rating.
I can add a tournament ID column to the tournaments listed on a player's rating page so that they are sorted in the order that they were processed, if that will make it less confusing.
By the way, if people want this issue to be fixed going forward, I could update the ratings scripts so that all future calculations are done using tournaments in the order that they were played in.
-
- Posts: 101
- Joined: Fri Mar 30, 2012 4:35 am
- What do you like about checkers?: I like complicated games and draw endgames with one man down
Re: Are people satisfied with the current rating system?
Thank you for your explanation, Clint.
In my opinion, it's not worth fixing that mistake, because there are other tournaments still to be updated.
Going forward, though, it would be better to ignore crosstables that come in too late.
Sergio
- champion374
- Posts: 531
- Joined: Fri Oct 01, 2010 10:49 am
- What do you like about checkers?: game of thinging
- Location: Barbados
Re: Are people satisfied with the current rating system?
Clint, a question: do all the clubs around the world know you're doing the ratings now, and do they have your email or any contact info?
Kent ,,Ace,, Layne checker player from Barbados
-
- Posts: 83
- Joined: Fri Dec 22, 2006 10:43 pm
- Location: Ireland
Re: Are people satisfied with the current rating system?
First of all, great job to everyone who sends in tournaments, and to the rating processors keeping the ratings up to date.
I like the idea of a separate rating for each style, since it gives a perspective on where you are with each part of your game.
I think maybe this is a future "nice to have" thing, as I'm sure maintaining one rating list is difficult and time consuming as it is.
Regards,
Shane.
- champion374
- Posts: 531
- Joined: Fri Oct 01, 2010 10:49 am
- What do you like about checkers?: game of thinging
- Location: Barbados
Re: Are people satisfied with the current rating system?
I like the idea, Clint. But as Shane said, it would take up a lot of time, so it's up to you, bro.

Kent ,,Ace,, Layne checker player from Barbados
-
- Posts: 15
- Joined: Sun May 20, 2012 12:26 am
- What do you like about checkers?: The challenge.
Re: Are people satisfied with the current rating system?
Clint, good work with the new ratings page!
I agree with some of the concerns you’ve raised too; thanks for getting the discussion started. As far as different styles of play having different distributions of outcomes, I think the simplest solution is to use different functions for expected outcome, rather than maintaining separate rating lists. Separate rating lists would be the way to go to capture the fact that a person may well have different skill levels in different styles of checkers, but it’s unnecessary if we’re just wanting to capture different outcome expectations. I think this came up in an email thread about ratings a few years back. I don’t know if you were included. When I get home, I’ll try to track that down. I think I may have suggested some different functions for expected outcome for different styles (GAYP, 3-move, 11-man) and scoring methods (round vs ballot).
As far as K-factors go, I think we should look a little more closely at this. Increasing K-factors would get new players into the neighborhood of their true rating faster, but it’s also going to cause a lot of overcorrecting. I’m not even sure the k-factors we started with were appropriate; I think we just used the same ones as chess.
For performance ratings in individual events, there seems to be a problem. Looking at the 2016 11-man WTM, Alex won, but Jim’s performance rating for that event was higher. Alex performed better in the event, so his performance rating for the event should be higher. To be clear, I’m not saying that Alex’s rating should have gone up as a result of the match (after all, he went into the match with a higher rating). While there may be good reasons for a performance rating not to line up with end-of-tournament ranking in events with more than two players (the same score can mean very different things depending on whom you played), I think it’s clear cut in two-person matches.
What are the rules for titles? Do these rules account for the rating deflation that’s occurred?
One of the fundamental questions we need to address as a group is “What should ratings mean?” For instance, what absolute levels do we want ratings to have (in reality, this is arbitrary)? Do we want 2000 (or pick some other arbitrary number of your choice) to correspond to roughly master level? Do we want ratings to mean roughly the same thing across time? Do we want titles awarded based on performance at different points in time to mean the same thing? What do we want rating differences to mean? Let’s restrict to game-scored 3-move, for simplicity. Should a 200-point difference correspond to an expected score of 3-1? Or maybe 2.5-1.5 or 3.5-0.5? Under the current system, I think 200 points implies a pretty large difference in level of play... probably more than we want.
- champion374
- Posts: 531
- Joined: Fri Oct 01, 2010 10:49 am
- What do you like about checkers?: game of thinging
- Location: Barbados
Re: Are people satisfied with the current rating system?
I'm looking forward to the 2017 Barbados Open being rated, as I did very well in that tournament.
Kent ,,Ace,, Layne checker player from Barbados
-
- Posts: 84
- Joined: Sun Nov 20, 2005 8:35 pm
- What do you like about checkers?: I love strategy games and checkers fits the bill for a great strategy game, that requires logic and problem solving skills.
- Location: Columbus, MS
Re: Are people satisfied with the current rating system?
Sorry I am late to this discussion. I've been pretty busy and guilty of not checking the forums enough. I was trying to get some of these issues corrected a little over a year ago via e-mail with those who better understand the math behind it, such as Mr. Boatman and Mr. Clark. So I am very glad you brought some of these issues up publicly, because they really do need to get fixed, especially if we are going to be awarding Expert, Master and Grandmaster titles in large part based on rating.
I still have the e-mails where we were discussing these issues, and I think you are already on the right track. One of the main things we agreed on in the e-mails is a more proper "range", such as Expert being more like 2000-2200, Master 2200-2400 and Grandmaster being 2400+, instead of the very tight and narrow range we have now. Although, as you said, this could very well be due to the propagation issues, and if those were fixed the range may eventually adjust itself.
Ideally, if we could find a way to fix the rating system with a range that everybody likes and that makes sense, then maybe we could be lucky enough to have a dedicated volunteer completely rerun the numbers from the past X number of years (maybe as far back as the original rating system) so that it won't take another 10-20 years for the new ratings to adjust to their proper levels.
-
- Site Admin
- Posts: 93
- Joined: Tue Aug 14, 2007 11:58 am
- What do you like about checkers?: Winning
- Location: Kitchener, ON, Canada
Re: Are people satisfied with the current rating system?
Regenerating the ratings is definitely something that could be done - it would probably require a membership vote though, wouldn't it?
I built something a while ago in order to verify that my ratings calculator was accurate and also to simulate what the ratings would look like with various tweaks to:
- ELO formula rating range (default is 400)
- Different k-factors
- # of games a player is considered provisional status
I'd need to fix that now that I've made some changes to the ratings database but it wouldn't be too much work. I could do this after the import new tournament process is finished and running.
In my opinion, in order to maintain ratings somewhere in the range we used to have in the 'old days' we would need to increase the '400' used in the ELO formula (something like 600-800) and do one of the following to handle the new player effect that is continuously bringing the top ratings down (and vice-versa):
- Use performance ratings to assign ratings to new players after their first tournament - this gets players close to where they should be extremely quickly, although it's not perfect
- Increase the # of games that a player is considered provisional, so that they do not adversely impact the ratings of their opponents as they slowly prod their way to their true rating level
If we are OK with the new rating ranges, then we need to do something to get players who are far away from the true rating closer to their true rating. This could be done by having ELO formula k-factors that start quite high and decrease as players play more games under the new system.
This should probably be a topic for discussion at the US Nationals, where a decision can be made on what to do.
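To give a feel for what increasing the '400' does, here's a quick sketch: with a wider scale, the same rating gap predicts a more even score, so the same observed results push ratings further apart and the range widens.

```python
def expected_points(diff, scale, points_per_round=4):
    """Higher-rated player's expected points in a 4-point round,
    using the Elo expectation 1 / (1 + 10 ** (-diff / scale))."""
    return points_per_round / (1 + 10 ** (-diff / scale))

# The same 200-point gap predicts a smaller edge as the scale grows,
# so ratings must spread out more to explain the same results:
for scale in (400, 600, 800):
    print(scale, round(expected_points(200, scale), 2))
# 400 -> 3.04, 600 -> 2.73, 800 -> 2.56
```

In other words, moving to a 600 or 800 scale wouldn't change anyone's results; it would just stretch the same results over a wider band of rating points.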
-
- Posts: 84
- Joined: Sun Nov 20, 2005 8:35 pm
- What do you like about checkers?: I love strategy games and checkers fits the bill for a great strategy game, that requires logic and problem solving skills.
- Location: Columbus, MS
Re: Are people satisfied with the current rating system?
I agree, a conversation at Nationals would be a great way to cover a lot of ground. However, I think you are already on the right track.
I do think having a provisional formula that gets new players to their proper rating as quickly as possible is better (I think chess does 20 games, or used to), because there just aren't as many checker tournaments that most new players can attend. So, other than a tournament like Nationals with a lot of games, it could take months or even a year for a player to reach even a 20-game threshold, and that is a lot of time for a player who is improving somewhat rapidly.
-
- Posts: 15
- Joined: Sun May 20, 2012 12:26 am
- What do you like about checkers?: The challenge.
Re: Are people satisfied with the current rating system?
bazkitcase5 wrote: I know one of the main things we agreed with in the e-mails is a more proper "range" Such as Expert being more like 2000-2200, Master 2200-2400 and Grandmaster being 2400+ instead of the very tight and narrow range we have now.
Thanks for sharing this, Clayton. The next thing to decide is what we think rating differences should mean in terms of outcomes.
For instance, if two players are separated by 200 points, what score would we expect if they played many rounds against each other? Let's say they're playing 3-move rounds that are scored by game, since that's the format of most tournaments. Based on the ranges Clayton mentioned, we can envision 200 points as roughly the difference between a minimal grandmaster and a minimal master (or between a minimal master and a minimal expert).
What kind of score would people expect here? For instance, would you expect a minimal grandmaster to average 3 points a round against a minimal master? Maybe that's too high, and people think the number would be more like 2.5? I suspect all of us can think of players that we'd identify as being "barely a grandmaster", "barely a master", or "barely an expert", so I think many of us have opinions about how contests between such players should turn out. I'd like to hear lots of people's opinions for what scores we should expect (on average).
-
- Posts: 15
- Joined: Sun May 20, 2012 12:26 am
- What do you like about checkers?: The challenge.
Re: Are people satisfied with the current rating system?
Based on the current formula for expected outcome:
The expected score for two players with the same rating is 2-2, as expected.
The expected score for players that are separated by 200 rating points is about 3.04-0.96, or roughly better than a win and a draw for the higher rated player.
The expected score for players that are separated by 400 rating points is about 3.64-0.36. This is roughly equivalent to 18 points over 5 rounds; in other words, 8 wins, 0 losses, and 2 draws (or 9 wins and 1 loss) for the higher rated player.
I think the latter two expected outcomes (for 200 and 400 rating point differences) are too lopsided. In a contest between a minimal grandmaster and a minimal master, for instance, I would expect the contest to be more even than this. I think something more like 2.5-1.5 is closer to the mark, but maybe still a little too lopsided.
For the 400 rating point difference (e.g. a minimal grandmaster vs. a minimal expert), I think 3-1 may be reasonable.
What does everyone else think?
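For anyone who wants to check these numbers, they come straight from the standard Elo expectation scaled to a 4-point round:

```python
def expected_points(diff, points_per_round=4):
    """Expected points per 4-point round for the higher-rated player,
    from the standard Elo expectation on the 400-point scale."""
    return points_per_round / (1 + 10 ** (-diff / 400))

for diff in (0, 200, 400):
    print(diff, round(expected_points(diff), 2))
# 0 -> 2.0, 200 -> 3.04, 400 -> 3.64
```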
-
- Posts: 230
- Joined: Sun Nov 27, 2005 12:54 pm
- What do you like about checkers?: Everything.
- Location: Honolulu, Hawai'i
- Contact:
Re: Are people satisfied with the current rating system?
Just sort of thinking out loud here ... but I'm pretty familiar with Elo and variants such as Glicko.
Comparing chess ratings with checker ratings, I do wonder if expected score is influenced by the high percentage of draws (at least at upper levels). I know how the math works, and the numbers given above are what the formulas tell us. Would a GM really beat a master 9 out of 10? Would it be more likely 3 wins and 7 draws (made up numbers just to illustrate the point) for a score of 13, not 18?
Elo/Glicko assumes a continuum of results. This is of course not true for chess, checkers, or any game with discrete outcomes like win-lose-draw. But it works to a reasonable degree. As the percentage of draws goes up, how does it affect things (or not)? It seems we go from a rough continuum (over a large enough series of games) to something much more stepwise. Or does that just mean that the number of games to yield reasonable results is higher?
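One way to put a number on this: with the same expected score, a draw-heavy game has a lower per-game score variance, so each game moves a rating less and says less about which player is stronger. A quick sketch (the draw rates here are illustrative, not measured):

```python
def score_variance(p_win, p_draw):
    """Variance of one game scored 1 / 0.5 / 0, given win and draw
    probabilities (the loss probability is the remainder)."""
    mean = p_win + 0.5 * p_draw
    second_moment = p_win + 0.25 * p_draw
    return second_moment - mean ** 2

# Two evenly matched players (expected score 0.5 per game):
print(score_variance(0.5, 0.0))               # no draws: 0.25
print(round(score_variance(0.225, 0.55), 4))  # 55% draws: 0.1125
```

So a draw-heavy style doesn't break the Elo math, but it does suggest more games are needed to separate players of similar strength, which fits the "more stepwise" intuition above.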