I know something that can correct an error in the data.
Please contact us at [mtgeloproject AT gmail DOT com], especially if you have a piece of information like:
Note that because the database is so big (over 2,500,000 matches), it is cumbersome to refresh the ratings. Since every future match could be affected by changes in the past, we need to rerun everything to fix errors. For that reason we'll be consolidating errors that are brought to our attention and fixing them as part of a weekly refresh.
Where did the data come from?
The results pages from the event coverage of all of the events, like this one. We did our best to combine entries we reasonably thought were the same person, but we only have so much to go on. We were pretty timid about combining anyone with a very common name like Johnson, Miller, Williams, etc., so we need some help from the community to improve those entries. We also made some guesses based on geography about whether some people were the same. For instance, if a name played in Melbourne, Sydney, Toronto 2015, and Toronto 2016, we split the Australian entries from the Canadian ones. Unfortunately, not all GPs include the nationality of the participants, so we couldn't use that as a method to distinguish people.
What if I don't want my name on this site?
No problem! We feel okay about gathering this data in one place in general, since we've used nothing that isn't already publicly available, but if you'd rather not be searchable we totally understand. Just send us an email at [mtgeloproject AT gmail DOT com] and we'll happily change you to "anonymous" in the system.
Which tournaments are in the system right now?
Currently we have data from just about every Pro Tour, World Championship, and individual Grand Prix from Pro Tour Los Angeles 1998 through Regional Championships Toronto 2023. See the next question for information on which early tournaments we're missing. After initially intending the project to be limited to Grand Prix, we decided to include these other tournaments in the system for the following reasons:
Which tournaments aren't in the system right now?
Here's a timeline of the first few years of professional Magic. The colored tournaments are missing.
In addition, the following individual rounds are missing:
How do the ratings work, in three sentences?
Each person enters the system with a rating of 1500. After each match, the winning player takes points from the losing player. The number of points at stake is determined by the difference in the players' ratings.
How do the ratings work, in multiple technical paragraphs?
Each person enters the system with a rating of 1500. When a match is played, Elo estimates how likely it is that each player will win the match. This is based on the difference in the ratings: if player A has rating $R_A$ and player B has rating $R_B$, Elo assigns player A a likelihood of winning equal to $$\left(1 + 10^{\frac{R_B - R_A}{1135.77}}\right)^{-1} \text{.}$$

The number 1135.77 in the denominator of the exponent is chosen specifically so that a 200-point rating difference corresponds to a 60% win expectancy. This is different from chess ratings, which are calibrated so that a ten-to-one favorite in a match will be rated exactly 400 points above his competitor. It didn't seem to us like there could ever be a situation where someone is 91% to win a match of Magic—there's too much variance in the game. This is the same normalization used in @Quick_MTGStat's PT Elo table. We elected to use a strange number (in place of the round 400) in order to have a nice interpretation for what 200 points "means". Here's a table of interpretations for other round percentages:
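If you'd like to recompute these interpretations yourself, here is a minimal Python sketch of the formula above; the helper name is ours, and this is our own illustration rather than the site's actual code:

```python
# A minimal sketch of the win-expectancy formula described above
# (our own illustration, not the site's actual code).
import math

def win_expectancy(rating_a: float, rating_b: float) -> float:
    """Probability that the player rated rating_a beats the player rated rating_b."""
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 1135.77))

# Sanity check: a 200-point favorite is expected to win 60% of the time.
print(round(win_expectancy(1700, 1500), 3))  # 0.6

# Rating differences corresponding to other round win percentages.
for p in (0.55, 0.60, 0.65, 0.70, 0.75):
    diff = 1135.77 * math.log10(p / (1 - p))
    print(f"a {p:.0%} favorite is rated about {diff:.0f} points higher")
```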
Roughly speaking, each 100 points corresponds to a further five percent. So for example, if a hypothetical correctly-rated player with a rating of 1750 were to play a hypothetical correctly-rated player with a rating of 1550 over and over again, the player with the 1750 rating would win 60% of the matches. (This is a long-term percentage, just like if you flip a coin "a lot of times" you'll see 50% heads and 50% tails.)

After each match the ratings are updated with the new result. The winning player receives points from the losing player in proportion to the win probability their opponent had. The constant of proportionality is K = 36. Continuing the 1750 vs. 1550 example, if the 1750-rated player wins, she receives 36 * 0.4 = 14.4 points and her opponent loses 14.4 points. If the 1550-rated player wins, he receives 36 * 0.6 = 21.6 points and the higher-rated player loses 21.6 points. The numbers 14.4 and 21.6 are in the correct proportion to keep the ratings stable if they were to play a long series of matches and win at the rates expected by the model. Internally the database keeps track of fractions, but we round everything off to the nearest integer when it's displayed to try to avoid distractions.

The value 36 was chosen for K because it seems to do the best job at making the ratings have the predictive power they are meant to have: when looking at matches played between people who have played at least ten matches already ("veteran" players), a player rated 35-65 points above their opponent won 52.53% of the time. With an 85-115 point difference the higher-rated player won 54.76% of the time, etc. These numbers came out closest to the intended percentages when we set K = 36. We messed around with using a different value of K for Pro Tours, but nothing we've tried yet has gotten us a better fit to the underlying model.
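Here is a similarly hedged sketch of the update step just described, with the 1750 vs. 1550 example worked through; the function name is ours, not the project's:

```python
# A sketch of the rating update described above, with K = 36
# (function names are ours, not the project's).
K = 36.0

def update_ratings(winner: float, loser: float, k: float = K) -> tuple[float, float]:
    """Return the new (winner, loser) ratings after a single match."""
    winner_expectancy = 1.0 / (1.0 + 10 ** ((loser - winner) / 1135.77))
    delta = k * (1.0 - winner_expectancy)  # the winner gains K times the loser's win probability
    return winner + delta, loser - delta

# The 1750 vs. 1550 example from the text:
print(update_ratings(1750, 1550))  # roughly (1764.4, 1535.6): the favorite gains 14.4
print(update_ratings(1550, 1750))  # roughly (1571.6, 1728.4): the underdog gains 21.6
```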
How do I contextualize the ratings?
To see how your rating compares to other players in our database, check out the histogram and percentile tables on the leaders page. For example, if your rating is 1575, and you see that the 89th percentile is 1577, you can estimate that you're in approximately the top 12% of players in our system.
Is a 2-0 win worth more than a 2-1 win?
No. We're only looking at the result, not the game score.
Are draws incorporated into the system?
Unintentional draws are included as half of a win for each player. This has a minor effect on ratings, which you can see as many draws have a delta of +1 or +2. (Many change the ratings by less than 0.5 and so are reported as 0 due to rounding. Rest assured whatever tiny effect they should have is included behind the scenes.)

As an example, suppose two players whose ratings differ by 200 points were to battle to a draw. The higher rated player had a 60% win expectancy for the match, while the lower rated player should win the other 40% of the time. In a draw, the higher rated player received 0.5 wins, which is 0.1 short of his expected value of 0.6. The net result is -0.1, and so their rating changes by -0.1 times K, so -3.6 for us. Meanwhile the lower rated player enjoys a boost of 3.6 points from the result. A draw between players that are 54.86 points different in rating will result in a change of exactly one point, and over 73% of draws were worth less than that.

Intentional draws do not affect players' ratings. I did my best to find all of the intentional draws (815 and counting as of this writing), but some undoubtedly have slipped through the cracks. If you took an ID in a match and it does not appear as such, let us know and we'll correct it.
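For concreteness, here is a small sketch of how a draw's rating change could be computed under the rule above; this is our own illustration, not the site's code:

```python
# A sketch of how an unintentional draw is scored per the description above:
# each player is credited with half a win (our own illustration).
def draw_deltas(rating_a: float, rating_b: float, k: float = 36.0) -> tuple[float, float]:
    """Rating changes for players A and B after an unintentional draw."""
    expected_a = 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 1135.77))
    delta_a = k * (0.5 - expected_a)  # actual score of 0.5 minus A's expectation
    return delta_a, -delta_a

# The 200-point example from the text: the favorite drops about 3.6 points.
print(draw_deltas(1700, 1500))  # approximately (-3.6, 3.6)
```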
Do people with byes have an advantage?
No. Only matches played between two people will move the ratings. This is why well-known players' results start at round 3 or round 4 in many events.
My friends and I made day two of a team GP once. Why isn't that here?
Two reasons. First, many of the team GP results pages only have last names listed, and if it was a nightmare trying to decide who's who when we had access to the full names.... Second, and more importantly, team events and individual events are not like terms; the result in a team match doesn't tell you anything at the individual level. So updating what are intended to be individual ratings with results from team matches doesn't make sense.
How stable are the ratings given that there are problems in the data set?
The ratings are pretty insensitive to mistakes that don't involve you. If there's a GP you played in recently that wasn't showing up under your name, adding it in could cause a big shift to your rating. But the fact that some people from another continent aren't combined correctly will almost certainly not do anything to your rating. If you lost to "zzVIP-Luis ScottVar" who had a 1500 rating but it should have been "Luis Scott-Vargas" with an 1800 rating, that would change your rating after that match by 5 or 6 points. If you played more after that incorrect entry, the six-point mistake would quickly get washed out.
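If you want to check that figure, the arithmetic looks like this under the assumption (purely for illustration) that your own rating is 1500:

```python
# Checking the "5 or 6 points" figure above with the same win-expectancy
# formula, assuming (hypothetically) that your own rating is 1500.
def win_expectancy(rating_a: float, rating_b: float) -> float:
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 1135.77))

K = 36.0
loss_to_1500_opponent = K * win_expectancy(1500, 1500)  # about 18.0 points lost
loss_to_1800_opponent = K * win_expectancy(1500, 1800)  # about 12.7 points lost
print(round(loss_to_1500_opponent - loss_to_1800_opponent, 1))  # about 5.3
```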
Why is one cell in the tables outlined in green?
That is our subtle way of indicating personal best ratings. Mouse over the delta to see the rating after that match. If a player does not have a green box, that means that unfortunately that player started off with a loss and never climbed back to 1500. We considered indicating personal lows with a red outline, but that seemed a little mean...

And while we're here, a couple of other things to note about the event tables: Clicking on the name of the opponent will, not shockingly, take you to that person's page. Less obviously, clicking on the result will take you to the head-to-head between those two players. (Usually not very exciting since most people who have ever played each other did so exactly once.)
I thought Magic and Elo didn't mix.
It is true that the predecessor to the current Planeswalker Points system was a kind of Elo rating scheme, and that Wizards scrapped that system. There were several problems with that system, the worst of which was that benefits were tied to your rating. Since the rating mattered and it's hard to keep a very high rating when each win is worth significantly less than each loss, you were incentivized not to play in order to stay at your peak. But this was a problem with how the ratings were used, not with the fact that there were ratings. So we got around this problem by having the ratings be for entertainment purposes only.

One could also complain that Magic and Elo don't mix well because there's variance in Magic. The Elo scheme computes an expected win percentage based on the history of the two competitors. The fact that there's high variance means that we should expect that it will take a large sample before the trends begin to emerge, but it doesn't make the model less applicable. Elo is used to make ratings for a variety of games like baseball and Scrabble which have plenty of variance. It also means that we shouldn't put much stock in its predictive power: it's true that an 1800-rated player should beat a 1701-rated player 55% of the time, but it could take a lot of matches before the win rate begins to approach 55%. Therefore using our Elo ratings as any sort of handicapping or prognosticating tool is not recommended.
Do you have plans for separate limited and constructed ratings?
Not at this time. This would be relevant for a small percentage of people: only 13% of the entries in the database have played in more than three events (enough for there to even possibly be multiple limited and constructed events).
What did you do about Fabrizio Anteri?
Ugh. We felt like we had four options:
#1 is just a spiteful version of #2, so that was out. We can see arguments for the other three, but ultimately opted to go for #3. The ratings are just supposed to reflect what happened, and the wins and losses did happen. From another angle, #2 would rob points from people who beat him while he had a very high rating. Ultimately, we didn't want to be some sort of arbiter of which matches had an outcome tainted by cheating or not, so it felt best to just leave the match data as it happened. To use an imperfect sports analogy, baseball players who have admitted to steroids haven't had home runs subtracted from their career totals. Now having said that, if we did nothing he would appear as one of the highest-rated players, and we'd prefer not to have him sit on top while he serves his suspension. We welcome suggestions about what should be done in these cases.
What are the weirdest things you saw while compiling the data?
We weren't expecting to see a bunch of rounds with no results, like Round 9 of GP Providence 2015 for instance. We did our best to reverse-engineer the outcome of each match based on how the standings changed from round to round. Perhaps naively, we weren't expecting that if you registered with VIP benefits, you'd get "ZZVIP" put next to your name in the standings, at the expense of the last few characters of your name. We also learned a lot about Spanish naming conventions after discovering that some GPs in Spanish-speaking countries tended to use both last names and some would only use the paternal surname.

To check for misspellings I wrote a program that looks for entries that are anagrams of each other. It has served us well, but it also turned up some cool coincidences. (My favorite is Shen, Kejia anagramming to Hines, Jake.)
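A check like that can be done by keying names on their sorted letters; the sketch below is our own reconstruction of the idea, not the actual script:

```python
# A rough reconstruction of the anagram check described above (not the actual
# script): group names by their sorted letters and flag any group that
# contains more than one distinct spelling.
from collections import defaultdict

def find_anagram_groups(names: list[str]) -> list[list[str]]:
    groups: dict[str, set[str]] = defaultdict(set)
    for name in names:
        key = "".join(sorted(c for c in name.lower() if c.isalpha()))
        groups[key].add(name)
    return [sorted(group) for group in groups.values() if len(group) > 1]

print(find_anagram_groups(["Shen, Kejia", "Hines, Jake", "Smith, John"]))
# [['Hines, Jake', 'Shen, Kejia']]
```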
Who are you and why did you do this?
We are a couple of mathematics grad students: Adam (1822), who had the idea to do this and did all the scraping and organizing of data, and Rebecca (1456), who helped create the database and website. As for why, there are several reasons: