I know something that can correct an error in the data.
Please contact us at [mtgeloproject AT gmail DOT com], especially if you have a piece of information like:
Note that because the database is so big (over 2,500,000 matches), it is cumbersome to refresh the ratings. Since every future match could be affected by changes in the past, we need to rerun everything to fix errors. For that reason we'll be consolidating errors that are brought to our attention and fixing them as part of a weekly refresh.
Where did the data come from?
The results pages from the event coverage of all of the events, like this one. We did our best to combine entries we reasonably thought were the same person, but we only have so much to go on. We were pretty timid about combining anyone with a very common name like Johnson, Miller, Williams, etc., so we need some help from the community to improve those entries. We also made some guesses based on geography about whether some people were the same. For instance, if a name played in Melbourne, Sydney, Toronto 2015, and Toronto 2016, we split the Australian entries from the Canadian ones. Unfortunately, not all GPs include the nationality of the participants, so we couldn't use that as a method to distinguish people.
What if I don't want my name on this site?
No problem! We feel okay about gathering this data in one place in general, since we've used nothing that isn't already publicly available, but if you'd rather not be searchable we totally understand. Just send us an email at [mtgeloproject AT gmail DOT com] and we'll happily change you to "anonymous" in the system.
Which tournaments are in the system right now?
Currently we have data from just about every Pro Tour, World Championship, and individual Grand Prix from Pro Tour Los Angeles 1998 through Regional Championships Toronto 2023. See the next question for information on which early tournaments we're missing. After initially intending the project to be limited to Grand Prix, we decided to include these other tournaments in the system for the following reasons:
Which tournaments aren't in the system right now?
Here's a timeline of the first few years of professional Magic. The colored tournaments are missing.
In addition, the following individual rounds are missing:
How do the ratings work, in three sentences?
Each person enters the system with a rating of 1500. After each match, the winning player takes points from the losing player. The number of points at stake is determined by the difference in the players' ratings.
How do the ratings work, in multiple technical paragraphs?
Each person enters the system with a rating of 1500. When a match is played, Elo estimates how likely it is that each player will win the match. This is based on the difference in the ratings: if player A has rating $R_A$ and player B has rating $R_B$, Elo assigns player A a likelihood of winning equal to $$\left(1 + 10^{\frac{R_B - R_A}{1135.77}}\right)^{-1} \text{.}$$

The number 1135.77 in the denominator of the exponent is chosen specifically so that a 200-point rating difference corresponds to a 60% win expectancy. This is different from chess ratings, which are calibrated so that a ten-to-one favorite in a match will be rated exactly 400 points above his competitor. It didn't seem to us like there could ever be a situation where someone is 91% to win a match of Magic—there's too much variance in the game. This is the same normalization used in @Quick_MTGStat's PT Elo table. We elected to use a strange number (in place of the round 400) in order to have a nice interpretation for what 200 points "means". Here's a table of interpretations for other round percentages:
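If you'd like to recompute these interpretations yourself, here is a minimal Python sketch of the formula above; the helper name is ours, and this is our own illustration rather than the site's actual code:

```python
# A minimal sketch of the win-expectancy formula described above
# (our own illustration, not the site's actual code).
import math

def win_expectancy(rating_a: float, rating_b: float) -> float:
    """Probability that the player rated rating_a beats the player rated rating_b."""
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 1135.77))

# Sanity check: a 200-point favorite is expected to win 60% of the time.
print(round(win_expectancy(1700, 1500), 3))  # 0.6

# Rating differences corresponding to other round win percentages.
for p in (0.55, 0.60, 0.65, 0.70, 0.75):
    diff = 1135.77 * math.log10(p / (1 - p))
    print(f"a {p:.0%} favorite is rated about {diff:.0f} points higher")
```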
Roughly speaking, each 100 points corresponds to a further five percent. So for example, if a hypothetical correctly-rated player with a rating of 1750 were to play a hypothetical correctly-rated player with a rating of 1550 over and over again, the player with the 1750 rating would win 60% of the matches. (This is a long-term percentage, just like if you flip a coin "a lot of times" you'll see 50% heads and 50% tails.)

After each match the ratings are updated with the new result. The winning player receives points from the losing player in proportion to the win probability their opponent had. The constant of proportionality is K = 36. Continuing the 1750 vs. 1550 example, if the 1750-rated player wins, she receives 36 * 0.4 = 14.4 points and her opponent loses 14.4 points. If the 1550-rated player wins, he receives 36 * 0.6 = 21.6 points and the higher-rated player loses 21.6 points. The numbers 14.4 and 21.6 are in the correct proportion to keep the ratings stable if they were to play a long series of matches and win at the rates expected by the model. Internally the database keeps track of fractions, but we round everything off to the nearest integer when it's displayed to try to avoid distractions.

The value 36 was chosen for K because it seems to do the best job at making the ratings have the predictive power they are meant to have: when looking at matches played between people who have played at least ten matches already ("veteran" players), a player rated 35-65 points above their opponent won 52.53% of the time. With an 85-115 point difference the higher-rated player won 54.76% of the time, etc. These numbers came out closest to the intended percentages when we set K = 36. We messed around with using a different value of K for Pro Tours, but nothing we've tried yet has gotten us a better fit to the underlying model.
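Here is a similarly hedged sketch of the update step just described, with the 1750 vs. 1550 example worked through; the function name is ours, not the project's:

```python
# A sketch of the rating update described above, with K = 36
# (function names are ours, not the project's).
K = 36.0

def update_ratings(winner: float, loser: float, k: float = K) -> tuple[float, float]:
    """Return the new (winner, loser) ratings after a single match."""
    winner_expectancy = 1.0 / (1.0 + 10 ** ((loser - winner) / 1135.77))
    delta = k * (1.0 - winner_expectancy)  # the winner gains K times the loser's win probability
    return winner + delta, loser - delta

# The 1750 vs. 1550 example from the text:
print(update_ratings(1750, 1550))  # roughly (1764.4, 1535.6): the favorite gains 14.4
print(update_ratings(1550, 1750))  # roughly (1571.6, 1728.4): the underdog gains 21.6
```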
How do I contextualize the ratings?
To see how your rating compares to other players in our database, check out the histogram and percentile tables on the leaders page. For example, if your rating is 1575, and you see that the 89th percentile is 1577, you can estimate that you're in approximately the top 12% of players in our system.
Is a 2-0 win worth more than a 2-1 win?
No. We're only looking at the result, not the game score.
Are draws incorporated into the system?
Unintentional draws are included as half of a win for each player. This has a minor effect on ratings, which you can see as many draws have a delta of +1 or +2. (Many change the ratings by less than 0.5 and so are reported as 0 due to rounding. Rest assured whatever tiny effect they should have is included behind the scenes.)

As an example, suppose two players whose ratings differ by 200 points were to battle to a draw. The higher rated player had a 60% win expectancy for the match, while the lower rated player should win the other 40% of the time. In a draw, the higher rated player received 0.5 wins, which is 0.1 short of his expected value of 0.6. The net result is -0.1, and so their rating changes by -0.1 times K, so -3.6 for us. Meanwhile the lower rated player enjoys a boost of 3.6 points from the result. A draw between players that are 54.86 points different in rating will result in a change of exactly one point, and over 73% of draws were worth less than that.

Intentional draws do not affect players' ratings. I did my best to find all of the intentional draws (815 and counting as of this writing), but some undoubtedly have slipped through the cracks. If you took an ID in a match and it does not appear as such, let us know and we'll correct it.
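For concreteness, here is a small sketch of how a draw's rating change could be computed under the rule above; this is our own illustration, not the site's code:

```python
# A sketch of how an unintentional draw is scored per the description above:
# each player is credited with half a win (our own illustration).
def draw_deltas(rating_a: float, rating_b: float, k: float = 36.0) -> tuple[float, float]:
    """Rating changes for players A and B after an unintentional draw."""
    expected_a = 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 1135.77))
    delta_a = k * (0.5 - expected_a)  # actual score of 0.5 minus A's expectation
    return delta_a, -delta_a

# The 200-point example from the text: the favorite drops about 3.6 points.
print(draw_deltas(1700, 1500))  # approximately (-3.6, 3.6)
```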
Do people with byes have an advantage?
No. Only matches played between two people will move the ratings. This is why well-known players' results start at round 3 or round 4 in many events.
My friends and I made day two of a team GP once. Why isn't that here?
Two reasons. First, many of the team GP results pages only have last names listed, and if it was a nightmare trying to decide who's who when we had access to the full names.... Second, and more importantly, team events and individual events are not like terms; the result in a team match doesn't tell you anything at the individual level. So updating what are intended to be individual ratings with results from team matches doesn't make sense.
How stable are the ratings given that there are problems in the data set?
The ratings are pretty insensitive to mistakes that don't involve you. If there's a GP you played in recently that wasn't showing up under your name, adding it in could cause a big shift to your rating. But the fact that some people from another continent aren't combined correctly will almost certainly not do anything to your rating. If you lost to "zzVIP-Luis ScottVar" who had a 1500 rating but it should have been "Luis Scott-Vargas" with an 1800 rating, that would change your rating after that match by 5 or 6 points. If you played more after that incorrect entry, the six-point mistake would quickly get washed out.
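If you want to check that figure, the arithmetic looks like this under the assumption (purely for illustration) that your own rating is 1500:

```python
# Checking the "5 or 6 points" figure above with the same win-expectancy
# formula, assuming (hypothetically) that your own rating is 1500.
def win_expectancy(rating_a: float, rating_b: float) -> float:
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 1135.77))

K = 36.0
loss_to_1500_opponent = K * win_expectancy(1500, 1500)  # about 18.0 points lost
loss_to_1800_opponent = K * win_expectancy(1500, 1800)  # about 12.7 points lost
print(round(loss_to_1500_opponent - loss_to_1800_opponent, 1))  # about 5.3
```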
Why is one cell in the tables outlined in green?
That is our subtle way of indicating personal best ratings. Mouse over the delta to see the rating after that match. If a player does not have a green box, that means that unfortunately that player started off with a loss and never climbed back to 1500. We considered indicating personal lows with a red outline, but that seemed a little mean...

And while we're here, a couple of other things to note about the event tables: Clicking on the name of the opponent will, not shockingly, take you to that person's page. Less obviously, clicking on the result will take you to the head-to-head between those two players. (Usually not very exciting since most people who have ever played each other did so exactly once.)
I thought Magic and Elo didn't mix.
It is true that the predecessor to the current Planeswalker Points system was a kind of Elo rating scheme, and that Wizards scrapped that system. There were several problems with that system, the worst of which was that benefits were tied to your rating. Since the rating mattered and it's hard to keep a very high rating when each win is worth significantly less than each loss, you were incentivized not to play in order to stay at your peak. But this was a problem with how the ratings were used, not with the fact that there were ratings. So we got around this problem by having the ratings be for entertainment purposes only.

One could also complain that Magic and Elo don't mix well because there's variance in Magic. The Elo scheme computes an expected win percentage based on the history of the two competitors. The fact that there's high variance means that we should expect that it will take a large sample before the trends begin to emerge, but it doesn't make the model less applicable. Elo is used to make ratings for a variety of games like baseball and Scrabble which have plenty of variance. It also means that we shouldn't put much stock in its predictive power: it's true that an 1800-rated player should beat a 1701-rated player 55% of the time, but it could take a lot of matches before the win rate begins to approach 55%. Therefore using our Elo ratings as any sort of handicapping or prognosticating tool is not recommended.
Do you have plans for separate limited and constructed ratings?
Not at this time. This would be relevant for a small percentage of people: only 13% of the entries in the database have played in more than three events (enough for there to even possibly be multiple limited and constructed events).
What did you do about Fabrizio Anteri?
Ugh. We felt like we had four options:
#1 is just a spiteful version of #2, so that was out. We can see arguments for the other three, but ultimately opted to go for #3. The ratings are just supposed to reflect what happened, and the wins and losses did happen. From another angle, #2 would rob points from people who beat him while he had a very high rating. Ultimately, we didn't want to be some sort of arbiter of which matches had an outcome tainted by cheating or not, so it felt best to just leave the match data as it happened. To use an imperfect sports analogy, baseball players who have admitted to steroids haven't had home runs subtracted from their career totals. Now having said that, if we did nothing he would appear as one of the highest-rated players, and we'd prefer not to have him sit on top while he serves his suspension. We welcome suggestions about what should be done in these cases.
What are the weirdest things you saw while compiling the data?
We weren't expecting to see a bunch of rounds with no results, like Round 9 of GP Providence 2015 for instance. We did our best to reverse-engineer the outcome of each match based on how the standings changed from round to round. Perhaps naively, we weren't expecting that if you registered with VIP benefits, you'd get "ZZVIP" put next to your name in the standings, at the expense of the last few characters of your name. We also learned a lot about Spanish naming conventions after discovering that some GPs in Spanish-speaking countries tended to use both last names and some would only use the paternal surname.

To check for misspellings I wrote a program that looks for entries that are anagrams of each other. It has served us well, but it also turned up some cool coincidences. (My favorite is Shen, Kejia anagramming to Hines, Jake.)
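A check like that can be done by keying names on their sorted letters; the sketch below is our own reconstruction of the idea, not the actual script:

```python
# A rough reconstruction of the anagram check described above (not the actual
# script): group names by their sorted letters and flag any group that
# contains more than one distinct spelling.
from collections import defaultdict

def find_anagram_groups(names: list[str]) -> list[list[str]]:
    groups: dict[str, set[str]] = defaultdict(set)
    for name in names:
        key = "".join(sorted(c for c in name.lower() if c.isalpha()))
        groups[key].add(name)
    return [sorted(group) for group in groups.values() if len(group) > 1]

print(find_anagram_groups(["Shen, Kejia", "Hines, Jake", "Smith, John"]))
# [['Hines, Jake', 'Shen, Kejia']]
```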
Who are you and why did you do this?
We are a couple of mathematics grad students: Adam (1822), who had the idea to do this and did all the scraping and organizing of data, and Rebecca (1456), who helped create the database and website. As for why, there are several reasons: