One potential medium-term consequence of the recent announcement that next season's GPs won't have byes is that the Planeswalker Point (PWP) system could be sunsetted. I have no idea if this will happen, but GP byes were the last remaining use for PWPs after they reversed course on a proposed minimum threshold for participation in PTQs. The site is obviously not in active production; it's skinned for Magic Origins, a set that came out over four years ago; most of the achievements you can unlock refer to things from Tarkir block. So I don't think it's an outrageous step to think they could shutter it.

Unfortunately the Elo project came into existence only shortly before an event which was from my perspective catastrophic: an update to WotC's terms of service necessitated blanking out the names of everyone who hadn't accepted the new terms. If you look at your own PWP history you will see a sizeable percentage of your opponents are not identified. This percentage likely grows as you go back in time in your history, since some of your opponents from 2010 will have stopped playing and won't have had any reason to try to log into their Wizards account (at which point I think they'd be prompted to accept the new terms and I think they'd then start to appear).

Over the past three years I've salvaged just about every round of premier event results that is on the internet, and as long-time fans of the project know I also salvaged some rounds that are no longer on the internet. But the missing-opponents issue seemed to make attempting to crowdsource information from our collective PWP history a non-starter. Yet the thought of the system going down without ever attempting to draw information from it seems like a waste, especially if something useful could possibly come of it. I admit that part of my hesitation stems from my own personal foibles: I've always felt like this is a "small fry" operation and so I don't really have the stature to go around asking things from people. This also has something to do with the fact that we don't have a donate button on the site despite people asking for it for years—I don't really know how to do it, or announce that we've done it, without feeling presumptuous.

Trying to reconstruct a tournament from players' PWP history is going to be especially difficult because it's not enough to just talk to the people I know how to reach, the people who are still active in the community (Chris Pikula, Jon Finkel, Randy Buehler, etc.). We have to find the people who are no longer actively playing since then their names will fill in for their opponents. In the darkest scenario, person #1 would have absolutely no information to give me, but after they've accepted the terms then when person #2 gives me their history I would be able to see if they played person #1; person #3 would have both #1 and #2's name, and so on. The final participant to contact me would have everyone's history, and I would be able to fill everything in since I'd have seen at least half of each match.

I only want to honestly try to do this for two events: Pro Tour Chicago 1998 and Pro Tour New York 1998. There are several reasons why I think these are more fruitful than a random GP from the same era:

  • These are Pro Tours; the players involved were more engaged than people who played a random GP, so there's more chance that present-day community members will know how to reach them.
  • These two events were both draft. That means that if I reach a certain percentage of responses I might be able to deduce who played in which pod and use that information to infer results from people I haven't heard from yet.
  • Since these were both PTs, more people wrote reports than would have otherwise. We get a head start from them.
  • Flatly, PTs matter more than GPs, so it's a bigger gain for the community if we recover PT data.
  • Partial information for both PTs exists. Both of these events had data that was once on the internet, but it was not captured by the Wayback Machine successfully. All of the day 2 links in the archived ptny98 coverage are broken and all of the day 1 links in the archived ptchi98 coverage are broken. But this is better than being completely in the dark. (Maybe there's a chance that someone at Wizards will see me trying to do this and go look for an old CD-ROM in a broom closet in the basement which had a site backup from 1998 and find the results pages for me and we can call this all off.)

New York 1998, missing day 2, will be a smaller operation since I only need to find the 96 people who made the cut. Chicago needs all 324 people who attended. Including reports, I was already at 8/96 for ptny98 and 18/324 for ptchi98 before this blog post was even written.

The following sheet has the data for these two events: round-by-round everyone's opponents that I know, plus a sheet that has a list of every person that I need to contact.

PT New York 1998 and Chicago 1998 data

If you know how to get in touch with any of the people on this list and are willing to help out the project, could you let them know I'm looking for the entries in their history from these two events? They can either send me screenshots of those entries or go for a copy-and-paste text dump; that'd also be fine. I wouldn't say no to data from other GPs and PTs from 1996-1999, but like I explained above the two events I'm really looking for right now are ptny98 and ptchi98. Data could come to me either via or DM to Twitter @ajlvi. Thank you for helping me on this unreasonable undertaking!

As I type this, the Mythic Invitational is winding down, with two MPL members and one challenger currently left standing. During the first two days, the record of challengers against MPL members was 23-27, for a win percentage of 46%. How does this compare to how these challengers have done against the MPL in non-duo-standard formats in the past? Perhaps surprisingly (or perhaps not!) it's actually below expectations: the lifetime record of challengers against MPL members in tournaments that we have records of is 198-211, or 48.4%. (Only three of the challengers came in with lifetime winning records against the MPL; can you guess which ones?) But okay, maybe those numbers are a bit skewed... if you remove any challengers who happen to be hall of famers, the percentage goes down to 35.5%.

These stats and more can all be found on our snazzy new MPL Head-to-Head Grid. (Click the link at the bottom to add in all the Invitational challengers.) Even more exciting (and the part that was fun to code) is that you can add yourself (or your friends, or your enemies) to the grid to see how you've stacked up against the officially Most Elite players. (Personally I am undefeated!) I'm excited about this feature not only because it's fun to see exactly how many times my friends have gotten their butts kicked by awesome players, but also because this gives me lots of ideas for expanding this sort of grid to be even more customizable. Want to make a head-to-head grid for your own playgroup? For the entire Hall of Fame? For everyone you can think of named Luis? Well stay tuned, as sometime in the moderately near future we will introduce a similar grid that you can populate yourself. And as always, if you have any other ideas for new features, don't hesitate to let us know!

After many starts and stops writing this entry, it is finally time to tell the story of Grand Prix Kansas City 1999. I added what I could salvage from this tournament to the site about a year ago—I have rounds 4 through 12 of the twelve rounds of Swiss, and a scattered few matches from the first three rounds. You may recall the puzzle I had to solve in order to reconstruct the Saturday matches of GP Philadelphia 2000. That was like trying to replace the batteries in an antique watch. Kansas City was like trying to put the watch back together after it was thrown into a muddy puddle and then run over by a truck. Fair warning, we're going to have to talk about not just usual tiebreakers, but second- and third-order tiebreakers soon.

For context, coverage has come full circle in the two decades since gpkc99 (March 27-28, 1999). At the time Wizards's official site would have coverage ("cybercasts") for some tournaments but not all; pairings and standings for some GPs in this era were preserved because they appeared on third party sites. That might have been the tournament organizer, which is the case for gpkc99: the relevant information appeared on New Wave's site. Other GPs, like Seattle 2000, had text coverage hosted on the Dojo. Some of those pages migrated over to the official tournament archive, but not all of them; the official coverage link for gpkc99 points to a page that only has a text recap of the top 8. I've dug around quite a bit looking for references to other third-party pages that might have coverage on them, but haven't yet found any others that made their way onto the Wayback Machine. There's a small but nonzero chance that, for example, some of the missing European or Asian GPs once had coverage posted somewhere and I haven't had the good fortune to stumble upon them.

New Wave's coverage archive has the suggestion of pages for other tournaments they hosted. The jewel among them is GP San Francisco 1999. Unfortunately the New Wave archive was only discovered by the Wayback Machine once, and almost all of the SF results pages timed out, so there isn't really any hope to reconstruct that tournament like we're about to do here. I wonder sometimes whether the .html files are still out there somewhere... I got the sense that the New Wave coverage was largely spearheaded by Alex Shvartsman, much as the Dojo coverage seems to have been orchestrated by Mike Flores in places. Is there a chance that gpsf99 is on a zip disk in his garage somewhere? I have similar questions about Pro Tour New York 1998 and Pro Tour Chicago 1998—those tournaments were at one time 100% on the internet but they didn't migrate to the current version of Wizards's coverage archive, and the Wayback Machine didn't capture every page from those two events. Are they sitting on a backup tape drive in a filing cabinet in the basement of WotC headquarters?

As for Kansas City, the pages that were on the internet were captured but not everything was posted in the first place. Coverage starts with round three standings on day one, and with round 8 standings on day two. So I have pairings for rounds 4, 5, 6, 9, 10, 11, and 12; and I have standings (including all tiebreakers) for rounds 3, 4, 5, 6, 8, 9, 11, and 12. Yes, the GP only had twelve rounds of Swiss, six on each day. The day two cut was to the top 64, which worked out to everyone at 5-1 or better and one lucky person at 4-0-2. There was a second, much less lucky, person in 65th at 4-0-2. You could have shown up for this event with three byes, lost two matches, found yourself dead for day two, and dropped having played less Magic than you would have played at FNM. Also note that awkwardly round 10 standings are missing: the page exists but evidently the data was corrupted back in 1999.

You'll notice that I only said "pairings," not "results." This is not a deal-breaker: we can make inferences as to the results of matches based on how the numbers of match points change from round to round. (Actually, the pairings themselves had lines like

Finkel, Jon (414)	 27	  1	Rubin, Ben * (456)	 24
saying that Finkel, with 27MP, played against Rubin, with 24MP, on table 1. So the match point information is actually preserved with some redundancy.) But I also didn't say anything about the pairings in rounds 1, 2, 3, 7, or 8. They are missing. This is a big issue. It is also in some sense the opposite of gpphi00, where we knew the pairings and didn't know the results. That's much easier than knowing the results without knowing the pairings. Without even having standings for the day one rounds, there's not really any hope of recovering those, so I dismissed the possibility of recovering rounds 1-3 immediately. However I thought there was a real chance of reconstructing rounds 7 and 8, with the information we had, so let's make that our goal.
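The inference from match points can be sketched in a few lines of Python. This is only an illustration, not the process actually used for the site: the regular expression assumes every pairings line looks like the single sample above, the meaning of the parenthesized number is not stated here so it is captured but ignored, and the names `parse_pairing` and `possible_results` are made up for this sketch.

```python
import re
from itertools import combinations_with_replacement

# Assumed layout, based on the one sample line:
# name (number)  match points  table  name (number)  match points
PAIRING = re.compile(
    r"^\s*(?P<a>.+?)\s*\((?P<a_num>\d+)\)\s+(?P<a_mp>\d+)\s+(?P<table>\d+)\s+"
    r"(?P<b>.+?)\s*\((?P<b_num>\d+)\)\s+(?P<b_mp>\d+)\s*$"
)

# Match points awarded for each result.
POINTS = {"win": 3, "draw": 1, "loss": 0}


def parse_pairing(line):
    """Split one pairings line into a table number and two (name, MP) pairs."""
    m = PAIRING.match(line)
    if m is None:
        raise ValueError(f"unrecognized pairing line: {line!r}")
    return {
        "table": int(m["table"]),
        # Any trailing marker such as "*" is kept as part of the name.
        "players": [(m["a"], int(m["a_mp"])), (m["b"], int(m["b_mp"]))],
    }


def possible_results(mp_before, mp_after, rounds=1):
    """List the unordered result combinations that explain a match point gain."""
    gained = mp_after - mp_before
    return [
        combo
        for combo in combinations_with_replacement(POINTS, rounds)
        if sum(POINTS[r] for r in combo) == gained
    ]
```

With `rounds=2` the same function covers a two-round gap in the standings: a gain of 6 can only be two wins, a gain of 0 can only be two losses, and a gain of 1 or 4 has to include a draw.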

The top 64 made day two, so there should be 32 matches in both rounds 7 and 8. The first drops didn't occur until after round 8. This can be verified from the way the score reporter printed the standings. As an example, here's the top line from round 8's standings.

  1	Finkel, Jon	 24	75.8333	94.1176	71.7677	5/5/0/3
This says Finkel was in first place with 24MP; his tiebreakers were, in order, 75.83 94.11 71.76 (we'll learn about these soon); and then the last entry "5/5/0/3" says he played five matches, winning five, zero draws, and three byes. (From this you can deduce zero losses.) After round nine there are some people with lines like "7/4/0/1" implying they had dropped before round nine. But the round eight standings show everyone in the top 64 as having played eight matches. So at the start of the project we're 0/64 on matches deduced.
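Reading a standings line mechanically might look like the sketch below. It assumes the fields are tab-separated in the order shown in the sample (rank, name, match points, three tiebreakers, then played/wins/draws/byes); `parse_standing` and `has_dropped` are hypothetical helper names.

```python
def parse_standing(line):
    """Parse one standings line into a dict, deriving losses and rounds."""
    rank, name, mp, omw, gw, ogw, record = [f.strip() for f in line.split("\t")]
    played, wins, draws, byes = (int(n) for n in record.split("/"))
    return {
        "rank": int(rank),
        "name": name,
        "match_points": int(mp),
        "omw": float(omw),  # first tiebreaker
        "gw": float(gw),    # second tiebreaker
        "ogw": float(ogw),  # third tiebreaker
        "played": played,
        "wins": wins,
        "draws": draws,
        "byes": byes,
        "losses": played - wins - draws,  # not printed, so deduced
        "rounds": played + byes,          # rounds accounted for so far
    }


def has_dropped(row, current_round):
    """A player whose matches plus byes fall short of the round has dropped."""
    return row["rounds"] < current_round
```

For the Finkel line this gives zero losses and eight rounds accounted for, while a record like "7/4/0/1" in the round 9 standings accounts for only eight rounds and so marks a drop.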

That's a lie. Both round 7 (Paschover vs. Finkel) and round 8 (Price vs. Maher) had a feature match with text coverage, so there are two matches where we know who played whom. Also there are eight surviving tournament reports from archived snapshots of the Dojo which were written by people who made day two. Unfortunately a couple of the authors didn't know the names of some of their opponents, or elected not to include them. Still, after this free information, we are at 14/64. (Also those tournament reports got us a couple of matches from rounds 2 and 3... I'll take whatever I can get!)

For several people we can see that their after-round-8 match point total is either six more than their after-round-6 match point total, or is equal to their after-round-6 match point total. In that case we know that they won both their matches or lost both their matches, but we don't know against whom. We also have some "loose ends," half-finished players for which we know one of their opponents but not the other. It's time to get our hands dirty.

Magic tournaments track three tiebreakers. The main one, which is what is usually meant when someone just says "tiebreakers," is the average of your opponents' match win percentage. (This is called OMW, for opponent match win%.) For each opponent, calculate the ratio [player's MP] / [3 × rounds played], and then average those ratios. The caveat is that a number less than 1/3 gets replaced with 0.3333. Here's how we can leverage tiebreakers to discover something about missing opponents. This is Jon Finkel's opponents and their records after round 8. Note Jon had three byes.

  R4  Jamie Parke       4-2
  R5  Jacob Welch       5-3
  R6  Gary Krakower     6-2
  R7  Marc Paschover    7-1     [known from feature match]
  R8  [unknown]
    tiebreaker (omw)  75.8333 [known from R8 standings]
Okay, to be fair, we can deduce that Jon's unknown opponent is also 7-1 without doing math, since he should have been paired against another 7-0 player in round eight and he won. But still, let's see how this is done with tiebreakers. The information in the table above leads to the equation $$\frac{1}{5} \left( \frac{12}{18} + \frac{15}{24} + \frac{18}{24} + \frac{21}{24} + x \right) = 0.75833 \text{,}$$ where x is the match point ratio of the unknown player. Solving the equation gives x = 0.87499. We know in this case that x should be a fraction of the form y/24; solving y/24 = 0.87499 gives y = 20.9998, so up to a rounding error we get that the unknown opponent had 21 match points, so was 7-1 after round 8.
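The same arithmetic as a minimal Python sketch. The function names are invented for this illustration, and the 1/3 floor is applied per opponent as described above.

```python
from fractions import Fraction

FLOOR = Fraction(1, 3)  # minimum ratio credited for any opponent


def match_win_ratio(match_points, rounds_played):
    """An opponent's MP / (3 x rounds played), never less than 1/3."""
    return max(Fraction(match_points, 3 * rounds_played), FLOOR)


def solve_unknown_ratio(known_ratios, reported_omw):
    """Back out the one missing opponent's ratio from the reported average."""
    n_opponents = len(known_ratios) + 1
    return reported_omw * n_opponents - float(sum(known_ratios))


# Finkel's known opponents after round 8: Parke 4-2, Welch 5-3,
# Krakower 6-2, Paschover 7-1. Byes are not opponents.
known = [
    match_win_ratio(12, 6),
    match_win_ratio(15, 8),
    match_win_ratio(18, 8),
    match_win_ratio(21, 8),
]
x = solve_unknown_ratio(known, 0.758333)
unknown_match_points = round(x * 24)  # the unknown opponent played 8 rounds
```

Rounding at the end absorbs the error from the standings only printing four decimal places.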

Is this even good? There were sixteen people who were 7-1 after round 8, so all we know is that Finkel's opponent was one of those. (There are a couple we can rule out: it isn't Paschover, since they played round 7, for instance. A couple of them also have opponents known from tournament reports.) There are two ways we can proceed. We know Finkel's opponents going forward and we know the tiebreakers after future rounds, so we can learn extra information about the unknown round 8 opponent by looking into the future. Here's the situation after round 9. Note Krakower and Paschover make different contributions now than they did before since they played in round 9.

  R4  Jamie Parke       4-2
  R5  Jacob Welch       5-4
  R6  Gary Krakower     7-2
  R7  Marc Paschover    7-2     [known from feature match]
  R8  [unknown]
  R9  Lan D. Ho         8-1
    tiebreaker (omw)  74.0741 [known from R9 standings]
A calculation like before says that the unknown player's match win percentage is 0.7777, so they're 21/27, or 7-2. This shrinks the pool of possible players from 16 down to 7, as now we need someone who went 7-1 into 7-2. Since we don't have round 10 standings we don't get information about the unknown opponent's record after R10, but we can learn R11 and R12 from the extant data. This "signature" of a player's R8, R9, R11, R12 records often identifies them uniquely, or at worst will make them a member of a set of at most two or three people. It's possible that there will be two people with a given signature but we know the R7 and R8 opponents for one of them, and if that happens then the fact that the signature wasn't unique won't actually hinder us.
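Written out, the round 9 version of the equation is $$\frac{1}{6} \left( \frac{12}{18} + \frac{15}{27} + \frac{21}{27} + \frac{21}{27} + x + \frac{24}{27} \right) = 0.740741 \text{.}$$ The five known terms add up to 11/3, so x = 6(0.740741) - 11/3, which is approximately 0.7778 = 21/27.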

Let's look at the second way to accomplish this, with the second and third tiebreakers. The second tiebreaker is your own game score: it's the number of "game points" you have earned divided by three times the number of games you played. Game points are like match points; you earn three points for a win, one point for a draw, and no points for a loss. As such a 2-0 win counts as 6/6 game points, a 2-1 win counts as 6/9, etc. Draws are annoying for game scores, since it depends on reporting the correct kind of draw. If you draw because game three didn't reach a conclusion, that's a 1-1-1 match result, so 4/9 game points. If you ID, that's an 0-0-3 match result, so 3/9 game points. If you draw because game two finishes in extra turns and you don't get to start game three, that's a 1-1 match result, so 3/6 game points. This never matters in practice, but in doing tiebreaker math I've noticed that occasionally draws are put in as 1-1 (3/6) instead of 0-0-3 (3/9) like they're supposed to be.
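For concreteness, here is a small Python sketch of the game point bookkeeping described above. The helper names are hypothetical, and each match is written as a (game wins, game losses, game draws) triple.

```python
from fractions import Fraction


def game_points(wins, losses, draws=0):
    """Return (points earned, points possible) for one match."""
    earned = 3 * wins + draws
    possible = 3 * (wins + losses + draws)
    return earned, possible


def game_score(matches):
    """Second tiebreaker: total game points earned over total possible."""
    earned = sum(game_points(*m)[0] for m in matches)
    possible = sum(game_points(*m)[1] for m in matches)
    return Fraction(earned, possible)


# The cases from the paragraph above.
two_nil_win = game_points(2, 0)        # 6 of 6
two_one_win = game_points(2, 1)        # 6 of 9
unfinished_g3 = game_points(1, 1, 1)   # 4 of 9
intentional = game_points(0, 0, 3)     # 3 of 9
no_game_three = game_points(1, 1)      # 3 of 6
```

The last two lines show why the kind of draw matters: both are worth one match point, but they feed different fractions into the game score.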

The final tiebreaker is the average of your opponents' game point percentages. (This is OGW, for opponent game win%.) As with match points, there is an artificial floor of 0.3333 imposed on any opponent whose own game score is below that percentage. The second tiebreaker will report a number less than 1/3, but the number that gets used in the third tiebreaker calculation will be inflated to 1/3. Let's look at Finkel again post round 8, this time examining the game scores of his opponents. Usefully the game scores can be read off of the round 8 standings, since those are the second tiebreakers. So we don't have to try to reconstruct game scores for all the previous matches in order to use the third tiebreaker.

  R4  Jamie Parke     72.7273
  R5  Jacob Welch     55.5556
  R6  Gary Krakower   75.0000
  R7  Marc Paschover  77.7778 [known from feature match]
  R8  [unknown]
    tiebreaker (ogw)  71.7677 [known from R8 standings]

This means that, like before, the opponent's game score percentage solves the equation $$\frac{1}{5} \bigl(72.7273 + 55.5556 + 75 + 77.7778 + x\bigr) = 71.7677\text{.}$$ The solution is x = 77.7778. Now we are looking for someone whose own game percentage (second tiebreaker) is 77.7778 and who has 21 match points after round 8. There are only three such people: Tony Tsai, Craig Dushane, and Marc Paschover (who is ineligible to have played Jon round 8). We can go deeper and reach a conclusion now: Tsai entered day two at 6-0 and Dushane entered at 5-1. We know that Jon played another 7-0 and beat them, since Jon winds up on 24MP and tiebreakers showed his opponent ended up at 7-1. Therefore only Tony Tsai could have been Finkel's opponent. It took seven paragraphs and ~1000 words, but we now are 15/64.
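A sketch of this step in Python, for anyone who wants to follow along. It is not the code behind the site; the candidate list holds only the three players named above, and Paschover's 18 match points after round 6 is a deduction (he finished 7-1 with his only loss in round 7) rather than a number quoted in this post.

```python
def solve_unknown(known, reported_average):
    """Back out the one missing term from an average of len(known) + 1 terms."""
    return reported_average * (len(known) + 1) - sum(known)


# Game scores of Finkel's known opponents, read off the round 8 standings.
known_game_scores = [72.7273, 55.5556, 75.0, 77.7778]
x = solve_unknown(known_game_scores, 71.7677)

# Players with 21 match points and a 77.7778 game score after round 8.
candidates = [
    {"name": "Tony Tsai", "r6_mp": 18},
    {"name": "Craig Dushane", "r6_mp": 15},
    {"name": "Marc Paschover", "r6_mp": 18},  # deduced, see lead-in
]
already_played = {"Marc Paschover"}  # Finkel's round 7 opponent

# Finkel's round 8 opponent was 7-0 going in, so 6-0 (18 MP) after round 6.
matches = [
    c["name"]
    for c in candidates
    if c["name"] not in already_played and c["r6_mp"] == 18
]
```

The filter leaves a single name, which is the same conclusion reached in the paragraph above.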

Did you notice that there was something special about Jon Finkel that made the calculations possible? Puzzle that for a paragraph.

To recap, there are two different pairs of information that can help shed light on unknown opponents: we can use the combination of the first tiebreaker (OMW) together with players' match points, or we can use the combination of the third tiebreaker (OGW) with the second tiebreaker (the games equivalent of match points). Using these, we can build a signature of the unknown opponent's record in future rounds. Eventually hopefully this process will narrow down the set of possibilities to one player, or at least one player among those that are unaccounted for. We started with several "loose ends" since we knew only one of the two opponents for several players who happened to have played one of their rounds against someone who wrote a tournament report. We pray that filling in loose ends will create other loose ends and we will eventually untangle all 64 missing pairings.

The thing that was special about Jon Finkel is that he had three byes, so we had otherwise total knowledge about all his other opponents. Let's jump from Finkel to Tony Tsai now. He only had two byes. Here's what we know about his tournament so far. (Remember, annoyingly, we don't have R7 standings.)

                       R8 record     R8 game pct
  R3  [unknown A]        
  R4  Danny Speegle       3-2          60.0000
  R5  Mike Caselman       4-2          56.2500
  R6  Devon Herron        6-2          68.4211
  R7  [unknown B]
  R8  Jon Finkel          8-0          94.1176
    R8 tiebreakers      71.1111        66.2500     [omw / ogw]
How can we make progress when there are two variables in the equations? Don't forget that we have information after round 6, too! The best possible result for us is if Tony's round 3 opponent did not make day two. If that's the case, then the record of the unknown round 3 opponent will not change between round 6 and round 8, and the first line of blanks will get filled in, ready for use in the round 8 calculations. With this in mind let's strip off rounds 7 and 8 and look at the end of day one standings.
                       R6 record     R6 game pct
  R3  [unknown A]        
  R4  Danny Speegle       3-2          60.0000
  R5  Mike Caselman       4-2          56.2500
  R6  Devon Herron        5-1          71.4286
    R6 tiebreakers      65.0000        60.8085     [omw / ogw]
From this table we infer that Tony's round 3 opponent had a .5000 match win percentage (either 2-2 or 3-3, we can't tell but it doesn't matter) and a game win percentage of .5556. Usefully, they did not make day two. So their contribution isn't going to change between rounds 6 and 8. We can go back to the first table and fill in the static information about unknown A, leaving us only with unknown B to consider.
                       R8 record     R8 game pct
  R3  [unknown A]         2-2, say     55.5556
  R4  Danny Speegle       3-2          60.0000
  R5  Mike Caselman       4-2          56.2500
  R6  Devon Herron        6-2          68.4211
  R7  [unknown B]
  R8  Jon Finkel          8-0          94.1176
    R8 tiebreakers      71.1111        66.2500     [omw / ogw]
This table implies that unknown B was 6-2 after round 8 and had a 63.1555 game score. Only one person fits that bill: Justin Holt. Since both entered day two at 6-0 and Holt is now 6-2, the result of the round 7 match was a win for Tsai. 16/64! (As a footnote, nobody after round six had nine match points and a 55.55 game score. But there were fourteen people who were 2-2 with that game score. Since they dropped after round 4 there's not much hope of figuring out who they were.)
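The two-stage calculation for Tsai can be sketched in Python as follows. The helper name is made up, and the inputs are just the numbers from the three tables above.

```python
def solve_unknown(known, reported_average):
    """Back out the one missing term from an average of len(known) + 1 terms."""
    return reported_average * (len(known) + 1) - sum(known)


# Stage 1: round 6 standings, where unknown A is the only missing opponent.
# Speegle 3-2, Caselman 4-2, Herron 5-1.
a_omw = solve_unknown([9 / 15, 12 / 18, 15 / 18], 0.65)
a_ogw = solve_unknown([60.0, 56.25, 71.4286], 60.8085)

# Stage 2: round 8 standings. A missed day two, so A's contribution is
# unchanged and unknown B is the only missing opponent.
# Speegle 3-2, Caselman 4-2, Herron 6-2, Finkel 8-0.
b_omw = solve_unknown([a_omw, 9 / 15, 12 / 18, 18 / 24, 24 / 24], 0.711111)
b_ogw = solve_unknown([a_ogw, 60.0, 56.25, 68.4211, 94.1176], 66.25)

b_match_points = round(b_omw * 24)  # B played all eight rounds
```

Because the standings only print four decimals, the recovered game score for B comes out a hair off from the value in the standings, so any lookup against the standings should allow a small tolerance.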

What would we have done if Unknown A had made day two? I think the only logical options are panic and despair. The problem in that case is that the contribution that Unknown A would have made to the round 6 tiebreakers will not match the contribution they make to the round 8 tiebreakers, so learning where they were after round 6 isn't particularly helpful. Many of the players who made day two had one or zero byes, so in place of our single mystery player we would calculate from the round 6 standings a sum of two or three mystery players' statistics. If the stars align and none of them made day two, then their agglomerated tiebreakers will contribute the same amount towards round 8, and we can then isolate the single missing person just like what we did for Tony.

You should probably be asking right now, if we're in a situation where there are multiple unknown day one opponents getting clumped together, how would we know whether any of them made day two in the first place? It shows up when you try to calculate the unknown day two opponent's information from tiebreakers. We're expecting to see match win percentages of .8750 for a 7-1 record, .7500 for a 6-2 record, or .6250 for a 5-3 record. (A couple of players have draws, but excluding those for now these are the only options. Two players were 8-0 and we have them taken care of.) Suppose we infer a match win percentage of .8525; that would be 20.46 match points out of 24. That's bad news. A result like that means that something is wrong upstream—someone from day one is making a different contribution to round 8 than they did to round 6. Unfortunately that player's tiebreakers are then useless, since we can't isolate the signature of their unknown day two opponent. I didn't calculate tiebreakers for everyone, since both opponents were already known for whatever reason for several people at this point. Of the players I did calculate, eleven had useless tiebreakers. This adds a level of suspense to our excavation effort, since at the bottom of our well is now a swill of uncertainty.
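The "is this a real record?" test can be written as a short Python check. This is a sketch: the function name and the tolerance of 0.01 match points are assumptions chosen for illustration.

```python
def nearest_match_points(ratio, rounds=8, tolerance=0.01):
    """Return the whole match point total a ratio implies, or None.

    A ratio that does not land (almost) exactly on a whole number of
    match points signals that an earlier opponent's contribution changed.
    """
    points = ratio * 3 * rounds
    nearest = round(points)
    if abs(points - nearest) <= tolerance:
        return nearest
    return None


clean = nearest_match_points(0.87499)  # a 7-1 record up to rounding error
broken = nearest_match_points(0.8525)  # 20.46 points: useless tiebreakers
```

A `None` here is the signal described above that someone from day one made day two, and that this player's tiebreakers can't isolate the unknown opponent.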

I mentioned draws in the previous paragraph. There were six people whose round 8 match points differed from their round 6 match points by 1 or 4. Those people had to have played each other in at least one of their matches. For a couple of them one of their non-draw results was known, and so then that forced their other match to be a draw. The location of the six draws was comparatively easy to isolate.

You may recall that our over-arching plan was to pull on loose ends (players for which one of their opponents' identities is known) until our knot untangled. I have sad news: we won't get to the finish line this way. At some point in the high 30s I got stuck; all the loose ends involved people with useless tiebreakers, so I needed one new idea to get to the end. Let's look at Eric Lauer, who had three byes but didn't have either of his opponents' identities uncovered up to this point. He goes from 5-1 after round six to 6-2 after round eight.

                       R8 record     R8 game pct
  R4  Brent Parr          7-1          75.0000
  R5  Devon Herron        6-2          68.4211
  R6  Joel Noble          4-2          61.5385
  R7  [unknown A]
  R8  [unknown B]
    R8 tiebreakers      75.8333        66.2551     [omw / ogw]
The goal is to try to tease apart the two missing data points from their sum. For match win percentage, the contribution of A+B is 1.5 in aggregate. Multiplying by 24 tells us that A+B had 36 match points altogether. Either they were both 6-2 or one was 7-1 while the other was 5-3. Assuming that there aren't any pairdowns, the first situation can't occur! This is because Lauer either goes WL or LW. If it's WL, then opponent B plays him in round eight where both are 6-1, and Lauer loses, so B winds up 7-1. Otherwise opponent B plays Lauer in a round eight match where both are 5-2, and Lauer wins, so B winds up 5-3. There aren't many 7-1 slots to go around at this point, so this is possibly useful already. Even more powerful is to look at the aggregate contribution of the game win percentages. The contribution of A+B to OGW is 126.30. I then wrote a program in Python to look at every possible way that two own game scores (second tiebreakers) could add up to 126.30, and it turns out that the only pair among the ones that were left at the time is 68.42 + 57.89, and only Jeff Matter has a 57.89 second tiebreaker. Even better, only John Lagges has the combination of a 68.42 second tiebreaker plus a 7-1 record. (Nobody has 68.42 + 5-3.) So now we know that Lauer plays Matter + Lagges in some order. This potentially gets us un-stuck, since now both Matter and Lagges are "loose ends" as one of their opponents is known. We just don't know whether Lauer plays them in round 7 or in round 8. Further down the line this thread of reasoning hooked into someone with useless tiebreakers, for which one of their opponents was already known. That then snapped everything we had done so far into place.
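A pair search of this kind could look like the sketch below. It is not the original program, just one way to do it: the function name and tolerance are assumptions, and "Player X" and "Player Y" are hypothetical filler entries added so the search has something to reject.

```python
from itertools import combinations


def pairs_summing_to(game_scores, target, tolerance=0.02):
    """Find every pair of players whose game scores add up to the target."""
    return [
        (name_a, name_b)
        for (name_a, score_a), (name_b, score_b) in combinations(
            game_scores.items(), 2
        )
        if abs(score_a + score_b - target) <= tolerance
    ]


# Second tiebreakers of players not yet fully accounted for.
remaining = {
    "Jeff Matter": 57.89,
    "John Lagges": 68.42,
    "Player X": 63.16,  # hypothetical
    "Player Y": 75.00,  # hypothetical
}
found = pairs_summing_to(remaining, 126.30)
```

The tolerance matters: the two-decimal scores sum to 126.31 rather than exactly 126.30, so an exact comparison would miss the pair.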

These ideas plus a lot of patience were able to determine all 64 matches. My first attempt at this didn't go well because I think I made some pretty shaky logical conclusions from useless tiebreakers somewhere early on in the process. For my second, successful attempt, I tried to be meticulous in note-taking so that I would have multiple save points in case something went south. Here's my main thread of notes (PDF), containing the 64 matches deduced in order. Here you can see my furious scribbling (JPG) trying to work out information about unknown opponents; this goes on for several pages. In the image you can see me working out the records for an unknown opponent in future rounds (boxed in each table). Sometimes I'm able to figure out the identities. For others, OGW calculations had it limited to a couple of people before I started—you'll see in the table in the middle that Ferguson's R8 opponent was either Stanton or Lewis, and the fact that the unknown opponent made OMW contributions of 18/24 and 18/27 in rounds 8 and 9 meant it must have been Lewis. Most of these calculations wound up in a spreadsheet that I used to track my progress.

I should add that there are two other places where I had to use this technique to recover lost pairings: GP Kuala Lumpur 2000 round 10 and Pro Tour Los Angeles 1998 round 4. These were significantly easier due to (a) having total information about all previous rounds and (b) only needing to reconstruct one round instead of two consecutive rounds. ptla98 R4 is the only one of these that took place on day one, so at the lowest tables we are trying to determine identities of players who had 0-3 records. This is typically impossible because the .3333 floor artificially obfuscates players' identities. Still, I was able to recover 156/164 matches, which I'm treating as a win. I believe that I could reconstruct all of the missing days of ptny98 and ptchi98 if I had the standings after each round, but sadly the standings are on the tape drive backup in the basement right next to the results and pairings. I'm hoping I never have to do this again, though if it means more data on the site and we come across data that needs to be rebuilt I'm absolutely up for the challenge.

I finally got to the bottom of my pile of grading and that means it's time for World Magic Cup stats! Data in this table covers everything that's on the site—individual pro tours and Grand Prix dating back to mid-1998, with data getting spotty in late 1999. (See the FAQ.)

You can sort this table by clicking on a column heading. Note that with only three people on each team, the middle rating is the median. If you'd prefer to limit the sort to only the teams with three rated players, here's the average sort with those teams filtered out. Mouseover a rating to see the name associated with it, or click on a country to make the names appear. The blue rating is the team captain and the red rating is the national champion. For some countries those two people coincide (hence the purple), while for some others either the pro points champion or the national champion declined their invite. Best of luck to all participating teams!

Besides tournament results, there's another area of the Magicsphere which possesses a vast well of data that can be mined and analyzed: decklists. About a year ago I put together a database of decklists from Pro Tour and World Championship single-elimination rounds, and I've made tweets and Sporcle quizzes about some of the interesting tidbits I've found in them. I also like to play mythic bingo where I track how many different mythics have appeared in the 5-0 Magic Online standard league deck dumps.

I've been playing a pretty rogue modern deck lately (WR Death and Taxes) and I've been looking for a tool that makes it easy to see things like "has card X ever appeared with card Y" or "has anyone ever tried Z in the sideboard", etc. While I have been able to answer my questions on other sites, the number of extra clicks and/or simultaneous open tabs has gotten to me. So I made a crude tool for myself, and I think it might be useful enough to other people that I'm sharing it with you here. Behold the modern 5-0 decklist sandbox. (There's now also one for legacy and standard.) The page will display information about how to use it when it is first loaded.

This lets me answer questions that otherwise would have been out of reach, like "how many cards have appeared in every single deck dump?" or "which Eldrazi have made it into a 5-0 list?" or "who has had the most decklists published?" etc. There's a lot of fun questions you can answer with the sandbox, besides the original purpose of hunting for decks that have specific cards in them. (If you give it a list of cards and a list with those cards has been published, a link will appear.)

Since I made these sandboxes primarily for personal use there are probably some design decisions that are a little idiosyncratic. But if you have any suggestions or feature requests, let me know and I can see if I can add them in. Enjoy!

The World Championships are nigh! I've been posting stats to Twitter recently but haven't done a good job cataloging them here. My apologies... it's easy to feel like I'm done after I've fired off the tweets, but I should really do a better job cataloging my statistics projects in this space for people who aren't following me there. (Having said this, I do feel like the percentage of stuff I produce that is of a quality to appear here is not 100%, so if you want to see some half-baked numbers you might try following @ajlvi on Twitter.)

You can find the following information on the twenty-four Worlds participants: a lifetime head-to-head grid (Reid Duke's 45-32 record against the other 23 players is pretty impressive), a breakdown of each player's season (basically what you'd hope to see on the back of each player's baseball card in the "2017-18" row), Elo-based metrics for each player (Brad Nelson's 2222 average Elo throughout the year is quite eye-popping), and the ever-controversial, just-for-funzies, have-I-ever-told-you-Elo-is-a-crude-tool results of simulating the tournament 2.5 million times using either Elo or average Elo as the only determiner of wins and losses. Of course the probability of winning will line up with the list of everyone's Elo in descending order, but the interesting thing to pay attention to here is the margins: they give you some idea as to how little a deal one rating point is. If I were a bookmaker I wouldn't set betting lines based on these simulations, but I would use them to remind myself that the margins are going to be very thin this weekend. Good luck to all!
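For a sense of what "Elo as the only determiner" means, here is a minimal sketch. It uses the standard Elo win-probability formula, but the round-robin structure, the names, and the ratings are placeholders of mine; the real Worlds format is more involved, and this is not the script behind the numbers above.

```python
import random


def win_probability(rating_a, rating_b):
    """Standard Elo expected score for player A against player B."""
    return 1 / (1 + 10 ** ((rating_b - rating_a) / 400))


def simulate_round_robin(ratings, trials, seed=0):
    """Count how often each player finishes with the most wins.

    Assumption: a plain round robin with ties broken at random, chosen only
    to keep the sketch short.
    """
    rng = random.Random(seed)
    names = list(ratings)
    titles = dict.fromkeys(names, 0)
    for _ in range(trials):
        wins = dict.fromkeys(names, 0)
        for i, a in enumerate(names):
            for b in names[i + 1:]:
                p = win_probability(ratings[a], ratings[b])
                winner = a if rng.random() < p else b
                wins[winner] += 1
        best = max(wins.values())
        leaders = [n for n in names if wins[n] == best]
        titles[rng.choice(leaders)] += 1
    return titles


# One rating point barely moves a single match away from a coin flip.
print(round(win_probability(2001, 2000), 4))
print(simulate_round_robin({"A": 2200, "B": 2000, "C": 2000}, trials=10000))
```

With ratings as close as the Worlds field's, the title counts separate only slowly, which is why a very large number of trials is needed before the margins mean anything.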

Here's a challenge for you, the kind of task you'd have to solve on an Elo project job interview. I'm recording it here so that I know where to find it in the future. At Grand Prix Chiba last weekend two different people with the name Ryo Takahashi registered for the event. Both of them went 6-2 on Saturday and advanced to day two, when someone realized that there were two people with the same name. In Sunday's rounds the two players have the last four digits of their DCI numbers attached to the end of their names so that they can be told apart. But if you look at, say, the round 4 results, you'll see two people with the same name. Using the tiebreakers of the two players (and some of their opponents) you can figure out whose day one results are whose. See if you can accomplish this. The correct answers are here on the site if you want to check your work, and of course I'd be happy to provide an explanation if you want to know how to do this. It took me about twenty minutes to disentangle the two players' results, and my guess is that if you have never tried to do anything like this with tiebreakers before, you're going to need to set aside at least an hour to figure it out.

I became aware last week that the version of Wizards's site that I thought was the oldest one that was archived by the Wayback Machine was in fact not the oldest. An older version from the 1998-99 era included coverage ("cybercasts") of PT Chicago 1997, PT LA 1998, PT NY 1998, and PT Chicago 1998. Unfortunately these pages were only trawled a couple of times: incompletely in early 1999 and unsuccessfully in late 1999. When the site was redesigned sometime in 2000, the cybercasts were not ported, and all future sites have copied off of the information available in the 2000 version. I also saw rumors in a post on the Dojo that Worlds 1997 may have had some sort of internet coverage, but it predates even this older version of the site.

Here's the status of those PTs: ptchi97 is relatively complete, but day one information was never posted in the first place because of technical issues at the tournament site. The cybercast only consisted of standings, so it would be a challenge to try to get pairings out of nothing. The three from 1998 are more promising: pairings and standings once were on the internet. ptla98 is intact except for round 4, but the Wayback Machine failed to capture any of ptny98 day two or ptchi98 day one. (There was one intervening event, PT Mainz 1997, which didn't have a cybercast at all.)

I've reconstructed 156/164 matches from ptla98 R4 from tiebreakers, and that tournament is now on the site. I don't think there's hope to reconstruct the others from what's available. I can almost imagine getting ptny98 back together if we had the final standings and tiebreakers; day two rounds have a small number of matches and the tournament reports that exist will fill some of it in, which might give me a toehold. On the other hand ptchi98 is missing day one, which is a much bigger disaster—the rounds are bigger and people will go 0-4 drop which means they won't show up in tiebreakers except as the minimum .3333. This makes it impossible to recover their matches from the standings.
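To see why the floor is such a problem, here is a small sketch. It assumes the usual definition of a match-win percentage (match points divided by three times the rounds played, raised to a minimum of 1/3) and averages it over opponents; the function names and sample records are made up for illustration.

```python
FLOOR = 1 / 3  # the ".3333" minimum applied to each opponent's match-win percentage


def match_win_pct(match_points, rounds_played):
    """Match-win percentage with the floor applied."""
    raw = match_points / (3 * rounds_played)
    return max(raw, FLOOR)


def opp_match_win_pct(opponents):
    """Average floored match-win percentage over (match_points, rounds_played) pairs."""
    return sum(match_win_pct(mp, r) for mp, r in opponents) / len(opponents)


# Hypothetical early droppers after four rounds: 0-4, 1-3 and 0-3-1.
# All three collapse to the floor, so each prints 0.3333.
for match_points in (0, 3, 1):
    print(match_points, round(match_win_pct(match_points, 4), 4))
```

Since every one of those records contributes the same value to an opponent's tiebreaker, the standings carry no trace of which low-record player sat across the table.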

Still, I thought that after I added Kansas City 1999 I wouldn't have any more old data to add, so getting another Pro Tour is pretty cool! Now I'm aware of five tournaments that were once on the internet that aren't on the site. Here's hoping that someday I'll get extra information which lets me rebuild them! Next post (which may not be for another month or so) I'll discuss the reconstruction effort of gpkc99.

I promised a while ago to talk about the reconstruction effort I underwent to recover Grand Prix Philadelphia 2000. Fair warning: things may get a little technical ahead.

In a round of a typical tournament, three pages of information are generated by the event reporter: a list of pairings at the beginning of the round, a list of the results of each match after all the match results are put into the system, and the standings as of the conclusion of the round in question. For our purposes, it's the middle one of those three that's the most useful, because we need two pieces of information for the dataset: (a) who played whom and (b) what the match result was. The results page just tells us that straightaway.

Strictly speaking the results page is a convenience, since the information in it can be reverse-engineered from the rest of the coverage. If you know everyone's match point total as of round N-1, and you know the pairings for round N, and you can see the standings after round N, then we can figure out the results from round N. A player won her match if her match point total after round N is three more than that after round N-1, lost it if that difference is zero, and drew if the difference is one point. Sometimes the results pages are corrupted in some way (the most typical error is the round N results page being the same document as the round N-1 results page) and I use this method to recover the data for the site.
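As a rough illustration of that method, here is a sketch in Python. The function names and the dictionary layout are my own for this example, and it does not handle byes, drops, or double losses.

```python
def result_from_points(before, after):
    """Return 'W', 'D' or 'L' from a player's match points before and after a round."""
    outcomes = {3: "W", 1: "D", 0: "L"}
    diff = after - before
    if diff not in outcomes:
        raise ValueError(f"impossible match point change: {diff}")
    return outcomes[diff]


def recover_round(pairings, points_before, points_after):
    """Rebuild a results page: one (player, opponent, result) row per pairing."""
    rows = []
    for player, opponent in pairings:
        res = result_from_points(points_before[player], points_after[player])
        opp_res = result_from_points(points_before[opponent], points_after[opponent])
        # Cross-check: a win must face a loss, and a draw must face a draw.
        if {res, opp_res} not in ({"W", "L"}, {"D"}):
            raise ValueError(f"inconsistent standings for {player} vs {opponent}")
        rows.append((player, opponent, res))
    return rows


# Hypothetical round: A beats B, and C draws with D.
before = {"A": 6, "B": 6, "C": 4, "D": 4}
after = {"A": 9, "B": 6, "C": 5, "D": 5}
print(recover_round([("A", "B"), ("C", "D")], before, after))
```

The cross-check is cheap and catches the case where the standings page and the pairings page come from different rounds.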

You can imagine I wasn't impressed with the coverage page for GP Philadelphia 2000: none of the rounds have a results page at all, and the first time we even see standings is round 6. This means for round 7 onward we can recover the results by using the method outlined above. (Round 6 doesn't work because I don't know the starting number of match points; those would be in the round 5 standings page.) Then I crossed my fingers, because sometimes the pairings pages include the MP totals. These don't. All I knew about the first six rounds was the pairings. Would that be enough to recover the results?

On the face of it that may sound crazy, but there's reason to believe that there may be enough data here to figure everything out. The results for some people will be immediate from their match point total: if they have 18 match points they won every match they played, and if they have 0 match points they lost them all. This will distribute some losses to people who played the 18MP players and some wins to the people who played the 0MP players. Maybe after that sweep is done we'll have assigned a loss to some people with 15MP (= 5-1 record), so we'll know they won all the rest of their matches, or maybe we'll have uncovered a win to someone with 3MP, so they'll have had to lose all the rest of their matches. (Note that 3MP could have been a record of 0-x-3 or 1-x, but since we've found a win for that person, their quantity of points left to assign is zero.) Then we get to go back and take a second pass, looking for bylines that can be completely filled in. In a perfect world, this initial cascade might fill in all six rounds.
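Here is a minimal sketch of that cascade. The data layout (three dictionaries keyed by player name) is an assumption for this example rather than the structure actually used, and it counts matches settled per pass instead of players.

```python
POINTS = {"W": 3, "D": 1, "L": 0, "B": 3, "X": 0}  # B = bye, X = dropped
MIRROR = {"W": "L", "L": "W", "D": "D"}


def cascade(final_points, opponents, results):
    """Fill in forced results in place; return the matches settled in each pass.

    final_points: {player: match points after the last round}
    opponents:    {player: [opponent name per round]}
    results:      {player: [known result or None per round]}
    """
    settled_per_pass = []
    while True:
        settled = 0
        for player, line in results.items():
            unknown = [i for i, r in enumerate(line) if r is None]
            if not unknown:
                continue
            remaining = final_points[player] - sum(POINTS[r] for r in line if r)
            if remaining == 3 * len(unknown):
                forced = "W"  # every open match must be a win
            elif remaining == 0:
                forced = "L"  # no points left to hand out
            else:
                continue
            for i in unknown:
                line[i] = forced
                # The opponent in that round gets the mirror-image result.
                results[opponents[player][i]][i] = MIRROR[forced]
                settled += 1
        if settled == 0:
            return settled_per_pass
        settled_per_pass.append(settled)
```

Both rules are safe even when draws exist: a player with no points left cannot have drawn, and a player who needs three points from every open match cannot have dropped any.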

There were 582 people in the tournament, and the successive passes filled in 86, 52, 34, 28, 16, 13, 7, 6, and 2 people, for 244 total. That's something, but not everything. Most of the other players had some matches filled in, just not all of them. As an example, after my first sweep my Python structure had an entry of the form

            Lowery, Brett  12  .W..L.  [12, 0, 3, 12, 12, 9]
meaning Brett had 12 MP after round 6, with a win round 2 and a loss round 5 already accounted for. The list at the end stored the match points of all six of Brett's opponents. The possible results a player could have were W, L, D, B(bye) and X(drop). The pairings pages told me who had a bye in each round, so I at least had that going for me. A player dropped when he stopped appearing in the pairings. Thankfully nobody left and came back somehow.

The goal now was to find ways to get myself "unstuck". If I could puzzle out an individual player's results somehow, then we could resume the cascade; even filling in one match might lead to settling a substantial number of players. The big cache of information that I've left untouched so far is the fact that the pairings are done by the Swiss system, meaning the identity of your opponents encodes some information about your record at the time of each match. I'll try to illustrate with examples some of the techniques I used to tap into that data. I believe the list below is exhaustive in the sense that applying these observations, together with cascading, was enough to recover all the results.

  1. Look again at Brett's line above. The win already credited to him round 2 turned out to be against someone who ended the tournament with zero match points. Brett's round 2 opponent was definitely 0-1 after round one, so if they played each other Brett (almost certainly) was 0-1 himself. Therefore I credited Brett with a loss round 1.

    I should address here that there's of course the possibility of a pairdown. I made the simplifying assumption that there were no three-point pairdowns, since there were always people with draws intervening. For instance, in round two if a pairdown was necessary then there should have been a 3MP-1MP match and a 1MP-0MP match instead of a 3MP-0MP. If this assumption is violated and people can be paired across brackets, I'm afraid what we're trying to do becomes more augury than science.
  2. Here's another player's line:
                  Magby, Mike  12  B...L.  [xx, 7, 12, 12, 15, 9]
    Mike's round two opponent had a draw somewhere in the tournament. But it wasn't against Mike, because the only way to reach 12MP with a draw is by going 3-0-3. Using this exclusion principle, I checked each person with a draw to see if exactly one of their undetermined matches could have been a draw. Notice that "eligible to have drawn" is something that depends on how many match points are left to be assigned; a hypothetical person with 10MP after round 6 and an uncovered history of .DL.W. won't have any more draws, so they definitely didn't draw with their round 4 or round 6 opponents.

    This logic eventually uncovered that two people with 9MP had to have had a draw; they both were 2-1-3 after round six. Thanks for spicing up the project, guys.
  3. Suppose after round six you have P match points and your round six opponent has P-3 match points. Then you won round six. (Again, assuming no three-point pairdowns.) Similarly if you and your round six opponent wind up with exactly the same number of match points as of round 6, then you drew round 6. If the discrepancy between your round 6 total and your opponent's total is 1 or 2, then there was some sort of a pairdown; I didn't try to assign a result to a match like that at this stage. This logic applies to Mike Magby (above), who won against his round 6 opponent.
  4. The logic of the previous item can be extended. Suppose you dropped after round 5 with P match points, and your round 5 opponent showed up in the round 6 standings with P+6 match points. Then you played that opponent in a P-P match, the opponent won and went to P+3, and then they won again round 6 and went to P+6. In short, they finished WW, and you ended LX. You can go yet another step here and consider people who dropped after round 4; if their opponents ended round six with nine more points than the dropped player, that opponent finished WWW.
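Items 2 and 3 are easy to express in code. The sketch below is one way to write them, with function names of my own choosing; it leans on the same no-three-point-pairdown assumption as the list above.

```python
def possible_records(match_points, matches):
    """All (wins, draws, losses) over `matches` played matches worth `match_points`."""
    records = []
    for wins in range(matches + 1):
        draws = match_points - 3 * wins
        losses = matches - wins - draws
        if draws >= 0 and losses >= 0:
            records.append((wins, draws, losses))
    return records


def can_still_draw(remaining_points, open_matches):
    """Item 2: is a draw possible among the matches not yet filled in?"""
    return any(d > 0 for _, d, _ in possible_records(remaining_points, open_matches))


def last_round_result(my_points, opp_points):
    """Item 3: result of the final round, assuming no three-point pairdowns."""
    diff = my_points - opp_points
    if diff == 3:
        return "W"
    if diff == -3:
        return "L"
    if diff == 0:
        return "D"
    return None  # any other gap (such as 1 or 2) points to a pairdown; leave it open


# Mike's line: 12 MP with a bye and a loss known, so 9 points over 4 open matches.
print(possible_records(12, 6))   # the two ways to reach 12 MP in six matches
print(can_still_draw(9, 4))      # no draw can fit in what is left
print(last_round_result(12, 9))  # his round 6 opponent finished on 9 MP
```

Checking `can_still_draw` for both players in a match is what lets a draw on one side be pinned to exactly one opponent.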

Maybe there was one other item that I've forgotten about, but I believe these were the only methods that I used to fill in every result from the first six rounds. I was a little astonished at the end that everything was not only filled in, but also internally consistent; I think that illustrates how much information is already contained in the standings. My goal was to use the lightest touch necessary to recover all the results; I'm sure there's other ways to draw the same conclusions, but I wanted a set of axioms that would let the rounds fill themselves in as much as possible. This way if something went wrong there would be a more limited place to look for inconsistent hypotheses; this is especially valuable since future deductions depend on previous work. Unfortunately for the other big reconstruction project (GP Kansas City 1999) things need to be done more by hand. More on that job another time.

I should address the question about whether the results I reconstructed are unique, or if there's some other way to fill in the grid that would assign everyone the appropriate number of match points. This mainly depends on whether there was a three-point pairdown really early in the process, since future deductions are based on previous results. I'd be somewhat surprised if what I came up with wasn't an exact match to historical fact, or at least was really close to it, so I ultimately decided to include the reconstruction on the site. It would be nice to try to reconstruct the data in a different order to check for discrepancies, but I admit I'm not optimistic that I'm going to have the time or motivation in the near future. If anyone else wants to torture themselves and go through this, though, I'd be happy to compare our results!

I've been good at updating the site but not so good at recording my updates here. Since the last blog post, here's what's happened:

  • Almost all available old data is on the site. There's a few things missing, mainly GP Kansas City 1999, which I need a block of several hours to work on. But practically everything that's ever been on the internet is now on the site. See a new FAQ item to see the timeline of early events. Roughly fifty tournaments from 1996-1999 are missing; I believe most of them had no internet presence even at the time. That means 92.5% of Magic history is here in terms of events, and probably closer to 98% in terms of total matches. (Unfortunately the 2% that's missing includes some relevant information for correctly rating some early stars like Steven O'Mahoney-Schwartz, Jon Finkel, Randy Buehler, etc.)
  • We added a player search to the win percentage by format page to complement the leaderboards there. (This was one of the most requested features, so I'm glad we got it done!)
  • Tables for top 8 likelihood by record now exist for Grand Prix. These are updated after every tournament.
  • I've also posted to Twitter a couple of stats projects, like this sheet of cumulative records in knockout rounds. It's been added to the stats hub. Because it includes team data it's pretty unlikely that it will get incorporated into the site, but I'll try to update it once a month or so.
More soon, of course! We've cleared off a few of the highest things on our queue but the list of things we'd like to do is still pretty long.

I got interested in checking in on the leaders among matches played in calendar year 2017, since the last tournaments of the year are over. Here's a table of stats.

There are a variety of ways to try to measure performance over the span of a year. The most straightforward is win percentage, and in that metric Huey Jensen (87-31, 73.73%) laps the field. Among people who played 100 matches, he's almost 5 percentage points above second place, Fabien Li (93-42, 68.89%). Even if we admit anyone who's played 75 matches, the second best becomes Jeremy Sager (56-23, 70.89%).

Average Elo may do a better job of capturing who has had sustained excellence throughout the year. To keep your Elo rating over 2000 you can't have any stretch of mediocrity. Not only do you need to be winning a large percentage of your matches, but you need to consistently post that percentage week to week or you'll see fluctuations in your rating. As it's easier to lose points than gain them when you're at 2000+, those fluctuations will ultimately bring down your average. There are 71 people who had an average rating of 2000 or better this year (while playing 100 matches), and the table in the link above shows all of their seasons. On this measure, two people who played 100 matches maintained a 2200(!) average rating throughout the year, pacing the field: Brad Nelson (2221) and Huey (2218). Both touched 2400 at some point during the year.

Another option is to look at minimum Elo. With average, if you somehow were to oscillate evenly between 2000 and 2200, you'd wind up with an average of 2100. But the highest rating you actually maintained was 2000. This is a cruel stat, since it will punish people like Huey who had a crazy high in the middle of the year but happened to begin last January with a less ridiculous rating (he started his first event, PTAER, at 1980). The winner here is Owen Turtenwald, who never went below 2060 at any time all year. His average rank after weekends with tournaments is 15th, and he only was outside the top 30 for one weekend. There's a reason we call 11-4 at a GP the ORAT special: the man is wildly consistent. In fact, Owen has the longest active streak of matches with a 2000+ rating: 469, dating back to the 2015 World Championship. Brad Nelson is the only other one who can say that his streak goes back to 2015. His streak of 428 matches began at GP Atlanta 2015.
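The three measures in the last two paragraphs can be computed from a chronological list of ratings. This is only a sketch: the function name and the one-rating-per-match layout are assumptions of mine, not how the site stores its data.

```python
def rating_summary(ratings, threshold=2000):
    """Average, minimum, and current streak at or above `threshold`.

    `ratings` is a chronological list of ratings, one per match.
    """
    average = sum(ratings) / len(ratings)
    minimum = min(ratings)
    streak = 0
    for rating in reversed(ratings):  # walk back from the most recent match
        if rating < threshold:
            break
        streak += 1
    return average, minimum, streak


# The oscillation example from above: evenly split between 2000 and 2200.
oscillating = [2000, 2200] * 50
print(rating_summary(oscillating))  # (2100.0, 2000, 100)
```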

As always, if you have suggestions for statistics, I'm happy to crunch numbers!

Last week I added two more years to the site, getting us back to Worlds 2003. I've already scraped back to Worlds 2001 — all 60 tournaments exist, though there's a round missing in one of them. At this point there are 153 tournaments left not on the site. Probably about 100 of them can be acquired, and in the next update I'll get a majority of them done. In absolute terms, there are 490 tournaments on the site out of 643 total, for 76.21%. In terms of matches, there are about 1.997 million matches on the site and I think there are about 225,000 matches missing, meaning we're close to 90% on that scale.

With this update we also reconfigured the architecture of the database. We're a little closer now to how it probably "should" have been done from the beginning, but we were young and foolish at the start and have learned a lot from working on the site! As an end user you should experience benefits from this restructuring in a few different areas:

  • Some pages, especially the leaderboard, should load faster.
  • The downtime needed to add new tournaments will be shorter.
  • We have integrated the format of each match into the site, so in the coming weeks we'll be able to upgrade the leaderboard of by-format win percentage to allow for individual lookup.
  • We integrated a toggle into the site that keeps track of who is an "active player". Right now we're not acting on this, but we're thinking about how to adjust the rankings to only account for people that have played in the last, say, five years.

It's time for another World Cup! It astounds me how much the site has grown — last year when I was making this table I didn't even have three years of data to work with. Now I have fourteen. I did my best to find the participants but some of the national teams appear to have people that have never played in a GP or PT. They've been colored gray and given the starting rating of 1500. Some ratings may differ slightly from players' personal pages because they incorporate corrections and/or data from 2003-05 which hasn't been integrated into the site yet. (More on that next week!)

You can sort this table by clicking on a column heading. Note that with only three people on each team, the middle rating is the median. If you'd prefer to limit the sort to only the teams with three rated players, here's the average sort with those teams filtered to the top. Mouseover a rating to see the name associated with it, or click on a country to make the names appear. The blue rating is the team captain and the red rating is the national champion. Best of luck to all participating teams!

The World Championship is this weekend! In preparation I've put together a couple of pages of stats. I tweeted them out earlier this week, but so that they're all in one place, here's some links:

Another couple of years have been added to the site. Thanks to work I did for Bob Huang's series of articles on CFB I had already put together data for GP Philadelphia 2005, so I did a little bit of 2005 to reach that GP specifically (November 12, 2005). I think we'll run out of useable information in about two more batches. My goal is to do the next one by the end of October, but that might be pushing it.

I tweeted this chart out when I updated with GP Denver but I realized it belonged here as well. Brad Nelson has had a crazy last four GPs: an undefeated win in Omaha, then a 6-0 drafting performance for 13-2 in Kyoto (he finished 11th on breakers), then he reached the top 8 of Minneapolis (lost in quarters), and now he's won again in Denver. Making top 8 in three GPs out of four attended is pretty rare — only eleven people have done it (some multiple times) and it hasn't been done in two years.

Made top 8 in three Grands Prix out of four attended. (bold: top 8, blue: win)

name                    gp #1       gp #2       gp #3       gp #4
Jonathan Sonne          gpphi05     gpcha05     gpric06     gptor06
Quentin Martin          gphass06    gpcar06     gpkl06      gpath06
Kenji Tsumura           gpkl06      gptoul06    gpstl06     gphiro06
Shuhei Nakamura         gptoul06    gpstl06     gphiro06    gppho06
Jelger Wiegersma        gpbar06     gptor06     gptoul06    gpmal06
Andre Coimbra           gpmal06     gphiro06    gppho06     gpath06
Klaus Joens             gptorin06   gpsto07     gpstra07    gpfir07
Paul Cheon              gpdal07     gpcol07     gpmon07     gpsf07
Paul Cheon              gpsf07      gpkra07     gpday07     gpvan08
Yuuya Watanabe (4/4)    gpban09     gpnii09     gppra09     gpmel09
Yuuya Watanabe          gppra09     gpmel09     gptb09      gpkit09
Brad Nelson             gpmin09     gpoak10     gpwas10     gpcol10
Brad Nelson             gpwas10     gpcol10     gppor10     gptor10
Shota Yasooka           gpkob11     gpsin11     gpsha11     gpbri11
Owen Turtenwald         gpatl11     gpar11      gpden11     gpdal11
Owen Turtenwald (4/4)   gpden11     gpdal11     gppro11     gpsin11
Yuuya Watanabe          gpkan11     gpsha11     gppit11     gpmon11
Paul Rietzl             gpsea12     gpmc12      gpslc12     gpana12
Yuuya Watanabe          gpkob12     gpkl12      gpmani12    gpyok12
Sam Black               gplou13     gpwdc13     gpabq13     gptor13
Jeremy Dezani           gpvie13     gppra14     gppar14     gpvie14
Brad Nelson             gpmia13     gplou13     gpric14     gpcinc14
Marcio Carvalho         gpman14     gpmil14     gpmad14     gpstr14
William Jensen          gpphi14     gpatl14     gpchi14     gpdc14
Pascal Maynard          gpott14     gpoma15     gpmex15     gpvan15
Martin Juza             gpman15     gpshi15     gpmex15     gpsev15
Paul Rietzl             gpsd15      gpokc15     gpwis15     gpind15
Brad Nelson             gpoma17     gpkyo17     gpmin17     gpden17
Andrew Cuneo            gplv17-lim  gptor17     gpind17     gpdc17
Corey Baumeister        gpmin17     gpden17     gpdc17      gpphx17
Corey Baumeister        gpden17     gpdc17      gpphx17     gpatl17

Note that Owen and Yuuya are the only two to have a streak of four individual GP top 8s in a row, and both of them were part of stretches of five out of six!

(This table was updated on November 17, 2017.)

Piggybacking on the script I wrote for the previous chart, there's now one for expected number of pro points based on your record. Interestingly, the "0-0" box reads 4.4; of course it slowly dwindles down to 3.0 as the number of losses ticks up toward eight. This says that a PT appearance is on average worth 4.4 pro points, and thus gold status is worth about 17.6 just from the four PT berths.

I decided that instead of updating the chart in the post below with the results from PT Hour of Devastation, I should make a separate page for it which I'll update after every Pro Tour.

Yesterday I worked on recreating my favorite FiveThirtyEight infographic using the data we've collected from the site. If you're following me on Twitter, you probably saw some escapades as I tried repeatedly to get this right, and even the final image that I posted wound up slightly off. >.< The biggest culprit was the play-in rounds of Pro Tour Kaladesh and Pro Tour Aether Revolt, which deeply confused my script that attempted to figure out who was in the top 8 of a given event. Second-biggest was PT Kyoto 2009, which apparently only had 14 rounds of Swiss, and my program then gave everyone a bye in rounds 15 and 16. ^_^; Oops.

Here is, as far as I can tell, the correct table. This graph tabulates the percentage of players with a given record that have gone on to make the top 8. Data comes from all 16 round split-format PTs (ones with both draft and constructed). That's the last 28 tournaments. Some multi-draw columns were omitted due to small sample sizes. You can now mouseover the cells (or tap them on mobile) to see the data; the tooltip shows {the number of people who made top 8 after having this record} / {the number of people who have had that record in total}.
at conclusion of round... (each cell: percent making top 8, then made top 8 / total)
round  x-0             x-1             x-2             x-2-1           x-3             x-3-1           x-4
16     LSV (1/1)       100 (1/1)       100 (1/1)       100 (17/17)     100 (11/11)     99 (116/117)    23 (35/150)
15     100 (2/2)       100 (2/2)       100 (12/12)     100 (45/45)     94 (112/118)    47 (16/34)      7.8 (30/384)
14     100 (2/2)       100 (7/7)       98 (58/59)      76 (10/13)      55 (115/207)    33 (14/42)      2.5 (15/591)
13     100 (2/2)       100 (18/18)     79 (77/97)      38 (7/18)       29 (101/339)    19 (12/62)      0.60 (5/835)
12     100 (4/4)       96 (32/33)      60 (95/157)     30 (9/30)       13 (71/537)     7.7 (6/78)      0.26 (3/1160)
11     100 (7/7)       82 (48/58)      37 (99/264)     8.3 (3/36)      6.6 (55/828)    4.7 (5/107)     <0.1 (1/1508)
10     100 (14/14)     63 (61/96)      22 (99/445)     6.9 (4/58)      3.0 (36/1195)   2.7 (4/147)     0 (0/1877)
9      82 (19/23)      44 (80/179)     12 (91/714)     5.3 (5/95)      1.2 (20/1698)   1.3 (2/153)     0 (0/2064)
8      64 (29/45)      29 (93/311)     6.6 (75/1130)   2.4 (3/123)     0.66 (15/2272)  0.56 (1/178)    0 (0/2606)
7      51 (42/81)      18 (105/566)    3.5 (60/1698)   1.3 (2/152)     0.31 (9/2863)   0.57 (1/175)    0 (0/2724)
6      33 (56/166)     11 (109/979)    1.9 (47/2455)   0.59 (1/170)    0.15 (5/3280)   0.60 (1/166)    0 (0/2391)
5      22 (74/333)     6.1 (101/1649)  1.1 (37/3307)   1.1 (2/178)     0.15 (5/3296)   0 (0/114)       0 (0/1595)
4      13 (92/674)     3.5 (94/2663)   0.77 (31/4017)  0.65 (1/153)    <0.1 (1/2639)   0 (0/44)        0 (0/634)
3      8.8 (119/1357)  2.0 (80/4050)   0.49 (20/4041)  0 (0/76)        0 (0/1303)
2      5.3 (144/2720)  1.3 (70/5434)   0.26 (7/2672)
1      3.4 (185/5488)  0.68 (37/5442)

If you'd prefer an image of this table for whatever reason, here's a .png file. This will always lead to a current version of the chart.
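For anyone who wants to build a table like this from their own data, the counting step can be sketched as follows. The input format (a result string per player plus a top 8 flag) is an assumption for this example; the actual script also has to cope with play-in rounds and tournaments of unusual length, which this sketch ignores.

```python
from collections import defaultdict


def top8_rates(players):
    """Map (round, (wins, losses, draws)) -> (made top 8, total) over all players.

    `players` is a list of (results, made_top8) pairs, where `results` is a
    string such as "WWLDW" giving one result per Swiss round.
    """
    counts = defaultdict(lambda: [0, 0])
    for results, made_top8 in players:
        wins = draws = losses = 0
        for rnd, res in enumerate(results, start=1):
            wins += res == "W"
            draws += res == "D"
            losses += res == "L"
            cell = counts[(rnd, (wins, losses, draws))]
            cell[0] += made_top8  # numerator: players with this record who made top 8
            cell[1] += 1          # denominator: everyone who ever held this record
    return {key: tuple(val) for key, val in counts.items()}


# Three hypothetical players over two rounds.
sample = [("WW", True), ("WL", False), ("WW", False)]
for key, (made, total) in sorted(top8_rates(sample).items()):
    print(key, f"{made}/{total}")
```

Each player is counted once per round, so the denominators shrink in later rows only as people drop or the record becomes unreachable.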

Here's a couple of notes about the unlikely numbers in the table.

  • The only person to miss the Top 8 with a 12-3-1 record (37 match points) was Kenny Oberg in Amsterdam 2010. He finished a distant ninth on breakers; Kai Budde was eighth.
  • There was also one person to miss the Top 8 from 12-2: Francesco Cipollesci at Pro Tour Nagoya 2011. Sorry, Francesco...
  • In seven of the 28 tournaments, nobody on 36 match points made the top 8. PT Kaladesh was the last time the door was closed on them. Twice three 12-4s made it (PTRTR, PTEMN).
  • The 16-0 box belongs to LSV, but do you remember who the other person was to have a chance at matching him? That would be Stanislav Cifka at PT Return to Ravnica. Kelvin Chew beat him in the final round to relegate him to the 15-1 box. I was surprised that there was also only one inhabitant of the 14-2 box; I guess if you don't have a good reason to dream-crush someone then there's no need to play the last round out with a 14-1 or 13-2 record. The lone person to achieve that record was Chris Fennell at PT Amonkhet, who (I believe) played the last round for team series reasons as he was paired against Musashi's Ken Yukuhiro.
  • I find it pretty amazing that nobody has flamed out from 10-0 yet. I guess it's difficult to make it to 10-0, as there's often only one or two undefeated players after day one to begin with. Plus your tiebreakers at 10-0 will be good enough to make it in at 12-4 when someone on 36 points is admitted. But you can go 1-5 or 0-6 from that position and miss, can't you? My guess is that the 100 in that box should actually be in the 90s somewhere, and over time it will decrease a bit.
  • One person has come back from 1-3 to make the top 8: Alexander Hayne at PT Avacyn Restored. It was... a miracle. (Sorry, had to.)
  • The only recovery from 1-2-1 was Eduardo Sajgalik at PT Return to Ravnica.
  • While I'm sure many people have run off five straight constructed wins from 7-4, the only one of those to wind up 12-4 and make the top 8 was Noah Swartz at PT San Jose 2010. It helped that Noah started 7-0, so his tiebreakers were as good as they could have been.
  • Noah is also one of the answers to the following trivia question: which players have made top 8 of a PT despite going 0-3 in a draft in that PT? There are four total; the other three people to do this are Brian Kibler (PT Austin 2009), Naoki Nakada (PT Paris 2011), and Jiachen Tao (PT Oath of the Gatewatch). They all 0-3'd the draft to start day two. Note that the 0-3 box in the table is a flat zero, but that doesn't tell the whole story about drafts on day one because the draft used to be rounds 6-8. Still, it is true that no one has ever 0-3'd their day one draft and made top 8.
  • Last one for now: the seven people to come back from 0-2 are, in chronological order,
    1. Paulo Vitor Damo da Rosa (PT San Jose 2010),
    2. Alexander Hayne (PT Avacyn Restored),
    3. Kamiel Cornelissen (PT Theros),
    4. Anssi Alkio (PT Born of the Gods),
    5. Yuuki Ichikawa (PT Magic 2015),
    6. Ricky Chin (PT Battle for Zendikar), and
    7. Joey Manner (PT Kaladesh).
    You're not out until you're out! (Or apparently until you're 0-3.)

2008 and 2009 were integrated into the site yesterday. This update added 48 tournaments and around 140,000 matches. The site is big: 363 tournaments and almost 1.7 million matches in total now. The curating process is getting faster, though I expect that the scraping process will compensate by getting more difficult. So far I've been able to recover every round except for GP Costa Rica 2012, round 2. I bet in the next two years we'll come across a GP whose data is bad enough that we'll have to let a few rounds go.

Just in time for PT Kyoto 2017 I've added the previous time the Tour has stopped in Kansai. So it's time for a pop quiz: Who won PT Kyoto 2009?

I knew going into the project that there are people in different parts of the world that have the same name, but I didn't appreciate the problem of two people being in the same part of the world, ten years apart, with the same name. That is, until I had to try to reconcile results from 2008 with results from 2017. I'm doing my best, but stuff slips through the cracks. Your help in correcting the data is always dearly appreciated.

Late last week I went through every tournament since 2010 and determined the format played in each round. Often we get asked for information about Elo by format, and I want to discuss the issues with those ratings today.

There are two main problems that are difficult to overcome, both stemming from small sample sizes. The first is a global problem: very few people have played enough matches in any given format for the ratings to mean much. Like I said in the previous post, it takes around 125 matches before the ratings settle down. 125 matches is a lot at this level. Only 319 people have played 125 matches of standard since 1/1/10, and standard is the most played format. Of the ~148,000 people in the database, only 14.9% of them have played in even five tournaments total, and of those only 2.8% have played in five standard tournaments. With only a couple of tournaments under your belt, your rating is basically determined by your record; the extra couple of points you gain/lose from playing a better/worse rated player haven't accumulated to anything significant yet. So you might as well just track your record or your win percentage.

This leads to another important point: there are basically no second-order effects because very few opponents have played enough matches to have reached a stable rating. Elo won't know whether to appropriately reward or punish your results because it won't have an accurate measure of your opponents' skill. Because of this, the ratings for even the people who have played a lot need to be taken with a grain of salt. Whereas in most GPs pros are playing against people with byes and so are playing people who have played at least a few GPs before, if we limit it to, say, modern matches only, then the otherwise-experienced opponents by and large don't have enough matches to have a stable rating, and so the number of points on the line in each match may be way out of whack. (Cognoscenti may realize that a way to mitigate this issue is to use a different rating system, like Glicko, that reports a confidence interval instead of a single number. Someday I'd like to look into this, but today is not that day.)

The lack of second-order effects would also make calibrating K for each different format a nightmare, because the results will seem kind of random. The rating system will have a lot of "this win was very unexpected for someone rated 1550!" moments, whereas it's actually because the 1550 player should be rated 1800 but hasn't played enough legacy to have reached that yet.

To be fair, these problems are inherent in trying to rate people based on their results in premier tournaments: 80% of people here have played in three tournaments or less. The reasons for this are myriad; I'll leave it as an exercise to imagine as many as you can. But these problems are amplified much further if we compound the problem by limiting the data available to be based only on results in one format. With Magic Online, in contrast, every match you play is counted toward your rating, so you could quickly pile up hundreds of data points. If I had the information to create by-format FNM Elo or PPTQ Elo, those ratings would have a much better correlation with your skill, compared to the vagaries of the one modern GP that happened to be in your time zone when you had a tier 2 deck built.

Having said all that, we still don't have any plans of integrating by-format Elo ratings into the site in the near term. I just don't think they tell an interesting story. If you want to compare people, it's probably better to do it by some other metric, like win percentage. To that end, you can now find on the stats hub a leaderboard for win percentage by format. I'll keep this updated after each new tournament. I built it from my local copy of the database using Python, so it has its limitations. Still, if your goal was to confirm that Brad Nelson is good at standard or that Joe Lossett is good at legacy, I think it will be satisfactory.

As a super-special, one-time-only, no-plans-on-doing-this-again-soon, thanks-for-reading-this-far kind of thing, I also ran the numbers for by-format Elo. View them in light of all the caveats I've laid out.

As promised, I've got some information about the project to recalibrate our K constant today. K is a parameter in the Elo rating system that determines how many points each match is worth. Right now ours is set to K=36, which means that the two participants in a match bring 36 points to the table for each match. If the players have exactly the same rating, then both players "ante up" 18 points. Otherwise the higher-rated player will risk some P (greater than 18) of their points on the match, with the lower-rated player contributing 36 - P points of theirs. As K increases, each match is worth more points, so the ratings become more volatile. This causes there to be less information in the ratings, since the effects of matches from long ago are drowned out when every match is causing a big swing. With the current value of K, the ratings stop being sensitive to a single old match after about 45 matches, and the effect of an entire tournament is blunted once we reach about 125 matches.

In short, my plan for recalibrating K was to adjust the value of the constant, rate everyone, and then go back and examine everyone's ratings at the time of each of their matches. The Elo formula predicts certain values for those win percentages, like the higher-rated player should win 60% of the time if the ratings disparity is exactly 200 points. So we should try to pick the rating scheme that causes the data to best fit the model: this maximizes the meaning that the ratings have.
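
As a rough illustration of the ante arithmetic described above, here is a minimal Python sketch. It assumes a logistic expectation curve scaled so that a 200-point favorite is expected to win 60% of the time; the SCALE constant and the function names are illustrative assumptions, not necessarily what the site actually runs.

```python
import math

K = 36  # points on the table per match, as stated in the post

# Assumption: the post only says a 200-point favorite should win 60% of the
# time. A logistic curve 1 / (1 + 10 ** (-diff / SCALE)) gives exactly 0.6 at
# diff = 200 when SCALE = 200 / log10(1.5). The site's formula may differ.
SCALE = 200 / math.log10(1.5)


def expected_score(rating, opp_rating):
    """Predicted win probability for the player rated `rating`."""
    return 1 / (1 + 10 ** ((opp_rating - rating) / SCALE))


def ante(rating, opp_rating, k=K):
    """Points this player risks on the match (the two antes sum to k)."""
    return k * expected_score(rating, opp_rating)


def update(winner, loser, k=K):
    """New (winner, loser) ratings after a decisive match.

    The winner collects exactly what the loser anted up.
    """
    stake = ante(loser, winner, k)
    return winner + stake, loser - stake
```

Under these assumptions two 1500-rated players ante 18 points each, while a 1700 playing a 1500 risks about 21.6 points against the underdog's 14.4.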

I tried a bunch of different constants and a couple of different rating schemes. The results are summarized in this Google spreadsheet. I encourage you to take a look at some of the alternative models that I examined there. Here's a look at how the current system (every match is K=36) is doing. The horizontal axis is the rating discrepancy between the two participants in a match, with the win percentage of the higher-rated player on the vertical axis. Honestly, of all the rating schemes, this one seems to do the best when it takes every match into consideration. This shouldn't be too amazing; I chose K=36 at the beginning of the project because it seemed like the best choice for K at the time. Keep in mind that there are very few matches between players that have a 400+-point rating disparity, so the tail begins to wobble due to the effects of a small sample size.

Here's my concern. This is the same graph again, but where instead of looking at every match, we only look at matches between people who have already completed 25 matches. Your rating isn't necessarily correct at first, so this gives people a "provisional period" to get closer to their actual rating before we start counting their matches as being useful data toward the calibration process. In these matches between "veterans" we're consistently falling short of the target line. This is what initially made me say a few months ago that K=36 is too large. It leads to people achieving ratings higher than they're "supposed" to have, and so the higher-rated player loses more often than Elo is predicting. This undershooting effect is less pronounced with K=27 or K=30, as you can see on the spreadsheet.
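
The comparison behind these graphs can be sketched roughly as follows. This is a minimal sketch and not the code that produced the graphs: the tuple layout for a match, the 50-point bucket width, and the logistic curve pinned to the 60%-at-200-points figure are all assumptions made for illustration.

```python
import math
from collections import defaultdict

# Assumed curve: logistic, scaled so a 200-point favorite is expected to win 60%.
SCALE = 200 / math.log10(1.5)


def predicted(diff):
    """Expected win rate of the higher-rated player at a given rating gap."""
    return 1 / (1 + 10 ** (-diff / SCALE))


def calibration_table(matches, min_prior=25, width=50):
    """Compare observed and predicted win rates, bucketed by rating gap.

    matches: iterable of (rating_a, rating_b, prior_a, prior_b, a_won), where
    the ratings are as of match time, prior_* is how many matches each player
    had already completed, and a_won is True/False (draws are left out).
    This layout is hypothetical.
    """
    tally = defaultdict(lambda: [0, 0])  # bucket start -> [favorite wins, matches]
    for r_a, r_b, n_a, n_b, a_won in matches:
        if min(n_a, n_b) < min_prior:
            continue  # at least one player is still in the provisional period
        favorite_won = a_won if r_a >= r_b else not a_won
        bucket = int(abs(r_a - r_b) // width) * width
        tally[bucket][0] += int(favorite_won)
        tally[bucket][1] += 1
    # bucket start -> (observed rate, predicted rate at bucket midpoint, sample size)
    return {
        b: (wins / n, predicted(b + width / 2), n)
        for b, (wins, n) in sorted(tally.items())
    }
```

In this sketch min_prior=0 corresponds to the every-match view and min_prior=25 to the veterans-only view; a bucket where the observed rate sits below the predicted one is a bucket where the ratings are undershooting the target line.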

This probably should lead you to propose a system where matches between veterans are played at a lower K than matches where one of the participants still has a "provisional" rating. I examined those sorts of rating schemes in the spreadsheet too. Do any of them look appreciably better than the pictures in this blog post of a flat K=36? Maybe a K=36 / veterans at K=27 split is a tiny bit better, but I don't think it's a big enough improvement to warrant a switch.

Also I should point out here that making matches between veterans have a lower K-value does something very counterintuitive: it nerfs the impact of Pro Tour events, where essentially everyone is a veteran. This is the opposite of how most people seem to think it should go — in fact I've had multiple people tell me that it's blasphemous that we're not using a higher value of K for Pro Tour events! It is true that in the old DCI rating scheme, Pro Tours had a value of K that was 1.2 times the value of K for Grand Prix. But the DCI rating scheme was also supposed to be a world ranking of sorts. It needed to be built into the system that winning the Pro Tour would be a guaranteed massive boost to your rating. The Elo project is, as we've said, for entertainment purposes only, so I don't lose sleep at night over this. There is an effect that increases the value of a deep run at a Pro Tour: the average rating in the room is much higher, so you see many more matches worth 18 points to each player and fewer of the 6/30 variety. For a player with a high rating, a Pro Tour is the best chance to push their rating up even higher. (It seems that once you reach a rating of around 2100, going 8-4 at a GP — that is, 11-4 with three byes — might not even be enough to tread water. On the other hand many people in the same range made some progress at PT Amonkhet going 11-5 or 10-6.)

As K varies, all of the different rating schemes have trouble at the high end, independent of the small sample size issues. The underlying shape each graph wants to make has more of a bend to it than the red graph has. I think this reflects the fact that win percentages over 70% aren't seen at this level of play. Fewer than thirty people who have played 200 matches have a 65% lifetime win percentage. So it's possible that the right way to improve the ratings on the site is to change the Elo model itself by lessening the expectation once we get over a 300 point disparity or so.

Let me offer a counterargument to the previous paragraph. (This either goes to show that I've thought about this all too much or that it's a really complicated problem.) Most of the matches with really large rating disparity involve someone who's been on a heater and has a 2000+ rating playing against someone with a rating in the 1500-1600 range. A 1600 rated player is still pretty good; that's someone who's at or slightly above .500 in their premier play career. I think we'd expect to see a better performance from the higher-rated player in matches with a huge disparity if we could get some matches between, like, 1700s and 1200s. But these by and large do not exist! There are two reasons: (1) people with 1700+ ratings typically have two byes, and so avoid a lot of the weaker players who have no byes, and (2) people whose actual skill would lead them to have a rating of 1200 or below tend not to play enough matches to reach their actual rating in the first place, because there's no incentive to stay in the tournament after you've lost a bunch. For a player like Owen Turtenwald who deserves a rating in the 2000s which means "I 11-4 every GP," he'll naturally arrive at that rating by 11-4'ing every GP. But if you deserve a rating that says "I basically 2-7 every day one," getting the total number of matches needed to arrive in the ballpark of that rating is very difficult.

This was a 1250-word way of saying that I'm not going to make any adjustments at the moment. There are two other factors that have to get weighed against making any changes: continuity and simplicity. I'd prefer not to change the scale in such a way that old blog posts, graphics, Reddit threads, etc., stop corresponding to what's on the site. And I'd also prefer for the rating system to be something that can be explained in one paragraph, if a two-paragraph or five-page mathematics paper version doesn't seem to lead to any real improvement. This isn't to say I'm done looking for better options (the idea of tweaking the underlying model is intriguing!) but for now it's back to the curating process for 2008-09.

You may be able to infer from this blog post that I could talk about this endlessly; if so you would be correct. I'd love to hear any thoughts or suggestions you might have about this topic! Send them to mtgeloproject at gmail dot com.

Since we just had two additions to this table in the last couple of weeks, it feels like a good time to post it here. This is a list of everyone, since 2008, who has won a GP without losing a match. No one has ever actually won every match they were paired for, though this is largely a matter of deciding how you count intentional draws.

Player                 Record      Event
Tsumura, Kenji         12-0-1-1    gpkl06
Saito, Tomoharu        11-0-1-2    gpsin09
Vidugiris, Gaudenis    12-0-1-2    gpden11
Shiels, David          11-0-3-1    gpdal11
Parker, Richard        16-0-0-1    gplil12
Duke, Reid             13-0-0-2    gpnas12
Darras, Alexandre      12-0-1-2    gpman12
MacMurdo, Walker       12-0-3-1    gpauc12
Lanthier, Dan          12-0-2-2    gpvan15
Lipp, Scott            13-0-1-2    gpsyd16
Nelson, Brad           13-0-0-2    gpoma17
Saporito, Thiago       13-0-1-1    gplv17-lim
Locke, Steve           16-0-0-2    gpmin17

We have raw 2008-09 data now; it's going to take a couple of weeks to curate it, but it's good enough to decide whether the winner lost at some point during the tournament or not. The table only lists the record in rounds played, i.e., byes are ignored. This is why the win totals might look low at first blush. Richard Parker only had one bye in GP Lille, so he got an extra couple of matches in at the beginning. This is an insane accomplishment, but it is actually not the record for wins in a single GP: that distinction goes to Brock Parker, who won GP Pittsburgh 2013 with zero byes, going 17-2-0-0 in the event. He was helped by the fact that there was a tenth round of sealed deck played on day one. Given that all current Grand Prix (and almost all older ones) involve fifteen or fewer rounds of Swiss, it seems very likely that no one else has managed 17 wins in one GP.

2010 and 2011 were integrated into the system a couple days ago. This update added 46 tournaments and just short of 170,000 matches. This pushes us over the 300 tournament mark. There's 1.49 million matches of Magic catalogued here. Based on the data from a couple posts ago, I think we're at about 48.6% of all tournaments and maybe 55% of what we can possibly get. I'm hoping to stay on the pace of two years every two months like I've done so far, but we do have some behind the scenes work to do as well, so we shall see. The next two years total 48 tournaments, but the tournaments are also getting smaller, so my fingers are crossed that the total amount of stuff to swim through will decrease. Unsurprisingly I'm getting faster and faster at the curating process, but there's a lot of name-reformatting that has to happen between scraping the raw data and adding it to the site. For 2010-11, the raw data had 17784 unfamiliar names and the curating process pared that down to 11521 new people. Naturally I missed some, and also some entries have data from multiple people in them. But this is what's gained from working on the data some.

At the moment K = 36 still. I thought K = 30 or K = 32 would give more predictive power, but there's a baseline level of noise that no value of K seems to mitigate. I thought this comes from the fact that people near 1500 are often grossly misrated, but the effect seems to persist even when we only look at people who have already played 25 matches (admittedly a much smaller data set). I'm still messing around with variable-K schemes and other slightly more complicated rating methods, but I'm not going to change anything until I get something that I can confidently say fits the data better than what we have now. Otherwise there's no reason not to prefer continuity. I put some data together for this update, but I think I can present it better than I am right now, so I'm going to take a few extra days to improve it before I share.

We still don't have a big problem with inactive players: it's true that 22642/146738 people in the database haven't played since before 1/1/13, but only 42(!) of those 22642 people have a rating of 1700+. (The highest rated such player is Ian Duke, at 1902.) I haven't figured out exactly what should happen if people play in an event after a long layoff, but it isn't a big problem either -- only about 1350 people have had a gap of more than 200 tournaments (4-5 years) in their histories, and only 26 of those have a rating over 1700 currently. Probably there should be some sort of regression to the mean or something, but I bet doing nothing is fine too. As an extreme example, Corey Baumeister made the finals of GP Miami 2015 after a 3+ year layoff. We'll see what happens as we keep going back in time. I just don't want to discover that a spot in the top 30 is frozen on someone who stopped playing in 2005. At the least we'll set it up so that the rankings calculation only takes into account people who have played within the last couple of years. There are people like Xu Su out there who don't play much but are very good when they play. (Finals of consecutive events, two years apart!) (But I assume real life made him decline the PT invites.) I don't want to cut those sorts of people out due to a hyper-focus on the week-to-week grind of the tour.

I believe that GP Montreal last weekend marks the fifteenth occurrence since 1/1/2010 that tables 1, 2, 3, and 4 at an event have chosen to ID. In three of those, it was a clean cut to the top eight: the eight people who drew were guaranteed to have more match points than the rest of the field. In the other twelve instances it came down to tiebreakers and/or the results of pairdowns. Here's how the eight IDers finished, by event.

Event       Finishes of the eight IDers
gpoma17     clean cut
gpmon17     1 2 3 4 5 6 7 9
gpvan15     1 2 3 4 5 6 7 8
gpman15     2 3 4 6 7 8 9 10
gpba14      1 2 3 4 5 6 7 8
gpabq13     2 3 4 5 6 7 8 9
gpkc13      1 2 3 4 5 6 7 9
ptdgm       clean cut
gpsin13     1 2 3 4 5 7 8 9
gptai12     clean cut
gpmc12      1 2 3 4 5 6 7 9
gphir11     clean cut
gpdal11     2 3 4 5 6 7 8 9
ptams10     1 2 3 4 5 6 7 8
gpbru10     2 3 4 5 6 7 8 9
gpkl10      1 2 3 5 6 7 8 10
pthon09     clean cut
gpbrus08    1 4 5 6 7 8 9 10
gpnj04      2 3 4 5 6 7 8 9
gpfuk02     1 2 3 4 5 6 7 8

Only in three(!) of the twelve tournaments where there was suspense did the eight IDers make the top eight. Be careful when drawing, people!

We've upgraded some of the pages at the left. The leaders page has been expanded and is more sortable than before. You can arrange the table by rating as usual, or now by record or winning percentage by tournament type. It also expands to the top 150; this change needed to happen at some point because the people near the top are all very close to each other, so it was pretty capricious who happened to appear when we only had a top thirty. It felt like an overload to see the whole top 150 by default on mobile though, so we kept the shorter option too. At the moment, 65 PT wins will get you 150th place. I wonder how far up that number will go once we're done adding tournaments to the beginning of the data set.

To make room for this, we've moved the histogram/percentiles table to a new stats hub. There are a couple of other widgets there, linking to pages that document some of the ancillary things we've blogged or tweeted about. You can find the table of unintentional draw streaks there, for instance. The plan is to add more of these as we go along. I want to add a Weeks at #1 page that lists who's been the highest-rated player and for how many weeks, for instance. If you have other suggestions for stats we can track, we'd love to hear them!

2012 has arrived! I finished curating the data early this week and set aside the time today to upload it and fix all the errors I uncovered while curating. This update adds 46 tournaments and around 170,000 matches. Some thoughts about our progress and what's yet to happen:

  1. First, K=36 is too big. We started the site with about 105 tournaments' (two years') worth of results, and a common refrain was that the data set was too small to draw many conclusions out of it. This was a fact. It's bigger now but of course there's still plenty more that can be added. I calibrated the K constant as well as I could given the data I had at the time, but many people were not close to their "accurate" rating and that had an effect on the process. When I chose K=36, it felt like any value of K in the mid-30s was equally valid, but tests now seem to suggest that a better choice is in the 29-32 range. I'm not going to mess with it right now, but when I add the next batch of old tournaments expect an adjustment. Changing K will cause all the ratings to move toward the middle a bit, but adding a block of tournaments to the back end of the timeline causes them to jump up. So we might as well try to save this change for when the ratings were going to experience a discontinuity anyway. Right now I have no plans to use differing values of K for different tournament types, but when this next calibration occurs I'll investigate whether some other combination of constants leads to having more predictive power.

  2. How much of the usable data is here? We currently have 231 Grand Prix and 24 Pro Tours/World Championships. The tournaments get sparser going back in time: 2012 had 46 tournaments itself, while 2010 and 2011 together are 46 tournaments and 2008 and 2009 together are 48 tournaments. My plan is to tackle those two batches next.

    From the beginning of time until where our data stops, there have been 375 tournaments (301 GPs, 74 individual PTs + Worlds), but I admit I'll be thrilled if any of the pre-2000 tournaments have good enough records kept that we can add them to the project. @Quick_MTGStat says in his PT Elo spreadsheet that the PT data breaks down before PT Los Angeles 1999. That would suggest the beginning 75 tournaments might be out of reach, leaving about 300 left to obtain. In any case, I think right now we have 40% of everything and 46% of what we're going to get.

  3. At some point we'll need to address the issue of ratings decay. This topic has come up a couple of times recently because LSV has sat at the #1 spot for most of the year while on the coverage team. To be clear, I don't think that LSV's position would be affected by a decay process. His last tournament was seven months ago. If decay set in on that time scale, players in places that don't get many tournaments per year like Australia and Latin America would feel the brunt of it. Hell, I personally had a seven-month gap between tournaments because there wasn't one within a nine-hour drive of my apartment. If you felt in note (1) like the data set was too small when it only had two years' worth of tournaments, then you have to feel in this note like the right time scale before being considered inactive is longer than that. Right now I'm thinking three years or so is a reasonable starting point, but given that we're only at five years in the data set, there's not too many people that have experienced a three-year layoff. So for that reason I'm not going to worry about this topic quite yet.

    There are other aspects of adjustments to the ratings that should be addressed, too. As time goes on, more people will have established a high rating, and so it becomes easier over time to reach higher and higher peaks. This happens largely because there is a constant flow of people entering the system, doing poorly in one tournament, and not playing again. If they go 2-3 and have their rating fall from 1500 to 1482, then they've just created 18 points out of thin air that the rest of us can fight over. These points will slowly concentrate themselves among veteran players, and in doing so the number of high-rated players will climb. This makes it challenging to compare peaks across eras, since hitting 2000 nowadays will be easier to accomplish than hitting 2000 several years ago. There's a relatively simple fix to this: "tax" wins by a very small amount to pull points out of the system. There's some discussion of how something similar to this is implemented in the European Go Database (where apparently they have the opposite problem?) and at some point I want to think about how to implement something along those lines. They apparently rake 1.6% of the points out of each match. That feels a little high to me, but the idea seems promising. Should all matches be subject to a tax? Only if the winner has a 1700+ rating? Should the tax increase as your rating does? These are all knobs I'll work on turning.

  4. There are some systematic inaccuracies in the data that I want to correct before I begin scraping 2010 and 2011.

    • Recent tournaments sometimes will use the code "Loss L-L" to represent a double no-show. I only recently noticed this; right now we're incorrectly displaying these matches all as wins for player B. I think in most of these cases, no Magic occurred. I'll work on stripping them out of the data.
    • I worked on collecting all the final standings pages to help me curate 2012. This was something I hadn't had access to before (we were only using the results pages for each round). In some tournaments, the number of people in the final standings and the number of people who appear in one of the results pages is way off. I think this is because some of the tournaments paired everyone who registered, then dropped the no-shows after they didn't appear for a match. I should be able to purge those too.
    • I also want to build in an "internal consistency" check — there are some results that are backwards, and if I compared the "apparent" match point total based on the results pages to the official tally on the final standings I might be able to fix some of those. I think the judges who enter the results from the match slips have around a 99.95% accuracy rate, but even that corresponds to one or two matches per GP being put in incorrectly.
  5. GP Costa Rica 2012, round 2 doesn't seem to exist. The results page and the pairings page are both copies of the round two standings. So that round is just not present. I looked for a bit on the Wayback Machine to try to find it, but I think it was originally uploaded wrong five years ago. If you played in this round and can tell me your opponent and the match result (this is available by logging in and checking your history), I'd love to include whatever we can get from it. I suspect as we go back further we'll find more examples of lost rounds or entire tournaments, and I want to add a page to the links bar that keeps track of what results we don't have. It's kind of like the Retrosheet project, but for Magic.

  6. There are other site improvements we're going to work on making. We want to expand the leaderboard to give people better access to ways to look at the data. We also will add a stats hub to display some of the results of playing with the dataset. I came up with a few such things this week in a Twitter thread; they should be on the site somewhere. There are some less visible upgrades to how things work behind the scenes that will be undertaken soon too. I think that's the pattern you should expect for the next couple of months: a couple of weeks of site improvements and fulfilling feature requests, then a bout of data scraping and curating. Having said that, if there are features you'd like to see on the site, or questions you've wanted answers to that we can help with, drop us a line at mtgeloproject at gmail dot com. Your feedback is always welcome!
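
The "internal consistency" check mentioned in note 4 could look something like the sketch below. The data layout is hypothetical (one tuple per match from the results pages, one official match-point total per player from the final standings), and it assumes byes show up in the results so they can be credited. The scoring of 3 points for a win and 1 for a draw is the standard match-point scheme.

```python
from collections import defaultdict

WIN_POINTS, DRAW_POINTS = 3, 1  # standard match-point scoring


def apparent_points(results):
    """Total match points implied by the round-by-round results.

    results: iterable of (player_a, player_b, outcome), with outcome in
    {"A", "B", "D"} for an A win, a B win, or a draw. A bye is recorded
    as (player, None, "A"). This layout is hypothetical.
    """
    points = defaultdict(int)
    for a, b, outcome in results:
        gain_a, gain_b = {
            "A": (WIN_POINTS, 0),
            "B": (0, WIN_POINTS),
            "D": (DRAW_POINTS, DRAW_POINTS),
        }[outcome]
        points[a] += gain_a
        if b is not None:  # byes have no opponent to credit
            points[b] += gain_b
    return points


def discrepancies(results, standings):
    """Players whose apparent total differs from the official one.

    standings: dict mapping player -> official match points.
    Returns a dict mapping player -> (apparent - official).
    """
    apparent = apparent_points(results)
    return {
        player: apparent.get(player, 0) - official
        for player, official in standings.items()
        if apparent.get(player, 0) != official
    }
```

A win entered backwards shows up as a pair of opponents who are off by +3 and -3 respectively, which narrows down which match to flip.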

Who among your friends has the highest rating? The highest peak? What's the average rating of your playtest group? Did you know that every member of Team Puzzle Quest has at some point in our dataset had the highest rating on the team, and also at some point had the lowest rating on the team? I have to confess, I'm kind of obsessed with playing around with the custom team page that we've just added to the site. Up to twelve players can be added to your team, and bookmarking the url lets you keep track of the progress and combined stats of a group of players all on one page. We would love any suggestions for improving this feature: are there stats you want to see that we left out? Is the limit of twelve too small? (That number was chosen somewhat arbitrarily, but we found that the graph looks pretty bad as you get up toward a dozen players.) Please let us know! This whole feature came about because we got an email a couple of months ago asking about a 'multisearch' and we thought it was a great idea. I love getting feature requests like that, so if you have any ideas, please do be in touch.

A couple of quick notes and tips: hovering the mouse on the peak score of any player in the table will display the event and round of that peak as alt-text. If the graph is too muddied to read well, unchecking the boxes next to the player names along the bottom will toggle their graphs. The win percentages are computed as wins/results; in other words draws, intentional or otherwise, don't factor in.
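
For the avoidance of doubt about how draws are treated, here is a tiny sketch of that computation. The function name and signature are made up for illustration, and returning None for a player with no decisive results is my own choice rather than anything the site is known to do.

```python
def win_percentage(wins, losses, draws=0):
    """Wins divided by decisive results; draws are accepted but ignored."""
    decided = wins + losses
    if decided == 0:
        return None  # no decisive matches yet, so the percentage is undefined
    return wins / decided
```

So a player who is 8-4 with three draws has the same win percentage as a player who is 8-4 with none.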

Of course on the theme of teams, we're also excited for the beginning of the Pro Tour Team Series. On the pro teams page you can view all thirty-two teams, sorted by average, median, best rating, or worst rating. Clicking on a team name drops down the list of members and a link to the custom team page for that team, where you can view the graph and aggregate statistics. We've also added a team button to the individual page of any member of a pro team, linking to the other members and the team page. Musashi narrowly comes out on top of Channelfireball – Ice in both average and median, a fact which seemed to impress one of its members:

(My rough translation: "[Musashi] leads not only in median but also in average, huh? It's expected that the two aces would be really strong; but for everyone on [Ice] to be over 2000 is incredible.") Of course, we can't be held responsible for any discord that might be sown among teammates by the statistics we provide...

Meanwhile in the arguably more important race, I believe that Almost Finnished edges out Conflagreece for most groan-worthy team name (although it's close!); and I have to give a shout-out to team Lingering Souls, which I think is my overall favorite name. ^_^

We've incorporated all the rest of 2013 today in one big batch. That's 27 tournaments added at once, including Pro Tours Gatecrash and Dragon's Maze. Having several weeks off from new tournaments let me work on 2013 as a group which expedited the curating process. I might try to do something similar in the weeks around the release of Amonkhet. There's about 1.014 million matches in the database now, and in a couple of weeks we'll hit tournament number 200. I believe at the moment there have been 625 relevant tournaments total, so roughly a third now are available for your perusal. (Relevant being GPs, PTs, and Worlds. I'm thinking about tackling national championships too though.) The older tournaments are (a) smaller on average and (b) may not have usable records kept, so I'm guessing it's actually more like 40% to 45% of the possible database that's up now.

Our data set is starting to reach a size where you can see peaks from different "eras" emerging. With each passing match an old result has a smaller and smaller effect on your current rating, so adding old data isn't useful if all you care about is what the current ratings are. But ratings as of several years ago are closer to the front of the dataset, so new additions to the front will still influence them. Huey hit a peak upon reaching 11-0 at Pro Tour Magic 2015 which has grown to become the fourth-highest mark ever attained (2305 at the time of writing). Other high points, like Sam Black's high at GP Toronto 2013, will be interesting to track once tournaments from 2012 start getting added.

Despite what I said in the previous paragraph, if you've been paying careful attention you may have noticed that there's been a "rising tide" effect as I've added more results to the beginning of the data set. For example, LSV's rating was 2227 right when the World Championships were done, but since I've added a year to the front end of the data his rating has risen to 2258. This is almost entirely due to indirect contributions, rather than direct ones. Adding these tournaments has increased the rating of LSV's opponents, and so matches closer to the present look more impressive than they did previously. All boats rise due to this effect; even my rating went up about three points from the beginning of working on this cluster of tournaments. We'll continue to see this as we continue going back in time, since most people don't have a transcript long enough to make early tournaments irrelevant. There are ways to correct for this inflation, but I think it's a problem to tackle at a later date.

In the last update, I said that I'd update the table of draw streaks each time I completed a year. The table in the entry below is now current. We found the beginning of some long streaks, like Christian Calcano's and Joe Demestrio's. Owen said on Twitter that he can't remember taking a draw, so if there is one it'll be a while before we come across it. How far back does LSV's streak go? In a little bit we'll peel back another layer.

I said I wasn't going to scrape for a little while, but I do have some data to share today. After a comment Reid made on Twitter, I got curious to figure out how long someone has gone without an unintentional draw. I've learned that four years is too short a scale for this, as there are several people who don't have even one in our database yet. But, for now, here are the candidates. I'll come back to this question each time I tack a year onto the back of the data set.

This comes with the usual caveat that sometimes it's not super clear when a draw in a late round is or isn't intentional, but I've done the best I could. The streaks here do not include unintentional draws. By default, all streaks of 275 matches or more, active or not, are displayed. If you'd prefer, you can filter the table to see only active streaks. After you've done that you can restore the default view.

Update (5/1/17): This table now has its own page accessible via the stats hub and is updated with each tournament.

A couple of days ago I added Grand Prix Milwaukee 2016 to the site, meaning that all of 2014, 2015, and 2016 is here now. There's more than 820,000 matches in the database spanning 149 tournaments. Grading finals and completing a bit of math research rank above scraping more tournaments for me at the moment, but I'll get in one more batch of corrections in a couple of days. My goal is to get to PT Theros (10 more tournaments to go) by the end of the year.

The World Magic Cup begins later this week! I've spent the last couple of days looking at the list of participants, trying to match them to entries in our database. Here's the fruits of that tree. Some teams have players that haven't played in a Grand Prix or Pro Tour in the last three years, so they've been assigned the starting value of 1500 (colored gray in the table).

You can sort this table by average, median, top rated player, lowest rated player, or alphabetically. If you'd prefer to limit the sort to only the teams with four rated players, here's the average and median sort with those teams filtered to the top. Mouseover a rating to see the name associated with it, or click on a country to make the names appear. The bold rating is the team captain. Best of luck to all participating teams!
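For anyone who wants to reproduce the team columns, here is a minimal sketch of how the summary figures could be computed. The function name, the use of `None` for an unrated player, and the example team are all made up for illustration; only the 1500 starting value comes from the entry above.

```python
from statistics import mean, median

DEFAULT_RATING = 1500  # starting value given to players with no rated matches


def team_summary(ratings):
    """Summarize one team; None marks a player missing from the database."""
    filled = [DEFAULT_RATING if r is None else r for r in ratings]
    return {
        "average": mean(filled),
        "median": median(filled),
        "top": max(filled),
        "lowest": min(filled),
        # Teams where this equals the roster size are the "fully rated" ones.
        "rated_players": sum(r is not None for r in ratings),
    }


# Hypothetical four-player team with one unrated member.
summary = team_summary([1950, 1720, None, 1610])
```

Sorting a list of such summaries by `"average"` or `"median"`, with `"rated_players"` as the first sort key, would give the "fully rated teams first" view.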

I've been keeping the database updated with new GPs as they've occurred, and with each update I've managed to get one or two more old ones into the system. Today's update includes the just-finished GP Dallas (congratulations to Kevin Mackie and Skred!) as well as the next Pro Tour back in time, Born of the Gods. (That means it's time for another pop quiz: who won PTBNG?)

This last month has seen the rise and fall of Shota Yasooka — he hit a peak of 2263 by winning the PT, then spent a hundred points going 2-4 in Malaysia. At the moment no one is above the "LSV line".

We've gotten a number of good feature requests from the community in the last couple of weeks. I don't think we'll have time to add much to the site itself until winter break, but I look forward to implementing some of them. Until then I'll continue trying to bolster the database. We have six more GPs left to reach the beginning of 2014 and seventeen to go to reach the previous Pro Tour. Sounds like a lot, but we're well over a hundred tournaments now, so seventeen more doesn't sound that daunting any more.

New additions today: GP Beijing 2014, GP Atlanta 2016, and GP London 2016. Lukas Blohon went 2-3-1 drafting in London and it cost him eighty points! The perils of a 2300+ rating...

Best of luck to everyone participating in the Pro Tour!

I added five more GPs to the beginning of the timeline today, besides tidying up some high-probability duplicate entries. Today's innovation was to check every instance of a last name being shared by exactly two entries, to search for nicknames and typos in the first name. I'll do the reverse when I update next and look for mistakes in last names. Eight more GPs to go before I reach the next Pro Tour...
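The "last name shared by exactly two entries" check is easy to sketch. The version below is a rough illustration, not the routine actually used on the site: it treats the final space-separated token as the last name, which is an assumption that will misfile multi-word surnames, and the sample names are only there to show the shape of the output.

```python
from collections import defaultdict


def last_name_pairs(names):
    """Return {last_name: [full names]} for last names held by exactly two entries."""
    by_last = defaultdict(list)
    for full in names:
        # Assumption: the final token is the last name.
        last = full.rpartition(" ")[2].lower()
        by_last[last].append(full)
    return {
        last: sorted(entries)
        for last, entries in by_last.items()
        if len(entries) == 2
    }


sample = [
    "Mike Sigrist", "Michael Sigrist", "Reid Duke",
    "Andy Smith", "Andrew Smith", "Anna Smith",
]
candidates = last_name_pairs(sample)
```

Each pair that comes back still needs a human look at the first names; the filter only narrows down where nicknames and typos are likely to hide.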

Refreshed the database again. Highlights from this update:

  1. Five new tournaments were added: Grands Prix Chicago, Moscow, Manchester, and Atlanta 2014, and Pro Tour Journey into Nyx. (Pop quiz: who won PTJOU?) Some things I was forced to ponder: why was Moscow only 14 rounds? Why are the two halves of round five of GP Chicago the same (including the standings, to ensure maximum difficulty in reconstructing the results)? Why have a GP the week after the Pro Tour in the same city?
  2. I discovered that round 12 and round 13 of GP Porto Alegre 2015 were copies of each other. I reconstructed round 13 and we got this fixed. If you happen to notice a player playing the same opponent in two rounds in a row, it might be another instance of this mistake. I'm pretty sure that there aren't any more instances of it in the database at the moment, but this problem may come up again in the future.
  3. I merged and split a few people who were/weren't the same thanks to tips we got from the community. Thanks guys! Keep it coming. I also fixed some Mike/Michael, Andy/Andrew, Dave/David, and Tony/Anthony mistakes. I didn't really know where to look to see if people from Moscow were the same as other people — I didn't realize when we started the project that we'd need to be knowledgeable in Russian transliteration conventions. I'm sure there are some entries that need to be combined/separated that so far have gone undetected. Before I add more tournaments I'm going to work on cleaning up what's here a bit.
  4. We're now rating unintentional draws. For information on what this entails, check out the entry in the FAQ. I was originally hesitant to do this because I was afraid that I wouldn't be able to tell the difference between intentional draws and unintentional ones. But sometimes it's not hard to tell if it's intentional: most intentional draws are reported as "Draw D-D", "Draw 0-0-3", or "Draw 0-0-0", though sometimes 0-0-1 or just 0-0. My general rule of thumb was to interpret any of these notations as representing an intentional draw if I could find any remotely logical reason why the players were incentivized to draw. This includes making top 8, nabbing an extra Pro Point at a PT, or even the rare round nine matchup of 6-0-2s who could ID to make day two. Any sort of 0-0-x draw that ended with one or both players in the money I treated as an ID. Now there were some random apparent IDs in early rounds scattered throughout the tournaments. Sometimes it's easy to confirm that these were unintentional. But I'm afraid some of these were people convincing their opponent to skip a round and get lunch. There are 17119 draws in the database. Probably about 1000 of them should be intentional, and I've marked about 800 of them. My guess is that about 200 intentional draws are still recorded inaccurately. Draws don't have a big effect on the rating, either, so this isn't something to lose sleep over. But doing it the other way, with no draws rated, about 16000 matches were being tallied incorrectly. If the goal is to minimize wrong results, this does represent progress.

    Note that for rounds that I had to reconstruct, like Porto Alegre round 13 from point #2 above, all results are styled as 0-0, whether it's a win, loss, or draw. So if you want to point out a match to me that's a potential ID, you need more evidence than what the site is displaying.
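The rule of thumb for spotting intentional draws can be written down as a small filter. This is only a sketch of the idea: the function name is invented, and the `had_incentive` flag stands in for the manual judgment call about whether the players had any reason to draw, which is the part that can't really be automated.

```python
# Score notations that usually signal an intentional draw in the coverage.
ID_NOTATIONS = {"D-D", "0-0-3", "0-0-0", "0-0-1", "0-0"}


def looks_like_id(result, had_incentive):
    """Flag a reported result such as "Draw 0-0-3" as a likely intentional draw.

    had_incentive is a hand-made judgment: top 8 lock, extra Pro Point,
    making day two, finishing in the money, and so on.
    """
    parts = result.split()
    if len(parts) != 2 or parts[0] != "Draw":
        return False
    return parts[1] in ID_NOTATIONS and had_incentive
```

Because reconstructed rounds are all styled as 0-0, a check like this should only be run on rounds whose results came straight from the coverage pages.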

The numbers in the previous analysis may be slightly off now, because I've just added four GPs to the beginning of the timeline: Grands Prix Boston-Worcester, Taipei, Milan, and Washington DC 2014. I also corrected about a hundred errors, some of which came from the community (thanks everyone! Keep it coming!) and some of which I stumbled across on my own as I was adding new people in. My goal is to get back to Pro Tour Journey Into Nyx for the update next week, so four more GPs to go.

Here's an update to the table from the last entry, with the tournament complete. The deltas are all based on the positions entering the tournament. Congratulations to the world champion, Brian Braun-Duin!

rank Δ  rank  rating  name                      record
-          1    2332  Lukas Blohon              9-5
▲ 5        2    2288  Brian Braun-Duin          12-3-1
-          3    2211  Luis Scott-Vargas         9-5
▲ 7        4    2170  Oliver Tiu                9-5-1
▲ 2        6    2139  Seth Manfield             8-6
▲ 22       7    2130  Marcio Carvalho           10-5-1
▼ 4        8    2128  Mike Sigrist              7-7
▼ 7        9    2123  Owen Turtenwald           6-8
▼ 8       14    2092  Yuuya Watanabe            6-8
▲ 28      16    2077  Shota Yasooka             9-6
▼ 1       20    2070  Reid Duke                 7-7
▲ 1       22    2048  Brad Nelson               7-7
▲ 1       24    2045  Joel Larsson              7-7
▼ 14      28    2031  Ondrej Strasky            6-8
▲ 12      29    2022  Steve Rubin               7-6-1
▲ 4       42    1997  Paulo Vitor Damo da Rosa  7-7
▼ 25      45    1988  Sam Pardee                5-9
▲ 69      53    1979  Jiachen Tao               8-6
▲ 142     83    1945  Thiago Saporito           8-6
▼ 153    174    1878  Martin Muller             3-11
▼ 103    372    1810  Kazuyuki Takimura         5-9
▼ 173    380    1807  Andrea Mengucci           5-9
▼ 205    424    1795  Ryoichi Tamada            4-10
▼ 12    1067    1716  Niels Noorlander          5-9

Not shockingly, the ratings and rankings of the people who went 7-7 are very similar to their starting values. For instance, Reid's 7-7 changed his rating by 1.97 points (from 2068.35 to 2070.32) and his ranking from #19 to #20. Less obviously, Elo was not impressed by Blohon going 9-5. Given his schedule (i.e., that he played a player rated 2066, then a player rated 2026, etc.), a correctly-rated 2320 should have won about 8.65 of their matches. So this was a slight overperformance in the system's eyes, hence a slight improvement to his rating: he ascended from 2320 to 2332. Similarly, Niels performed shockingly close to expectation: his rating moved by only 0.65! He went from 1716.30 to 1715.65, a change small enough to be swallowed up by rounding. Again, the bulge of people in the low 1700s still meant that he was passed by twelve people.
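To see where a change like "2320 to 2332" comes from, here is a back-of-the-envelope sketch. The K-factor of 36 and the scale (200 points of rating difference taken to mean a 60% win expectancy) are assumptions chosen because they fit the numbers in these entries; the entry itself doesn't state them. Scoring the event in one lump is also a simplification, since ratings really move after every match.

```python
import math

# Assumptions, not confirmed above: K = 36, and a scale on which a
# 200-point gap corresponds to a 60% win expectancy.
K = 36
SCALE = 200 / math.log10(1.5)  # about 1135.77


def expected_score(rating, opponent):
    """Win expectancy for `rating` against `opponent` on the elongated scale."""
    return 1 / (1 + 10 ** ((opponent - rating) / SCALE))


def lump_sum_change(rating, opponents, wins):
    """Rating change if the whole event were scored at once from the starting rating."""
    expected_wins = sum(expected_score(rating, opp) for opp in opponents)
    return K * (wins - expected_wins)


# Using the figures quoted above: 9 wins against an expectation of about 8.65.
approx_gain = K * (9 - 8.65)
```

With those assumptions `approx_gain` comes out to roughly 12.6 points, which is in the same neighborhood as the 12-point climb described above.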

Since the field is small I was able to add the results from today's matches into the system. The 6-1 records from Brian Braun-Duin and Marcio Carvalho were worth close to a hundred Elo points each! These big swings are possible because each player in the tournament has a comparably stratospheric rating, so each match is worth a lot to each participant. (In contrast, a typical Grand Prix for a player with a 2100 rating is kind of like a college football schedule: a smattering of titanic clashes interspersed with the Elo equivalent of FBS teams.) Here's a look at how each player's ranking has changed. I'll update again on Saturday night after the back half of the Swiss rounds is in the books.

rank Δ  rank  rating  name                      record
-          1    2336  Lukas Blohon              5-2
▲ 5        2    2226  Brian Braun-Duin          6-1
▲ 1        3    2197  Mike Sigrist              5-2
▼ 1        4    2175  Luis Scott-Vargas         4-3
▲ 3        5    2166  Seth Manfield             5-2
▼ 5        7    2156  Owen Turtenwald           3-4
▼ 2        8    2122  Yuuya Watanabe            3-4
▲ 20       9    2121  Marcio Carvalho           6-1
▼ 1       12    2108  Oliver Tiu                3-3-1
▲ 4       15    2097  Reid Duke                 4-3
▼ 6       20    2057  Ondrej Strasky            3-4
▼ 2       22    2047  Sam Pardee                3-4
▼ 4       27    2029  Brad Nelson               3-4
▲ 16      28    2025  Shota Yasooka             4-3
▲ 17      29    2016  Paulo Vitor Damo da Rosa  4-3
▼ 1       42    1996  Steve Rubin               3-3-1
▼ 18      43    1993  Joel Larsson              2-5
▼ 58      79    1947  Martin Muller             1-6
▲ 33      89    1937  Jiachen Tao               4-3
▲ 93     132    1902  Thiago Saporito           4-3
▼ 6      213    1858  Andrea Mengucci           3-4
▼ 82     301    1828  Ryoichi Tamada            2-5
▼ 96     365    1813  Kazuyuki Takimura         2-5
▼ 962   2017    1670  Niels Noorlander          1-6

5-2 was about par for the course for Lukas, who maintains his incredible peak. It is certainly unsustainable, but I'm captivated to see how long he can continue to hold such a high rating. As Rebecca said in the post below, we shouldn't look at Lukas's high rating as an indication that he's that much more likely to win the tournament from this position. What Elo is picking up on is that his recent results (112-38 in his last 150 matches!) are consistent with the results of a real juggernaut.

And to be fair to Niels, Elo didn't punish him too much for his 1-6 day; it only cost him about 46 points off his rating. Given the Elo ratings of the people he played, the ratings only expect that a 1716 player would manage 2.4 match wins. For comparison, a 1500 player would only expect about 1.76, and going 1-6 against that slate would only cost the 1500 player about 27 points. (These 25-to-50 point adjustments are very small. Remember our ratings are "elongated," so that a 25-point difference only corresponds to around 1.25% of win expectancy in any given match.) The fact that Niels has a large ranking delta just has to do with the fact that there are way more high-1600s players than there are players in the 1800s and above, so he fell past a big pack of people.
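To put a number on how small a 25-point gap is, here is a worked version assuming the usual logistic Elo curve with a scale S picked so that a 200-point edge means a 60% win expectancy. Both the functional form and the value of S are assumptions made for illustration; they agree with the figures quoted on this page but aren't spelled out in it.

```latex
\begin{align*}
E(d)   &= \frac{1}{1 + 10^{-d/S}}, \qquad S = \frac{200}{\log_{10} 1.5} \approx 1135.77 \\
E(25)  &= \frac{1}{1 + 10^{-25/1135.77}} \approx 0.5127 \\
E(50)  &= \frac{1}{1 + 10^{-50/1135.77}} \approx 0.5253 \\
E(200) &= \frac{1}{1 + 10^{-200/1135.77}} = 0.6
\end{align*}
```

So under these assumptions a 25-point edge is worth a bit over one percentage point per match, and even a 50-point edge is only about two and a half.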

A quick note/musing on the Worlds simulation Adam posted about below: With no disrespect to Lukas Blohon, it's obvious that in no realistic model is he 16% to win the tournament. So why did the simulation come out that way, and does it mean that the expected win percentages that our model is assuming are pretty far off?

Well, the short answer is no. When we look at all matches in our dataset between "veteran" players (three events or ten matches played), players whose rating is 195-205 points higher than their opponent have a 58.89% win percentage, which is very close to what the model expects. The problem is that at any given moment in time, the rating of any player who has just won a tournament or had a couple of deep runs is inflated a bit above their equilibrium point. When we take a snapshot of the current ratings, and then run a simulation forward only sixteen matches, the effect of that inflation is exaggerated.
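The sanity check described here boils down to taking every match where the rating gap falls in a narrow band and counting how often the favorite won. Below is a minimal sketch of that tally. The tuple layout and function name are invented for the example, and the "veteran" filter is left out, so this shows the idea and not the query that produced the 58.89% figure.

```python
def win_rate_in_band(matches, low=195, high=205):
    """Win rate of the higher-rated player when the rating gap is in [low, high].

    matches: iterable of (rating_a, rating_b, a_won) for decisive matches,
    with ratings taken as of the start of each match.
    Returns None if no match falls in the band.
    """
    wins = total = 0
    for rating_a, rating_b, a_won in matches:
        gap = abs(rating_a - rating_b)
        if not low <= gap <= high:
            continue
        favorite_won = a_won if rating_a > rating_b else not a_won
        wins += favorite_won
        total += 1
    return wins / total if total else None


# Toy data, purely to show the shape of the input.
toy = [
    (2000, 1800, True),   # favorite wins
    (1800, 2000, True),   # underdog wins
    (2100, 1900, True),   # favorite wins
    (1900, 1500, False),  # gap of 400, outside the band
    (1705, 1900, False),  # favorite wins
]
rate = win_rate_in_band(toy)
```

Running the same tally over several bands gives an empirical curve to hold up against whatever the model predicts.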

There would probably be a more "polls-plus" way to try to simulate the outcome of a given tournament, adjusting for recent big swings in rating, but it would take (a) a larger dataset (ours is still relatively small for the moment), and (b) more time. For now, if you're using our site to help make your MTG Worlds fantasy draft picks (as I have been!), take the actual match/tournament history data seriously, and the win probabilities as entertainment. ^_^

The World Championship starts later this week. Here's links to the 24 people who will be competing for the trophy. The field has seven of the top eight (sorry Scott Lipp!) and half of the top thirty.

rank  rating  name                        win%  top 4%  swiss
   1    2320  Lukas Blohon              16.55%  46.38%  8.747
   2    2200  Owen Turtenwald            8.81%  30.63%  8.000
   3    2172  Luis Scott-Vargas          7.50%  27.47%  7.833
   4    2166  Mike Sigrist               7.25%  26.77%  7.796
   6    2150  Yuuya Watanabe             6.54%  24.98%  7.699
   7    2148  Brian Braun-Duin           6.44%  24.72%  7.685
   8    2136  Seth Manfield              5.98%  23.48%  7.610
  11    2113  Oliver Tiu                 5.15%  21.17%  7.475
  14    2093  Ondrej Strasky             4.54%  19.20%  7.351
  19    2068  Reid Duke                  3.82%  17.06%  7.208
  20    2066  Sam Pardee                 3.76%  16.87%  7.193
  21    2050  Martin Muller              3.38%  15.46%  7.095
  23    2049  Brad Nelson                3.37%  15.45%  7.090
  25    2045  Joel Larsson               3.22%  15.06%  7.063
  29    2028  Marcio Carvalho            2.90%  13.85%  6.964
  41    1999  Steve Rubin                2.32%  11.72%  6.789
  44    1992  Shota Yasooka              2.22%  11.29%  6.748
  46    1990  Paulo Vitor Damo da Rosa   2.16%  11.16%  6.735
 122    1911  Jiachen Tao                1.14%   6.93%  6.259
 207    1862  Andrea Mengucci            0.75%   4.97%  5.958
 219    1856  Ryoichi Tamada             0.70%   4.74%  5.922
 225    1853  Thiago Saporito            0.68%   4.67%  5.903
 269    1839  Kazuyuki Takimura          0.61%   4.24%  5.817
1055    1716  Niels Noorlander           0.19%   1.68%  5.061

The right three columns were created by simulating the tournament one million times. They show the share of times that player wins the tournament or makes the top four, as well as his average number of wins in the Swiss portion of the tournament.

I can't claim to be 100% sure that I got the pairings algorithm correct, but I did my best. I assumed that the draft pods would be between the people in positions 1-8, 9-16, and 17-24 respectively, that the drafts are seated randomly and that pairings in the drafts are based on seat, and that you can't play someone in constructed that you've already played in constructed (disregarding format). The latter two things might not be totally accurate (I'm guessing that #1 and #2 are encouraged to play in the first round of the second draft?), but I think it's probably close enough for the numbers to be in the ballpark. The outcome of each match is decided by flipping a weighted coin whose weight is determined by the Elo win expectancy scheme. The ratings are updated after each round, so Lukas Blohon doesn't necessarily enjoy a stacked deck throughout the simulation.
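For a feel of the weighted-coin idea, here is a stripped-down sketch. It is not the simulation that produced the table: pairings here are simply random each round (no pods, no pairing by record, no top four), ratings move after every match, and the K-factor of 36 and the 200-points-means-60% scale are assumptions. The four-player field is made up apart from borrowing a few ratings from the table.

```python
import math
import random

K = 36                         # assumed K-factor
SCALE = 200 / math.log10(1.5)  # assumed: a 200-point gap means 60% expectancy


def expected_score(rating, opponent):
    return 1 / (1 + 10 ** ((opponent - rating) / SCALE))


def play_match(ratings, a, b, rng):
    """Flip the weighted coin for a vs. b, update both ratings, return the winner."""
    p_a = expected_score(ratings[a], ratings[b])
    a_won = rng.random() < p_a
    delta = K * ((1 if a_won else 0) - p_a)
    ratings[a] += delta
    ratings[b] -= delta
    return a if a_won else b


def simulate_swiss_wins(start_ratings, rounds, trials, seed=0):
    """Average wins per player over `trials` runs, pairing at random each round.

    Assumes an even number of players so that nobody gets a bye.
    """
    rng = random.Random(seed)
    players = list(start_ratings)
    totals = dict.fromkeys(players, 0)
    for _ in range(trials):
        ratings = dict(start_ratings)  # fresh copy of the snapshot each run
        for _ in range(rounds):
            rng.shuffle(players)
            for a, b in zip(players[::2], players[1::2]):
                totals[play_match(ratings, a, b, rng)] += 1
    return {p: totals[p] / trials for p in totals}


field = {"A": 2320, "B": 2100, "C": 2000, "D": 1716}
average_wins = simulate_swiss_wins(field, rounds=14, trials=2000)
```

Swapping the random pairing step for real pod and Swiss logic, and adding a top four, is where most of the actual work would be.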

Mainly I was curious as to what Lukas's 120-point lead on the field in Elo translated to on a whole-tournament scale instead of an individual-match scale. It's rather sizeable. These numbers do highlight the limitations of Elo: I don't think that it's possible that Niels Noorlander's odds are really like one in 520 as the table suggests. In FiveThirtyEight parlance, this is much more of a now-cast than polls-plus. But if you want some food for thought while you fill out a fantasy draft, here you go.

Good luck to everyone participating!

As part of a huge update fixing many typos and other small inconsistencies with the dataset, I went through David Williams's Twitter feed. Big-name players who tweet about their tournaments are invaluable. Keep it up, guys.

As part of this update I investigated: (a) every pair of names that were off by one character, (b) every person who played in two GPs that occurred simultaneously, and (c) every entry that had a parenthetical nickname. This unearthed hundreds of pairs that I felt sure enough to combine, and some that needed to be split. As with any combining effort, there's going to be some false positives (entries that should not have been merged that were) and false negatives (names I should have merged but elected not to). Many of the false positives probably came from how I aggressively merged all "Yusuke" and "Ryusuke"-s into a corresponding "Yuusuke" and "Ryuusuke". But I think overall I improved the quality of the database by a sizeable margin. The update removed 486 duplicate entries.
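Check (a), names that are off by one character, can be done without any library. The helper below is a sketch with an invented name; it treats "off by one" as a single substitution, insertion, or deletion, which is one reasonable reading of the check described above and not necessarily the exact one used.

```python
def within_one_edit(a, b):
    """True if a and b differ by exactly one substitution, insertion, or deletion."""
    if a == b or abs(len(a) - len(b)) > 1:
        return False
    if len(a) > len(b):
        a, b = b, a  # make `a` the shorter (or equal-length) string
    i = 0
    while i < len(a) and a[i] == b[i]:
        i += 1
    if len(a) == len(b):
        # Same length: skip the one mismatched character in both strings.
        return a[i + 1:] == b[i + 1:]
    # b is one longer: skip the extra character in b only.
    return a[i:] == b[i + 1:]
```

Comparing every pair of names this way is quadratic in the number of entries, so on a large list it helps to compare only names that share a first letter or a last name. A hit is a candidate for review, not proof that two entries are the same person.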

A rundown of some arbitrary decisions that I have had to make in the last day or two...

  • To graph, or not to graph...? -- I'll be honest, I'm obsessed with the Elo graphs I managed to cobble together. They're so pretty! Someday I'll even change the default colors... Anyway, right now a graph only shows up for people who have played in at least five tournaments. We just felt that a choppy plot of three line segments didn't add much to what you can easily see scrolling through a small number of tables. However, if you've played in fewer than five tournaments but still want to see what your rating looks like in graphical form, you can just look at your head-to-head against anyone; the two-person graph always displays. (For example, that link shows you Adam vs. (my favorite player) Reid Duke. Adam's got some catching up to do.)
  • Do you really care who's ranked 700th?? -- Probably the thing we've been soul-searching about the most the past few days is how comprehensive a leaderboard to display. We toyed with having it go down to #250, but it looked clunky. As a compromise for now, we kept the leaderboard at just the top 30, but on individual pages anyone whose rating is at least 1700 will display a rank. Why 1700? Arbitrary, like I said. Right now there are a little more than a thousand players at 1700 or above. Beyond that, does a ranking actually mean anything? Personally I think that the percentile table (on the leaders page) gives the right amount of information for those of us who don't happen to be among the elite. But if there's demand for a more comprehensive leaderboard of some kind, we'll come back to it in the future.

I guess that's it for now. I'm also struggling with the ideal number of characters for the autocomplete on the search forms to begin making suggestions at, but that's not an incredibly exciting discussion. Meanwhile Adam continues to spend a crazy amount of time hunting down typos in the GP data. We're getting close to being in good enough shape to start showing the site off. ^_^

Rebecca is awesome. She went from "it'd be fun to add graphs but it seems daunting" to "maybe I'll mess around with flot a bit" to having a graph generated for everyone who has played in at least five events. All in the span of a few hours.

GP Orlando 2014 was quite a struggle. First, it turned out that two different rounds had no results, with round 4 (Temur) the most egregious. Where is everybody? The page is blank. Then I realized that Temur rounds 1 and 2 included all of the Jeskai matches, and so I had to weed out the duplicates. Meanwhile, there was a mistake in the file I compiled that was meant to insert the matches into our database, and compounding that, the routine we wrote to fix the mistake itself had a mistake in it. Ugh. Very glad that's over with.

In the last few days we've built the shell that lets us query the database up into a fully functioning site. It's a good feeling to have something more tangible than Python readouts and php admin pages. At the moment there are 83 Grand Prix in the system, from Pittsburgh 2016 back to Los Angeles 2014. More will be added after we finish creating the other pages, and progress on that front will be mentioned here. This space will also be used for other forays into the data set as we explore it.