How a Jeopardy! Champion Remembers Everything

Roger Craig isn’t your average Joe.

He’s a machine learning, data science, and AI practitioner who combined his computer skills and an interest in quiz games into something extraordinary:

He won more than half a million dollars on Jeopardy!, including a whopping $77,000 in a single day.

He did this not just with raw intelligence, but by creating a system to help him fashion himself into a game-show superstar.

In this edition of Superorganizers, Roger tells us:

How he used NLP to statistically analyze the Jeopardy! Archive
How he used spaced repetition through Anki to memorize the vast amounts of knowledge he needed to win
How he uses Polarized to help process and take notes on PDFs on his computer
How he builds up a library of bookmarks to help him create a diary of everything he’s ever consumed

I’m psyched to get to the meat of the interview. Let’s dive in!

How he got started studying to compete on Jeopardy!

Everything I did for Jeopardy! was very organic. I was always into quiz games, and I had played Quiz Bowl in college.

But I didn’t go into it wanting to do everything I ended up doing. Initially I wanted to figure out, “What is the percentage of Shakespeare questions on the show?” or “What is the percentage of US Presidents?”

So I built a system to do that, and I started sharing what I had done with some of my friends. And then one of them said, “It would be great if you made a little game of solitaire where you could just play an episode.”

And it built organically from there.

How he studied for Jeopardy!

I started by downloading the Jeopardy! Archive and using that to study for the show.

Essentially, it’s too much information for someone to absorb in a traditional way. So I had to bring order to that chaos, which I did by categorizing the clues using natural language processing, or text mining.
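Roger doesn’t say which text-mining technique he used. Purely as an illustration, here is a minimal sketch of one common approach: turn each clue into a TF-IDF vector and group similar clues with spherical (cosine-based) k-means. The stopword list, the sample clues, and the function names are hypothetical; a production pipeline would more likely lean on a library such as scikit-learn.

```python
import math
import re
from collections import Counter

# Tiny illustrative stopword list (an assumption; real pipelines use a fuller list).
STOPWORDS = {"a", "an", "the", "this", "of", "in", "to", "and", "was", "is", "he", "she", "it"}


def tokenize(text):
    """Lowercase the text and keep alphabetic tokens that aren't stopwords."""
    return [w for w in re.findall(r"[a-z']+", text.lower()) if w not in STOPWORDS]


def normalize(vec):
    """Scale a sparse {term: weight} vector to unit length."""
    norm = math.sqrt(sum(w * w for w in vec.values()))
    return {t: w / norm for t, w in vec.items()} if norm else {}


def cosine(a, b):
    """Dot product of two unit-length sparse vectors, i.e. cosine similarity."""
    return sum(w * b.get(t, 0.0) for t, w in a.items())


def tfidf_vectors(docs):
    """One unit-length TF-IDF vector per document, using a smoothed IDF."""
    tokenized = [tokenize(d) for d in docs]
    df = Counter(t for toks in tokenized for t in set(toks))
    n = len(docs)
    return [
        normalize({t: c * (math.log((1 + n) / (1 + df[t])) + 1) for t, c in Counter(toks).items()})
        for toks in tokenized
    ]


def kmeans(vectors, k, iterations=10):
    """Spherical k-means: assign by cosine similarity, then recompute unit centroids."""
    # Deterministic farthest-first initialization instead of random seeding.
    centroids = [vectors[0]]
    while len(centroids) < k:
        centroids.append(min(vectors, key=lambda v: max(cosine(v, c) for c in centroids)))
    labels = []
    for _ in range(iterations):
        labels = [max(range(k), key=lambda c: cosine(v, centroids[c])) for v in vectors]
        new_centroids = []
        for c in range(k):
            summed = Counter()
            for v, label in zip(vectors, labels):
                if label == c:
                    summed.update(v)  # add this member's term weights
            # Keep the old centroid if a cluster ends up empty.
            new_centroids.append(normalize(summed) if summed else centroids[c])
        centroids = new_centroids
    return labels


# Made-up sample clues.
clues = [
    "Shakespeare wrote this tragedy about a Danish prince",
    "This Shakespeare comedy features Puck and Oberon",
    "This president signed the Emancipation Proclamation",
    "He was the first president to live in the White House",
]
labels = kmeans(tfidf_vectors(clues), k=2)
print(labels)  # [0, 0, 1, 1]
```

The two Shakespeare clues land in one cluster and the two presidential clues in the other, because they share weighted terms. Clusters come out unnamed, so a human still has to look at each one and decide what to call it.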

At a basic level, you download and scrape the data, normalize it, get it into a relational database, and then cluster everything.
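The normalize-and-load part of that pipeline might look something like the sketch below, which uses Python’s built-in `sqlite3` module. It skips downloading and scraping entirely. The table schema, the `normalize_text` and `parse_value` helpers, and the two sample rows are made up for illustration.

```python
import sqlite3
import unicodedata


def normalize_text(text):
    """Unicode-normalize (NFKC) and collapse runs of whitespace."""
    return " ".join(unicodedata.normalize("NFKC", text).split())


def parse_value(value):
    """Turn a dollar string like '$1,200' into an integer (1200)."""
    return int(value.replace("$", "").replace(",", ""))


# Made-up rows standing in for scraped data: (category, value, clue, response).
raw_clues = [
    ("shakespeare ", "$200", "This Danish  prince is the title character of a tragedy", "Hamlet"),
    ("U.S. PRESIDENTS", "$1,200", "He was the first president to live in the White House", " John Adams"),
]

conn = sqlite3.connect(":memory:")  # pass a file path instead to persist the database
conn.execute(
    """CREATE TABLE clues (
        id INTEGER PRIMARY KEY,
        category TEXT NOT NULL,
        value INTEGER,
        clue TEXT NOT NULL,
        response TEXT NOT NULL
    )"""
)
conn.executemany(
    "INSERT INTO clues (category, value, clue, response) VALUES (?, ?, ?, ?)",
    [
        (normalize_text(cat).upper(), parse_value(val), normalize_text(clue), normalize_text(resp))
        for cat, val, clue, resp in raw_clues
    ],
)
conn.commit()

rows = conn.execute("SELECT category, value, clue, response FROM clues ORDER BY value").fetchall()
for row in rows:
    print(row)
```

Once the clues sit in a table, questions like “how many $1,200 clues are about presidents?” become one SQL query instead of a manual search.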

Once you do that, you can start to see what kinds of questions get asked on the show. Once you understand the statistical distribution of questions on the show, the next question is: where are your strengths and weaknesses?
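Once each clue carries a topic label, that statistical distribution reduces to a frequency count, which also answers questions like “What percentage of clues are about Shakespeare?” A minimal sketch, assuming the topics have already been assigned (the labels below are invented):

```python
from collections import Counter


def topic_shares(topics):
    """Percentage of clues per topic, largest first, rounded to one decimal place."""
    counts = Counter(topics)
    total = sum(counts.values())
    return {topic: round(100 * n / total, 1) for topic, n in counts.most_common()}


# Invented topic labels, one per clue.
topics = ["Shakespeare", "US Presidents", "Science", "US Presidents",
          "Science", "Science", "Shakespeare", "Geography"]
shares = topic_shares(topics)
print(shares)
# {'Science': 37.5, 'Shakespeare': 25.0, 'US Presidents': 25.0, 'Geography': 12.5}
```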

So I built a front end where I could start to label the data. I could just go over these questions and say, “Do I know the answer: yes or no?”

Once you have that labeled data, you can start to build models of it, see where your strong and weak points are, and study accordingly.
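Roger doesn’t describe the models he built. One simple, hypothetical way to turn yes/no labels into a ranked list of weaknesses is per-topic accuracy with add-one (Laplace) smoothing, so a topic with a single missed clue isn’t scored as a certain 0%. The function names and labels below are invented for illustration.

```python
from collections import defaultdict


def topic_accuracy(labels, prior_correct=1, prior_total=2):
    """Per-topic share of clues you knew, with add-one (Laplace) smoothing.

    labels: iterable of (topic, knew_it) pairs from a yes/no labeling pass.
    The prior pulls small samples toward 50% instead of 0% or 100%.
    """
    correct = defaultdict(int)
    total = defaultdict(int)
    for topic, knew_it in labels:
        total[topic] += 1
        correct[topic] += int(knew_it)
    return {t: (correct[t] + prior_correct) / (total[t] + prior_total) for t in total}


def weakest_first(accuracy):
    """Topics sorted from lowest to highest estimated accuracy."""
    return sorted(accuracy, key=accuracy.get)


# Invented yes/no labels.
labels = [
    ("Opera", False), ("Opera", False), ("Opera", True),
    ("Science", True), ("Science", True), ("Science", False), ("Science", True),
    ("Sports", False),
]
accuracy = topic_accuracy(labels)
print(weakest_first(accuracy))  # ['Sports', 'Opera', 'Science']
```

Sports scores about 33% rather than 0% after one miss, yet it still ranks as the weakest topic; more labeled clues would let the estimate move further from the prior.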

Then once I had the initial labeling done, I used spaced repetition to help me study for the show. I leaned on Anki for that.

Basically, I used scripts to generate decks of cards from the data I had collected. I had tons and tons of these decks that I used to study.
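Anki can import plain-text files with one note per line and tab-separated fields, so one plausible shape for such a script writes tab-separated text from the labeled data. This sketch assumes a hypothetical row format of (category, clue, response, known) and turns only the missed clues into cards; the third column is meant to be mapped to Tags in Anki’s import dialog.

```python
import csv
import io


def deck_tsv(rows):
    """Build tab-separated text for Anki import: front, back, tags.

    rows: iterable of (category, clue, response, known), a hypothetical format.
    Only clues labeled as not known become cards.
    """
    buf = io.StringIO()
    writer = csv.writer(buf, delimiter="\t", lineterminator="\n")
    for category, clue, response, known in rows:
        if known:
            continue  # no need to drill what you already know
        front = f"{category}: {clue}"
        tag = category.lower().replace(" ", "_")  # Anki tags can't contain spaces
        writer.writerow([front, response, tag])
    return buf.getvalue()


rows = [
    ("SHAKESPEARE", "This Danish prince is the title character of a tragedy", "Hamlet", True),
    ("U.S. PRESIDENTS", "He was the first president to live in the White House", "John Adams", False),
]
deck = deck_tsv(rows)
print(deck, end="")
```

To use it, write the returned string to a UTF-8 `.txt` file and bring it in through Anki’s File > Import, mapping the columns to Front, Back, and Tags.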

Spaced repetition software builds a model of what you know, what you don’t know, and what you’re likely to forget. Then it helps you get from where you are to where you want to be by repeatedly bringing up the cards you’re most likely to forget, so they stick in your mind.
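Anki’s default scheduler descends from the SM-2 algorithm (newer versions also offer an alternative called FSRS). The sketch below is classic SM-2, not Anki’s exact scheduler: each review is graded 0 to 5, successful reviews stretch the interval by the card’s ease factor, and a failed review resets the card to a one-day interval. The `Card` class and `review` function are illustrative names.

```python
from dataclasses import dataclass


@dataclass
class Card:
    repetitions: int = 0  # consecutive successful reviews
    interval: int = 0     # days until the next review
    ease: float = 2.5     # SM-2 "E-Factor"; starts at 2.5


def review(card: Card, quality: int) -> Card:
    """Return the updated card after a review graded 0 (blackout) to 5 (perfect)."""
    if not 0 <= quality <= 5:
        raise ValueError("quality must be between 0 and 5")
    if quality >= 3:
        if card.repetitions == 0:
            interval = 1
        elif card.repetitions == 1:
            interval = 6
        else:
            interval = round(card.interval * card.ease)  # grow by the current ease
        repetitions = card.repetitions + 1
    else:
        repetitions, interval = 0, 1  # forgotten: start the sequence over
    # Easy answers raise the ease, hard ones lower it; SM-2 floors it at 1.3.
    ease = card.ease + (0.1 - (5 - quality) * (0.08 + (5 - quality) * 0.02))
    return Card(repetitions, interval, max(1.3, ease))


card = Card()
schedule = []
for quality in [5, 5, 5, 2, 4]:
    card = review(card, quality)
    schedule.append(card.interval)
print(schedule)  # [1, 6, 16, 1, 1]
```

Three perfect answers push the gap out to 16 days; one failure drops it back to a day, while the lowered ease remembers that the card gave trouble.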

You’re building a model of the present built off the past. And you’re using that to, hopefully, predict the future.
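A deliberately simplified way to write down that prediction step is an exponential forgetting curve. Assume recall probability decays as below, where t is the number of days since the last review and S is a per-card stability estimated from past reviews; both the curve’s shape and S are modeling assumptions, not Anki’s actual formulas. Schedule the next review for the moment predicted recall falls to a target r:

```latex
R(t) = e^{-t/S}, \qquad
R(t^{*}) = r \;\Longrightarrow\; t^{*} = -S \ln r

\text{Example: } S = 10 \text{ days},\; r = 0.9:\quad
t^{*} = -10 \ln 0.9 \approx 10 \times 0.105 \approx 1.05 \text{ days}
```

In models like this, each successful review raises S, so the same target r yields longer and longer gaps between reviews.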