The language of probabilities
Organized by New York Times crossword editor Will Shortz, the American Crossword Puzzle Tournament (ACPT) is the oldest and biggest tournament of its kind. This year, the event moved to a virtual format, attracting more than 1,100 contestants. And for the first time, the top scorer was not a human but an artificial intelligence system known as Dr.Fill. This unprecedented victory was born of a last-minute collaboration between the original system’s inventor, Matthew Ginsberg, and a team of engineers from the Berkeley Natural Language Processing (NLP) Group.
Ginsberg first developed the Dr.Fill program in 2012, when it finished 141st at the ACPT. Using techniques from machine learning and classic AI, Dr.Fill gradually improved its tournament performance, reaching its best ranking, 11th place, in 2017. This year, its core AI was augmented by a state-of-the-art neural question-answering system, called the Berkeley Crossword Solver (BCS), from the NLP Group, led by computer science professor Dan Klein.
The partnership began with Ph.D. student Nicholas Tomlin, who had been re-implementing Dr.Fill as a fun side project since summer 2020. In February, Klein, Ph.D. students Eric Wallace and Kevin Yang, and undergrads Albert Xu and Eshaan Pathak joined the effort. The team reached out to Ginsberg just two weeks before the competition. “It was natural to join forces,” said Klein. “Our systems were designed in a way that made it very easy to interoperate because they both speak the language of probabilities.”
According to Klein, there are two parts to solving the puzzle: first to come up with answers to the clues and then, from the answers that might work, to find which ones fit together. “The first part is really a game of language understanding, and the second is about search and reasoning.”
The BCS is a machine-learning-based system that trains a neural network model on enormous amounts of data, drawn both from past puzzles and from more general sources. The team built a new question-answering system that learned how to combine general language understanding with the kinds of creative clues that show up in crosswords. Like a human, the system knows a good deal about language before it plays its first crossword, and then gets better as it trains on each puzzle. Compared to a human, the system has less world knowledge, but it’s been trained on 6.5 million past crossword clues. The BCS’s hypotheses on each clue were then passed to Dr.Fill, which contributed expertise on crossword structure, such as how to weigh alternatives within the grid and how clever themes modify the rules for any given puzzle.
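To make the two-part idea concrete, here is a toy sketch in Python of how per-clue probabilities might be combined with a grid constraint. It is an illustration only, not the actual method used by the BCS or Dr.Fill: the slot names, candidate answers, probabilities, and the brute-force search are all invented assumptions, and a real solver would need a far more efficient search over a full grid.

```python
from itertools import product
from math import log

# Hypothetical output of a clue-answering model: for each slot,
# candidate answers with made-up probabilities.
candidates = {
    "1A": {"CAT": 0.6, "DOG": 0.3, "COW": 0.1},
    "1D": {"DEN": 0.5, "CAB": 0.4, "OAK": 0.1},
}

# Hypothetical grid constraint: letter 0 of 1-Across is the same
# square as letter 0 of 1-Down.
crossings = [("1A", 0, "1D", 0)]


def consistent(fill, crossings):
    """Return True if every crossing square holds the same letter in both answers."""
    return all(fill[a][i] == fill[b][j] for a, i, b, j in crossings)


def best_fill(candidates, crossings):
    """Brute-force the consistent fill with the highest joint log-probability."""
    slots = list(candidates)
    best, best_score = None, float("-inf")
    for combo in product(*(candidates[s] for s in slots)):
        fill = dict(zip(slots, combo))
        if not consistent(fill, crossings):
            continue
        # Treat clues as independent, so log-probabilities add.
        score = sum(log(candidates[s][w]) for s, w in fill.items())
        if score > best_score:
            best, best_score = fill, score
    return best


print(best_fill(candidates, crossings))
```

In this toy case the most likely answer for 1-Down on its own is DEN, but it cannot cross CAT, so the search settles on CAT and CAB, the pair with the highest combined probability that also fits the grid.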
Ultimately, Dr.Fill bested humans throughout the competition, which ranks contestants on accuracy and speed. In the qualifying phase, Dr.Fill beat the highest human score. For the final playoff puzzle, multiple contestants — including Dr.Fill — completed it with no errors, but Dr.Fill finished in just 49 seconds, which was two minutes and 11 seconds faster than the human winner.