EECS professor Stuart Russell. (Photo by Noah Berger)

Toward human-centric A.I.

Twenty years ago, Stuart Russell co-wrote a book titled Artificial Intelligence: A Modern Approach (AIMA), destined to become the dominant text in its field. Near the end of the book, he posed a question: “What if A.I. does succeed?”

Today, progress toward human-level artificial intelligence (A.I.) is rapid, and Russell, a professor of computer science, is posing the same question with more urgency.

The benefits of A.I. are not at issue. Safety is. If improperly constrained, Russell warns, a machine as smart as or smarter than humans “is of no use whatsoever — in fact it’s catastrophic.”

Berkeley’s new Center for Human-Compatible Artificial Intelligence, launched this August, will focus on making sure that when A.I. succeeds, its behavior is aligned with human values. Russell leads the center, which aims to establish “Research Priorities for Robust and Beneficial Artificial Intelligence” — that is, provably safe A.I.

The title is from an open letter Russell wrote last year; published by the Future of Life Institute, it emphasized the benefits of powerful artificial intelligence but argued for research that ensures those benefits “while avoiding potential pitfalls.” The letter drew more than 8,000 signatories, including many science and technology superstars, Stephen Hawking and Elon Musk among them.

One of the worst pitfalls is also one of the most ancient, Russell says, familiar from such stories as Ovid’s tale of King Midas or Goethe’s Sorcerer’s Apprentice — that of getting not what you want, but exactly what you ask for. Nick Bostrom, author of Ethical Issues in Advanced Artificial Intelligence, provides a modern parable, the “paperclip maximizer” — a smart machine designed to produce as many paperclips as possible. Turn it on, and its first act is to kill you, to prevent being turned off. Its objective leads to a future, Bostrom writes, “with a lot of paper clips, but no humans.”

Russell illustrates, standing at his office whiteboard with marker in hand. “If you think of all the objectives an intelligent system could have” — he sweeps out a wide circle — “here are the ones that are compatible with human life.” He draws a tiny circle inside the big one. “Most of the ways you can put a purpose into a machine are going to be wrong.”

Thinking about thinking

The approach taken in AIMA by Russell and co-author Peter Norvig (Ph.D.’85 EECS, now Google’s research director) links intelligence with rationality; it centers on agents that sense and act on their environments in order to achieve objectives. That’s the common meaning of intelligence in A.I. research today.

Intelligent agents can be many things: an autonomous robot, perhaps, or a search engine or a program that can best a human at chess, Go or other games. Every agent has a programmed objective. “The more intelligent it is, the more you need to be right about the objective,” says Russell. “Otherwise, the machine will find ways of achieving it that we really don’t like.”

Twitter bots illustrate the problem in miniature. Earlier this year, a web designer in the Netherlands fashioned a bot that issued tweets in his name, lifting phrases from his previous tweets. Innocent enough, until it tweeted to the site of a fashion event, “I seriously want to kill people.”

A month later, a more complex bot named Tay, a Microsoft experiment in machine learning, pursued conversations with visitors to its site. A New York Times headline neatly summed up the aftermath: “Microsoft Created a Twitter Bot to Learn From Users. It Quickly Became a Racist Jerk.” Malicious users were part of the problem, but Tay also created slurs on its own.

“Someone who releases a robot with emergent goals and behaviors is being reckless and is responsible for whatever happens,” Russell remarks. When absence of oversight or control is deliberately chosen, “lack of foreseeability isn’t a defense.”

The most dangerous example in the near term is lethal autonomous weapons, of which smart drones without humans in the loop are just one instance. To align the goals of lethal machines with contradictory human values may prove impossible. Russell is a leader in the effort to halt development of intelligent autonomous weapons.

Then there’s the notion of the technological singularity, when supersmart systems suddenly trigger “an exponential runaway beyond any hope of control,” in the words of science-fiction author Vernor Vinge, who popularized the phrase. Russell cautions that the singularity is “not a necessary consequence of achieving superintelligent A.I., nor is it a necessary component of the argument for why superintelligent A.I. poses a risk.” Rather, he says, it provides “one dramatic example of how a loss of control might occur.”

How close are we?

Human-level machine intelligence faces some formidable roadblocks. “The key thing humans have is the ability to plan over long time scales,” Russell says. “I can plan to go to Colorado to give a talk and come back on Saturday. In terms of primitive motor actions” — the kind that robots toil to master, like folding towels — “that period of time covers 500 million actions.”

Humans don’t sweat the details. “We have available a kind of library of high-level actions, like, ‘buy an airplane ticket,’ ‘go to the airport,’” says Russell, each of which breaks down into tens of thousands of motor-control commands. “We don’t yet understand how to get machines to do that, to form new high-level actions built out of more primitive actions.”

An artificially intelligent agent must discover human goals before it can share them. “That’s a formidable engineering task,” Russell says. While it’s natural for humans to suppose that A.I. systems think and feel as we do, they don’t. A human playing a board game wants to win, whereas the machine’s goal is to optimize its internal mathematical rewards.  

Last March, the four-to-one victory of Google’s AlphaGo program over Go world champion Lee Sedol came as a startling upset to many tech experts. None of the program’s components were novel. Russell had expected a program would have to be more advanced to win at Go: “Existing techniques may already be more powerful than we thought.”

Yet unlike a game’s perfect information environment, the real world is messy, dynamic and only partially observable. In the late 1990s, Russell introduced a technique called inverse reinforcement learning: an agent learns the objective that another entity — perhaps a human being — is optimizing by observing the entity's behavior.

Russell uses the example of a naive household robot watching a human get up in the morning, go downstairs to the kitchen and “do something that makes grinding sounds and steam and then this brown liquid.” Eventually the robot learns that the human likes coffee. Thenceforth it makes sure there’s coffee in the refrigerator; it may even bring the human coffee in bed. “This might seem utterly obvious to people,” Russell says, “but we consider the ability to learn the right objectives to be an important part of human intelligence, and so it should be for A.I.”
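The idea of inferring an objective from behavior can be made concrete with a toy sketch. The Python below is an illustration only, not Russell's algorithm: it assumes a small hand-written set of candidate objectives, a single repeated morning choice, and a "noisily rational" human who picks higher-reward actions more often. Every name in it (`CANDIDATE_REWARDS`, `action_probability`, `infer_objective`, the `rationality` parameter) is hypothetical. Real inverse reinforcement learning works over sequences of states and actions and a far larger space of possible objectives.

```python
import math

# Hypothetical candidate objectives the robot entertains.
# Each maps a morning action to the reward the human would get from it.
CANDIDATE_REWARDS = {
    "likes_coffee": {"make_coffee": 1.0, "make_tea": 0.0, "skip_drink": 0.0},
    "likes_tea": {"make_coffee": 0.0, "make_tea": 1.0, "skip_drink": 0.0},
    "indifferent": {"make_coffee": 0.0, "make_tea": 0.0, "skip_drink": 0.0},
}


def action_probability(reward, action, rationality=3.0):
    """Assumed choice model: P(action) is proportional to exp(rationality * reward)."""
    weights = {a: math.exp(rationality * r) for a, r in reward.items()}
    return weights[action] / sum(weights.values())


def infer_objective(observed_actions, candidates=CANDIDATE_REWARDS, rationality=3.0):
    """Return a posterior over candidate objectives, starting from a uniform prior."""
    log_posterior = {}
    for name, reward in candidates.items():
        # Sum the log-likelihood of every observed action under this objective.
        log_posterior[name] = sum(
            math.log(action_probability(reward, action, rationality))
            for action in observed_actions
        )
    # Normalize in a numerically stable way.
    highest = max(log_posterior.values())
    unnormalized = {n: math.exp(lp - highest) for n, lp in log_posterior.items()}
    total = sum(unnormalized.values())
    return {n: v / total for n, v in unnormalized.items()}


if __name__ == "__main__":
    # Five mornings in a row, the robot watches the human make coffee.
    posterior = infer_objective(["make_coffee"] * 5)
    for name, probability in sorted(posterior.items(), key=lambda kv: -kv[1]):
        print(f"{name}: {probability:.3f}")
```

With no observations the three objectives stay equally likely; each coffee-making morning shifts belief toward `likes_coffee`. The robot never sees the objective itself, only the behavior, which is the point of the technique.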

This kind of machine intelligence that can not only learn human objectives but align itself with them will be a key goal of the Center for Human-Compatible Artificial Intelligence. Every researcher in the field faces an implicit choice, says Russell: “What should A.I. be? Should A.I. be that we build objective optimizers — and let someone else figure out what the objectives should be, so we don’t wipe out the human race?”

He returns to his office whiteboard. “Or should it be that A.I. is about this?” He points to the tiny circle representing the nonlethal objectives A.I. systems could achieve. “If so, that requires a change in how we think about the field.”