Researchers at DeepMind staged a machine-versus-man Go contest in October, at the company’s offices in London:
The DeepMind system, dubbed AlphaGo, matched its artificial wits against Fan Hui, Europe’s reigning Go champion, and went undefeated in five games witnessed by an editor from the journal Nature and an arbiter representing the British Go Federation. “It was one of the most exciting moments in my career, both as a researcher and as an editor,” the Nature editor, Dr. Tanguy Chouard, said during a conference call with reporters on Tuesday.
This morning, Nature published a paper describing DeepMind’s system, which makes clever use of, among other techniques, an increasingly important AI technology called deep learning. Using a vast collection of Go moves from expert players — about 30 million moves in total — DeepMind researchers trained their system to play Go on its own. But this was merely a first step. In theory, such training only produces a system as good as the best humans. To beat the best, the researchers then matched their system against itself. This allowed them to generate a new collection of moves they could then use to train a new AI player that could top a grandmaster.
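To make that first, supervised step concrete, here is a toy sketch in Python. AlphaGo used a deep convolutional network trained on roughly 30 million expert positions; the frequency table below, built over made-up position and move labels, only illustrates the same objective of predicting the expert's next move.

```python
from collections import Counter, defaultdict

# Toy stand-in for the supervised first step: learn to predict the
# expert's next move from a position. The (position, move) pairs are
# invented labels, not real Go data.

def train_policy(expert_moves):
    policy = defaultdict(Counter)
    for position, move in expert_moves:
        policy[position][move] += 1
    return policy

def predict(policy, position):
    # Return the move the experts chose most often in this position.
    return policy[position].most_common(1)[0][0]

expert_moves = [("empty", "D4"), ("empty", "D4"), ("empty", "Q16"),
                ("after-D4", "Q16")]
policy = train_policy(expert_moves)
print(predict(policy, "empty"))    # → D4, the majority expert choice
```

A predictor like this can only imitate the players it has seen, which is exactly why, as the article notes, imitation alone tops out at human level.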
“The most significant aspect of all this…is that AlphaGo isn’t just an expert system, built with handcrafted rules,” says Demis Hassabis, who oversees DeepMind. “Instead, it uses general machine-learning techniques to learn how to win at Go.”
[...]
“Go is implicit. It’s all pattern matching,” says Hassabis. “But that’s what deep learning does very well.”
[...]
At DeepMind, Edinburgh, and Facebook, researchers hoped neural networks could master Go by “looking” at board positions, much as a human player does. As Facebook showed in a recent research paper, the technique works quite well. By pairing deep learning with Monte Carlo tree search, Facebook’s system beat some human players, though it couldn’t yet top Crazystone and other leading Go programs.
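The rollout idea at the heart of Monte Carlo tree search can be sketched in a few lines. This is a minimal, flat version of the technique, assuming a toy take-away game (remove 1–3 stones; whoever takes the last stone wins) rather than Go: each legal move is scored by the fraction of random playouts it wins.

```python
import random

# Flat Monte Carlo evaluation: score each legal move by random
# playouts and pick the one that wins most often. The game is a toy
# take-away game, not Go; the playout idea is the same.

def rollout(pile, player):
    # Finish the game with random moves; return the winner (0 or 1).
    while True:
        pile -= random.randint(1, min(3, pile))
        if pile == 0:
            return player
        player = 1 - player

def best_move(pile, n_sims=2000):
    scores = {}
    for move in range(1, min(3, pile) + 1):
        wins = 0
        for _ in range(n_sims):
            if pile - move == 0:
                wins += 1                    # taking the last stone wins
            elif rollout(pile - move, 1) == 0:
                wins += 1                    # playout won by us (player 0)
        scores[move] = wins / n_sims
    return max(scores, key=scores.get)       # highest estimated win rate

random.seed(0)
print(best_move(5))                          # → 1, leaving a multiple of 4
```

The appeal for Go is that no handcrafted evaluation function is needed; random playouts alone estimate how promising a move is, and a neural network can then steer those playouts toward sensible moves.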
But DeepMind pushes this idea much further. After training on 30 million human moves, a DeepMind neural net could predict the next human move about 57 percent of the time — an impressive number (the previous record was 44 percent). Then Hassabis and team matched this neural net against slightly different versions of itself through what’s called reinforcement learning. Essentially, as the neural nets play each other, the system tracks which move brings the most reward — the most territory on the board. Over time, it gets better and better at recognizing which moves will work and which won’t.
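The self-play loop described above can be caricatured in code. This is a crude sketch, not DeepMind's method: a table of weights on the same toy take-away game stands in for AlphaGo's neural-network policy, two copies of the policy play each other, and moves on the winning side are reinforced while the loser's are penalized. Every name and number here is illustrative.

```python
import random

# Crude self-play reinforcement learning on a toy take-away game
# (take 1-3 stones; taking the last stone wins). Winning moves gain
# weight, losing moves lose weight, and play gradually improves.

def legal(pile):
    return range(1, min(3, pile) + 1)

def choose(weights, pile, epsilon=0.2):
    if random.random() < epsilon:             # explore occasionally
        return random.choice(list(legal(pile)))
    return max(legal(pile), key=lambda m: weights[(pile, m)])

def self_play_train(games=5000, start_pile=9):
    weights = {(p, m): 0.0
               for p in range(1, start_pile + 1) for m in legal(p)}
    for _ in range(games):
        pile, player, history = start_pile, 0, {0: [], 1: []}
        while pile > 0:
            move = choose(weights, pile)
            history[player].append((pile, move))
            pile -= move
            if pile == 0:
                winner = player
            player = 1 - player
        for state_move in history[winner]:
            weights[state_move] += 1.0        # reward winning moves
        for state_move in history[1 - winner]:
            weights[state_move] -= 1.0        # penalize losing moves
    return weights

random.seed(0)
weights = self_play_train()
print(choose(weights, 2, epsilon=0.0))        # → 2: take both stones and win
```

Even this blunt scheme discovers winning moves no one showed it, which is the point Silver makes: self-play generates fresh training data beyond the human record.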
“AlphaGo learned to discover new strategies for itself, by playing millions of games between its neural networks, against themselves, and gradually improving,” says DeepMind researcher David Silver.
According to Silver, this allowed AlphaGo to top other Go-playing AI systems, including Crazystone. Then the researchers fed the results into a second neural network. Taking the moves suggested by the first network, it uses many of the same techniques to look ahead to the likely result of each move. This is similar to what older systems like Deep Blue did with chess, except that AlphaGo learns as it goes along, as it analyzes more data, rather than exploring every possible outcome through brute force. In this way, AlphaGo learned to beat not only existing AI programs but a top human as well.
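The contrast with brute force can be put in rough numbers. In the sketch below, a policy network is assumed to prune the game tree so that only its top few suggestions are expanded at each node; the branching factor of about 250 is Go's, while the depth and pruning figures are toy values chosen for illustration.

```python
# Rough node counts: exhaustive search versus policy-guided search.

def brute_force_count(branching, depth):
    # Positions examined when every move is expanded at every node.
    return sum(branching ** d for d in range(depth + 1))

def guided_count(top_k, depth):
    # Positions examined when only the policy's top_k moves are expanded.
    return sum(top_k ** d for d in range(depth + 1))

# Go offers roughly 250 legal moves per position; even 4 plies ahead:
print(brute_force_count(250, 4))   # → 3921937751, nearly 4 billion
print(guided_count(3, 4))          # → 121 positions
```

Numbers like these show why Deep Blue's exhaustive approach never transferred to Go, and why a learned sense of which moves are worth considering matters so much.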