Creating deep-learning systems is more like coaching than playing

Wednesday, March 29th, 2017

Larry Zitnick is a walking, talking, teaching symbol of how quickly deep-learning techniques have ascended:

At Microsoft, he spent a decade working to build systems that could see like humans. Then, in 2012, deep learning techniques eclipsed his ten years of research in a matter of months.

In essence, researchers like Zitnick were building machine vision one tiny piece at a time, applying very particular techniques to very particular parts of the problem. But then academics like Geoff Hinton showed that a single piece, a deep neural network, could achieve far more. Rather than code a system by hand, Hinton and company built neural networks that could learn tasks largely on their own by analyzing vast amounts of data. “We saw this huge step change with deep learning,” Zitnick says. “Things started to work.”

For Zitnick, the personal turning point came one afternoon in the fall of 2013. He was sitting in a lecture hall at the University of California, Berkeley, listening to a PhD student named Ross Girshick describe a deep learning system that could learn to identify objects in photos. Feed it millions of cat photos, for instance, and it could learn to identify a cat—actually pinpoint it in the photo. As Girshick described the math behind his method, Zitnick could see where the grad student was headed. All he wanted to hear was how well the system performed. He kept whispering: “Just tell us the numbers.” Finally, Girshick gave the numbers. “It was super-clear that this was going to be the way of the future,” Zitnick says.

Within weeks, he hired Girshick at Microsoft Research, as he and the rest of the company’s computer vision team reorganized their work around deep learning. This required a sizable shift in thinking. As a top researcher once told me, creating these deep learning systems is more like being a coach than a player. Rather than building a piece of software on your own, one line of code at a time, you’re coaxing a result from a sea of information.
