MIT Tech Review recently released its annual Top 10 Breakthrough Technologies list. Baidu’s Deep Speech 2 won in the “Conversational Interfaces” category.
Will Knight, a reporter at MIT Tech Review, wrote:
“Voice interfaces have been a dream of technologists (not to mention science fiction writers) for many decades. But in recent years, thanks to some impressive advances in machine learning, voice control has become a lot more practical.”
“In November, Baidu reached an important landmark with its voice technology, announcing that its Silicon Valley lab had developed a powerful new speech recognition engine called Deep Speech 2. It consists of a very large, or ‘deep,’ neural network that learns to associate sounds with words and phrases as it is fed millions of examples of transcribed speech. Deep Speech 2 can recognize spoken words with stunning accuracy. In fact, the researchers found that it can sometimes transcribe snippets of Mandarin speech more accurately than a person.”
Deep Speech 2 is striking because the engine essentially works as a universal speech system, learning English just as well as multiple versions of Chinese when fed enough examples. Older voice-recognition systems include many handcrafted components to aid audio processing and transcription. The Baidu system learned to recognize words from scratch, simply by listening to thousands of hours of transcribed audio. The technology relies on deep learning, which involves training a very large multilayered neural network to recognize patterns in vast quantities of data. Like Google, Baidu has been exploring artificial intelligence for use on its servers and other applications. Baidu deems AI so important that two years ago it hired Andrew Ng, who co-founded Google’s Brain Team, to be its chief scientist.
A story in the South China Morning Post described why Baidu’s breakthrough on speech recognition is a game changer: A growing number of China’s 691 million smartphone users now regularly dispense with swipes, taps and tiny keyboards when looking things up on the country’s most popular search engine, Baidu. China is an ideal place for voice interfaces to take off, because Chinese characters were hardly designed with tiny touchscreens in mind. But people everywhere should benefit as Baidu advances speech technology and makes voice interfaces more practical and useful. That could make it easier for anyone to communicate with the machines around us.
“I see speech approaching a point where it could become so reliable that you can just use it and not even think about it,” says Andrew Ng Yan-tak, Baidu’s chief scientist and an associate professor at Stanford University, in the United States. “The best technology is often invisible and, as speech recognition becomes more reliable, I hope it will disappear into the background.”