Google: Speech One of Two Big Research Projects

A couple of weeks ago MIT's Technology Review published an interview with Google Director of Research Peter Norvig that explores his (and presumably Google's) thinking about problems in search and the "next-generation" search functionality Google is working on. Speech recognition, which powers GOOG-411, is one of the company's two biggest research projects right now.

Here are some relevant excerpts from the interview:

Q: Which research has the most people and funding?

NORVIG: The two biggest projects are machine translation and the speech project. Translation and speech went all the way from one or two people working on them to, now, live systems.

Q: Like the Google Labs project called GOOG-411 [a free service that lets people search for local businesses by voice, over the phone]. Tell me more about it.

NORVIG: I think it's the only major [phone-based business-search] service of its kind that has no human fallback. It's 100 percent automated, and there seems to be a good response to it. In general, it looks like things are moving more toward the mobile market, and we thought it was important to deal with the market where you might not have access to a keyboard or might not want to type in search queries.

Q: And speech recognition can also be important for video search, can't it? Blinkx and Everyzing are two examples of startups that are using the technology to search inside video. Is Google working on something similar?

NORVIG: Right now, people aren't searching for video much. If they are, they have a very specific thing in mind like "Coke" and "Mentos." People don't search for things like "Show me the speech where so-and-so talks about this aspect of Middle East history." But all of that information is there, and with speech recognition, we can access it.

We wanted speech technology that could serve as an interface for phones and also index audio text. After looking at the existing technology, we decided to build our own. We thought that, having the data and computational resources that we do, we could help advance the field. Currently, we are up to state-of-the-art with what we built on our own, and we have the computational infrastructure to improve further. As we get more data from more interaction with users and from uploaded videos, our systems will improve because the data trains the algorithms over time.
Google's apparent commitment to speech is striking and reflects the technology's importance in mobile and, longer term, in other areas, including the desktop. Anecdotal evidence suggests users are fairly happy with the GOOG-411 service, even though it has no human-agent backup and fails to recognize queries some percentage of the time.