WEB 2.0 - Google wants your phonemes

IDG talks to Marissa Mayer, VP of Search Products & User Experience at Google.

Although Google's non-search products, like its Google Apps suite of Web-hosted collaboration and communication software, get much attention, search technology and its companion ad system and network still generate most of the company's revenue.

At last week's Web 2.0 Summit, IDG News Service caught up with Marissa Mayer, Google's vice president of Search Products & User Experience, to chat about video search, semantic versus keyword search, Google's universal search effort and the challenge of indexing the "deep Web."

What follows is an edited transcript of the interview:

IDGNS: There are different technology approaches to video search. Blinkx, for example, maintains it does it better than Google because it indexes the text of what is said in videos with speech recognition technology. Where is Google with video search today?

Mayer: Google Video has had an interesting evolution. When we first launched it, it was based on closed captions, so literally a transcription of the program, but interestingly you couldn't play video. So we changed it so that you could play video and now we're searching the meta content. That said, one of the future elements of what's likely to happen in search is around speech recognition.

You may have heard about our [US directory assistance] 1-800-GOOG411 service. Whether or not free-411 is a profitable business unto itself is yet to be seen. I myself am somewhat skeptical. The reason we really did it is because we need to build a great speech-to-text model ... that we can use for all kinds of different things, including video search.

The speech recognition experts that we have say: If you want us to build a really robust speech model, we need a lot of phonemes, which is a syllable as spoken by a particular voice with a particular intonation. So we need a lot of people talking, saying things so that we can ultimately train off of that. ... So 1-800-GOOG411 is about that: Getting a bunch of different speech samples so when you call up or we're trying to get the voice out of video, we can do it with high accuracy.

IDGNS: What about non-speech content in videos -- the action in the clip?

Mayer: That's going to be particularly hard, given that most of Google's approaches are based on text right now. So we really do need the text, which is why our inclination is to build a great speech-to-text model and pull the text out. ... That said, there are a lot of instances where humor, context, things that happen in frame don't necessarily have words, but for that we're going to have to rely on the community to do things like tagging.

There is some very early research happening around recognizing faces in videos, recognizing particular objects, understanding that hey, there's a ball in the frame right now, but it's very early and not at all ready to be deployed in a commercial application.


Juan Carlos Perez

IDG News Service
