ACM CHI: more search could be crowdsourced

Search engines could expand the range of answers they provide through simple filtering and crowdsourcing collaboration

Search engines could use crowdsourcing to expand the range of answers they give to their users, a group of researchers from Microsoft and the Massachusetts Institute of Technology have concluded.

Today, Web search engines primarily use computer-run page ranking algorithms to generate results for user-submitted queries. For a small number of simple queries, however, services return the exact answer the user is seeking. Google, for instance, could return the local show times for a movie, the weather for a certain region, or the result of a simple math problem.

This range of answers could be radically expanded through some data mining techniques and crowdsourced editing, according to M.I.T. researcher Michael Bernstein, who summarized the group's work at the Association for Computing Machinery's Conference on Human Factors in Computing Systems, being held this week in Austin, Texas.

In a trial survey with 361 participants, the researchers found that search engines, by providing more direct answers to queries, could significantly improve their users' perceptions of search quality, especially for those queries that did not return many relevant pages. "Our findings suggest that search engines can be extended to directly respond to a large new class of queries," stated the paper describing the work, entitled "Direct Answers for Search Queries in the Long Tail."

The range of answers search engines provide could be radically expanded at relatively minimal additional cost, the researchers argued. The key would be to harness the power of crowdsourcing, or contracting people to identify the answers to simple but frequently asked questions.

Today, search engines will only provide direct answers to a small subset of queries, namely those that get asked often. In these cases, the search engine provides the actual answer to the question, rather than just a link to where the answer could be found. With such popular questions, search engine companies find it worthwhile to devote engineers to manually craft program code to identify each question, and then find and supply the answers. "These kinds of answers are only available to popular queries, because search engines have to put a lot of effort into them," Bernstein said.

The number of direct answers provided to users could be expanded at minimal cost, the researchers argue. About 50 percent of the queries that search engines get are completely novel, Bernstein said. But the rest are questions that are asked repeatedly. At least some of these queries have answers that can easily be generated and then checked through some simple crowdsourcing.

"We are focusing on a set of queries that are somewhat popular," Bernstein said. "We can create thousands of these answers." In the future, a search service could provide direct answers to many additional questions, such as how to shut down a stalled Apple Mac computer, what the average body temperature is for a dog, how to bake a potato, or how to play the Rummy 500 card game.

In a trial experiment, the researchers had data mining software comb through 75 million search queries from Microsoft's Bing search engine, looking for those queries that resulted in a click-through to a single site. They then identified those queries that could be succinctly answered and contracted workers to quickly craft simple answers and proofread the work. They found these workers through Amazon's Mechanical Turk, by way of a third-party service called CrowdFlower.
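The first filtering step described above, finding queries whose clicks land on a single site, can be sketched roughly as follows. This is an illustration only, not the researchers' software: the log format (pairs of query and clicked URL), the function name `single_site_queries`, and the thresholds `min_clicks` and `min_share` are all assumptions made for the example.

```python
from collections import Counter, defaultdict
from urllib.parse import urlparse


def single_site_queries(click_log, min_clicks=3, min_share=0.9):
    """Return {query: site} for queries whose clicks concentrate on one site.

    click_log is an iterable of (query, clicked_url) pairs (assumed format).
    min_clicks and min_share are hypothetical thresholds, not from the paper.
    """
    sites_by_query = defaultdict(Counter)
    for query, url in click_log:
        site = urlparse(url).netloc.lower()
        # Normalize the query so trivial case differences are merged.
        sites_by_query[query.strip().lower()][site] += 1

    result = {}
    for query, sites in sites_by_query.items():
        total = sum(sites.values())
        top_site, top_count = sites.most_common(1)[0]
        # Keep queries asked often enough whose clicks mostly hit one site.
        if total >= min_clicks and top_count / total >= min_share:
            result[query] = top_site
    return result


# Made-up sample log for illustration.
sample_log = [
    ("dog body temperature", "http://pets.example.com/dog-temp"),
    ("dog body temperature", "http://pets.example.com/dog-temp"),
    ("Dog body temperature", "http://pets.example.com/dog-temp"),
    ("bake a potato", "http://a.example.com/potato"),
    ("bake a potato", "http://b.example.com/potato"),
    ("bake a potato", "http://c.example.com/potato"),
]
candidates = single_site_queries(sample_log)
```

In this sample, only "dog body temperature" qualifies, because every click for it goes to the same site; clicks for "bake a potato" are spread across three sites. The candidates that survive such a filter would then go to human workers for answer writing and proofreading.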

By automating as much of the process of creating the content as possible, search engines can keep their costs minimal. Search engines could contract out the manual labor on a piecemeal basis, using services such as Amazon's Mechanical Turk. The researchers identified about 20,000 queries that could easily be provided with answers. They estimated it would cost search engines about 44 cents to provide a simple answer for each query.
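As a rough back-of-the-envelope check of what those figures imply in total, the sketch below multiplies the two numbers. It assumes the per-answer figure is read as 44 US cents (US$0.44) and that every one of the roughly 20,000 queries receives an answer; the function name is hypothetical.

```python
def total_cost_cents(num_answers, cents_per_answer):
    """Total crowdsourcing cost in whole cents (integer math avoids float error)."""
    return num_answers * cents_per_answer


# Assumed inputs: about 20,000 queries at 44 cents per answer.
total = total_cost_cents(20_000, 44)
print(f"${total / 100:,.2f}")  # prints $8,800.00
```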

Bernstein admitted that this approach, should it be used, would raise a number of issues. For one, search engines would have to filter out incorrect information somehow. Also, search engines would risk the ire of Web site owners, who would complain that the answers deprive them of Internet traffic, because the search engine itself is providing the answer. "We have to ask ourselves whether we are going too far," he said.

Joab Jackson covers enterprise software and general technology breaking news for The IDG News Service. Follow Joab on Twitter at @Joab_Jackson.
