Brain behind IBM's Watson not unlike a human's

Like humans, Watson only uses a fraction of its memory to generate answers to Jeopardy questions

IBM's Watson supercomputer, which shellacked Jeopardy's top human champions during airings of the game show this week, is powered by 90 servers and a network-attached storage (NAS) cluster with 21.6TB of storage capacity.

In the end, though, its brain only has 80 per cent of the processing power of a human brain.

Tony Pearson, master inventor and senior consultant at IBM, explained that Watson uses only about 1TB of data to produce real-time answers to Jeopardy questions. IBM configured Watson's back-end storage as RAID and then culled that data further before loading it into the clustered server system's memory.

Pearson cited the estimate of technology futurist and author Ray Kurzweil that the human brain can hold about 1.25TB of data, and performs at roughly 100 teraflops. In comparison, Watson is an 80-teraflop system with 1TB of memory.

"So it's 80 per cent human," Pearson said. "Yes, we could have handled a lot more information. We could have put more memory in each server, but once we got the answers to three seconds, we didn't need to go further."

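Pearson's "80 per cent human" figure is simply the ratio of Watson's numbers to Kurzweil's estimates, and the processing and memory ratios happen to come out identical. A quick check in Python, using only the figures quoted above:

```python
# Figures quoted in the article: Kurzweil's estimates for the human brain
# and IBM's numbers for Watson.
HUMAN_TERAFLOPS = 100.0
HUMAN_MEMORY_TB = 1.25
WATSON_TERAFLOPS = 80.0
WATSON_MEMORY_TB = 1.0


def percent_of_human(watson_value: float, human_value: float) -> float:
    """Return Watson's figure as a percentage of the human estimate."""
    return 100.0 * watson_value / human_value


compute_pct = percent_of_human(WATSON_TERAFLOPS, HUMAN_TERAFLOPS)
memory_pct = percent_of_human(WATSON_MEMORY_TB, HUMAN_MEMORY_TB)
print(f"processing: {compute_pct:.0f}%, memory: {memory_pct:.0f}%")
```

Both ratios print as 80%, which is where the "80 per cent human" line comes from.
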
Pearson explained that reaching the three-second answer threshold was just a matter of simple mathematics.

The original algorithm, running single-threaded on a single-core processor, took two hours to scan memory and produce an answer. So IBM's technologists simply divided two hours by 2,880 processor cores, which brought the time to answer a question to under three seconds.
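
This "simple mathematics" assumes perfectly linear scaling: the work splits evenly across every core with no coordination overhead. Real parallel workloads rarely divide this cleanly, so treat the result as a best case. Under that assumption, the arithmetic looks like this:

```python
SERIAL_SECONDS = 2 * 60 * 60  # two hours on a single core, per Pearson
CORES = 2_880                 # 90 servers x 32 cores each


def ideal_parallel_time(serial_seconds: float, workers: int) -> float:
    """Ideal (perfectly linear) speedup: split the work evenly across workers.

    Coordination costs and uneven work make real systems slower, so this
    is a lower bound on answer time, not a prediction.
    """
    return serial_seconds / workers


answer_seconds = ideal_parallel_time(SERIAL_SECONDS, CORES)
print(f"{answer_seconds:.1f} seconds per answer")
```

Seven thousand two hundred seconds spread across 2,880 cores works out to 2.5 seconds, just inside the three-second target.
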

If IBM's Watson were just some other human Jeopardy contestant, viewers probably would have tuned out in the midst of such a landslide victory. Instead, interest in the man vs. machine battle gave the show its highest ratings in nearly six years.

Competition between humans and computers has long captured the public's imagination. Remember the 1996 chess match between IBM's Deep Blue computer and the reigning world champion Garry Kasparov?

In this case, though, Watson has more in common with humans than Deep Blue did. Like us, he only uses a small percentage of his massive memory capacity to answer questions.

Behind the simple, scribble-faced monitor he used as a Jeopardy contestant, Watson runs on 90 IBM Power 750 Express servers, each with four 8-core processors, for a total of 32 processor cores per machine. The servers are virtualized using a Kernel-based Virtual Machine (KVM) implementation, creating a server cluster with a total processing capacity of 80 teraflops. A teraflop is one trillion floating-point operations per second.
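
The 2,880 cores used in the three-second calculation fall straight out of this configuration. The sketch below multiplies out the reported figures and, as a rough illustration only, spreads the quoted 80 teraflops evenly across the cores; that per-core number is a simple average, not a measured benchmark.

```python
# Hardware figures as reported in the article.
SERVERS = 90
PROCESSORS_PER_SERVER = 4
CORES_PER_PROCESSOR = 8
CLUSTER_TERAFLOPS = 80  # 1 teraflop = 10**12 floating-point operations/second

cores_per_server = PROCESSORS_PER_SERVER * CORES_PER_PROCESSOR  # 32
total_cores = SERVERS * cores_per_server                         # 2,880

# Average share of the cluster's quoted capacity per core, in gigaflops
# (1 teraflop = 1,000 gigaflops).
gflops_per_core = CLUSTER_TERAFLOPS * 1_000 / total_cores

print(f"{cores_per_server} cores/server, {total_cores} cores total, "
      f"~{gflops_per_core:.1f} GFLOPS per core")
```
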

On top of that processing power, each server has 160GB of DRAM, giving the full machine almost 15TB of memory.
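
The "almost 15TB" figure is the per-server DRAM multiplied across the cluster. Using decimal units (1TB = 1,000GB), a quick check:

```python
SERVERS = 90
DRAM_PER_SERVER_GB = 160

total_gb = SERVERS * DRAM_PER_SERVER_GB  # memory across the whole cluster
total_tb = total_gb / 1_000              # decimal terabytes
print(f"{total_gb:,} GB = {total_tb} TB")
```

That comes to 14.4TB, which the article rounds to "almost 15TB".
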

On Watson's back end is IBM's SONAS, which is built on IBM's General Parallel File System (GPFS). SONAS, or Scale-Out NAS, is a Linux-based clustered file system that IBM released almost exactly one year ago.

The clustered storage model delivers massive throughput because it cobbles many storage servers together, which increases the port count. The result is a single pool of disks and processors that all work on a similar task and can all share the same data through a single global namespace. In other words, all of the disk drives appear as one big pool of storage capacity from which Watson can draw.
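
Clustered file systems commonly achieve this by splitting files into blocks and spreading those blocks across many disks, so a single path in the namespace can be read from all disks in parallel. The sketch below is a hypothetical toy model of that idea: the `StoragePool` class and its round-robin block placement are illustrative assumptions, not SONAS's or GPFS's actual data layout or API.

```python
from dataclasses import dataclass, field


@dataclass
class StoragePool:
    """Toy model of many disks presented as one pool under one namespace.

    Hypothetical illustration only: each file is split into fixed-size
    blocks, and the blocks are placed round-robin across the disks.
    """
    num_disks: int
    block_size: int
    disks: list = field(init=False)                  # one list of blocks per disk
    namespace: dict = field(default_factory=dict)    # path -> [(disk, slot), ...]

    def __post_init__(self) -> None:
        self.disks = [[] for _ in range(self.num_disks)]

    def write(self, path: str, data: bytes) -> None:
        placements = []
        for i in range(0, len(data), self.block_size):
            disk = (i // self.block_size) % self.num_disks  # round-robin choice
            self.disks[disk].append(data[i:i + self.block_size])
            placements.append((disk, len(self.disks[disk]) - 1))
        self.namespace[path] = placements

    def read(self, path: str) -> bytes:
        # Reassemble the file from its blocks, wherever they live.
        return b"".join(self.disks[d][slot] for d, slot in self.namespace[path])


pool = StoragePool(num_disks=4, block_size=3)
pool.write("/watson/corpus.txt", b"one big pool of disks")
print(pool.read("/watson/corpus.txt"))
```

The caller only ever sees one path; which disks hold the blocks is the pool's business, which is the "one big pool" effect described above.
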

Watson's SONAS is populated with 48 450GB serial ATA (SATA) hard drives, for a total of 21.6TB of raw capacity. In a RAID 1 (mirrored) configuration, that leaves 10.8TB of usable space, which holds the data Watson loads every time it boots up. Three terabytes of that, however, is used for the operating system and applications.
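
Mirroring stores every block on two drives, so usable capacity is half the raw total. Using decimal units, the figures above work out as follows:

```python
DRIVES = 48
DRIVE_GB = 450
MIRROR_COPIES = 2  # RAID 1 writes every block to two drives

raw_tb = DRIVES * DRIVE_GB / 1_000  # combined capacity of all drives
usable_tb = raw_tb / MIRROR_COPIES  # what remains once every block is mirrored
print(f"raw: {raw_tb} TB, usable after mirroring: {usable_tb} TB")
```
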

But it's not SONAS's disk-based storage that makes Watson so darned fast; it's the CPUs and memory. Every time Watson boots, the 10.8TB of data is automatically loaded into Watson's 15TB of RAM, and of that, only about 1TB is scanned for use in answering Jeopardy questions, Pearson said.
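
To put "only about 1TB" in proportion, here is the share of Watson's quoted 15TB of RAM that actually gets scanned for answers:

```python
MEMORY_TB = 15  # Watson's RAM as quoted (90 x 160GB is about 14.4TB)
SCANNED_TB = 1  # data scanned for use in answering questions

fraction = SCANNED_TB / MEMORY_TB
print(f"{fraction:.1%} of memory is scanned for answers")
```

That is under 7 per cent, the "small percentage" of memory the human comparison rests on.
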

In case you're wondering, 1TB of capacity is still quite significant; it can hold 220 million pages of text or 111 DVDs.

"The amazing thing is that you can get all those answers with such a small data set," said John Webster, an analyst with the research firm Evaluator Group. "After multiple iterations of loading and testing and loading and testing and updating the database on SONAS, IBM came up with a version of the database that would generate the data set that got loaded into memory."

Enter Australian computer programmer and Samba developer Andrew Tridgell.

Tridgell created the computer algorithm running on top of Watson's hardware that culls out the data set. He developed the open-source Clustered Trivial Database (CTDB), which the Samba file-sharing software uses to access the memory across Watson's 90 servers simultaneously.

More importantly, CTDB makes sure the servers don't step on each other when they update information after a Jeopardy show.
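
The idea of servers not stepping on each other can be illustrated with per-record locking: a server must hold a record's lock before it reads and rewrites that record, so concurrent updates cannot clobber one another. The Python sketch below simulates 90 servers as threads. `RecordStore` and `server` are hypothetical names, and this is a simplified stand-in for the concept, not how CTDB itself is implemented.

```python
import threading
from collections import defaultdict


class RecordStore:
    """Hypothetical shared key-value store with one lock per record.

    A writer holds the record's lock for its whole read-modify-write,
    so two writers updating the same record cannot lose each other's changes.
    """

    def __init__(self) -> None:
        self._data = {}
        self._locks = defaultdict(threading.Lock)
        self._locks_guard = threading.Lock()  # protects creation of per-key locks

    def _lock_for(self, key: str):
        with self._locks_guard:
            return self._locks[key]

    def update(self, key: str, fn) -> None:
        # Apply fn to the current value while holding that record's lock.
        with self._lock_for(key):
            self._data[key] = fn(self._data.get(key))

    def get(self, key: str):
        return self._data.get(key)


def server(store: RecordStore, server_id: int, rounds: int) -> None:
    # Each simulated server bumps a shared counter, then marks its own record.
    for _ in range(rounds):
        store.update("answers_reviewed", lambda v: (v or 0) + 1)
    store.update(f"server-{server_id}", lambda _: "done")


store = RecordStore()
threads = [threading.Thread(target=server, args=(store, i, 1_000)) for i in range(90)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(store.get("answers_reviewed"))
```

Locking per record rather than for the whole store lets servers that touch different records proceed in parallel, while updates to the same record are serialized.
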

During the show, Watson is read-only, meaning nothing gets written to its back-end SONAS. After the show, Watson is powered down and the computer scientists go to work updating information and debugging it, trying to figure out why it gave erroneous answers, such as choosing Toronto as the answer to a question about an American city.

"I'm sure they're scratching their head on that one," Pearson said.

Lucas Mearian covers storage, disaster recovery and business continuity, financial services infrastructure and health care IT for Computerworld. Follow Lucas on Twitter at @lucasmearian, or subscribe to Lucas's RSS feed. His e-mail address is lmearian@computerworld.com.

Lucas Mearian

Computerworld (US)