Hopkins to build data analysis super machine

Johns Hopkins redefines the rules for building supercomputers

Disregarding the supercomputing community's insatiable thirst for FLOPS (floating point operations per second), the Baltimore-based Johns Hopkins University is configuring its new machine to achieve the maximum number of IOPS (I/O operations per second) instead.

The novel design will be better suited to the kind of data-mining-oriented scientific workloads processed by today's supercomputers, argued Alexander Szalay, a computer scientist and astrophysicist at Johns Hopkins' Institute for Data Intensive Engineering and Science, who is leading the project.

"For the sciences, it is the I/O that is becoming the major bottleneck," he explained. "People are running larger and larger simulations, and they take up so much memory, it is difficult to write the output to disk."

The U.S. National Science Foundation (NSF) has provided US$2.1 million for the system, called Data-Scope. Hopkins itself is contributing $1 million as well.

Thus far, 20 research groups within Hopkins have indicated they could use the system to study problems in genomics, ocean circulation, turbulence, astrophysics and environmental science. The university will also allow outside organizations to use the machine. Data-Scope is expected to go live by next May.

FLOPS measures the number of floating-point calculations a computer can perform in a second, a key capability for analyzing large amounts of data. IOPS, by contrast, measures how many read and write operations a computer can complete in a second, which governs how quickly data can be moved on and off the machine.

By maximizing IOPS, the new system will "enable data analysis tasks that are simply not possible today," the researchers stated in the proposal.

Today, most researchers are limited to analyzing datasets of up to 10 terabytes, while larger datasets, such as those of 100 terabytes or more, can be investigated only by a handful of the largest supercomputers. Hopkins' novel configuration of hardware might offer a lower-cost way to analyze such big datasets, Szalay said.

The machine, once built, will have a total I/O bandwidth of 400 to 500 gigabytes per second, more than twice that of Oak Ridge National Laboratory's Jaguar, the fastest computer on the Top 500 ranking of the world's most powerful computers.

Data-Scope, however, will offer a peak performance of only about 600 teraflops, far short of Jaguar's 1.75 petaflops.

In Hopkins' design, each server will have 24 dedicated hard disk drives and four solid-state disks. Together they can deliver 4.4 gigabytes per second across the chassis bus directly to two GPUs (graphics processing units), which will do many of the calculations.

Overall, the system will have about 100 of these machines and about five petabytes in storage total.
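The per-server and system-wide figures quoted above can be cross-checked with simple arithmetic. The following is a minimal sketch in Python that uses only the numbers in the article. The constant and function names are illustrative, and decimal units (1 petabyte = 1,000 terabytes) are an assumption.

```python
# Figures quoted in the article; names are illustrative, not from the project.
PER_SERVER_GB_PER_S = 4.4   # disk-to-GPU throughput of one server
SERVER_COUNT = 100          # "about 100 of these machines"
TOTAL_STORAGE_PB = 5        # "about five petabytes in storage total"


def aggregate_bandwidth_gb_per_s(per_server_gb_per_s, servers):
    """Total I/O bandwidth if every server streams at its full rate."""
    return per_server_gb_per_s * servers


def storage_per_server_tb(total_pb, servers):
    """Average storage per server, assuming 1 PB = 1,000 TB."""
    return total_pb * 1000 / servers


total_bw = aggregate_bandwidth_gb_per_s(PER_SERVER_GB_PER_S, SERVER_COUNT)
per_server_tb = storage_per_server_tb(TOTAL_STORAGE_PB, SERVER_COUNT)

print(f"Aggregate bandwidth: about {total_bw:.0f} GB/s")
print(f"Storage per server: about {per_server_tb:.0f} TB")
```

Roughly 440 gigabytes per second falls inside the 400 to 500 gigabytes per second range given for the whole machine, so the per-server and total figures are consistent with each other.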

To guide the design, the team used a rule of thumb devised by computer scientist Gene Amdahl. Ideally, Amdahl posited, a computer should have one bit of I/O ready for each instruction it executes.

Most supercomputer architects have disregarded this rule, claiming the processor caches can bank data and have it ready for use when needed. Now that datasets have grown so large, Amdahl's rule should be reconsidered, Szalay argued.

A typical Amdahl number for a supercomputer would be about 0.001, or a thousandth of the optimal balance, whereas Data-Scope should have an Amdahl number of about 0.6 or 0.7.
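Amdahl's rule as described here amounts to a ratio: bits of I/O delivered per second divided by instructions executed per second, with 1.0 as the ideal balance. The Python sketch below illustrates that ratio. The function name and both example machines are hypothetical, and the article does not say which instruction rate the designers used for Data-Scope, so the inputs are not a reconstruction of its reported figure.

```python
def amdahl_number(io_bytes_per_sec, instructions_per_sec):
    """Bits of I/O per second divided by instructions per second.

    A value of 1.0 means one bit of I/O is available for every
    instruction executed, the balance described by Amdahl's rule.
    """
    if instructions_per_sec <= 0:
        raise ValueError("instructions_per_sec must be positive")
    return io_bytes_per_sec * 8 / instructions_per_sec


# Hypothetical machines, chosen only to illustrate the scale of the ratio.
compute_heavy = amdahl_number(io_bytes_per_sec=125e6, instructions_per_sec=1e12)
balanced = amdahl_number(io_bytes_per_sec=125e9, instructions_per_sec=1e12)

print(f"Compute-heavy machine: {compute_heavy:.3f}")
print(f"Balanced machine:      {balanced:.3f}")
```

With the same instruction rate, the second machine needs a thousand times the I/O bandwidth of the first to reach the ideal balance, which is the gap the Data-Scope design is trying to narrow.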

The designers also plan to make some changes in the way databases are used. "We don't use the database just as dump storage but as an active computing environment," Szalay said. Instead of moving data from a database across a network to a cluster of servers, researchers can write user-defined functions that can run against the database itself.
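The idea of running a user-defined function against the database, rather than exporting rows to a separate cluster, can be sketched with Python's built-in sqlite3 module. SQLite is only a stand-in here. The article does not describe the Data-Scope schema or its functions, so the table, column and function names below are hypothetical.

```python
import math
import sqlite3


def log_flux(flux):
    """Hypothetical per-row computation, evaluated by the database engine."""
    if flux is None or flux <= 0:
        return None
    return math.log10(flux)


# In-memory database with a small, made-up table of measurements.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE observations (id INTEGER PRIMARY KEY, flux REAL)")
conn.executemany(
    "INSERT INTO observations (flux) VALUES (?)",
    [(10.0,), (100.0,), (1000.0,)],
)

# Register the Python function so that SQL queries can call it by name.
conn.create_function("log_flux", 1, log_flux)

# The function runs inside the query; only the aggregate comes back.
mean_log_flux = conn.execute(
    "SELECT AVG(log_flux(flux)) FROM observations"
).fetchone()[0]

print(mean_log_flux)
conn.close()
```

The query returns a single aggregate value instead of every row, which is the point of the approach: the computation travels to the data, and only the small result leaves the database.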

Researchers can boot one of three images on the system: Windows Server 2008, a combination of Linux and MySQL, and a third image running Hadoop.

Data-Scope will be housed in a new campus green data center being built with $1.3 million in funding from the NSF.

Joab Jackson covers enterprise software and general technology breaking news for The IDG News Service. Follow Joab on Twitter at @Joab_Jackson. Joab's e-mail address is Joab_Jackson@idg.com

