Purdue app slows servers when cooling fails

An administrator from Purdue University has developed software that can slow servers when the air conditioning goes out

While chip manufacturers continue to make their processors ever more powerful, at least one customer has found it useful to slow these chips down, if only long enough to keep them running when the data center air conditioning falters.

Patrick Finnegan, a systems administrator at Purdue University, has developed software that slows the clock speed of server processors, a throttling that reduces the heat they produce.

"Previously our only options were to put in a few large fans and hope that was enough, or start turning servers off," said Mike Shuey, who oversees Purdue's supercomputers. "This software gives us a middle ground that gets us by many outages."

Purdue is now reselling the software for US$250 through FolioDirect, an e-commerce service for educational institutions.

With most commodity servers, once the ambient temperature reaches a certain point, usually around 32 degrees Celsius (about 90 degrees Fahrenheit), they will automatically shut off to prevent damage from overheating. Smart administrators will turn them off ahead of that, if only to allow a graceful shutdown.

In the world of academic supercomputing, though, these restarts can be deadly. Purdue's clusters run many serial jobs that can take days, weeks, or even months to complete. And while some programs have frequent setpoints to which they can return that are close to where they were at shutdown, many do not. One Purdue researcher, for instance, runs atmospheric climate models that can require four months of continuous computing time.

"If our only recourse to survive an outage is to start turning off machines, we can throw away from two to three million CPU hours of work," Shuey said. "It can take weeks and weeks of run time just to get back to the state we were in the minute before we turned things off."

In contrast, by throttling back the servers, the programs are slowed, but no work is lost.

Finnegan built the software using a clock frequency scaling driver available for the Linux kernel, which can control both Intel and AMD chipsets with frequency scaling capabilities. The software also relies on Altair job scheduling software as well as a set of cluster management tools from the U.S. Department of Energy's Oak Ridge National Laboratory.
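
As an illustration of the kernel interface involved (this is not Finnegan's code, which the article does not show), the Linux cpufreq subsystem exposes per-processor frequency limits as files under /sys/devices/system/cpu. The Python sketch below caps each processor at its lowest supported frequency. The function name and the overridable root path are assumptions made for the example, and a real cluster tool would also have to coordinate with the job scheduler and reach every node.

```python
from pathlib import Path

# Standard location of the Linux cpufreq sysfs tree. It is a parameter
# below so the function can be tried against a scratch directory first.
SYSFS_CPU_ROOT = Path("/sys/devices/system/cpu")


def throttle_to_minimum(cpu_root: Path = SYSFS_CPU_ROOT) -> dict:
    """Cap every CPU's maximum frequency at its hardware minimum.

    Returns a mapping of CPU name (e.g. "cpu0") to the new cap in kHz.
    Writing to the real sysfs tree requires root privileges.
    """
    applied = {}
    for cpufreq_dir in sorted(cpu_root.glob("cpu[0-9]*/cpufreq")):
        # cpuinfo_min_freq: lowest frequency the hardware supports (kHz)
        min_khz = int((cpufreq_dir / "cpuinfo_min_freq").read_text().strip())
        # scaling_max_freq: upper limit the frequency governor may select (kHz)
        (cpufreq_dir / "scaling_max_freq").write_text(f"{min_khz}\n")
        applied[cpufreq_dir.parent.name] = min_khz
    return applied
```

Run as root on a node with frequency scaling support, throttle_to_minimum() returns the cap applied to each processor. Restoring full speed would mean writing the value of cpuinfo_max_freq back to the same scaling_max_freq file.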

As far as Shuey knows, no other software is available to do this task, either open source or commercial, at least for large clusters of servers.

Overall, the Purdue data center runs around 15,000 processors, mostly across two supercomputer clusters. One, called Coates, supplied by Hewlett-Packard, runs just under 8,000 processors from AMD. The other, a Dell-supplied configuration nicknamed Steele, runs 5,600 Intel processors.

The Purdue team estimates that power usage by processors can be cut by as much as 10 percent on Intel processors and by as much as 30 percent on AMD processors. The amount of power a server uses usually directly correlates to the amount of cooling needed.
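
To see what those per-processor figures could mean across both clusters, the back-of-the-envelope Python calculation below blends them by processor count. It rests on two assumptions that do not come from Purdue: that every processor draws the same power at full speed, and that Coates' "just under 8,000" can be rounded to 8,000. The result is an upper bound on processor power savings, not a measurement.

```python
def blended_savings(groups):
    """Weight each group's fractional savings by its processor count.

    groups: list of (processor_count, max_fractional_savings) pairs.
    Assumes equal full-speed power draw per processor (an illustration only).
    """
    total = sum(count for count, _ in groups)
    saved = sum(count * frac for count, frac in groups)
    return saved / total


# Counts and "as much as" percentages as reported in the article.
clusters = [
    (8000, 0.30),  # Coates, AMD (rounded up from "just under 8,000")
    (5600, 0.10),  # Steele, Intel
]
upper_bound = blended_savings(clusters)
print(f"Blended upper bound: {upper_bound:.1%}")  # prints "Blended upper bound: 21.8%"
```

Under those assumptions, throttling both clusters fully would trim processor power draw by a little over a fifth at most.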

"They may lose 70 to 80 percent performance, but we get a 30 percent power savings," Shuey said.

At least in its current incarnation, the data center's plan for cooling outages still requires a human in the loop.

The facility is cooled by chilled water piped in from the school's main cooling plant. The optimal temperature for the building is about 21 degrees Celsius (or about 70 degrees Fahrenheit). The data center uses an APC temperature monitoring system, which sets off alarms should the temperature go above 26 degrees Celsius (or about 80 degrees Fahrenheit). Should the alarm go off, the administrator can use the software console to throttle back the servers.
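
The thresholds in this procedure can be written down as a simple rule. The Python sketch below is hypothetical: the function name and action labels are invented for the example, the 32-degree figure is the approximate automatic shutoff point cited earlier for commodity servers, and at Purdue the decision to throttle is made by an administrator, not by code.

```python
ALARM_C = 26.0    # APC alarm threshold cited in the article
SHUTOFF_C = 32.0  # approximate automatic shutoff point for commodity servers


def recommended_action(temp_c: float) -> str:
    """Suggest a response to an ambient temperature reading in Celsius."""
    if temp_c >= SHUTOFF_C:
        # Servers would be shutting themselves off at this point anyway.
        return "shutdown"
    if temp_c > ALARM_C:
        # Alarm range: slowing the processors may buy time without losing work.
        return "throttle"
    return "normal"


for reading in (21.0, 27.5, 32.0):
    print(reading, recommended_action(reading))
```

Keeping the output as a recommendation rather than an automatic action mirrors the human-in-the-loop arrangement the data center describes.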

Since Finnegan wrote the software earlier this year, the school, located in West Lafayette, Indiana, has had to cut back server speeds twice, due to a combination of planned maintenance-related outages and a hotter-than-usual summer. Both times, the throttling worked as planned.

"The compute jobs slowed down, but the data center temperatures dropped," Shuey said. "It's much better to have jobs run slowly for an hour rather than throw away everyone's work in progress and mobilize staff to try to fix things."

Joab Jackson covers enterprise software and general technology breaking news for The IDG News Service. Follow Joab on Twitter at @Joab_Jackson. Joab's e-mail address is Joab_Jackson@idg.com
