CERN modernizes IT infrastructure with OpenStack and Puppet

But the research organization will remain faithful to tape storage

CERN's datacenter is modernizing with OpenStack and Puppet.

CERN is making the infrastructure that handles the data from the Large Hadron Collider (LHC) more flexible by upgrading it with OpenStack for virtualization and Puppet for configuration management.

The research organization's objective is to change how it provides services to scientists working at the LHC, which runs in a 27-kilometer circular tunnel about 100 meters beneath the Swiss and French border at Geneva.

"One of the things we have to contend with is how to scale our infrastructure fairly significantly with a fixed staff and fixed costs. With a fixed budget you can buy more and more equipment, but you can't provide more and more services with the same number of people," said Ian Bird, LHC computing grid project leader.

But that may be possible if you change the way things are done. CERN's goal is to become more efficient by moving toward infrastructure-as-a-service and platform-as-a-service with a private cloud, so that it can change more dynamically how the infrastructure is used. Right now the accelerator is shut down, so the CERN data center has a different workload than it did last year, when the LHC was running, according to Bird.

"Users also want to provision an analysis cluster with 50 machines themselves for an afternoon that then goes away again. It is about providing those kinds of services," Bird said.

CERN chose OpenStack because it seems to be the platform with the most traction behind it. OpenStack's popularity also makes it attractive from a staffing point of view, according to Bird.

"We have a transient staff, because not everybody has permanent contracts. So it's good to have people that come in with that expertise or can leave with it, and then sell it somewhere else," Bird said.

CERN is also moving away from the custom in-house software it uses to manage the cluster and toward tools like Puppet.

"When we started scaling up the cluster for LHC, the large scale Googles and Amazons didn't really exist. So we invested quite a lot of effort in configuration management and monitoring, but a couple of years ago we decided to instead go with something that had a larger support community," Bird said.

CERN looked at Chef and Puppet, and chose the latter because it worked in a way that was closer to its own management model. The rollouts of Puppet and OpenStack are both underway.

Today CERN's infrastructure is distributed across about 160 data centers of different sizes located around the world.

"The reason behind that is twofold; one is given the size of the data center we have here there is no way we could have done all the computing for the LHC, and the other is political and sociological. We are given money to do computing, but it is preferred that the funding stays where it is coming from," said Bird.

CERN's own data center and a recently announced data center in Budapest make up tier 0, and the next tier consists of 11 data centers that are typically located at large national labs, such as FermiLab in the U.S., according to Bird. The last tier mostly consists of computing resources at universities.

To make OpenStack a better fit for CERN's distributed computing resources, the organization will collaborate with the community on data center federation.

"If we at CERN are running OpenStack and other of our grid centers are also running OpenStack we would like to federate the cloud parts ... So if you have your credentials at CERN, you ought to be able to let your work migrate to FermiLab, for example," Bird said.

Storage is a very important part of what CERN does, and the demands are huge. The two big detectors at the LHC, CMS and ATLAS, produce about 1 petabyte (1,000 terabytes) of data per second. The detectors track the motion and measure the energy and charge of particles thrown out in all directions after a collision in the accelerator. A farm of Linux machines with 15,000 processing cores at each detector then whittles that data down to a few hundred megabytes per second of the most interesting events.
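To put that filtering step in perspective, the reduction factor can be worked out with a few lines of Python. This is only an illustrative sketch: the article says "a few hundred megabytes per second," so the 300 MB/s figure below is an assumed value, and the function name is hypothetical.

```python
# Rough arithmetic for the detector data reduction described above.
# Decimal units are used, matching "1 petabyte or 1,000 terabytes".
PETABYTE = 10**15
MEGABYTE = 10**6


def reduction_factor(raw_bytes_per_s, kept_bytes_per_s):
    """Return how many times smaller the kept stream is than the raw stream."""
    return raw_bytes_per_s / kept_bytes_per_s


raw_rate = 1 * PETABYTE      # about 1 PB/s produced by the detectors
kept_rate = 300 * MEGABYTE   # ASSUMPTION: "a few hundred MB/s" taken as 300 MB/s

factor = reduction_factor(raw_rate, kept_rate)
print(f"Roughly 1 byte in every {factor:,.0f} is kept")
```

Under that assumption, only about one byte in every few million survives the filter; a different value for "a few hundred" shifts the number but not the order of magnitude.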

Still, in 2012 about 30PB of data from the LHC was saved. The data is cached on disk, but then archived on tape. The archive stores about 100PB of data, of which about 70PB comes from the accelerator, according to Bird, who calls the archiving "a non-trivial exercise."

Bird is a big fan of tape storage for three main reasons: cost, error rates and power consumption.

Tape is still a factor of 10 cheaper than the equivalent space on disk. Hosted storage services such as Glacier from Amazon Web Services are much too expensive, Bird said. And the error rate on tapes is extremely low compared with the failure rate of disks, he said.

It's also important to keep down power consumption, which is a limiting factor in today's data centers. The data center in Budapest was added not because CERN ran out of space, but because it ran out of power. The tape robots use very little power compared to disks, according to Bird.

"Tape is quite significantly underrated. Probably for the last 15 years people have been saying that it is dead, and will be replaced by disk. But it hasn't gone away, and I don't see it going away any time soon. For large archives you can't really compete," Bird said.

But tape has to be managed well for it to work.

"You can't just put it on tape and leave it for 20 years. Tape media changes every two or three years, so we are continually reading it from one generation and copying it to the next generation. We also read it actively to make sure it is still readable," Bird said.
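The article does not say how CERN checks that migrated data is intact. One common general technique is to compare checksums of the old and new copies, which could be sketched as follows. The function names and the use of SHA-256 on ordinary files are assumptions for illustration, not a description of CERN's tooling.

```python
import hashlib


def sha256_of(path, chunk_size=1024 * 1024):
    """Compute the SHA-256 digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        # Read fixed-size chunks so large files never sit fully in memory.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_copy(source_path, copy_path):
    """Return True if the copy has the same contents as the source."""
    return sha256_of(source_path) == sha256_of(copy_path)
```

A migration job could call `verify_copy` after writing each file to the new generation of media, and rerun `sha256_of` periodically against a stored digest to confirm the data is still readable.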

Mikael Ricknäs

IDG News Service