CERN modernizes IT infrastructure with OpenStack and Puppet

But the research organization will remain faithful to tape storage

CERN's datacenter is modernizing with OpenStack and Puppet.

CERN is making the infrastructure that handles the data from the Large Hadron Collider (LHC) more flexible by upgrading it with OpenStack for virtualization and Puppet for configuration management.

The research organization's objective is to change how it provides services to scientists working at the LHC, which runs in a 27-kilometer circular tunnel about 100 meters beneath the Swiss-French border near Geneva.

"One of the things we have to contend with is how to scale our infrastructure fairly significantly with a fixed staff and fixed costs. With a fixed budget you can buy more and more equipment, but you can't provide more and more services with the same number of people," said Ian Bird, LHC computing grid project leader.

But that may be possible if you change the way things are done. CERN's goal is to become more efficient by moving toward infrastructure-as-a-service and platform-as-a-service with a private cloud, which would let it change how the infrastructure is used more dynamically. The accelerator is currently shut down, so the CERN data center has a different workload than it did last year, when the LHC was running, according to Bird.

"Users also want to provision an analysis cluster with 50 machines themselves for an afternoon that then goes away again. It is about providing those kinds of services," Bird said.

CERN chose OpenStack because it seems to be the platform with the most traction behind it. OpenStack's popularity also makes it attractive from a staffing point of view, according to Bird.

"We have a transient staff, because not everybody has permanent contracts. So it's good to have people that come in with that expertise or can leave with it, and then sell it somewhere else," Bird said.

CERN is also moving away from the custom in-house software that manages the cluster itself to software like Puppet.

"When we started scaling up the cluster for LHC, the large scale Googles and Amazons didn't really exist. So we invested quite a lot of effort in configuration management and monitoring, but a couple of years ago we decided to instead go with something that had a larger support community," Bird said.

CERN looked at Chef and Puppet, and chose the latter because it worked in a way that was closer to its own management model. The rollouts of Puppet and OpenStack are both underway.

Today CERN's infrastructure is distributed across about 160 data centers of different sizes located around the world.

"The reason behind that is twofold; one is given the size of the data center we have here there is no way we could have done all the computing for the LHC, and the other is political and sociological. We are given money to do computing, but it is preferred that the funding stays where it is coming from," said Bird.

CERN's own data center and a recently announced data center in Budapest are tier 0, and the next tier is made up of 11 data centers that are typically located at large national labs, such as FermiLab in the U.S., according to Bird. The last tier mostly consists of computing resources at universities.

To make OpenStack a better fit for CERN's distributed computing resources, the organization will collaborate with the community on data center federation.

"If we at CERN are running OpenStack and other of our grid centers are also running OpenStack we would like to federate the cloud parts ... So if you have your credentials at CERN, you ought to be able to let your work migrate to FermiLab, for example," Bird said.

Storage is a very important part of what CERN does, and the demands are huge. The two big detectors at the LHC -- CMS and ATLAS -- produce about 1 petabyte, or 1,000 terabytes, of data per second. The detectors track the motion and measure the energy and charge of particles thrown out in all directions after a collision in the accelerator. At each detector, a farm of Linux machines with 15,000 processing cores then whittles that data down to a few hundred megabytes per second of the most interesting events.
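To put those rates in perspective, the reduction can be worked out in a few lines of Python. This is only a back-of-the-envelope sketch: the 1 petabyte per second figure comes from the paragraph above, while the 300 megabytes per second value is an assumed stand-in for "a few hundred megabytes per second", and the function name is made up for illustration.

```python
# Rates in bytes per second.
RAW_RATE = 1e15    # about 1 petabyte per second from the detectors (from the article)
# Assumption: "a few hundred megabytes per second" is taken as 300 MB/s.
KEPT_RATE = 300e6


def reduction_factor(raw_rate: float, kept_rate: float) -> float:
    """Return how many times smaller the kept stream is than the raw stream."""
    if kept_rate <= 0:
        raise ValueError("kept_rate must be positive")
    return raw_rate / kept_rate


if __name__ == "__main__":
    factor = reduction_factor(RAW_RATE, KEPT_RATE)
    print(f"Roughly 1 byte is kept for every {factor:,.0f} bytes produced")
```

Under that assumption, the filtering farms discard all but a few millionths of what the detectors produce.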

Still, in 2012 about 30PB of data from the LHC was saved. The data is cached on disk, but then archived on tape. The archive stores about 100PB of data, of which about 70PB comes from the accelerator, according to Bird, who calls the archiving "a non-trivial exercise."

Bird is a big fan of tape storage for three main reasons: cost, error rates and power consumption.

Tape is still a factor of 10 cheaper than the equivalent space on disk. Hosted storage services such as Glacier from Amazon Web Services are much too expensive, Bird said. And the error rate on tapes is extremely low compared to the failure rate of disks, he said.

It's also important to keep down power consumption, which is a limiting factor in today's data centers. The data center in Budapest was added not because CERN ran out of space, but because it ran out of power. The tape robots use very little power compared to disks, according to Bird.

"Tape is quite significantly underrated. Probably for the last 15 years people have been saying that it is dead, and will be replaced by disk. But it hasn't gone away, and I don't see it going away any time soon. For large archives you can't really compete," Bird said.

But tape has to be managed well for it to work.

"You can't just put it on tape and leave it for 20 years. Tape media changes every two or three years, so we are continually reading it from one generation and copying it to the next generation. We also read it actively to make sure it is still readable," Bird said.
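The article does not describe the tools CERN uses for this, but the general idea of checking that a copy survived a move to new media can be sketched with checksums. The Python below is a generic illustration, not CERN's method: the function names are hypothetical, and in-memory byte chunks stand in for data read back from old and new tapes.

```python
import hashlib
from typing import Iterable


def stream_digest(chunks: Iterable[bytes]) -> str:
    """Return the SHA-256 hex digest of a byte stream read chunk by chunk."""
    digest = hashlib.sha256()
    for chunk in chunks:
        digest.update(chunk)
    return digest.hexdigest()


def copy_is_intact(old_chunks: Iterable[bytes], new_chunks: Iterable[bytes]) -> bool:
    """Compare digests of the old and new copies; chunk boundaries may differ."""
    return stream_digest(old_chunks) == stream_digest(new_chunks)


if __name__ == "__main__":
    # Hypothetical data: the same bytes, split differently on each medium.
    original = [b"collision ", b"event ", b"data"]
    migrated = [b"collision event ", b"data"]
    print(copy_is_intact(original, migrated))
```

Because the digest depends only on the bytes and not on how they are chunked, the same check works when the old and new media use different block sizes.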

Mikael Ricknäs

IDG News Service