Python: big data's secret power tool

At PyData, online-ad-platform company AppNexus will reveal how Python plays a critical role in its big data operations

When it comes to analyzing big data, software packages such as Hadoop or the R statistical language come readily to mind. But at least one company, AppNexus, also relies on the Python programming language to help conduct heavy-duty data analysis.

On Friday, at the PyData conference in New York, two AppNexus engineers will explain how this real-time online-advertising-platform company uses the Python programming language to scale operations.

"We're in a state of rapid growth," said AppNexus Director of Optimization and Analytics David Himrod. Himrod, along with AppNexus Technical Lead Engineer Steve Kannan, will present the talk. When Himrod started working at AppNexus three years ago, it had 30 employees. Now it has more than 350. So Himrod is very interested in using technologies that can scale rapidly.

Key to Python's usefulness is its simplicity, Himrod said. One of the biggest challenges that Himrod faces is how to get a diverse set of employees working on the same technology stack. Python provides employees with different backgrounds -- notably engineers, mathematicians and analysts -- a common, easy-to-understand language that can be used to prototype new functionality for the company. "What's nice is we don't have to hire for a specific programming background. Python is easy to teach," Himrod said. "Python is a really clean, easy language to learn."

In fact, Python is so easy to learn that the company's intern teaches new employees the language. He had no prior programming experience but was able to pick it up quickly. In addition, Python libraries and tools such as SciPy, IPython and pandas provide much of the mathematical functionality typically found in the R programming language.

AppNexus uses a range of technologies for storing and parsing data, including MySQL, IBM's Netezza, Hewlett-Packard's Vertica, Apache Hadoop and HBase. In order to offer its ad services, the company processes about 15TB each day. "We've been able to build a framework that makes it easy for us to grab data from all of these disparate data sources and model them. So instead of everyone spending their time writing database connector code, they are able to use a simple configuration and quickly get off the ground," Himrod said.
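The article does not describe how AppNexus's framework is built, so the following is only a minimal sketch of the general idea Himrod describes: analysts name a data source in a simple configuration, and shared code handles the connection. Every name here (LOADERS, register_loader, fetch, and the configuration keys) is hypothetical, and SQLite and CSV files from Python's standard library stand in for the production data stores mentioned above.

```python
import csv
import sqlite3

# Hypothetical registry: maps a source "type" from the config to a loader function.
LOADERS = {}


def register_loader(kind):
    """Decorator that registers a loader function for one source type."""
    def decorator(func):
        LOADERS[kind] = func
        return func
    return decorator


@register_loader("sqlite")
def load_sqlite(cfg):
    """Run the configured query and return the rows as a list of dicts."""
    conn = sqlite3.connect(cfg["path"])
    conn.row_factory = sqlite3.Row
    try:
        return [dict(row) for row in conn.execute(cfg["query"])]
    finally:
        conn.close()


@register_loader("csv")
def load_csv(cfg):
    """Read the configured CSV file and return the rows as a list of dicts."""
    with open(cfg["path"], newline="") as handle:
        return [dict(row) for row in csv.DictReader(handle)]


def fetch(source_name, config):
    """Look up a named source in the config and load it with the matching loader."""
    cfg = config[source_name]
    kind = cfg["type"]
    if kind not in LOADERS:
        raise ValueError("no loader registered for source type %r" % kind)
    return LOADERS[kind](cfg)
```

With something like this in place, an analyst would only write a configuration entry such as {"clicks": {"type": "csv", "path": "clicks.csv"}} and call fetch("clicks", config); supporting another data store would mean registering one more loader rather than rewriting connector code in every script.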

As a result of this easy familiarity, Python allows the company to move code from development to production more quickly, since the same code created as a prototype can easily be moved into production. At the talk, Himrod will provide an example of how one analyst, who had only a minor in computer science, was able to develop an algorithm that was later deployed at full scale.

Conceived in the late 1980s and first released in 1991, Python is a highly flexible and dynamic language that has found a large audience among system administrators and developers who need a language for quickly assembling programs. The PyData conference, however, will focus on using Python for more specialized analysis tasks, with different talks on using Python and related libraries to process data streams, to visualize datasets, and to carry out scientific calculations.

Joab Jackson covers enterprise software and general technology breaking news for The IDG News Service. Follow Joab on Twitter at @Joab_Jackson. Joab's e-mail address is Joab_Jackson@idg.com.
