Python gets a big data boost from DARPA

Continuum Analytics will extend the widely used NumPy library for distributed systems

DARPA (the U.S. Defense Advanced Research Projects Agency) has awarded $3 million to software provider Continuum Analytics to help fund the development of Python's data processing and visualization capabilities for big data jobs.

The money will go toward developing new techniques for data analysis and for visually portraying large, multi-dimensional data sets. The work aims to extend beyond the capabilities offered by the NumPy and SciPy Python libraries, which are widely used by programmers for mathematical and scientific calculations, respectively.

More mathematically centered languages such as the R statistical language might seem better suited for big-data number crunching, but Python has the advantage of being easy to learn.

"Python is a very easy language to learn for non-programmers," said Peter Wang, president of Continuum Analytics. That's important because most big-data analysts will probably not be programmers. If they can learn an easy language, they won't have to rely on an external software development group to complete their analysis, Wang said.

The work is part of DARPA's XData research program, a four-year, $100 million effort to give the Defense Department and other U.S. government agencies tools to work with large amounts of sensor data and other forms of big data.

For the XData project, DARPA awarded funding to about two dozen organizations, including the University of Southern California, Stanford University and Lawrence Berkeley National Laboratory. The recipients are encouraged to use each other's technologies to further extend what can be done in big data, Wang said.

DARPA encouraged the funding recipients to release products based on their work and to release their code as open source, so the innovations can be widely used and supported outside of the military. The Defense Department is trying to avoid commissioning software that gets used only by the military, which may then become prohibitively time-consuming and expensive to update.

"With big data systems, you find new things you want to look at every week. You can't wait for that process any more," Wang said.

Headquartered in Austin, Texas, Continuum Analytics offers add-on products and services that help organizations use Python for data analysis. The company will use the DARPA money to continue development of a number of add-on technologies it has been working on, including Blaze, Numba and Bokeh, all of which provide advanced features not offered in Python itself.

At the PyData 2012 conference in New York last November, Continuum engineer Stephen Diehl discussed how Blaze would operate, describing the library as a potential successor to NumPy.

NumPy has limitations that Blaze seeks to correct, Diehl said. Most notably, NumPy can store a series of numbers only as one contiguous block of data. "It is a single buffer, a continuous block of memory. That may be OK for some uses, but the real world is more heterogeneous," he said in a presentation.
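The single-buffer layout Diehl describes can be seen directly in NumPy. The following is a minimal sketch, not taken from his presentation; the array name `grid` and its shape are chosen purely for illustration.

```python
import numpy as np

# Illustrative array: 3 rows x 4 columns of 64-bit floats.
# Every element has the same type and the same size (8 bytes).
grid = np.arange(12, dtype=np.float64).reshape(3, 4)

# NumPy reports whether the array occupies one contiguous block of memory.
is_contiguous = bool(grid.flags["C_CONTIGUOUS"])

# Total size of the buffer: 12 elements * 8 bytes each.
total_bytes = grid.nbytes

# Strides: how many bytes to step to reach the next row and the next column.
row_stride, col_stride = grid.strides

print(is_contiguous, total_bytes, row_stride, col_stride)
```

Because the layout is a single uniform block, stepping through it is pure arithmetic on byte offsets, which is fast but leaves little room for data whose pieces differ in type or live in different places.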

Blaze can "endow [data] with structure," Diehl said. It will also allow programmers to establish multidimensional arrays and store these arrays in a distributed architecture, across multiple machines.
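The idea of spreading one array across several machines can be sketched with plain NumPy. This is not Blaze's API, which the presentation quoted here does not spell out; it is a hypothetical illustration in which each chunk stands in for the portion of the data a separate machine would hold.

```python
import numpy as np

# Illustrative data set: one million 64-bit integers.
data = np.arange(1_000_000, dtype=np.int64)

# Assumption for illustration: four "machines", each holding one chunk.
# Here the chunks are simply views into the same local array.
chunks = np.array_split(data, 4)

# Each machine would compute a partial result over its own chunk...
partial_sums = [int(chunk.sum()) for chunk in chunks]

# ...and the partial results are then combined into the final answer.
total = sum(partial_sums)

print(len(chunks), total)
```

In a real distributed system the chunks would sit in separate memory spaces and only the small partial results would travel over the network.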

Bokeh is a Python library that visually renders large data sets using the HTML5 canvas element, while Numba is a just-in-time compiler for Python that recognizes NumPy calls. Numba is included in Continuum's flagship product, Anaconda, a Python distribution with a number of premium data analysis features.
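A rough sketch of how Numba is typically applied to NumPy code follows. The function `sum_of_squares` and the sample data are invented for illustration, and the fallback decorator is an assumption added so the example still runs, uncompiled, where Numba is not installed.

```python
import numpy as np

try:
    # Numba's decorator for compiling a function to machine code.
    from numba import njit
except ImportError:
    # Assumption for illustration: without Numba, run as ordinary Python.
    def njit(func):
        return func


@njit
def sum_of_squares(values):
    # An explicit loop over a NumPy array, the kind of code Numba compiles.
    total = 0.0
    for i in range(values.shape[0]):
        total += values[i] * values[i]
    return total


samples = np.arange(5, dtype=np.float64)  # 0.0, 1.0, 2.0, 3.0, 4.0
result = sum_of_squares(samples)
print(result)
```

The first call triggers compilation when Numba is present; later calls with the same argument types reuse the compiled version.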

Joab Jackson covers enterprise software and general technology breaking news for The IDG News Service. Follow Joab on Twitter at @Joab_Jackson. Joab's e-mail address is Joab_Jackson@idg.com.

