Hadoop gets native R programming for big data analysis

Revolution Analytics has released a plug-in for running R analytics on Hadoop data sets

Sensing a growing interest in big data-style analysis, software provider Revolution Analytics has updated its flagship package of R statistical functions so it can be run with the Hadoop data processing platform.

Revolution R Enterprise 7 (RRE 7), to be made available on Monday, can also run R within Teradata databases.

The R language provides a way to run common statistical tests -- such as linear and nonlinear modelling, time-series analysis, classification, and clustering -- on a set of data, often portraying the results in graphical form.

R is becoming increasingly popular for sophisticated data analysis that goes beyond what can be offered by more standard business intelligence (BI) packages. Revolution Analytics has estimated that over 2 million people use R worldwide.

RRE 7 includes a library of R algorithms that can be run in parallel across multiple nodes, which is how Hadoop manages large data sets. RRE 7 can be added to the Cloudera CDH3 and CDH4 Hadoop distributions as well as Hortonworks Data Platform 1.3.

The new R library includes the most commonly used statistical and predictive analytics algorithms for tasks such as data processing, data sampling, descriptive statistics, statistical tests, data visualization, simulation, machine learning and predictive models.

By analyzing the data within the node in which it resides, rather than moving it somewhere else to be analyzed, R-based data analysis can be done more quickly, according to Revolution Analytics. It also allows an entire set of data to be analyzed, rather than a subset or summary of the data, which is the approach typically taken with enterprise data warehouses (EDWs).
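The idea of analyzing data where it resides can be illustrated with a minimal sketch. It is written in plain Python rather than R, and it is not Revolution Analytics' implementation: the chunk lists, the `Partial` record, and the functions `summarize_chunk` and `combine` are hypothetical names invented for this illustration. Each "node" reduces its local chunk to a few numbers, and only those small summaries are combined, so the full data set is covered without moving the raw rows.

```python
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass
from typing import Iterable, List, Tuple


@dataclass
class Partial:
    """Summary a node computes over its own chunk (hypothetical structure)."""
    n: int
    total: float
    total_sq: float


def summarize_chunk(values: Iterable[float]) -> Partial:
    """Runs 'on the node': one pass over local data, returns a tiny summary."""
    n, total, total_sq = 0, 0.0, 0.0
    for v in values:
        n += 1
        total += v
        total_sq += v * v
    return Partial(n, total, total_sq)


def combine(partials: List[Partial]) -> Tuple[float, float]:
    """Merge per-node summaries into the mean and sample variance of all data."""
    n = sum(p.n for p in partials)
    total = sum(p.total for p in partials)
    total_sq = sum(p.total_sq for p in partials)
    mean = total / n
    variance = (total_sq - n * mean * mean) / (n - 1)
    return mean, variance


# Three lists stand in for data blocks stored on three different nodes.
chunks = [[1.0, 2.0, 3.0], [4.0, 5.0], [6.0]]

# Threads stand in for nodes working in parallel on their own chunk.
with ThreadPoolExecutor() as pool:
    partials = list(pool.map(summarize_chunk, chunks))

mean, variance = combine(partials)
print(mean, variance)
```

The sum-of-squares formula is used here only because it merges trivially; it can lose precision on large or tightly clustered values, so a production system would likely use a more numerically stable merge.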

Revolution Analytics hopes the incorporation of R within Hadoop and the Teradata databases will also broaden the use of the language to line-of-business managers. The company has designed a new workflow interface that does not require knowledge of how to implement specific R algorithms. This eliminates the hassle of coding R with Java, or some other language, in order to have it run on the Hadoop platform.

In addition to supporting these new platforms, RRE 7 also features a number of new algorithms and processes. One is a collection of models for setting up Decision Forests, a machine learning technique for predicting future outcomes. A new batch of Stepwise Regression functions can help automate the process of selecting the most important variables to be used in a predictive model. A new Decision Tree visualization provides a graphical way of depicting complex relationships and correlations within a set of data.
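To give a sense of what stepwise regression automates, here is a rough sketch of forward selection in plain Python. It does not use or reproduce the RRE 7 functions: the helper names, the toy data, and the stopping rule (stop when a variable cuts the error by less than 1 percent of the total variation) are all assumptions made for illustration. At each step, the sketch adds whichever remaining variable most reduces the residual sum of squares of an ordinary least squares fit.

```python
from typing import Dict, List, Sequence


def solve_linear_system(a: List[List[float]], b: List[float]) -> List[float]:
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [list(row) + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        known = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - known) / m[r][r]
    return x


def residual_sum_of_squares(columns: List[Sequence[float]], y: Sequence[float]) -> float:
    """Fit least squares with an intercept on the given columns; return the RSS."""
    design = [[1.0] + [col[i] for col in columns] for i in range(len(y))]
    k = len(design[0])
    xtx = [[sum(row[i] * row[j] for row in design) for j in range(k)] for i in range(k)]
    xty = [sum(row[i] * yi for row, yi in zip(design, y)) for i in range(k)]
    beta = solve_linear_system(xtx, xty)
    fitted = [sum(bj * xj for bj, xj in zip(beta, row)) for row in design]
    return sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))


def forward_stepwise(features: Dict[str, Sequence[float]], y: Sequence[float],
                     min_gain: float = 0.01) -> List[str]:
    """Greedily add the variable that most reduces RSS until the gain is small."""
    total_ss = residual_sum_of_squares([], y)  # intercept-only model
    best_rss = total_ss
    selected: List[str] = []
    while len(selected) < len(features):
        candidates = {
            name: residual_sum_of_squares(
                [features[s] for s in selected] + [features[name]], y)
            for name in features if name not in selected
        }
        name = min(candidates, key=candidates.get)
        if best_rss - candidates[name] < min_gain * total_ss:
            break  # assumed stopping rule: gain below 1% of total variation
        selected.append(name)
        best_rss = candidates[name]
    return selected


# Toy data: y depends on x1 and x2; "noise" is an irrelevant variable.
features = {
    "x1": [1, 2, 3, 4, 5, 6, 7, 8],
    "x2": [2, 7, 1, 8, 2, 8, 1, 8],
    "noise": [3, 1, 4, 1, 5, 9, 2, 6],
}
errors = [0.1, -0.1, 0.2, -0.2, 0.1, -0.1, 0.2, -0.2]
y = [2 * a - 3 * b + e for a, b, e in zip(features["x1"], features["x2"], errors)]

selected = forward_stepwise(features, y)
print(selected)
```

Real stepwise procedures usually rank variables with a criterion such as AIC or a significance test rather than a fixed gain threshold; the threshold is used here only to keep the sketch short.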

Joab Jackson covers enterprise software and general technology breaking news for The IDG News Service. Follow Joab on Twitter at @Joab_Jackson. Joab's e-mail address is Joab_Jackson@idg.com
