Databricks takes the human intervention out of Spark processing

A new workflow feature for Databricks Cloud can automate routine deployment tasks

Databricks now offers a way to schedule Spark jobs in the cloud

Databricks wants to make it possible to take humans out of the loop entirely when it comes to running complicated data analysis jobs.

The company, which offers a commercial version of Spark, now offers a tool to automate the setup and execution of analyses written to run on the open source data processing platform.

"You can express very complicated workflows using this thing," said Ali Ghodsi, Databricks' director of engineering. "There is no human in the loop any more."

Founded by several of the original developers of Spark, Databricks offers a commercial version of the platform designed to run on Amazon Web Services and eliminate many of the mundane chores of setting up and maintaining an in-house deployment.

Spark can be used to analyze very large data sets across multiple servers for tasks such as generating recommendations for users of an Internet service or predicting a company's future revenue.

As customers get more comfortable with using big data, they are increasingly scheduling their analysis jobs to run on a regular basis, which requires an administrator to log in to a console to coordinate all the steps needed to run the job.

The new feature for Databricks Cloud, called jobs, provides a way for administrators to set up schedules to run standalone Spark jobs at specified intervals. A user could schedule a Spark application to run on a specific Databricks Cloud cluster at a set time. Users can decide whether to use a dedicated cluster for maximum performance, or a cluster shared with other users to save money.

The service notifies the user when the task completes. It also creates a log detailing whether the task completed successfully, and can alert the administrator if something goes awry.

In effect, the feature establishes a way to create a production pipeline, which is a series of jobs that execute automatically and in coordination with each other. An administrator can set up a workflow that executes two Spark jobs at the same time and waits for both to finish. When both are completed, the workflow can then start another job that uses the results from the first two. If one of the two initial jobs fails, then the entire workflow can be terminated.
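The pattern described above (two jobs in parallel, then a third that consumes both results, with a failure halting the pipeline) can be sketched in plain Python. This is only an illustration of the control flow and not the Databricks Cloud interface, which the article does not detail. The function names `run_workflow`, `count_clicks`, `count_purchases`, and `conversion_rate` are hypothetical stand-ins for real Spark jobs.

```python
from concurrent.futures import ThreadPoolExecutor


def run_workflow(first_job, second_job, final_job):
    """Run two jobs in parallel, then pass both results to a final job.

    If either initial job raises, the exception propagates and the
    final job never starts, which stands in for terminating the workflow.
    """
    with ThreadPoolExecutor(max_workers=2) as pool:
        first = pool.submit(first_job)
        second = pool.submit(second_job)
        # result() blocks until the job finishes and re-raises its exception.
        first_result = first.result()
        second_result = second.result()
    return final_job(first_result, second_result)


# Hypothetical stand-ins for real Spark jobs.
def count_clicks():
    return 120


def count_purchases():
    return 30


def conversion_rate(clicks, purchases):
    return purchases / clicks


if __name__ == "__main__":
    print(run_workflow(count_clicks, count_purchases, conversion_rate))
```

Note that this sketch does not cancel a job that is already running when its sibling fails; it simply declines to start the dependent job. A production scheduler would probably also want retries and notifications, which are left out here.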

Jobs are written in Spark notebooks. Similar to IPython notebooks for Python, Spark notebooks are user-generated packages that contain all the components needed to run an interactive data analysis job across a cluster. Spark notebooks can be written in Python, Scala, SQL, or a combination of the three.

Pricing for Databricks is tiered, based on usage capacity, support model, and feature set. It will start at several hundred dollars per month.

Joab Jackson covers enterprise software and general technology breaking news for The IDG News Service. Follow Joab on Twitter at @Joab_Jackson. Joab's e-mail address is Joab_Jackson@idg.com
