Databricks takes the human intervention out of Spark processing

A new workflow feature for Databricks Cloud can automate routine deployment tasks

Databricks now offers a way to schedule Spark jobs in the cloud

Databricks wants to make it possible to take humans out of the loop entirely when it comes to running complicated data analysis jobs.

The company, which sells a commercial version of Spark, now offers a tool that automates setting up and executing analysis jobs written for the open source data processing platform.

"You can express very complicated workflows using this thing," said Ali Ghodsi, Databricks' director of engineering. "There is no human in the loop any more."

Founded by several of the original developers of Spark, Databricks offers a commercial version of the platform designed to run on Amazon Web Services and eliminate many of the mundane chores of setting up and maintaining an in-house deployment.

Spark can be used to analyze very large data sets across multiple servers, for tasks such as generating recommendations for users of an Internet service or predicting a company's future revenue.

As customers get more comfortable with using big data, they are increasingly scheduling their analysis jobs to run on a regular basis, which requires an administrator to log into a console and coordinate all the steps needed to run each job.

The new feature for Databricks Cloud, called jobs, lets administrators set up schedules to run standalone Spark jobs at specified intervals. A user could, for example, schedule a Spark application to run on a specific Databricks Cloud cluster at a set time. Users can decide whether to use a dedicated cluster for maximum performance, or a cluster shared with other users to save money.
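To make the idea of running a job "at specified intervals" concrete, here is a minimal Python sketch of how a scheduler might work out when a recurring job is next due. It is not the Databricks Cloud API: the `ScheduledJob` class, its field names, and the `"dedicated"`/`"shared"` cluster labels are all hypothetical and exist only for illustration.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass(frozen=True)
class ScheduledJob:
    """Hypothetical description of a recurring job (not a Databricks type)."""
    name: str             # illustrative label for the job
    cluster: str          # illustrative: "dedicated" or "shared"
    first_run: datetime   # time of the first scheduled run
    interval: timedelta   # gap between consecutive runs

    def __post_init__(self):
        # A zero or negative interval would make the schedule meaningless.
        if self.interval <= timedelta(0):
            raise ValueError("interval must be positive")

    def next_run(self, now: datetime) -> datetime:
        """Return the first scheduled run time at or after `now`."""
        if now <= self.first_run:
            return self.first_run
        elapsed = now - self.first_run
        # Ceiling division: whole intervals needed to reach or pass `now`.
        periods = -(-elapsed // self.interval)
        return self.first_run + periods * self.interval


if __name__ == "__main__":
    nightly = ScheduledJob(
        name="nightly-report",
        cluster="shared",
        first_run=datetime(2015, 1, 1, 2, 0),
        interval=timedelta(days=1),
    )
    print(nightly.next_run(datetime(2015, 1, 3, 12, 0)))
```

In this sketch a job first run at 2 a.m. on Jan. 1 and repeated daily would, if checked at noon on Jan. 3, next be due at 2 a.m. on Jan. 4.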

The service notifies the user when the task completes. It also creates a log recording whether the task completed successfully, and can alert the administrator if something goes awry.

In effect, the feature establishes a way to create a production pipeline, which is a series of jobs that execute automatically and in coordination with each other. An administrator can set up a workflow that executes two Spark jobs at the same time and waits for both to finish. When both are completed, the workflow can then start another job that uses the results from the first two. If one of the two initial jobs fails, then the entire workflow can be terminated.
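The workflow described above (two jobs in parallel, a third that consumes their results, and termination if either of the first two fails) can be sketched in a few lines of Python using the standard library's `concurrent.futures`. This is only an illustration of the pattern, not Databricks' implementation: `run_workflow`, `WorkflowAborted`, and the plain callables standing in for Spark jobs are all assumptions made for the example.

```python
from concurrent.futures import ThreadPoolExecutor


class WorkflowAborted(Exception):
    """Raised when an upstream job fails, so the downstream job never starts."""


def run_workflow(job_a, job_b, downstream):
    """Run job_a and job_b concurrently, then feed both results to downstream.

    All three arguments are ordinary Python callables that stand in for
    Spark jobs in this sketch.
    """
    with ThreadPoolExecutor(max_workers=2) as pool:
        futures = [pool.submit(job_a), pool.submit(job_b)]
        try:
            # result() blocks until the job finishes and re-raises any
            # exception the job threw.
            results = [future.result() for future in futures]
        except Exception as exc:
            # The executor still waits for the other job to finish on exit,
            # but the downstream job is never launched.
            raise WorkflowAborted(f"upstream job failed: {exc}") from exc
    return downstream(*results)


if __name__ == "__main__":
    # Two trivial "jobs" and a third that combines their results.
    total = run_workflow(lambda: 2, lambda: 3, lambda a, b: a + b)
    print(total)  # prints 5
```

A real pipeline would also need retries, timeouts, and a way to cancel a job that is still running, none of which this sketch attempts.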

Jobs are written in Spark notebooks. Similar to IPython notebooks for Python, Spark notebooks are user-generated packages that contain all the components needed to run an interactive data analysis job across a cluster. Spark notebooks can be written in Python, Scala, SQL, or a combination of the three.

Pricing for Databricks is tiered, based on usage capacity, support model, and feature set. It will start at several hundred dollars per month.

Joab Jackson covers enterprise software and general technology breaking news for The IDG News Service. Follow Joab on Twitter at @Joab_Jackson. Joab's e-mail address is


