How Apache Kafka is greasing the wheels for big data

IBM just used it to launch two new Bluemix services

Analytics is often described as one of the biggest challenges associated with big data, but even before that step can happen, data has to be ingested and made available to enterprise users. That's where Apache Kafka comes in.

Originally developed at LinkedIn, Kafka is an open-source system for managing real-time streams of data from websites, applications and sensors.

Essentially, it acts as a sort of enterprise "central nervous system" that collects high-volume data such as user activity, logs, application metrics, stock tickers and device instrumentation, and makes it available as a real-time stream for consumption by enterprise users.
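
For readers who want to see the idea in miniature, the sketch below models that collect-once, read-many pattern as a toy in-memory log. It is an illustration under stated assumptions, not Kafka's actual API: the `TopicLog` class, its `publish` and `poll` methods, and the topic and team names are all hypothetical.

```python
from collections import defaultdict


class TopicLog:
    """Toy in-memory stand-in for a streaming platform (hypothetical, not Kafka's API)."""

    def __init__(self):
        # One append-only list of records per topic name.
        self._topics = defaultdict(list)
        # Next unread position for each (consumer, topic) pair.
        self._offsets = defaultdict(int)

    def publish(self, topic, record):
        """Append a record to a topic and return its position in the log."""
        self._topics[topic].append(record)
        return len(self._topics[topic]) - 1

    def poll(self, consumer, topic):
        """Return the records this consumer has not yet seen, then advance its offset."""
        log = self._topics[topic]
        start = self._offsets[(consumer, topic)]
        self._offsets[(consumer, topic)] = len(log)
        return log[start:]


log = TopicLog()
log.publish("user-activity", {"user": "alice", "action": "login"})
log.publish("user-activity", {"user": "bob", "action": "search"})

# Two independent readers consume the same stream, each at its own pace.
fraud_batch = log.poll("fraud-team", "user-activity")
log.publish("user-activity", {"user": "alice", "action": "logout"})
metrics_batch = log.poll("metrics-team", "user-activity")
fraud_next = log.poll("fraud-team", "user-activity")
```

Because each reader tracks its own position, a new team can start consuming the stream without affecting the others. A real deployment would add persistence, partitioning and replication, none of which this sketch attempts.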

Kafka is often compared with technologies like ActiveMQ or RabbitMQ for on-premises implementations, or with Amazon Web Services' Kinesis for cloud customers, said Stephen O'Grady, a co-founder and principal analyst with RedMonk.

"It's becoming more visible because it's a high-quality open-source project, but also because its ability to handle high-velocity streams of information is increasingly in demand for usage in servicing workloads like IoT, among others," O'Grady added.

Since being conceived at LinkedIn, Kafka has gained high-profile support from companies such as Netflix, Uber, Cisco and Goldman Sachs. On Friday, it got a fresh boost from IBM, which announced the availability of two new Kafka-based services through its Bluemix platform.

IBM's new Streaming Analytics service aims to analyze millions of events per second for sub-millisecond response times and instant decision-making. IBM Message Hub, now in beta, provides scalable, distributed, high-throughput, asynchronous messaging for cloud applications, with the option of using a REST or Apache Kafka API (application programming interface) to communicate with other applications.

Kafka was open-sourced in 2011. Last year, three of Kafka's creators launched Confluent, a startup dedicated to helping enterprises use it in production at scale.

"During our explosive growth phase at LinkedIn, we could not keep up with the growing user base and the data that could be used to help us improve the user experience," said Neha Narkhede, one of Kafka's creators and a co-founder of Confluent.

"What Kafka allows you to do is move data across the company and make it available as a continuously free-flowing stream within seconds to people who need to make use of it," Narkhede explained. "And it does that at scale."

The impact at LinkedIn was "transformational," she said. Today, LinkedIn's remains the largest Kafka deployment in production, handling more than 1.1 trillion messages per day.

Confluent, meanwhile, offers advanced management software by subscription to help large companies run Kafka for production systems. Among its customers are a major big-box retailer and "one of the largest credit-card issuers in the United States," Narkhede said.

The latter is using the technology for real-time fraud protection, she said.

Kafka is "an incredibly fast messaging bus" that's good at helping to integrate lots of different types of data quickly, said Jason Stamper, an analyst with 451 Research. "That’s why it’s emerging as one of the most popular choices."

Besides ActiveMQ and RabbitMQ, another product offering similar functionality is Apache Flume, he noted; Storm and Spark Streaming are similar in many ways as well.

In the commercial space, Confluent's competitors include IBM InfoSphere Streams, Informatica’s Ultra Messaging Streaming Edition and SAS’s Event Stream Processing Engine (ESP), along with Software AG's Apama, Tibco's StreamBase and SAP's Aleri, Stamper added. Smaller competitors include DataTorrent, Splunk, Loggly, Logentries, X15 Software, Sumo Logic and Glassbeam.

In the cloud, AWS's Kinesis stream-processing service "has the added benefit of integration with the likes of its Redshift data warehouse and S3 storage platform," he said.

Teradata's newly announced Listener is another contender, and it's Kafka-based as well, noted Brian Hopkins, a vice president and principal analyst with Forrester Research.

In general, there's a marked trend toward real-time data, Hopkins said.

Up until 2013 or so, "big data was all about massive quantities of data stuffed into Hadoop," he said. "Now, if you're not doing that, you're already behind the power curve."

Today, data from smartphones and other sources are giving enterprises the opportunity to engage with consumers in real time and provide contextual experiences, he said. That, in turn, rests on the ability to understand data faster.

"The Internet of Things is like a second wave of mobile," Hopkins explained. "Every vendor is positioning for an avalanche of data."

Technology is adapting accordingly.

"Up to 2014 it was all about Hadoop, then it was Spark," he said. "Now, it's Hadoop, Spark and Kafka. These are three equal peers in the data-ingestion pipeline in this modern analytic architecture."


Katherine Noyes

IDG News Service