Big Data Digest: How many Hadoops do we need?

This week's Big Data news brings a new data processing framework and computers that are more intimate with your feelings.

Managing big data

Say hello to Flink, the newest distributed data analysis engine on the scene.

This week, the Apache Software Foundation announced Apache Flink as its newest Top-Level Project (TLP). Apache also provides a home for Hadoop, Cassandra, Lucene and many other widely used open source data processing tools, so Flink's entry into the group speaks well for its technical chops.

Don't worry if you hadn't heard of Flink before; it came as a surprise to us as well. Like Spark, another emerging data processing platform, Flink can ingest both batch data and streaming data. Apache Flink got its start in 2009 as a research project at the Technical University of Berlin.

Why would someone choose Flink over Hadoop? Performance and ease of use, say the creators of the software.

The Flink engine exploits data streaming and in-memory processing to improve processing speed, said Kostas Tzoumas, a contributor to the project and cofounder and CEO of data Artisans, a spin-off company that will commercialize Flink. Flink could serve as an ideal replacement for Hadoop for those who want faster performance.

Another advantage Flink offers is ease of use, Tzoumas said. Especially for large projects, its APIs (application programming interfaces) are an "order of magnitude" easier to use than programming for Hadoop's MapReduce, he said. APIs are provided for Java and Scala.

Music streaming service Spotify and travel software provider Amadeus are both testing the software, and it's been pressed into production at ResearchGate, a social network for scientists.

Nonetheless, with Hadoop and Spark growing in popularity, Flink may face an uphill battle when it comes to gaining users.

"Projects that depend on smart optimizers rarely work well in real life," wrote Curt Monash, head of IT analyst consultancy Monash Research, in an e-mail. He pointed to other projects relying on performance enhancing tweaks that failed to gain traction, such as IBM Learning Optimizer for DB2, and HP's NeoView data warehouse appliance.

Elsewhere, researchers at the Massachusetts Institute of Technology (MIT) are looking at ways to use data to help better plan routine tasks such as scheduling flights or helping mapping software find the best route through a crowded city.

Later this month, at the annual meeting of the Association for the Advancement of Artificial Intelligence (AAAI), MIT researchers will present a set of new algorithms that can plot the best route through a set of constraints.

Unlike current software that does this, such as automated airline reservation systems, these algorithms can assess risk. For someone looking to get across town on a number of buses, they can weigh how often those buses run late and suggest alternatives where they make sense. The work is rooted in graph theory, which studies the connections among multiple entities.
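The article does not spell out how the MIT algorithms work, so the Python sketch below is only a toy illustration of risk-aware routing, not the researchers' method. It assumes a small, made-up transit graph in which every bus leg has a scheduled travel time and an independent probability of running late. It enumerates the simple paths between two stops and returns the fastest one whose chance of every leg running on time meets a threshold, falling back to the most reliable route when none does. The graph, the numbers and all function names are hypothetical.

```python
from math import prod

# Hypothetical transit graph: stop -> list of (next_stop, bus, minutes, p_late).
# Travel times and lateness probabilities are made-up illustrative numbers.
GRAPH = {
    "Home": [("Downtown", "Bus 1", 15, 0.30), ("Midtown", "Bus 4", 20, 0.05)],
    "Downtown": [("Office", "Bus 7", 10, 0.25)],
    "Midtown": [("Office", "Bus 9", 15, 0.05)],
    "Office": [],
}


def all_routes(graph, node, goal, visited=frozenset()):
    """Depth-first search yielding every simple path to goal as a list of legs."""
    if node == goal:
        yield []
        return
    visited = visited | {node}
    for nxt, bus, minutes, p_late in graph[node]:
        if nxt not in visited:
            for rest in all_routes(graph, nxt, goal, visited):
                yield [(bus, minutes, p_late)] + rest


def on_time_probability(route):
    """Chance that no leg runs late, assuming legs are independent."""
    return prod(1.0 - p_late for _, _, p_late in route)


def total_minutes(route):
    """Scheduled travel time, ignoring waits between buses."""
    return sum(minutes for _, minutes, _ in route)


def pick_route(graph, start, goal, min_on_time=0.8):
    """Fastest route meeting the on-time threshold, else the most reliable one."""
    routes = list(all_routes(graph, start, goal))
    if not routes:
        return None
    reliable = [r for r in routes if on_time_probability(r) >= min_on_time]
    if reliable:
        return min(reliable, key=total_minutes)
    return max(routes, key=on_time_probability)


def buses(route):
    """Names of the buses taken on a route."""
    return [bus for bus, _, _ in route]


print(buses(pick_route(GRAPH, "Home", "Office")))
```

With the default 80 per cent threshold, the sketch prefers the slower but steadier Bus 4 and Bus 9 (35 minutes, roughly a 90 per cent chance of no delays) over the 25-minute Bus 1 and Bus 7 route, whose on-time chance is only about 52 per cent; lowering `min_on_time` to 0.5 makes it pick the faster route. Enumerating every path only works for tiny graphs, so a real planner would need a smarter search.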

Speaking of graphs, database company Neo Technology got some press this week for attracting US$20 million in funding to help get its Neo4j graph database out into the enterprise market. Once largely an academic concern, graph databases are finally being used in production environments. Neo4j is used by Walmart, eBay, CenturyLink, Cisco and the Medium publishing platform, GigaOm reported.

While we think of computers as number crunchers, researchers are increasingly looking at ways they can work with the most slippery of data, human emotions.

This week's New Yorker magazine has an article on a number of startups developing technology that can help computers read human emotions. It is a surprisingly robust field.

Author Raffi Khatchadourian traces the history of one such company, Affectiva. Its software scans a face, identifies the main features (eyes, nose, eyebrows), and notes how the more movable parts of the face (the lips) change over time. Affectiva has built a huge database of facial expressions that its software uses to identify the user's emotional state, be it happy, sad, confused or any one of dozens of other emotions.
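Affectiva's models are proprietary and the article does not describe them, so the following Python sketch shows only the general idea of matching measured facial movement against a labelled database, using a plain k-nearest-neighbour vote as a stand-in technique. It assumes each face is tracked as a sequence of frames holding three hypothetical measurements (lip-corner height, mouth opening and eyebrow height), summarises the movement as the change between the first and last frame, and labels it with the majority emotion among its closest stored examples. The database values and function names are invented for illustration.

```python
from collections import Counter
from math import dist

# Hypothetical labelled database of movement features:
# (lip_corner_change, mouth_open_change, brow_change) -> emotion label.
EXPRESSION_DB = [
    ((0.8, 0.3, 0.1), "happy"),
    ((0.7, 0.5, 0.2), "happy"),
    ((-0.6, 0.1, -0.3), "sad"),
    ((-0.5, 0.0, -0.4), "sad"),
    ((0.0, 0.2, 0.7), "confused"),
    ((0.1, 0.1, 0.8), "confused"),
]


def movement_features(frames):
    """Change in each measurement between the first and last frame."""
    first, last = frames[0], frames[-1]
    return tuple(b - a for a, b in zip(first, last))


def classify(features, db=EXPRESSION_DB, k=3):
    """Majority label among the k stored examples closest in Euclidean distance."""
    nearest = sorted(db, key=lambda item: dist(item[0], features))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


# A face whose lip corners rise and mouth opens slightly over three frames.
frames = [(0.0, 0.0, 0.0), (0.4, 0.2, 0.05), (0.75, 0.35, 0.1)]
print(classify(movement_features(frames)))
```

A real system would track many more facial points across far more examples; the sketch only shows how a labelled database can turn raw facial movement into an emotion guess.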

Naturally, advertising agencies and television networks are interested in any technology that can get a better read on humans. Verizon, for instance, once had plans for a media console that could track the activities of everyone in the room.

"All this data would then shape the console's choice of TV ads," Khatchadourian wrote. "A marital fight might prompt an ad for a counsellor. Signs of stress might prompt ads for aromatherapy candles. Upbeat humming might prompt ads 'configured to target happy people.' The system could then broadcast the ads to every device in the room."

Those worried about how marketers could use this software to badger consumers in ever more intrusive ways can at least take heart that it could also be used in less mercenary ways. Affectiva CEO Rana el Kaliouby, long a student of what she calls "affective computing," was initially drawn to the possibilities of using the software as an "emotional hearing aid" to help autistic children better communicate with the world.

Joab Jackson

IDG News Service