In Pictures: 18 essential Hadoop tools for crunching big data

Making the most of this powerful MapReduce platform means mastering a vibrant ecosystem of quickly evolving code


Avro

When Hadoop jobs need to share data, they can use any database. Avro offers another route: it is a serialization system that bundles the data together with a schema for understanding it. Each packet comes with a JSON data structure explaining how the data can be parsed. Because this header specifies the structure of the data up front, there is no need to write extra tags into the data to mark the fields. When the data is regular, the result can be considerably more compact than traditional formats such as XML or JSON.
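To make the space saving concrete, here is a minimal Python sketch of the schema-up-front idea. It is not Avro's actual binary format and does not use the Avro library; the records and the `decode` helper are hypothetical, chosen only to compare per-record field tags against a single header.

```python
import json

# Hypothetical schema and records, for illustration only.
schema = {"fields": ["name", "favorite_number", "favorite_color"]}
records = [
    {"name": "Alyssa", "favorite_number": 256, "favorite_color": "blue"},
    {"name": "Ben", "favorite_number": 7, "favorite_color": "red"},
] * 500

# Tagged encoding: every record repeats its field names.
tagged = "\n".join(json.dumps(r) for r in records)

# Schema-first encoding: field names appear once in a header,
# then each record is just its values in schema order.
header = json.dumps(schema)
rows = "\n".join(json.dumps([r[f] for f in schema["fields"]]) for r in records)
schema_first = header + "\n" + rows


def decode(text):
    """Rebuild the records from the header plus positional rows."""
    head, *lines = text.split("\n")
    fields = json.loads(head)["fields"]
    return [dict(zip(fields, json.loads(line))) for line in lines]


# Print both sizes in characters; the schema-first form is the smaller one.
print(len(tagged), len(schema_first))
```

The more regular the data and the more records there are, the more the one-time cost of the header is outweighed by dropping the repeated tags.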

The illustration shows an Avro schema for a file with three different fields: name, favorite number, and favorite color.
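A schema along those lines might look like the sketch below, written in Avro's JSON schema syntax and loaded with Python's standard `json` module. The record name `User`, the field types, and the underscores in the field names are assumptions; the text names only the three fields.

```python
import json

# Avro schemas are written in JSON. The record name "User" and the field
# types are assumptions; only the three field names come from the text.
SCHEMA_JSON = """
{
  "type": "record",
  "name": "User",
  "fields": [
    {"name": "name", "type": "string"},
    {"name": "favorite_number", "type": "int"},
    {"name": "favorite_color", "type": "string"}
  ]
}
"""

schema = json.loads(SCHEMA_JSON)

# List the fields in the order a reader would expect them in each record.
field_names = [f["name"] for f in schema["fields"]]
print(field_names)
```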

Avro is another Apache project with APIs and code in Java, C++, Python, and other languages at http://avro.apache.org.
