The National Security Agency has submitted new label-based data store software, called Accumulo, to the Apache Software Foundation, in hopes that other parties will further develop the technology for use in secure systems.
"There is a need for a flexible, high performance distributed key/value store that provides expressive, fine-grained access labels," the developers stated on the proposal page submitted to Apache. "We have made much progress in developing this project over the past [three] years and believe both the project and the interested communities would benefit from this work being openly available and having open development."
Based on Google's BigTable design, Accumulo is a simple key/value data store: providing the system with a key returns the data associated with that key. Because its design is distributed, Accumulo can run across multiple servers, making it a candidate for use in big data systems.
Plenty of NoSQL-based key/value data stores already exist, such as Cassandra and HBase. What sets Accumulo apart is the ability to tag each data cell with a label. Each key has a section called column visibility, which can store labels. The labels can be used to enforce fine-grained access to the data: under policy rules expressed as a set of labels, an external server may read some cells of the data store but not others.
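The idea of cell-level labels can be illustrated with a minimal Python sketch. This is not Accumulo's API or implementation: the Cell class, the scan function and the sample records are hypothetical, and the sketch makes the simplifying assumption that a reader must hold every label attached to a cell in order to see it.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Cell:
    """One key/value entry; the key carries a visibility section (hypothetical model)."""
    row: str
    column: str
    visibility: frozenset  # labels the reader must ALL hold (simplifying assumption)
    value: str


def scan(cells, authorizations):
    """Return only the cells whose labels are all covered by the reader's authorizations."""
    auths = frozenset(authorizations)
    return [cell for cell in cells if cell.visibility <= auths]


# Illustrative data only; not taken from any real system.
CELLS = [
    Cell("patient42", "name", frozenset(), "J. Doe"),
    Cell("patient42", "diagnosis", frozenset({"medical"}), "asthma"),
    Cell("patient42", "billing", frozenset({"finance"}), "overdue"),
    Cell("patient42", "audit", frozenset({"medical", "finance"}), "flagged"),
]


def visible_columns(authorizations):
    """Convenience helper: names of the columns a reader with these labels can see."""
    return [cell.column for cell in scan(CELLS, authorizations)]
```

In this sketch, a reader holding only the "medical" label sees the name and diagnosis cells of the row but not the billing or audit cells, even though all four share the same row key.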
"The access labels in Accumulo do not in themselves provide a complete security solution, but are a mechanism for labeling each piece of data with the authorizations that are necessary to see it," the proposal stated.
Such label-based data storage could be the basis of secure data store-based systems, ones that could be used by health care, government agencies and other parties with stringent security and privacy requirements, the developers state.
NSA's label-based approach to security resembles another open source project NSA developed and released in 2000, called Security-Enhanced Linux (SELinux). With SELinux, administrators can create policies that dictate, in fine-grained detail, what actions each program on a computer can execute. Red Hat has integrated SELinux into its Red Hat Enterprise Linux distribution.
The software has already attracted "hundreds of developers" who use the database, primarily within the NSA, according to the agency. The software itself has about 200,000 lines of code, most of it written in Java. In addition to the code, NSA pledges to post examples, documentation and training materials on the Apache site.
The agency wants to build a wider base of both contributors and users.
The Apache Incubator is the entry point for new projects that developers hope to have Apache manage. Accumulo runs on top of a number of other Apache programs, namely the Hadoop distributed data platform, the ZooKeeper distributed application configuration manager, and the Thrift services development tool.