Event shows the many faces, challenges of big data

New applications emerge but changing skills demands and still-growing technology stand in the way

The burgeoning tech industry movement around big data is churning up a variety of new applications, but remains an evolving field that faces lingering challenges, judging from an event held Wednesday at a Microsoft research facility in Cambridge, Massachusetts.

Big data -- or Big Data as some in the industry call it -- refers to the ever-growing quantity and variety of data, particularly in unstructured form, being generated by websites, sensors, social media and other sources, as well as a growing array of technologies aimed at deriving insights from it.

Startup Recorded Future seeks to perform "temporal analysis" of information found in the public Web, said Christopher Ahlberg, CEO and co-founder, during a panel discussion at the Massachusetts Technology Leadership Council's Big Data Disruption event.

Recorded Future's system taps into some 70,000 sources, including news sites, trade publications, blogs and financial databases, sifting through the information and identifying references to individual entities and events, Ahlberg said. "We're ingesting 100,000 to 300,000 documents every hour."

Publication dates and other time-related bits of information are associated with the references, allowing them to be organized in a historical manner. Then they are analyzed for sentiment and tone.
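
The time-ordering step described above can be sketched in a few lines of Python. This is only an illustration, not Recorded Future's actual system: the `Reference` record, its field names and the `build_timelines` helper are all hypothetical, and the sentiment and tone analysis is left out.

```python
from collections import defaultdict
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class Reference:
    entity: str      # hypothetical field: the company, person or event mentioned
    published: date  # publication date associated with the reference
    source: str      # hypothetical field: where the mention was found


def build_timelines(references):
    """Group references by entity and sort each group by publication date."""
    timelines = defaultdict(list)
    for ref in references:
        timelines[ref.entity].append(ref)
    for refs in timelines.values():
        refs.sort(key=lambda r: r.published)
    return dict(timelines)
```

Given a list of such records, `build_timelines` returns one chronologically ordered list per entity, which is the kind of historical view a later analysis step could work from.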

Recorded Future's capabilities are used by defense agencies, financial services firms and competitive intelligence experts. The system can be used to pinpoint "broad signals," such as the potential rise or fall of a stock, or for "fine-grained alerting" on a specific type of news event, Ahlberg said.

Startup DataXu, which was also featured at Wednesday's event, offers analytics meant to help digital marketing executives. Its software analyzes data derived from tracking pixels embedded in online ads and builds predictive models showing which types of ad impressions are most likely to lead to sales, said CTO Bill Simmons during another panel talk. DataXu's customers "want to change the minds of consumers and build a brand," he said. To do so, they may need to show an advertising message 100 times, but "where do you show it?" he added.

DataXu is applying machine learning "to a very imbalanced problem," given that today, thousands of ad impressions may lead to only one person buying anything, Simmons added. His company also has to make its service more cost-effective than simply buying and running ads at saturation levels, he said.
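
To see why that kind of imbalance is awkward for machine learning, consider the rough Python sketch below. The figure of one sale per 2,000 impressions is a made-up number chosen for illustration, not one Simmons gave, and the two metric functions are generic textbook definitions rather than anything from DataXu's software.

```python
def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    correct = sum(1 for t, p in zip(y_true, y_pred) if t == p)
    return correct / len(y_true)


def recall(y_true, y_pred):
    """Fraction of actual sales (label 1) that the model caught."""
    positives = sum(y_true)
    if positives == 0:
        return 0.0
    hits = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    return hits / positives


# Hypothetical data: 1 sale among 2,000 ad impressions.
labels = [1] + [0] * 1999

# A useless "model" that predicts "no sale" for every impression.
always_no = [0] * len(labels)

baseline_accuracy = accuracy(labels, always_no)
baseline_recall = recall(labels, always_no)
```

Here the do-nothing model scores 99.95 percent accuracy while finding none of the sales, so accuracy alone says little about whether a model is picking the right impressions.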

Many speakers on Wednesday referred to their companies' use of Hadoop, one of the technologies most closely associated with big data. Hadoop is an open-source programming framework that allows users to split up large processing jobs and run them in parallel across clusters of servers.
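
The split-and-run-in-parallel idea can be illustrated with a small word count in Python. This sketch does not use Hadoop or its Java API; it runs on a single machine with a thread pool standing in for a cluster, and every function name in it is invented for the example.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor


def split_into_chunks(lines, n_chunks):
    """Split the input into roughly equal chunks, one per worker."""
    size = max(1, -(-len(lines) // n_chunks))  # ceiling division
    return [lines[i:i + size] for i in range(0, len(lines), size)]


def map_chunk(chunk):
    """'Map' step: count the words within a single chunk."""
    counts = Counter()
    for line in chunk:
        counts.update(line.lower().split())
    return counts


def reduce_counts(partials):
    """'Reduce' step: merge the per-chunk counts into one total."""
    total = Counter()
    for partial in partials:
        total.update(partial)
    return total


def word_count(lines, n_workers=4):
    """Split the job, process the chunks concurrently, then combine."""
    chunks = split_into_chunks(lines, n_workers)
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        partials = list(pool.map(map_chunk, chunks))
    return reduce_counts(partials)
```

Each chunk is counted independently, which is what makes the work easy to spread across many machines; only the final merge needs to see all of the partial results.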

But Hadoop in its current form has serious limitations, said Michael Stonebraker, a Massachusetts Institute of Technology professor and founder of a number of database vendors. He was also the primary architect for the Ingres and Postgres database systems and is currently CTO of VoltDB.

For one, it "has terrible performance on data management," he said. In addition, Hadoop is a low-level interface that requires people to program in Java, Stonebraker said. "Forty years of research says high-level languages are good."

The problems Stonebraker cited could be mitigated over time, however, given that an array of vendors have been rolling out various tools meant to make Hadoop easier to use.

Meanwhile, EMC's Greenplum division is "building a platform for the future of big data," said George Radford, field CTO, during a panel discussion. That includes both row-based and columnar stores, integrated Hadoop storage, and integration with the GemFire in-memory data grid for in-memory analytics, he said. This integration is crucial, according to Radford. "One of the problems with point solutions is with big data, the last thing you want to do is move it. You want to ingest it and analyze it in place."

But a new problem for big data is emerging even as companies like EMC Greenplum make these technological strides, Radford added. "Like everyone else here, we're looking for data scientists. As we solve the platform issues, people are going to be transformed from bit-tweakers and tuners to active partners with the business."

At another point, talk turned to big data's relationship with cloud computing, particularly public infrastructure offerings like Amazon Web Services, which offer raw compute power for developers.

Such systems present "an extremely challenging environment" for big data processing given the limited control users ultimately have over factors like the underlying network and storage, said Fritz Knabe, distinguished engineer at IBM's Netezza division.

But the public cloud does make sense for large processing jobs in some cases, Stonebraker said. "If you are doing month-end reporting and you need 1,000 processors for three hours, go ahead and do that on the [public] cloud. There's some low-hanging fruit."

Chris Kanaracus covers enterprise software and general technology breaking news for The IDG News Service. Chris's e-mail address is Chris_Kanaracus@idg.com.
