Innovation Outlook for 2015: Tackling Big Data Variety


Xconomy Boston — 

In 2015, look for the innovation community to be talking about Big Data Variety: the problem, how to fix it, how to make money off of it.

Companies have invested roughly $3-4 trillion in enterprise software over the last 20 years, with Gartner forecasting $320 billion in 2014 alone. A lot of that investment has gone into single systems and applications, from Oracle and SAP to proprietary enterprise resource planning and product lifecycle management systems to (more recently) Hadoop and Hive. The good news: organizations are sitting on vast reserves of diverse, potentially invaluable corporate data. The bad news: they can’t get at a lot of the value because it’s locked in data silos tied to these systems, applications, organizations, individuals, or all of the above.

Welcome to the world of Big Data Variety. Organizations now want to use this data broadly—along with all external data sources—for analytic applications. But most organizations don’t even know what they have—data sources, entities, and attributes—let alone how to get them to work together at scale to power new insights and discover long-tail business opportunities.

Meanwhile, companies are also investing heavily in big data; Gartner estimates that spending at $44 billion in 2014. Yet today 85 percent of that big data investment is going toward IT services, not software. In an HBR post, Mahesh S. Kumar wrote that “the disproportionate spending on services is a sign of immaturity in how we manage data,” citing Marc Andreessen’s seminal argument that for each new technology wave, the money eventually shifts to software.

Opportunity is knocking: Clearly, we need innovation in software that radically improves the connection, enrichment, and management of the full volume and variety of an enterprise’s data sources. Most of the high-profile software innovation so far (for example, Hortonworks/Hadoop) has targeted storing and aggregating data. The nastier problem—and the bigger opportunity—by far is connecting data silos semantically at scale, shortening the time to analytics, and discovering the data in an enterprise that can dramatically improve signals in predictive models.

This isn’t a problem that will be solved overnight, and it’s going to get worse for businesses, since almost every investment in a new, single-vendor system creates a new data silo. And the cultural change may be the biggest challenge: realizing that the solution is NOT to throw more IT people or consultants at the problem. Or even to throw data scientists (the new unicorns/rock stars) at it.

When you think of it, our future depends on the ability to harness Big Data Variety. We need to be able to quickly ask—and answer—big questions. Questions ranging from “How can I get the best price (or the most uninterruptible supply source) on an essential part from my global supply chain?” to “Which of my 8,000 research chemists is furthest along working on a molecule that could accelerate a cure for _____?”

A year from now, I think we will look back at some excellent progress here.

[Editor’s note: To tap the wisdom of our distinguished group of Xconomists, we asked a few of them to answer this question heading into 2015: “What will everyone in the innovation community be talking about a year from now?” You can see other questions and answers here.]

3 responses to “Innovation Outlook for 2015: Tackling Big Data Variety”

  1. DataH says:

    Andy, Variety is definitely one of the more important V words when considering a big data strategy. Designed by data scientists, HPCC Systems is an open source data-intensive supercomputing platform to process and solve Big Data analytical problems and can help companies derive actionable insights from their data. It is a mature platform and provides for a data delivery engine together with a data transformation and linking system. The main advantages over other alternatives are the real-time delivery of data queries and the extremely powerful ECL language programming model. More info at

  2. Personal Search & Elearning

    Anybody discussing Education should check this app out.

    It’s quick to archive and fast forward through video.

    This app can randomly select “groupings” of large text, video segments, pictures and audio. What a great question / answer tool.
    When learning and retaining knowledge is this easy, great things happen. I know…

    I use it to validate my video files. Sometimes they get corrupted without me knowing. The app can play a short random segment of all the videos I have on file.

    Give your brain a poke now and again by randomly sampling your digital archive – learning sans memorization

    When you have data access this great, then sharing it is a snap.

    Nobody shares knowledge better than this

    Doug Pederson

  3. Margaux Dela Cruz says:

    One of the article’s good points is that it addresses the challenges many organizations face in handling company data. It turns out that the trillions of dollars invested in different software are not enough to make sense of that data or to use its value to run the companies.