With $10M, SpaceCurve Readies ‘Third Generation’ Big Data Platform
SpaceCurve says its big data platform removes barriers, inherent in current technologies, to real-time analysis of enormous, multi-dimensional datasets. The Seattle startup has raised $10 million to begin marketing its platform, as investors continue to bet on locally based big data companies.
Triage Ventures led the Series B funding, with earlier SpaceCurve investors Reed Elsevier Ventures and Divergent Ventures participating. The company has raised $16.2 million to date. Investors have poured upwards of $325 million into Seattle-area big data companies in the last three months alone, though the vast majority of that came through Tableau Software’s May IPO.
CEO John Slitz describes SpaceCurve’s platform—scheduled for release in the third quarter—as the “third generation” of big data technologies, one designed from the start for the accelerating volumes and varieties of data thrown off by modern sensor networks, mobile devices, and online activities.
As Slitz views it, the first generation of big data, in the mid-1990s, was known as data warehousing, in which data was accumulated and stored in relational databases separate from specific applications and used in a variety of ways.
The second generation emerged in the mid-2000s as a response to the growing data needs of search engine companies. This generation includes technologies such as BigTable, a database developed by Google for Web indexing and other data-heavy tasks, and described as a “persistent multi-dimensional sorted map”; MapReduce, which allows more efficient processing of applications using petabyte-scale datasets by breaking computing tasks into small chunks that can be worked on by commodity hardware running in parallel; and Hadoop, an open-source derivation of MapReduce that powers major Web properties such as Facebook.
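To make the MapReduce idea concrete, here is a minimal sketch in Python of the pattern described above: the input is split into small chunks, each chunk is mapped independently, and the results are grouped and reduced. It is an illustration only, not Google's or Hadoop's implementation. The function names and the word-count task are assumptions chosen for the example, and a local thread pool stands in for the commodity machines of a real cluster.

```python
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor


def map_chunk(chunk):
    """Map step: emit a (word, 1) pair for every word in one chunk of lines."""
    return [(word.lower(), 1) for line in chunk for word in line.split()]


def shuffle(mapped_chunks):
    """Shuffle step: group every emitted value under its key."""
    groups = defaultdict(list)
    for pairs in mapped_chunks:
        for key, value in pairs:
            groups[key].append(value)
    return groups


def reduce_groups(groups):
    """Reduce step: collapse each key's values into a single count."""
    return {key: sum(values) for key, values in groups.items()}


def word_count(lines, chunk_size=2, workers=4):
    # Break the job into small chunks that can be processed independently.
    chunks = [lines[i:i + chunk_size] for i in range(0, len(lines), chunk_size)]
    # Threads stand in for cluster nodes here (an assumption for the sketch).
    with ThreadPoolExecutor(max_workers=workers) as pool:
        mapped = list(pool.map(map_chunk, chunks))
    return reduce_groups(shuffle(mapped))
```

Calling `word_count(["big data big", "data platform", "big"])` counts each word across all chunks. The point of the design is that no map task needs to see any other task's chunk, which is what lets the work spread across many inexpensive machines.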
Since those technologies were conceived, smartphones have proliferated, equipment of all kinds has been outfitted with sensors, and people have filled the world with their tweets, Facebook posts, and other digital footprints. Big data today is about handling this expansion of data in three dimensions: velocity, volume, and variety.
“Now we’re seeing that this data is all over the place,” Slitz says, “but it’s not easy to take advantage of because the basic tools we had to build and work with stuff before were not designed or even thought to be able to run at those speeds and those sizes.”
Slitz talks about the idea of “perishable data,” which he describes as data that has “the predominance of its potential value in a context.”
“Unless you can quickly understand that context and make an intelligent decision on that, you really aren’t doing very much with it,” he says.
For example, an aircraft manufacturer outfits its planes with hundreds of sensors providing constant streams of information on things like temperature, pressure, humidity, and stresses on various parts and systems. By analyzing historical data from those sensors, the manufacturer can calculate an optimized flight setup for any given set of weather conditions.
If that data could be gathered and evaluated in real time, and compared with real-time atmospheric conditions around the plane, the pilot could be given recommendations for the most economical flight.
“I not only have to understand the readings from all the sensors, but I have to understand them within a context, and I have to provide an actionable analysis based upon context and the data that I have collected right now,” Slitz says. “If I can do that, I can change the flight characteristics of that plane for economy, for smoothness, or any number of things.”
SpaceCurve’s data platform is designed to accept data at a high rate, in the millions of records per second, and to process it continuously rather than in batches. It makes the data available in a form that can be immediately analyzed, Slitz says.
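The difference between continuous and batch processing can be shown with a small Python sketch. This is a generic illustration, not SpaceCurve's code: the `RunningStats` class and `ingest` function are hypothetical names, and the sensor readings are made up. Each record updates the summary the moment it arrives, so an up-to-date answer is available after every record rather than only after a whole batch has been collected.

```python
class RunningStats:
    """Keeps a count, mean, and maximum that are updated one record at a time."""

    def __init__(self):
        self.count = 0
        self.mean = 0.0
        self.maximum = None

    def update(self, value):
        self.count += 1
        # Incremental mean: adjust the old mean by the new value's share.
        self.mean += (value - self.mean) / self.count
        if self.maximum is None or value > self.maximum:
            self.maximum = value


def ingest(stream, stats):
    """Process readings as they arrive, yielding a fresh summary after each one."""
    for reading in stream:
        stats.update(reading)
        yield stats.count, stats.mean, stats.maximum
```

Feeding the readings 10.0, 20.0, and 30.0 through `ingest` yields a summary after each value, with no need to store the full history or wait for a batch to close.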
For example, metadata from satellite images—such as latitude and longitude, continent, time, and other descriptions—is pre-assigned memory space as it’s being ingested by the system.
“We not only recognize something as it’s coming in, but we know where we’re going to store it and we can immediately begin running queries against it,” Slitz says. “You get a real-time loop of actionable intelligence. … That is a huge breakthrough in the way in which you can handle data.”
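The article does not describe how SpaceCurve decides where an incoming record will live, so the following Python sketch uses a deliberately simple stand-in: a fixed latitude/longitude grid. The `GridStore` class, the `cell_for` function, and the 10-degree cell size are all assumptions made for illustration. What it shows is the general idea in Slitz's description: a record's storage location is computed from its own metadata at ingest time, so it can be queried as soon as it is stored.

```python
from collections import defaultdict

CELL_DEGREES = 10  # hypothetical grid resolution, chosen for the example


def cell_for(lat, lon):
    """Compute a grid cell from coordinates alone, before the record is stored."""
    return (int((lat + 90) // CELL_DEGREES), int((lon + 180) // CELL_DEGREES))


class GridStore:
    """Toy in-memory store that places each record by its location metadata."""

    def __init__(self):
        self.cells = defaultdict(list)

    def ingest(self, record):
        # The destination is known from the record itself, so no later
        # indexing pass is needed before the record becomes queryable.
        self.cells[cell_for(record["lat"], record["lon"])].append(record)

    def query_cell(self, lat, lon):
        """Return every record stored in the cell containing this point."""
        return list(self.cells.get(cell_for(lat, lon), []))
```

In this sketch, two image records near Seattle land in the same cell and are both returned by a query for a nearby point immediately after ingest, while a record from Sydney lands elsewhere. A production system would need far finer cells, time as an additional dimension, and support for shapes rather than points.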
The company says its technology “employs a unique, pattern-based approach” and “a distinct method for indexing geospatial polygons that provide it with unmatched scalability.”
SpaceCurve founder and CTO Andrew Rogers spent nearly two decades as “a consultant and troubleshooter on large database problems,” Slitz says. While working on Google Earth, he saw that existing data systems would struggle to handle the coming multi-dimensional data flow, and later set to work on what would become SpaceCurve.
In 2010, Rogers and a small group of engineers began proving and refining this new data architecture. “What we have done is really not a trivial amount of engineering,” says Slitz, who became SpaceCurve CEO in late 2011. “It’s real, honest-to-God invention.”
SpaceCurve positions its product as a general data store optimized for geospatial or location data, making Oracle Spatial and Graph and open-source PostGIS its most obvious competitors. The company will also have to convince potential customers of the value of its solution over more-established big data platforms, such as Cloudera, which bases its offering on Hadoop.
Slitz says SpaceCurve is in discussions with 50 potential customers—mainly large enterprises already working with huge amounts of data—a dozen of which he considers most likely to purchase the technology.
SpaceCurve’s platform is designed to run on computer clusters with at least 1,000 nodes, and not in a public cloud because it takes too long to move datasets this large. “It’s like draining Lake Union into the Pacific with a garden hose,” Slitz says.
The company has 17 employees and intends to grow to more than 40 by March of next year, which Slitz acknowledges will be a tall order given the competition for engineering talent, particularly people with skills that can be applied to big data.
Investors have bet on other Seattle-area companies working on various aspects of big data; nearly all of them are hunting for people with similar skills.
In the last three months alone, data visualization company Tableau Software raised $193.1 million before expenses in its May IPO (underwriters and insiders sold shares worth an additional $99.2 million); GraphLab, a startup commercializing an open-source technology for swiftly analyzing graph datasets, raised $6.75 million in venture capital; VoloMetrix raised $3.3 million to apply big data and social analytics to work prioritization and collaboration; Indix attracted $4.5 million to amass a huge index of product prices and information to help product managers; and Decide.com raised $8 million to expand its consumer-focused price prediction service to a broader array of product categories.