Neo Aims to Reorder the Business World with Graph Databases
a Web-based enterprise content management system, or CMS.
Soon enough, Eifrem became Windh’s chief technology officer, in charge of improving the company’s core database. And that’s where his exposure to big databases began. Hundreds of companies were using the system, each with its own groups and subgroups. Each group owned an array of documents, which had to be accessible to people in some groups but not others. It was all built on a standard Oracle relational database, where everything is stored in tables of rows and columns.
By 2000, Eifrem says, trying to keep track of all the hierarchies and permissions had turned into “a big mess.” So he ended up building a layer on top of the Oracle database “to shield us from the database and abstract all of this into nodes and relationships,” also called edges—the key elements of a graph database.
The scheme worked, but translating queries from the graph layer to the relational layer caused enough new difficulties that the Windh team was soon pining for a “native” graph database where the documents and their properties could be stored directly as nodes and relationships. So they built one, and then rebuilt it. By 2003, Windh’s whole CMS was running on top of one of the world’s first native graph databases. They called it the NEO Node Space Engine.
In the rush of that accomplishment, the Windh team wanted to tell everyone about NEO. “We were young and arrogant and said shit like ‘the world deserves this,’” Eifrem recalls. Unfortunately, 2003 was a bad time to be promoting a new kind of database. Not only was the tech world hung over from the 2001 crash, but there was also widespread cynicism in the computing community after object-oriented databases, a previous alternative to the relational database, had failed to live up to dot-com-era hype.
“There was zero acceptance to bringing a new database into the market,” Eifrem says. So the Windh team went back to the CMS business and “honed our skills, polished the database, and learned more about how to build applications that use it.”
But by 2007 or 2008, some of the new database approaches being pioneered inside companies like Amazon, Google, Yahoo, and Facebook had begun to attract the attention of developers. The huge collections of information on user behavior that these businesses were generating began to be described as “Big Data,” and a lot of this data was going into a new generation of non-relational, NoSQL databases like Amazon’s Dynamo, Google’s BigTable, Facebook’s Cassandra, and LinkedIn’s Voldemort.
To understand where Windh’s technology fit in, a crash course on the various families of NoSQL databases is needed. First there’s the “key-value store,” where data is stored in tall, skinny tables consisting of just two columns—a key and a value. Dynamo and Voldemort belong to this family; key-value stores are especially good at holding simple data.
Then there’s the column family, inspired by Google’s BigTable. In a column database, data is recorded primarily in columns rather than rows. Each row can have a different number of columns, which means column databases are good for holding data with varying amounts of structure. Cassandra and Hadoop’s HBase are column databases.
Document databases are the third type of NoSQL database. They have no tables, rows, or columns. They’re just collections of documents, each with an arbitrary number of fields. They’re great for storing and retrieving variegated data like, well, documents. I profiled 10gen, a Palo Alto company that promotes the MongoDB document database, in September 2011.
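As a rough illustration of the three families described so far, here is a minimal Python sketch that uses plain dictionaries and lists as stand-ins for each data model. Every key, name, and value in it is invented for the example, and the `find` helper is hypothetical; real systems such as Dynamo, Cassandra, or MongoDB have their own APIs and storage layouts.

```python
# Illustrative only: in-memory stand-ins for three NoSQL data models.
# All keys, names, and values are made up for this example.

# Key-value store: one opaque value per key, nothing more.
key_value = {
    "user:42": "Ada",
    "user:43": "Grace",
}

# Column family: each row key maps to its own set of columns,
# so two rows need not share the same columns.
column_family = {
    "user:42": {"name": "Ada", "email": "ada@example.com"},
    "user:43": {"name": "Grace"},
}

# Document store: a collection of documents, each with an
# arbitrary number of (possibly nested) fields.
documents = [
    {"_id": 1, "title": "Q3 report", "tags": ["finance"], "author": {"name": "Ada"}},
    {"_id": 2, "title": "Memo"},
]


def find(collection, **criteria):
    """Hypothetical helper: return documents whose top-level fields match all criteria."""
    return [
        doc for doc in collection
        if all(doc.get(field) == value for field, value in criteria.items())
    ]
```

A call such as `find(documents, title="Memo")` returns the one matching document. Note that none of the three structures records how one item relates to another; any such link would have to be stitched together by the application.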
Finally there are graph databases, which are best for storing interconnected elements where the types of connections might change over time (making it impossible to define a fixed schema of rows and columns). Google has a graph database called Pregel, as well as the Knowledge Graph, and Twitter’s FlockDB acts like a graph database, though it’s actually a MySQL database under the hood.
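To make the idea concrete, here is a minimal sketch of a graph of nodes and typed relationships, loosely modeled on the groups, subgroups, and documents problem Eifrem faced at Windh. It is not Neo4j's API or Windh's design: the `Graph` class, the relationship names (`MEMBER_OF`, `SUBGROUP_OF`, `CAN_READ`, `TEMP_ACCESS`), and the rule that members of a subgroup inherit the parent group's permissions are all assumptions made for illustration.

```python
from collections import defaultdict, deque


class Graph:
    """Toy in-memory graph: nodes are strings, relationships are typed edges."""

    def __init__(self):
        # node -> list of (relationship_type, target_node)
        self.edges = defaultdict(list)

    def relate(self, source, rel_type, target):
        """Add a directed relationship of the given type."""
        self.edges[source].append((rel_type, target))

    def reachable(self, start, rel_types):
        """Breadth-first traversal that follows only the given relationship types."""
        seen, queue = {start}, deque([start])
        while queue:
            node = queue.popleft()
            for rel_type, target in self.edges[node]:
                if rel_type in rel_types and target not in seen:
                    seen.add(target)
                    queue.append(target)
        return seen


def can_read(graph, user, document):
    """Assumed rule: a user can read whatever their groups (and parent groups) can read."""
    return document in graph.reachable(user, {"MEMBER_OF", "SUBGROUP_OF", "CAN_READ"})


g = Graph()
g.relate("alice", "MEMBER_OF", "editors")
g.relate("editors", "SUBGROUP_OF", "staff")
g.relate("staff", "CAN_READ", "handbook")
g.relate("bob", "MEMBER_OF", "guests")

# A new kind of connection needs no table redesign: just add an edge of a new type.
g.relate("guests", "TEMP_ACCESS", "handbook")
```

Here `can_read(g, "alice", "handbook")` is true because the traversal walks from alice to editors to staff to the handbook, while the same check for bob is false until the query is told to follow the new `TEMP_ACCESS` relationship as well. The permission check is a walk along relationships rather than a series of table joins, which is the property the article attributes to graph databases.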
Eifrem felt that Windh’s database had advantages that other types of NoSQL databases didn’t, especially when it came to handling highly complex data with lots of embedded relationships. NoSQL databases are built to perform well at large scale, but the more complex the data, the less easily they scale up, Eifrem says. Only graph databases still perform well at scale when the data is complex, he argues.
“If you put data into a key-value store, it will be easier to get to scale [across many machines], but then you have chopped it up into these small pieces,” he says. “Whereas a graph database says, ‘Fuck it, the world is connected, let’s embrace that and allow you to express your domain in as rich a way as possible.’”
In 2009, Windh spun out its NEO technology as an open-source database standard called Neo4j, and Eifrem set up Neo Technology to sell it. A Series A investment from Fidelity in 2011 gave the startup the resources it needed to move from Sweden to San Mateo.
In contrast to companies like Red Hat or 10gen, which make money on consulting and support around open-source software, Neo owns the intellectual property behind Neo4j and is the exclusive contributor to …