Madrona, NEA Bet $6.75M on Seattle Big Data Analysis Startup GraphLab
When Seattle’s tech community pulled together last year to help recruit Carlos Guestrin, a standout machine-learning expert and data scientist, to the University of Washington, some people were hoping a hot startup would wind up here as well.
They weren’t disappointed. Guestrin is launching GraphLab Inc. with a $6.75 million investment led by Madrona Venture Group and New Enterprise Associates (NEA). The company is commercializing an open source technology for analyzing enormous, complex datasets.
The GraphLab technology was born five years ago at Carnegie Mellon University, where Guestrin and his group were working on large-scale machine learning algorithms to extract and analyze relationships between entities in multi-dimensional graph datasets.
Unlike structured relational databases, which store records in tables, these modern datasets consist of nodes that represent objects and edges that represent the relationships among them. Today, graph datasets may include hundreds of billions of objects, and they power things like social networks, online reviews, and recommendation engines. Facebook, for example, is a huge graph dataset in which the nodes are people, pictures, companies, and other entities, and the edges are the “friendships,” likes, and tags that link them together.
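To make the node-and-edge idea concrete, here is a minimal Python sketch of a toy graph loosely modeled on the Facebook example. The entity names and the `build_adjacency` and `neighbors` helpers are hypothetical, and the sketch treats every relationship as undirected for simplicity. It illustrates the general data structure only, not GraphLab's own API or storage format.

```python
from collections import defaultdict

# Hypothetical toy graph: nodes are entities labeled by type.
nodes = {
    "alice": "person",
    "bob": "person",
    "photo1": "picture",
    "acme": "company",
}

# Edges are (source, destination, relationship) triples.
edges = [
    ("alice", "bob", "friendship"),
    ("alice", "photo1", "tag"),
    ("bob", "photo1", "like"),
    ("bob", "acme", "like"),
]

def build_adjacency(edge_list):
    """Map each node to a list of (neighbor, relationship) pairs.

    Assumption for this sketch: every edge is stored in both directions.
    """
    adj = defaultdict(list)
    for src, dst, rel in edge_list:
        adj[src].append((dst, rel))
        adj[dst].append((src, rel))
    return adj

def neighbors(adj, node, relation=None):
    """Return a node's neighbors, optionally filtered by relationship type."""
    return [(n, r) for n, r in adj[node] if relation is None or r == relation]

adj = build_adjacency(edges)
print(neighbors(adj, "bob"))          # all of Bob's connections
print(neighbors(adj, "bob", "like"))  # only the things Bob likes
```

Asking for Bob's “likes” returns the photo and the company he is linked to. At the scale described above, the same kind of lookup has to run across hundreds of billions of objects.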
“What’s challenging is how you interact with this data at this scale in a fast way,” Guestrin says.
At Carnegie Mellon, his team ran up against the limits of existing software and “built a little system” of their own—GraphLab—to push the state of the art, he says.
“We threw it out into the open source community, kind of as an afterthought,” Guestrin says.
With no marketing other than academic talks, GraphLab has been downloaded tens of thousands of times and has benefited from the active engagement of major players in the technology industry, Guestrin says.
GraphLab can grind through graph datasets “orders of magnitude faster than any system out there,” he says. That’s because the underlying machine-learning algorithm is optimized to understand and exploit the structure of a graph database, leading to faster, more accurate analysis, Guestrin explains.
The next step is to make it easier to use for people outside the data science priesthood.
“How can I make the algorithm so robust, so simple to use, yet so accessible and valuable that a company with minimal headcount in this area could get the same kind of value that a company like Google that has hired a huge number of people with this area of expertise can,” he says.
Last year, Guestrin was one of a quartet of high-profile hires by the UW computer science department after a recruiting push that included personal entreaties from Microsoft Research vice president Peter Lee and Amazon CEO Jeff Bezos, and the creation of the Amazon Endowed Professorships in Machine Learning for Guestrin and his wife, Emily Fox, now an assistant professor in the UW statistics department.
“It’s a great example of the triangulation between university research, commercial leadership, and our role in encouraging that… and then helping to bring together this nearly $7 million financing round,” says Matt McIlwain, managing director with Madrona, who is joining the board of GraphLab Inc.
Asked if there was a sense in the tech community that by recruiting Guestrin, Seattle was also recruiting a would-be startup company based on GraphLab, McIlwain says: “We certainly were aware of that potential and were hopeful that something would come together over time. Carlos is a major talent from a computer science perspective, but also one of those special people with those great skills and natural entrepreneurial knack.”
If GraphLab was born at Carnegie Mellon and raised by the open source community, it went to finishing school on Montlake. Guestrin says the technology has benefited from “a tremendous amount of engagement, contribution, and value from the UW community.”
“As individuals, we are our relationships and our connections to people, and what we do with them,” Guestrin says, perhaps revealing a bit of the graph dataset philosophy.
Since moving to Seattle last August, Guestrin has been talking to GraphLab users, colleagues, and investors about how to push the technology further than it could go as an open source offering from a university, he says.
Over the last few months, he has been shifting his efforts toward forming a stand-alone company.
“GraphLab is my dream,” he says.
Guestrin’s decision to launch a business based on the technology is motivated in part by his desire to make GraphLab sustainable, something he isn’t certain it can achieve with support from the open source community alone.
“There have been other companies that have been able to maintain a self-sustaining, effective effort in the open source community, but also pay the salaries of the people involved,” he says.
The Mozilla Foundation would be one obvious example—albeit a nonprofit one.
“GraphLab, given the level of engagement, the number of people, and the type of engagement—it just can’t be done by a few students in an academic lab,” he says. “We need a larger number of people involved in order to continue to make it valuable and effective.”
He’s aiming for a headcount “in the teens” and has made hires from academia and “from top companies in the Seattle area.” The company is working from incubator space in Fluke Hall on the UW campus.
(The lab in GraphLab, by the way, stands for both laboratory and the Labrador retriever that was Guestrin’s companion when the technology was conceived. “There’s a genesis story for names that may or may not be true,” Guestrin quips.)
Seattle is the right place to build a big data company with heavy academic underpinnings, he says, because of the active connections between entrepreneurs and research at the UW. “That brought me a lot of good mentorship, advice, connections, and collaborations,” he says.
How will he balance academic responsibilities with being a startup CEO? “There’s inspiring examples of how that can be done with Oren Etzioni and Dan Weld and others,” Guestrin says, referring to other UW professor-entrepreneurs.
“Beyond that, Seattle as a whole is an exciting place to be doing this kind of thing,” he says, pointing to the leadership in cloud computing infrastructure from the big three of Amazon, Microsoft, and Google.
Madrona’s McIlwain notes the expertise—commercial and academic—accumulating in Seattle around big data.
“We think there’s these horizontal plays like GraphLab, Context Relevant, and Tableau, but that also all innovative companies need to be data analysis-driven.”
As a commercial enterprise, GraphLab Inc. will work on a broader, more robust platform for analyzing enormous graph datasets, and may provide bespoke solutions for individual customers, Guestrin says. He emphasizes that the company plans to keep contributing to the open source community.
Among the innovations in the latest version, GraphLab 2.2 (which the startup plans to introduce to the open source community at a San Francisco workshop in July), are ideas developed by one of Guestrin’s students at Carnegie Mellon for running graph dataset analysis on machines as small as a Mac Mini. The platform can now be scaled from there all the way up to the full horsepower of a cloud computing cluster, depending on the size of the dataset and how fast an answer is needed.
In addition to McIlwain, Greg Papadopoulos of NEA is joining GraphLab’s board. The company also has a technical advisory board including UW computer science chair Hank Levy; Sujal Patel, founder of Isilon Systems; and Chris Stolte, co-founder of Tableau Software.