After Boom Years, Starburst and Podium May Signal Big Data’s Future
It seems like ancient history now, but “big data” was once a hot field with startups, investors, and big companies all buzzing with hype. Then the tech industry moved on, and marketers crowned data science and machine learning the Next Big Things (at least until blockchain takes over).
Of course, big data never really went away, though many companies did. In fact, there’s a lot happening in the field, and some of it is linked to the rise of machine learning and artificial intelligence applications. If you’re looking for a guide to what has happened, and what the future holds, you could do worse than talk to a guy named Justin Borgman.
Borgman helped build a startup out of Yale University called Hadapt during the big-data boom years of 2010-2012. Hadapt made software that combined advanced databases with the open-source analytics platform Hadoop. His company was acquired by database giant Teradata (NYSE: TDC) in 2014, and Borgman became a vice president and general manager there, leading efforts on the open-source software side.
Now, Borgman is on to something new. He’s leading a spinout of Teradata called Starburst, which is based in Boston. The new company, up and running since October, has about a dozen employees. Borgman says it is already profitable and expects to make “several million” dollars in revenue this year, selling to mid-sized and larger companies “across all sectors.”
Starburst is targeting a technical niche, but it’s a big one: providing services, support, and tools for a data analytics system called Presto. Presto was originally developed at Facebook and then open-sourced so that other developers could use it broadly. In geek-speak, it’s a “distributed SQL query engine.” That means it allows users to run fast, efficient analytics on multiple types of data, wherever the data live (for example, in Hive, Cassandra, or relational databases). And that means a lot less hassle preparing the data for analysis, which traditionally has meant running an “extract, transform, load” process to copy everything into a central database before anyone can query it.
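For readers who want to see the idea of querying data where it lives, here is a minimal sketch. It is not Presto and does not use any Presto API. It assumes Python's built-in sqlite3 module as a stand-in query engine, with two separate in-memory databases playing the role of two data sources. The names `clicks`, `customers`, `crm`, and `federated_click_counts` are hypothetical and chosen only for illustration.

```python
import sqlite3


def federated_click_counts():
    """Join two separate databases with one SQL statement, no copy step."""
    # One connection stands in for the query engine (an assumption; a real
    # distributed engine would fan the work out across many machines).
    conn = sqlite3.connect(":memory:")

    # Attach a second, separate in-memory database as another "source".
    conn.execute("ATTACH DATABASE ':memory:' AS crm")

    # Source 1: web click logs (hypothetical table and rows).
    conn.execute("CREATE TABLE main.clicks (user_id INTEGER, page TEXT)")
    conn.executemany(
        "INSERT INTO main.clicks VALUES (?, ?)",
        [(1, "/home"), (1, "/pricing"), (2, "/home")],
    )

    # Source 2: customer records kept in a different database (hypothetical).
    conn.execute("CREATE TABLE crm.customers (user_id INTEGER, name TEXT)")
    conn.executemany(
        "INSERT INTO crm.customers VALUES (?, ?)",
        [(1, "Ada"), (2, "Grace")],
    )

    # A single query spans both sources by qualifying each table with the
    # name of the database it lives in.
    rows = conn.execute(
        "SELECT cu.name, COUNT(*) AS clicks "
        "FROM main.clicks AS c "
        "JOIN crm.customers AS cu ON c.user_id = cu.user_id "
        "GROUP BY cu.name ORDER BY cu.name"
    ).fetchall()
    conn.close()
    return rows


if __name__ == "__main__":
    print(federated_click_counts())
```

The point of the sketch is the shape of the query: the analyst names each source in the SQL itself rather than first copying both tables into one place. A real deployment would differ in scale and in how sources are configured.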
Presto is used by big companies such as Amazon, Uber, Twitter, Netflix, and Airbnb (as well as Facebook) to do things like gain insights into user behavior, diagnose problems, and track sales and other results across huge amounts of data in many different formats. Borgman declines to name any of his customers yet, but says many Presto users are also customers of Starburst.
Daniel Abadi, a computer science professor at the University of Maryland, College Park, says in an e-mail that new methods of running queries on database systems “have become important differentiators across different vendors,” in part because of the increasing complexity of these systems. (Abadi was a scientific co-founder of Hadapt but isn’t involved with Teradata or Starburst.)
He adds, “Presto is well-positioned for placement at the forefront of this innovation, as leading tech companies and Presto users… feed back their advanced analysis practices back to the Presto development community and into the open-source project.”
After Teradata acquired his startup, Borgman says, he wanted to join forces with an existing open-source project in data infrastructure to make a big impact. So, his team approached Facebook about getting involved with and contributing to Presto, originally through Teradata.
“We wanted to become the company supporting Presto,” he says.
Now, he gets to do that as part of a smaller, independent entity. That means providing software tools and configurations to make Presto work smoothly for enterprises, as well as adding future capabilities. Starburst has a partnership with Teradata—the startup will support the big company’s customers who use Presto—but Teradata doesn’t have an ownership stake, Borgman says. And he says that (as of a few weeks ago at least) “the board is me.”
You might call it the “no VC, no board, no problem” startup model. But at some point, Starburst will probably want to scale up its operations. And it may need outside help, especially in the world of enterprise software. “I wouldn’t rule out raising capital,” Borgman says, but the company’s current setup “gives us tremendous flexibility.”
In fact, Borgman thinks too much VC money ultimately has hurt the field of big data. “Venture capitalists themselves sort of spoiled the market,” he says. “There were too many players, and business models were subsidized by VC.”
Consider the large amounts raised by the likes of Cloudera, Hortonworks, MongoDB, and Databricks. Smaller startups, like Hadapt, found it very tough to compete in the overcrowded sector, which led to a lot of consolidation and failed startups. “Some [companies] are still limping along, but they’re not on a growth trajectory,” Borgman says. “We were fortunate that we sold the company when we did.” (Meanwhile, Cloudera, Hortonworks, and MongoDB all became public companies, with market caps of over $1 billion each.)
A similar shakeout might happen in today’s market for machine learning and A.I. companies, though Borgman suspects venture funding levels haven’t yet reached those of big data’s heyday. Meanwhile, giants like Google, Amazon, and Microsoft are playing a big role in A.I. consolidation.
If there’s one theme that keeps coming up in business use cases for machine learning, it’s the need to improve the quality and accessibility of the data that machines are learning from.
“The big data piece is a prerequisite to A.I.,” Borgman says. “While A.I. is a hotter topic these days, the big data foundation is necessary to really have any success.” (Indeed, Abadi, the professor, says he’s not working on a startup at the moment, but if he were to do so, “it would likely be in the data infrastructure for machine learning space.”)
Other startups have complementary strategies for making data more accessible to businesses running analytics. For example, Tamr focuses on connecting and unifying customers’ data across departments and silos. Bedrock Data synchronizes data across different business systems for marketing and sales applications. And Podium Data manages and prepares data for enterprises, focusing on data quality and security.
These companies, along with Starburst, could represent the next generation of big data startups. They’ve all found some traction with big customers, and they’re starting to ride the wave of machine learning being applied in enterprises.
Podium, a 35-person startup based in Lowell, MA, counts TD Bank, Cigna, and Astellas Pharma among its customers. The company seems to have found a niche in helping enterprise users get started with analytics. “The challenge for large organizations is how to… start using big data to expose and leverage legacy data systems among business users, in a way that does not introduce additional risk or complexity into the IT landscape and yields business [return on investment] early, and incrementally, in the rollout process,” says Barbara Petrocelli, Podium’s vice president of marketing.
Petrocelli, a veteran of Oracle, Netezza, and IBM, doesn’t see Starburst as a direct competitor in the field. “We view Starburst as simplifying a consistent access to data—[whereas] Podium is about managing the interaction with data in a controlled, secure enterprise methodology that scales.”
She adds that there’s plenty of room for startups that are “giving companies innovative ways to reach information, in whatever format, wherever it lives.”