Tamr Emerges With $16M to Crack Data Curation for Enterprises

One of the Boston tech scene’s most dynamic duos is at it again. Yes, Andy Palmer and Michael Stonebraker are coming out of stealth with their latest company.

Cambridge, MA-based Tamr, formerly known as Data Tamer, makes software aimed at helping big companies manage and connect their many data sources; the idea is to give enterprises a faster way to access the right information to make business decisions.

Tamr also says it has raised $16 million-plus from big-name investors Google Ventures and New Enterprise Associates. Venture capitalists Rich Miner and Peter Barris have joined Tamr’s board of directors, which is chaired by database expert Jerry Held.

The company’s timing could be good, now that a lot of the marketing hype around “big data” has died down. And, more importantly, Tamr seems to be solving a real business problem with some market upside.

The company is led by chief executive Palmer, who was the co-founder and founding CEO of Vertica Systems (now part of Hewlett-Packard). In recent years, he has been an angel investor in companies such as Cloudant, CloudSwitch, and VoltDB. (He made Xconomy’s list of top angels in New England in 2012.)

He and co-founder Stonebraker, an adjunct professor at MIT’s Computer Science and Artificial Intelligence Lab, have previously collaborated on Vertica, VoltDB, and Paradigm4. Stonebraker has helped start other companies in the Boston area, such as Goby (bought by TeleNav) and StreamBase Systems (bought by Tibco).

If there’s a common theme among their startups, it’s big data applied to big-company problems. In Tamr’s case, some more background is in order.

Palmer (pictured) is a rare breed of tech executive who also knows the healthcare world. He ran data engineering at Novartis and served as chief information officer at Infinity Pharmaceuticals, so he understands the “data curation” problem firsthand: think of thousands of bench scientists putting their experimental data into spreadsheets to be analyzed, and the company decision-makers having to sift through all the different data sources and formats. The upshot is that a lot of useful information in the “long tail” never gets looked at.

That’s because the traditional way of integrating data from multiple sources, known as “extract, transform, and load,” or ETL, requires a programmer to handle each data source separately. The approach may work for a few dozen data sources, say, but it breaks down when you have thousands.

Stonebraker (also pictured), a longtime UC Berkeley professor who developed the Ingres and Postgres relational database systems, saw a way to solve this scalability problem from the bottom up. His experience with the Web startup Goby taught him that if you’re dealing with thousands of data sources (in Goby’s case, scraping 80,000 websites looking for events and attractions), you need to incorporate statistics, and use human experts only when they’re absolutely necessary.

Tamr’s technology is based on collaborative research done at MIT, UC Berkeley, Brown University, Brandeis University, and the Qatar Computing Research Institute. It uses machine-learning algorithms and statistics to integrate a huge number of data streams, with a touch of human-expert guidance to keep the algorithms on track; the owner of a particular piece of data may get an e-mail request for clarification, for example.

Overall, the data integration and curation entails understanding how all records are related to one another, paring down redundant data, flagging up items that have typos, and generally prepping all the information so it can be used downstream. If it works, the result should be a big time and cost reduction for customers—and a lot of added value in data that was previously hidden.
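
For readers who want a feel for the general idea, here is a minimal Python sketch of machine-first matching with a human fallback. It is not Tamr's algorithm: the string-similarity measure, the two thresholds, and the function names are all assumptions made up for illustration. Pairs of records that score very high are merged automatically, pairs in a gray zone are set aside for an expert to review, and everything else is left alone.

```python
from difflib import SequenceMatcher

# Hypothetical thresholds, chosen only for illustration.
AUTO_MERGE = 0.95  # at or above this score: treat the two records as duplicates
ASK_EXPERT = 0.70  # between the two thresholds: route the pair to a human


def similarity(a: str, b: str) -> float:
    """Return a 0..1 string similarity, ignoring case and outer whitespace."""
    return SequenceMatcher(None, a.strip().lower(), b.strip().lower()).ratio()


def triage(records: list[str]) -> tuple[list[tuple[str, str]], list[tuple[str, str]]]:
    """Compare every pair of records and sort the close ones into two buckets."""
    merged, needs_review = [], []
    for i, left in enumerate(records):
        for right in records[i + 1:]:
            score = similarity(left, right)
            if score >= AUTO_MERGE:
                merged.append((left, right))        # confident duplicate
            elif score >= ASK_EXPERT:
                needs_review.append((left, right))  # ambiguous: ask an expert
    return merged, needs_review


records = ["Novartis AG", "novartis ag ", "Novartis A.G.", "Thomson Reuters"]
merged, needs_review = triage(records)
print("auto-merged:", merged)
print("needs expert review:", needs_review)
```

Comparing every pair of records is fine for a toy list but would not scale to thousands of sources, which is part of why the problem is hard in practice.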

“I’m only interested in game-changing technologies,” Stonebraker says. “Life is short. This is a complete game-changer in the data curation market.”

That remains to be seen. But Tamr is getting some early traction and has done large-scale pilots with the likes of Novartis and Thomson Reuters. Palmer cites one customer who found that, of the $60 million worth of data it licensed, about one-third was redundant. In other words, the company could save $20 million if it had better visibility into its data. “Just seeing what data assets you have is a big deal,” Palmer says.

What’s more, he draws an analogy between what Tamr does for corporate databases and what search engines do for the Web. Connecting and ranking Web pages with algorithms, he says, is a bit like connecting and understanding data sources for big companies. “This is where we believe the enterprise has to go, like the modern consumer Internet,” he says.

Of course, once a company sees all its data, it still needs to make sense of it—that’s a huge, separate challenge in analytics. Tamr is positioning its software as sitting next to companies’ existing big-data platforms, toolkits, and data-visualization systems.

So, unlike Vertica or Endeca, Tamr’s technology does not include a storage platform. It’s also different from companies like Hadapt and DataGravity, which are more focused on analytics. (Though DataGravity does seem to have in common the idea of helping enterprises extract insights from data hidden in different silos.) So far, Tamr looks more similar to data intelligence companies like Trifacta, Paxata, and ClearStory, but it’s a little early to compare them definitively.

I asked Palmer for his thoughts on returning to a startup CEO role, versus angel investing and mentoring other entrepreneurs. “It’s really liberating for me to work on one project I really care about,” he says. “I’m an operating guy. I love to do real work.”

And the tech community is primed to see if that work—the latest in a series of Palmer-Stonebraker collaborations—will pay off in terms of a big local company, or an exit. The best indication that Tamr ultimately will be worth the duo’s time? “Our wives are good friends,” Palmer says with a laugh. “Their filter for us to start a new company is really, really high.”
