Digital Reef’s Similarity-Based Search Helps Corporate Data “Speak For Itself”

Sometimes (just sometimes) it pays to look behind the jargon in press releases. One glance at yesterday’s coming-out-of-stealth-mode announcement from Boxborough, MA-based Digital Reef, which starts off talking about “massively scalable unstructured data management platforms” and “capabilities [that] improve eDiscovery outcomes,” was enough to make even a nerd like me want to tune out. But it turns out that Digital Reef has built something fairly new and interesting, a “similarity search engine” for big corporate networks that can start with one document—say, a Word or Excel file—and find others that resemble it.

That could be very useful if, for instance, you were a compliance officer at a big health plan and you wanted to see whether any of your employees had unsecured patient records sitting around on their laptop hard drives (which would be a big violation of federal healthcare privacy regulations). Just plop an example of a patient record into the Digital Reef system, and it will scour the network for other examples. Or say you were a lawyer at a big firm writing a brief in an employment case and you wanted to find out whether any of your colleagues working on similar cases in the past had already assembled the relevant citations. You could simply submit your entire draft to Digital Reef, and see what washed up.

The problem with most big organizations, says Brian Giuffrida, Digital Reef’s vice president of marketing and business development, is that they don’t know what they don’t know. A traditional keyword-based search might be effective if you already have a good mental picture of the kinds of information stored on your company network, and if you know what search terms are most likely to ferret out what you need. But if you don’t even know whether the information you’re searching for exists, where do you start?

[Image: The Digital Reef user interface]

“A lot of times, larger firms have dedicated knowledge experts to retrieve data…because searching is a cumbersome task that requires knowledge of the corpus and of the structure of the data,” says Giuffrida. “But once you start doing searches by similarity, you no longer need to be an expert in keyword searching or Boolean algebra. You can just say ‘Here’s the presentation we did at the conclusion of this engagement with Company ABC, I’m looking for similar stuff.’ It’s as simple as that.”
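For readers who want a feel for what "search by similarity" means in practice, here is a minimal sketch of query-by-example. Digital Reef has not disclosed its modeling techniques, so this is not the company's method: it swaps in a generic, textbook approach (TF-IDF term weighting plus cosine similarity). The file names, sample texts, and function names are all hypothetical and exist only for illustration.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphanumeric word tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def tfidf_vectors(docs):
    """Map each document name to a sparse {term: weight} TF-IDF vector.

    Assumption: a smoothed inverse document frequency, chosen here for
    simplicity. A production system would likely weight terms differently.
    """
    counts = {name: Counter(tokenize(text)) for name, text in docs.items()}
    n_docs = len(docs)
    doc_freq = Counter()
    for term_counts in counts.values():
        doc_freq.update(term_counts.keys())
    idf = {t: math.log((1 + n_docs) / (1 + df)) + 1.0 for t, df in doc_freq.items()}
    return {
        name: {t: tf * idf[t] for t, tf in term_counts.items()}
        for name, term_counts in counts.items()
    }


def cosine(a, b):
    """Cosine similarity between two sparse vectors (0.0 if either is empty)."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def find_similar(query_name, docs, top_k=3):
    """Rank every other document by its similarity to the example document."""
    vectors = tfidf_vectors(docs)
    query = vectors[query_name]
    scores = [
        (name, cosine(query, vec))
        for name, vec in vectors.items()
        if name != query_name
    ]
    scores.sort(key=lambda pair: pair[1], reverse=True)
    return scores[:top_k]


# Hypothetical mini "network" of files; the first one is the example we submit.
docs = {
    "example_record.txt": "Patient name Jane Doe date of birth diagnosis hypertension medication prescribed",
    "laptop_notes.txt": "Patient name John Roe diagnosis asthma medication prescribed dosage",
    "q3_budget.txt": "Quarterly budget forecast revenue expenses marketing headcount",
    "offsite_agenda.txt": "Agenda for the team offsite lunch venue travel schedule",
}

ranked = find_similar("example_record.txt", docs)
for name, score in ranked:
    print(f"{name}: {score:.3f}")
```

The point of the sketch is that the user never chooses a keyword: the example document supplies the vocabulary, and the file that shares its patient-record wording rises to the top while the unrelated files score zero.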

Digital Reef’s system, which has been nearly three years in the making, is designed specifically to index and organize the miscellaneous types of documents—Word, PowerPoint, Excel, PDF, e-mail—that clutter the average knowledge worker’s local file system. (Such data is “unstructured,” in computer science lingo, at least when you compare it to the nice rows and columns of information in transactional databases.) Not only are these the kinds of files where most of a company’s collective experience is stored, but they’re piling up faster than any other kind of corporate data, according to Digital Reef’s founder, president, and CEO, Steve Akers.

[Photo: Steve Akers]

Enterprise content management systems like EMC’s Documentum offer one way to get a handle on all this information. But in a blog post yesterday, Akers argues that these systems force users to tag and classify documents in a way that’s contrived, error-prone, and poorly adapted to people’s actual workflows. The only way for a company to really find out what kinds of data it has lying around (and thereby what kinds of security, compliance, and litigation risks it might be opening itself up to) is to let that data “speak for itself,” in Akers’ words, through similarity-based search.

Akers is a longtime Boston-area technology entrepreneur who’s done stints at Apollo Computer, Hewlett-Packard, Stratus Computer, and Shiva Corporation. Spring Tide Networks, the Boxborough-based Internet data switch maker Akers co-founded in 1998, was purchased by Lucent in 2000 for a cool $1.5 billion. According to Giuffrida, Akers took several years off after leaving Lucent to develop the mathematical modeling techniques behind similarity-based search. He formed Digital Reef in late 2006 with a $10 million investment from Waltham, MA-based Matrix Partners. A second $10 million round, led this time by Boston’s Pilot House Ventures, closed late last year.

Until emerging from stealth mode this week, the company did business under the name Auraria Networks. It has 32 employees, including a handful in Atlanta, GA.

Wade Roush is a freelance science and technology journalist and the producer and host of the podcast Soonish.
