Permabit: Storing Enterprise Data Unerasably, At Bargain Prices
If you ate on your best china every night, flew first class even on puddle jumpers, and habitually drove your Mercedes rather than your minivan to the grocery store, it would be a lot like what most big companies do with their data, according to Tom Cook.
More and more of the information that e-commerce companies and other data-intensive businesses collect sits on expensive “primary storage” devices from companies like EMC, Hitachi, and Hewlett-Packard. Those machines make the data immediately accessible to the company’s Web-based applications or enterprise management software. But on average, only about 25 percent of the data in primary storage is actually needed for day-to-day transactions, says Cook, CEO of Cambridge, MA-based Permabit Technology. “If you moved the other 75 percent to a lower-cost tier, you’d get much better efficiency and better cost savings,” he says.
Permabit, you may not be surprised to hear, offers just such a technology: what it calls “enterprise archive storage.” Enterprise archiving isn’t the same as the daily data backups that most companies generate. Those systems, which are often tape-based, are still needed to guarantee that companies can recover from disasters. The difference is that most companies never plan on using the data that goes into their backup systems, whereas Permabit’s systems are built to store the final copies of infrequently used files, just at lower cost than primary storage.
Most companies pay $30 to $50 per gigabyte for primary storage, according to Cook, while Permabit’s systems list for $3.50 per gigabyte. If customers use compression and de-duplication (the weeding out of redundant data) to squeeze even more information onto Permabit’s hard drive arrays, they can get that cost below $1 per gigabyte, he says.
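To make Cook’s arithmetic concrete, here is a minimal sketch of the blended cost per gigabyte if 75 percent of a company’s data moved from primary storage to the archive tier. The prices and the 25/75 split are the figures quoted above; the function name is hypothetical, and the model deliberately ignores migration, power, and staffing costs.

```python
def blended_cost_per_gb(primary_cost, archive_cost, archive_fraction):
    """Average cost per gigabyte when a fraction of the data sits on the archive tier."""
    return (1 - archive_fraction) * primary_cost + archive_fraction * archive_cost


if __name__ == "__main__":
    ARCHIVE_LIST_PRICE = 3.50   # dollars per gigabyte, per Cook
    ARCHIVE_FRACTION = 0.75     # share of data not needed day to day, per Cook
    for primary in (30.0, 50.0):  # low and high ends of Cook's primary-storage range
        blended = blended_cost_per_gb(primary, ARCHIVE_LIST_PRICE, ARCHIVE_FRACTION)
        print(f"primary ${primary:.2f}/GB -> blended ${blended:.3f}/GB")
```

At those list prices, the blended figure works out to $10.125 per gigabyte at the low end and $15.125 at the high end, roughly two-thirds less than keeping everything on primary storage.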
There’s a technical secret to how Permabit can store all this data cheaply and reliably, in a way that frees customers from having to “migrate” from one generation of storage technology to the next every few years. And there’s a business secret to how the company—which was founded in 2000 but has only begun to see serious market demand for its technology in the last couple of years, according to Cook—has stayed alive so long without an “exit” event for its investors.
The technical secret first. If you’ve ever wandered into a data center, you’ve probably heard of RAID, an acronym for “redundant array of inexpensive [or independent] disks.” This became the dominant technology in the 1990s for splitting up data across lots of PC-class hard drives (as opposed to the huge, expensive drives on 1980s mainframes). RAID is great for storing terabytes of data cheaply, and it’s somewhat fault-tolerant: if one drive fails, it’s usually okay, because the data is either copied onto at least one other drive or can be recalculated from parity information spread across the remaining drives.
But RAID has a weakness. If one drive fails and a new one is installed in its place, the data that was on the failed drive has to be replicated by locating it and reading it off remaining drives in the array. If an error occurs during that process—if, say, a storage block becomes corrupted and unreadable—there’s a small but real chance that the original data will be lost forever. And if a second drive fails before the reconstruction is complete—well, let’s just say you’re hosed. (In the case of a 16-drive RAID 6 array with two failed drives, Permabit calculates that there’s a whopping 50 percent chance that reconstruction will fail.)
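For readers who want to see why rebuilds get risky, here is a rough sketch of the standard back-of-the-envelope model, not Permabit’s own calculation. It assumes that every bit on every surviving drive must be read during the rebuild, that unrecoverable read errors strike bits independently, and that a single unreadable bit is enough to sink the reconstruction. The drive size and error rate in the example are hypothetical inputs chosen for illustration.

```python
import math


def rebuild_failure_probability(drives_to_read, drive_bytes, ure_per_bit):
    """Chance of hitting at least one unrecoverable read error during a rebuild.

    Computes 1 - (1 - ure_per_bit) ** bits_read, using log1p/expm1 so the
    tiny per-bit error rate is not lost to floating-point rounding.
    """
    bits_read = drives_to_read * drive_bytes * 8
    return -math.expm1(bits_read * math.log1p(-ure_per_bit))


if __name__ == "__main__":
    # Hypothetical scenario: a 16-drive RAID 6 array with two failed drives
    # leaves 14 drives to read, with no redundancy left to absorb an error.
    probability = rebuild_failure_probability(
        drives_to_read=14,
        drive_bytes=10**12,   # assumed 1 TB drives
        ure_per_bit=1e-14,    # assumed error rate of one bit in 10^14
    )
    print(f"chance the rebuild hits an unreadable bit: {probability:.1%}")
```

The result swings widely with drive capacity and error rate, so treat it as a way to explore the shape of the problem rather than a reproduction of Permabit’s 50 percent figure.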
To guard against that problem, Permabit’s founder and chief technology officer, Jered Floyd, led the development of an alternative storage approach called RAIN-EC. That stands for “redundant array of independent nodes—erasure coding.” The erasure coding is the key part; it describes how Permabit’s drives slice up data during the de-duplication process to make it “erasure resilient.”
The geeky details: For any given chunk of data, RAIN-EC first splits the chunk into four “shards.” It then uses a special algorithm to whip up two additional “protection” shards containing bits and pieces of the first four shards, in such a way that reading back any four of the six shards is enough to reconstruct the original chunk. Each of the six shards is then written to a different storage node in the array. (A node can consist of a single hard drive, or a cluster of them.)
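The article doesn’t spell out Permabit’s algorithm, so what follows is only an illustrative sketch of one well-known way to get the “any four of six” property: a Reed-Solomon-style code over the finite field GF(2^8). Each byte position across the four data shards is treated as four points on a polynomial, and the two protection shards are two more points on the same curve. Every function name here is hypothetical, and the zero-padding of short chunks is an assumption made for simplicity.

```python
from itertools import combinations

DATA_SHARDS, TOTAL_SHARDS = 4, 6  # four data shards plus two protection shards

# Log/antilog tables for GF(2^8), built from the polynomial 0x11D.
EXP, LOG = [0] * 512, [0] * 256
value = 1
for power in range(255):
    EXP[power], LOG[value] = value, power
    value <<= 1
    if value & 0x100:
        value ^= 0x11D
for power in range(255, 512):
    EXP[power] = EXP[power - 255]


def gf_mul(a, b):
    """Multiply two field elements (addition in this field is XOR)."""
    return 0 if a == 0 or b == 0 else EXP[LOG[a] + LOG[b]]


def gf_div(a, b):
    """Divide a by a nonzero field element b."""
    return 0 if a == 0 else EXP[(LOG[a] - LOG[b]) % 255]


def lagrange_weights(xs, target):
    """Weights that combine values at points xs into the value at target."""
    weights = []
    for j, xj in enumerate(xs):
        num = den = 1
        for m, xm in enumerate(xs):
            if m != j:
                num = gf_mul(num, target ^ xm)
                den = gf_mul(den, xj ^ xm)
        weights.append(gf_div(num, den))
    return weights


def interpolate(known, target):
    """Compute shard number `target` from any four known shards."""
    xs = sorted(known)[:DATA_SHARDS]
    out = bytearray(len(known[xs[0]]))
    for weight, x in zip(lagrange_weights(xs, target), xs):
        for i, byte in enumerate(known[x]):
            out[i] ^= gf_mul(weight, byte)
    return bytes(out)


def encode(chunk):
    """Split a chunk into four data shards and append two protection shards."""
    size = -(-len(chunk) // DATA_SHARDS)  # ceiling division
    padded = chunk.ljust(size * DATA_SHARDS, b"\0")
    data = {i: padded[i * size:(i + 1) * size] for i in range(DATA_SHARDS)}
    parity = [interpolate(data, t) for t in range(DATA_SHARDS, TOTAL_SHARDS)]
    return [data[i] for i in range(DATA_SHARDS)] + parity


def decode(surviving, length):
    """Rebuild the original chunk from a dict of {shard index: shard bytes}."""
    if len(surviving) < DATA_SHARDS:
        raise ValueError("need at least four shards to reconstruct the chunk")
    data = b"".join(interpolate(surviving, t) for t in range(DATA_SHARDS))
    return data[:length]


def survives_any_two_losses(chunk):
    """Check that every choice of four surviving shards rebuilds the chunk."""
    shards = encode(chunk)
    return all(
        decode({i: shards[i] for i in kept}, len(chunk)) == chunk
        for kept in combinations(range(TOTAL_SHARDS), DATA_SHARDS)
    )
```

Calling `survives_any_two_losses(b"some archived record")` tries all 15 ways of losing two shards and confirms the chunk comes back each time. A production system would use a far faster implementation than this byte-at-a-time loop.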
In this way, very large files get spread across nearly the entire array. If any single node in the array fails, any data chunk with a shard on the failed node can be reconstructed from the other five shards, which are stored, by definition, on other nodes. If catastrophe strikes and two nodes holding sibling shards fail simultaneously, the chunk can still be reconstructed from the remaining four shards. On top of all that, shards in a RAIN-EC array are written and retrieved in parallel across many nodes, meaning that reconstruction after a drive failure happens much faster than in a RAID system.
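The placement logic described above can be sketched as a small simulation. This is a toy model under stated assumptions, not Permabit’s placement policy: it assumes each chunk’s six shards land on six distinct nodes chosen at random, and that a chunk is recoverable whenever at least four of its shards survive. The node count, chunk count, and function names are all hypothetical.

```python
import random
from collections import Counter
from itertools import combinations

DATA_SHARDS, TOTAL_SHARDS = 4, 6


def place_chunks(num_chunks, num_nodes, seed=0):
    """Assign each chunk's six shards to six distinct, randomly chosen nodes."""
    rng = random.Random(seed)
    return [rng.sample(range(num_nodes), TOTAL_SHARDS) for _ in range(num_chunks)]


def recoverable(placement, failed_nodes):
    """A chunk survives if at least four of its shards sit on healthy nodes."""
    surviving = sum(1 for node in placement if node not in failed_nodes)
    return surviving >= DATA_SHARDS


def worst_case_losses(placements, num_nodes, failures):
    """Largest number of chunks lost over every possible set of failed nodes."""
    worst = 0
    for failed in combinations(range(num_nodes), failures):
        failed_set = set(failed)
        lost = sum(1 for p in placements if not recoverable(p, failed_set))
        worst = max(worst, lost)
    return worst


def rebuild_reads(placements, failed_node):
    """Count the shard reads each healthy node serves while one node is rebuilt."""
    reads = Counter()
    for placement in placements:
        if failed_node in placement:
            helpers = [n for n in placement if n != failed_node][:DATA_SHARDS]
            reads.update(helpers)
    return reads


if __name__ == "__main__":
    placements = place_chunks(num_chunks=200, num_nodes=16)
    print("chunks lost, any 2 failed nodes:", worst_case_losses(placements, 16, 2))
    print("reads per helper node:", dict(rebuild_reads(placements, failed_node=0)))
```

Because no chunk ever has more than two shards on the failed nodes, the two-failure case always reports zero lost chunks. The read counts illustrate the parallelism point: the work of rebuilding one node is shared among many helpers instead of hammering a handful of drives.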
Got all that? The basic point is that Permabit’s erasure coding algorithms “are both extremely fast and allow recovery from multiple failed drives or nodes,” according to Floyd. Customers like the RAIN-EC approach not only because it keeps data safer, but because Permabit’s grid-based architecture allows them to scale up their storage systems indefinitely simply by adding more nodes—meaning no more data migration headaches.
Cook says the company is “gaining traction” with RAIN-EC despite the “decided headwind in the marketplace” for enterprise hardware. As companies tire of paying so much for primary storage, Permabit will take over a big chunk of that market, Cook predicts: “We think we’ll inherit as much as 80 percent of enterprise storage by being economically competitive.”
But Permabit might not have had a shot at that market if it hadn’t gone through its own reconstruction a couple of years ago. By 2007, the company’s original venture investors, who included New York-based Baker Capital, “had been in it for seven years and had funds wanting to reach liquidity,” says Cook. “But the company wasn’t ready for that. The market was just starting to happen. We determined that the best thing to do was to buy out the current shareholders, capitalize the company effectively, and recruit a team of very seasoned people to do that.”
In essence, Permabit found a single angel investor—William Reeves, a former managing director at JPMorgan in London and the co-founder of BlueCrest Capital Management—to take the place of its venture backers. “He’s an angel, but he’s also a very sophisticated and experienced entrepreneur himself,” Cook says. “He understands the challenges we face, and he came in as an outside director with a mission to help us. That simplified our structure and let us focus completely on driving our business.” (Cook didn’t say how much Reeves put up to recapitalize Permabit, or what sort of return the original venture investors got on their money.)
So that’s the company’s business secret: find a patient, deep-pocketed backer who believes in your technology—but be ready to show that the technology is catching on.
“Being exposed to the capital markets right now is painful for many companies,” says Floyd. “Fortunately, regardless of who is funding us, we have a product that’s selling, and a sales force that’s out in the field and is successful. If you’re lacking any of those today and you’re looking for capital, it’s a big challenge… We feel very fortunate to have the company capitalized correctly at this time.”
Cook says Permabit has been growing at 100 percent per quarter recently, and that he’d be very happy if the company can continue “knocking down, in every quarter, the next 12 or 15 pieces of business,” including some large-enterprise accounts and cloud-computing vendors. Meanwhile, the company’s engineers are working to make its RAIN-EC technology even faster. Says Cook: “The speedier our product gets”—in other words, the more credibly it mimics primary storage—“the more it becomes central to people’s operations in an enterprise.”