Big Data at Facebook—A Glossary
Facebook, like many engineering-driven companies, is seldom satisfied with off-the-shelf solutions for its computing problems. Its software teams regularly come up with new algorithms or management systems meant to make the company’s infrastructure more reliable and scalable. Many of these projects are offshoots of open-source technologies like Hadoop, and Facebook ends up contributing many of its innovations back to the open-source community. Here’s a list of projects the company has described in public, alphabetized by code name.
Avatarnode—A fail-safe version of the Namenode metadata server in Hadoop that improves the reliability of Hadoop clusters.
Claspin—A monitoring and visualization tool that shows Facebook engineers which servers in a cluster are underperforming or failing.
Corona—A system that improves the way jobs are scheduled and managed in Hadoop; open-sourced in 2012.
Dragonstone—The code name for the first server design released by Facebook through the Open Compute Project.
Gatekeeper—A service that controls which users see which experimental features on Facebook, and prevents overlapping changes from appearing on the same page.
Graph Search—A new service that shows Facebook users search results filtered according to the preferences of people in their networks.
Hadoop—A system that makes it simple to distribute computing jobs across dozens to thousands of servers; originally developed by Yahoo, heavily adopted by Facebook, and now managed by the Apache Software Foundation.
Haystack—Facebook’s custom-built infrastructure for storing photos.
Hiphop—A system that reduces CPU usage on Web servers at Facebook by transforming Facebook’s PHP source code into C++ before it’s reduced to machine code. Open-sourced in 2010.
Hive—-A data warehouse system that makes it easier to query data in large Hadoop clusters. Open-sourced in 2008 and now managed by the Apache Software Foundation.
Peregrine—A system for querying data in Hadoop clusters in near-real-time, without having the query wait as part of a batch-job system.
Prism—A system that makes a database distributed across multiple data centers behave as if it’s contained within a single data center, by replicating and moving data as needed.
Scuba—A Web-based system that makes it easier for engineers to dissect statistics about the performance of Facebook’s infrastructure.
TAO—a distributed database that lets Facebook engineers treat users and the relationships between them as if they were nodes and edges in a true graph database.