How Will You Manage the Cloud?


Are you considering a move to the cloud? Are you planning to use Infrastructure-as-a-Service (IaaS) as the backend for your new website or application? Do you depend on cloud services being fast, cost-effective, secure, and available 24×7? If you answered “yes” to any of these questions, be forewarned:

While the features and scalability of offerings from public cloud vendors are truly game-changing, the tools for monitoring and managing the cloud have not kept pace. New cloud users—particularly those who are accustomed to the mature management solutions available in dedicated infrastructure environments—should plan accordingly.

Over the past several months, our team has interviewed dozens of people who build and run applications in the public cloud. We asked them about their experience managing their cloud environments. Five trends have emerged thus far:

1. Engineering and operations teams need more tools to manage their environments than they anticipated. It is not uncommon to have six or more separate systems dedicated to monitoring and analysis for a single environment, not counting the tools used for other tasks (such as deploying and configuring services).

2. Companies follow a predictable pattern in terms of the tools that they use to manage their environments. They start with nothing and gradually add disparate components (with minimal integration) as their operations scale and become more complex.

3. The monitoring portfolio for most companies incorporates packaged software, open source software, SaaS offerings, and custom/home-grown products.

4. Engineering and operations teams spend a considerable amount of time and effort integrating, customizing, and managing their monitoring systems.

5. Engineering and operations teams are generally lukewarm about the solutions that they have in place.

This graphic highlights the most popular monitoring tools among the companies with whom we have met:

This is neither an exhaustive list of tools nor an authoritative view on the landscape for any category (talk to Gartner for that). We have included any tool that two or more of our interviewees reported using to monitor their environments. Our sample includes about 30 companies that use Infrastructure-as-a-Service for production applications. Most are SaaS companies with large Amazon Web Services deployments (typically hundreds or thousands of EC2 instances and $25k to $1M per month in AWS spend).
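
For readers who want to apply the same inclusion rule to their own survey notes, here is a minimal sketch in Python. It assumes responses are recorded as a mapping from interviewee to the tools they named; the function name, company names, and tool names are all hypothetical placeholders, not data from our interviews.

```python
from collections import Counter


def tools_to_include(responses, min_users=2):
    """Return tools named by at least `min_users` distinct interviewees."""
    counts = Counter()
    for tools in responses.values():
        # Use a set so an interviewee who names a tool twice is counted once.
        counts.update(set(tools))
    return sorted(tool for tool, n in counts.items() if n >= min_users)


# Hypothetical responses; company and tool names are placeholders.
responses = {
    "company_a": ["tool_x", "tool_y"],
    "company_b": ["tool_x", "tool_z"],
    "company_c": ["tool_y", "tool_x", "tool_x"],
}

print(tools_to_include(responses))
```

A tool mentioned by only one interviewee (here, `tool_z`) is left out, which is why the resulting list is not exhaustive.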

We define the categories as follows:

● Application Performance – Tools that provide visibility into the run-time application or business transactions.

● Availability – Tools that provide “outside-in” visibility into uptime of application resources.

● Log Analysis – Tools that provide easy access and visibility into information contained in application log files.

● System Monitoring – Tools that provide “inside-out” visibility into the health, capacity, and utilization of instances.

● IaaS Monitoring – Tools that provide insights into users’ cloud infrastructure resources and configuration.

● Alerting – Tools that provide a unified workflow for notifications that are generated in other management tools.

● Event Tracking – Tools that provide a unified repository and interface for reporting on application errors and events.

Satisfaction with the current state of affairs varies dramatically. By and large, those who manage mature cloud operations view their current web of open source and custom tools as satisfactory—but quickly acknowledge that the journey has been more difficult and costly than they ever anticipated. Newer cloud users are worse off; they feel the pain of not having the right management infrastructure in place (in the form of performance issues, unplanned downtime, and cost surprises) but lack the people and experience to build and integrate the portfolio of tools that are required today.

The more concerning implication of our research is that “cloud-powered” companies across the maturity spectrum are losing out on opportunities to innovate and serve their customers due to ineffective cloud management. They build and support IT applications to manage their infrastructure. They extend and integrate open-source monitoring systems. They build custom metric data warehouses and visualization tools. And still, they struggle to deliver performance and availability that are on par with legacy dedicated infrastructure.

One of the people we interviewed, Brendan Schwartz, CTO and co-founder of Wistia, said it best: “We moved to the cloud to spend more time on dev and less time on ops. Unfortunately, we are a long way from that reality.”

Dan Belcher is the co-founder of Stackdriver, a cloud infrastructure management startup in Boston.
