Preparing for a Cloud Outage when running SaaS
Recent Salesforce.com issues reported by Chris Kanaracus in PC World caused me to think about business continuity processes in a cloud deployment scenario.
A single service outage does not change my opinion of the cloud. The cloud is good. It reduces the cost of delivering software and provides companies with the ability to rapidly scale their applications up and down. Before the cloud, we spent more time managing applications and less time thinking about how to improve our business. Before the cloud, IT spent time thinking about things like managing servers, service uptime, reliability, and backups. After the cloud, IT has time to focus on initiatives aimed at growing the business.
But a single service outage does remind us that, even in the cloud, we need to have a business continuity plan in place.
Business Continuity Planning in a SaaS World
Service level agreements (SLAs) and cloud reliability should not lull us into complacency with regard to business continuity procedures and processes. Moving to the cloud simplifies planning, but does not eliminate the need for it.
Most businesses would not have been impacted by the recent Salesforce.com issues. The recent email slowdown could be an inconvenience to marketing campaigns and notifications, but most core business operations would not be impacted. Most importantly, instead of a company-wide panic about what IT was doing with the servers, the crisis is limited to the people who would place a call to Salesforce.com to open a trouble ticket. The SaaS provider is the one that has to take people offline to quickly solve the problem.
More Severe Software Outage
In the case of a slowdown, your business continuity plan might be as simple as opening a trouble ticket with your service provider. But what do you do if your service is completely down for a day, or if your provider is down for a week? Obviously, the answer will differ according to your business and the application. A local plumber or bookstore might be able to operate for a week without its computer systems, but businesses that are highly automated and manage time-sensitive data will not have that luxury. A high-volume business that promises next-day order delivery might be able to work without its accounting system for a day, but its sales order processing system will need to be up 24×7.
Is ERP different from CRM?
Assume your CRM service is offline for a day. Productivity in your sales department will suffer, but many other parts of your business will continue to operate. Therefore, we recommend spending less time, effort, and expense on preparing for such a situation than you would for a mission-critical application such as ERP.
Assume your all-in-one ERP service is offline for a day. Does this mean that you can no longer check inventory or process orders? Or does this simply mean that you have to run your business manually for a day?
What happens if the service is offline for a week? What happens if you lose your data?
Plan A: Wait it out.
The “be patient” plan only works if your business can run manually for a while and if you can manage entering data manually once your system comes back online. If wait-and-see is part of your “plan”, then you had better be sure that your cloud provider has redundancy and disaster recovery plans. Make sure you consider what would happen if your provider loses any of your data.
Plan B: Onsite data backup.
If your business can operate manually, but you are worried about permanent data loss, you can keep an onsite copy of your data (if your provider offers that). The odds of a SaaS provider permanently losing your data are almost negligible, but you may want to prepare for that situation. If your onsite copy is stored in a database-friendly format, then you can get reports and other information from your local data while the provider is down. This could mitigate the impact of a short outage.
You could also back up data using a third-party hosting provider. Just make sure that the hosting provider’s servers are not in the same location as your SaaS provider’s.
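As a rough illustration of reporting from a local copy, the sketch below loads a CSV export into SQLite and runs a simple sales report. It assumes the provider can export data as CSV; the `orders` table, its columns, and the sample rows are hypothetical and stand in for whatever export format your provider actually offers.

```python
import csv
import io
import sqlite3

# Hypothetical export: a real provider's file layout and column names will differ.
SAMPLE_EXPORT = """order_id,customer,amount
1001,Acme,250.00
1002,Globex,120.50
1003,Acme,75.25
"""


def load_export(conn, csv_text):
    """Load a CSV export into a local SQLite table named orders."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS orders "
        "(order_id INTEGER PRIMARY KEY, customer TEXT, amount REAL)"
    )
    rows = [
        (int(r["order_id"]), r["customer"], float(r["amount"]))
        for r in csv.DictReader(io.StringIO(csv_text))
    ]
    # INSERT OR REPLACE keeps the load repeatable when the same export is re-run.
    conn.executemany("INSERT OR REPLACE INTO orders VALUES (?, ?, ?)", rows)
    conn.commit()
    return len(rows)


def sales_by_customer(conn):
    """Return (customer, total) pairs, largest total first."""
    cur = conn.execute(
        "SELECT customer, SUM(amount) FROM orders "
        "GROUP BY customer ORDER BY SUM(amount) DESC"
    )
    return cur.fetchall()


conn = sqlite3.connect(":memory:")  # use a file path for a persistent onsite copy
load_export(conn, SAMPLE_EXPORT)
for customer, total in sales_by_customer(conn):
    print(f"{customer}: {total:.2f}")
```

A scheduled job that refreshes this local database from each new export would keep the reports reasonably current, though how current depends on how often your provider lets you export.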
Plan C: Full Redundancy
As stated in a GigaOM article on preparing for cloud service failures, “Business systems need to be able to run on a number of different infrastructures — whether they be public clouds such as Amazon or Rackspace, or private clouds using traditional on-premise hardware — and be able to fail over between them quickly and efficiently as necessary.”
Cloud and SaaS providers offer a fully redundant architecture. The question remains: what happens if their redundancy doesn’t hold up? Given recent outages, the question is worth asking.
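The failover idea can be sketched in a few lines. This is only an illustration of the pattern, not a production design: the two handler functions are hypothetical stand-ins for a public cloud deployment and an on-premises deployment of the same application, and the simulated failure is hard-coded.

```python
def run_with_failover(backends, request):
    """Try each (name, handler) pair in order and return (name, result)
    from the first handler that succeeds."""
    errors = []
    for name, handler in backends:
        try:
            return name, handler(request)
        except ConnectionError as exc:
            # Record the failure and move on to the next infrastructure.
            errors.append(f"{name}: {exc}")
    raise RuntimeError("all backends failed: " + "; ".join(errors))


# Hypothetical handlers standing in for two deployments of the same system.
def public_cloud(request):
    raise ConnectionError("provider unreachable")  # simulated outage


def on_premises(request):
    return f"processed {request}"


served_by, result = run_with_failover(
    [("public cloud", public_cloud), ("on premises", on_premises)],
    "order 1001",
)
print(served_by, "->", result)
```

The hard part in practice is not the switch itself but keeping data synchronized between the two environments, which this sketch does not address.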
Keeping your ERP Application Running
Make sure you can run your application on premises or at a service provider that you choose. The last thing you want is to be locked in to a service provider that cannot deliver reliable service.
The SLA is important, but ultimately you need to have the freedom of choice so you can select a datacenter that meets your price and uptime requirements.
What types of issues will occur
As part of deciding on a plan, think through the situations that may occur. What happens if a hard drive fails, an entire server goes offline, a network firewall fails, or an area-wide disaster strikes? What happens if a rogue employee does something to the datacenter? In most cases (hard drive, server, network firewall), the SaaS provider will have processes in place to prevent the failure from impacting your service. Even in the case of an area-wide disaster, the service provider should have a backup plan that may involve some offline time but will restore service quickly.
Compare the cost of downtime with the costs of doing more advanced backups such as paying extra for real-time failover or having a backup copy of your data on-premises.
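One way to make that comparison concrete is to add each option's yearly price to the downtime cost you would still expect to incur with it. The sketch below does this arithmetic; every figure in it (the hourly cost of downtime, the option prices, and the expected outage hours) is a made-up placeholder to be replaced with your own estimates.

```python
def compare_options(cost_per_hour, options):
    """Rank options by total yearly cost.

    options maps a name to (annual_price, expected_outage_hours_per_year).
    Total cost = annual price + expected outage hours * cost per hour.
    """
    totals = [
        (name, annual_price + hours * cost_per_hour)
        for name, (annual_price, hours) in options.items()
    ]
    return sorted(totals, key=lambda item: item[1])


# Placeholder estimates only; none of these numbers come from a real provider.
COST_PER_HOUR = 2000
OPTIONS = {
    "wait it out": (0, 16),
    "onsite backup": (5000, 10),
    "real-time failover": (40000, 1),
}

ranking = compare_options(COST_PER_HOUR, OPTIONS)
for name, total in ranking:
    print(f"{name}: {total}")
```

With these placeholder numbers the onsite backup comes out cheapest, but a business with a higher hourly cost of downtime would see real-time failover move to the top.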
Ultimately, your plan comes down to what type of risk you are willing to accept versus what you are willing to pay to mitigate that risk. With an unlimited budget, you can build a completely fault-tolerant system (provided that you can get your data and application code from your SaaS provider). SaaS providers reduce the cost of infrastructure through economies of scale and efficiency. But as recent incidents have shown, SaaS does not eliminate the need for disaster recovery, or at least the need to think about disaster recovery. When selecting a SaaS solution, compare what vendors offer in terms of service level agreements, access to data, and access to source code. Also remember that your need for disaster recovery may change over time.