As more organizations move to cloud-based systems, outages are becoming an increasingly common experience. Cloud services typically rely on centralized management and data centers, so when one of these centers encounters a problem, it can affect every user who depends on that infrastructure. Unfortunately, this is a common cause of cloud outages.
For instance, a power failure in one of these data centers could affect millions of people. Power failures aren't the only thing that takes cloud services offline, though. Outages can also result from natural disasters, political issues, wars, and terrorism.
In April 2023, Google's Cloud service experienced a power outage caused by a fire, which was exacerbated by water damage. The disruption affected several regions globally, including Western Europe, Japan, India, Indonesia, and South Carolina in the United States. It was the second significant incident of 2023: in January, a Microsoft Azure outage prevented millions of users from accessing Outlook and Teams. As outages become more common, it's crucial to recognize that your vendor may encounter one at some point.
These outages are more than just inconvenient. They also pose significant risks to your organization, including:
- Reputation – Your reputation is at stake if you remain down for an extended period, as prolonged downtime leads to growing frustration among your customers.
- Finance – Outages cost money. Downtime not only reduces your employees' productivity, but also costs more revenue the longer your business stays offline.
- Compliance – Under its regulations on business continuity and disaster recovery, the U.S. Securities and Exchange Commission (SEC) may impose a penalty if a system experiences prolonged downtime. The Financial Industry Regulatory Authority (FINRA) has also levied fines in the past over system-wide outages.
Evaluate a Vendor’s Business Continuity and Disaster Recovery Plans
To help limit your exposure to outages, it's good practice to review a vendor's business continuity (BC) and disaster recovery (DR) plans and confirm that the vendor has run tests that validate them. Reviewing the plans just once isn't enough. For high-risk and critical vendors, BC/DR plans and testing results should be reviewed and analyzed at least once a year. This approach helps identify areas for improvement and minimize the impact of any incidents.
Of course, your organization needs its own plan. Still, it's important to ensure that your vendors' BC/DR plans are at the same level as, or better than, your organization's business continuity plan, especially if you work closely with multiple vendors.
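If you track vendors in a spreadsheet or inventory system, the annual review cadence described above can be checked automatically. The following is a minimal Python sketch, not a prescribed method: the `Vendor` record, the `risk_tier` labels, and the 365-day interval are all illustrative assumptions that you would replace with your own risk model.

```python
from dataclasses import dataclass
from datetime import date, timedelta
from typing import List, Optional

# Assumption: only these tiers require the yearly BC/DR review.
REVIEW_TIERS = {"high", "critical"}
# Assumption: "at least once a year" is modeled as 365 days.
REVIEW_INTERVAL = timedelta(days=365)


@dataclass
class Vendor:
    """Hypothetical vendor record; field names are illustrative."""
    name: str
    risk_tier: str
    last_bcdr_review: Optional[date]  # None means never reviewed


def overdue_reviews(vendors: List[Vendor], today: date) -> List[str]:
    """Return names of high-risk/critical vendors whose review is overdue."""
    overdue = []
    for vendor in vendors:
        if vendor.risk_tier not in REVIEW_TIERS:
            continue  # lower tiers are outside the annual requirement here
        never_reviewed = vendor.last_bcdr_review is None
        if never_reviewed or today - vendor.last_bcdr_review > REVIEW_INTERVAL:
            overdue.append(vendor.name)
    return overdue


# Example data (entirely made up for illustration).
vendors = [
    Vendor("Vendor A", "critical", date(2023, 1, 15)),
    Vendor("Vendor B", "high", date(2024, 1, 10)),
    Vendor("Vendor C", "low", date(2020, 5, 1)),
    Vendor("Vendor D", "critical", None),
]
needs_review = overdue_reviews(vendors, today=date(2024, 6, 1))
```

A check like this only flags that a review is due. Analyzing the plan and the testing results still requires a person.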
What to Ask Your Vendor After an Outage
Even if the vendor has solid BC/DR plans, they can still experience an outage, potentially affecting your organization or its customers.
If this occurs, it’s important that you ask your vendor these questions:
- What was the problem that caused the outage?
- Is this the first time you have experienced this problem?
- When do you expect service to resume? Validate the answer against your stated recovery time objectives (RTOs) and the service level agreements (SLAs) outlined in the contract.
- What happened to our data during the outage?
- Did your business continuity plan work as expected? If not, where did it fail?
- How do you plan to fix any issues that arose during the outage?
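Validating a vendor's answers against your RTOs and SLAs comes down to simple arithmetic on the outage window. The sketch below shows one way to do it in Python. The timestamps, the 4-hour RTO, the 99.9% uptime commitment, and the 30-day measurement period are hypothetical values chosen for illustration, and real contracts often define downtime and measurement periods differently.

```python
from datetime import datetime, timedelta


def exceeds_rto(downtime: timedelta, rto: timedelta) -> bool:
    """True if the outage lasted longer than the recovery time objective."""
    return downtime > rto


def uptime_percent(downtime: timedelta, period: timedelta) -> float:
    """Uptime over the measurement period, as a percentage."""
    return 100.0 * (1 - downtime / period)


# Hypothetical outage window reported by the vendor.
outage_start = datetime(2024, 3, 5, 2, 0)
outage_end = datetime(2024, 3, 5, 9, 30)
downtime = outage_end - outage_start  # 7 hours 30 minutes

# Assumed contract terms; take the real values from your agreement.
rto = timedelta(hours=4)
sla_uptime_target = 99.9           # percent
sla_period = timedelta(days=30)    # assumed monthly measurement period

rto_missed = exceeds_rto(downtime, rto)
achieved_uptime = uptime_percent(downtime, sla_period)
sla_breached = achieved_uptime < sla_uptime_target
```

With these example numbers the outage exceeds the RTO and pulls uptime for the period below the assumed 99.9% target, which is the kind of finding to raise with the vendor.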
As businesses rely more on cloud services, it's crucial to have a strong digital infrastructure that can handle potential downtime. To achieve this, it's important to review your vendors' BC/DR plans and testing results and make sure they align with your organization's RTOs and contractual SLAs.
When the inevitable outage does happen, ask your vendors questions and hold them accountable for preventable failures. It's equally vital to treat an outage as a learning opportunity to discover what can be done in the future to limit disruption to your business.