Third-Party Risk Lessons From the Global CrowdStrike Outage
By: Hilary Jewhurst on July 31, 2024
6 min read
It's been quite a ride for organizations worldwide as they try to bounce back from the recent CrowdStrike outage. On July 19, what may have started as an ordinary day swiftly spiraled into global mayhem. Thankfully (and surprisingly), this wasn't caused by a cyberattack but by a routine update that went awry. So, what happened?
This blog will cover the impacts of the CrowdStrike outage and lessons learned for third-party risk managers.
An Overview of the CrowdStrike Outage and Its Effects on Third Parties
A minor change in a CrowdStrike sensor configuration update for Windows systems contained a logic error that triggered the notorious Blue Screen of Death (BSOD) on an estimated 8.5 million Windows devices worldwide, according to Microsoft. The impacts were significant. Major airlines faced flight cancellations and delays due to disrupted systems. Many major health systems experienced outages, and hospitals faced technology issues that led to paused procedures and patient disruptions. Banks and financial services providers also encountered operational disruptions affecting transactions and customer service. Many other industries were impacted as well, including services, wholesale, freight, and broadcast media.
The sheer scale of this outage is hard to grasp. Thousands of organizations use both Microsoft and CrowdStrike. Beyond that, many organizations have customers and third parties that use Microsoft and CrowdStrike. It's still too soon to fully understand the impacts of this global event, including the financial losses and reputational damage resulting from it.
The big question is, what can we all learn from this? For those familiar with third-party risk management (TPRM), the situation perfectly illustrated the importance of business continuity planning and disaster recovery (BC/DR) management. But first, we need to talk about the dangers of concentration risk.
CrowdStrike’s Outage Is a Reminder of the Dangers of Third-Party Concentration Risk
Concentration risk occurs when an organization heavily relies on a single vendor or software provider, putting all its proverbial eggs in the same basket. While the CrowdStrike outage drove that point home for organizations, it also reminded us that industry consolidation could have far-reaching implications, especially when a few big players hold all the cards. The fact that a single security update caused such widespread disruption highlights the risks of global interconnectivity, particularly for organizations central to public safety, economic stability, and national security.
Here are three key takeaways on concentration risk:
- Increased regulatory scrutiny – Concentration risk is already a hot topic among regulators, and this situation will likely draw further scrutiny. Several key regulators, including the EBA, OCC, FDIC, and the Fed, have already issued guidance related to concentration risk. The Basel Committee's newly proposed “Principles for the sound management of third-party risk” clearly state that banks are responsible for managing their own concentration risk, while supervisors should monitor systemic concentration risk. The principles also point out that banks should understand the overall criticality of a third party and gather information to assess the impact of entering into a third-party arrangement.
- Extended concentration risk – While it may not be obvious, concentration risk isn't limited to an organization's direct third parties. For example, even if an organization wasn't directly affected because it didn't run CrowdStrike on Windows systems, there's a high probability that one or more of its vendors were impacted or that the outage had ripple effects along its supply chain. It's essential to identify which third parties are critical to your operations and understand the role of fourth or nth parties (your vendors' vendors) in enabling your third parties to deliver products and services.
- Proactive identification – Organizations should proactively identify concentration risks where critical dependencies exist, not just at the third-party level but also with fourth and nth parties. Gathering this data and effectively remediating significant systemic concentration risk can pose major challenges for a single organization. Still, organizations should resist an "it is what it is" attitude and not resign themselves to accepting the risk. Instead, they should focus on improving their BC/DR efforts, both internally and with their third parties.
Pro Tip: For vendors whose products update automatically or whose configurations the vendor can adjust, make sure you understand how all software changes are pushed out, including the types, cadence, and configuration options available. This is relevant not only to the recent event but also to incidents like the 2020 SolarWinds supply chain attack.
CrowdStrike Highlights the Importance of Your Organization’s and Vendors’ Business Continuity and Disaster Recovery Planning
BC/DR plans are essential components of risk management and can serve as valuable tools for combating concentration risk. Business continuity focuses on ensuring vital business functions can continue during and after a disaster. Disaster recovery, by contrast, is the process of restoring and recovering IT infrastructure and operations following a disaster or other disruptive scenario. BC/DR is necessary for the organization itself, but it must also be a requirement for critical third parties, meaning those on which there is a significant operational, transactional, compliance, or financial dependency.
Most third-party risk practitioners are familiar with BC/DR plans. Let's clarify what these are and how they’re utilized:
- The business continuity plan (BCP) outlines steps for an organization to return to regular business operations after a disaster. It covers a broad range of threats, such as natural disasters and cyberattacks. It includes business processes, human resources, partners, and suppliers, ensuring overall business continuity by addressing how to maintain operations during and after a crisis.
- A disaster recovery plan (DRP) typically focuses on protecting IT systems and critical data during and after an interruption. This detailed and technical plan includes contingency plans for IT systems, ensuring data protection and system recovery from events such as massive outages or ransomware attacks.
Having well-developed plans is only part of the requirement. Those plans should be tested regularly, and your vendors should share the results. Without testing, it's impossible to determine how effective a plan is.
TPRM, in collaboration with internal BC/DR teams, should consider whether joint testing between the organization and the third party is necessary. This may involve:
- Tabletop exercises – Discussion-based sessions to assess the plan's effectiveness without actual deployment
- Functional – A full test/failover, demonstrating what would happen once the plan is activated during an incident
- Full simulation – Comprehensive testing by simulating a realistic disaster scenario
Third-party BC/DR plans and testing results should be reviewed at least annually, always by qualified subject matter experts (SMEs), to identify any gaps or weaknesses. Critical vendors should also have processes in place to collect and review the BC/DR plans of their own critical vendors as part of an effective TPRM program.
If a vendor's BC/DR plan is ineffective or has a material issue, or the vendor isn't effectively managing and monitoring the BC/DR risks in its own vendor inventory, your organization must implement additional (most likely internal) controls to bridge the gap. That might include diversifying the product or service across multiple vendors, increasing your organization's insurance coverage, securing professional risk intelligence and monitoring, keeping a secondary vendor as a warm backup, finding another vendor, or combining solutions to minimize the potential impacts.
Remember that complex problems often require creative solutions, so loop in your internal BC/DR team and relevant SMEs, such as your operations, cybersecurity, legal, compliance, or even finance teams. A variety of expertise and perspectives can help your organization enhance its overall BC/DR approach, establish BC/DR standards for third parties and their subcontractors, and hopefully reduce the impacts of a business-interrupting event.
Despite the chaos caused by the outage, there may be a silver lining. Most organizations will bounce back, though it may take time, and they should carry the lessons learned forward.
If nothing else, this global outage highlighted that concentration risk is no joke. It's a wake-up call for organizations to take proactive steps to identify concentration risk in their supply chain and ensure well-developed and tested BC/DR plans to limit the impacts of the next big disruption.