In July 2024, a sensor configuration update released by CrowdStrike for Windows systems triggered a logic error that led to a “blue screen of death” (BSOD) on numerous systems globally.
This global service disruption impacted various industries, resulting in approximately $5.4 billion in financial losses for Fortune 500 companies alone. The global losses are estimated to be around $15 billion.
The Ripple Effects of CrowdStrike Outage: Cost, Chaos, and Disruptions!
For businesses running CrowdStrike’s security software on most of their computing assets, recovery could not be performed remotely or at scale. Instead, human teams had to carry out the process manually on each system.
In this article, we will explore the key lessons learned and how organizations can enhance resilience and speed up recovery efforts.
From Crisis to Resilience: Key Lessons Learnt
How can software companies act now to avoid becoming the next cause of costly downtime?
Here are some proactive approaches that help prevent major losses:
1. Test with Real-world Data
Dummy data is only as good as the person who created it, and nothing can replace real-world data. Putting your application through testing scenarios that use real-world data will help you find bugs before they reach production. While this may not be feasible for every application, it’s worth considering.
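One way to put this into practice is to replay inputs captured from real usage through the code under test and record every input that makes it crash. The sketch below is a minimal illustration, not a prescribed tool: the `replay_samples` helper and the sample values are hypothetical, and it assumes the captured inputs have already been sanitized for use outside production.

```python
from typing import Callable, Iterable, List, Tuple


def replay_samples(
    samples: Iterable[str],
    handler: Callable[[str], object],
) -> List[Tuple[str, Exception]]:
    """Run every recorded sample through the handler and collect the ones that crash it."""
    failures: List[Tuple[str, Exception]] = []
    for sample in samples:
        try:
            handler(sample)
        except Exception as error:  # broad on purpose: any crash is a finding
            failures.append((sample, error))
    return failures


# Hypothetical captured inputs; int() stands in for the parser under test.
captured = ["42", "7", "", "1e3"]
crashing_inputs = [sample for sample, _ in replay_samples(captured, int)]
```

Inputs such as the empty string are exactly the kind of case hand-written dummy data tends to miss.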
2. Additional Validation Checks are Important for Rapid Response Content Updates
Additional validation checks for rapid response content updates are crucial to ensure accuracy, security, and functionality. In high-pressure situations, errors can easily slip through. Organizations should always plan to:
- Roll out Rapid Response Content in phases, starting with a small group of sensors and expanding gradually to larger portions of the sensor network.
- Enhance monitoring of sensor and system performance, gathering feedback throughout the deployment.
- Communicate content update specifics to customers through release notes.
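A validation check of this kind can be as simple as a function that refuses to release an update whose shape does not match what the consuming software expects. The following is a minimal sketch under stated assumptions: the `ContentUpdate` structure, the `EXPECTED_FIELD_COUNT` value, and the specific rules are all hypothetical and do not describe any vendor's actual format or pipeline.

```python
from dataclasses import dataclass
from typing import List

EXPECTED_FIELD_COUNT = 5  # hypothetical schema size, chosen only for illustration


@dataclass
class ContentUpdate:
    version: str
    fields: List[str]
    release_notes: str = ""


def validate_update(update: ContentUpdate) -> List[str]:
    """Return a list of problems; an empty list means the update may proceed."""
    problems: List[str] = []
    if len(update.fields) != EXPECTED_FIELD_COUNT:
        problems.append(
            f"expected {EXPECTED_FIELD_COUNT} fields, got {len(update.fields)}"
        )
    if any(field.strip() == "" for field in update.fields):
        problems.append("update contains empty fields")
    if not update.release_notes.strip():
        problems.append("release notes are missing")
    return problems
```

Returning every problem at once, rather than stopping at the first, gives the release team a complete picture when time is short.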
3. Phased Rollouts for Mission-critical Applications are Important
Encouraging phased rollouts for mission-critical applications is crucial to minimize risk, especially in cases like this one. By gradually deploying updates or changes, organizations can detect and address issues early, preventing widespread disruptions.
This controlled approach enhances system stability and ensures operational continuity during critical updates.
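To make the idea concrete, here is a minimal sketch of a staged rollout loop. It assumes hypothetical `deploy` and `is_healthy` callbacks supplied by the caller, and the stage percentages and failure threshold are illustrative values rather than recommendations.

```python
from typing import Callable, List, Sequence


def phased_rollout(
    hosts: Sequence[str],
    stage_percentages: Sequence[int],
    deploy: Callable[[str], None],
    is_healthy: Callable[[str], bool],
    max_failure_rate: float = 0.01,
) -> List[str]:
    """Deploy to growing cumulative percentages of hosts, halting on an unhealthy stage.

    Returns the hosts that received the update.
    """
    updated: List[str] = []
    for percent in stage_percentages:
        target = len(hosts) * percent // 100
        batch = list(hosts[len(updated):target])
        if not batch:
            continue  # this stage adds no new hosts
        for host in batch:
            deploy(host)
        updated.extend(batch)
        failures = sum(1 for host in batch if not is_healthy(host))
        if failures / len(batch) > max_failure_rate:
            break  # halt: the remaining hosts never receive the faulty update
    return updated


# Hypothetical fleet of 100 hosts where every updated host reports unhealthy.
fleet = [f"host-{i}" for i in range(100)]
affected = phased_rollout(fleet, (1, 10, 50, 100), lambda host: None, lambda host: False)
```

In this example only the first host is touched before the rollout stops, which is the difference between a contained incident and a fleet-wide one.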
4. Risk is Inherent in Any Configuration Modification
The CrowdStrike outage highlights that even seemingly low-risk configuration changes can cause significant disruptions. All changes, regardless of perceived risk, should follow the same rigorous software delivery process: testing, validation, and phased rollout.
Together, these steps establish a sequence of safeguards that stop major bugs and threats from reaching users and crashing the application.
How Can Organizations Achieve Cyber Resilience?
Cyber resilience isn’t just about reducing financial losses but also about protecting the brand, trust, credibility, effective operations, and competitive advantages that a company has built in the marketplace.
Various global frameworks guide organizations in strengthening their security posture. Prominent examples include the NIST Cybersecurity Framework, ISO/IEC 27001, and the CIS Controls.
These frameworks offer structured best practices and recommended security controls that improve an organization’s ability to resist, endure, and bounce back from cyber threats.
For instance, the most popular is the Cybersecurity Framework (CSF) from the US National Institute of Standards and Technology (NIST). This framework is a set of guidelines organized around five core functions: identify, protect, detect, respond, and recover. Version 2.0, released in 2024, adds a sixth function, govern. The framework provides a flexible approach to improving security and resilience against cyber threats.
MITRE also provides the Cyber Resiliency Engineering Framework (CREF), which defines resiliency goals, objectives, and techniques.
Similarly, the Cyber Assessment Framework (CAF), created by the UK’s National Cyber Security Centre (NCSC), aims to enhance the security of network and information systems throughout the UK, particularly those critical to the economy, society, the environment, and individuals.
But before implementing these frameworks, organizations need to set the foundation of cyber resilience through some benchmark practices.
Here are a few essential methods to strengthen your organization’s cyber resilience:
1. Cyber Drills:
These drills are structured simulations designed to mimic potential cyberattacks or system disruptions. Such exercises allow organizations to test their defenses, identify vulnerabilities, and evaluate the readiness of their incident response teams. By simulating various attack scenarios such as ransomware, data breaches, or system failures, cyber drills assess how well an organization can detect, contain, and recover from security threats.
This also ensures that senior management is well-prepared, aligning leadership with cybersecurity teams to enable clear decision-making under pressure.
Hence, through these drills, organizations build a culture of preparedness, reducing the overall impact of cyber incidents, much like practicing for a disaster to minimize potential damage.
2. Incident Response:
This refers to the procedures and tools an organization uses to identify and address cyber threats and security breaches. The best course of action is to document incident response protocols in as much detail as possible, so that teams can identify, contain, and resolve different cyber threats consistently.
3. Redundancy:
This refers to the duplication of critical components or systems to ensure alternative resources are always available. The purpose of implementing redundancy is to eliminate single points of failure in the IT environment, thereby enhancing system fault tolerance.
Suppose a financial services firm relies on a single critical server to handle all sensitive client transactions. If that server crashes, the result is not only downtime but also financial losses and damaged client trust. Here, redundancy such as a backup system in a separate data center, or EC2 instances spread across multiple AWS Availability Zones, enables faster disaster recovery.
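On the client side, redundancy only helps if callers know how to fall back to the spare. The sketch below shows one simple way to do that, assuming a hypothetical ordered list of endpoints and a caller-supplied `request` function that raises `ConnectionError` when an endpoint is unreachable; real deployments would more often delegate this to a load balancer or DNS failover.

```python
from typing import Callable, Optional, Sequence, TypeVar

T = TypeVar("T")


def call_with_failover(endpoints: Sequence[str], request: Callable[[str], T]) -> T:
    """Try each endpoint in order and return the first successful response."""
    last_error: Optional[Exception] = None
    for endpoint in endpoints:
        try:
            return request(endpoint)
        except ConnectionError as error:
            last_error = error  # this endpoint is down; fall through to the next one
    raise RuntimeError("all endpoints failed") from last_error


# Hypothetical request function: the primary is down, the standby answers.
def fake_request(endpoint: str) -> str:
    if endpoint == "primary":
        raise ConnectionError("primary is unreachable")
    return f"ok from {endpoint}"


response = call_with_failover(["primary", "standby"], fake_request)
```

Only connection failures trigger the fallback here; other errors propagate, so a bug in the request itself is not silently retried against every replica.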
While this checklist covers key methods for enhancing cyber resilience, organizations can adopt additional strategies as well.
Build Your Resilient Future with Our ADVISE Framework!
As we look ahead, the real measure of cyber resilience lies not in preventing every threat but in being prepared to tackle threats when they arise. At Network Intelligence, we utilize our proven ADVISE framework to strengthen cyber resilience through comprehensive client engagements.
We conduct vulnerability assessments, align cybersecurity risks with business risks, implement enhanced security measures, and ensure long-term management, continuous improvement, and strategic evolution for sustained protection and growth.
Connect with our cybersecurity experts to schedule a consultation and discover how our innovative strategies can transform your cybersecurity posture.