Today (19th July 2024), outages have been reported across almost every facet of society, from airlines, airports, supermarkets and banking to communication services, the NHS and trains. Endpoint detection and response (EDR) vendor CrowdStrike said the problem was caused by “a defect found in a single content update for Windows hosts”. Whilst the company has confirmed that it was not a security incident or a cyberattack, there are still many questions about the incident.
Pros from across the cybersecurity space have weighed in.
Mayur Upadhyaya, CEO at APIContext, says:
“Our analysis from hundreds of thousands of API synthetic tests during the Microsoft outage reveals key insights: while core Azure services and Microsoft/CrowdStrike endpoints remained stable, the impact may have been felt primarily on end-devices. This highlights the critical role of robust API architectures and continuous monitoring for maintaining operational continuity, even when underlying systems experience disruptions.”
Simon Newman, Co-Founder of Cyber London and International Cyber Expo Advisory Council Member, says:
“Today’s worldwide outage shows just how complex and interconnected our supply chains are. This incident demonstrates the need for every organisation to have a robust Incident Response Plan in place that is regularly reviewed and tested to minimise the impact and recover quickly.”
“It’s important that lessons are learned from today’s incident to reduce the likelihood of it happening again, and I would encourage all organisations to review their supply chain resilience regularly.”
Martin Jartelius, CSO at Outpost24, adds:
“This incident serves as a crucial reminder of the importance of the robustness of our security and availability, which hinge on the reliability of our service providers. Such vulnerabilities could arise from any critical component integral to the core system, as has been demonstrated in the past by diverse solutions including operating system vendors, antivirus and many more. The issues are rare, but they do show once again the importance of a very solid quality assurance process prior to releases.
This is similar to a supply chain attack: if an attacker had backdoored such an update to open systems to attacks or to encrypt them, the exact same systems would have been impacted. This is why supply chain attacks and defence have become increasingly important.
For those impacted, if their systems did not get the erroneous update then that is a positive. If they did get the update, some seem to be able to get up and running and will fix themselves. Others currently have a workaround to get the good update with some hands-on support, including booting into safe mode and removing some files. Expect this to be done swiftly for any systems that need high availability, but expect the clean-up in IT departments to potentially drag out over the summer vacation period. So where it really matters, this should not be too hard to fix, but it will cost time and effort.”
Carlos Aguilar Melchor, chief scientist, cybersecurity at SandboxAQ:
“It is essential to have visibility into the practices of your software supply chain, including how it is updated. We all learned from the global SolarWinds catastrophe that we cannot blindly accept updates for software that impacts key systems. This is especially true for software that is commonly used in all big businesses, such as ERPs, CRMs, and above all cybersecurity software.”
Graham Steel, head of cybersecurity product at SandboxAQ:
“This major outage has been caused by a bug that wasn’t caught by CrowdStrike before rolling out an update to thousands of companies globally. This new outage should spur all companies to put in place systems that will analyse every update before it is allowed into their company. Recent consolidation in the cybersecurity market has increased the risk of this recurring – businesses rely on just a few vendors.”
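One way to picture the kind of update gate Steel describes is a staged rollout, in which an update reaches a small test group first and only moves on to the wider estate if those machines stay healthy. The Python sketch below is purely illustrative: the ring names, the host names, the `apply_update` and `is_healthy` callbacks and the failure threshold are all hypothetical, and it does not represent how CrowdStrike or any other vendor actually ships updates.

```python
from dataclasses import dataclass
from typing import Callable, List, Sequence


@dataclass
class Ring:
    """A group of hosts that receives an update at the same stage (hypothetical)."""
    name: str
    hosts: Sequence[str]


def staged_rollout(
    rings: Sequence[Ring],
    apply_update: Callable[[str], None],
    is_healthy: Callable[[str], bool],
    max_failure_rate: float = 0.0,
) -> List[str]:
    """Apply an update ring by ring and stop once a ring exceeds the failure budget.

    Returns the names of the rings that were updated and passed their health check.
    """
    completed: List[str] = []
    for ring in rings:
        failures = 0
        for host in ring.hosts:
            apply_update(host)
            if not is_healthy(host):
                failures += 1
        failure_rate = failures / len(ring.hosts) if ring.hosts else 0.0
        if failure_rate > max_failure_rate:
            # Halt here so later rings never receive the faulty update.
            break
        completed.append(ring.name)
    return completed


# Simulated estate: one pilot machine fails its health check after updating.
rings = [
    Ring("test-lab", ["lab-1", "lab-2"]),
    Ring("pilot", ["pilot-1", "pilot-2"]),
    Ring("fleet", ["srv-1", "srv-2", "srv-3"]),
]
touched: List[str] = []
passed = staged_rollout(
    rings,
    apply_update=touched.append,
    is_healthy=lambda host: host != "pilot-2",
)
print("Rings that passed:", passed)
print("Hosts that received the update:", touched)
```

In this simulation the rollout halts at the pilot ring, so none of the three fleet servers receives the update. The trade-off is speed: a gate like this delays protection updates, which is why the threshold and ring sizes would need to be chosen per organisation.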
Marc Manzano, general manager, cybersecurity at SandboxAQ:
“There has been an increasing trend to use AI to help developers write software code. This can indeed boost developer productivity, but where we need more help from AI is in improving quality assurance of code. This major global outage that brought thousands of flights and businesses to a standstill reminds us that humans are not very good at catching errors in thousands of lines of code – this is where AI can help a lot. In particular, we need AI trained to look for the interdependence of new software updates with the existing stack of software.”
Chris Dimitriadis, Chief Global Strategy Officer, ISACA:
“This is nothing short of a crisis. When one service provider in the digital supply chain is affected, the whole chain can break, causing large-scale outages. This incident is a clear example of what could be termed a digital pandemic – a single point of failure impacting millions of lives globally. Doctors can’t see their patients, media outlets can’t broadcast news, and travellers are stranded at airports. This isn’t just business operations which are affected – these are real lives.
“The outage is a result of an increasingly complex and interconnected digital world, and this failing is exactly why cyber resilience is key for ensuring the safety, security, and wellbeing of citizens as well as a key enabler of the global economy. Although we are still waiting for more details on the incident, what we do know is the cost of it will be felt for months.
“Sometimes such incidents are caused by unintentional mistakes when updating software. Sometimes they are the result of a cyberattack. But the irony is that cybersecurity companies are also part of the supply chain, and those same companies that are fighting to establish cyber resilience may too become victims themselves, affecting service continuity.
“This incident underscores the urgent need for robust cyber resilience and preparedness to prevent similar crises in the future. In cybersecurity, detection and response in the event of a crisis are just as important as protection and prevention. The right protocols must be established well ahead of time so that organisations can move quickly when attacks and outages happen and minimise the damage and disruption. But this isn’t possible without people with the skills to establish bespoke security frameworks and ensure everyone involved is trained on how to follow them. If we fail to prepare, this will only happen again.”
Elliott Wilkes, CTO of Advanced Cyber Defence Systems (ACDS), said:
“This disruption in service on Windows devices, which is affecting customers in industries across the globe from airlines in India and Australia to Sky News in the UK, appears to be caused by an error introduced in an update file pushed yesterday by cyber security company CrowdStrike in their Falcon product. This tool has software that runs on end user devices, called an “agent”, and runs in a similar fashion to classic antivirus software running on a desktop computer. Because agent-based detection systems often require enhanced or even administrator-level privileges to monitor computer activity in order to detect malicious code, these systems can introduce risks. Tools like Falcon in the category of EDR (endpoint detection and response) typically have the ability to take immediate action to resolve issues or suspend services if malicious activity is detected. This is a hugely important feature and a needed component in order to rapidly defend against attacks and prevent infiltrators from moving laterally across an organization. However, these enhanced permissions come with risks, because they are integrated into critical components of the operating system of the end user devices. What we are seeing here is end user devices getting stuck in a reboot loop, on a screen that’s known as the “blue screen of death”, the infamous Windows error screen. Ultimately the likelihood of these events is small, but the impact as we can see today is tremendous.
The specific nature of this failure, with devices getting stuck without the ability to reboot, makes this situation particularly challenging to resolve.”
Brian Higgins, Security Specialist at Comparitech:
“CrowdStrike have blamed a sensor update for the global outage and claim to be fixing the problem themselves. Their current advice is to take no further action but to monitor updates until a resolution is found. Not massively helpful for all of the essential services affected, but since there is nothing practical to be done by users at this stage there is little more to be said. I’m sure there will be plenty of post-mortem commentary about resilience models, redundancies and so on in the days to come, but right now the best we can do is hope that everyone comes out of this as safely as possible.”
Chris Hauk, Consumer Privacy Advocate at Pixel Privacy:
“Well, this outage certainly is a strike against my favorite security slogan, “update early, update often,” isn’t it? This outage has reportedly caused delays in flights around the globe and has also caused issues in emergency response systems in at least one U.S. city, Phoenix, AZ. I certainly have faith that IT professionals will be able to fix the issue and get their systems back up and running. Unfortunately, the manual actions required to fix the issue are a bit time-consuming, so it may be a while before things get back to what passes for normal these days.”
Neatsun Ziv, CEO at OX Security:
“Incidents like the one we are seeing causing global chaos today, where an error in an update from a provider causes widespread outages, are not uncommon. What is unique about this incident is the scale at which it has taken place, likely wiping billions from the global economy due to widespread global downtime. It’s worth bearing in mind that although the chaos is not the result of a cybersecurity incident, this is the kind of disruption a cybersecurity incident has the ability to cause.
The lesson which can be taken from an event such as this is the importance of choosing a vendor who can protect your server as a distinct and valuable portion of the network, separate from endpoints.
Endpoint devices may need resetting in this kind of scenario, but if the server also needs resetting it becomes a much more complex fix. Taking the example of an ATM connected to an affected server, this may require a manual reset by an engineer, which for the large financial organisations currently affected could mean hours or days of downtime for key services. Moving forward, a system of agentless updates, as opposed to automatically updating agents on the endpoint servers, could help alleviate issues like this. The associated convenience of automating these updates creates more potential for outages and security incidents, and this kind of event could happen to any vendor that uses agent technology.”
Adam Pilton, Cybersecurity Consultant at CyberSmart:
“At the time of writing, IT systems around the world are not operating. This is impacting many businesses and will impact our daily lives.
Currently, we do not know what has happened, but there is no suggestion that this is a cyber attack. The belief is that this is a technical issue. Maybe not coincidentally, the cyber security company CrowdStrike are having issues too. Time will tell whether these are directly related.
CrowdStrike has stated that they are aware of reports of crashes on Microsoft’s Windows operating system relating to its Falcon sensor.
There are some suggestions that these are two major incidents running simultaneously: a service-wide Azure outage and CrowdStrike Falcon blue screens.
What we are seeing now though are the businesses which have business continuity and incident response plans in place. These businesses are effectively communicating the issues and ensuring their customers are informed.
Society is dependent upon technology and this is why we must have both technical and non-technical controls in place to protect us when issues arise, whether malicious or not.
Social media is ablaze with users reporting that they are unable to work and one user on Reddit even stated they were commenting purely to be part of history on ‘The day that Crowdstrike took out the internet!’
This is exactly why all businesses must plan and prepare. As we are seeing, a huge dependency on individual suppliers can take down supply chains.”