A cybersecurity company, how they brought down 8.5 million Windows machines in July 2024, and how to fix it.


"This is a very, very uncomfortable illustration of the fragility of the world's core internet infrastructure¹"
—Ciaran Martin


Disclaimer: I am not a security or tech support professional. This writeup is intended as a quick overview rather than a detailed and accurate professional analysis. In other words, to fix the issue, do your own, thorough research.

CrowdStrike is a cybersecurity company dedicated to providing advanced, even cutting-edge threat detection, intelligence and response solutions to organisations worldwide. They leverage both cloud computing and artificial intelligence to safeguard businesses from sophisticated cyber threats. Founded with the mission to make the digital world a safer place, CrowdStrike has quickly become a trusted name in the cybersecurity industry (until this weekend!), offering a comprehensive suite of services designed to prevent, detect, and respond to cyberattacks in real time. Their innovative approach ensures that clients can operate securely and confidently in an increasingly complex digital landscape.

Their clients include hundreds of Fortune 500 companies, government agencies, healthcare providers as well as infrastructure and transport entities. With a focus on endpoint protection, threat intelligence, and proactive security measures, they have become one of the most important players in cybersecurity protection and research. Nice company, nice product. It would be a shame if something were to happen to it…

2024 Windows "Crowd Stroke" Incident

On Friday, 19th July 2024, reports began to surface of Windows machines falling into a boot loop, and over time it became clear that a botched Falcon antivirus configuration update from CrowdStrike was at the centre of it. The cause was an update that had occurred at 04:09 UTC: certain files (C-00000291*, with a timestamp of 0409 Zulu) were causing Windows to fail with a Blue Screen of Death. Many machines, when rebooted, threw the same error again as a driver in the Windows kernel attempted to use the updated files, and simply restarted over and over. This is the "boot loop" referred to.

The fix is actually childishly simple: access the machine, boot into Safe Mode and delete the files causing the issue. However, in these days when so many servers are remote, one often simply does not have physical access to the machine. Even workstations in remote offices are problematic. Imagine talking a technically non-savvy user through the process!

"On the Windows sign-in screen, press and hold the Shift key while you select Power > Restart. After your PC restarts to the Choose an option screen, select Troubleshoot > Advanced options > Startup Settings > Restart. You may be asked to enter your BitLocker recovery key². After your PC restarts, you'll see a list of options. Select 4 or F4 to start your PC in safe mode. Or if you'll need to use the internet, select 5 or F5 for Safe Mode with Networking." I do not envy support staff this job. It's tough enough getting someone to simply restart a computer without jumping through more hoops. Then, once the computer has booted, navigate to the Windows\System32\drivers\CrowdStrike\ directory and delete any files that look like C-00000291* with a timestamp of 0409 UTC.

Imagine going through that with a user who can just about open an Excel spreadsheet! And of course there are alternatives if this is all too difficult. I lent a bootable USB Linux image to someone so their techs could boot the machine, access the file system and delete the file manually. Bloke paid me "for my expertise", which I appreciate. I've heard of people creating a bootable Linux image with a script file that automatically does it all; simply plug the USB drive in, restart from that drive and voila!, all fixed. It turns out there's an even more elegant solution: use a barcode scanner to do the work³.
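For the curious, the "script that does it all" idea can be sketched in a few lines of Python. This is a rough illustration rather than anyone's actual recovery tool: the file pattern and directory come from the steps above, while the function names, the `dry_run` switch and the idea of passing in the root of a mounted Windows volume are my own assumptions. It also assumes that the "timestamp" in question is the file's modification time, and that the directory names on the mounted volume are spelled exactly as written here (Linux is case-sensitive about such things).

```python
from datetime import datetime, timezone
from pathlib import Path

# Pattern and directory as given in the writeup; everything else is hypothetical.
CHANNEL_FILE_PATTERN = "C-00000291*"
DRIVER_SUBDIR = Path("Windows") / "System32" / "drivers" / "CrowdStrike"


def has_0409_timestamp(path: Path) -> bool:
    """True if the file's modification time falls in the 04:09 UTC minute.

    Assumption: 'timestamp' means the modification time stored on disk.
    """
    mtime = datetime.fromtimestamp(path.stat().st_mtime, tz=timezone.utc)
    return (mtime.hour, mtime.minute) == (4, 9)


def find_bad_channel_files(windows_root: Path) -> list[Path]:
    """List files under the CrowdStrike driver directory that match both
    the name pattern and the 04:09 UTC timestamp."""
    driver_dir = windows_root / DRIVER_SUBDIR
    if not driver_dir.is_dir():
        return []
    return sorted(
        p for p in driver_dir.glob(CHANNEL_FILE_PATTERN)
        if p.is_file() and has_0409_timestamp(p)
    )


def remove_bad_channel_files(windows_root: Path, dry_run: bool = True) -> list[Path]:
    """Report the matching files and delete them unless dry_run is set.

    Returns the list of files that matched.
    """
    matches = find_bad_channel_files(windows_root)
    for path in matches:
        if dry_run:
            print(f"would delete {path}")
        else:
            path.unlink()
            print(f"deleted {path}")
    return matches
```

Calling `remove_bad_channel_files(Path("/mnt/windows"))` (the mount point is made up) only lists what would be removed; passing `dry_run=False` actually deletes. Defaulting to a dry run is deliberate, since deleting files from a system driver directory is not something to do on a guess.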

The fallout from this is going to be immense. With transport hit, flights around the world cancelled, some payment systems affected, surgeries cancelled and thousands of businesses suffering downtime, it may be some time before people trust the company again. Fingers are even being pointed (unfairly, possibly) at Microsoft for Not Doing Enough. Even I, a Windows-hater, cannot go that far. But some heads will roll for sure, and maybe people will be more cautious in trusting monolithic corporations in the future. We shall see. Confidence is shaken. Their share price certainly took a hit.

That said, I have this message from C-Dawg: "re Crowdstrike: Microsoft definitely is part of the problem here. CS on Windows is certified kernel mode software, and while it itself didn't crash, it is allowed to import executable code (which is what their rule updates are) into a kernel mode process. Obviously a vector for malware, but in this case it wasn't even that, it was just a corrupt update file." That their kernel permits a driver to run potentially dangerous files is possibly a poor design choice.

Lessons to learn here: test thoroughly before pushing to production! Also, never push to production on Friday. Meanwhile, any of you who are affected by this, especially if you're called upon to fix and recover, you have my sympathies. I will light a candle for you.

I am, however, delighted to point you to the news that CrowdStrike won the Pwnie Award for Most Epic Fail at DEF CON 2024, and the company president accepted it honourably and in the spirit it was given.


¹ On the subject of internet infrastructure fragility, of course there is a relevant xkcd at https://xkcd.com/2347/.
² Pray that your BitLocker key is not on another machine that is compromised. Yes, this has happened.
³ Freudian tyop of the day: bracode




$ xclip -o | wc -w
730
