As a seasoned CIO/CISO and tech industry analyst with 35 years of experience, I’ve seen my fair share of cybersecurity incidents. However, the recent CrowdStrike outage stands out due to its extensive impact across multiple sectors. Here’s a deep dive into what happened, the repercussions, and the lessons we can all learn from this incident.
Background and Initial Reaction
I started my journey in IT in the late ’80s when I wrote a piece of software called PleadPerfect. Over the years, I’ve worn many hats: engineer, architect, and executive at both large and small companies. For the last 18 years, I’ve been a CIO/CISO for organizations with revenue ranging from eight to eleven figures.
When I first heard about the CrowdStrike-related outage, my initial reaction was one of deep concern. I took a moment of silence in honor of the hours my peers and fellow IT pros gave up with their families to fix a problem that should never have occurred. The lack of good QA practices shown by CrowdStrike is deeply upsetting. They should have caught this issue in testing before releasing it to the public. The fact that it affected every Windows OS since 2008 is inexcusable.
Understanding the Incident
CrowdStrike’s Falcon software is installed at the core of the OS, which is how it protects machines so effectively. However, this tight integration also causes significant problems when updates are not properly tested. The faulty update led to widespread instances of the “Blue Screen of Death” (BSOD): machines crashed and could not recover automatically. Recovery involved booting each machine into safe mode and deleting a CrowdStrike file, a task complicated by the fact that safe mode cannot be entered remotely on every device and OS. Additionally, best practices dictate securing the boot drive with BitLocker, which requires a recovery key to unlock the drive before entering safe mode. These keys are often stored on systems that were themselves affected by the flaw, greatly increasing the effort and time required for recovery.
Such incidents are not uncommon in the cybersecurity industry, but this one is particularly damaging because it stems from a QA and testing issue, not a cybersecurity breach. The tight integration between Falcon and the OS made the damage far more widespread and the recovery process far more onerous.
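The manual fix described above (boot into safe mode, then delete the offending CrowdStrike file) can be sketched in a few lines of Python. This is only an illustration, not CrowdStrike's official tooling. The directory and file pattern are assumptions taken from the publicly circulated remediation guidance rather than from this article, and the function name `remove_faulty_files` is made up for the example. It defaults to a dry run so nothing is deleted until you have reviewed the matches.

```python
from pathlib import Path

# Assumption: directory and pattern follow publicly circulated remediation
# guidance; the article itself names neither. Verify before relying on them.
DEFAULT_DIR = Path(r"C:\Windows\System32\drivers\CrowdStrike")
DEFAULT_PATTERN = "C-00000291*.sys"


def remove_faulty_files(directory: Path = DEFAULT_DIR,
                        pattern: str = DEFAULT_PATTERN,
                        dry_run: bool = True) -> list[Path]:
    """Return the files matching the pattern, deleting them unless dry_run."""
    if not directory.is_dir():
        # Nothing to do on machines where the directory does not exist.
        return []
    matches = sorted(p for p in directory.glob(pattern) if p.is_file())
    if not dry_run:
        for path in matches:
            path.unlink()
    return matches


if __name__ == "__main__":
    # Dry run: list what would be removed without touching anything.
    for path in remove_faulty_files(dry_run=True):
        print(f"would delete: {path}")
```

The script is the easy part. It can only run once the machine has been booted into safe mode and the BitLocker-protected drive has been unlocked, which is where most of the recovery time went.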
Impact on Businesses and Services
All sectors and industries were affected, but critical infrastructure sectors were hit the hardest. Transportation (airlines), banking/financial services, and healthcare (hospitals and emergency rooms) pose the most risk to world economies when disrupted. The three biggest US airlines, as well as carriers around the world, experienced grounded flights and communication issues. Banks in many countries went offline, and hospital networks faced significant disruptions.
Response and Resolution
CrowdStrike’s response to the incident was swift, but I am not sure what more they can do at this point. I did not feel the apology from George Kurtz, the CEO, was “full-throated” or that it took sufficient responsibility for the incident. This is nobody’s fault but CrowdStrike’s. While they have committed to helping everyone affected, they have 24,000 customers, all of whom are impacted, so they cannot give each the attention it needs. Billions of dollars in damage are being done to those companies by this outage.
Lessons Learned
The key lessons from this incident are clear: Be careful where you place your trust in other companies and partners. Ensure your contracts allow you to seek damages, as that may be the only recourse in such situations. Have a comprehensive disaster recovery (DR) plan and test it regularly. The number of companies having to rebuild their backup infrastructure just to restore systems because they cannot access (or do not have) their BitLocker keys is far too great.
To better prepare for and prevent similar issues, develop and thoroughly test your recovery plans. Consider using a completely different set of security tools for backup and recovery, so that a single faulty tool cannot take down production and recovery systems at the same time. Treat backup and recovery infrastructure as a critical business function and harden it as much as possible.
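One concrete way to test the BitLocker lesson is to audit where recovery keys are escrowed before the next incident. The sketch below is a minimal illustration under stated assumptions: the `KeyRecord` structure, the `hosts_at_risk` function, and the host and location names are all hypothetical, and a real inventory would come from your own asset and key management systems. It flags every host whose recovery key is missing or is stored only on systems that would fail alongside the machines they are meant to recover.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class KeyRecord:
    """Hypothetical inventory entry for one encrypted machine."""
    host: str                          # machine whose boot drive is encrypted
    escrow_locations: tuple[str, ...]  # where its recovery key is stored


def hosts_at_risk(records: list[KeyRecord],
                  dependent_locations: set[str]) -> list[str]:
    """Return hosts with no recovery key stored outside dependent systems.

    dependent_locations names the key stores that run the same tooling as
    production and would therefore go down in the same incident.
    """
    at_risk = []
    for record in records:
        independent = [loc for loc in record.escrow_locations
                       if loc not in dependent_locations]
        if not independent:
            # Covers both "key only on affected systems" and "no key at all".
            at_risk.append(record.host)
    return sorted(at_risk)


if __name__ == "__main__":
    # All names below are made up for illustration.
    inventory = [
        KeyRecord("laptop-01", ("key-server-a",)),
        KeyRecord("db-02", ("key-server-a", "offline-vault")),
        KeyRecord("kiosk-03", ()),
    ]
    print(hosts_at_risk(inventory, dependent_locations={"key-server-a"}))
```

Running a check like this as part of a regular DR test turns "we think we have the keys" into a list of machines that still need an independent copy.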
Future of Cybersecurity
Time will tell how this incident influences future cybersecurity practices and policies. Between the SolarWinds and CrowdStrike issues, both being failures of best practices by the companies themselves, something has to change.
Emerging technologies like AI and machine learning could help predict and prevent similar issues by identifying potential vulnerabilities before they become problems. However, the real fix may lie in revamping processes and possibly having independent bodies audit and certify the practices of technology companies.
Personal Insights
As someone deeply involved in the tech industry, I stay updated with the latest cybersecurity trends and threats by reading extensively, following industry developments, consuming relevant content, talking to peers, and moving out of my silo to share and learn from others.
My advice to fellow CIOs and CISOs is simple: Plan for the worst and test for the worst. If you fail to prepare for these kinds of incidents, you will be in the worst possible position when the board asks for your response.
Final Thoughts
The recent CrowdStrike outage was a wake-up call for many in the tech industry. It highlighted the vulnerabilities inherent in our interconnected world and underscored the need for robust cybersecurity measures. By learning from this incident and implementing the lessons outlined above, we can better prepare for and prevent similar issues in the future.
Stay vigilant, stay prepared, and let’s continue to fortify our defenses against the ever-evolving landscape of cybersecurity threats.
The post Navigating the CrowdStrike Outage: Insights from a Tech Industry Veteran appeared first on Gigaom.