Wednesday, July 24, 2024

CrowdStrike cites errors with its QA process

CrowdStrike published a preliminary Post Incident Review (PIR) of the July 19, 2024 incident, in which a content configuration update for the Falcon sensor caused Windows systems to crash with a Blue Screen of Death (BSOD). The issue arose from a Rapid Response Content update intended to gather telemetry on novel threat techniques. Instead, the update caused system failures on Windows hosts running sensor version 7.11 and above that received it during the roughly 78 minutes it was available.

The problematic update was released at 04:09 UTC and reverted at 05:27 UTC; Windows hosts that received it during that window were affected. Mac and Linux systems were not affected. The issue stemmed from an undetected error in the Rapid Response Content update, which caused an out-of-bounds memory read and, in turn, the crash. CrowdStrike’s extensive QA processes and staged sensor rollout procedures did not catch the error. The company is implementing enhancements to its testing and deployment strategies to prevent a recurrence.
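As a rough illustration of the failure class named above, the sketch below shows a bounds check placed in front of an indexed read. It is not CrowdStrike's code: the names `read_field` and `matches`, and the idea of rules that refer to input fields by position, are assumptions made for the example. Python raises an exception on a bad index rather than reading invalid memory, so this only models the validation step; in kernel-mode code written in C or C++, the same unchecked read can crash the whole system.

```python
def read_field(fields, index):
    """Return fields[index], or None if the index falls outside the supplied data."""
    # Hypothetical helper: check the index before reading instead of trusting the rule.
    if 0 <= index < len(fields):
        return fields[index]
    return None


def matches(fields, rules):
    """Evaluate (index, expected) rules; a rule pointing past the end never matches."""
    for index, expected in rules:
        value = read_field(fields, index)
        if value is None or value != expected:
            return False
    return True


if __name__ == "__main__":
    fields = ["a", "b", "c"]
    print(matches(fields, [(0, "a"), (2, "c")]))  # every index is in range
    print(matches(fields, [(0, "a"), (5, "x")]))  # index 5 is rejected, not read
```

The design choice here is to treat an out-of-range reference as a non-match rather than an error that stops processing, so that one malformed rule cannot take the host down.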

Key Points:

Incident Date and Time: July 19, 2024, from 04:09 to 05:27 UTC.

Affected Systems: Windows hosts running Falcon sensor version 7.11 and above.

Cause: An undetected error in the Rapid Response Content update that triggered an out-of-bounds memory read.

Impact: Windows system crashes (BSOD).

Reversion: Update reverted at 05:27 UTC.

QA and Rollout: Extensive QA processes and staged sensor rollout procedures did not catch the error.

Prevention: Enhanced testing and deployment strategies are being implemented.

Platform Stability: Planned improvements in error handling and validation.

Customer Control: Planned greater control over content update deployments.

Transparency: Planned detailed release notes for content updates.