On February 20, 2026, Cloudflare experienced a six‑hour global service outage that disrupted customers using its Bring Your Own IP (BYOIP) services. The incident, which began at 17:48 UTC, rendered numerous applications unreachable and triggered HTTP 403 errors on Cloudflare’s 1.1.1.1 DNS resolver.
What Happened
- Root cause: An internal bug in Cloudflare’s Addressing API during an automated cleanup task.
- Coding oversight: The system passed the pending_delete flag with no value, and the API interpreted it as a request to delete all BYOIP prefixes rather than only those pending deletion.
- Impact: Roughly 1,100 prefixes were withdrawn, about 25% of all BYOIP prefixes globally.
- Blast radius:
  - CDN & Security Services → Traffic failed to route, causing timeouts.
  - Spectrum → Applications failed to proxy traffic.
  - Dedicated Egress → Outbound traffic collapsed.
  - Magic Transit → Protected applications became unreachable.
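The flag-handling failure described above can be illustrated with a small sketch. It assumes, purely for illustration, that the flag arrives as a URL query parameter and that deletion candidates are chosen with a truthiness check. The function names, prefix records, and query format are hypothetical and are not taken from Cloudflare's Addressing API. The buggy selector treats an empty value as "no filter" and falls through to every prefix, while the strict selector refuses anything other than an explicit "true".

```python
from urllib.parse import parse_qs

# Hypothetical prefix records (RFC 5737 documentation ranges); not real data.
PREFIXES = [
    {"cidr": "203.0.113.0/24", "pending_delete": True},
    {"cidr": "198.51.100.0/24", "pending_delete": False},
    {"cidr": "192.0.2.0/24", "pending_delete": False},
]


def select_for_deletion_buggy(query: str) -> list[str]:
    """Hypothetical flawed selector: an empty flag value disables the filter."""
    params = parse_qs(query, keep_blank_values=True)
    flag = params.get("pending_delete", [""])[0]
    if flag:
        # Only reached when the flag carries a non-empty value.
        return [p["cidr"] for p in PREFIXES if p["pending_delete"]]
    # An empty value ("pending_delete" or "pending_delete=") lands here
    # and selects every prefix, not just the pending ones.
    return [p["cidr"] for p in PREFIXES]


def select_for_deletion_strict(query: str) -> list[str]:
    """Hypothetical hardened selector: the flag must be exactly 'true'."""
    params = parse_qs(query, keep_blank_values=True)
    values = params.get("pending_delete")
    if values != ["true"]:
        raise ValueError("pending_delete must be given exactly once as 'true'")
    return [p["cidr"] for p in PREFIXES if p["pending_delete"]]


if __name__ == "__main__":
    print(select_for_deletion_buggy("pending_delete"))       # all three prefixes
    print(select_for_deletion_buggy("pending_delete=true"))  # only the pending one
    print(select_for_deletion_strict("pending_delete=true"))
```

Requiring an explicit value, instead of inferring intent from an empty one, is the kind of guarantee a strict API schema can enforce before a destructive handler ever runs.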
Recovery Timeline
| Time (UTC) | Event |
|---|---|
| 17:56 | Broken sub‑process executes, withdrawing prefixes. |
| 18:46 | Engineer identifies flawed task, disables execution. |
| 19:19 | Dashboard self‑remediation available for some customers. |
| 23:03 | Global configuration deployment completes, restoring all prefixes. |
Recovery was delayed because roughly 300 of the withdrawn prefixes had lost their service bindings entirely, which required manual restoration across every edge machine.
Why It Matters
- Resilience gap: A single misinterpreted flag cascaded into a global outage.
- Customer impact: Critical services across industries were unreachable for hours.
- Trust challenge: Outages undermine Cloudflare’s promise of high availability.
Planned Remediation
Cloudflare announced several safeguards under its Code Orange resilience initiative:
- Standardized API schema → Prevent flag misinterpretation.
- Circuit breakers → Detect abnormal BGP prefix deletions.
- Operational snapshots → Separate customer configurations from production rollouts.
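In its simplest form, a circuit breaker for prefix withdrawals could count withdrawals in a sliding time window and halt the job once the count passes a threshold. The sketch below is a hypothetical design, not Cloudflare's: the class name, the 1% threshold, the 300-second window, and the assumption of about 4,400 BYOIP prefixes in total (back-calculated from 1,100 being roughly 25%) are all illustrative choices. It fails closed: once tripped, it blocks every further withdrawal until an operator resets it.

```python
import time
from collections import deque
from typing import Callable


class WithdrawalCircuitBreaker:
    """Hypothetical guard against bulk BGP prefix withdrawals.

    Trips once `limit` withdrawals have been allowed inside a sliding
    window of `window_s` seconds, where `limit` is `max_fraction` of the
    advertised prefixes (at least 1). Stays open until reset() is called.
    """

    def __init__(self, total_prefixes: int, max_fraction: float = 0.01,
                 window_s: float = 300.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.limit = max(1, int(total_prefixes * max_fraction))
        self.window_s = window_s
        self.clock = clock
        self.events: deque[float] = deque()
        self.open = False

    def allow_withdrawal(self) -> bool:
        """Return True if one more withdrawal may proceed."""
        if self.open:
            return False
        now = self.clock()
        # Forget withdrawals that have aged out of the sliding window.
        while self.events and now - self.events[0] > self.window_s:
            self.events.popleft()
        if len(self.events) >= self.limit:
            self.open = True  # Trip: block every later withdrawal.
            return False
        self.events.append(now)
        return True

    def reset(self) -> None:
        """Operator action: close the breaker and clear its history."""
        self.open = False
        self.events.clear()


if __name__ == "__main__":
    # Simulate a runaway job trying to withdraw 1,100 prefixes at once.
    breaker = WithdrawalCircuitBreaker(total_prefixes=4_400,
                                       clock=lambda: 0.0)
    allowed = sum(breaker.allow_withdrawal() for _ in range(1_100))
    print(allowed, breaker.open)  # 44 True
```

Failing closed trades availability of the cleanup job for the safety of customer routes; a production version would also need alerting and an approved override path, which this sketch leaves out.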
Final Thought
The Cloudflare outage is a stark reminder that internal automation errors can be just as disruptive as external attacks. For enterprises, the lesson is clear: resilience isn’t just about defending against adversaries; it’s also about engineering for failure containment. Cloudflare’s transparency and remediation roadmap will be critical in rebuilding trust after this incident.