Cloudflare’s Six‑Hour Outage: Lessons in Resilience and Risk

On February 20, 2026, Cloudflare experienced a six‑hour global service outage that disrupted customers using its Bring Your Own IP (BYOIP) services. The incident, which began at 17:48 UTC, left numerous applications unreachable and caused Cloudflare's 1.1.1.1 public DNS resolver to return HTTP 403 errors.

What Happened

  • Root cause: An internal bug in Cloudflare’s Addressing API during an automated cleanup task.
  • Coding oversight: The automated task passed the pending_delete flag with no value, and the API interpreted the empty filter as a command to delete all BYOIP prefixes rather than only those pending deletion (see the sketch after this list).
  • Impact: Roughly 1,100 prefixes were withdrawn, representing about 25% of all BYOIP prefixes globally.
  • Blast radius:
    • CDN & Security Services → Traffic failed to route, causing timeouts.
    • Spectrum → Applications failed to proxy traffic.
    • Dedicated Egress → Outbound traffic collapsed.
    • Magic Transit → Protected applications became unreachable.
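
To make the failure mode concrete, here is a minimal sketch in Python. It assumes nothing about Cloudflare's actual Addressing API; the function, field names, and data are illustrative only. The core mistake is that an empty filter value is silently treated as "match everything" instead of being rejected.

```python
# Illustrative sketch of the failure mode; names and data are
# hypothetical, not Cloudflare's actual Addressing API.

def find_prefixes(prefixes, state=None):
    """Return prefixes whose state matches the filter."""
    if not state:               # "" and None are both falsy: the oversight
        return list(prefixes)   # no filter applied, every prefix matches
    return [p for p in prefixes if p["state"] == state]

prefixes = [
    {"cidr": "203.0.113.0/24", "state": "active"},
    {"cidr": "198.51.100.0/24", "state": "pending_delete"},
]

# Intended cleanup: select only prefixes already marked for deletion.
print(len(find_prefixes(prefixes, state="pending_delete")))  # 1

# The task passed the flag with no value, so the filter vanished and
# every prefix was selected for withdrawal.
print(len(find_prefixes(prefixes, state="")))                # 2
```

The safer pattern is to reject an empty filter at the boundary, which is essentially what the standardized API schema in the remediation plan below is meant to enforce.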

Recovery Timeline

  • 17:56 UTC → Broken sub‑process executes, withdrawing prefixes.
  • 18:46 UTC → Engineer identifies the flawed task and disables execution.
  • 19:19 UTC → Dashboard self‑remediation becomes available for some customers.
  • 23:03 UTC → Global configuration deployment completes, restoring all prefixes.

Recovery was delayed because ~300 prefixes lost their service bindings entirely, requiring manual restoration across every edge machine.

Why It Matters

  • Resilience gap: A single misinterpreted flag cascaded into a global outage.
  • Customer impact: Critical services across industries were unreachable for hours.
  • Trust challenge: Outages undermine Cloudflare’s promise of high availability.

Planned Remediation

Cloudflare announced several safeguards under its Code Orange resilience initiative:

  • Standardized API schema → Prevent flag misinterpretation.
  • Circuit breakers → Detect abnormal BGP prefix deletions (both safeguards are sketched after this list).
  • Operational snapshots → Separate customer configurations from production rollouts.
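
As a rough illustration of how the first two safeguards could fit together, here is a short Python sketch. The schema rules, the 1% threshold, and every name in it are assumptions for illustration, not Cloudflare's implementation.

```python
# Hypothetical sketch of a strict request schema plus a deletion
# circuit breaker; all names, rules, and thresholds are assumptions.
from dataclasses import dataclass

VALID_STATES = {"active", "pending_delete"}

@dataclass(frozen=True)
class CleanupRequest:
    """Strict schema: an empty or unknown state filter is rejected
    at the boundary instead of being reinterpreted as 'match all'."""
    state: str

    def __post_init__(self):
        if self.state not in VALID_STATES:
            raise ValueError(f"invalid state filter: {self.state!r}")

def deletion_circuit_breaker(num_to_delete, total, max_fraction=0.01):
    """Refuse any batch that would withdraw more than a small fraction
    of all prefixes in a single operation."""
    if total and num_to_delete / total > max_fraction:
        raise RuntimeError(
            f"refusing to withdraw {num_to_delete} of {total} prefixes: "
            f"{num_to_delete / total:.0%} exceeds the {max_fraction:.0%} cap"
        )

CleanupRequest(state="pending_delete")    # accepted
# CleanupRequest(state="")                # ValueError at the boundary
# deletion_circuit_breaker(1100, 4400)    # RuntimeError: 25% > 1% cap
```

Either check alone would likely have stopped the cleanup task before roughly a quarter of all BYOIP prefixes were withdrawn.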

Final Thought

The Cloudflare outage is a stark reminder that internal automation errors can be just as disruptive as external attacks. For enterprises, the lesson is clear: resilience isn’t just about defending against adversaries—it’s about engineering for failure containment. Cloudflare’s transparency and remediation roadmap will be critical in rebuilding trust after this incident.
