Cloudflare, one of the largest content delivery networks, suffered a major outage today (21st June 2022).
According to a blog post Cloudflare published later the same day, the outage was caused by a routine maintenance change in 19 of its data centers. The change was meant to make those data centers more resilient to downtime; ironically, it took them offline instead.
As a result, many websites that use Cloudflare as their CDN returned HTTP 500 errors. The outage lasted 1 hour and 15 minutes (06:27 to 07:42 UTC). The 19 affected data centers are among Cloudflare's busiest and handle a significant share of its traffic.
Cloudflare stated that the outage was not the result of an attack or malicious activity.
Many popular websites that rely on Cloudflare, such as Upstox, Zerodha, and Omegle, started experiencing issues but recovered as soon as Cloudflare came back up.
Full incident timeline, from the Cloudflare blog:
03:56 UTC: We deploy the change to our first location. None of our locations are impacted by the change, as these are using our older architecture.
06:17: The change is deployed to our busiest locations, but not the locations with the MCP architecture.
06:27: The rollout reached the MCP-enabled locations, and the change is deployed to our spines. This is when the incident started, as this swiftly took these 19 locations offline.
06:32: Internal Cloudflare incident declared.
06:51: First change made on a router to verify the root cause.
06:58: Root cause found and understood. Work begins to revert the problematic change.
07:42: The last of the reverts has been completed. This was delayed as network engineers walked over each other’s changes, reverting the previous reverts, causing the problem to re-appear sporadically.
09:00: Incident closed.
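
To make the durations in this timeline easier to check, here is a minimal Python sketch that computes the gaps between a few of the timestamps listed above. The dictionary keys and the minutes_between helper are names made up for this illustration, and the sketch assumes all times are UTC on the same day (21 June 2022), as in the timeline.

```python
from datetime import datetime

# Timestamps (UTC, same day) copied from the timeline above.
# The key names are illustrative labels, not Cloudflare's terminology.
TIMELINE = {
    "incident_start": "06:27",
    "incident_declared": "06:32",
    "root_cause_found": "06:58",
    "reverts_completed": "07:42",
}


def minutes_between(start: str, end: str) -> int:
    """Return the whole minutes between two HH:MM times on the same day."""
    fmt = "%H:%M"
    delta = datetime.strptime(end, fmt) - datetime.strptime(start, fmt)
    return int(delta.total_seconds() // 60)


# Total time the 19 locations were affected.
outage = minutes_between(TIMELINE["incident_start"], TIMELINE["reverts_completed"])
# Time from the start of the incident until it was declared internally.
time_to_declare = minutes_between(TIMELINE["incident_start"], TIMELINE["incident_declared"])
# Time from the start of the incident until the root cause was understood.
time_to_root_cause = minutes_between(TIMELINE["incident_start"], TIMELINE["root_cause_found"])

print(f"Outage duration: {outage // 60} h {outage % 60} min")
print(f"Time to declare incident: {time_to_declare} min")
print(f"Time to find root cause: {time_to_root_cause} min")
```

The first figure matches the 1 hour and 15 minutes reported for the outage; the other two show that most of that time was spent rolling back the change rather than finding the cause.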