Open Playback · Free & MCP-native

Production failure memory
for your AI coding agent

Real incidents from GitHub, Cloudflare, Linear and others, structured and exposed over MCP. Plug it into Claude or Cursor and ask how production actually breaks.


MCP

Connect via MCP

Drop this into Claude Desktop or Cursor. Then ask your agent about cache stampedes, BGP failures, or DNS outages.

"open-playback": {
  "type": "http",
  "url": "https://open-mcp.aftermath.sh/"
}

32 encores

Sorted by date

GitHub
Apr 28, 2026
SEV-1
Actions Ubuntu hosted runners delayed by performance regression in VM reimage process
A performance regression in the VM reimage process for Actions hosted runners slowed the rate at which Standard Ubuntu 22 and Ubuntu 24 runners returned to the available pool, lowering effective runner capacity. About 8 percent of jobs on those runners were delayed past 5 minutes or failed during the window. Engineers mitigated by rolling back to a known-good image version, after which capacity recovered.
4h 28m · Affected: not disclosed · Actions Standard Ubuntu 22 and Ubuntu 24 hosted runner jobs
capacity shortfall · ci/cd · container orchestration · customer-facing
GitHub
Apr 27, 2026
SEV-1
Elasticsearch overload from suspected botnet traffic degraded search across GitHub
GitHub's Elasticsearch cluster became overloaded by traffic that engineers later attributed to suspected botnet activity. Search-backed UI surfaces, including Issues, Pull Requests, Projects, Actions workflow runs, and Packages, returned timed-out or empty results. Engineers identified the source of the additional load and disabled it, allowing the cluster to recover. After the cluster stabilized, GitHub had to reindex Pull Request data, and reindexing continued into the following days.
6h 15m · Affected: not disclosed · search-backed UI surfaces across GitHub: Issues, Pull Requests, Projects, Actions, Packages
abuse event · capacity shortfall · customer-facing · ddos or abuse traffic
GitHub
Apr 23, 2026
SEV-1
DNS resolution failures in VA3 datacenter degraded multiple GitHub services
DNS resolution failures originating in GitHub's VA3 datacenter caused elevated error rates and degraded performance across several GitHub services, with the impact concentrated on Actions, Copilot, and Webhooks. Roughly 5 to 7 percent of overall traffic was affected during the window. Engineers identified the source of the resolution failures and applied a mitigation, after which dependent services recovered.
1h 24m · Affected: not disclosed · approximately 5-7 percent of overall traffic; Actions, Copilot, Webhooks affected
configuration fix · customer-facing · degraded performance · dns
GitHub
Apr 23, 2026
SEV-1
Billing service config change overwhelmed cache, degrading github.com, Codespaces, Packages, and Actions
A configuration change to an internal billing service overwhelmed a shared cache, leading to request timeouts and degraded experiences across github.com, Codespaces, Packages, Copilot, and Actions. Web requests returned 5xx errors, Codespaces create and resume requests failed at high rates, and a large fraction of Actions jobs were delayed or failed. Engineers mitigated by rolling back or correcting the billing configuration, after which Actions drained its queued backlog.
48m · Affected: not disclosed · github.com web, Codespaces, Packages, Copilot, Actions
cache · cache stampede · cascading failure · ci/cd
GitHub
Apr 22, 2026
SEV-1
Copilot Chat and Cloud Agent unavailable after infrastructure config change broke database connectivity
An infrastructure configuration change broke database connectivity for Copilot Chat and Cloud Agent on github.com, leaving users unable to interact with either service. Copilot Memory, in preview, was also unavailable to agent sessions during the window. Engineers identified the change as the cause and restored connectivity; github.com recovered first, and the remaining regional deployments were restored incrementally.
4h 2m · Affected: not disclosed · Copilot Chat and Cloud Agent users globally; staged regional recovery
auth · configuration error · configuration fix · customer-facing
GitHub
Apr 13, 2026
SEV-1
Pages returned 500 errors after octodns automation deleted a backend DNS record
An automated DNS management tool, octodns, deleted a DNS record for a Pages backend storage host after its upstream data source intermittently failed to return the record. The automation treated the missing record as stale and removed it, so Pages requests routed to that host returned HTTP 500 errors. Engineers re-created the deleted record to mitigate. The incident also exposed that the Pages frontend did not fail over to healthy backend hosts when one became unresolvable.
1h 37m · Affected: not disclosed · GitHub Pages traffic globally
configuration error · customer-facing · data repair · dns
GitHub
Mar 24, 2026
SEV-1
Teams Integration unable to deliver GitHub notifications during upstream provider outage
An outage at an upstream dependency caused HTTP 500 errors and connection resets on the path used to deliver GitHub event notifications to Microsoft Teams. The integration could not relay notifications during the impact window, with about 19 percent of integration installs affected. GitHub coordinated with the relevant service teams and the issue resolved when the upstream incident was mitigated.
3h 54m · Affected: not disclosed · Microsoft Teams Integration installs (~19% failed deliveries)
customer-facing · delayed processing · dependency outage · notification
Unknown organization
Mar 24, 2026
SEV-1
Permission filter bypass from variable shadowing bug
A performance optimization deployed to production contained a variable shadowing bug that caused team-level permission filters to be silently skipped. For approximately one hour, workspace members, including guests, could access data belonging to private teams within their own workspace through notification emails, client data sync, mobile sessions, API calls, and background tasks. No data was exposed outside any workspace, and no credentials were compromised. The change was reverted within the hour, all affected client sessions were cleared, and a post-incident audit found no evidence of malicious exploitation.
1h 3m · Affected: not disclosed
api gateway · auth · configuration error · credential rotation
Cloudflare
Feb 20, 2026
SEV-1
BYOIP prefixes withdrawn after Addressing API cleanup task misinterprets empty filter parameter
An automated cleanup sub-task in Cloudflare's Addressing API incorrectly queried the API with an empty pending_delete parameter, which the server interpreted as a request for all BYOIP prefixes. The task then began systematically deleting all matching prefixes and their service bindings, withdrawing about 1,100 BGP prefixes from the Internet. Engineers stopped the runaway sub-task within 50 minutes, but full restoration took over six hours because some customers had service bindings stripped from edge servers and required a global configuration rollout to repair.
5h 7m · Affected: not disclosed · BYOIP customers globally (about 25% of Cloudflare's BYOIP prefixes)
bgp misconfiguration · configuration error · deploy · human error
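The Cloudflare BYOIP entry above turns on a familiar API pitfall. A filter parameter that arrives empty is read as "no filter", so a query meant to match a handful of prefixes matches all of them. The following is a minimal Python sketch built on a hypothetical in-memory prefix table. The `list_prefixes_permissive`, `list_prefixes_strict` and `cleanup` functions are also hypothetical; none of this is Cloudflare's Addressing API. The sketch contrasts permissive handling with a strict variant that rejects an empty value instead of widening the match.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Prefix:
    cidr: str
    pending_delete: bool


# Hypothetical in-memory stand-in for a table of customer prefixes.
PREFIXES = [
    Prefix("192.0.2.0/24", pending_delete=True),
    Prefix("198.51.100.0/24", pending_delete=False),
    Prefix("203.0.113.0/24", pending_delete=False),
]


def list_prefixes_permissive(pending_delete: Optional[str]) -> List[Prefix]:
    """Risky pattern: an empty or missing filter silently means 'match everything'."""
    if not pending_delete:  # both "" and None fall through to the unfiltered result
        return list(PREFIXES)
    wanted = pending_delete.strip().lower() == "true"
    return [p for p in PREFIXES if p.pending_delete == wanted]


def list_prefixes_strict(pending_delete: Optional[str]) -> List[Prefix]:
    """Safer pattern: a missing or empty filter is an error, never a wildcard."""
    value = (pending_delete or "").strip().lower()
    if value not in ("true", "false"):
        raise ValueError(f"pending_delete must be 'true' or 'false', got {pending_delete!r}")
    return [p for p in PREFIXES if p.pending_delete == (value == "true")]


def cleanup(lister, pending_delete: Optional[str]) -> List[str]:
    """Hypothetical cleanup task: returns the CIDRs it would delete."""
    return [p.cidr for p in lister(pending_delete)]


if __name__ == "__main__":
    print(cleanup(list_prefixes_permissive, "true"))  # ['192.0.2.0/24']
    print(cleanup(list_prefixes_permissive, ""))      # all three prefixes: the runaway case
    try:
        cleanup(list_prefixes_strict, "")
    except ValueError as exc:
        print("refused:", exc)
```

Rejecting the empty value moves the failure to the caller, so a cleanup job fails loudly instead of deleting everything. A cap on how many rows a single cleanup run may touch would be a reasonable second guard.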
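The GitHub Pages entry above describes a sync loop that trusted an intermittently incomplete source: one missing record was read as stale and deleted. One generic guard is to delete a record only after it has been absent from the source for several consecutive syncs. The following is a Python sketch of that idea under stated assumptions. The `Reconciler` class, its `required_misses` parameter and the name-to-value record model are hypothetical illustrations, not octodns's API or configuration.

```python
from typing import Dict, List


class Reconciler:
    """Hypothetical DNS sync loop that deletes a record only after repeated absence."""

    def __init__(self, required_misses: int = 3) -> None:
        if required_misses < 1:
            raise ValueError("required_misses must be at least 1")
        self.required_misses = required_misses
        # Consecutive syncs in which each live record was missing from the source.
        self.miss_counts: Dict[str, int] = {}

    def plan(self, desired: Dict[str, str], live: Dict[str, str]) -> Dict[str, List[str]]:
        """Diff a source snapshot against live records (name -> value)."""
        creates = [name for name in desired if name not in live]
        updates = [name for name in desired if name in live and desired[name] != live[name]]
        deletes = []
        for name in live:
            if name in desired:
                self.miss_counts.pop(name, None)  # seen again: reset its streak
                continue
            self.miss_counts[name] = self.miss_counts.get(name, 0) + 1
            if self.miss_counts[name] >= self.required_misses:
                deletes.append(name)
        # Drop streaks for records that are no longer live at all.
        for name in list(self.miss_counts):
            if name not in live:
                del self.miss_counts[name]
        return {"create": creates, "update": updates, "delete": deletes}


if __name__ == "__main__":
    live = {"pages-a": "10.0.0.1", "pages-b": "10.0.0.2"}
    flaky = {"pages-a": "10.0.0.1"}  # source intermittently omits pages-b
    r = Reconciler(required_misses=3)
    for snapshot in (flaky, live, flaky, flaky, flaky):
        print(r.plan(snapshot, live)["delete"])
```

The trade-off is latency: a record that was genuinely removed lingers for a few sync intervals before deletion. The guard also does nothing for the other gap the entry names, the Pages frontend not failing over to healthy backend hosts.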
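The permission filter entry above attributes the bypass to a variable shadowing bug in a performance optimization, but the actual code was not disclosed. The following is a minimal, hypothetical Python reproduction of that general bug class, and all function and field names are illustrative. A nested helper assigns to a name it means to update in the enclosing scope. Instead it creates a new local, and the caller returns rows that were never filtered.

```python
from typing import Dict, List, Set

Row = Dict[str, object]


def visible_rows_buggy(rows: List[Row], member_teams: Set[str]) -> List[Row]:
    result = rows

    def apply_team_filter() -> None:
        # Bug: assigning to `result` here creates a new local that shadows the
        # outer `result`, so the filtered list is thrown away on return.
        result = [r for r in rows if r["team"] in member_teams]

    apply_team_filter()
    return result  # still the unfiltered input


def visible_rows_fixed(rows: List[Row], member_teams: Set[str]) -> List[Row]:
    def team_filter(candidates: List[Row]) -> List[Row]:
        # Fix: return the filtered list instead of rebinding an outer name.
        return [r for r in candidates if r["team"] in member_teams]

    return team_filter(rows)


if __name__ == "__main__":
    rows = [{"id": 1, "team": "public"}, {"id": 2, "team": "private"}]
    print([r["id"] for r in visible_rows_buggy(rows, {"public"})])  # [1, 2]
    print([r["id"] for r in visible_rows_fixed(rows, {"public"})])  # [1]
```

Returning the filtered value, rather than reassigning a name from an inner scope, removes the opportunity for shadowing. An unused-variable lint would likely flag the inner assignment in the buggy version. A test asserting that a guest cannot see private-team rows would catch the regression directly.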