Copilot Chat and Cloud Agent unavailable after infrastructure config change broke database connectivity
GitHub · Source
- Started: Apr 22, 2026
- Duration: 4h 2m
- Users affected: Not disclosed
- Revenue impact: Not disclosed
- Blast radius: Copilot Chat and Cloud Agent users globally; staged regional recovery
- Services: copilot-chat, copilot-cloud-agent, copilot-memory, copilot-database
Summary
An infrastructure configuration change broke database connectivity for Copilot Chat and Cloud Agent on github.com, leaving users unable to interact with either service. Copilot Memory, which is in preview, was also unavailable to agent sessions during the incident. Engineers identified the change as the cause and restored connectivity; github.com recovered first, and the remaining regional deployments were restored incrementally.
Impact
Users were unable to interact with Copilot Chat on github.com or with Copilot Cloud Agent throughout the incident. Copilot Memory, which is in preview, was unavailable to Copilot agent sessions during the same period. github.com recovered at 18:16 UTC; the remaining regional deployments were restored progressively until full resolution at 19:18 UTC.
Root cause
An infrastructure configuration change altered the path Copilot services used to reach their backing databases.
With that path altered, Copilot Chat and Cloud Agent could no longer connect to the database.
The change was applied broadly across regions before the resulting connectivity failure was caught, producing a global rather than regional outage.
Recovery was inherently regional because the affected configuration was deployed and rolled back per region.
Resolution
Engineers identified the breaking infrastructure change and restored connectivity to the database. Copilot Chat and Cloud Agent on github.com were restored by 18:16 UTC, and the remaining regional deployments were brought back online incrementally until full resolution at 19:18 UTC.
Lessons
- Infrastructure changes that affect database connectivity should be staged regionally so a misconfiguration produces a regional rather than global outage.
- A regional rollout model is only useful if there is a forcing function that prevents the change from progressing globally before the first region is verified healthy.
- Preview features like Copilot Memory ride on the same connectivity assumptions as their parent services and inherit the same blast radius.
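The first two lessons describe a staged rollout with a gate between regions. The following is a minimal sketch of that idea, not a description of GitHub's deployment tooling: `staged_rollout`, `RolloutHalted`, and the `apply_change`, `is_healthy`, and `rollback` callbacks are hypothetical names introduced here for illustration. The forcing function is that the loop cannot advance to the next region until the current one passes verification.

```python
from typing import Callable, Iterable, List


class RolloutHalted(Exception):
    """Raised when a region fails verification and the rollout must stop."""


def staged_rollout(
    regions: Iterable[str],
    apply_change: Callable[[str], None],
    is_healthy: Callable[[str], bool],
    rollback: Callable[[str], None],
) -> List[str]:
    """Apply a change one region at a time, verifying each before moving on.

    All callbacks are hypothetical hooks into whatever deployment system
    is actually in use. Returns the regions that were changed and verified.
    """
    completed: List[str] = []
    for region in regions:
        apply_change(region)
        if not is_healthy(region):
            # Undo only the failing region; later regions were never touched.
            rollback(region)
            raise RolloutHalted(
                f"{region} failed verification; verified so far: {completed}"
            )
        completed.append(region)
    return completed
```

Placing the canary region first in `regions` limits a misconfiguration to that single region, because the exception stops the loop before any later region is changed.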
Action items
- Add controls that prevent similar infrastructure changes from causing database connectivity disruptions across regions simultaneously.
- Stage infrastructure config changes that touch database connectivity through a verified canary region before broader rollout.
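One way to verify a canary region after a connectivity-affecting change is to probe the database endpoints from inside that region. The sketch below assumes the databases are reachable over plain TCP at known host and port pairs, which the source does not state; `can_reach_database` and `unreachable_endpoints` are hypothetical helpers. A successful TCP connect only shows that the network path is open. It does not confirm authentication or that queries succeed.

```python
import socket
from typing import Iterable, List, Tuple


def can_reach_database(host: str, port: int, timeout_s: float = 2.0) -> bool:
    """Return True if a TCP connection to (host, port) can be opened."""
    try:
        with socket.create_connection((host, port), timeout=timeout_s):
            return True
    except OSError:
        # Covers refused connections, timeouts, and name-resolution failures.
        return False


def unreachable_endpoints(
    endpoints: Iterable[Tuple[str, int]], timeout_s: float = 2.0
) -> List[Tuple[str, int]]:
    """Return the endpoints that could not be reached; empty means the gate passes."""
    return [
        (host, port)
        for host, port in endpoints
        if not can_reach_database(host, port, timeout_s)
    ]
```

A rollout gate could treat a non-empty result from `unreachable_endpoints` as a failed canary and stop before the change reaches any other region.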