SEV-1 · public access

Copilot Chat and Cloud Agent unavailable after infrastructure config change broke database connectivity

GitHub · Source

Started: Apr 22, 2026
Duration: 4h 2m
Users affected: Not disclosed
Revenue impact: Not disclosed
Blast radius: Copilot Chat and Cloud Agent users globally; staged regional recovery
Services: copilot-chat, copilot-cloud-agent, copilot-memory, copilot-database
Tags: auth, configuration error, configuration fix, customer-facing, database, full outage, infrastructure change, regression from deploy

Summary

An infrastructure configuration change broke database connectivity for Copilot Chat and Cloud Agent on github.com, leaving users unable to interact with either service. Copilot Memory (in preview) was also unavailable to agent sessions during the window. Engineers identified the change as the cause and restored connectivity; github.com recovered first, and the remaining regional deployments were restored incrementally.

Impact

Users were unable to interact with Copilot Chat on github.com or with Copilot Cloud Agent for the duration of the incident. Copilot Memory (in preview) was unavailable to Copilot agent sessions during the same period. github.com recovered at 18:16 UTC; the remaining regional deployments were restored progressively until full resolution at 19:18 UTC.

Root cause

An infrastructure configuration change altered the path Copilot services used to reach their backing databases.

As a result, Copilot Chat and Cloud Agent could no longer connect to the database.

The change was applied broadly across regions before the resulting connectivity failure was caught, producing a global rather than regional outage.

Recovery was inherently regional because the affected configuration was deployed and rolled back per region.

Resolution

Engineers identified the breaking infrastructure change and restored connectivity to the database. Copilot Chat and Cloud Agent for github.com were restored by 18:16 UTC, with remaining regional deployments brought back online incrementally until full resolution at 19:18 UTC.

Lessons

  • Infrastructure changes that affect database connectivity should be staged regionally so a misconfiguration produces a regional rather than global outage.
  • A regional rollout model is only useful if there is a forcing function that prevents the change from progressing globally before the first region is verified healthy.
  • Preview features like Copilot Memory ride on the same connectivity assumptions as their parent services and inherit the same blast radius.
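
The first two lessons describe a staged rollout with a forcing function. A minimal sketch of that idea follows, assuming hypothetical `apply_change`, `is_healthy`, and `roll_back` callables supplied by the deployment tooling; none of these names come from the incident report, and the sketch does not describe GitHub's actual rollout system.

```python
from typing import Callable, Iterable, List


def staged_rollout(
    regions: Iterable[str],
    apply_change: Callable[[str], None],  # hypothetical: applies the config to one region
    is_healthy: Callable[[str], bool],    # hypothetical: e.g. a database connectivity probe
    roll_back: Callable[[str], None],     # hypothetical: reverts the config in one region
) -> List[str]:
    """Apply a change one region at a time, halting at the first unhealthy region.

    Returns the regions where the change was applied and verified healthy.
    """
    verified: List[str] = []
    for region in regions:
        apply_change(region)
        if not is_healthy(region):
            # Forcing function: revert this region and stop before touching the rest.
            roll_back(region)
            break
        verified.append(region)
    return verified
```

Because the loop cannot advance until `is_healthy` returns true for the current region, a change that breaks connectivity in the first region never reaches the others. Calling it with the canary region first in `regions` gives the ordering the lessons describe.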

Action items

  • Add controls that prevent similar infrastructure changes from causing database connectivity disruptions across regions simultaneously.
  • Stage infrastructure config changes that touch database connectivity through a verified canary region before broader rollout.
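
One way to read these action items is as a pre-deploy policy check on the rollout plan itself. The sketch below is illustrative only: `RolloutPlan`, `validate_plan`, and the rule that each later wave may contain a single region are assumptions made for this example, not controls described in the report.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class RolloutPlan:
    """Hypothetical description of a config rollout."""
    touches_database_connectivity: bool
    waves: List[List[str]]  # each wave lists the regions changed together


def validate_plan(plan: RolloutPlan, canary_region: str) -> List[str]:
    """Return policy violations; an empty list means the plan may proceed."""
    if not plan.touches_database_connectivity:
        # The policy only applies to changes that can affect database connectivity.
        return []
    violations: List[str] = []
    if not plan.waves or plan.waves[0] != [canary_region]:
        violations.append(
            f"first wave must contain only the canary region {canary_region!r}"
        )
    for index, wave in enumerate(plan.waves[1:], start=2):
        if len(wave) > 1:
            violations.append(f"wave {index} changes {len(wave)} regions at once")
    return violations
```

A plan such as `RolloutPlan(True, [["region-a", "region-b"]])` would be rejected because it skips the canary, while `RolloutPlan(True, [["canary"], ["region-a"], ["region-b"]])` would pass when `canary_region` is `"canary"`.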