DNS resolution failures in VA3 datacenter degraded multiple GitHub services
GitHub · Source
- Started: Apr 23, 2026
- Duration: 1h 24m
- Users affected: Not disclosed
- Revenue impact: Not disclosed
- Blast radius: approximately 5-7 percent of overall traffic; Actions, Copilot, Webhooks affected
- Services: dns-va3, actions, copilot, webhooks
Summary
DNS resolution failures originating in GitHub's VA3 datacenter caused elevated error rates and degraded performance across several GitHub services, with the impact concentrated on Actions, Copilot, and Webhooks. Roughly 5 to 7 percent of overall traffic was affected during the window. Engineers identified the source of the resolution failures and applied a mitigation, after which dependent services recovered.
Impact
Approximately 5 to 7 percent of overall GitHub traffic was affected during the impact window. Actions and Copilot were hit hardest, and Webhooks was also degraded. This was the second user-visible incident on April 23, following the billing/cache event earlier the same day.
Root cause
DNS infrastructure in GitHub's VA3 datacenter began returning resolution failures for internal hostnames.
Services running in or routing through VA3 lost the ability to reach dependencies they normally resolve via that path.
The DNS layer in VA3 was a single point of failure for the affected resolutions during the window; impacted services did not have a clean failover to DNS in another datacenter.
Detection initially showed the symptom (degraded Actions, Copilot, Webhooks) before the team could correlate the common DNS root cause across services.
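The missing failover path described above can be illustrated with a minimal sketch. It assumes a client that holds an ordered list of resolvers and moves to the next one when the current one fails. The function name, the stub resolvers, and the address are hypothetical and do not describe GitHub's actual resolver setup.

```python
from typing import Callable, Optional, Sequence

# A resolver takes a hostname and returns a list of IP addresses.
# It raises OSError on failure (socket.gaierror is a subclass of OSError).
Resolver = Callable[[str], list]


def resolve_with_failover(hostname: str, resolvers: Sequence[Resolver]) -> list:
    """Try each resolver in order and return the first non-empty answer."""
    last_error: Optional[OSError] = None
    for resolve in resolvers:
        try:
            addresses = resolve(hostname)
        except OSError as exc:
            last_error = exc  # remember the failure, then try the next resolver
            continue
        if addresses:
            return addresses
    raise OSError(f"all resolvers failed for {hostname}") from last_error


# Hypothetical stand-ins: a failing local resolver and a healthy remote one.
def failing_va3_resolver(hostname: str) -> list:
    raise OSError("resolution failure")


def secondary_resolver(hostname: str) -> list:
    return ["10.0.0.5"]  # placeholder address, not a real endpoint
```

With both stubs configured, `resolve_with_failover("svc.internal.example", [failing_va3_resolver, secondary_resolver])` returns the secondary resolver's answer. With only the failing resolver configured, the call raises `OSError`, which corresponds to the single-point-of-failure behavior seen during the incident.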
Resolution
Engineers identified the DNS resolution failures originating in VA3 and applied a mitigation that restored resolution. Actions, Copilot, and Webhooks recovered as dependencies became reachable again, and the incident was closed at 17:27 UTC.
Lessons
- A single-datacenter DNS failure can produce a wide cross-service blast radius even when the affected fraction of overall traffic looks modest.
- Multi-symptom incidents often have a single underlying cause; correlating across services is the fast path to mitigation.
- The same calendar day saw two unrelated user-visible incidents driven by different shared infrastructure (cache, then DNS), which is a structural signal that many services are tightly coupled to shared components.
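The cross-service correlation described in the second lesson can be sketched as a set intersection over a dependency map. This is a rough illustration only: the map contents below are invented, and apart from the service names listed in this report, every dependency name is hypothetical.

```python
from collections import Counter


def common_dependencies(degraded, dependency_map):
    """Return the dependencies shared by every degraded service, sorted."""
    services = set(degraded)
    counts = Counter()
    for service in services:
        # Count each dependency at most once per service.
        counts.update(set(dependency_map.get(service, ())))
    return sorted(dep for dep, n in counts.items() if n == len(services))


# Hypothetical dependency map; only the keys and "dns-va3" come from the report.
DEPENDENCIES = {
    "actions": {"dns-va3", "artifact-store"},
    "copilot": {"dns-va3", "model-gateway"},
    "webhooks": {"dns-va3", "delivery-queue"},
}

suspects = common_dependencies(["actions", "copilot", "webhooks"], DEPENDENCIES)
```

Here `suspects` contains only `dns-va3`, the one dependency common to all three degraded services. A real dependency graph would be larger and partly transitive, so a result like this narrows the search rather than proving the cause.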
Action items
- Improve monitoring on the VA3 DNS path so resolution failures surface as a top-line signal rather than as a downstream service degradation.
- Evaluate failover behavior for services that depend on VA3-resolved hostnames so that a future single-datacenter DNS event has a clean recovery path.
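As one possible shape for the monitoring item above, the sketch below tracks the failure rate of recent DNS probe results over a fixed-size window and signals when it crosses a threshold. The class name, window size, and threshold are assumptions chosen for illustration, not values from the incident report.

```python
from collections import deque


class ResolutionFailureMonitor:
    """Track recent DNS probe outcomes and flag a high failure rate."""

    def __init__(self, window_size: int = 100, threshold: float = 0.05):
        # Keep only the most recent `window_size` probe results.
        self.results = deque(maxlen=window_size)
        self.threshold = threshold

    def record(self, success: bool) -> None:
        self.results.append(success)

    def failure_rate(self) -> float:
        if not self.results:
            return 0.0
        return self.results.count(False) / len(self.results)

    def should_alert(self) -> bool:
        return self.failure_rate() >= self.threshold
```

A probe loop would call `record()` after each test lookup against the VA3 DNS path and page when `should_alert()` returns true, so the resolution failure itself becomes the top-line signal instead of the downstream degradation.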