SEV-1 · public access

Elasticsearch overload from suspected botnet traffic degraded search across GitHub

GitHub · Source

Started: Apr 27, 2026
Duration: 6h 15m
Users affected: Not disclosed
Revenue impact: Not disclosed
Blast radius: search-backed UI surfaces across GitHub: Issues, Pull Requests, Projects, Actions, Packages
Services: elasticsearch, search, issues, pull-requests, projects, actions, packages

abuse event · capacity shortfall · customer-facing · ddos or abuse traffic · partial outage · scale down · search

Summary

GitHub's Elasticsearch cluster was overloaded by traffic that engineers later attributed to suspected botnet activity. Search-backed UI surfaces, including Issues, Pull Requests, Projects, Actions workflow runs, and Packages, timed out or returned empty results. Engineers identified the source of the additional load and disabled it, allowing the cluster to recover. After the cluster stabilized, GitHub had to reindex Pull Request data, and reindexing continued into the following days.

Impact

During the impact window, users experienced intermittent failures viewing Issues, Pull Requests, Projects, and Actions workflow runs. Search requests timed out or returned empty results. Packages and Actions also showed degraded performance during the window. After the immediate incident, Pull Request listing pages remained incomplete for an extended period while the Elasticsearch indexes were rebuilt.

Root cause

An external traffic source, suspected to be a botnet, drove a large volume of search load against GitHub's Elasticsearch cluster.

The cluster's capacity headroom was insufficient to absorb the additional load while continuing to serve legitimate traffic.

Many user-facing pages (Issues lists, PR lists, Projects, Actions runs, Packages) read from Elasticsearch on the hot path, so the cluster degradation produced a wide blast radius.

The cluster's degraded state caused indexing and read paths to fall behind, requiring a multi-day reindex once the load was shed.

Rate limiting or shaping at the edge did not identify and isolate the abusive traffic before it overwhelmed the cluster.
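
As an illustration of the kind of edge shaping this point refers to, the following is a minimal sketch of a per-client token bucket placed in front of search requests. It is not GitHub's implementation: the class names, the `client_key` parameter, and the rate and capacity values are all hypothetical. Note also that a botnet spreads load across many clients, so per-client limits alone may not be enough without some form of aggregate limiting or traffic classification.

```python
class TokenBucket:
    """Token bucket: refills at `rate` tokens per second, bursts up to `capacity`."""

    def __init__(self, rate, capacity, now=0.0):
        self.rate = rate
        self.capacity = capacity
        self.tokens = capacity      # start full so normal bursts are not penalized
        self.updated = now

    def allow(self, now, cost=1.0):
        # Refill for the time elapsed since the last call, capped at capacity.
        elapsed = max(0.0, now - self.updated)
        self.tokens = min(self.capacity, self.tokens + elapsed * self.rate)
        self.updated = now
        if self.tokens >= cost:
            self.tokens -= cost
            return True
        return False                # caller should shed or queue the request


class SearchRateLimiter:
    """Hypothetical edge limiter: one bucket per client key (for example a token or IP)."""

    def __init__(self, rate=5.0, capacity=10.0):
        self.rate = rate
        self.capacity = capacity
        self.buckets = {}

    def allow(self, client_key, now):
        bucket = self.buckets.get(client_key)
        if bucket is None:
            bucket = self.buckets[client_key] = TokenBucket(self.rate, self.capacity, now)
        return bucket.allow(now)


limiter = SearchRateLimiter(rate=1.0, capacity=3.0)
burst = [limiter.allow("client-a", now=0.0) for _ in range(5)]
print(burst)  # [True, True, True, False, False]
```

Timestamps are passed in explicitly so the behavior is deterministic; a real deployment would more likely read a monotonic clock and keep the buckets in shared storage.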

Resolution

Engineers identified the source of the additional load and disabled it, after which Elasticsearch began recovering. Service degradation across Actions, Issues, Packages, and Pull Requests was mitigated by 21:33 UTC and the incident was closed at 22:46 UTC. Reindexing of Pull Request data continued for several days, with full backfill completing on May 1.

Timeline

16:31 UTC · DETECT · actions
Actions begins reporting degraded performance as Elasticsearch-backed reads slow down.
16:33 UTC · INVEST · elasticsearch
Responders trace failures in workflow runs, Projects, and timed-out search requests to a common Elasticsearch cluster issue.
16:53 UTC · INVEST · pull-requests
Pull Requests is added to the list of degraded services.
17:35 UTC · INVEST · issues
Users report intermittent failures viewing Issues, Pull Requests, Projects, and Actions workflow runs.
19:50 UTC · MITIG · elasticsearch
The team identifies the source of the additional Elasticsearch load and disables it.
21:33 UTC · MITIG · actions
Degradation across Actions, Issues, Packages, and Pull Requests is mitigated.
22:46 UTC · RESOLV · elasticsearch
The cluster is monitored to ensure stability and the incident is closed; Pull Request reindexing continues over the following days.
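
The intervals implied by the timeline can be checked with a few lines of arithmetic. This is a small sketch that uses only the times as listed in the timeline entries and assumes they all fall on the same UTC day; the dictionary keys and function names are illustrative labels, not terms from the original report.

```python
from datetime import datetime

# Times copied from the timeline entries (assumed same-day, UTC).
TIMELINE = {
    "detect": "16:31",
    "load_disabled": "19:50",
    "mitigated": "21:33",
    "closed": "22:46",
}


def minutes_between(start, end):
    """Whole minutes between two HH:MM times on the same day."""
    fmt = "%H:%M"
    delta = datetime.strptime(end, fmt) - datetime.strptime(start, fmt)
    return int(delta.total_seconds() // 60)


def fmt_duration(minutes):
    """Format minutes in the same style as the Duration field, e.g. '6h 15m'."""
    hours, mins = divmod(minutes, 60)
    return f"{hours}h {mins:02d}m"


for label in ("load_disabled", "mitigated", "closed"):
    elapsed = minutes_between(TIMELINE["detect"], TIMELINE[label])
    print(f"detect -> {label}: {fmt_duration(elapsed)}")
# detect -> load_disabled: 3h 19m
# detect -> mitigated: 5h 02m
# detect -> closed: 6h 15m
```

The detect-to-close interval matches the 6h 15m duration stated at the top of the report. Roughly three hours and twenty minutes passed before the source of the load was identified and disabled.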

Attribution

GitHub

By Infrastructure / SRE

Published Apr 27, 2026

Lessons

Search clusters that back hot-path UI surfaces are an attractive target for abuse and need capacity headroom plus shaping that keeps them serving during an event.
When indexing and read paths fall behind during an event, recovery time is often dominated by reindexing rather than by mitigating the original cause.
Coupling many user-facing pages to a single search cluster means the cluster's worst day is the platform's worst day.

Action items

Improve identification and isolation of abusive traffic patterns at the edge so they do not reach Elasticsearch.
Reduce coupling between hot-path UI surfaces and the search cluster, or provide degraded fallbacks when search is unavailable.
Build faster reindexing paths so that recovery after a cluster degradation does not take multiple days.
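
To make the "degraded fallbacks" action item concrete, here is a minimal sketch of a listing wrapper that serves a reduced result from another source when search fails, and stops calling search for a cooldown period after repeated failures so that an overloaded cluster is not hit with further requests. Everything here is hypothetical: `SearchUnavailable`, `DegradableListing`, the `search` and `fallback` callables, and the thresholds are illustrative assumptions, not part of GitHub's systems or of any Elasticsearch client.

```python
from dataclasses import dataclass
from typing import Callable, Sequence


class SearchUnavailable(Exception):
    """Raised by the (hypothetical) search client on timeout or overload."""


@dataclass
class ListingResult:
    items: Sequence[str]
    degraded: bool  # True when served from the fallback path


class DegradableListing:
    """Serve from search when healthy; otherwise fall back and back off."""

    def __init__(self, search: Callable[[], Sequence[str]],
                 fallback: Callable[[], Sequence[str]],
                 max_failures: int = 3, cooldown: float = 30.0):
        self.search = search
        self.fallback = fallback
        self.max_failures = max_failures
        self.cooldown = cooldown
        self.failures = 0
        self.open_until = 0.0   # search is skipped while now < open_until

    def fetch(self, now: float) -> ListingResult:
        if now >= self.open_until:
            try:
                items = self.search()
                self.failures = 0
                return ListingResult(items, degraded=False)
            except SearchUnavailable:
                self.failures += 1
                if self.failures >= self.max_failures:
                    # Too many consecutive failures: stop calling search for a while.
                    self.open_until = now + self.cooldown
                    self.failures = 0
        # Fallback could be a bounded primary-store query or a cached page.
        return ListingResult(self.fallback(), degraded=True)
```

The `degraded` flag lets the UI tell users that a listing may be incomplete instead of showing an empty page, which is the failure mode described in the Impact section.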