SEV-1 · public access

Actions Ubuntu hosted runners delayed by performance regression in VM reimage process

GitHub · Source

Started: Apr 28, 2026
Duration: 4h 28m
Users affected: Not disclosed
Revenue impact: Not disclosed
Blast radius: Actions Standard Ubuntu 22 and Ubuntu 24 hosted runner jobs
Services: actions, actions-hosted-runners, vm-reimage

Tags: capacity shortfall, ci/cd, container orchestration, customer-facing, delayed processing, deploy, regression from deploy, rollback

Summary

A performance regression in the VM reimage process for Actions hosted runners slowed the rate at which Standard Ubuntu 22 and Ubuntu 24 runners returned to the available pool, lowering effective runner capacity. About 8 percent of jobs on those runners were delayed past 5 minutes or failed during the window. Engineers mitigated by rolling back to a known-good image version, after which capacity recovered.

Impact

Approximately 8 percent of hosted runner jobs using Standard Ubuntu 22 and Ubuntu 24 experienced delays greater than 5 minutes or failures during the impact window. Larger and self-hosted runners were not affected.

Root cause

A change to the VM reimage process for Actions hosted runners introduced a performance regression that lengthened the reimage step.

Slower reimage reduced the rate at which fresh runners returned to the available pool.

Lower effective capacity meant that jobs queued past their normal start time; about 8 percent were delayed by more than 5 minutes or failed.

Telemetry on reimage performance was not granular enough to surface the slowdown immediately, contributing to time-to-detect.

There was no automated capacity-vs-queue-depth signal that would have triggered a mitigation before user-visible delay.
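
As an illustration of what a capacity-vs-queue-depth signal could look like, the sketch below projects how long the current backlog would take to drain at the observed rate of runners returning from reimage, and alerts when that projection exceeds the 5-minute delay used to define impact above. The names PoolSnapshot, projected_wait_minutes, and should_alert are hypothetical, as are all the numbers; nothing here describes GitHub's actual monitoring.

```python
from dataclasses import dataclass


@dataclass
class PoolSnapshot:
    """Hypothetical point-in-time view of one runner label's pool."""
    queued_jobs: int                     # jobs waiting for a runner
    idle_runners: int                    # runners ready to take a job now
    runners_returned_per_minute: float   # observed rate of reimage completions


def projected_wait_minutes(snapshot: PoolSnapshot) -> float:
    """Estimate how long the last queued job waits for a runner."""
    # Jobs that cannot start immediately on an idle runner form the backlog.
    backlog = max(0, snapshot.queued_jobs - snapshot.idle_runners)
    if backlog == 0:
        return 0.0
    if snapshot.runners_returned_per_minute <= 0:
        # No runners are coming back at all, so the backlog never drains.
        return float("inf")
    return backlog / snapshot.runners_returned_per_minute


def should_alert(snapshot: PoolSnapshot, threshold_minutes: float = 5.0) -> bool:
    """Fire when the projected wait exceeds the user-visible delay threshold."""
    return projected_wait_minutes(snapshot) > threshold_minutes


# Illustrative values only.
healthy = PoolSnapshot(queued_jobs=120, idle_runners=100, runners_returned_per_minute=50.0)
degraded = PoolSnapshot(queued_jobs=400, idle_runners=10, runners_returned_per_minute=30.0)
stalled = PoolSnapshot(queued_jobs=50, idle_runners=0, runners_returned_per_minute=0.0)

for name, snap in [("healthy", healthy), ("degraded", degraded), ("stalled", stalled)]:
    print(name, projected_wait_minutes(snap), should_alert(snap))
```

Because the projection depends on the return rate rather than on pool size, a slower reimage step raises the projected wait even when the total number of runners is unchanged.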

Resolution

Engineers mitigated by rolling back to a known-good image version, which restored normal reimage performance and let the runner pool refill. The incident was closed at 17:09 UTC after queue depth and start times returned to baseline.

Timeline

12:30 MITIG
An updated VM image with a regression in the reimage step rolls out to Actions hosted runners.
vm-reimage
12:41 DETECT
Ubuntu 22 and Ubuntu 24 hosted runner jobs begin experiencing run start delays as the runner pool fails to refill.
actions
13:59 DETECT
GitHub posts a public status update reporting Actions hosted runner capacity constraints on Ubuntu labels.
actions
14:49 INVEST
About 5 percent of jobs on the affected labels are delayed or failing as the team investigates the root cause.
actions-hosted-runners
15:20 MITIG
Engineers apply a mitigation that begins to unblock Actions runs.
actions-hosted-runners
15:41 MITIG
Less than 2 percent of Ubuntu hosted runs are delayed or failing as runner capacity recovers.
actions
16:36 MITIG
Less than 1 percent of hosted ubuntu-latest runs are delayed; remaining mitigation steps continue.
actions
17:09 RESOLV
Rollback to a known-good image version completes; queue depth and start times return to baseline and the incident is closed.
actions

Attribution

GitHub

By Infrastructure / SRE

Published Apr 28, 2026

Lessons

Capacity in pool-based systems like hosted runners is a function of recycle rate, not just count; a slow recycle is functionally a capacity loss.
Reimage performance is the kind of internal metric that doesn't show up on the user dashboard until users feel queue delay; explicit telemetry on it is worth the cost.
A rollback of an image is a clean mitigation when the regression ships in the image rather than in the runner runtime.
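
The first lesson can be made concrete with a small steady-state model. The sketch below assumes each runner alternates between running one job and being reimaged, so pool throughput is pool size divided by the job-plus-reimage cycle time. The function names and every number are hypothetical and chosen only to make the arithmetic easy to follow; the incident report does not disclose real pool sizes or reimage durations.

```python
def jobs_per_hour(pool_size: int, job_seconds: float, reimage_seconds: float) -> float:
    """Steady-state throughput of a pool whose runners cycle job -> reimage."""
    cycle_seconds = job_seconds + reimage_seconds
    return pool_size * 3600.0 / cycle_seconds


def capacity_loss(job_seconds: float, baseline_reimage: float, regressed_reimage: float) -> float:
    """Fraction of throughput lost when reimage slows; independent of pool size."""
    baseline_cycle = job_seconds + baseline_reimage
    regressed_cycle = job_seconds + regressed_reimage
    return 1.0 - baseline_cycle / regressed_cycle


# Hypothetical inputs: 1000 runners, 4-minute jobs, reimage slowing from 60 s to 160 s.
POOL_SIZE = 1000
JOB_SECONDS = 240.0

baseline = jobs_per_hour(POOL_SIZE, JOB_SECONDS, 60.0)
regressed = jobs_per_hour(POOL_SIZE, JOB_SECONDS, 160.0)
loss = capacity_loss(JOB_SECONDS, 60.0, 160.0)

print(f"baseline:  {baseline:.0f} jobs/hour")
print(f"regressed: {regressed:.0f} jobs/hour")
print(f"loss:      {loss:.0%}")
```

Under these assumed numbers the runner count never changes, yet throughput falls by a quarter, which is the sense in which a slow recycle behaves like a capacity loss.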

Action items

Improve granularity of reimage telemetry across the Actions runner service and the underlying compute provider so similar regressions are diagnosed faster.
Address the underlying performance issue in the reimage process so the rolled-back image can be replaced with a forward-fixed version.
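
One possible shape for more granular reimage telemetry is to record reimage duration per image version and compare a tail percentile against a known-good baseline. The following is a minimal sketch under that assumption: the record format (image version, duration in seconds), the function names, the 1.5x ratio, and the minimum sample count are all hypothetical and not taken from the report.

```python
from collections import defaultdict
from statistics import quantiles


def p95(samples):
    """95th percentile; with n=20 the last of the 19 cut points is the 95th."""
    return quantiles(samples, n=20, method="inclusive")[-1]


def regressed_versions(records, baseline_version, max_ratio=1.5, min_samples=20):
    """Return {version: p95 ratio} for versions whose reimage p95 exceeds
    max_ratio times the baseline p95.

    records is an iterable of (image_version, reimage_seconds) pairs.
    """
    by_version = defaultdict(list)
    for version, seconds in records:
        by_version[version].append(seconds)

    baseline_samples = by_version.get(baseline_version, [])
    if len(baseline_samples) < min_samples:
        raise ValueError("not enough baseline samples to compare against")
    baseline_p95 = p95(baseline_samples)

    flagged = {}
    for version, samples in by_version.items():
        # Skip the baseline itself and versions with too little data to judge.
        if version == baseline_version or len(samples) < min_samples:
            continue
        ratio = p95(samples) / baseline_p95
        if ratio > max_ratio:
            flagged[version] = ratio
    return flagged


# Illustrative data: "v1" is the known-good image, "v2" reimages far more slowly,
# and "v3" has too few samples to evaluate yet.
records = (
    [("v1", 60.0 + i) for i in range(20)]
    + [("v2", 160.0 + i) for i in range(20)]
    + [("v3", 500.0) for _ in range(5)]
)

flagged = regressed_versions(records, baseline_version="v1")
print(sorted(flagged))
```

Keying the comparison on image version ties a slowdown directly to the rollout that introduced it, which also identifies the version to roll back to.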