SEV-0 · public access

Data Loss from Database Migration TRUNCATE CASCADE

Linear · Source

Started: Jan 24, 2024
Duration: 3h 47m
Users affected: Not disclosed
Revenue impact: Not disclosed
Blast radius (services): database, sync-engine, api, notifications
configuration error, customer-facing, data corruption, data loss, data repair, database, deploy, full outage, human error, rollback

Summary

A faulty database migration using TRUNCATE TABLE ... CASCADE deleted production data from several core tables, including issue and document descriptions, comments, notifications, favorites, and reactions. Multi-layer caching hid the deletion for about 30 minutes. Linear was taken offline for one hour, the database was restored from a backup taken several hours before the incident, and a two-day data restoration effort recovered over 99% of lost data.

Impact

All users experienced approximately one hour of total platform downtime. 12% of workspaces had data unavailable until restoration completed; an additional 7% lost automated-process changes such as generated Cycles. Over 99% of data was recovered within 36 hours, with a small number of unresolvable conflicts leaving roughly 0.44 sync packets lost per affected workspace on average.

Root cause

A database migration intended to clear data from two new, non-user-facing tables used the SQL statement TRUNCATE TABLE <new_table> CASCADE. The CASCADE keyword propagates the truncation to all tables with foreign keys pointing at the target table, which unexpectedly deleted production data from several critical tables. The migration was code-reviewed and tested locally, but the dangerous behavior of CASCADE in this context was missed by both the author and peer reviewers. Caching at multiple layers (local client cache and a database-layer sync cache) masked the data deletion for approximately 30 minutes after the migration ran, delaying detection.
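
The propagation described above can be sketched as a graph traversal. This is a minimal illustration, assuming PostgreSQL-style semantics in which CASCADE also follows foreign keys that point at tables already pulled into the truncation. The table names and foreign-key layout are hypothetical and are not Linear's actual schema.

```python
from collections import deque


def truncate_cascade_targets(target, referenced_by):
    """Return every table that TRUNCATE <target> CASCADE would empty.

    `referenced_by` maps a table name to the tables that hold a foreign
    key pointing at it (hypothetical input, e.g. built from the schema).
    """
    affected = {target}
    queue = deque([target])
    while queue:
        table = queue.popleft()
        for child in referenced_by.get(table, ()):
            if child not in affected:
                affected.add(child)
                queue.append(child)
    return affected


# Hypothetical foreign-key layout, for illustration only.
referenced_by = {
    "new_table": ["issue"],
    "issue": ["comment", "reaction"],
    "comment": ["reaction"],
}

print(sorted(truncate_cascade_targets("new_table", referenced_by)))
# → ['comment', 'issue', 'new_table', 'reaction']
```

Running a check like this against the real schema before a migration would show the full set of tables at risk, not just the one named in the statement.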

Resolution

Linear was placed into maintenance mode to prevent further writes. The database was restored from a full backup taken at 04:47 UTC, approximately two hours before the migration ran. Point-in-time recovery was available but had never been tested or tooled, so the full-backup path was used instead. A sync-engine cursor conflict caused by the database rollback was resolved by resetting the Redis cache and restarting the sync service. A custom restoration script then replayed captured sync actions to recover changes made between the backup timestamp and the maintenance window.
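
The replay step can be pictured roughly as follows. This is a sketch under stated assumptions, not the actual restoration script: the `SyncAction` shape, the `op` values, and the conflict rule (an update whose target is missing from the restored state) are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class SyncAction:
    """Hypothetical shape of one captured sync action."""
    sync_id: int       # assumed monotonically increasing cursor
    created_at: float  # Unix timestamp, seconds
    op: str            # "create" or "update" (assumed vocabulary)
    entity_id: str
    fields: dict = field(default_factory=dict)


def replay_actions(actions, state, backup_ts, maintenance_ts):
    """Apply actions captured after the backup and up to maintenance mode.

    `state` maps entity_id -> dict of fields and is modified in place.
    Returns the actions that could not be applied (treated as conflicts).
    """
    conflicts = []
    window = [a for a in actions if backup_ts < a.created_at <= maintenance_ts]
    for action in sorted(window, key=lambda a: a.sync_id):
        if action.op == "create":
            state[action.entity_id] = dict(action.fields)
        elif action.entity_id in state:
            state[action.entity_id].update(action.fields)
        else:
            # Target does not exist in the restored data: leave for manual review.
            conflicts.append(action)
    return conflicts
```

Applying actions in cursor order and collecting, instead of discarding, the ones that fail leaves a list of unresolved conflicts to review after the bulk of the data is back.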

Lessons

  • Multi-layer caching can mask data destruction for extended periods, and error-metric baselines that exclude 'expected' warnings can silently miss catastrophic failures.
  • Point-in-time recovery capability is only useful if it has been tested and tooled; untested recovery paths are effectively unavailable during a high-pressure incident.
  • Dangerous SQL operations such as TRUNCATE ... CASCADE need a category of review distinct from standard code review: database-admin review backed by schema-aware linting.
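
As a rough illustration of the linting idea in the last lesson, the sketch below scans migration SQL for TRUNCATE statements that include CASCADE. It is a simple regular-expression check, not a SQL parser: it does not understand comments or string literals, and it is not schema-aware. The function name is hypothetical.

```python
import re

# Matches TRUNCATE ... CASCADE within a single statement (no ';' in between).
TRUNCATE_CASCADE = re.compile(r"\bTRUNCATE\b[^;]*\bCASCADE\b", re.IGNORECASE)


def find_truncate_cascade(sql):
    """Return (line_number, statement) for each TRUNCATE ... CASCADE found."""
    findings = []
    for match in TRUNCATE_CASCADE.finditer(sql):
        line = sql.count("\n", 0, match.start()) + 1
        # Collapse whitespace so multi-line statements report on one line.
        findings.append((line, " ".join(match.group(0).split())))
    return findings


migration = """CREATE TABLE a (id int);
TRUNCATE TABLE a CASCADE;
TRUNCATE TABLE b;
"""
print(find_truncate_cascade(migration))
# → [(2, 'TRUNCATE TABLE a CASCADE')]
```

A CI job could fail the build whenever this returns a non-empty list, forcing an explicit review of the statement before merge.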

Action items

  • Remove TRUNCATE privileges from all production database users.
  • Implement linting rules that flag CASCADE on TRUNCATE in migration diffs and block merge without explicit DBA sign-off.
  • Build and routinely test tooling for point-in-time database recovery.
  • Automate database migration testing in a staging environment populated with production-representative data.
  • Implement read-only mode so clients can continue reading data during recovery operations.
  • Add data integrity monitors that detect unexpected row-count drops across core tables.
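
The last action item could take a shape like the following. This is a minimal sketch: it assumes row counts per table are sampled periodically by some other job, and the 50% threshold is an arbitrary illustrative value, not a recommendation from the post-mortem.

```python
def find_row_count_drops(previous, current, max_drop_ratio=0.5):
    """Compare two row-count snapshots and report suspicious drops.

    `previous` and `current` map table name -> row count. A table is
    reported when it lost at least `max_drop_ratio` of its rows, or is
    missing from `current` entirely.
    """
    alerts = []
    for table, before in previous.items():
        after = current.get(table, 0)
        if before > 0 and (before - after) / before >= max_drop_ratio:
            alerts.append((table, before, after))
    return alerts


# Hypothetical snapshots taken a few minutes apart.
previous = {"comment": 1000, "reaction": 200, "favorite": 50}
current = {"comment": 0, "reaction": 190, "favorite": 50}
print(find_row_count_drops(previous, current))
# → [('comment', 1000, 0)]
```

Because the check reads the database directly, it would not be affected by the client and sync caches that hid the deletion in this incident.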