Never Put Large Payloads Inside Background Jobs
Last week, a customer nearly left Telebugs because of a mistake I made at the design level.
Their queue database grew to 95GB in a single day. Their app was processing ~40,000 jobs daily, and I had designed the main ingestion job to carry large payloads directly in the job arguments. On SQLite, that combination became toxic — even with cleanup running on schedule.
Here's what went wrong, why SQLite made it worse, and the fix.
The Setup
Telebugs processes error reports by enqueuing a job for each incoming event (see how Telebugs integrates with Ruby on Rails apps). The job receives the full envelope payload — the raw event data sent by the SDK — and processes it asynchronously.
The problem: those payloads aren't small. Each one averaged around 2.5MB — full stack traces, local variables, breadcrumbs, the works.
At 40,000 jobs/day:
40,000 jobs × 2.5MB = ~100GB of payload data per day
And that's before accounting for SolidQueue's default 24-hour retention window for finished jobs. Jobs don't disappear the moment they finish — they're kept in the table for 24 hours by default. So at any given time, up to 24 hours' worth of finished jobs are sitting in the database, payloads and all.
That's how you get a 95GB queue database.
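For what it's worth, the retention window itself is tunable. A minimal sketch, assuming your SolidQueue version exposes the clear_finished_jobs_after and preserve_finished_jobs settings:

# config/environments/production.rb
# Keep finished jobs for an hour instead of the default 24 hours
config.solid_queue.clear_finished_jobs_after = 1.hour

# Or don't keep finished jobs around at all
# config.solid_queue.preserve_finished_jobs = false

Shortening retention reduces how long finished payloads linger, but it doesn't address the real problem: the payloads still flow through the queue.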
Why Job Arguments Are the Wrong Place for Big Data
Background job systems — SolidQueue, Sidekiq, GoodJob, DelayedJob — all work the same way at a fundamental level: job arguments are serialized and stored in a backing store (a database table, Redis, etc.) for the job's entire lifetime.
That lifetime is longer than you think:
- Job is enqueued → arguments stored
- Job is picked up by a worker → arguments still stored
- Job finishes → arguments still stored (until retention cleanup runs)
Every byte you put in job arguments sits there through all three phases. If your queue backend is a database, those bytes live in a table row. Multiply by throughput, multiply by retention window, and you'll be surprised how fast it adds up.
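You can see this directly in SolidQueue's own tables. A quick sketch, assuming the current schema where serialized arguments live in a text column on solid_queue_jobs:

# Total serialized argument data currently sitting in the queue database
total_mb = SolidQueue::Job.sum("LENGTH(arguments)") / 1024 / 1024

# How much of it belongs to jobs that finished but haven't been cleaned up yet
finished_mb = SolidQueue::Job.where.not(finished_at: nil).sum("LENGTH(arguments)") / 1024 / 1024

puts "arguments in queue: #{total_mb} MB (#{finished_mb} MB from finished jobs)"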
Rule of thumb: job arguments should be identifiers, not data.
Why SQLite Made It Worse
Any queue system would have had the bloat problem above. But SQLite has a specific characteristic that turned a bad situation into a critical one: it doesn't automatically reclaim disk space after deletes.
When SolidQueue's cleanup job runs and deletes finished jobs, the rows are gone — but the SQLite file doesn't shrink. The deleted rows leave "dead space" inside the file. To actually reclaim that space, you need to run VACUUM.

VACUUM rewrites the entire database file. And here's the trap: SQLite's VACUUM requires roughly 2× the database size in free disk space to complete.
So if your queue database is 95GB and you only have 20GB free, you literally cannot run VACUUM. You can't reclaim space because you don't have enough space. You're stuck.
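Before you even attempt a VACUUM, you can check how much dead space has accumulated. A minimal sketch, assuming SolidQueue's models connect to the SQLite queue database (so SolidQueue::Job.connection points at it):

conn = SolidQueue::Job.connection

# Pages on SQLite's free list are the dead space left behind by deleted rows
free_pages = conn.select_value("PRAGMA freelist_count").to_i
page_size  = conn.select_value("PRAGMA page_size").to_i
puts "reclaimable: #{free_pages * page_size / 1024 / 1024} MB"

# Rewrites the whole file; only run it with roughly 2x the database size in free disk space
conn.execute("VACUUM")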
Postgres handles this more gracefully — autovacuum runs in the background and reclaims space incrementally. The bloat problem still exists, but it's less likely to become a crisis. On SQLite, the consequences are immediate and unforgiving.
The Fix
The fix is simple once you see the problem.
Before:
class ProcessEnvelopeJob < ApplicationJob
  def perform(envelope)
    # envelope is a large hash, serialized into the jobs table
    process(envelope)
  end
end
# Enqueuing
ProcessEnvelopeJob.perform_later(huge_envelope_hash)
After:
class ProcessEnvelopeJob < ApplicationJob
  def perform(envelope_id)
    envelope = Envelope.find(envelope_id)
    process(envelope.payload)
    envelope.destroy # clean up after processing
  end
end
# Enqueuing
envelope = Envelope.create!(payload: huge_envelope_hash)
ProcessEnvelopeJob.perform_later(envelope.id)
The job now stores an integer ID instead of 2.5MB. The heavy data lives in a dedicated table with its own lifecycle — you can delete it immediately after processing, independent of the queue's retention window.
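The dedicated table itself can stay minimal. A sketch of what it might look like; the schema below is an illustration, not Telebugs' actual one:

class CreateEnvelopes < ActiveRecord::Migration[7.1]
  def change
    create_table :envelopes do |t|
      t.text :payload, null: false
      t.timestamps
    end
  end
end

class Envelope < ApplicationRecord
  # Store the raw event hash as JSON in a plain text column
  serialize :payload, coder: JSON
end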
This has a few additional benefits:
- The queue stays lean regardless of payload size
- You control the payload lifecycle — delete it right after the job succeeds, not 24 hours later
- Easier to debug — payloads are queryable, inspectable, not buried in a serialized job row
- Works with any queue backend — the principle is universal
The Broader Lesson
This isn't a SQLite-specific problem. It's a design problem that SQLite exposed more painfully than other backends would have.
If you're using SolidQueue (or any queue system), treat every byte in your job arguments as if it has a cost. Because it does — in storage, in retention, and in operational complexity when things go wrong.
Keep job arguments small. Pass IDs, not data.
It's not optional.
Telebugs is a self-hosted, privacy-first error tracker that helps teams avoid exactly these kinds of scaling surprises. Learn more at telebugs.com