Telebugs 1.18.0: Guardrails for Error Storms
Telebugs 1.18.0 is out. This release is mostly about something I care about a lot: making a small self-hosted system behave well when the application around it is having a bad day.
A few days ago I benchmarked Telebugs on a small Hetzner VPS. The results were good. A 2 vCPU / 4 GB RAM machine can process around 50 complete error reports per second end-to-end, which is plenty for many teams. I also shared the short version on X.
But benchmarks also make tradeoffs visible. Telebugs can accept a burst faster than it can process it, but it cannot keep that up forever. That is fine for a spike. It is not fine if one broken endpoint starts throwing errors nonstop and nobody notices until the disk is angry.
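To put rough numbers on that tradeoff, here is a back-of-the-envelope sketch in Python. The 50 reports per second comes from the benchmark above. The 200 errors per second intake rate and the ten-minute storm are made-up figures for illustration, not measurements.

```python
def backlog_after(accept_per_sec: float, process_per_sec: float, seconds: float) -> float:
    """Queued reports after `seconds` of sustained intake, starting from an empty queue."""
    growth_per_sec = max(accept_per_sec - process_per_sec, 0.0)
    return growth_per_sec * seconds


# Hypothetical storm: 200 errors/s accepted for 10 minutes, 50/s processed.
queued = backlog_after(accept_per_sec=200, process_per_sec=50, seconds=600)
drain_seconds = queued / 50  # time to catch up once the storm stops
print(queued, drain_seconds)
```

Under those assumptions the queue holds 90,000 reports when the storm ends and needs another 30 minutes to drain. Nothing in that arithmetic stops the queue from growing if the storm keeps going.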
So this release adds guardrails.
Ingest Protection
Telebugs now has an Ingest Protection page under Instance settings. It lets admins configure three simple pressure controls from the UI:
- accepted errors per minute
- maximum queued errors
- minimum free disk space
When one of those limits is reached, Telebugs returns 429 Too Many Requests before it writes the ingest payload or enqueues a job. That part matters. The point is not just to reject work. The point is to reject work early enough that SQLite, Solid Queue, and the disk do not become the next things you have to rescue.
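The ordering is the important bit, so here is a minimal sketch of it in Python. This is not the Telebugs implementation (Telebugs is a Rails app). The class names, field names, the order of the three checks, and the 200 status on success are all assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass(frozen=True)
class Limits:
    # Hypothetical names mirroring the three settings on the page.
    max_accepted_per_minute: int
    max_queued_errors: int
    min_free_disk_bytes: int


@dataclass(frozen=True)
class Pressure:
    # Current measurements, taken before the request body is stored.
    accepted_this_minute: int
    queued_errors: int
    free_disk_bytes: int


def tripped_limit(limits: Limits, now: Pressure) -> Optional[str]:
    """Return the name of the first limit that is reached, or None."""
    if now.free_disk_bytes < limits.min_free_disk_bytes:
        return "disk"
    if now.queued_errors >= limits.max_queued_errors:
        return "queue"
    if now.accepted_this_minute >= limits.max_accepted_per_minute:
        return "rate"
    return None


def handle_ingest(
    limits: Limits,
    now: Pressure,
    payload: bytes,
    store: Callable[[bytes], None],
    enqueue: Callable[[bytes], None],
) -> int:
    """Check pressure first; only touch storage and the queue when accepting."""
    if tripped_limit(limits, now) is not None:
        return 429  # Too Many Requests: nothing written, nothing enqueued
    store(payload)
    enqueue(payload)
    return 200  # assumed success status
```

Because the check runs before `store` and `enqueue`, a rejected request costs no disk write and no queue row.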
The status card shows whether Telebugs is accepting errors or actively protecting the instance.
This is not a Sentry quota clone. Telebugs still has its own shape and its own priorities. I want the important operational settings to be visible to admins, understandable in plain English, and configurable without SSHing into a server to tweak a mystery variable.
Rate, Queue, and Disk
The rate limit handles the front door. If your apps suddenly send more errors per minute than the instance should accept, Telebugs pushes back.
Rate limit caps how many incoming errors Telebugs accepts per minute.
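One simple way to count "errors per minute" is a fixed-window counter, sketched below in Python. Whether Telebugs uses a fixed window, a sliding window, or something else is not stated here, so treat this as one possible approach. `PerMinuteLimiter` and `try_accept` are hypothetical names.

```python
import time
from typing import Callable, Optional


class PerMinuteLimiter:
    """Fixed-window counter: at most `limit` accepts per clock minute."""

    def __init__(self, limit: int, clock: Callable[[], float] = time.time) -> None:
        self.limit = limit
        self.clock = clock
        self.window: Optional[int] = None  # index of the minute being counted
        self.count = 0

    def try_accept(self) -> bool:
        minute = int(self.clock() // 60)
        if minute != self.window:
            # A new minute started: reset the counter.
            self.window = minute
            self.count = 0
        if self.count >= self.limit:
            return False  # caller would answer 429
        self.count += 1
        return True
```

This version keeps its state in memory in a single process. An app running several workers would need the counter in shared storage instead.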
Queue protection handles the backlog. If Telebugs is accepting errors faster than it can process them, the pending ingest queue should not grow without bounds.
Queue protection pauses intake when queued errors grow faster than Telebugs can process them.
Disk protection handles the boring, scary failure mode. SQLite is wonderful, but running out of disk is still running out of disk. If free space drops below the configured threshold, Telebugs pauses intake before the database and queue keep growing.
Disk protection pauses intake when free disk space drops below the configured threshold.
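The disk check itself can be very small. Here is a sketch in Python using the standard library's `shutil.disk_usage`. The function names and the default path are assumptions, and how Telebugs measures free space internally is not covered here.

```python
import shutil


def free_bytes(path: str = ".") -> int:
    """Free space on the filesystem that holds `path`."""
    return shutil.disk_usage(path).free


def disk_protection_active(min_free_bytes: int, path: str = ".") -> bool:
    """True when intake should pause because free space is below the threshold."""
    return free_bytes(path) < min_free_bytes
```

In practice the path would be the directory holding the SQLite files, since that is the filesystem that fills up.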
I also added global banners when one of these protections is active. I do not want Telebugs to silently stop accepting errors. If intake is paused or limited, admins should see that quickly and have a direct path to the relevant settings and documentation.
A global banner makes it obvious when Telebugs is limiting incoming errors.
Maintenance Activity
This release also adds Maintenance Activity. It is a small admin-visible history for destructive maintenance work, especially the cleanup jobs behind data retention and ingest protection:
- scheduled error retention purges
- scheduled artifact purges
- manual project purges
- note attachment purges
- database VACUUM runs
The goal is simple: when cleanup happens, an admin should be able to answer what ran, who or what started it, whether it finished, and what it deleted.
Maintenance Activity shows recent cleanup work without becoming a giant audit log.
I intentionally kept it narrow. It does not store deleted report titles, stack traces, payloads, exception messages, or every deleted row. It stores small summaries: counts, status, source, actor, project scope, timestamps, and safe details. Records are kept for 90 days and then pruned automatically.
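As a rough picture of what "small summaries" means, here is a Python sketch of such a record and the 90-day pruning rule. The field names and example values are hypothetical and follow the list above. The real schema may differ.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import List, Optional

RETENTION = timedelta(days=90)


@dataclass
class MaintenanceRecord:
    # Summary only: no titles, stack traces, payloads, or deleted rows.
    kind: str                     # e.g. "error_retention_purge", "vacuum"
    source: str                   # e.g. "scheduled" or "manual"
    actor: Optional[str]          # who started it; None for scheduled jobs
    project_scope: Optional[str]  # None when the job is instance-wide
    status: str                   # e.g. "running", "finished", "failed"
    deleted_count: int
    started_at: datetime
    finished_at: Optional[datetime] = None


def prune(records: List[MaintenanceRecord], now: datetime) -> List[MaintenanceRecord]:
    """Keep only records that started within the last 90 days."""
    cutoff = now - RETENTION
    return [r for r in records if r.started_at >= cutoff]
```

A record like this answers the four questions (what ran, who started it, whether it finished, what it deleted) without keeping any of the deleted data.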
There are linkable detail pages too, because sooner or later someone will ask, "what did that cleanup do?" and a URL is easier than a screenshot pasted into a chat.
The detail page is useful when you need to share exactly what a cleanup did.
Service Messages Moved Into Instance
I also moved Service Messages into the Instance section. That fits the product better. Service messages are not really a personal account feature. They are operational messages about the instance.
The old URLs redirect, the global attention badge still works, and the Instance sidebar now shows Service Messages alongside Ingest Protection, Maintenance Activity, Error Retention, and Artifact Retention.
A Repeatable Load Test
The feature work came out of the benchmark work. Telebugs now ships with bin/load, the same harness I used for the small VPS benchmarks.
It starts a throwaway performance instance, creates a real project token, runs k6 against the Sentry envelope endpoint, waits for the ingest queue to drain, and prints the numbers that actually matter: accepted errors, processed reports, drain time, queue peak, HTTP latency, and failed jobs.
It is not a customer-facing CLI command. It is a practical tool in the source tree for people who want to benchmark their own hardware or sanity-check a change before believing a number on a marketing page.
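If you want to build something similar for your own setup, the "wait for the queue to drain" step is the part most worth getting right. Below is a minimal Python sketch of that one step. It is not the code in bin/load. `wait_for_drain` and the `queue_depth` callback are hypothetical, and you would supply your own way of reading the pending job count.

```python
import time
from typing import Callable, Tuple


def wait_for_drain(
    queue_depth: Callable[[], int],
    timeout_s: float = 600.0,
    poll_s: float = 1.0,
    clock: Callable[[], float] = time.monotonic,
    sleep: Callable[[float], None] = time.sleep,
) -> Tuple[float, int]:
    """Poll until the queue is empty; return (drain_seconds, peak_depth)."""
    start = clock()
    peak = 0
    while True:
        depth = queue_depth()
        peak = max(peak, depth)
        if depth == 0:
            return clock() - start, peak
        if clock() - start > timeout_s:
            raise TimeoutError(f"queue still has {depth} jobs after {timeout_s}s")
        sleep(poll_s)
```

Measuring drain time separately from HTTP latency is what separates "accepted" from "processed". A run that accepts quickly but never drains is not a pass.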
Wrapping Up
I like Telebugs this way. It is a small Rails app with SQLite and a database-backed queue. There are fewer moving parts to run, fewer things to explain, and fewer things to fix when production is already noisy.
That simplicity is a feature, but it needs guardrails. A small system should fail gently. It should tell you when it is protecting itself. It should give you enough visibility to understand what happened without turning into an observability project of its own.
That is what 1.18.0 is about.
Telebugs 1.18.0 is live now. Full details are in the changelog. If something feels off, write to [email protected]. I am also on X at @kyrylo and @TelebugsHQ.