Executive Summary: Over the last few months, I have been working on a Rust based authentication API that finally got a name: AuthOS. It has everything you’ll ever need from an authentication solution. AuthOS supports 3 database flavors: SQLite, PostgreSQL, and MySQL. The optimization journey, this case study, focuses on SQLite backend despite the tweaks improving speed across the other two backends. I successfully worked around the limitations of SQLite bringing the system to a state I believe can handle 40 Million daily requests. The benchmarks were not so surprising because the app is built with Rust, but some architectural decisions had limited the Ferrari engine to a tuktuk performance. I have been experimenting with Rust and even did a benchmark for device authorization on the article “Anatomy of a Bottleneck” but later decided why not go full blown authentication platform… I have always loved the GoTrue authentication system from Netlify for its speed, and yet, for an authentication solution, I have never liked the idea of limiting the user to a single database engine - Netlify’s GoTrue uses MariaDB, I wanted more!
1. The Starting Point: “The Death Spiral”
I started with a functional, compliant single-sign-on (SSO) API. It handled the complex user journeys I needed, such as Platform Owners approving organizations, admins managing webhooks, and end-users logging in via standard OAuth flows. But when I pushed it, the reality was harsh. Under synthetic load testing, my “Ferrari” collapsed at just 130 requests per second.
- Throughput: ~130 req/s (Saturated)
- Latency (P95): 17.69 seconds (Unusable)
- Diagnosis: The system was suffering from Resource Starvation.
- CPU Blocking: The heavy lifting of cryptography (specifically the Argon2 password hashing used during user registration and login) was running on the async event loop, choking the runtime.
- Write Contention: Every single API request was fighting for a lock. Whether it was the
auth_callbackhandler updating a “last seen” timestamp or the audit logger recording auser.loginevent, synchronous writes were locking the SQLite database, forcing every read operation to queue up indefinitely.
Fig 1: Evolution from a blocking, synchronous monolith to a non-blocking, event-driven architecture.
2. The Architectural Transformation
I wasn’t about to give up on SQLite. I knew the raw power was there, I just had to unlock it. I applied a series of optimizations, systematically identifying and decoupling the bottlenecks that were holding the engine back.
Phase 1: Unblocking the Runtime (CPU)
- The Problem: One user logging in was freezing the server for everyone else. When password verification hit the server, the Argon2 verification halted the entire event loop for 50ms.
- The Solution: I moved the heavy cryptographic work to
tokio::task::spawn_blockingand guarded it with aSemaphorelimited to 2x my CPU cores. - The Result: Authentication became an isolated, throttled process. A spike in logins could no longer bring down the system for users who were already authenticated.
// BEFORE: Blocking the async runtime
async fn verify_password(password: &str, hash: &str) -> bool {
// ✗ Runs on the main thread, freezing the server for 50ms
argon2::verify(password, hash)
}
// AFTER: Offloaded with concurrency limits
static CPU_LIMIT: Semaphore = Semaphore::new(8); // Max 8 concurrent hashes
async fn verify_password(password: &str, hash: &str) -> Result<bool> {
// ✓ Acquire permit (wait if too busy)
let _permit = CPU_LIMIT.acquire().await?;
// ✓ Run on a background thread, unblocking the API
tokio::task::spawn_blocking(move || {
argon2::verify(password, hash)
}).await?
}
Phase 2: The “Indestructible” Audit Actor (I/O)
- The Problem: My requirement for strict audit trails was killing performance. Every request triggered a disk write. In SQLite, that means exclusive access.
- The Solution: I implemented an Asynchronous Buffered Actor.
- The Trade-off: I traded strict immediate durability (logs sit in RAM for ~1s) for massive throughput.
- The Result: User requests return immediately. The actor handles the heavy lifting of batch writing and retries in the background, absorbing those pesky SQLite
BUSYerrors without the user ever noticing.
// BEFORE: Synchronous Writer
async fn log_event(event: Event) {
// ✗ API request hangs here until disk I/O completes
db.insert(event).await;
}
// AFTER: Asynchronous Actor
async fn log_event(event: Event) {
// ✓ Instant return! Pushes to an in-memory channel
audit_sender.send(event).await;
}
// Background Worker (The Actor)
async fn run_audit_actor(mut receiver: Receiver<Event>) {
let mut buffer = Vec::new();
while let Some(event) = receiver.recv().await {
buffer.push(event);
if buffer.len() >= 100 || timeout_elapsed() {
// ✓ Efficient batch write in background
db.insert_batch(buffer).await;
buffer.clear();
}
}
}
Fig 2: The Buffered Actor pattern decouples high-volume writes from the critical latency path.
Phase 3: Traffic Hygiene (Logic)
- The Problem: My code was too eager. Writing “Last Seen” timestamps on every request was flooding the Write-Ahead Log (WAL) for no good reason.
- The Solution: I implemented Debouncing. I taught the system to chill out, and it now only updates the database if more than 5 minutes have passed since the last write.
- The Result: Write IOPS dropped by ~90%.
Fig 3: Smart debouncing prevents the database from drowning in identical updates.
// BEFORE: Write on every request
async fn track_activity(user_id: Uuid) {
// ✗ Floods the WAL with identical updates
db.update_last_seen(user_id, now()).await;
}
// AFTER: Smart Debouncing
async fn track_activity(user_id: Uuid) {
let last_seen = cache.get(user_id);
// ✓ Only write if > 5 minutes have passed
if last_seen < now() - Duration::from_mins(5) {
db.update_last_seen(user_id, now()).await;
cache.set(user_id, now());
}
}
Phase 4: Fail Fast Concurrency (Stability)
- The Problem: When I really hammered the system (1000 concurrent users), the queue just kept growing until requests timed out after 30 seconds.
- The Solution: I implemented
TryAcquiresemantics on the Login Semaphore and set the SQLite Writer timeout to a strict 3 seconds. - The Result: The system now knows its limits. It sheds load (returning
429 Too Many Requests) instantly when saturated, ensuring that active users, those inside the club, still get sub-5ms responses.
Fig 4: The ‘Bouncer’ (Semaphore) keeps the club from getting overcrowded, so everyone inside can dance.
// BEFORE: Unbounded Queue
async fn login(creds: Credentials) -> Result<Token> {
// ✗ Queues forever if system is overloaded (30s timeouts)
let _permit = CPU_LIMIT.acquire().await?;
process_login(creds).await
}
// AFTER: Fail Fast Protection
async fn login(creds: Credentials) -> Result<Token> {
// ✓ Fails instantly if queue is full
let _permit = CPU_LIMIT.try_acquire().or_else(|_| {
Err(Error::TooManyRequests("System saturated, try again later"))
})?;
process_login(creds).await
}
3. The Final Scoreboard
The results were validation of the “SQLite First” philosophy.
| Metric | Initial State | Optimized State (Moderate Load) | Optimized State (Saturation) |
|---|---|---|---|
| Throughput | ~135 req/s (Unstable) | 210 req/s (Stable) | 467 req/s (Maxed) |
| P95 Latency | 17,690 ms | 4.88 ms | 2,720 ms |
| Error Rate | 0% (but unusable) | 0.00% | 0.01% |
| Bottleneck | CPU & DB Locks | None (Server Bored) | DB Connection Pool Limit |
Fig 3: Latency dropped by 4 orders of magnitude while throughput tripled.
4. Key Engineering Takeaways
- SQLite is not a toy: With proper tuning (WAL mode, dedicated Writer Actor,
IMMEDIATEtransactions), I proved that SQLite can sustain 467 requests/second mixed with heavy writes. This isn’t just a prototype; it’s an engine capable of serving ~25 Million MAUs on a single $5 VPS. - Async $\neq$ Non-blocking: Just because I marked a function
asyncdidn’t make it fast. Offloading the CPU-bound work was mandatory. - ACID is a Spectrum: I didn’t break ACID; I just chose where to apply it. Critical user data remains strictly consistent, while telemetry and logs became eventually consistent.
- The “5ms” Victory: Achieving a 4.88ms P95 latency on a full authentication flow, database, logic, and crypto included, is something I’m incredibly proud of. It’s the kind of performance usually reserved for expensive in-memory systems.
Conclusion
I have engineered a system capable of handling ~40 Million requests per day on a single node using a local file database. This architecture proves that complexity, shifting to Postgres, Redis, or Microservices, should only be introduced when you actually hit physical limits. For 99% of us, the Ferrari engine of SQLite is more than enough.
Appendix: The 25M MAU Estimation
To back up the claim of supporting 25 Million Monthly Active Users (MAUs) on a single $5 node, here is the math derived from the load test results:
The Math:
- Benchmark Throughput: The optimized system sustains 210 requests/second at moderate load.
- Daily Capacity: 210 RPS $\times$ 86,400 seconds = ~18.1 Million requests/day.
- Requests per User: I assume an average of 4 authentication requests per Daily Active User (DAU).
- DAU Capacity: 18.1M / 4 = 4.5 Million DAUs.
- DAU/MAU Ratio: Assuming a standard engagement ratio of 20%: $$ 4.5\text{M DAU} / 0.20 = \textbf{22.5 Million MAUs} $$
I rounded this to 25M given that the “saturation” limit was actually higher (467 RPS), suggesting even more headroom.
References:
- Estimating MAU from DAU: “The DAU/MAU ratio helps estimate the total active user base from daily traffic. For SaaS, a 13–20% ratio is a standard benchmark.” (Mixpanel: What is DAU/MAU? )




