When Throughput Is the Wrong Goal: Dealing With Sequential Constraints in Distributed Systems

Ary Lima

Software Engineer

Distributed systems are often designed under the assumption that more concurrency leads to better throughput and lower latency. In many cases, that assumption holds.

This post is about a case where it didn’t.

The problem

We operate a distributed system composed of multiple microservices. One of these services exposes an operation containing a step that, for any given account, must be executed sequentially.

If two requests for the same account enter this critical section concurrently, the second request will observe a stale resource and fail. There is no safe way to parallelize this part of the workflow without violating correctness.
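
To make the failure mode concrete, here is a minimal sketch of the shape of the problem, assuming the sequential step is a version-checked read-modify-write. Every name and type below is hypothetical; the post doesn't show the real workflow, and the actual service may enforce the constraint differently.

```go
package account

import "errors"

// Hypothetical types: the sequential step is modeled here as a
// version-checked read-modify-write over a per-account resource.
type Resource struct {
	Version int
	State   string
}

// Store stands in for whatever backing store holds the per-account resource.
type Store interface {
	Load(accountID string) (Resource, error)
	// CompareAndSwap writes next only if the stored version still equals oldVersion.
	CompareAndSwap(accountID string, oldVersion int, next Resource) (bool, error)
}

var ErrStaleResource = errors.New("resource was modified by a concurrent request")

// runCriticalSection is the step that must not run concurrently for one account.
// If two requests interleave between Load and CompareAndSwap, the slower one is
// working from a stale version, its conditional write is rejected, and the
// request fails -- the behavior described above.
func runCriticalSection(store Store, accountID string, applyStep func(Resource) Resource) error {
	current, err := store.Load(accountID)
	if err != nil {
		return err
	}

	next := applyStep(current)         // the expensive part: a few hundred milliseconds
	next.Version = current.Version + 1 // the version we expect to install

	ok, err := store.CompareAndSwap(accountID, current.Version, next)
	if err != nil {
		return err
	}
	if !ok {
		return ErrStaleResource
	}
	return nil
}
```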

Under light traffic, this constraint was mostly invisible. Under real-world usage, it became a significant problem.

When multiple concurrent requests arrived for the same account, roughly half of them failed due to this sequential dependency.

At that point, the question was no longer “how do we scale this?” but rather:

How do we make concurrency behave predictably when part of the system fundamentally cannot be concurrent?

First attempt: retries

A common first reaction in situations like this is to add retries.

If a request fails because another one is already in flight, perhaps retrying after a short delay will allow it to succeed once the first request completes.

We introduced a retry mechanism around the sequential portion of the operation:

  • A small number of attempts
  • Short delays with backoff
  • Jitter to avoid synchronized retries
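
A minimal sketch of what that wrapper can look like, continuing the hypothetical sketch above (the attempt count, base delay, and the `ErrStaleResource` sentinel are illustrative choices, not the real values):

```go
package account

import (
	"errors"
	"math/rand"
	"time"
)

// withRetries wraps the sequential step in a bounded retry loop with
// exponential backoff and full jitter.
func withRetries(op func() error) error {
	const maxAttempts = 4
	base := 100 * time.Millisecond

	var err error
	for attempt := 0; attempt < maxAttempts; attempt++ {
		if attempt > 0 {
			// Full jitter: sleep a random duration in [0, base << (attempt-1))
			// so concurrent retries stop colliding in lockstep.
			ceiling := base << (attempt - 1)
			time.Sleep(time.Duration(rand.Int63n(int64(ceiling))))
		}
		if err = op(); err == nil {
			return nil
		}
		if !errors.Is(err, ErrStaleResource) {
			return err // only the contention failure is worth retrying
		}
	}
	return err
}
```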

This improved things noticeably. Failure rates dropped from around 50% to roughly 20%.

However, retries didn’t solve the underlying issue. They simply shifted contention in time. Under sustained concurrency, requests still collided, just slightly later.

Retries reduced symptoms, but they didn’t change the system’s shape.

Second attempt: explicit serialization with a distributed lock

The next step was to stop hoping that requests would eventually avoid each other and instead make the sequential constraint explicit.

We introduced a distributed lock using a shared coordination store. Before entering the critical section, a request would acquire the lock for that account. Only one request could proceed at a time; others would wait.
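
The post doesn't name the coordination store, so the sketch below hides it behind a small interface and focuses on the acquire/wait/release pattern around the critical section. The lock TTL, poll interval, and all names are illustrative.

```go
package account

import (
	"context"
	"errors"
	"time"
)

// AccountLocker stands in for the shared coordination store.
type AccountLocker interface {
	// TryAcquire returns true if the per-account lock was obtained.
	// ttl bounds how long the lock is held if the owner crashes.
	TryAcquire(ctx context.Context, accountID string, ttl time.Duration) (bool, error)
	Release(ctx context.Context, accountID string) error
}

var ErrLockWaitTimeout = errors.New("timed out waiting for account lock")

// withAccountLock serializes the critical section per account: exactly one
// request proceeds at a time, the rest poll until the lock frees up or the
// caller's deadline expires.
func withAccountLock(ctx context.Context, locker AccountLocker, accountID string, op func(context.Context) error) error {
	const (
		lockTTL      = 5 * time.Second       // generous upper bound on the critical section
		pollInterval = 50 * time.Millisecond // how often waiters re-check the lock
	)

	for {
		ok, err := locker.TryAcquire(ctx, accountID, lockTTL)
		if err != nil {
			return err
		}
		if ok {
			break
		}
		select {
		case <-ctx.Done():
			return ErrLockWaitTimeout // the request timed out while queued
		case <-time.After(pollInterval):
		}
	}
	// Release with a fresh context so the lock is freed even if the request
	// context was cancelled mid-flight.
	defer locker.Release(context.Background(), accountID)

	return op(ctx)
}
```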

This had an immediate and dramatic effect:

  • Failure rates dropped to zero
  • The system behaved correctly under concurrent load
  • We were able to ship an initial version of the product with confidence

At this point, correctness was no longer the issue.

Latency was.

The hidden cost of correctness

Once the system was stable, a new failure mode emerged.

The critical section itself took on the order of a few hundred milliseconds to execute. With strict serialization in place, this meant that even modest request rates for the same account would create a queue.

As concurrency increased:

  • Requests waited longer to acquire the lock
  • Tail latency increased
  • Eventually, requests began timing out before completing

The system was now correct, but its throughput per account was fundamentally capped: with a critical section of a few hundred milliseconds, a single account could complete at most a handful of these operations per second, no matter how much concurrency sat in front of it.

This exposed an important reality:

Locks don’t remove work. They only order it.

Once requests were serialized, the system could no longer hide the cost of the sequential step behind concurrency.

Considering batching and request coalescing

One idea we explored was request coalescing.

Instead of processing requests one by one, the system could:

  1. Execute the first request
  2. Accumulate subsequent requests that arrive while it is in progress
  3. Process those requests as a batch once the critical section completes

This approach can significantly improve throughput by amortizing the cost of the sequential operation across multiple requests.
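
Roughly, a per-account coalescer could look like the sketch below. All names are hypothetical; note how it already forces a batch-shaped downstream call and a single shared outcome per batch, which foreshadows the trade-offs listed next.

```go
package account

import "sync"

// request carries one caller's payload plus a channel to report its outcome.
type request struct {
	payload string
	done    chan error
}

// coalescer batches requests per account: the first request starts a worker,
// and anything arriving while a batch is running rides along in the next one.
type coalescer struct {
	mu      sync.Mutex
	pending map[string][]request
	running map[string]bool
	// processBatch runs the sequential step once for a whole batch. Returning
	// one shared error glosses over per-request validation and error semantics.
	processBatch func(accountID string, batch []request) error
}

func newCoalescer(processBatch func(string, []request) error) *coalescer {
	return &coalescer{
		pending:      make(map[string][]request),
		running:      make(map[string]bool),
		processBatch: processBatch,
	}
}

// Submit enqueues a request and blocks until its batch completes, so callers
// still see synchronous semantics.
func (c *coalescer) Submit(accountID, payload string) error {
	req := request{payload: payload, done: make(chan error, 1)}

	c.mu.Lock()
	c.pending[accountID] = append(c.pending[accountID], req)
	if !c.running[accountID] {
		c.running[accountID] = true
		go c.drain(accountID)
	}
	c.mu.Unlock()

	return <-req.done
}

// drain repeatedly processes whatever accumulated while the previous batch ran.
func (c *coalescer) drain(accountID string) {
	for {
		c.mu.Lock()
		batch := c.pending[accountID]
		if len(batch) == 0 {
			c.running[accountID] = false
			c.mu.Unlock()
			return
		}
		c.pending[accountID] = nil
		c.mu.Unlock()

		err := c.processBatch(accountID, batch) // one sequential pass, amortized
		for _, r := range batch {
			r.done <- err
		}
	}
}
```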

From a purely technical perspective, this was a valid option.

However, it came with important trade-offs:

  • New batch-oriented APIs across multiple services
  • Changes to validation and error semantics
  • Additional coordination logic to ensure correctness
  • Increased implementation and testing scope

More importantly, batching does not remove the sequential constraint - it only pushes the throughput ceiling higher. Beyond a certain point, the system would still hit a hard limit, with diminishing returns for additional complexity.

Given product timelines and business priorities, this approach did not represent the best use of engineering effort at the time.

The decision was not about avoiding complexity, but about choosing where that complexity delivered the most value.

The “ideal” solution - and why it’s hard

Architecturally, the cleanest solution to this class of problems is to make the operation asynchronous.

Instead of forcing callers into a synchronous request/response model, the system could:

  • Accept requests
  • Enqueue work per account
  • Process requests sequentially in the background
  • Expose completion via polling or callbacks

This design makes the sequential constraint explicit and removes the need for distributed locks entirely.
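
As a sketch of that shape, here is a per-account queue with a single worker, kept in-process purely for illustration; a production version would need the durable queues, workers, and dead-letter handling mentioned below, and every name here is hypothetical.

```go
package account

import "sync"

type Status string

const (
	StatusPending Status = "pending"
	StatusDone    Status = "done"
	StatusFailed  Status = "failed"
)

type job struct {
	id      string
	payload string
}

type asyncProcessor struct {
	mu     sync.Mutex
	queues map[string]chan job // one FIFO queue per account
	status map[string]Status   // looked up by job ID via polling
	run    func(accountID, payload string) error
}

func newAsyncProcessor(run func(accountID, payload string) error) *asyncProcessor {
	return &asyncProcessor{
		queues: make(map[string]chan job),
		status: make(map[string]Status),
		run:    run,
	}
}

// Enqueue accepts the request immediately; the sequential constraint lives
// entirely inside the per-account worker, so no distributed lock is needed.
func (p *asyncProcessor) Enqueue(accountID, jobID, payload string) {
	p.mu.Lock()
	p.status[jobID] = StatusPending
	q, ok := p.queues[accountID]
	if !ok {
		q = make(chan job, 128) // bounded backlog per account
		p.queues[accountID] = q
		go p.worker(accountID, q) // one worker per account => strictly sequential
	}
	p.mu.Unlock()

	q <- job{id: jobID, payload: payload}
}

// Poll exposes completion to callers; a callback could be used instead.
func (p *asyncProcessor) Poll(jobID string) Status {
	p.mu.Lock()
	defer p.mu.Unlock()
	return p.status[jobID]
}

func (p *asyncProcessor) worker(accountID string, q chan job) {
	for j := range q {
		err := p.run(accountID, j.payload) // the critical section, one job at a time

		p.mu.Lock()
		if err != nil {
			p.status[j.id] = StatusFailed
		} else {
			p.status[j.id] = StatusDone
		}
		p.mu.Unlock()
	}
}
```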

In practice, this approach introduces a different set of challenges.

In our case, the service is called by external clients that expect synchronous semantics. Moving to an asynchronous model would require:

  • New, versioned APIs
  • Client-side changes outside our direct control
  • New retry, timeout, and failure semantics
  • Additional operational components (queues, workers, dead-letter handling)

While technically appealing, this solution would shift complexity across organizational boundaries rather than eliminate it. Under existing constraints, it was not a viable near-term option.

Where this leaves us

There is no final, perfect solution to this problem - at least not one that fits all technical and organizational constraints.

What this experience reinforced is that throughput is not always the correct optimization target.

In systems with unavoidable sequential steps:

  • Adding concurrency often shifts failure modes instead of removing them
  • Retries reduce symptoms but do not address root causes
  • Locks improve correctness but expose throughput limits
  • More complex solutions may be technically sound but misaligned with business reality

The real work becomes making these constraints:

  • Explicit
  • Predictable
  • Observable
  • Understandable by both systems and people

Sometimes, the most responsible decision is not to force an elegant solution, but to acknowledge the shape of the problem and choose the least harmful trade-offs.

About the Author

Ary Lima is a Software Engineer at 1Password based in Calgary, AB.
