04 / 06 Ordering and idempotency
  1. ← Talking with messages: reliability in distributed systems
  2. 00 Why talk with messages
  3. 01 The three families of messages
  4. 02 Broker versus log
  5. 03 Delivery semantics
  6. 04 Ordering and idempotency
  7. 05 Dual write and the transactional outbox
Talking with messages: reliability in distributed systems · 04 / 06

Ordering and idempotency

The previous chapter promised a consumer that recognizes an already-processed message, without saying how. Here it is. But while building it, we discover that redelivery does not only bring duplicates: it can also put everything out of order, and idempotence alone is not enough.

In chapter three, we saved the payment with a formula: at-least-once, plus idempotent processing. We even described the gesture, the consumer “recognizes” an already-seen message and does not redo the effect. But recognizes how? With what key, stored where, and above all: what stops a crash, landing right between the charge and the “already done” note, from making everything start over? And even once that is solved, a second threat is waiting. At-least-once delivery does not merely repeat messages, it can also replay them in the wrong order. If “order shipped” comes back before “order paid”, no deduplication in the world will save you. This chapter builds both ramparts: one against the duplicate, the other against disorder.

Two wounds, not one

Let us start again from chapter three’s observation. Choosing “at-least-once” means accepting that the system redelivers as long as it has not received an acknowledgement. This redelivery actually inflicts two distinct wounds on you, and they are often confused.

The first one we know: the duplicate. The same message arrives twice, and a naive handler applies the effect twice. Double charge.

The second is more insidious: disorder. When the system replays messages, or when several workers consume the same queue in parallel, nothing guarantees that “paid” is processed before “shipped”. A late message, a redelivery, a worker slower than the other, and the causal order breaks.

These two wounds call for two different remedies, and that is the whole subject of the chapter. Against the duplicate: idempotence, which we are finally going to build for real. Against disorder: partition ordering. Both rest on the choice of a key, but it is not the same key, and they do not answer the same question.

Building the idempotent consumer

We start by honoring chapter three’s promise. An idempotent consumer Idempotent consumer A consumer whose processing produces the same result whether a message is handled once or several times. It tracks every already-processed message by its deduplication key and, in the same transaction as the effect, marks that key as seen: on a redelivered duplicate it recognizes the key and skips the effect. The atomicity between applying the effect and recording the key is essential, otherwise a crash between the two reopens the two generals problem inside its own database. Source: Hohpe & Woolf, Enterprise Integration Patterns is a consumer whose processing gives the same result whether a message arrives once or ten times. To achieve this, it needs two things: a way to recognize an already-processed message, and a guarantee that this recognition can never lie.

The way to recognize is the deduplication key Deduplication key The identifier a consumer uses to recognize a message it has already processed. It is often the stable message id provided by the broker, sometimes a business key (an order number). It is stored in an inbox table: a redelivered message whose key is already present is discarded. It answers the question: is this the same message? Source: Kleppmann, 2017 . It is the identity the consumer keeps to answer the question: have I already processed this message? Two natural candidates. The simplest, the stable message id, set by the producer and kept across all redeliveries: two copies of the same message carry the same id. The most robust, a business key: the order number, the payment transaction id. The business key has a subtle advantage: it deduplicates even two technically different messages that describe the same business fact.

The consumer stores these keys in a small table, often called an inbox. Before acting, it looks: if the key is already there, it has already processed this message, it discards it and acknowledges without redoing anything. Otherwise, it processes, then records the key.

And here lies the real trap, the one chapter three left hanging. Look at this naive, two-step version:

1. apply the effect       (charge 49 euros)
2. record the key in the inbox

What happens if the machine shuts down between step 1 and step 2? The effect happened, but the key is not recorded. On redelivery, the consumer checks its inbox, does not find the key, and charges a second time. We have written “idempotent” code that is not: we have simply reopened the two generals problem inside our own database.

As long as the effect lives in the same database as the inbox, this is feasible: one transaction covers both. When the effect leaves the database (calling an external payment API), the problem hardens, and we fall back on heavier strategies that this chapter keeps for later. Keep the essential: the key answers “is this the same message”, and atomicity answers “can I trust my answer”.

The second wound: ordering

The idempotent consumer neutralizes duplicates. It does not touch ordering. Take our order again: three facts follow one another, “created”, then “paid”, then “shipped”. If redelivery makes “shipped” arrive before “paid”, a perfectly idempotent consumer will ship an unpaid order, without the slightest alert. Idempotence checks “have I already seen this message”, never “is this the right moment for this message”.

Where does disorder come from? From two sources. Redelivery, which reinjects a late message after the following ones have already passed. And above all parallelism: to go fast, we have a single queue consumed by several workers of a consumer group. Two messages sent in order can be processed out of order, simply because one worker is slower than the other.

The solution has a precise name. We split the stream into partitions Partition A sub-stream of a message log that holds an ordered subset of the messages. A log is split into several partitions so that consumers can read them in parallel. Ordering is guaranteed only within a single partition, never across partitions. Each message is assigned to a partition by its partition key. Source: Kleppmann, 2017 , sub-streams each of which keeps its messages in order. The golden rule: ordering is guaranteed only within a partition, never across partitions. This is what we call a partial order Partial order The guarantee that messages are ordered only within each partition, not across the whole log. A log offers a partial order, not a total order: two messages in the same partition keep their relative order, but two messages in different partitions have no defined order between them. This is why you need a key that groups causally related messages into the same partition. Source: Kleppmann, 2017 , as opposed to a total order that would line up all the system’s messages on a single row. A total order would kill parallelism: one message at a time, for everyone. Partial order is the compromise that saves both consistency and throughput.

It remains to decide which messages land in which partition. That is the role of the partition key Partition key The value used to route a message to a partition, usually by hashing it. Two messages with the same partition key always land on the same partition, so they stay ordered relative to one another; messages with different keys spread across partitions and are processed in parallel. Choosing it well (for example the order id) buys per-key ordering without sacrificing throughput. It answers the question: which messages must stay ordered together? Source: Apache Kafka, Documentation . All messages that share the same key fall into the same partition, so they stay ordered relative to one another; different keys spread across different partitions and are processed in parallel. For our commerce, the right key is obvious: the order id. All events of order number 42 go into the same partition and keep their order; order 42 and order 43 live in separate partitions and move forward side by side.

The partition key routes one order's messages to the same partition: order is preserved there, and two distinct orders stay processed in parallel.

If the partition key answers “which messages must stay ordered together”, the deduplication key answered “is this the same message”. Two keys, two questions, two wounds healed. It is time to see them at work together.

Watch the two ramparts

The component below takes an order whose events were delivered by an at-least-once broker: out of order (shipped before paid) and with a duplicate (paid twice). You turn on, or not, each of the two ramparts, and you watch the final state.

Start with nothing: the order is doubly damaged, double charge and shipped before being paid. Turn on deduplication alone: the duplicate disappears, but the order stays shipped before payment. This is the central lesson, idempotence does not repair ordering. Turn on partition ordering alone: the sequence becomes coherent again, but the double charge comes back. Only with both ramparts raised together is the order finally correct.

Ordering and idempotency

The broker delivers one order's events out of order and with a duplicate. Turn on deduplication, partition ordering, or both, and watch the order's final state.

What the broker delivers (at-least-once)

CreatedShippedPaidPaid

What the consumer processes

CreatedShipped(out of order)PaidPaid
Charges2xFinal statePaidShipped before paidCorrupt

Three questions to ask yourself while playing:

  • With deduplication alone on, the double charge disappears but the verdict stays “Disordered”. What does deduplication never look at?
  • With partition ordering alone, the sequence is put back in the right order, but the charges show two. Why does putting things back in order not remove the duplicate?
  • What is the only combination that shows “Correct”, and what does it say about the relationship between the two ramparts?

In code: Wolverine in one line, Hexeract in full honesty

Let us see these ideas in real code. And the lesson is instructive: depending on the framework, either the tool hands you both ramparts almost for free, or it gives you just the foundations and honestly names what is left to you.

Let us start with the case where everything is provided. Wolverine, on a Kafka transport, gives ordering by key and deduplication each in one line. Ordering first: we attach a partition key to each message, and Kafka guarantees that all messages with the same key stay ordered.

// All messages of one order share the partition key,
// so they land on the same partition and stay ordered.
await bus.PublishAsync(
    new OrderShipped(orderId),
    new DeliveryOptions { PartitionKey = orderId.ToString() });

Deduplication next. The durable inbox persists each received message before committing the offset, which gives at-least-once delivery; and the same persistence serves to recognize an already-processed id and discard it. This is exactly this chapter’s idempotent consumer, atomicity included, folded into one call.

// Persists before committing the offset (at-least-once)
// AND deduplicates by message id: effectively-once.
opts.ListenToKafkaTopic("orders.charge-payment")
    .UseDurableInbox();

Hexeract, the Rust framework that serves as our common thread, is more raw, and that is precisely what makes it pedagogical: it does not hide what it does not do. On ordering, its exchange-kind enum offers no partitioning by key, and the comment of one of its fields warns in plain text that dispatch ordering is at best best-effort, never guaranteed, because several workers can relay in parallel. On deduplication, Hexeract provides no ready-made inbox: it propagates a stable message id and documents that handlers must be idempotent.

// Hexeract gives you the key, not the store.
// raw_publish.rs, verbatim:
// "propagating a stable message_id lets consumers deduplicate on it"
// handler.rs, the contract:
// "Handlers MUST be idempotent because the same message can arrive more than once"

Two philosophies, one same lesson. Wolverine fills the contract for you; Hexeract hands you the tools and names the contract. In both cases, effectively-once is never magic: it is at-least-once, plus a deduplication, plus a well-chosen partition key. No framework spares you from choosing your two keys.

Exercises

Take a sheet of paper and a pencil. The solutions are right below, to look at only after trying.

Exercise 1: choosing the partition key

Your commerce system processes the events of thousands of orders, placed by hundreds of customers. For each order, the “created, paid, shipped” order must be respected. You are offered three possible partition keys: (a) a constant key, the same for all messages; (b) the customer id; (c) the order id. For each, say whether per-order ordering is respected, and what happens to parallelism. Conclude with the best choice.

Exercise 2: idempotence without atomicity

A worker deduplicates with an inbox, but its code does things in two steps: first it calls the payment API to charge 49 euros, then, in a separate instruction, it inserts the message id into its inbox table. The machine shuts down right after the charge, before the insertion. Describe what happens on redelivery of the message. How many times are the 49 euros charged? Then propose the fix in one sentence.

In one sentence

At-least-once delivery inflicts two distinct wounds, the duplicate and disorder: we neutralize the duplicate with an idempotent consumer that records its deduplication key atomically with the effect, and we neutralize disorder with a partial order obtained by routing through a partition key the messages that must stay ordered together.

Quiz
  1. 1. Why can a perfectly idempotent consumer still corrupt an order?

  2. 2. What makes a consumer truly idempotent?

  3. 3. What is the partition key for?

Towards the next chapter

We have hardened the consumer: it deduplicates with its key, it respects partition ordering, and the atomicity of the inbox makes its idempotence true. But all this fine machinery rests on a hidden assumption on the producer’s side. We said “the producer publishes a message when the business state changes”. But how does it guarantee that the two happen together? Imagine: the service records the order in its database, commits the transaction, then tries to publish “order created” on the broker, and that is when the network drops it. The database says “order created”, the broker never received the message, and no redelivery will save it: it never existed. The mirror case threatens us too, publishing then failing to commit the database. Writing the business state and emitting the message in the same transaction, that is the last lock. It is the transactional outbox that chapter five builds.

Sources

  • Kleppmann, M. (2017). Designing Data-Intensive Applications, chapter 11 “Stream Processing” (partitioning, ordering by key, idempotent processing). O’Reilly. Publisher reference
  • Hohpe, G. & Woolf, B. (2003). Enterprise Integration Patterns, “Idempotent Receiver” pattern. Addison-Wesley. Pattern catalog
  • Apache Kafka. Documentation: Design, Message Ordering and Partitioning. kafka.apache.org
  • Wolverine. Kafka Transport: Partition Keys and Durable Inbox. Official documentation. Kafka transport, Durability