Broker versus log
The previous chapter treated the mailbox as a black box; here we open it. There are two ways to build it, and the whole difference comes down to one question: who remembers what has already been read?
In the previous chapter, we learned to name what we send each other: commands, queries, events. But we treated the “mailbox” that carries them as a black box that accepts drop-offs and hands them out. Take the scenario again: your order service publishes the event “Order paid”. What becomes of that message once dropped off? Is it handed to a reader then destroyed, or kept and re-readable later? And who, the system or the reader, remembers what has already been read? The answer splits the messaging world into two families of tools, and this chapter opens the box to compare them.
The mailman and the ledger
In chapter one, we sketched the image in a single sentence. Let us unfold it now.
Picture a mailman first. You hand him a letter, he carries it to a recipient, the recipient signs, and the mailman throws away his copy: his job is done, and he keeps no trace. If someone else wanted to read that letter tomorrow, it would be too late: it was delivered once, then forgotten. The mailman only knows what he still has to deliver; once delivered, nothing remains.
Now picture a large bound ledger lying on a table. Every message that arrives is copied onto the next numbered line, and nothing is ever erased. Anyone can sit down, put a finger on a line, read downward at their own pace, and go back up if they wish. Two readers move independently: one may be on line 40, the other still on line 12. The ledger itself does not remember who read what: it is each reader who remembers where they are.
These two images are not decoration: they are the two great architectures of messaging. The mailman is the message broker (Hohpe & Woolf, 2003), whose archetype is RabbitMQ. The ledger is the message log (Kleppmann, 2017), whose archetype is Kafka. The whole chapter is about understanding why this choice of image deeply changes what the system can do.
The broker: deliver then forget
Let us start with the mailman. A broker manages queues. You publish a message, it files it into a queue, and it hands it to a consumer plugged into that queue. As soon as the consumer confirms it received and processed it, the broker removes the message from the queue. It is gone. No one will re-read it.
What happens when several consumers plug into the same queue? The broker serves them in turn: the first message to one, the next to another, and so on. Each message therefore goes to a single consumer, and the work splits itself among them. This is the competing consumers pattern: to process faster, you add consumers on the same queue, and the broker balances the load. It is exactly what you want for a pool of workers charging payments: each payment must be processed once and only once, by any one of them.
The crucial point lies elsewhere. Where does the information “is this message still to be processed?” live? It lives in the broker. The broker holds the queue, knows what remains, and erases after confirmation. The consumer keeps nothing: it receives, processes, confirms, forgets. A direct and weighty consequence: a consumer that arrives too late will never see the messages already delivered and erased. They no longer exist anywhere.
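To make the broker's bookkeeping concrete, here is a minimal in-memory sketch. BrokerQueue, deliver and ack are hypothetical names, not RabbitMQ's or Hexeract's API; round-robin dispatch is a simplification, and redelivery of unacknowledged messages is not modeled. It shows the three properties above: each message goes to one consumer, an acknowledged message is gone, and a consumer arriving afterwards finds nothing.

```rust
use std::collections::{HashMap, VecDeque};

/// Toy broker queue (hypothetical, not RabbitMQ's API). The broker alone
/// holds the progress state: what is still to deliver, what awaits an ack.
struct BrokerQueue {
    ready: VecDeque<(u64, String)>, // waiting to be delivered
    unacked: HashMap<u64, String>,  // delivered, not yet acknowledged
    next_id: u64,
    next_consumer: usize, // round-robin cursor over competing consumers
}

impl BrokerQueue {
    fn new() -> Self {
        BrokerQueue { ready: VecDeque::new(), unacked: HashMap::new(), next_id: 0, next_consumer: 0 }
    }

    fn publish(&mut self, body: &str) {
        self.ready.push_back((self.next_id, body.to_string()));
        self.next_id += 1;
    }

    /// Hands the next message to one of `consumers` (must be > 0), in turn.
    /// Returns (consumer index, delivery id, body), or None if nothing is left.
    fn deliver(&mut self, consumers: usize) -> Option<(usize, u64, String)> {
        let (id, body) = self.ready.pop_front()?;
        let consumer = self.next_consumer % consumers;
        self.next_consumer += 1;
        self.unacked.insert(id, body.clone());
        Some((consumer, id, body))
    }

    /// Acknowledgement: the broker drops its last copy. Nobody can re-read it.
    fn ack(&mut self, id: u64) {
        self.unacked.remove(&id);
    }

    /// Messages the broker still holds (ready + in flight).
    fn remaining(&self) -> usize {
        self.ready.len() + self.unacked.len()
    }
}

fn main() {
    let mut queue = BrokerQueue::new();
    for i in 0..4 {
        queue.publish(&format!("OrderPaid #{i}"));
    }
    // Two competing consumers: each message goes to exactly one of them.
    let mut handled = [0usize; 2];
    while let Some((consumer, id, body)) = queue.deliver(2) {
        println!("consumer {consumer} handles {body}");
        handled[consumer] += 1;
        queue.ack(id); // processed, so acknowledged, so erased
    }
    assert_eq!(handled, [2, 2]);
    assert_eq!(queue.remaining(), 0);
    // A third consumer plugging in now finds an empty queue.
    assert!(queue.deliver(3).is_none());
}
```

Note that the consumer is passed nothing but a message: it has no position to remember, which is exactly why it cannot go back.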
The log: keep and let re-read
Now the ledger. A log is a sequence of messages written one after another, and never erased: you only ever append to the end. This is what we call an append-only structure. Each message, as it enters, receives a permanent, increasing position number: the first message is at position 0, the next at 1, and so on.
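A minimal sketch of the append-only structure, assuming an in-memory vector stands in for the log (Log, append and read are illustrative names, not Kafka's API). The position returned by append never changes, and read takes a shared reference, so reading cannot remove anything.

```rust
/// A minimal append-only log (illustrative, not Kafka's API).
struct Log {
    entries: Vec<String>,
}

impl Log {
    fn new() -> Self {
        Log { entries: Vec::new() }
    }

    /// Appends at the end and returns the permanent position of the message.
    fn append(&mut self, msg: &str) -> u64 {
        self.entries.push(msg.to_string());
        (self.entries.len() - 1) as u64
    }

    /// Reading takes `&self`: it cannot modify or remove anything.
    fn read(&self, position: u64) -> Option<&str> {
        self.entries.get(position as usize).map(String::as_str)
    }

    fn len(&self) -> u64 {
        self.entries.len() as u64
    }
}

fn main() {
    let mut log = Log::new();
    let first = log.append("OrderPaid #1");
    let second = log.append("OrderPaid #2");
    assert_eq!((first, second), (0, 1));
    // Reading does not consume: the same position can be read again and again.
    assert_eq!(log.read(0), Some("OrderPaid #1"));
    assert_eq!(log.read(0), Some("OrderPaid #1"));
    assert_eq!(log.len(), 2);
}
```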
How does a consumer read this log? It keeps a cursor: the number of the next message it plans to read. This cursor has a name, the offset (Kleppmann, 2017), and here is the pivot of the whole chapter: the offset belongs to the consumer, not to the log. Reading leaves the log untouched, and the log need not know who has read up to where: each reader holds its own offset and advances it as it reads. (Kafka can store a group's committed offset on its servers for safekeeping, but it is still the consumer that decides where that offset stands.)
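To see the offset living in the reader rather than in the log, here is a minimal sketch. Log, Reader, poll and seek are hypothetical names chosen for illustration. Each poll advances only the calling reader's cursor, two readers progress independently, and rewinding is a plain assignment.

```rust
/// The log: append-only, and unaware of who reads it (illustrative sketch).
struct Log {
    entries: Vec<String>,
}

impl Log {
    fn append(&mut self, msg: &str) {
        self.entries.push(msg.to_string());
    }

    fn get(&self, position: usize) -> Option<&str> {
        self.entries.get(position).map(String::as_str)
    }
}

/// A reader owns its offset: the number of the next message it will read.
struct Reader {
    offset: usize,
}

impl Reader {
    /// Reads the next message, if any, and advances this reader's own offset.
    fn poll<'a>(&mut self, log: &'a Log) -> Option<&'a str> {
        let msg = log.get(self.offset)?;
        self.offset += 1;
        Some(msg)
    }

    /// Rewinding is just moving the reader's own cursor.
    fn seek(&mut self, offset: usize) {
        self.offset = offset;
    }
}

fn main() {
    let mut log = Log { entries: Vec::new() };
    for i in 0..3 {
        log.append(&format!("OrderPaid #{i}"));
    }

    let mut fast = Reader { offset: 0 };
    let mut slow = Reader { offset: 0 };
    while fast.poll(&log).is_some() {} // fast reads everything
    let _ = slow.poll(&log); // slow reads a single message

    assert_eq!((fast.offset, slow.offset), (3, 1));
    fast.seek(0); // replay from the start
    assert_eq!(fast.poll(&log), Some("OrderPaid #0"));
}
```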
From there follows everything that sets the log apart from the broker. Because the offset lives in the reader and the message stays, a new reader can simply reset its offset to 0 and re-read the whole history from the start. And several independent readers can read the same log without getting in each other’s way, each at its own position. To organize these readers, they are gathered into consumer groups (Kleppmann, 2017): inside a group, members share the work and each message is processed only once, exactly like a broker’s competing consumers; but two distinct groups each have their own offset, so the same message is read once by each group. Plugging three groups into the log means three complete and independent readings of the same stream.
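Consumer groups can be sketched under a simplifying assumption: a single partition and one shared offset per group. How members inside a group split the work is left out; Kafka does it by assigning partitions to members, not message by message. Group and catch_up are hypothetical names. The sketch counts how many times each message is read across three groups.

```rust
use std::collections::HashMap;

/// One consumer group in a single-partition sketch: the whole group shares
/// ONE offset, so each message is processed once per group. (Real Kafka splits
/// a group's work by assigning partitions to members; that is omitted here.)
struct Group {
    offset: usize,
}

impl Group {
    /// Returns everything this group has not read yet and advances its offset.
    fn catch_up<'a>(&mut self, log: &'a [String]) -> &'a [String] {
        let start = self.offset.min(log.len());
        self.offset = log.len();
        &log[start..]
    }
}

fn main() {
    let log: Vec<String> = (0..3).map(|i| format!("OrderPaid #{i}")).collect();
    let mut groups: HashMap<&str, Group> = HashMap::new();
    for name in ["billing", "loyalty", "analytics"] {
        groups.insert(name, Group { offset: 0 });
    }

    // Count how many times each message is read across all groups.
    let mut reads: HashMap<&str, usize> = HashMap::new();
    for group in groups.values_mut() {
        for msg in group.catch_up(&log) {
            *reads.entry(msg.as_str()).or_insert(0) += 1;
        }
    }
    // Three groups: three complete, independent readings of the stream.
    assert!(reads.values().all(|&n| n == 3));
    // Within a group, nothing is read twice: a second catch-up is empty.
    assert!(groups.values_mut().all(|g| g.catch_up(&log).is_empty()));
}
```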
One honest nuance remains: “never erased” has a practical limit. A log keeps messages for a duration, or up to a size, that you configure: this is retention. As long as retention covers the period you want, you can re-read; beyond that, older messages are eventually deleted. Replay is therefore possible, but bounded by what you chose to keep.
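A minimal sketch of a log with retention: the oldest messages beyond a limit are dropped, while every position stays fixed. The limit here is a hypothetical message count; real systems such as Kafka configure retention by time and/or size. Reading below the oldest retained position fails, which is exactly the bound on replay.

```rust
use std::collections::VecDeque;

/// Log with count-based retention (illustrative, not Kafka's behavior in detail).
struct Log {
    start_offset: u64,         // position of the oldest message still kept
    entries: VecDeque<String>, // retained messages, oldest first
    max_retained: usize,       // hypothetical retention limit, in messages
}

impl Log {
    fn new(max_retained: usize) -> Self {
        Log { start_offset: 0, entries: VecDeque::new(), max_retained }
    }

    /// Appends and returns the permanent position of the new message.
    fn append(&mut self, msg: &str) -> u64 {
        let position = self.start_offset + self.entries.len() as u64;
        self.entries.push_back(msg.to_string());
        // Retention: drop the oldest messages beyond the limit; positions never shift.
        while self.entries.len() > self.max_retained {
            self.entries.pop_front();
            self.start_offset += 1;
        }
        position
    }

    /// Err carries the oldest readable position when `position` fell out of retention.
    fn read(&self, position: u64) -> Result<Option<&str>, u64> {
        if position < self.start_offset {
            return Err(self.start_offset);
        }
        let index = (position - self.start_offset) as usize;
        Ok(self.entries.get(index).map(String::as_str))
    }
}

fn main() {
    let mut log = Log::new(3);
    for i in 0..5 {
        log.append(&format!("OrderPaid #{i}"));
    }
    // Positions 0 and 1 were deleted by retention: replay from 0 is no longer possible.
    assert_eq!(log.read(0), Err(2));
    // What is still retained remains re-readable at its original position.
    assert_eq!(log.read(2), Ok(Some("OrderPaid #2")));
    // Nothing has been written at position 5 yet.
    assert_eq!(log.read(5), Ok(None));
}
```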
The pivot: who holds the read position?
We can now put the two worlds face to face. The real difference is not “a queue versus a sequence of lines”: it is where the read position lives, and everything else follows.
| Criterion | Broker (queue) | Log |
|---|---|---|
| Who holds the read position | The broker | The consumer (its offset) |
| Fate of a read message | Erased after acknowledgement | Kept, re-readable |
| The same message can be read by | A single consumer | As many groups as you want |
| Several consumers on the stream | Share it (competing consumers) | Share within a group, several groups in parallel |
| Re-read history | Impossible (erased) | Possible (rewind the offset), bounded by retention |
| A late reader sees the past | No | Yes, as long as retention covers it |
| Archetype | RabbitMQ | Kafka |
Read the third column as a cascade: because the offset lives in the consumer and the message stays, you can re-read, you can rewind, you can plug in as many groups as you like. And read the second the same way: because the broker holds the position and erases, it distributes a workload quickly and simply, but it cannot go back. Neither is “better”: they answer two different needs, which we will learn to recognize.
See the pivot yourself
The component below runs the same stream of messages through both models, side by side. Produce a few messages, then play.
On the broker side, deliver the messages to competing consumers and watch the queue empty: each message goes to a single consumer, then disappears. On the log side, advance the groups’ offsets and watch that nothing is erased. Finally, the revealing move: add a late reader. On the broker side, it faces an empty queue, it has nothing to read. On the log side, it starts from offset 0 and re-reads the whole history. That is the whole difference, in one button.
Three questions to ask yourself while playing:
- After delivering everything on the broker side, add a consumer then click “Deliver the next”. Why does it receive nothing, while a new group on the log side can re-read everything?
- Advance the offset of the “billing” group without touching “loyalty”. What does this show about the independence of the two offsets?
- On the broker side, your consumers’ “handled” counter rises, but the queue empties. On the log side, the offsets advance but the number of messages never drops. Which property of each model does this asymmetry capture?
In code: a broker at heart, a log facing it
Let us see how these two models appear in real code. And here a useful admission is in order.
Hexeract, the Rust messaging framework that serves as our running thread, is a broker at heart. Its bus is built on RabbitMQ: you publish a message to a queue, a worker consumes it, acknowledges it, and the message disappears. Several workers on the same queue form competing consumers, and RabbitMQ spreads the work among them.
```rust
// Publish: the message goes to a queue, under a routing key.
// (`amqp_url`, `order_id`, `connection` and `cancel` come from the surrounding application.)
let transport = RabbitMqTransport::new(&amqp_url).await?;
transport
    .publish("orders.order-paid", &OrderPaid { order_id })
    .await?;

// Consume: a worker reads the queue and processes each message.
// Running several workers on the SAME queue is competing consumers:
// RabbitMQ serves them in turn, each message to a single one.
let worker = RabbitMqWorkerBuilder::new(connection)
    .queue("orders.order-paid")
    .register_handler::<OrderPaid, _>(NotifyShippingHandler)
    .build()?;
worker.run(cancel).await?;
```
Look for an offset, a consumer group or a replay in Hexeract: you will not find any, and that is deliberate. An acknowledged message is removed, full stop. This is not a gap but the chapter's point written in code: a broker erases, it does not keep. To get a log, you need another model, not one more option on the same one.
Wolverine, its .NET counterpart, is instructive because it knows how to speak to both worlds. With a RabbitMQ transport, you set the number of competing consumers on a queue. With a Kafka transport, you join a consumer group and choose where to place the offset, including at the very beginning to re-read.
```csharp
// Broker side (RabbitMQ): several consumers share the queue.
opts.ListenToRabbitQueue("orders.order-paid")
    .ListenerCount(5); // five competing consumers on the same queue

// Log side (Kafka): a group reads the topic and holds its offset.
// AutoOffsetReset.Earliest says: if this group is new, start from the
// beginning and re-read all the retained history.
opts.ListenToKafkaTopic("orders.order-paid")
    .ConfigureConsumer(config =>
    {
        config.GroupId = "fraud-detection";
        config.AutoOffsetReset = AutoOffsetReset.Earliest;
    });
```
Two lines capture the pivot: ListenerCount adds consumers that share a queue the broker empties; GroupId plus AutoOffsetReset.Earliest creates a reader that holds its offset and can rewind. The same “Order paid” message, depending on the box you choose, is either delivered once then forgotten, or kept and re-read by as many groups as you want.
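What AutoOffsetReset.Earliest decides can be modeled as a small function. This is a hypothetical sketch of the rule behind Kafka's auto.offset.reset setting, not a client API: a group whose committed offset is still inside the retained range resumes there; otherwise the policy chooses between the oldest retained message (Earliest) and the end of the log (Latest).

```rust
/// Hypothetical model of the `auto.offset.reset` policy (not a client API):
/// it only matters when the group has no usable committed offset.
#[derive(Clone, Copy)]
enum OffsetReset {
    Earliest,
    Latest,
}

/// Decides where a consumer group starts reading a log whose retained
/// messages occupy positions `log_start..log_end` (`log_end` = next write).
fn starting_offset(committed: Option<u64>, reset: OffsetReset, log_start: u64, log_end: u64) -> u64 {
    match committed {
        // A committed offset still within the retained range wins: resume there.
        Some(offset) if (log_start..=log_end).contains(&offset) => offset,
        // New group, or committed offset no longer available: apply the policy.
        _ => match reset {
            OffsetReset::Earliest => log_start, // re-read all retained history
            OffsetReset::Latest => log_end,     // only see messages produced from now on
        },
    }
}

fn main() {
    // Retained messages occupy positions 100..250 (older ones have expired).
    // A brand-new "fraud-detection" group with Earliest starts at the oldest retained one.
    assert_eq!(starting_offset(None, OffsetReset::Earliest, 100, 250), 100);
    assert_eq!(starting_offset(None, OffsetReset::Latest, 100, 250), 250);
    // An existing group resumes from its committed offset, whatever the policy.
    assert_eq!(starting_offset(Some(180), OffsetReset::Latest, 100, 250), 180);
}
```

The Earliest branch returns the oldest retained position, not 0: replay reaches only as far back as retention allows.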
Exercises
Grab a pen and paper. The solutions are right below, to look at only after you have tried.
Exercise 1: choose the right box
Our e-commerce order raises two distinct needs. (a) A pool of workers charges payments: each payment must be processed once and only once, by any free worker. (b) The “Order paid” event must be received by three services that do different things with it (billing, loyalty and analytics), and each must be able to replay the events after a crash. For each need, say whether a broker or a log fits better, and justify it in one sentence based on the read position.
Exercise 2: the service that arrives late
A new fraud-detection service must analyze all payments from the last seven days, including those that happened before its deployment. First question: with a broker, why is this simply impossible? Second question: with a log, how do you do it concretely, and on which setting does success depend?
In one sentence
A broker delivers each message to a single consumer then erases it, holding the read position itself; a log keeps the messages and lets each consumer hold its own offset, which allows several independent readers and the replay of history.
1. What is the real, fundamental difference between a broker and a log?
2. Why do three consumer groups plugged into the same log each read the whole stream, while three competing consumers on a broker queue share it?
3. A new service must process messages produced before it started. What can we say?
Towards the next chapter
We now know what becomes of a message: erased after acknowledgement at the broker, kept and re-readable at the log. But we glossed over a word heavy with consequences: acknowledgement. Go back to the mailman. Does he acknowledge the letter the moment he hands it to you, or only after you have signed? If he throws away his copy as he hands it to you and you drop it in a puddle, it is lost: no one will replay it. If he waits for your signature and lightning strikes just after you sign but before he notes it, he will come back tomorrow to deliver the same letter: you will receive it twice. Lose once, or receive twice: this dilemma is not a technical detail, it is the choice of delivery semantics. Chapter three puts the moment of acknowledgement under the microscope and shows which guarantees (at-most-once, at-least-once, effectively-once) you can really hold.
Sources
- Hohpe, G. & Woolf, B. (2003). Enterprise Integration Patterns: Designing, Building, and Deploying Messaging Solutions. Addison-Wesley. Patterns “Message Broker”, “Competing Consumers”, “Point-to-Point Channel” and “Publish-Subscribe Channel”.
- Kleppmann, M. (2017). Designing Data-Intensive Applications. O’Reilly. Chapter 11, “Stream Processing” (classic message brokers versus logs).
- Apache Kafka. Documentation: Consumers, Consumer Groups, Offsets. kafka.apache.org.
- RabbitMQ. Documentation: Consumers and Acknowledgements. rabbitmq.com.
- Wolverine. Official documentation: Kafka Transport and RabbitMQ Listeners.