MQTT topic design: rules that survive contact with a real fleet
Topic structures look tidy on a whiteboard and embarrassing six months into production. A short list of design rules I keep coming back to, after running several MQTT backends in the wild.
The first MQTT design meeting almost always converges on something like devices/{deviceId}/state. It’s clean. It’s wrong in subtle ways that show up six months later, usually right after the first firmware migration. Here is the short list of rules I keep coming back to.
1. Version the topic, not just the payload
devices/{deviceId}/state/v1
devices/{deviceId}/state/v2
The day you change the payload shape, you do not want to coordinate firmware on 10,000 devices with backend deploys on a single shared topic. Subscribers self-select the versions they understand; old devices keep talking on v1 until they are upgraded.
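During a migration, a backend that subscribes to devices/+/state/+ can dispatch on the final segment. A minimal sketch, assuming two hypothetical payload shapes (v1 sends temp, v2 renamed it to temperatureC):

```typescript
// Sketch: one subscriber that understands both v1 and v2 state payloads.
// Layout assumed: devices/{deviceId}/state/{version}.
// Payload shapes are hypothetical: v1 sends { temp }, v2 renamed it to { temperatureC }.
type Payload = Record<string, number>;
type Normalized = { deviceId: string; temperatureC: number };
type Handler = (deviceId: string, payload: Payload) => Normalized;

// A Map (rather than a plain object) so a segment like "constructor" never resolves to a handler.
const stateHandlers = new Map<string, Handler>([
  ["v1", (deviceId, p) => ({ deviceId, temperatureC: p.temp })],
  ["v2", (deviceId, p) => ({ deviceId, temperatureC: p.temperatureC })],
]);

function routeState(topic: string, payload: Payload): Normalized | undefined {
  const parts = topic.split("/");
  // Only accept exactly: devices / {deviceId} / state / {version}
  if (parts.length !== 4 || parts[0] !== "devices" || parts[2] !== "state") {
    return undefined;
  }
  const [, deviceId, , version] = parts;
  // Versions this subscriber does not understand are ignored, not misparsed.
  return stateHandlers.get(version)?.(deviceId, payload);
}
```

Unknown versions return undefined, so a v3 rollout cannot crash an older backend; its messages simply go unhandled until a v3 handler ships.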
2. Keep direction in the topic
devices/{deviceId}/state/v1 # device → backend
devices/{deviceId}/command/v1 # backend → device
Two topic trees, one for each direction. Resist the temptation to put both into one. Authorization becomes trivial (devices publish here, subscribe there), debugging is faster, and a misbehaving device can be muted in one direction without losing the other.
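With two trees, the per-device authorization rule fits in a few lines. A minimal sketch, assuming the broker's auth hook hands you an MQTT client ID that equals the device ID (that mapping, and the function itself, are hypothetical):

```typescript
// Sketch of a per-device ACL for the two-tree layout:
// devices/{deviceId}/state/{version} and devices/{deviceId}/command/{version}.
// Assumes client ID === device ID; that mapping is hypothetical.
type Action = "publish" | "subscribe";

function isAllowed(clientId: string, action: Action, topic: string): boolean {
  const parts = topic.split("/");
  // Only the device's own subtree; rejects other devices' IDs and wildcards in the ID slot.
  if (parts.length !== 4 || parts[0] !== "devices" || parts[1] !== clientId) {
    return false;
  }
  const [, , kind, version] = parts;
  // Exact version segments only, so wildcard filters like devices/d1/command/# are refused.
  if (!/^v\d+$/.test(version)) return false;
  // Devices publish state and subscribe to commands, never the other way round.
  return action === "publish" ? kind === "state" : kind === "command";
}
```

Muting a device in one direction then means denying just one of the two branches for that client ID and leaving the other alone.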
3. The leading segment is not the device ID
v1/tenant/{tenant}/device/{deviceId}/state
Counter-intuitive, but putting the tenant ahead of the device in the path is friendlier to ACLs, to multi-tenant routing, and to partitioning if you later move to a managed broker that supports sharding. {deviceId} is the last discriminator, not the first.
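The payoff is that "everything for one tenant" becomes a single topic filter, which is the unit broker ACLs and routing rules are written in. With devices/{deviceId}/... there is no such filter unless the tenant is encoded in every device ID. A minimal matcher, written from the MQTT wildcard rules rather than any particular broker's implementation; the tenant names are hypothetical:

```typescript
// Minimal MQTT topic-filter matcher: '+' matches exactly one level, '#' matches
// any number of remaining levels, including none. Ignores the spec's special
// rule for topics beginning with '$'.
function topicMatches(filter: string, topic: string): boolean {
  const f = filter.split("/");
  const t = topic.split("/");
  for (let i = 0; i < f.length; i++) {
    if (f[i] === "#") return true;     // multi-level wildcard (must be last in a valid filter)
    if (i >= t.length) return false;   // filter has more levels than the topic
    if (f[i] !== "+" && f[i] !== t[i]) return false;
  }
  return f.length === t.length;        // no unmatched trailing topic levels
}

// Tenant ahead of device: one filter covers a whole tenant.
const acmeOnly = "v1/tenant/acme/#";
topicMatches(acmeOnly, "v1/tenant/acme/device/d1/state");   // true
topicMatches(acmeOnly, "v1/tenant/globex/device/d9/state"); // false
```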
4. Plan for retained messages from day one
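Some state belongs on a retained topic (current device configuration, latest known state). Some explicitly must not (events, alerts, telemetry samples). Decide for each topic before the first device ships. Retroactively turning a retained topic into a non-retained one is a migration, since every retained message already on the broker has to be cleared; the other direction is a long replay, since new subscribers see nothing until each device publishes again or the latest values are republished from storage.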
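One way to make that decision exactly once is a per-kind policy table that every publish path consults. A minimal sketch; the message kinds are hypothetical, and the clear operation relies on the MQTT rule that a retained publish with a zero-length payload removes the topic's retained message:

```typescript
// Hypothetical message kinds; the point is that retain is decided per kind, once.
type Kind = "config" | "state" | "event" | "alert" | "telemetry";

const RETAIN: Record<Kind, boolean> = {
  config: true,      // late subscribers need the current configuration
  state: true,       // latest known state
  event: false,      // a replayed event is easy to mistake for a new one
  alert: false,
  telemetry: false,  // samples belong in the database, not the broker
};

interface Publish {
  topic: string;
  payload: string;
  retain: boolean;
}

function publishFor(kind: Kind, topic: string, body: object): Publish {
  return { topic, payload: JSON.stringify(body), retain: RETAIN[kind] };
}

// Clearing a retained message: zero-length payload with the retain flag set.
function clearRetained(topic: string): Publish {
  return { topic, payload: "", retain: true };
}
```

With mqtt.js, a Publish value maps onto client.publish(p.topic, p.payload, { retain: p.retain }).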
5. Idempotent message handlers, always
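// Topic: devices/{deviceId}/state/v1
// StateSchema (a zod-style schema), drop() and db are defined elsewhere in the app.
export const onState = async (msg: unknown) => {
  // Validate the raw payload first; malformed messages are dropped, not retried.
  const parsed = StateSchema.safeParse(msg);
  if (!parsed.success) return drop(msg, 'invalid');

  // Assumed to upsert keyed on device_id: a redelivered message overwrites the same row.
  await db.device_state.upsert({
    device_id: parsed.data.id,
    last_seen: new Date(),
    payload: parsed.data,
  });
};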
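Brokers redeliver (QoS 1 is at-least-once delivery). Devices reboot mid-publish. The same state message will arrive twice, sometimes more. An upsert makes a duplicate harmless, and a monotonic sequence number in the payload lets the handler reject stale or out-of-order copies as well; together they remove an entire class of edge-case bugs.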
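The sequence-number guard is small enough to show in full. A sketch, assuming every state payload carries a per-device, monotonically increasing seq field (hypothetical), with an in-memory Map standing in for the device_state table:

```typescript
// Sketch: duplicate- and reorder-tolerant state handling.
// Assumes every state payload carries a per-device, monotonically increasing seq (hypothetical field).
interface StateRow {
  seq: number;
  payload: unknown;
}

// In-memory stand-in for the device_state table.
const latest = new Map<string, StateRow>();

// Returns true if the message was applied, false if it was a duplicate or arrived late.
function applyState(deviceId: string, seq: number, payload: unknown): boolean {
  const current = latest.get(deviceId);
  if (current !== undefined && seq <= current.seq) {
    return false; // redelivery (same seq) or stale message (lower seq): ignore it
  }
  latest.set(deviceId, { seq, payload });
  return true;
}
```

In Postgres the same guard fits in one statement: INSERT ... ON CONFLICT (device_id) DO UPDATE SET ... WHERE device_state.seq < EXCLUDED.seq. The counter has to survive reboots (or be paired with a boot counter); otherwise a freshly restarted device looks stale and its updates are dropped.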
6. Treat the broker as ephemeral
The broker is for delivery, not for storage. Anything you might want to query in the future (every state change, every alert, every telemetry sample) should land in a real database inside your handler; TimescaleDB does this well. If a broker dies tomorrow, you should lose nothing of value.
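Writing inside the handler can reuse the idempotency idea from rule 5: key each row on the device's own sequence number and let the database discard redeliveries. A sketch that builds a parameterized query object for node-postgres; the telemetry table, its columns, and the seq and ts payload fields are hypothetical:

```typescript
// Sketch: a telemetry sample as a parameterized INSERT for node-postgres.
// Table, columns, and the seq/ts payload fields are hypothetical.
interface Telemetry {
  deviceId: string;
  seq: number;   // device-assigned sequence number
  ts: Date;      // device timestamp from the payload, not the time of receipt
  payload: unknown;
}

interface Query {
  text: string;
  values: unknown[];
}

function telemetryInsert(m: Telemetry): Query {
  return {
    // TimescaleDB requires unique indexes on a hypertable to include its time column,
    // hence ts in the conflict target. Because ts comes from the device, a redelivered
    // sample produces the same key and is skipped.
    text:
      "INSERT INTO telemetry (device_id, seq, ts, payload) VALUES ($1, $2, $3, $4) " +
      "ON CONFLICT (device_id, seq, ts) DO NOTHING",
    values: [m.deviceId, m.seq, m.ts, JSON.stringify(m.payload)],
  };
}
```

node-postgres accepts a { text, values } object as a query config, so inside the handler this becomes await pool.query(telemetryInsert(sample)).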
A simple test
If your topic scheme cannot survive these three operations without surgery, redesign it before the first 1,000 devices ship:
- A payload field needs to change shape.
- A second tenant joins.
- One firmware variant needs different commands than another.
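A scheme passes this test when each of the three operations is a new parameter value rather than a new string format. A sketch of a topic builder along the lines of rule 3's layout, with the two directions from rule 2; the field names are hypothetical, and the segment check ensures a '/', '+' or '#' inside an ID can never add a level or turn a topic into a wildcard:

```typescript
// Sketch: topics built from parameters, following the
// v1/tenant/{tenant}/device/{deviceId}/... layout, with state and command directions.
type Direction = "state" | "command";

interface TopicParts {
  version: string;   // bumped for a payload change or a firmware variant's new commands
  tenant: string;    // a second tenant is a new value, not a new scheme
  deviceId: string;
  direction: Direction;
}

// '/' would add a level; '+' and '#' are wildcards. None may appear inside one segment.
function segment(value: string): string {
  if (value === "" || /[\/+#]/.test(value)) {
    throw new Error(`invalid topic segment: ${JSON.stringify(value)}`);
  }
  return value;
}

function buildTopic(p: TopicParts): string {
  return [
    segment(p.version), "tenant", segment(p.tenant),
    "device", segment(p.deviceId), p.direction,
  ].join("/");
}

buildTopic({ version: "v2", tenant: "acme", deviceId: "d1", direction: "command" });
// "v2/tenant/acme/device/d1/command"
```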
Two hours on the whiteboard now beat six weeks of migration later.