
MQTT topic design: rules that survive contact with a real fleet

Topic structures look tidy on a whiteboard and embarrassing six months into production. A short list of design rules I keep coming back to, after running several MQTT backends in the wild.


The first MQTT design meeting almost always converges on something like devices/{deviceId}/state. It’s clean. It’s wrong in subtle ways that show up six months later, usually right after the first firmware migration. Here is the short list of rules I keep coming back to.

1. Version the topic, not just the payload

devices/{deviceId}/state/v1
devices/{deviceId}/state/v2

The day you change the payload shape, you do not want to coordinate a firmware rollout across 10,000 devices with a backend deploy on a single shared topic. With the version in the topic, subscribers self-select the versions they understand, and old devices keep talking on v1 until they are upgraded.
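On the backend side, self-selection can be as small as a table of decoders keyed by version. The sketch below is a minimal illustration, not a prescribed implementation. The payload shapes (`StateV1`, `StateV2`) and the helper names are hypothetical, and the casts stand in for real per-version validation.

```typescript
// Hypothetical payload shapes for two versions of the state topic.
interface StateV1 { temp: number }
interface StateV2 { temperatureC: number; humidity: number }

type Decoded =
  | { version: 1; state: StateV1 }
  | { version: 2; state: StateV2 };

// Matches devices/{deviceId}/state/v{n}, capturing deviceId and n.
const STATE_TOPIC = /^devices\/([^/]+)\/state\/v(\d+)$/;

const stateTopic = (deviceId: string, version: number): string =>
  `devices/${deviceId}/state/v${version}`;

// The versions this backend understands. The casts are placeholders:
// real code would validate each shape with its own schema.
const decoders: Partial<Record<number, (raw: unknown) => Decoded>> = {
  1: (raw) => ({ version: 1, state: raw as StateV1 }),
  2: (raw) => ({ version: 2, state: raw as StateV2 }),
};

// One wildcard subscription per supported version, so v3 traffic never arrives here.
const subscriptionsFor = (): string[] =>
  Object.keys(decoders).map((v) => `devices/+/state/v${v}`);

function decodeState(topic: string, raw: unknown): { deviceId: string; decoded: Decoded } | null {
  const m = STATE_TOPIC.exec(topic);
  if (!m) return null;
  const decode = decoders[Number(m[2])];
  if (!decode) return null; // unknown version: ignore rather than misparse
  return { deviceId: m[1], decoded: decode(raw) };
}
```

Adding v3 support later means adding one decoder entry. Devices still on v1 are unaffected.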

2. Keep direction in the topic

devices/{deviceId}/state/v1        # device → backend
devices/{deviceId}/command/v1      # backend → device

Use two topic trees, one per direction, and resist the temptation to merge them. Authorization becomes trivial (devices publish to state, subscribe to command), debugging is faster, and a misbehaving device can be muted in one direction without losing the other.
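Here is a rough model of the direction rule as an authorization check. It assumes each device authenticates with its `deviceId` as its identity. Real brokers express ACLs in their own configuration syntax, so `deviceMayAccess`, `allowPublish` and `mutedUplinks` are hypothetical names that illustrate only the logic.

```typescript
// Models the direction rule only; real brokers use their own ACL syntax.
// Assumes a device authenticates with its deviceId as its identity.
type Action = 'publish' | 'subscribe';

const DEVICE_STATE = /^devices\/([^/]+)\/state\/v\d+$/;     // device → backend
const DEVICE_COMMAND = /^devices\/([^/]+)\/command\/v\d+$/; // backend → device

function deviceMayAccess(deviceId: string, action: Action, topic: string): boolean {
  // Devices publish only into their own state tree and subscribe only to
  // their own command tree. Wildcard filters such as devices/d1/command/+
  // do not match the pattern and are therefore denied by this sketch.
  const pattern = action === 'publish' ? DEVICE_STATE : DEVICE_COMMAND;
  const m = pattern.exec(topic);
  return m !== null && m[1] === deviceId;
}

// Muting a misbehaving device's uplink leaves its command subscription intact.
const mutedUplinks = new Set<string>();

function allowPublish(deviceId: string, topic: string): boolean {
  return !mutedUplinks.has(deviceId) && deviceMayAccess(deviceId, 'publish', topic);
}
```

Because the direction is encoded in the topic, the whole policy is just two patterns and an ownership check. It never needs to inspect payloads.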

3. The leading segment is not the device ID

v1/tenant/{tenant}/device/{deviceId}/state

Counter-intuitive, but tenant-first paths are friendlier to ACLs (one wildcard filter can cover a whole tenant), to multi-tenant routing, and to partitioning if you later move to a managed broker that supports sharding. {deviceId} is the last discriminator, not the first.
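The ACL benefit follows from how MQTT topic filters match. `+` matches exactly one level. `#` matches all remaining levels, including the parent level. Filters that start with a wildcard do not match topics beginning with `$`. With the tenant near the front, a single filter scopes a whole tenant. The matcher below is a small sketch of those rules, and `tenantFilter` is a hypothetical helper.

```typescript
// Sketch of MQTT topic-filter matching: '+' = one level, '#' = the rest.
function topicMatches(filter: string, topic: string): boolean {
  const f = filter.split('/');
  const t = topic.split('/');
  // Leading wildcards must not match system topics such as $SYS/...
  if (topic.startsWith('$') && (f[0] === '+' || f[0] === '#')) return false;
  for (let i = 0; i < f.length; i++) {
    // '#' is only valid as the last level; it also matches the parent level.
    if (f[i] === '#') return i === f.length - 1;
    if (i >= t.length) return false;              // topic ran out of levels
    if (f[i] !== '+' && f[i] !== t[i]) return false;
  }
  return f.length === t.length;
}

// One filter grants a tenant everything beneath it.
const tenantFilter = (tenant: string): string => `v1/tenant/${tenant}/#`;
```

With device-first paths, isolating a tenant would instead require either one rule per device or a leading wildcard that crosses tenant boundaries.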

4. Plan for retained messages from day one

Some state belongs on a retained topic (current device configuration, latest known state). Some explicitly must not be retained (events, alerts, telemetry samples). Decide for each topic before the first device ships. Retroactively turning a retained topic into a non-retained one is a migration, because every stale retained message has to be cleared. The other direction is a long replay, because new subscribers see nothing until each device publishes again.
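One way to make the decision explicit is a single policy table that every publisher reads. The sketch below assumes the topic kinds listed above and a default of QoS 1. Both are illustrative choices. The options object is assumed to match the `{ qos, retain }` shape that clients such as mqtt.js accept when publishing. Clearing relies on MQTT's rule that a zero-length retained publish removes the retained message for that topic.

```typescript
// Hypothetical topic kinds; retain is a fixed property of each kind.
type TopicKind = 'config' | 'state' | 'event' | 'alert' | 'telemetry';

const RETAIN: Record<TopicKind, boolean> = {
  config: true,     // new subscribers need the current configuration immediately
  state: true,      // latest known state
  event: false,     // a replayed event would look like a new occurrence
  alert: false,
  telemetry: false, // samples are history; they belong in the database
};

interface PublishOptions { qos: 0 | 1 | 2; retain: boolean }

// QoS 1 is an assumed default for this sketch.
function publishOptions(kind: TopicKind): PublishOptions {
  return { qos: 1, retain: RETAIN[kind] };
}

// A zero-length payload published with retain set clears the retained
// message on that topic. Retiring a retained topic means doing this for
// every device's topic, which is why it is a migration.
function clearRetained(topic: string): { topic: string; payload: Uint8Array; opts: PublishOptions } {
  return { topic, payload: new Uint8Array(0), opts: { qos: 1, retain: true } };
}
```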

5. Idempotent message handlers, always

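// Topic: devices/{deviceId}/state/v1
// StateSchema is assumed to be a zod schema; db and drop stand in for
// your own data layer and dead-letter helper.
export const onState = async (msg: unknown) => {
  // Validate the raw payload; its shape is untrusted until parsed.
  const parsed = StateSchema.safeParse(msg);
  if (!parsed.success) return drop(msg, 'invalid');

  // Keyed on device_id, so a redelivered message rewrites the same row
  // instead of inserting a second one.
  await db.device_state.upsert({
    device_id: parsed.data.id,
    last_seen: new Date(),
    payload: parsed.data,
  });
};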

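Brokers redeliver: at QoS 1, MQTT promises at-least-once delivery, not exactly-once. Devices reboot mid-publish. The same state message will arrive twice, sometimes more. An upsert makes the repeat harmless. A monotonic sequence number in the payload also lets the handler discard late, out-of-order copies instead of overwriting newer state. Together they remove an entire class of edge-case bugs.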
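The sequence-number guard can be sketched with an in-memory map standing in for the `device_state` table. It assumes the firmware increments a hypothetical `seq` field on every publish and keeps it across reboots. If the counter resets on reboot, this guard would wrongly discard fresh messages as stale.

```typescript
// Hypothetical payload: firmware increments seq on every publish and keeps
// it across reboots (an assumption the device side has to respect).
interface StatePayload { id: string; seq: number; data: unknown }

interface Row { seq: number; payload: StatePayload; lastSeen: Date }

type Outcome = 'applied' | 'duplicate' | 'stale';

// In-memory stand-in for the device_state table, keyed by device id.
const deviceState = new Map<string, Row>();

function applyState(msg: StatePayload, now: Date = new Date()): Outcome {
  const row = deviceState.get(msg.id);
  // A redelivered copy carries the same seq: nothing to do.
  if (row && msg.seq === row.seq) return 'duplicate';
  // An older message arriving late must not overwrite newer state.
  if (row && msg.seq < row.seq) return 'stale';
  // First message for this device, or a newer one: upsert.
  deviceState.set(msg.id, { seq: msg.seq, payload: msg, lastSeen: now });
  return 'applied';
}
```

In PostgreSQL (and therefore TimescaleDB), the same guard can live in the database as `INSERT ... ON CONFLICT (device_id) DO UPDATE SET ... WHERE device_state.seq < EXCLUDED.seq`. That way, concurrent handlers cannot race between the read and the write.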

6. Treat the broker as ephemeral

The broker is for delivery, not for storage. Anything you might want to query in the future — every state change, every alert, every telemetry sample — lands in a real database (TimescaleDB does this well) inside your handler. If a broker dies tomorrow, you should lose nothing of value.

A simple test

If your topic scheme cannot survive these three operations without surgery, redesign it before the first 1,000 devices ship:

  1. A payload field needs to change shape.
  2. A second tenant joins.
  3. One firmware variant needs different commands than another.

Two hours on the whiteboard now beat six weeks of migration later.
