Most candidates overcomplicate designing a chat system

Why messaging systems become hard only after the happy path works

Jun 03, 2026

I’ve seen many candidates start chat System Design interviews with the same confident first move. They draw users, a load balancer, WebSocket servers, a database, maybe Kafka, and then say messages flow from one user to another in real time.

That is not wrong.

It is just incomplete.

Chat systems look simple because the product experience is simple. One person types a message. Another person sees it. The hard part begins when users go offline, switch devices, send messages during network drops, receive duplicates, open the same conversation from laptop and phone, expect unread counts to remain accurate, and still want the system to feel instant.

A chat system is not just a real-time transport problem.

It is a consistency, ordering, storage, and delivery problem wearing a simple UI.

The mistake most candidates make immediately

The most common mistake is treating chat as a WebSocket interview.

WebSockets matter, but they are not the system. They are only the live delivery channel.

If Alice sends Bob a message, the system must persist it durably, assign it an order, deliver it to Bob’s active sessions, retry delivery if Bob is offline, update unread counts, sync across Bob’s devices, and eventually support search, moderation, attachments, reactions, edits, deletes, and read receipts.

That means the core of the design is not “open a socket and push the payload.”

The core is deciding what the source of truth is for messages and how every client eventually converges to that truth.

Weak designs optimize for the fastest live push path. Strong designs treat live delivery as an optimization on top of durable storage.

Start with the message lifecycle

A good chat design begins with the lifecycle of a message.

A client creates a temporary local message ID and sends the message to the backend. The backend authenticates the user, validates membership in the conversation, stores the message durably, assigns a server-side sequence number, publishes a delivery event, and returns an acknowledgement to the sender. Online recipients receive the message through WebSocket connections. Offline recipients fetch it later through sync APIs.

That flow matters because it separates message acceptance from message delivery.

This sequence also prevents a dangerous design mistake. If the system pushes messages before persistence succeeds, users may briefly see messages that later disappear. That may be acceptable in some casual systems, but it is usually a bad default because users treat chat history as durable.
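The lifecycle above can be sketched in a few lines. This is a minimal in-memory illustration, not a real service: `ChatBackend`, `members`, `log`, and `delivery_events` are hypothetical names standing in for the membership store, durable storage, and the event bus, and authentication is left out. The point is the order of operations: validate, persist, then publish.

```python
from dataclasses import dataclass, field


@dataclass
class Message:
    conversation_id: str
    sender_id: str
    body: str
    seq: int


@dataclass
class ChatBackend:
    # Hypothetical in-memory stand-ins for membership, storage, and the event bus.
    members: dict = field(default_factory=dict)          # conversation_id -> set of user ids
    log: dict = field(default_factory=dict)              # conversation_id -> list of Message
    delivery_events: list = field(default_factory=list)  # events awaiting live push

    def send(self, conversation_id: str, sender_id: str, body: str) -> Message:
        # 1. Validate membership before accepting anything.
        if sender_id not in self.members.get(conversation_id, set()):
            raise PermissionError("sender is not a member of this conversation")
        # 2. Persist with a server-assigned sequence number.
        history = self.log.setdefault(conversation_id, [])
        message = Message(conversation_id, sender_id, body, seq=len(history) + 1)
        history.append(message)
        # 3. Only after the write succeeds, publish the delivery event.
        self.delivery_events.append(message)
        # 4. The returned message doubles as the acknowledgement to the sender.
        return message
```

Because the delivery event is published only after the append, a recipient can never be pushed a message that storage does not have.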

Message ordering is harder than it looks

Candidates often say messages should be ordered by timestamp.

That sounds reasonable until two users send messages at nearly the same time from different regions, one client’s clock is wrong, and one request arrives late because of mobile network jitter.

Client timestamps are useful metadata, but they should not define canonical order.

A stronger design assigns a monotonically increasing sequence number per conversation when the message is accepted by the server. This gives every conversation a stable ordering independent of client clocks.
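Here is a small sketch of that idea. `SequenceAllocator` is a hypothetical name, and an in-process lock is only an assumption for illustration; a real system would more likely use a database counter or a single writer per conversation. Notice that the client timestamps are carried along as metadata but play no part in the ordering.

```python
import threading
from collections import defaultdict


class SequenceAllocator:
    """Hands out 1, 2, 3, ... independently for each conversation."""

    def __init__(self) -> None:
        self._lock = threading.Lock()
        self._last = defaultdict(int)  # conversation_id -> last issued seq

    def next_seq(self, conversation_id: str) -> int:
        with self._lock:
            self._last[conversation_id] += 1
            return self._last[conversation_id]


allocator = SequenceAllocator()

# Messages in the order the server accepted them. Alice's clock runs ahead,
# so her client timestamp is later even though her message arrived first.
arrivals = [("alice", 1_700_000_050), ("bob", 1_700_000_010)]
ordered = [(allocator.next_seq("c1"), sender, ts) for sender, ts in arrivals]
```

Sorting by the first element of each tuple reproduces server acceptance order regardless of how skewed the client clocks are.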

Per-conversation ordering is usually enough because users care about order inside a chat thread. They do not need a globally consistent order across unrelated conversations.

This is an important simplifying insight.

Many candidates make systems harder by trying to solve unnecessary global ordering problems. In distributed systems, the cleanest design often comes from narrowing the consistency boundary.

WebSocket servers should stay mostly stateless

Chat systems need persistent connections, so WebSocket servers become central to the architecture. However, those servers should not own durable message state.

Their job is connection management.

A WebSocket server tracks active sessions, authenticates connection tokens, receives client events, forwards messages to backend services, and pushes server events to connected clients. If the server dies, clients reconnect elsewhere and resume from the last acknowledged sequence number.

That reconnection model is critical.

The system should assume WebSocket connections are temporary. Mobile clients sleep. Browsers refresh. Networks drop. Load balancers drain connections. Servers restart.

If the WebSocket server holds important unrecoverable state, the system becomes fragile.

The durable state should live in message storage and conversation metadata. The WebSocket layer should behave like a delivery pipe, not the source of truth.

The online path and the offline path are different systems

Real-time delivery and offline synchronization should be designed separately.

When Bob is online, the system pushes Alice’s message to Bob immediately. When Bob is offline, the message remains in durable storage, and Bob’s client fetches it later using a sync API.

Trying to make push delivery responsible for offline correctness creates unnecessary complexity.

This distinction helps explain chat reliability clearly in interviews.

Live push is best effort. Durable sync is authoritative.

That model is how many real messaging systems remain operationally understandable. If a push fails, the message is not lost. The recipient catches up later by asking, “Give me all messages after sequence number 1042.”
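A rough sketch of that catch-up model, under two assumptions of mine: sequence numbers within a conversation are gap-free, and `history` is a sorted list standing in for durable storage. `messages_after` and `Client` are hypothetical names. The client applies a push only when it is the next expected message and otherwise falls back to sync.

```python
from bisect import bisect_right


def messages_after(history, after_seq, limit=100):
    """Return up to `limit` (seq, body) pairs with seq > after_seq.

    `history` stands in for durable storage and is sorted by seq.
    """
    seqs = [seq for seq, _ in history]
    start = bisect_right(seqs, after_seq)
    return history[start:start + limit]


class Client:
    def __init__(self):
        self.last_seq = 0  # highest contiguous sequence number applied
        self.seen = []

    def _apply(self, seq, body):
        self.seen.append((seq, body))
        self.last_seq = seq

    def sync(self, history):
        # Authoritative path: page through everything after the local cursor.
        while page := messages_after(history, self.last_seq):
            for seq, body in page:
                self._apply(seq, body)

    def on_push(self, seq, body, history):
        # Best-effort path: accept only the next expected message.
        if seq <= self.last_seq:
            return              # duplicate push, already applied
        if seq > self.last_seq + 1:
            self.sync(history)  # gap detected, fall back to sync
            return
        self._apply(seq, body)
```

The same `sync` call serves a reconnect after a dropped socket and a cold start after days offline, which is why the push path can afford to be lossy.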

Storage design depends on conversation shape

The storage layer should be optimized around conversation reads and append-heavy writes.

A typical message table or distributed storage model might partition by conversation ID and sort by message sequence. That makes it efficient to fetch recent messages for a conversation and paginate backward through history.
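As an illustration, here is that layout in SQLite using Python's built-in `sqlite3` module. The table and column names are my own assumptions. The composite primary key `(conversation_id, seq)` plays the role of "partition by conversation, sort by sequence", and `page_before` is a hypothetical helper that paginates backward through history.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE messages (
        conversation_id TEXT NOT NULL,
        seq INTEGER NOT NULL,
        sender_id TEXT NOT NULL,
        body TEXT NOT NULL,
        PRIMARY KEY (conversation_id, seq)
    )
    """
)
conn.executemany(
    "INSERT INTO messages VALUES (?, ?, ?, ?)",
    [("c1", seq, "alice", f"message {seq}") for seq in range(1, 8)],
)


def page_before(conversation_id, before_seq, limit=3):
    """Fetch the `limit` newest messages older than `before_seq`, newest first."""
    return conn.execute(
        "SELECT seq, body FROM messages "
        "WHERE conversation_id = ? AND seq < ? "
        "ORDER BY seq DESC LIMIT ?",
        (conversation_id, before_seq, limit),
    ).fetchall()
```

The client passes the lowest sequence number it already holds as `before_seq` to load the next older page, so pagination stays stable even while new messages arrive.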

For small systems, a relational database can work well. For very large systems, wide-column stores or distributed databases become more attractive because chat messages are append-heavy and naturally partitioned by conversation.

The important part is access pattern clarity.

Most chat reads ask for recent messages in one conversation. Most writes append new messages. Designing around that pattern is more useful than choosing a fashionable database prematurely.

Group chat changes the fanout problem

One-to-one chat is relatively straightforward. Group chat introduces fanout.

If Alice sends a message to a group with five users, the system can deliver it directly to each active user. If Alice sends a message to a group with one million users, direct fanout becomes expensive.

That difference creates an important design split.

Fanout on write means the system creates delivery records or updates inboxes for every recipient when the message is sent. This makes reads fast but writes expensive.

Fanout on read means the system stores the message once and lets recipients fetch it when they open the conversation. This makes writes cheap but reads more expensive.

Strong candidates discuss this trade-off instead of pretending one approach works for every chat shape.
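The trade-off is easier to see side by side. The two classes below are toy in-memory models with hypothetical names; the `writes` counter exists only to make the cost of each send visible.

```python
from collections import defaultdict


class FanoutOnWrite:
    """Each send copies the message into every member's inbox."""

    def __init__(self, members):
        self.members = members            # conversation_id -> set of user ids
        self.inboxes = defaultdict(list)  # user_id -> list of (conversation_id, body)
        self.writes = 0

    def send(self, conversation_id, body):
        for user in self.members[conversation_id]:
            self.inboxes[user].append((conversation_id, body))
            self.writes += 1

    def read(self, user):
        # Cheap read: the inbox is already materialized.
        return list(self.inboxes[user])


class FanoutOnRead:
    """Each send is stored once; readers assemble their view on demand."""

    def __init__(self, members):
        self.members = members
        self.log = defaultdict(list)      # conversation_id -> list of bodies
        self.writes = 0

    def send(self, conversation_id, body):
        self.log[conversation_id].append(body)
        self.writes += 1

    def read(self, user):
        # More expensive read: scan every conversation the user belongs to.
        return [
            (cid, body)
            for cid, users in self.members.items()
            if user in users
            for body in self.log[cid]
        ]
```

With a three-member group, one send costs three writes in the first model and one in the second. Scale the group to a million members and the gap explains why large channels tend toward fanout on read or a hybrid.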

Read receipts and unread counts are deceptively difficult

Unread counts sound simple until multiple devices enter the picture.

A user reads messages on their phone, then opens the desktop app. The desktop should not show those messages as unread. Another device may be offline and later reconnect with stale read state. Group chats may need per-user read positions. Large channels may not support exact read receipts for every member because the write amplification becomes too high.

A clean design stores each user’s read cursor per conversation.

Instead of marking every message as read individually, the system records the highest sequence number the user has read.

Unread count can then be approximated or computed as the difference between the latest conversation sequence and the last read sequence, with adjustments for deletes or filtered messages if needed.
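A minimal sketch of the cursor approach, assuming gap-free sequence numbers and ignoring deletes. `ReadState` is a hypothetical name. Taking the maximum on every update is what keeps a stale device from moving the cursor backward.

```python
class ReadState:
    def __init__(self):
        # (user_id, conversation_id) -> highest sequence number read
        self.cursors = {}

    def mark_read(self, user_id, conversation_id, seq):
        key = (user_id, conversation_id)
        # Cursors only move forward, so a late update from a stale device is a no-op.
        self.cursors[key] = max(self.cursors.get(key, 0), seq)

    def unread_count(self, user_id, conversation_id, latest_seq):
        cursor = self.cursors.get((user_id, conversation_id), 0)
        return max(0, latest_seq - cursor)
```

For example, if Bob's phone reports that he read up to 40 and his laptop later reconnects claiming 25, the cursor stays at 40, and a conversation whose latest sequence is 42 shows two unread messages on both devices.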

This approach is much simpler than storing per-message read rows for every user in every conversation.

Again, the best design is often the one that avoids unnecessary per-message explosion.

Presence should not be treated as perfectly accurate

Presence is another area where candidates often overpromise.

Users expect to see online, offline, typing, or last seen indicators. These signals are useful, but they are not financial records. They can be eventually consistent and approximate.

A presence service can track active WebSocket connections with heartbeats. If a client stops sending heartbeats, the system eventually marks the user offline. Typing indicators can be sent as ephemeral events that expire quickly.
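Heartbeat-based presence fits in a dozen lines. This sketch uses a hypothetical `PresenceTracker` with an assumed 30-second timeout, and it takes `now` as a parameter so the behavior is easy to reason about. Nothing here is written to durable storage, which is the point.

```python
class PresenceTracker:
    def __init__(self, ttl_seconds=30.0):
        self.ttl = ttl_seconds
        self.last_heartbeat = {}  # user_id -> time of the most recent heartbeat

    def heartbeat(self, user_id, now):
        self.last_heartbeat[user_id] = now

    def is_online(self, user_id, now):
        last = self.last_heartbeat.get(user_id)
        # A user silently becomes offline once the heartbeat is older than the TTL.
        return last is not None and now - last <= self.ttl
```

There is no explicit "go offline" write. A client that crashes or loses signal simply ages out, which is why the indicator can lag reality by up to one TTL.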

The mistake is storing every presence update durably or trying to make presence globally consistent.

Presence is a soft state.

If presence is slightly stale, the product still works. If message storage is wrong, the product loses trust. Good System Design separates these reliability levels clearly.

Idempotency protects against duplicate sends

Mobile clients retry aggressively. Users double-tap send. Network timeouts hide successful writes.

Without idempotency, the same message may appear twice.

The client should generate a unique client message ID once per message and reuse that same ID on every retry of that message. The backend should store that ID scoped to the conversation and sender. If the same client message ID arrives again, the server returns the already-created message instead of creating a duplicate.
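A sketch of idempotent send handling, using a hypothetical `MessageService` backed by in-memory dictionaries. In a real database the lookup and the insert would need to be atomic, typically through a unique constraint on the same three-part key; that part is assumed rather than shown.

```python
class MessageService:
    def __init__(self):
        self.history = {}       # conversation_id -> list of stored messages
        self.by_client_id = {}  # (conversation_id, sender_id, client_msg_id) -> message

    def send(self, conversation_id, sender_id, client_msg_id, body):
        key = (conversation_id, sender_id, client_msg_id)
        if key in self.by_client_id:
            # A retry of a message we already accepted: return the original.
            return self.by_client_id[key]
        messages = self.history.setdefault(conversation_id, [])
        message = {"seq": len(messages) + 1, "sender_id": sender_id, "body": body}
        messages.append(message)
        self.by_client_id[key] = message
        return message
```

A retry after a timeout gets back the original message with its original sequence number, so the client can reconcile its temporary local ID without ever showing a duplicate.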

This is a small detail, but it matters.

Many chat duplicates in real systems come from retry behavior rather than obvious bugs. Idempotent send handling makes retries safe.

Attachments should not flow through the message service

A common design mistake is sending large files through the same path as text messages.

Attachments should usually go through object storage.

The client requests an upload URL, uploads the file directly to storage, and then sends a chat message containing a reference to the uploaded object. The message service stores metadata, not the entire file payload.
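The three steps look roughly like this. `ObjectStore` is a hypothetical in-memory stand-in for an object storage service, the URL is a placeholder, and `store.put` stands in for the HTTP upload the client would make directly against that URL.

```python
import uuid


class ObjectStore:
    """Hypothetical stand-in for an object storage service."""

    def __init__(self):
        self.blobs = {}

    def create_upload_url(self):
        object_key = uuid.uuid4().hex
        # Placeholder URL; a real service would return a signed, expiring URL.
        return object_key, f"https://storage.example/upload/{object_key}"

    def put(self, object_key, data):
        self.blobs[object_key] = data


def send_attachment(store, messages, sender_id, filename, data):
    # 1. Ask for an upload location.
    object_key, _upload_url = store.create_upload_url()
    # 2. Upload the bytes directly to storage, bypassing the message service.
    store.put(object_key, data)
    # 3. Send a chat message that carries only a reference and metadata.
    message = {
        "sender_id": sender_id,
        "attachment": {
            "object_key": object_key,
            "filename": filename,
            "size_bytes": len(data),
        },
    }
    messages.append(message)
    return message
```

The stored message stays a few hundred bytes no matter how large the file is.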

This keeps the message write path lightweight.

This design also allows asynchronous virus scanning, thumbnail generation, video transcoding, and content moderation without slowing down normal message delivery.

Search and moderation should be asynchronous

Users eventually want to search. Platforms eventually need moderation.

Neither should block the main send path unless the product has strict safety requirements requiring pre-send enforcement.

A practical system emits message-created events into a stream. Search indexers consume those events and update the search infrastructure. Moderation systems can scan content asynchronously and mark messages for review, removal, or restricted visibility.
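To make the shape concrete, here is a toy event stream with two independent consumers. `Stream`, `Consumer`, and the one-word `BLOCKLIST` are all hypothetical; a real deployment would use a log-based broker and far richer moderation logic. What matters is that the send path only appends an event, and each consumer advances its own offset.

```python
class Stream:
    """Append-only event log; each consumer tracks its own offset."""

    def __init__(self):
        self.events = []

    def publish(self, event):
        self.events.append(event)


class Consumer:
    def __init__(self, stream, handler):
        self.stream = stream
        self.handler = handler
        self.offset = 0

    def poll(self):
        # Process every event published since the last poll.
        while self.offset < len(self.stream.events):
            self.handler(self.stream.events[self.offset])
            self.offset += 1


search_index = {}     # word -> set of message seqs
flagged = set()       # seqs queued for human review
BLOCKLIST = {"spam"}  # hypothetical moderation rule


def index_message(event):
    for word in event["body"].lower().split():
        search_index.setdefault(word, set()).add(event["seq"])


def moderate_message(event):
    if BLOCKLIST & set(event["body"].lower().split()):
        flagged.add(event["seq"])


stream = Stream()
indexer = Consumer(stream, index_message)
moderator = Consumer(stream, moderate_message)
```

Until `indexer.poll()` runs, a published message is durable but not yet searchable. A slow or failed indexer delays search without touching message delivery.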

This creates eventual consistency.

A newly sent message may not appear in search instantly. That is usually acceptable.

The key is being explicit about which user experiences require immediate correctness and which can tolerate delay.

Observability matters because chat failures are subtle

Chat systems often fail in ways users notice emotionally before metrics look catastrophic.

A message arrives late. A read receipt does not update. A typing indicator freezes. One device shows a message while another does not. A group conversation appears out of order.

These problems damage trust quickly.

The system needs strong observability around send latency, WebSocket disconnects, message persistence errors, delivery lag, sync failures, queue depth, fanout delays, and unread count drift.

This is where production experience changes the design conversation.

A chat system is not only judged by whether messages eventually arrive. It is judged by whether users feel the conversation is reliable.

Scaling should follow measured pressure

I generally distrust chat designs that start with global active-active replication, five queues, three databases, and fully distributed presence before the core message lifecycle is clear.

Start with a single-region architecture that has durable message storage, stateless WebSocket servers, sequence-based sync, and an event bus for fanout. Then scale the parts that show measurable pressure.

If WebSocket connections grow, add more connection servers and shard users across them. If message writes grow, partition by conversation ID. If group fanout grows, introduce hybrid fanout strategies. If search volume grows, scale indexing independently. If global latency becomes painful, introduce regional connection edges while keeping conversation ownership clear.
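Partitioning by conversation ID can start as a stable hash. This is a minimal sketch with a hypothetical `partition_for` helper. It uses SHA-256 rather than Python's built-in `hash()`, because string hashes from `hash()` are randomized per process and would route the same conversation differently on different servers.

```python
import hashlib


def partition_for(conversation_id: str, num_partitions: int) -> int:
    """Map a conversation to a partition, identically on every server."""
    digest = hashlib.sha256(conversation_id.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_partitions
```

Every message in a conversation lands on the same partition, which keeps per-conversation sequencing local to one writer. Plain modulo does remap most keys when the partition count changes, so consistent hashing or a lookup table is the usual next step once resharding becomes routine.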

The architecture should evolve from the workload.

That is usually the difference between a design that sounds impressive and a design that could survive production.

What interviewers are actually looking for

Chat System Design interviews are rarely about inventing WhatsApp in forty-five minutes.

They are evaluating whether you understand durable messaging under unreliable networks.

Can you separate persistence from delivery?
Can you reason about ordering?
Can you handle offline users?
Can you avoid duplicate messages?
Can you scale group fanout?
Can you explain eventual consistency without hiding behind buzzwords?

Those questions matter more than drawing a large number of boxes.

Strong candidates usually sound practical. They do not pretend that WebSockets solve everything. They explain failure recovery, sequence numbers, reconnection, idempotency, and storage access patterns clearly.

Final thoughts

Chat systems are deceptively difficult because the user experience hides the distributed-systems complexity underneath.

The interface is simple. The guarantees are not.

A good design treats messages as durable facts, WebSockets as temporary delivery channels, sequence numbers as ordering boundaries, sync APIs as the recovery mechanism, and asynchronous pipelines as support systems for search, moderation, analytics, and notifications.

That mindset keeps the architecture understandable.

Most candidates fail chat System Design interviews by optimizing the live path and ignoring recovery. In production, recovery is the system. Networks drop, clients retry, devices reconnect, and users still expect their conversations to remain correct.

That is why chat System Design is such a useful interview problem. It reveals whether someone understands that real-time systems are not just about speed. They are about preserving trust when the network stops cooperating.
