How to design a real-time chat system

· Category: System Design

Short answer

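A real-time chat system delivers messages between users with low latency by holding persistent connections open to clients and routing messages through a message broker, while storing them durably for history and offline delivery.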

Steps

  1. Establish persistent connections using WebSockets, falling back to long polling for browser clients or networks where WebSockets are unavailable.
  2. Authenticate connections and maintain user presence status.
  3. Route messages through a message broker or pub-sub system.
  4. Store messages durably for offline users and chat history.
  5. Deliver messages to all participants in a conversation.
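
The steps above can be sketched in a single process. This is a toy model under stated assumptions, not a production design: ChatServer, its method names, and the "token-for-<user>" check are all hypothetical, and a plain Python list stands in for each user's socket.

```python
from collections import defaultdict
from dataclasses import dataclass, field


@dataclass
class ChatServer:
    """Toy single-process stand-in for the gateway, broker, and message store."""
    members: dict = field(default_factory=lambda: defaultdict(set))   # conversation -> user IDs
    online: dict = field(default_factory=dict)                        # user ID -> inbox (stands in for a socket)
    history: dict = field(default_factory=lambda: defaultdict(list))  # conversation -> stored messages

    def connect(self, user_id, token):
        # Step 2: authenticate, then mark the user as present.
        # Hypothetical check; a real system would verify a signed token.
        if token != f"token-for-{user_id}":
            raise PermissionError("invalid token")
        self.online[user_id] = []
        return self.online[user_id]

    def disconnect(self, user_id):
        # Dropping the inbox is how this sketch models "offline".
        self.online.pop(user_id, None)

    def join(self, conversation_id, user_id):
        self.members[conversation_id].add(user_id)

    def send(self, conversation_id, sender, text):
        message = {"conversation": conversation_id, "from": sender, "text": text}
        # Step 4: store first so offline users can fetch it later.
        self.history[conversation_id].append(message)
        # Steps 3 and 5: fan out to every other participant who is connected.
        for user_id in self.members[conversation_id]:
            inbox = self.online.get(user_id)
            if inbox is not None and user_id != sender:
                inbox.append(message)
        return message
```

In a real deployment the three dictionaries would likely be separate services (a connection gateway, a broker, and a database), but the order of operations, store then fan out, stays the same.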

Tips

  • Use connection gateways to manage millions of concurrent sockets.
  • Assign per-conversation sequence numbers so clients can order messages, and deduplicate on a client-generated message ID so retried sends are idempotent.
  • Support typing indicators, read receipts, and presence as separate lightweight events.
  • Shard conversations to distribute load.
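
One possible way to get ordering and deduplication is a per-conversation sequence number plus a client-generated message ID. The sketch below assumes a single writer per conversation; ConversationLog, OrderedReceiver, and the client_msg_id field are illustrative names, not part of any specific library.

```python
class ConversationLog:
    """Assigns per-conversation sequence numbers and drops duplicate sends."""

    def __init__(self):
        self.messages = []  # stored in sequence order
        self.seen = {}      # client_msg_id -> assigned sequence number

    def append(self, client_msg_id, sender, text):
        # A retried send carries the same client_msg_id, so return the
        # original sequence number instead of storing the message twice.
        if client_msg_id in self.seen:
            return self.seen[client_msg_id]
        seq = len(self.messages) + 1
        self.messages.append(
            {"seq": seq, "id": client_msg_id, "from": sender, "text": text}
        )
        self.seen[client_msg_id] = seq
        return seq


class OrderedReceiver:
    """Client-side buffer that releases messages strictly in sequence order."""

    def __init__(self):
        self.next_seq = 1
        self.pending = {}    # seq -> message that arrived early
        self.delivered = []  # messages released to the UI, in order

    def receive(self, message):
        if message["seq"] < self.next_seq:
            return  # already delivered; ignore the duplicate
        self.pending[message["seq"]] = message
        # Release every message that is now contiguous with what was delivered.
        while self.next_seq in self.pending:
            self.delivered.append(self.pending.pop(self.next_seq))
            self.next_seq += 1
```

Keeping the sequence scoped to one conversation avoids needing a global order across the whole system, which is what makes sharding by conversation practical.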

Common issues

  • Connection drops causing missed messages during reconnection.
  • Message ordering violations in distributed brokers.
  • Presence storms when many users change status simultaneously.
  • Storage costs growing with unlimited message history retention.
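
For the first issue, a common mitigation is to have the client report the last sequence number it saw when it reconnects, and have the server replay anything newer. A minimal sketch, assuming dense 1-based sequence numbers per conversation; ReplayBuffer and its methods are hypothetical names.

```python
class ReplayBuffer:
    """Keeps stored messages per conversation so a reconnecting client can catch up."""

    def __init__(self):
        self.log = {}  # conversation_id -> list of (seq, text), seq starting at 1

    def append(self, conversation_id, text):
        entries = self.log.setdefault(conversation_id, [])
        seq = len(entries) + 1
        entries.append((seq, text))
        return seq

    def since(self, conversation_id, last_seen_seq):
        # Sequence numbers are dense and 1-based, so slicing at the last
        # acknowledged number returns exactly the messages the client missed.
        return self.log.get(conversation_id, [])[last_seen_seq:]


buffer = ReplayBuffer()
for text in ("first", "second", "third"):
    buffer.append("room-1", text)

# The client saw message 1 before its connection dropped.
missed = buffer.since("room-1", last_seen_seq=1)
```

A retention limit on the buffer would also bound the storage growth noted in the last issue, at the cost of forcing very stale clients to reload history from the durable store.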

Example

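# Modulo hashing: map a key (user or conversation ID) to one of N nodes
import hashlib

def get_node(key, nodes):
    # MD5 is used only for a stable, well-spread hash, not for security.
    hash_val = int(hashlib.md5(key.encode()).hexdigest(), 16)
    return nodes[hash_val % len(nodes)]

node = get_node('user-123', ['node-a', 'node-b', 'node-c'])

This snippet uses modulo hashing to assign each key to a node, which is one simple way to shard users or conversations across servers. It is not consistent hashing: because the result depends on len(nodes), adding or removing a node remaps most keys. A hash ring avoids that by moving only the keys adjacent to the changed node.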
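
A hash ring with virtual nodes is the usual way to get the property that adding a node moves only a fraction of the keys. The following is a minimal sketch under that assumption; HashRing, the vnodes parameter, and the "node#i" labeling scheme are illustrative choices rather than a standard API.

```python
import bisect
import hashlib


def _hash(key):
    # Stable across processes, unlike Python's built-in hash() for strings.
    return int(hashlib.md5(key.encode()).hexdigest(), 16)


class HashRing:
    def __init__(self, nodes, vnodes=100):
        # Each node gets `vnodes` points on the ring to even out the spread.
        points = sorted(
            (_hash(f"{node}#{i}"), node) for node in nodes for i in range(vnodes)
        )
        self._hashes = [h for h, _ in points]
        self._nodes = [n for _, n in points]

    def get_node(self, key):
        # Pick the first ring point clockwise from the key's hash,
        # wrapping around to the start of the ring at the end.
        idx = bisect.bisect_right(self._hashes, _hash(key)) % len(self._hashes)
        return self._nodes[idx]


ring = HashRing(['node-a', 'node-b', 'node-c'])
node = ring.get_node('user-123')
```

When a fourth node joins, every key that changes owner moves to the new node and all other assignments stay put, so existing connections and caches on the other nodes are left undisturbed.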
