Real-time collaboration—multiple users editing the same document with live cursors and updates—is table stakes in 2026. From Google Docs and Notion to Figma and code editors, users expect to see each other’s changes almost instantly. This guide walks through the core requirements, the two main consistency approaches (OT and CRDTs), and how to scale the transport and persistence layers. It fits alongside other system design topics like designing a rate limiter for protecting your collaboration API and designing a distributed cache for session and presence data.
Requirements
Before choosing algorithms and infrastructure, lock in what you need:
Low latency — Typing and cursor moves should feel instant. End-to-end latency from “user A types” to “user B sees” should be under 100–200 ms for a good experience. That constrains where you run logic (edge vs central) and how you encode and transmit operations.
Consistency — All users must eventually see the same document state. When two users edit the same paragraph, the result must be deterministic and mergeable. No conflicting edits that leave the document in an undefined state. The two main ways to achieve this in 2026 are Operational Transformation (OT) and CRDTs.
Presence — Users expect to see who else is in the doc, where their cursors are, and optionally what they have selected. Presence is separate from content: it is high-frequency (cursor moves often), can be throttled, and is often sent on a different channel or at lower priority so it does not block content ops.
Persistence — The document must be saved and recoverable. You need snapshots (full state at a point in time) and an operation log so you can replay and reconstruct history. Periodic snapshots reduce the cost of replay; compaction keeps the log from growing forever.
Scale — You must support many documents, many users per document, and many concurrent connections. That implies sharding (by document or by connection), horizontal scaling of WebSocket servers, and a pub/sub or coordination layer so that all clients editing the same doc receive the same stream of ops.
If you are building APIs that need to handle high concurrency, see rate limiter design and my services for API development.
Two Main Approaches (2026)
The core challenge is merging concurrent edits. Two families of solutions dominate in 2026.
Operational Transformation (OT) — The server maintains a single linear history of operations. When a client sends an op, the server transforms it against any ops it has applied since the state the client last saw, then applies and broadcasts it. Every client receives the same transformed op sequence, so everyone converges to the same state. Google Docs and many document editors use OT. Advantages: strong consistency, a single source of truth, and predictable behaviour. Disadvantages: the server is a bottleneck, transformation logic is complex (especially for rich text and nested structures), and offline support is harder because clients depend on the server to transform.
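To make the transform step concrete, here is a minimal sketch for the simplest case: two concurrent plain-text inserts. The function names and op shape (`(position, text)` tuples) are illustrative assumptions, not Google Docs’ actual API; real OT also handles deletes, rich text, and tie-breaking rules.

```python
# Minimal illustration of operational transformation for plain-text
# inserts. Each op is (position, text). When two clients insert
# concurrently, the server transforms the later-arriving op so both
# replicas converge. Names here are illustrative, not a real API.

def transform_insert(op, against):
    """Shift `op` past a concurrent insert `against` that was applied first."""
    pos, text = op
    other_pos, other_text = against
    if other_pos <= pos:
        return (pos + len(other_text), text)
    return (pos, text)

def apply_insert(doc, op):
    pos, text = op
    return doc[:pos] + text + doc[pos:]

doc = "helloworld"
a = (5, " brave ")   # user A inserts at index 5
b = (10, "!")        # user B concurrently inserts at index 10

# Server applies A first, then transforms B against A before applying.
state = apply_insert(doc, a)
state = apply_insert(state, transform_insert(b, a))
print(state)  # -> "hello brave world!"
```

Without the transform, B’s op would land at index 10 of the already-changed document and corrupt it; the transform shifts its position by the length of A’s insert.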
CRDTs (Conflict-free Replicated Data Types) — Operations are designed so that two replicas converge to the same state no matter the order in which they receive ops. There is no central arbiter; replicas can merge independently. Used in Figma, some code collaboration tools, and offline-first apps. Advantages: no single point of failure, good for P2P and multi-region, and natural offline support. Disadvantages: replicas are only eventually consistent (they can briefly differ), and the choice of CRDT (e.g. RGA for sequences, or libraries like Yjs and Automerge) is critical for performance and correctness.
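One of the simplest CRDTs shows the convergence property: a last-writer-wins (LWW) map, where each entry carries a `(timestamp, replica_id)` pair and merges keep the highest one. This is a toy sketch to show why merge order does not matter; it is not how Yjs or Automerge represent text sequences.

```python
# Last-writer-wins (LWW) map, one of the simplest CRDTs. Each key
# stores (timestamp, replica_id, value); merge keeps the entry with
# the highest (timestamp, replica_id), so any two replicas converge
# regardless of the order in which merges happen.

def lww_set(state, key, value, ts, replica):
    entry = (ts, replica, value)
    if key not in state or entry[:2] > state[key][:2]:
        state[key] = entry
    return state

def lww_merge(a, b):
    merged = dict(a)
    for key, entry in b.items():
        if key not in merged or entry[:2] > merged[key][:2]:
            merged[key] = entry
    return merged

r1 = lww_set({}, "title", "Draft", ts=1, replica="A")
r2 = lww_set({}, "title", "Final", ts=2, replica="B")

# Merge in either order: the result is identical (convergence).
assert lww_merge(r1, r2) == lww_merge(r2, r1)
print(lww_merge(r1, r2)["title"][2])  # -> "Final"
```

The replica id acts as a deterministic tie-breaker when timestamps collide, which is what makes the merge a true function of the set of ops rather than their arrival order.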
In 2026, OT is still common for “single source of truth” document apps where you control the client and server. CRDTs are common for offline-first, multi-region, or decentralised products. Choose based on whether you prioritise strong consistency and central control (OT) or availability and offline (CRDTs). For more on scaling stateful systems, see distributed cache design.
Architecture Building Blocks
Transport — WebSockets (or WebTransport in 2026) provide low-latency bidirectional messaging. Use them for ops and presence. Have a fallback to long polling or HTTP for environments where WebSockets are blocked. Compress payloads (e.g. MessagePack, or a custom binary format) to reduce bandwidth and latency.
Connection routing — All clients editing the same document must receive the same stream of ops. Options: sticky sessions (same connection always hits the same server, and that server fans out to clients on that doc), or a pub/sub layer (Redis Streams, Kafka, or a managed pub/sub). When a server receives an op, it publishes to the doc’s channel; all servers subscribed to that channel forward to their local clients. Pub/sub decouples “which server the user is on” from “which doc they are editing,” which simplifies scaling.
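The routing logic above can be modelled in a few lines. This in-process sketch stands in for the real broker (Redis Streams, Kafka, or a managed pub/sub); the class and method names are assumptions for illustration.

```python
# In-process model of per-document pub/sub fan-out. Each WebSocket
# server subscribes to the channels of the docs its clients have open;
# publishing an op to a doc channel delivers it to every subscriber.
# In production the broker would be Redis Streams, Kafka, etc.

from collections import defaultdict

class DocBroker:
    def __init__(self):
        self.subscribers = defaultdict(list)  # doc_id -> list of callbacks

    def subscribe(self, doc_id, callback):
        self.subscribers[doc_id].append(callback)

    def publish(self, doc_id, op):
        for callback in self.subscribers[doc_id]:
            callback(op)

broker = DocBroker()
server_a_inbox, server_b_inbox = [], []

# Two WebSocket servers each have a client editing doc-42.
broker.subscribe("doc-42", server_a_inbox.append)
broker.subscribe("doc-42", server_b_inbox.append)

broker.publish("doc-42", {"type": "insert", "pos": 0, "text": "hi"})
print(len(server_a_inbox), len(server_b_inbox))  # -> 1 1
```

The key property is that the publisher never knows (or cares) which servers hold which connections; the channel name is the only coupling.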
Presence — Use a separate channel or message type for “cursor moved,” “user joined,” “user left,” and optional “selection changed.” Throttle presence updates (e.g. at most 10 cursor updates per second per user) so they do not overwhelm the network. Send presence at lower priority than content ops so that typing is never delayed by cursor broadcasts.
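A minimal sketch of the throttling rule, assuming dropped updates are acceptable because the next forwarded update carries the latest cursor position anyway (names are illustrative):

```python
# Time-based throttle for presence updates: forward at most
# `max_per_sec` cursor updates per user and drop the rest. Dropping is
# safe because cursor position is absolute, not incremental.

class PresenceThrottle:
    def __init__(self, max_per_sec=10):
        self.min_interval = 1.0 / max_per_sec
        self.last_sent = {}  # user_id -> timestamp of last forwarded update

    def allow(self, user_id, now):
        last = self.last_sent.get(user_id)
        if last is None or now - last >= self.min_interval:
            self.last_sent[user_id] = now
            return True
        return False

throttle = PresenceThrottle(max_per_sec=10)
# 5 cursor moves within 50 ms: only the first is forwarded.
sent = [throttle.allow("alice", t) for t in (0.00, 0.01, 0.02, 0.03, 0.04)]
print(sent)  # -> [True, False, False, False, False]
```

Note this works because presence is idempotent state ("cursor is at X"), unlike content ops, which must never be dropped.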
Persistence — Store snapshots (full document state) periodically (e.g. every N ops or every M minutes) and an op log from the last snapshot. On load, fetch the latest snapshot and replay the log. Compress and archive old log segments. Consider incremental snapshots or differential encoding to reduce storage and replay cost. For more on storing and serving hot data at scale, see distributed cache design.
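The snapshot-plus-log cycle can be sketched as follows. Storage here is in-memory and the op format reuses the plain-text insert from earlier; a real store would write snapshots and log segments to durable storage, and `SNAPSHOT_EVERY` is an illustrative threshold.

```python
# Snapshot + op-log persistence sketch: take a full snapshot every
# N ops; on load, start from the latest snapshot and replay only the
# ops logged after it. Compaction moves the base snapshot forward.

SNAPSHOT_EVERY = 100

def apply_op(doc, op):
    pos, text = op
    return doc[:pos] + text + doc[pos:]

class DocStore:
    def __init__(self):
        self.snapshot = ("", 0)   # (state, seq of last op included)
        self.log = []             # ops after the snapshot, in order

    def append(self, op):
        self.log.append(op)
        if len(self.log) >= SNAPSHOT_EVERY:
            self.compact()

    def compact(self):
        """Fold the log into a new base snapshot, then drop the log."""
        state, seq = self.snapshot
        for op in self.log:
            state = apply_op(state, op)
        self.snapshot = (state, seq + len(self.log))
        self.log = []

    def load(self):
        """Latest snapshot plus replay of the remaining log."""
        state, _ = self.snapshot
        for op in self.log:
            state = apply_op(state, op)
        return state

store = DocStore()
store.append((0, "hello"))
store.append((5, " world"))
print(store.load())  # -> "hello world"
```

Replay cost is bounded by `SNAPSHOT_EVERY` rather than document age, which is the point of compaction.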
Auth and authorization — Validate the user’s token or session when they connect. Enforce per-document read/write permissions at the API layer (e.g. when they request the doc or send the first op). Do not trust the client for access control.
Scaling in 2026
Edge and regional servers — Run WebSocket servers close to users (edge or regional POPs). Ops and presence can be routed to a central coordinator or via pub/sub so that all regions see the same stream. Latency improves because the first hop is local; the rest of the path (pub/sub, persistence) can be central or replicated.
Doc sharding — Each document’s traffic is independent. Scale by adding more WebSocket servers and ensuring the pub/sub layer can handle the number of doc channels. No need to shard a single doc’s state across nodes for typical doc sizes; one logical “doc channel” per document is enough.
Rate limiting — Limit ops per connection and per document so that one buggy or malicious client cannot flood the system. See rate limiter design for patterns. Apply limits at the WebSocket server or at the pub/sub ingress.
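A token bucket is a common fit for per-connection limits because it allows short typing bursts while capping sustained throughput. A minimal sketch, with illustrative names and parameters:

```python
# Per-connection token bucket: each connection holds up to `capacity`
# tokens, refilled at `rate` tokens per second; each op consumes one.
# A flooding client exhausts its tokens and its ops are rejected.

class TokenBucket:
    def __init__(self, capacity, rate):
        self.capacity = capacity
        self.rate = rate          # tokens added per second
        self.tokens = capacity
        self.updated = 0.0

    def try_consume(self, now):
        # Refill based on elapsed time, capped at capacity.
        elapsed = now - self.updated
        self.tokens = min(self.capacity, self.tokens + elapsed * self.rate)
        self.updated = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False

bucket = TokenBucket(capacity=3, rate=1)  # burst of 3, then 1 op/sec
results = [bucket.try_consume(0.0) for _ in range(4)]
print(results)  # -> [True, True, True, False]
```

In production the `now` argument would come from a monotonic clock, and one bucket would exist per connection (and optionally a larger one per document).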
Compression and encoding — Use binary or compact encoding for ops (MessagePack, Protocol Buffers, or a custom format). Reduces bandwidth and serialisation cost. For text, consider delta encoding (send only the diff) or run-length encoding for long unchanged spans.
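To show the size difference, here is a JSON-encoded op next to a compact binary encoding built with the stdlib `struct` module as a stand-in for MessagePack or Protocol Buffers. The wire schema (one type byte, a 32-bit position, a length-prefixed UTF-8 payload) is an assumption for illustration, not a standard format.

```python
# Comparing a JSON-encoded op with a compact binary encoding.
# Schema: B = op type, I = 32-bit position, H = payload length,
# followed by the UTF-8 bytes of the inserted text.

import json
import struct

OP_INSERT = 1
HEADER = "!BIH"                      # network byte order, 7 bytes total

def encode_binary(pos, text):
    payload = text.encode("utf-8")
    return struct.pack(HEADER, OP_INSERT, pos, len(payload)) + payload

def decode_binary(data):
    op_type, pos, length = struct.unpack_from(HEADER, data)
    offset = struct.calcsize(HEADER)
    text = data[offset:offset + length].decode("utf-8")
    return op_type, pos, text

op_json = json.dumps({"type": "insert", "pos": 1024, "text": "hi"}).encode()
op_bin = encode_binary(1024, "hi")

print(len(op_json), len(op_bin))     # binary is several times smaller
assert decode_binary(op_bin) == (OP_INSERT, 1024, "hi")
```

At thousands of ops per second per document, that per-op saving compounds into meaningful bandwidth and serialisation cost.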
Common Pitfalls
Ordering — In OT, the server must apply ops in a consistent order and transform incoming ops against that order. In CRDTs, ops must be commutative or ordered (e.g. by vector clocks or hybrid logical clocks). Getting ordering wrong leads to divergent state or lost updates.
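The vector-clock comparison mentioned above can be sketched directly; the helper names are illustrative. A clock "happened before" another iff every component is less than or equal and at least one is strictly less; incomparable clocks mark concurrent ops, which the merge rule must then resolve deterministically.

```python
# Vector clocks: each replica keeps a counter per replica id.
# happened_before(a, b) is the partial order; clocks that are
# incomparable in both directions identify concurrent ops.

def happened_before(a, b):
    keys = set(a) | set(b)
    return (all(a.get(k, 0) <= b.get(k, 0) for k in keys)
            and any(a.get(k, 0) < b.get(k, 0) for k in keys))

def concurrent(a, b):
    return not happened_before(a, b) and not happened_before(b, a)

v1 = {"A": 2, "B": 1}
v2 = {"A": 3, "B": 1}   # v2 has seen everything v1 has, and more
v3 = {"A": 1, "B": 2}   # independent progress on replica B

print(happened_before(v1, v2))  # -> True
print(concurrent(v2, v3))       # -> True
```

Hybrid logical clocks serve the same role with a physical-time component, which keeps timestamps close to wall-clock time while preserving causality.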
Presence overload — Cursor updates can be very frequent. Throttle and batch them; do not send every mousemove. Use a separate, lower-priority channel so presence never blocks content.
Snapshot cost — Full snapshots of large documents can be expensive. Schedule them during low activity, or use incremental snapshots. Replay cost grows with log length; compact the log by moving the “base” snapshot forward.
Failover — If a WebSocket server dies, clients reconnect and may miss ops that were in flight. Use at-least-once delivery (e.g. pub/sub acknowledgments) and idempotent op application so that replaying an op is safe. Clients can request the latest state or ops since a sequence number on reconnect.
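The reconnect flow can be sketched with per-document sequence numbers; the class names are illustrative. The client tracks the last seq it applied, asks for everything after it, and skips duplicates, which is what makes at-least-once delivery safe.

```python
# Resumable op stream: the server assigns each op a monotonically
# increasing sequence number per doc; a reconnecting client sends the
# last seq it applied and receives everything after it. Skipping
# already-seen seqs makes redelivery idempotent.

class OpStream:
    def __init__(self):
        self.ops = []  # index i holds the op with seq i + 1

    def publish(self, op):
        self.ops.append(op)
        return len(self.ops)  # the op's sequence number

    def since(self, last_seq):
        """(seq, op) pairs the client has not seen yet."""
        return list(enumerate(self.ops[last_seq:], start=last_seq + 1))

class Client:
    def __init__(self):
        self.applied_seq = 0
        self.received = []

    def apply(self, seq, op):
        if seq <= self.applied_seq:
            return  # duplicate delivery: safe to ignore (idempotent)
        self.received.append(op)
        self.applied_seq = seq

stream = OpStream()
for op in ("op1", "op2", "op3"):
    stream.publish(op)

client = Client()
client.apply(1, "op1")                # got op1 before disconnecting
for seq, op in stream.since(client.applied_seq):
    client.apply(seq, op)             # catch up after reconnect
print(client.received)  # -> ['op1', 'op2', 'op3']
```

In a real system the op log would be the persistence layer's log from the last snapshot, so "ops since seq N" and "state since snapshot" are served by the same machinery.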
Summary
Design real-time collaboration around OT or CRDTs for consistency, WebSockets for transport, pub/sub for fan-out, and snapshots + op log for persistence. In 2026, edge deployment and efficient encoding make it easier to scale and keep latency low. Add presence on a separate channel with throttling, rate limiting to protect the system, and auth at the boundary. For more on building scalable backends and APIs, see rate limiter design, distributed cache design, and my services. Need help designing or building real-time or API systems? Get in touch.