Ranveer Kumar · Engineering Essays
Frontend Architecture · 19 min read

Designing Real-Time Frontend Systems: WebSockets, Events, Sync, and Streaming UI

Design real-time frontend systems with WebSockets, SSE, event ordering, idempotency, retries, optimistic UI, and resilient streaming UX.

Updated May 23, 2026

A WebSocket connection is not a real-time architecture.

That sentence matters because many frontend designs treat the transport as the system. Once the browser can open a socket and receive messages, the architecture is considered done. Then production arrives: the network drops, the tab sleeps, events arrive twice, events arrive out of order, auth expires, the user performs an optimistic action, the server rejects it, and the UI has to keep making sense.

Real-time frontend system design is the discipline of preserving correctness and user trust while time, network conditions, server state, and local intent are all changing.

This is part 2 of the series. Part 1 covered the shift from component thinking to system thinking. Here the lens gets concrete: WebSockets, Server-Sent Events, polling, event ordering, idempotency, retries, optimistic UI, and streaming user experience.

Real-time UI is not about making data arrive quickly. It is about making change understandable, recoverable, and correct enough for the product domain.

Why This Matters for Senior Frontend Roles

Real-time features expose whether a frontend engineer thinks in components or systems. A mid-level implementation might wire new WebSocket(url) inside a hook and update state when messages arrive. A senior implementation asks harder questions.

What is the event contract? Are events ordered? Are they idempotent? Can the client recover missed messages? What happens when the user has multiple tabs open? How does authentication refresh? Which data can be optimistic? How does the UI reveal degraded mode without causing panic? How do we observe message lag, reconnect storms, duplicate events, and stale renders?

These questions are not overengineering. They are the real system. A chat application, trading dashboard, incident console, collaborative editor, logistics tracker, or AI streaming interface can all look simple in a demo and fail badly in real networks.

Senior frontend engineers are expected to design the client side of that contract. They do not need to own every backend detail, but they must understand enough to shape the contract and defend the user experience.

Problem Framing and Constraints

Before choosing WebSockets, SSE, or polling, define the product behavior.

Ask:

  • Is communication one-way or bidirectional?
  • Does the user need every event, or only the latest state?
  • Can events be replayed after reconnect?
  • Is ordering global, per entity, or irrelevant?
  • What is the acceptable staleness window?
  • Does the UI need optimistic updates?
  • What should happen when the connection is degraded but cached data exists?
  • How are authorization, subscription scope, and tenant boundaries enforced?
  • What telemetry proves the stream is healthy?

The correct transport depends on these answers. Polling can be perfectly acceptable for low-frequency updates. SSE can be a better fit than WebSockets for one-way server updates because it is simpler and works naturally with HTTP semantics. WebSockets are valuable when the client and server need a long-lived bidirectional channel, but they demand more lifecycle discipline.

Architecture Mental Model

Do not let UI components talk directly to a raw socket. That creates tight coupling between transport events and rendering concerns. Instead, put explicit layers between the network and the UI.

The connection manager owns transport lifecycle: connecting, authenticating, heartbeat, reconnect, close, and degraded mode.

The subscription layer maps product surfaces to server topics. A dashboard may subscribe to account summary, incident feed, and notification count separately. Each subscription should have ownership, authorization scope, and cleanup.

The event processor validates messages, deduplicates them, orders them when the contract requires it, and routes them to the correct state boundary.

The state store or server cache owns merge semantics. The UI should read stable state and render user feedback, not parse raw events.

Figure: Real-Time Frontend Architecture. Client intent → connection manager → subscription layer → event processor → state → UI. Raw messages are converted into validated, recoverable product state before rendering.

Choosing the Transport

Transport choice should follow communication semantics.

Use polling when updates are infrequent, exact immediacy is not required, and operational simplicity matters. Polling is easy to cache, easy to debug, and resilient through ordinary HTTP infrastructure. It can become wasteful at high frequency or high scale.

Use SSE when the server pushes one-way updates and the client does not need to send frequent messages on the same channel. SSE has automatic reconnection behavior, fits HTTP deployments better than many teams expect, and works well for status feeds, notifications, progress streams, and AI token streaming.

Use WebSockets when the product needs bidirectional, low-latency communication: collaborative editing, multiplayer interaction, presence, command acknowledgment, or high-frequency dashboards. WebSockets offer flexibility, but the client must own connection state, heartbeat, auth renewal, backoff, subscription replay, and stale event handling.

The senior move is to explain the trade-off, not to treat WebSockets as the mature default.
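
One way to make that trade-off explicit is to write it down as a small decision function. The sketch below is illustrative only: the `RealtimeRequirements` shape, the `chooseTransport` name, and the numeric thresholds are assumptions for this example, not rules from any standard.

```typescript
// Hypothetical requirement shape; the field names are illustrative.
type RealtimeRequirements = {
  bidirectional: boolean; // does the client send frequent messages on the same channel?
  maxStalenessMs: number; // acceptable staleness window for the UI
  updatesPerMinute: number; // rough expected update rate
};

type Transport = "polling" | "sse" | "websocket";

export function chooseTransport(req: RealtimeRequirements): Transport {
  // Bidirectional, low-latency interaction is the case reserved for WebSockets.
  if (req.bidirectional) {
    return "websocket";
  }

  // Infrequent updates with a relaxed freshness budget: polling is the simplest option.
  // These thresholds are assumptions for the sketch and should be tuned per product.
  if (req.updatesPerMinute <= 2 && req.maxStalenessMs >= 30_000) {
    return "polling";
  }

  // Remaining case: one-way server push.
  return "sse";
}
```

The value is less in the function itself than in forcing the team to state the staleness budget and direction of communication before a transport is picked.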

Event Contracts

A real-time event should be self-describing enough for the client to validate, route, deduplicate, and merge it.

export type EventEnvelope<TPayload> = {
  id: string;
  type: string;
  topic: string;
  version: number;
  sequence?: number;
  entityId?: string;
  occurredAt: string;
  replayToken?: string;
  payload: TPayload;
};

export type Subscription = {
  topic: string;
  scope: {
    tenantId: string;
    userId: string;
    role: "viewer" | "operator" | "admin";
  };
  replayFrom?: string;
  onEvent: (event: EventEnvelope<unknown>) => void;
  onDegraded: (reason: string) => void;
};

The id supports deduplication. The topic supports routing. The version allows schema migration. The optional sequence supports ordering when the backend can provide it. The replayToken gives the client a recovery point after reconnect. Without these fields, the client is forced to guess.
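
A client can only rely on these fields if it checks them at the boundary. The following is a minimal validation sketch, assuming events arrive as JSON text and that the client understands a single schema version; `parseEnvelope` and `SUPPORTED_VERSION` are hypothetical names introduced for this example.

```typescript
type EventEnvelope<TPayload> = {
  id: string;
  type: string;
  topic: string;
  version: number;
  sequence?: number;
  entityId?: string;
  occurredAt: string;
  replayToken?: string;
  payload: TPayload;
};

// Assumption for this sketch: the client understands only this schema version.
const SUPPORTED_VERSION = 1;

export function parseEnvelope(raw: string): EventEnvelope<unknown> | null {
  let value: unknown;
  try {
    value = JSON.parse(raw);
  } catch {
    return null; // not JSON at all
  }

  if (typeof value !== "object" || value === null) {
    return null;
  }
  const candidate = value as Record<string, unknown>;

  // Required string fields must be present and non-empty.
  const requiredStrings = ["id", "type", "topic", "occurredAt"];
  const hasStrings = requiredStrings.every(
    (key) => typeof candidate[key] === "string" && candidate[key] !== ""
  );
  if (!hasStrings) return null;

  if (candidate["version"] !== SUPPORTED_VERSION) return null;
  if (candidate["sequence"] !== undefined && !Number.isInteger(candidate["sequence"])) return null;
  if (Number.isNaN(Date.parse(candidate["occurredAt"] as string))) return null;
  if (!("payload" in candidate)) return null;

  // The payload itself stays unknown; each topic handler validates its own shape.
  return candidate as unknown as EventEnvelope<unknown>;
}
```

Returning `null` instead of throwing keeps one malformed message from taking down the stream; the caller can count the failure and move on.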

Lifecycle State Machine

The connection should be modeled as a state machine, even if you implement it with a reducer or a small service object. Loose booleans like isConnected, isLoading, hasError, and shouldReconnect drift quickly.

Figure: WebSocket Lifecycle State Machine. States: idle, connecting, authenticated, subscribed, degraded, reconnecting, closed. Failures do not jump straight to blank UI. They move through degraded and reconnecting states first.

Here is a compact connection manager shape. It is pseudocode, but the boundaries are production-oriented.

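type ConnectionState =
  | { status: "idle" }
  | { status: "connecting"; attempt: number; replayToken?: string }
  | { status: "authenticated"; socket: WebSocket; replayToken?: string }
  | { status: "subscribed"; socket: WebSocket; replayToken?: string }
  | { status: "degraded"; reason: string; replayToken?: string }
  | { status: "reconnecting"; attempt: number; replayToken?: string }
  | { status: "closed"; reason: string };

export class RealtimeConnection {
  private state: ConnectionState = { status: "idle" };
  private subscriptions = new Map<string, Subscription>();

  constructor(
    private readonly createUrl: () => Promise<string>,
    // Returns the delay in milliseconds to wait before the given attempt.
    private readonly scheduleRetry: (attempt: number) => number
  ) {}

  async connect() {
    // reconnect() has already chosen the attempt number; a fresh connect starts at 1.
    const attempt =
      this.state.status === "reconnecting" ? this.state.attempt : 1;
    // Carry the recovery point through the connecting state so it is not lost.
    const replayToken =
      "replayToken" in this.state ? this.state.replayToken : undefined;

    this.state = { status: "connecting", attempt, replayToken };

    let socket: WebSocket;
    try {
      socket = new WebSocket(await this.createUrl());
    } catch {
      // URL creation failed or the URL was invalid: treat it like any other failed attempt.
      this.reconnect("could not create socket");
      return;
    }

    socket.onopen = () => this.authenticate(socket);
    socket.onmessage = (message) => this.handleMessage(message.data);
    // A failure usually fires "error" and then "close"; reconnect() ignores the second call.
    socket.onclose = () => this.reconnect("socket closed");
    socket.onerror = () => this.reconnect("socket error");
  }

  subscribe(subscription: Subscription) {
    // Always remember the subscription; if no socket is open yet, send() is a no-op
    // and authenticate() replays the subscription once the connection is up.
    this.subscriptions.set(subscription.topic, subscription);
    this.send({ type: "subscribe", topic: subscription.topic, replayFrom: subscription.replayFrom });
  }

  private authenticate(socket: WebSocket) {
    const replayToken =
      "replayToken" in this.state ? this.state.replayToken : undefined;

    this.state = { status: "authenticated", socket, replayToken };
    // Sketch only: a production client would wait for the server's auth
    // acknowledgment before subscribing.
    this.send({ type: "authenticate" });
    for (const subscription of this.subscriptions.values()) {
      this.subscribe(subscription);
    }
    this.state = { status: "subscribed", socket, replayToken };
  }

  private reconnect(reason: string) {
    // A retry is already scheduled, or the connection was closed for good.
    if (
      this.state.status === "degraded" ||
      this.state.status === "reconnecting" ||
      this.state.status === "closed"
    ) {
      return;
    }

    const replayToken =
      "replayToken" in this.state ? this.state.replayToken : undefined;
    // Grow the attempt count only when the previous attempt never got past
    // "connecting"; a drop after a healthy session starts again at 1.
    const attempt =
      this.state.status === "connecting" ? this.state.attempt + 1 : 1;

    this.state = { status: "degraded", reason, replayToken };
    window.setTimeout(() => {
      this.state = { status: "reconnecting", attempt, replayToken };
      void this.connect();
    }, this.scheduleRetry(attempt));
  }

  private send(message: unknown) {
    if ("socket" in this.state) {
      this.state.socket.send(JSON.stringify(message));
    }
  }

  private handleMessage(raw: string) {
    // Parse, validate, deduplicate, merge, and update replay token here.
  }
}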

Event Reconciliation

Receiving an event is not the same as applying it. A client needs a reconciliation pipeline.

Figure: Event Reconciliation Sequence. receive → validate → deduplicate → order → merge → render. Events should be validated, deduplicated, ordered, merged, and rendered through stable state boundaries.

Deduplication is the simplest place to prevent many production bugs.

export function createEventDeduper(maxEntries = 1000) {
  const seen = new Map<string, number>();

  return {
    shouldApply(event: EventEnvelope<unknown>, now = Date.now()) {
      if (seen.has(event.id)) {
        return false;
      }

      seen.set(event.id, now);

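      if (seen.size > maxEntries) {
        // Map iterates in insertion order, so the first key is the oldest entry.
        // This avoids sorting the whole map on every event.
        const oldest = seen.keys().next().value;
        if (oldest !== undefined) {
          seen.delete(oldest);
        }
      }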

      return true;
    }
  };
}

For ordered streams, dedupe is not enough. You need to buffer or reject messages based on sequence. The right answer depends on the domain. A stock ticker may prefer latest value. A payment ledger must not skip events silently. A collaborative editor needs a stronger protocol than a simple sequence number.
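
For the buffering case, a small reorder buffer is often enough. The sketch below assumes the backend assigns a per-topic `sequence` that increases by exactly 1 per event; `createSequenceBuffer`, its result labels, and the `maxBuffered` default are hypothetical and chosen for illustration.

```typescript
type Sequenced = { sequence: number };
type PushResult = "applied" | "buffered" | "stale" | "gap";

export function createSequenceBuffer<T extends Sequenced>(
  firstExpected: number,
  apply: (event: T) => void,
  maxBuffered = 100 // assumption: beyond this, the missing event is treated as lost
) {
  let expected = firstExpected;
  const waiting = new Map<number, T>();

  return {
    push(event: T): PushResult {
      // Already applied (or superseded): drop it.
      if (event.sequence < expected) return "stale";

      // Arrived early: hold it until the missing events show up.
      if (event.sequence > expected) {
        waiting.set(event.sequence, event);
        // "gap" tells the caller to request a replay or refetch instead of waiting forever.
        return waiting.size > maxBuffered ? "gap" : "buffered";
      }

      // Exactly the next event: apply it, then drain any contiguous buffered events.
      apply(event);
      expected += 1;
      let next = waiting.get(expected);
      while (next !== undefined) {
        waiting.delete(expected);
        apply(next);
        expected += 1;
        next = waiting.get(expected);
      }
      return "applied";
    },
    expectedSequence: () => expected
  };
}
```

A latest-wins surface such as a ticker would skip the buffer and simply ignore `"stale"` events, while a ledger would treat `"gap"` as a reason to stop rendering as live until replay succeeds.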

Retry, Backoff, Replay, and Degraded UI

Retries can become an outage multiplier. If every tab reconnects instantly after a gateway restart, the frontend participates in the incident. Use backoff with jitter and expose degraded state to the UI.

export function exponentialBackoffWithJitter(
  attempt: number,
  options = { baseMs: 500, maxMs: 30_000, jitterRatio: 0.35 }
) {
  const exponential = Math.min(
    options.maxMs,
    options.baseMs * 2 ** Math.max(0, attempt - 1)
  );
  const jitter = exponential * options.jitterRatio * Math.random();

  return Math.round(exponential - jitter);
}

The UI should not pretend everything is fine during reconnect. It should show the last updated time, a degraded indicator, and whether user actions are queued, disabled, or still safe. This is especially important for operational systems where stale data can lead to wrong decisions.
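
As a rough sketch, the freshness indicator can be derived from connection status and the time of the last applied event. The names here (`describeFreshness`, `ActionPolicy`, `staleAfterMs`) are assumptions for illustration; which action policy applies while degraded is a product decision, so the caller passes it in.

```typescript
type StreamStatus = "live" | "degraded" | "reconnecting";
type ActionPolicy = "allowed" | "queued" | "disabled";

type FreshnessView = {
  label: string;
  showDegradedBanner: boolean;
  actions: ActionPolicy;
};

export function describeFreshness(input: {
  status: StreamStatus;
  lastEventAt: number; // epoch milliseconds of the last applied event
  now: number; // epoch milliseconds, injected so the function stays pure
  staleAfterMs: number; // the product's staleness budget
  actionsWhileDegraded: ActionPolicy;
}): FreshnessView {
  const ageMs = Math.max(0, input.now - input.lastEventAt);

  // A "live" socket that has been silent past the budget is still shown as stale.
  const isFresh = input.status === "live" && ageMs <= input.staleAfterMs;
  if (isFresh) {
    return { label: "Live", showDegradedBanner: false, actions: "allowed" };
  }

  return {
    label: `Last updated ${Math.round(ageMs / 1000)}s ago`,
    showDegradedBanner: true,
    actions: input.actionsWhileDegraded
  };
}
```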

Figure: Failure Path With Replay. heartbeat missed → degraded UI → backoff with jitter → replay missed events → live. The user sees stale-but-labeled state while the client rebuilds continuity.
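
The first step of that path, noticing a missed heartbeat, can be sketched as a watchdog timer that is re-armed on every message. This assumes the server sends periodic heartbeats (or that any inbound message counts as one); `createHeartbeatMonitor` is a hypothetical helper, and the timers are injectable so it can run outside a browser.

```typescript
export function createHeartbeatMonitor(
  timeoutMs: number,
  onMissed: () => void,
  // Injected so the sketch can be tested with fake timers; defaults use the global timers.
  timers = {
    set: (fn: () => void, ms: number): unknown => setTimeout(fn, ms),
    clear: (handle: unknown): void => {
      clearTimeout(handle as ReturnType<typeof setTimeout>);
    }
  }
) {
  let handle: unknown;

  // Cancel any pending timeout and start a new one.
  const arm = () => {
    if (handle !== undefined) {
      timers.clear(handle);
    }
    handle = timers.set(() => {
      handle = undefined;
      onMissed(); // no heartbeat within timeoutMs: move the UI to degraded mode
    }, timeoutMs);
  };

  return {
    start: arm,
    beat: arm, // call whenever a heartbeat (or any message) arrives
    stop() {
      if (handle !== undefined) {
        timers.clear(handle);
      }
      handle = undefined;
    }
  };
}
```

A connection manager would call `beat()` from its message handler and trigger its reconnect path from `onMissed`, since a silent socket can stay "open" long after the network has gone.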

Optimistic UI and Conflict Handling

Optimistic UI is useful when the user action is likely to succeed and the domain can tolerate temporary divergence. It is dangerous when actions are irreversible, regulated, security-sensitive, or dependent on complex server validation.

For real-time systems, optimistic updates must be reconciled with server events. A local pending action should have a client correlation ID. When the server confirms, rejects, or transforms the action, the UI needs to merge that outcome without duplicating the item or hiding the failure.

type PendingMutation = {
  clientMutationId: string;
  entityId: string;
  optimisticPatch: Record<string, unknown>;
  createdAt: number;
};

export function reconcileServerEvent(
  pending: PendingMutation[],
  event: EventEnvelope<{ clientMutationId?: string; entityId: string }>
) {
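  const confirmedMutationId = event.payload.clientMutationId;
  // Only treat the event as a confirmation when it matches a mutation this
  // client still has pending; another tab or user may have caused it.
  const matchesPending = pending.some(
    (mutation) => mutation.clientMutationId === confirmedMutationId
  );

  return {
    remainingPending: pending.filter(
      (mutation) => mutation.clientMutationId !== confirmedMutationId
    ),
    shouldRenderAsConfirmation: matchesPending,
    affectedEntityId: event.payload.entityId
  };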
}
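
One way to keep rollback simple is to never write optimistic values into confirmed state at all, and instead derive the rendered view by layering pending patches over server state. The sketch below assumes that approach; `Entity` and `applyPending` are hypothetical names for this example.

```typescript
type PendingMutation = {
  clientMutationId: string;
  entityId: string;
  optimisticPatch: Record<string, unknown>;
  createdAt: number;
};

// Hypothetical entity shape for the sketch.
type Entity = { id: string; fields: Record<string, unknown> };

// Derive what the UI shows: confirmed server state with pending patches layered on top.
export function applyPending(confirmed: Entity[], pending: PendingMutation[]): Entity[] {
  return confirmed.map((entity) =>
    pending
      .filter((mutation) => mutation.entityId === entity.id)
      // Apply older mutations first so the most recent user intent wins.
      .sort((a, b) => a.createdAt - b.createdAt)
      .reduce<Entity>(
        (view, mutation) => ({
          id: view.id,
          fields: { ...view.fields, ...mutation.optimisticPatch }
        }),
        entity
      )
  );
}
```

With this shape, rolling back a rejected mutation means removing it from the pending list and re-deriving the view; the confirmed state was never touched.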

Trade-Offs and Decision Matrix

Decision | Option A | Option B | Senior trade-off
Transport | WebSocket | SSE or polling | WebSockets support bidirectional low-latency interaction but require lifecycle ownership. SSE and polling are simpler for one-way or low-frequency updates.
Ordering | Strict sequence | Latest state wins | Strict ordering protects ledgers and workflows but needs buffering and replay. Latest-wins is simpler for presence, counters, and dashboards.
Reconnect | Immediate retry | Backoff with jitter | Immediate retry feels fast in small tests but can amplify incidents. Backoff protects the system and should be paired with visible degraded UI.
State merge | Apply events directly | Normalize and reconcile | Direct updates are quick but fragile. Reconciliation costs more upfront and reduces duplicate, stale, and out-of-order bugs.
Optimistic UI | Immediate local update | Wait for server confirmation | Optimism improves perceived speed but needs rollback and conflict handling. Confirmation is safer for sensitive workflows.

Failure Modes and Recovery Design

Real-time systems fail in patterns:

  • Duplicate events are applied twice and inflate counters.
  • Events arrive out of order and overwrite newer state with older state.
  • The client reconnects without replay and silently misses changes.
  • A background tab wakes up with stale auth and floods logs with rejected subscriptions.
  • Optimistic UI shows success but the server rejects the mutation.
  • Multiple tabs each open their own high-frequency connection and multiply load.
  • A degraded stream keeps rendering old data without telling the user.
  • Telemetry records socket errors but not affected topic, tenant, route, release, or replay token.

Recovery starts with product-specific classification. For a notification badge, latest state may be enough. For an incident console, missed events require replay and a visible freshness indicator. For a trading or payment surface, the UI may need to block actions until continuity is restored.
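
That classification can be written down as data so each surface declares its recovery behavior instead of improvising it. The sketch below is one possible encoding; the `ContinuityNeed` categories, step names, and `planRecovery` function are assumptions made for illustration.

```typescript
// Hypothetical classification of how much continuity a surface needs.
type ContinuityNeed = "latest-state" | "every-event" | "block-until-continuous";

type RecoveryStep =
  | "refetch-snapshot"
  | "replay-from-token"
  | "show-freshness"
  | "block-actions";

export function planRecovery(need: ContinuityNeed, serverCanReplay: boolean): RecoveryStep[] {
  // Without server-side replay, the fallback is a full refetch on reconnect.
  const catchUp: RecoveryStep = serverCanReplay ? "replay-from-token" : "refetch-snapshot";

  switch (need) {
    case "latest-state":
      // e.g. a notification badge: a fresh snapshot is enough.
      return ["refetch-snapshot"];
    case "every-event":
      // e.g. an incident console: catch up and tell the user how fresh the data is.
      return [catchUp, "show-freshness"];
    case "block-until-continuous":
      // e.g. a trading or payment surface: no actions until continuity is restored.
      return ["block-actions", catchUp, "show-freshness"];
  }
}
```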

Performance, Accessibility, Security, and Observability

Performance risk in real-time UI usually comes from render frequency. Batch events. Normalize state. Avoid re-rendering the entire page for a small topic update. Use stable selectors and virtualize high-volume lists when needed.
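
Batching can be sketched as a small queue that flushes at most once per scheduled tick. `createEventBatcher` is a hypothetical helper; the default schedule of roughly one frame (16 ms via `setTimeout`) is an assumption, and in a browser the caller could pass `(run) => requestAnimationFrame(run)` instead.

```typescript
export function createEventBatcher<T>(
  flush: (batch: T[]) => void,
  // Injected scheduler; the default approximates one frame using a timer.
  schedule: (run: () => void) => void = (run) => {
    setTimeout(run, 16);
  }
) {
  let queue: T[] = [];
  let scheduled = false;

  return {
    add(event: T) {
      queue.push(event);
      if (scheduled) {
        return; // a flush is already pending; this event rides along with it
      }
      scheduled = true;
      schedule(() => {
        // Swap the queue before flushing so events added during flush start a new batch.
        const batch = queue;
        queue = [];
        scheduled = false;
        flush(batch);
      });
    }
  };
}
```

The store then applies one merged update per batch, so a burst of fifty events costs one render pass instead of fifty.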

Accessibility matters because live regions can easily become noisy. Do not announce every event. Announce user-relevant state changes, errors, and recovery. Preserve focus when new content arrives. Avoid moving interactive targets while keyboard users are navigating.

Security depends on scoped subscriptions and backend enforcement. Never trust client-side topic names or roles. Avoid putting sensitive replay tokens in URLs or logs. Treat connection URLs and auth refresh paths as sensitive.

Observability should measure connection state, reconnect attempts, heartbeat lag, message lag, event validation failures, duplicate events, replay success, dropped events, degraded duration, and user actions attempted while degraded.

How to Explain This in a Senior Frontend System Design Interview

A strong interview answer starts like this:

I would not start with a socket hook. I would first define the communication semantics: one-way or bidirectional, event ordering, replay needs, freshness budget, optimistic behavior, and failure recovery.

Then propose layers:

  1. Transport choice based on semantics.
  2. Connection manager for lifecycle, auth, heartbeat, reconnect, and close.
  3. Subscription layer for topic ownership and replay.
  4. Event processor for validation, dedupe, ordering, and routing.
  5. State boundary for merge semantics.
  6. UI layer for freshness, degraded mode, optimistic feedback, and accessibility.
  7. Telemetry for production correctness.

If the interviewer adds "messages can arrive out of order," you can add buffering or latest-wins semantics depending on the domain. If they add "users can have multiple tabs," you can discuss shared workers, broadcast channels, or single-tab ownership. If they add "server cannot replay," you can explain the risk and design a refetch-on-reconnect fallback.

That adaptability is what senior frontend system design is testing.

Production-Readiness Checklist

  • Transport choice is justified by communication semantics.
  • Event envelope includes ID, type, topic, version, time, and replay or sequence data when needed.
  • Connection lifecycle includes idle, connecting, authenticated, subscribed, degraded, reconnecting, and closed.
  • Reconnect uses exponential backoff with jitter.
  • Subscriptions are scoped by tenant, user, role, and route ownership.
  • Events are validated before being applied.
  • Duplicate and out-of-order behavior is defined.
  • Replay or refetch-on-reconnect is available.
  • Optimistic mutations have correlation IDs and rollback behavior.
  • UI shows freshness and degraded state.
  • Accessibility announcements are useful, not noisy.
  • Telemetry captures lag, reconnects, validation failures, replay success, and degraded duration.

Closing

A mature real-time frontend is not a socket attached to a component. It is a layered system that protects user trust while messages, networks, tabs, caches, and server state move independently.

Design the contract first. Then choose the transport. Then make failure visible enough that users can keep making good decisions.
