All articles
AI · Developer Tools · Engineering · Performance · React

Measure Once: Rebuilding Chat Virtualization for AI-Era Apps

Our chat scrolled like it was drunk. Then we realized LLM apps break the assumptions every virtualization library was built on. Here is what we changed.

Pranit Sharma
15 min read

Our chat scrolled like it was drunk. Four rows up, two rows back, jitter every 200ms, blank gaps where messages should be, code blocks flashing into existence at the wrong height. You know the feeling: the page fights you every time your finger leaves the trackpad. We tried raising overscan. We tried disabling scroll anchoring. We tried three different virtualization libraries. None of them fixed it because none of them knew what we know now: AI chat breaks the assumptions every virtualized list was built on.

This is the story of what broke, why, and the architecture we landed on — a measure-once cache that spans two IndexedDB databases, an 80ms MutationObserver debounce, a 500ms per-row settle window, and a two-mode measureElement override that returns cached values on ref-attach but reads the DOM on ResizeObserver fires. It currently ships in the Orbit chat UI.

Why virtualization is hard for AI chat specifically

Virtualized lists are a solved problem for spreadsheets, feeds, and logs. None of those have all three of the following properties. AI chat has all three.

1. Expensive renders. Each assistant message is markdown through Streamdown — Vercel's drop-in react-markdown replacement built for streaming — plus syntax highlighting through Shiki's WASM runtime. A single FlowTokenSegment costs roughly 1.5ms to render from scratch, and a typical reply has 3–8 segments. Scroll one screen of viewport and you're easily rendering 20+ segments, or 30ms of main-thread work against an 8.3ms frame budget at 120Hz: several consecutive missed frames on every scroll step. You cannot render these every time an item enters the viewport.
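That arithmetic is worth making concrete. A quick sketch using the figures above (the 1.5ms-per-segment and 20-segments-per-screen numbers are the estimates quoted in the text, not measurements):

```typescript
// Frame-budget arithmetic from the figures above (estimates, not measurements).
const FRAME_BUDGET_MS = 1000 / 120;      // ~8.33ms per frame at 120Hz
const SEGMENT_RENDER_MS = 1.5;           // cold Streamdown + Shiki segment
const SEGMENTS_PER_SCREEN = 20;          // one viewport's worth of segments

const workMs = SEGMENTS_PER_SCREEN * SEGMENT_RENDER_MS;  // 30ms of render
const framesOfWork = workMs / FRAME_BUDGET_MS;           // ~3.6 frame periods

console.log(`${workMs}ms of render vs ${FRAME_BUDGET_MS.toFixed(2)}ms budget`);
console.log(`~${framesOfWork.toFixed(1)} frame periods blocked per scroll step`);
```

The point is not the exact count of missed frames; it is that one uncached scroll step costs multiple full frame budgets, which is why the rest of the article is about never paying that cost twice.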

2. Async rendering. Shiki's grammar loader is asynchronous. The first call to code.highlight() returns null and fires a callback later. Streamdown re-renders the code block when the callback fires. So a single message row passes through three distinct size states: wrapper mounted (near-zero height), markdown rendered (intermediate), and highlighted code finalized (final). Every state produces a ResizeObserver fire. A virtualization library that trusts the first measurement has already lost.

3. Immutable-after-completion, persisted-across-sessions. A completed chat message is never rewritten. Its height at a given viewport width is deterministic and immortal — 500px today, 500px two days from now. And users reopen sessions, sometimes months later. The measurement you took today is still valid tomorrow if you bother to persist it.

A data-table row fails (1), (2), and (3): columns resize, sorts change, and nothing is worth caching. An infinite feed of images fails (2) and (3): images lazy-load, cards reflow, and you're lucky if a measurement stays valid in memory. AI chat is the rare list where you can spend real bookkeeping effort, because the payoff is permanent.

What TanStack Virtual already does and where it stops

Credit where it's due. TanStack Virtual (and its React bindings, @tanstack/react-virtual) ships with an in-memory measurementsCache, a ResizeObserver per item, scroll compensation via shouldAdjustScrollPositionOnItemSizeChange, and a sane default measureElement. For a standard list this is enough.

It does not know:

  • That your app has persistent sessions, so measurements should live in IndexedDB, not just memory.
  • That rendered HTML is more cacheable than the React component that produced it.
  • That your first measurement of a row is probably a pre-Shiki placeholder, not the final height.
  • That your custom RAF-based velocity scroll writes scrollTop directly and bypasses instance.isScrolling.
  • That "the same message at the same viewport width" is a domain-level primary key, not a DOM artifact.

None of these are bugs in the library. They are application-specific decisions. We shipped all of them on top.

The core insight: measure once, use forever

Once a completed chat message has rendered at a given viewport width, its height is deterministic and immortal. This is the single sentence all of the following code is implementing.

It sounds obvious. It is not how any of the libraries are structured, because most lists can't assume this. Tables can't. Feeds can't. A data grid with collapsible rows can't. AI chat can — and if you build around that assumption, every downstream pain point (scroll jitter, blank rows, scroll pushback) has a concrete place to be fixed rather than a gradient of "maybe it'll stop if we raise overscan."

We split the measurement into two distinct artifacts that live in two different IndexedDB databases:

  • Row height cache (orbit-render-cache) — the full pixel height of each message row, keyed by ${sessionId}:${messageId}. Used by estimateSize and measureElement.
  • Rendered HTML cache (orbit-streamdown-cache) — the innerHTML of each FlowTokenSegment after Shiki settles, keyed by djb2(segmentText). Used on remount to skip the whole render pipeline.

The databases are independent. The first fixes positioning (TanStack Virtual). The second fixes render cost (Streamdown + Shiki). You need both. A cached height with no HTML means you still pay the 1.5ms render. Cached HTML with no height means TanStack still estimates wrong and scroll pushes back. Together they give a remount that costs ~0.05ms.
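A sketch of how the two artifacts combine on remount. The field names are the ones quoted in this article; the entry interfaces and the remountCost helper are illustrative, not the shipped API:

```typescript
// Illustrative entry shapes for the two IndexedDB caches (not the real types).

// orbit-render-cache: one entry per message row.
interface RowHeightEntry {
  key: string;        // `${sessionId}:${messageId}`
  size: number;       // full row height in px
  measured: boolean;  // true only for rows that fully settled
}

// orbit-streamdown-cache: one entry per segment, keyed by djb2(segmentText).
interface StreamdownEntry {
  html: string;       // innerHTML captured after Shiki settles
  height: number;
  viewportWidth: number;
  cachedAt: number;
}

// Why both caches are needed: each hit/miss combination has a different cost.
type RemountCost = 'both' | 'height-only' | 'html-only' | 'neither';

function remountCost(hasHeight: boolean, hasHtml: boolean): RemountCost {
  if (hasHeight && hasHtml) return 'both';        // ~0.05ms: inject HTML at a known height
  if (hasHeight) return 'height-only';            // positioned right, still pay ~1.5ms render
  if (hasHtml) return 'html-only';                // cheap injection, but height is estimated
  return 'neither';                               // estimateSize + full live render
}
```

Only the 'both' case gives the ~0.05ms remount; 'height-only' still pays the render, and 'html-only' still risks scroll pushback from a wrong estimate.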

The write-through cache pattern

The streamdown cache is populated by a live render, not a background worker. This is counterintuitive — isn't it faster to render ahead of time? We tried that. It's brittle. A background container measures at a different width than the real container if the user has the right panel open. Sidebar toggles. Tool widgets affect flow. We'd have to mirror every layout decision the live UI makes.

So instead: we render once live, then capture our own output. FlowTokenSegment uses a MutationObserver inside a useLayoutEffect to watch its own DOM for mutations, waits 80ms after mutations stop, then reads innerHTML and height off itself:

// apps/agent/src/components/chat/messages/MessageItem.tsx
const FlowTokenSegment: FC<{
  readonly text: string;
  readonly onClick: (e: React.MouseEvent<HTMLDivElement>) => void;
  readonly isStreaming: boolean;
}> = memo(function FlowTokenSegment({ text, onClick, isStreaming }) {
  const wrapperRef = useRef<HTMLDivElement>(null);
  const contentHash = useMemo(() => (isStreaming ? 0 : hashContent(text)), [text, isStreaming]);
  const viewportWidth = getChatContentWidth();
  const cached =
    !isStreaming && contentHash !== 0 ? getStreamdownCache(contentHash, viewportWidth) : null;
 
  useLayoutEffect(() => {
    if (isStreaming || cached !== null) return;
    const el = wrapperRef.current;
    if (el === null) return;
 
    let settleTimer: ReturnType<typeof setTimeout>;
    let disposed = false;
 
    const capture = (): void => {
      if (disposed) return;
      observer.disconnect();
      const html = el.innerHTML;
      const height = el.getBoundingClientRect().height;
      if (html.length > 0 && height > 0) {
        setStreamdownCache(contentHash, {
          html,
          height,
          viewportWidth: getChatContentWidth(),
          cachedAt: Date.now(),
        });
      }
    };
 
    const observer = new MutationObserver(() => {
      clearTimeout(settleTimer);
      settleTimer = setTimeout(capture, SHIKI_SETTLE_MS); // 80ms
    });
 
    observer.observe(el, {
      childList: true,
      subtree: true,
      attributes: true,
      characterData: true,
    });
 
    settleTimer = setTimeout(capture, SHIKI_SETTLE_MS);
    return (): void => {
      disposed = true;
      observer.disconnect();
      clearTimeout(settleTimer);
    };
  }, [contentHash, isStreaming, cached]);
 
  if (cached !== null) {
    return (
      <div
        ref={wrapperRef}
        className="chat-markdown prose prose-sm dark:prose-invert max-w-none select-text"
        onClick={onClick}
        dangerouslySetInnerHTML={{ __html: cached.html }}
      />
    );
  }
 
  // ... live Streamdown render path
});

Three details matter here.

Why djb2 hash over message ID. Assistant messages with tool interleaving are split into multiple FlowTokenSegments by buildUnifiedSegments(). One message becomes segments A, B, C — text, tool call, text. Using message ID as the cache key would require knowing which segment slot we're in. Using djb2 (Dan Bernstein's one-pass string hash with decent distribution) means each segment is self-identifying. Identical text in different messages deduplicates naturally. No segment-slot bookkeeping:

// apps/agent/src/lib/chat/streamdown-cache.ts
export function hashContent(text: string): number {
  let hash = 5381;
  for (let i = 0; i < text.length; i++) {
    hash = ((hash << 5) + hash + text.charCodeAt(i)) | 0;
  }
  return hash >>> 0; // unsigned 32-bit
}
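The properties the caching scheme relies on can be exercised directly. A small check, reusing the hash exactly as quoted above (the sample strings are arbitrary):

```typescript
function hashContent(text: string): number {
  let hash = 5381;
  for (let i = 0; i < text.length; i++) {
    hash = ((hash << 5) + hash + text.charCodeAt(i)) | 0;
  }
  return hash >>> 0; // unsigned 32-bit
}

// Deterministic: identical segment text always produces the same key, so
// the same code block pasted into two different messages deduplicates.
const k1 = hashContent('const x = 1;');
const k2 = hashContent('const x = 1;');

// Unsigned 32-bit, so it is always a safe non-negative Map/IndexedDB key.
// One hedge worth noting: hashContent can in principle return 0 for some
// input, and 0 is also the streaming sentinel, so the `contentHash !== 0`
// guard would treat such a segment as uncached. That costs one redundant
// live render per visit; it never produces a wrong render.
console.log(k1, k1 === k2);
```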

Why dangerouslySetInnerHTML on HIT. On a cache hit we bypass React's reconciler for this subtree entirely. The parent <div> is still React-managed; its children are an opaque HTML blob. The savings are real: a live Streamdown segment is ~1.5ms; a HIT render with dangerouslySetInnerHTML is ~0.05ms. Thirty times faster. For a scroll frame that touches 20 segments, that is the difference between 30ms of render (several missed frames at 120Hz) and 1ms (roughly one-eighth of a frame budget).

Why wait 80ms after mutations stop. Shiki highlights code blocks via a WASM callback. If you capture innerHTML the moment the wrapper mounts, you store the pre-highlighted code block and every future HIT will render without highlighting. The MutationObserver watches for DOM changes; the 80ms quiet window is long enough to absorb Shiki's grammar-load-and-replace, short enough to not feel sluggish. Our tests simulate the full Shiki lifecycle — placeholder fires, then markdown fires, then final fires, then settle.

The cross-session measurement cache

Row heights live separately, keyed by ${sessionId}:${messageId}, in the orbit-render-cache IndexedDB. On app boot warmMemoryCacheFromIdb() pulls every non-expired cache entry into memory. The store caps at 50 sessions (MAX_CACHED_SESSIONS) with a 30-day TTL (MAX_CACHE_AGE_MS). Sessions beyond the cap are pruned LRU.
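A sketch of the prune pass those limits imply. The two constants are the article's; the metadata shape and function name are assumptions:

```typescript
const MAX_CACHED_SESSIONS = 50;
const MAX_CACHE_AGE_MS = 30 * 24 * 60 * 60 * 1000; // 30-day TTL

interface SessionCacheMeta {
  sessionId: string;
  lastAccessedAt: number; // ms epoch of last open
}

// Returns the session IDs to evict: everything past the TTL, plus the
// least-recently-used overflow beyond the session cap.
function sessionsToEvict(sessions: SessionCacheMeta[], now: number): string[] {
  const expired = sessions.filter((s) => now - s.lastAccessedAt > MAX_CACHE_AGE_MS);
  const live = sessions
    .filter((s) => now - s.lastAccessedAt <= MAX_CACHE_AGE_MS)
    .sort((a, b) => b.lastAccessedAt - a.lastAccessedAt); // newest first
  const overflow = live.slice(MAX_CACHED_SESSIONS);       // LRU tail past the cap
  return [...expired, ...overflow].map((s) => s.sessionId);
}
```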

This is where we fell into our first real bug. Call it the pre-Shiki trap.

The first version of this cache stored every measurement TanStack's default measureElement handed us with a flag measured: true. On revisit, we loaded all measured: true entries into cachedSizeMapRef. It worked, sometimes. Other times the first row of a restored session would render blank, or two rows would overlap.

The cause: our first ResizeObserver fire for a row happens when the wrapper mounts — before Shiki has run. That fire records, say, size: 210. Then Streamdown finishes parsing markdown: another fire, size: 380. Then Shiki callback: size: 470. The row is 470 pixels. But if we snapshotted the cache between fires 1 and 3, we'd persist 210 as if it were final. On revisit, TanStack would position the row at 210, the DOM would render at 470, and everything below would jump — or a previously-positioned row on top would eat into this one's space, producing an overlap.

The fix is two-sided. We track which rows are truly settled, and we only persist settled rows as measured: true:

// apps/agent/src/components/chat/chat-messages.tsx
const ROW_SETTLE_MS = 500;
 
const scheduleSettleCheck = useCallback((): void => {
  if (settleCheckTimerRef.current !== null) {
    clearTimeout(settleCheckTimerRef.current);
  }
  settleCheckTimerRef.current = setTimeout(() => {
    const now = Date.now();
    let anyNewlySettled = false;
 
    for (const [key, lastMeasureAt] of lastMeasureAtRef.current) {
      if (settledKeysRef.current.has(key)) continue;
      if (now - lastMeasureAt < ROW_SETTLE_MS) continue;
      settledKeysRef.current.add(key);
      anyNewlySettled = true;
    }
 
    if (anyNewlySettled) {
      snapshotRef.current?.();
    }
  }, ROW_SETTLE_MS);
}, []);

Per row, we record lastMeasureAt on every ResizeObserver fire. If a row has had no new fire for 500ms, it's settled. Only then does it enter settledKeysRef. The snapshot persists each measurement with measured: settledKeysRef.has(key). On revisit, a filter runs:

// apps/agent/src/components/chat/chat-messages.tsx — revisit restore
if (cache) {
  const trustedSizeMap = new Map<string, number>();
  for (const m of cache.measurements) {
    if (m.measured === true && m.size > 0) {
      trustedSizeMap.set(m.key, m.size);
      settledKeysRef.current.add(m.key);
    }
  }
  cachedSizeMapRef.current = trustedSizeMap;
}

Rows that never settled (fast navigation away, window close during Shiki) come back as estimates on the next visit and re-measure live. Rows that did settle come back with exact pixels. Either way the cache never lies.

This schema change was binary-incompatible with old entries. Rather than writing an IndexedDB migration that downgrades old measured: true to measured: false based on hope, we bumped DB_VERSION to 3 and added a one-time sentinel in main.tsx:

// apps/agent/src/main.tsx — one-time cache reset
const CACHE_RESET_FLAG = 'orbit-cache-reset-v3';
const cacheResetPromise =
  localStorage.getItem(CACHE_RESET_FLAG) === '1'
    ? Promise.resolve()
    : Promise.all([
        new Promise<void>((resolve) => {
          const req = indexedDB.deleteDatabase('orbit-render-cache');
          req.onsuccess = (): void => resolve();
          req.onerror = (): void => resolve();
          req.onblocked = (): void => resolve();
        }),
        new Promise<void>((resolve) => {
          const req = indexedDB.deleteDatabase('orbit-streamdown-cache');
          req.onsuccess = (): void => resolve();
          req.onerror = (): void => resolve();
          req.onblocked = (): void => resolve();
        }),
      ]).then(() => {
        localStorage.setItem(CACHE_RESET_FLAG, '1');
      });

Bump the sentinel name to force another reset when the schema changes.

The measureElement split

TanStack Virtual calls measureElement(el, entry, instance) in two distinct situations: when the row's ref attaches (entry === undefined), and when the ResizeObserver fires (entry !== undefined). Early versions of our code returned the cached size in both cases:

// DON'T: returns cached size on every call — breaks ResizeObserver feedback.
measureElement: (element, entry, instance) => {
  const key = element.getAttribute('data-item-key');
  const cached = cachedSizeMapRef.current.get(key ?? '');
  if (cached && cached > 0) return cached;
  return measureElement(element, entry, instance);
};

This "worked" until it didn't. When the viewport width changed, or when the cached size was wrong, every ResizeObserver fire returned the stale cached value, meaning the size never updated. The cache was immortal even when the DOM disagreed. Classic lying-to-yourself bug.

The fix is to treat the two situations differently:

// apps/agent/src/components/chat/chat-messages.tsx
measureElement: (element, entry, instance) => {
  const itemKey = element.getAttribute('data-item-key');
 
  // ── Ref-attach path ──────────────────────────────────────────
  // entry === undefined: TanStack is asking for the initial size.
  // We lie and return the cached size so the row positions at its
  // final height from frame one. The DOM might not match yet
  // (that's fine, ResizeObserver will update it below).
  if (entry === undefined) {
    if (itemKey !== null) {
      const cached = cachedSizeMapRef.current.get(itemKey);
      if (cached !== undefined && cached > 0) {
        return cached;
      }
    }
    const initialDomSize = measureElement(element, entry, instance);
    if (itemKey !== null && initialDomSize > 0) {
      cachedSizeMapRef.current.set(itemKey, initialDomSize);
    }
    return initialDomSize;
  }
 
  // ── ResizeObserver path ──────────────────────────────────────
  // entry !== undefined: real layout change. Always read the DOM
  // so we stay honest, update the cache, and un-settle the row.
  const domSize = measureElement(element, entry, instance);
  if (itemKey !== null && domSize > 0) {
    cachedSizeMapRef.current.set(itemKey, domSize);
    lastMeasureAtRef.current.set(itemKey, Date.now());
    settledKeysRef.current.delete(itemKey);
    scheduleSettleCheckRef.current();
  }
  return domSize;
},

The mental model: entry === undefined is a lie that works. entry !== undefined is a truth that has to be told.

When the ref attaches and we return the cached size, TanStack places the row at the correct final height before the DOM has even rendered content. The FlowTokenSegment then injects its cached HTML via dangerouslySetInnerHTML. The DOM settles at the exact same height the cache promised. The subsequent ResizeObserver fire reads back that same height. Delta = 0. shouldAdjustScrollPositionOnItemSizeChange never fires. No cascade.

Contrast with returning the cached size on the ResizeObserver path too: if the real DOM has any different size (say, during a viewport width change), we'd silently suppress the update. The cache would drift from reality. The second mode has to be honest.

The settle-gated snapshot

We mentioned the 500ms per-row quiet window. A natural question: why per-row? Why not a session-wide timer? The original implementation was session-wide: "300ms after isAgentRunning transitions false, snapshot." This was simpler and wrong.

The reason: Shiki does not finish in order. On an assistant reply with four code blocks, the shortest one might finish in 40ms and the longest in 900ms. At the 300ms mark some rows are post-Shiki and some are still placeholders. If we snapshot at 300ms, we poison the cache with the in-progress rows' pre-Shiki heights. On revisit, three messages are correct and one is 200px shorter than it should be.

The per-row timer solves this:

  • ResizeObserver fires for row A → lastMeasureAtRef.set(A, now), delete A from settledKeysRef, schedule check.
  • 400ms later, ResizeObserver fires for row A again (Shiki callback) → reset timer.
  • 500ms later with no A fires → A joins settledKeysRef.
  • Meanwhile row B fires its own timeline independently.

The snapshot fires any time new rows become settled, not on any wall-clock schedule. Entries are persisted individually as measured: true or measured: false. This is the condition that lets the revisit filter be correct.
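The per-row timeline above can be replayed against a pure model of the logic (a model for illustration, not the shipped hook):

```typescript
const ROW_SETTLE_MS = 500;

// Pure model: feed it (key, timestamp) ResizeObserver fires, then ask
// which rows count as settled at a given time.
class SettleModel {
  private lastMeasureAt = new Map<string, number>();

  fire(key: string, at: number): void {
    this.lastMeasureAt.set(key, at); // any fire un-settles the row
  }

  settledAt(now: number): Set<string> {
    const settled = new Set<string>();
    for (const [key, last] of this.lastMeasureAt) {
      if (now - last >= ROW_SETTLE_MS) settled.add(key);
    }
    return settled;
  }
}

// Replaying the timeline from the list above:
const m = new SettleModel();
m.fire('A', 0);   // row A wrapper mounts
m.fire('B', 0);   // row B wrapper mounts
m.fire('A', 400); // Shiki callback re-fires A; B stays quiet
// At t=600, B has been quiet for 600ms (settled) but A only 200ms (not yet).
// At t=900, both rows are settled.
```

This is exactly why a session-wide timer fails: at t=600 a wall-clock snapshot would have persisted A's pre-Shiki height as final.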

The velocity-scroll guard

Final piece. TanStack Virtual has instance.isScrolling, which it uses to gate scroll compensation. If the user is scrolling, above-viewport size deltas don't trigger a scrollTop adjustment — useful, because otherwise the page fights the user every time a new measurement comes in.

But instance.isScrolling is tied to the native scroll event. Our velocity scroll hook uses requestAnimationFrame to apply deceleration to trackpad flick gestures, writing scrollTop directly. Each RAF write fires a scroll event, but TanStack's internal heuristic — which toggles isScrolling based on debounced timing — sometimes lags the reality. Result: during a velocity animation, TanStack believes it is not scrolling, fires compensation on an above-viewport size delta, and shifts scrollTop backward against the user's gesture. We called this the 4-up-2-back pushback.

The fix is not to disable compensation. Compensation is correct during streaming, when new content is growing above the viewport and we want the user's visible content anchored. We just need to scope it away from active scroll gestures. A plain scroll listener writes lastScrollAtRef on every scroll event (native or velocity-driven), and the guard checks it:

// apps/agent/src/components/chat/chat-messages.tsx
useEffect(() => {
  rowVirtualizer.shouldAdjustScrollPositionOnItemSizeChange = (item, _delta, instance) => {
    if (instance.isScrolling) {
      return false;
    }
    // Velocity scroll bypasses TanStack's isScrolling by writing scrollTop
    // directly. Treat any scroll event within the last 250ms as "scrolling"
    // to suppress compensation during velocity-driven animations — prevents
    // the "4-up-2-back" pushback from size deltas on above-viewport rows.
    if (Date.now() - lastScrollAtRef.current < 250) {
      return false;
    }
    const viewportHeight = instance.scrollRect?.height ?? 0;
    const scrollOffset = instance.scrollOffset ?? 0;
    // ... rest of the compensation decision
  };
  return (): void => {
    rowVirtualizer.shouldAdjustScrollPositionOnItemSizeChange = undefined;
  };
}, [rowVirtualizer]);

250ms is long enough to cover a typical velocity decay, short enough that streaming compensation kicks back in as soon as the user stops. This is not about disabling scroll compensation. It is about scoping it to the right moments.

Results

We verified the pipeline with 29 integration tests spanning five layers: cache module, write-through capture, measureElement override, stream-end trigger, and settle-gated snapshot. Every piece above has a test that would catch its regression. Real numbers we can quote from the code:

  • Cache HIT render cost: ~0.05ms (dangerouslySetInnerHTML) versus ~1.5ms live Streamdown — 30× faster per segment.
  • Frame budget target: 8.3ms at 120Hz (Apple ProMotion and 120Hz external monitors).
  • Settle windows: 80ms Shiki debounce (MutationObserver), 500ms per-row settle, 250ms scroll guard.
  • Viewport width tolerance: 16px — absorbs scrollbar toggle and DPI rounding, catches real resizes.
  • Capacity: 2000 entries in the streamdown memory LRU, 50 sessions in the render cache, 30-day TTL on both.
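The 16px width tolerance in particular reads as a simple validity predicate (a sketch; the function name is an assumption):

```typescript
const WIDTH_TOLERANCE_PX = 16;

// A measurement cached at `cachedWidth` is still trusted at `currentWidth`
// if the difference is within tolerance: wide enough to absorb a scrollbar
// appearing (~15px) or DPI rounding, narrow enough that a real panel toggle
// or window resize invalidates the entry.
function widthStillValid(cachedWidth: number, currentWidth: number): boolean {
  return Math.abs(cachedWidth - currentWidth) <= WIDTH_TOLERANCE_PX;
}
```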

Numbers we do not have precise instrumentation for, marked approximate:

  • First-visit full-session render time — approximate, depends on message count and code-block density.
  • Revisit cold-switch time — approximate, bounded by IndexedDB getAll() plus React reconcile.
  • Steady-state cache hit rate — approximate, we observe near-100% HIT for completed messages after first settle.
  • IndexedDB footprint per 1,000 messages — approximate, dominated by HTML strings (tens of MB range).

Qualitatively: the jitter is gone. The pushback is gone. Scroll feels like scroll. We have not dropped below 120Hz during scroll on the development hardware since the velocity guard landed.

When this pattern generalizes

The three-property lock (expensive, async, immutable-after-completion) is not unique to Orbit. It shows up in:

  • LLM chat UIs — any app that streams markdown with code blocks.
  • LLM playgrounds — rendered responses with inline diffs and diagrams.
  • Code explanation / review tools — heavy Shiki, Mermaid, or similar async rendering.
  • Agent UIs with tool widgets — mixed content, tool outputs frozen at completion.
  • Notebook-style apps — cells that render once and stay put.

Where it does not apply:

  • Editable lists — if the user can reorder, reword, or toggle disclosure, heights change.
  • Tables with sorting or filtering — rows reflow.
  • Infinite feeds — lazy-loaded images, interstitials, ads change geometry.
  • Comment threads with live updates — replies arrive, votes change heights.

The test is simple. Ask: if I measure this row today, will its pixel height at this viewport width be the same in two weeks? If yes, this architecture fits. If no, you need something different.

What we are still working on

The current system is tuned for sessions of 60–200 messages, which is our observed distribution. At 10,000 messages per session we would need a different persistence story: IndexedDB getAll() becomes a meaningful hit, and keeping 10k HTML blobs in the memory LRU exceeds the memory budget. Paged loading of the persisted cache is the likely direction. Not yet a problem.

Images and some embedded tool widgets still have edge cases. An image that has not loaded measures shorter than its final height; we currently let the ResizeObserver catch the image load and un-settle the row. It works but adds a beat of visible adjustment on first paint. We are experimenting with reserved aspect-ratio boxes for known image dimensions.

Multi-monitor DPI transitions (dragging the window between a Retina and a non-Retina display) briefly invalidate cached heights because the content width changes by fractional pixels. The 16px tolerance covers most cases; a rare edge case remains.

Closing

Virtualization libraries are built for a world where items are cheap, synchronous, and transient. AI chat breaks all three. We found that accepting the break — and doing real bookkeeping on top — buys scroll that finally feels correct. The code is shipping today in Orbit, the 29 integration tests that guard it are on-tree in the chat package, and if you want to poke at the real thing you can grab it from the download page.

Measure once. Use forever. Never re-render what hasn't changed.