
Real-Time Video Processing with the WebCodecs API

WebCodecs video processing with MediaStreamTrackProcessor, TransformStream, and VideoTrackGenerator, plus frame close, backpressure, workers, and browser support.

OpenReplay Team

A WebCodecs video pipeline has three parts: a MediaStreamTrackProcessor that converts a MediaStreamTrack into a ReadableStream<VideoFrame>, a TransformStream where you manipulate each frame, and a VideoTrackGenerator that converts processed frames back into a MediaStreamTrack you can assign to a <video> element. VideoTrackGenerator is the current spec name; Chromium examples still use the older, non-standard MediaStreamTrackGenerator. Here is the whole pipeline:

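// Main-thread version for Chromium; run as a module script (top-level await).
const videoEl = document.querySelector('video');
const stream = await navigator.mediaDevices.getUserMedia({ video: true });
const track = stream.getVideoTracks()[0];

const processor = new MediaStreamTrackProcessor({ track });
// VideoTrackGenerator is the spec interface (Safari 18+, dedicated workers only).
// Chromium ships the non-standard MediaStreamTrackGenerator, which is itself a track.
const generator = typeof VideoTrackGenerator === 'function'
  ? new VideoTrackGenerator()
  : new MediaStreamTrackGenerator({ kind: 'video' });
const outputTrack = generator.track ?? generator;

const grayscale = new TransformStream({
  async transform(frame, controller) {
    try {
      const canvas = new OffscreenCanvas(frame.displayWidth, frame.displayHeight);
      const ctx = canvas.getContext('2d');
      ctx.filter = 'grayscale(1)';
      ctx.drawImage(frame, 0, 0);
      controller.enqueue(new VideoFrame(canvas, {
        timestamp: frame.timestamp,
        duration: frame.duration,
      }));
    } finally {
      frame.close(); // always release the input frame, even if drawing threw
    }
  },
});

processor.readable
  .pipeThrough(grayscale)
  .pipeTo(generator.writable)
  .catch((err) => console.error('pipeline stopped', err));
videoEl.srcObject = new MediaStream([outputTrack]);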

This article is about the parts every existing tutorial skips: the failure modes. The happy-path version above works until it doesn’t — until frames leak, the transform falls behind, the encoder enters a closed state, or you ship code that assumes Safari can’t run it. Each of those is a real production failure with a specific cause and a specific fix, and that’s what the rest of this covers.

Key Takeaways

  • A WebCodecs pipeline is MediaStreamTrackProcessor → TransformStream → VideoTrackGenerator; use the spec constructor new MediaStreamTrackProcessor({ track }), not the deprecated positional form.
  • Forgetting frame.close() exhausts the finite media resources the pipeline depends on; once exhausted, frame emission stalls, producing video that stutters and then freezes while the rest of the page stays responsive.
  • MediaStreamTrackProcessor does not propagate backpressure upstream — when your transform falls behind, the processor silently drops the oldest frames rather than throwing an error.
  • Do all VideoFrame work in a single worker: a frame transferred across worker boundaries closes on the sending side automatically, and drawing, copying, or cloning it there afterwards throws.
  • WebCodecs support splits by interface — core VideoEncoder/VideoFrame ship in Chrome 94+, Firefox 130+, and Safari 16.4+, while MediaStreamTrackProcessor/VideoTrackGenerator lag behind (Safari 18+, unsupported in Firefox).

What WebCodecs Is and Why the Pipeline Looks Like This

WebCodecs gives JavaScript direct handles to the browser’s built-in, often hardware-accelerated media codecs and to raw video frames. Before it, a MediaStream was opaque: you piped it into a <video> element and the browser handled everything between capture and display. WebCodecs breaks that pipeline open. The VideoFrame interface exposes the raw pixels between capture and encoding, which is exactly where a filter, a virtual background, or a custom encoder needs to sit.

The reason the pipeline uses Streams is that raw decoded frames are large (several megabytes each) and arrive fast (25+ per second), so you need flow control and incremental processing rather than buffering everything in memory. The WHATWG Streams API was designed for exactly this kind of atomic chunk processing through pipe chains. MediaStreamTrackProcessor bridges a live track into a stream; the TransformStream is where your per-frame work happens; VideoTrackGenerator bridges back to a track that the rest of the platform — <video>, RTCPeerConnection — understands.
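Here is some back-of-the-envelope arithmetic for "several megabytes each". The helper names are hypothetical, and the two layouts are common VideoFrame pixel formats. I420 stores a full-resolution luma plane plus two quarter-resolution chroma planes (1.5 bytes per pixel), and RGBA stores 4 bytes per pixel.

```javascript
// Approximate byte size of one uncompressed frame (hypothetical helper).
function rawFrameBytes(width, height, format) {
  const pixels = width * height;
  switch (format) {
    case 'I420':
      // Y plane + U and V planes at half width and half height (rounded up).
      return pixels + 2 * (Math.ceil(width / 2) * Math.ceil(height / 2));
    case 'RGBA':
      return pixels * 4;
    default:
      throw new Error(`unsupported format: ${format}`);
  }
}

function bytesPerSecond(width, height, format, fps) {
  return rawFrameBytes(width, height, format) * fps;
}

console.log(rawFrameBytes(1920, 1080, 'I420'));       // 3110400 (about 3 MB)
console.log(rawFrameBytes(1920, 1080, 'RGBA'));       // 8294400 (about 8 MB)
console.log(bytesPerSecond(1920, 1080, 'I420', 30));  // 93312000 (about 93 MB/s)
```

At roughly 93 MB of raw pixels per second for 1080p30, buffering "everything" is not an option. That is why the pipeline processes frames one chunk at a time.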

WebCodecs operates only on non-containerized streams. If you need to read or write MP4/ISOBMFF, you supply your own container logic. Audio has a parallel surface (AudioData, AudioEncoder) that this article doesn’t cover; the patterns below are video-specific.

A Working Camera → Filter → Display Pipeline

A working WebCodecs filter pipeline captures with MediaStreamTrackProcessor, filters inside a TransformStream using Canvas2D directly on the VideoFrame, and displays through VideoTrackGenerator, which is the shape shown in the opening code block. The key efficiency move is ctx.drawImage(frame, 0, 0). drawImage accepts a VideoFrame as a source directly, so you can draw frames to a canvas without manually converting them to a PNG or creating an intermediate ImageBitmap.

For a Canvas2D color filter, the ctx.filter string is the cheapest path. For anything pixel-addressable — chroma key, custom convolution — use getImageData/putImageData:

const filter = new TransformStream({
  async transform(frame, controller) {
    try {
      const w = frame.displayWidth, h = frame.displayHeight;
      const canvas = new OffscreenCanvas(w, h);
      const ctx = canvas.getContext('2d', { willReadFrequently: true });
      ctx.drawImage(frame, 0, 0);

      const imageData = ctx.getImageData(0, 0, w, h);
      const px = imageData.data;
      for (let i = 0; i < px.length; i += 4) {
        const lum = 0.299 * px[i] + 0.587 * px[i + 1] + 0.114 * px[i + 2];
        px[i] = px[i + 1] = px[i + 2] = lum;
      }
      ctx.putImageData(imageData, 0, 0);

      controller.enqueue(new VideoFrame(canvas, {
        timestamp: frame.timestamp,
        duration: frame.duration,
      }));
    } finally {
      frame.close();
    }
  },
});

Two things carry from the original frame into the new one: timestamp and duration. The timestamp is the frame’s identity throughout the pipeline — it survives encode/decode cycles and is what you use to measure latency later. Drop it and downstream consumers lose frame ordering.

For heavier per-pixel work at full resolution, getImageData read-back is the bottleneck; WebGL or WebGPU (via importExternalTexture) keep the frame on the GPU and avoid the CPU read-back entirely. Use Canvas2D for color transforms and simple compositing; reach for a GPU path when per-pixel cost dominates your frame budget.
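Both Canvas2D examples above allocate a new OffscreenCanvas for every frame. The following is a minimal sketch, not the article's code, of reusing one canvas across frames. It assumes the frame's display size only changes occasionally, for example after applyConstraints(). makeCanvasFilter is a hypothetical name.

```javascript
// Reuse one OffscreenCanvas across frames; resize only when the frame size changes.
function makeCanvasFilter(filterString) {
  let canvas = null;
  let ctx = null;

  return new TransformStream({
    transform(frame, controller) {
      try {
        const w = frame.displayWidth;
        const h = frame.displayHeight;
        if (!canvas) {
          canvas = new OffscreenCanvas(w, h);
          ctx = canvas.getContext('2d');
        } else if (canvas.width !== w || canvas.height !== h) {
          // Resizing resets the context state, including ctx.filter.
          canvas.width = w;
          canvas.height = h;
        }
        ctx.filter = filterString; // re-applied every frame so it survives resizes
        ctx.drawImage(frame, 0, 0);

        const init = { timestamp: frame.timestamp };
        // Camera frames may carry a null duration; only copy a real value.
        if (frame.duration !== null) init.duration = frame.duration;
        controller.enqueue(new VideoFrame(canvas, init));
      } finally {
        frame.close();
      }
    },
  });
}
```

The VideoFrame constructor snapshots the canvas contents when it is called, so drawing the next frame into the same canvas does not alter frames already enqueued.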

The VideoFrame Lifecycle: Why frame.close() Is Mandatory

Forgetting frame.close() doesn’t just leak ordinary memory. It exhausts the finite media resources the pipeline depends on. Once they run out, decoding or frame emission stalls because no new frame can be allocated, and that produces the characteristic symptom: video that stutters progressively and then freezes while the rest of the page stays responsive. VideoFrame.close() releases the underlying media resource the frame holds. The WebCodecs specification is explicit that these resources are finite: frames backed by hardware buffers come from a limited pool, and a source cannot emit a new frame while the pool is full.

This is why close() is not optional cleanup you can defer to garbage collection. The garbage collector doesn’t see the pressure on the underlying media resource, and it runs on its own schedule. By the time it reclaims an unreferenced frame, the pool is already exhausted. Every VideoFrame you read from the processor, and every one you construct, must be closed exactly once when you’re done with it.

The non-obvious failure is the error path. If your transform throws after reading a frame but before closing it, that frame leaks — and a transform that throws on one frame usually throws on the next, so the leak compounds quickly. The fix is try/finally:

async transform(frame, controller) {
  try {
    // ...filter work that might throw...
    controller.enqueue(newFrame);
  } finally {
    frame.close(); // runs whether or not the body threw
  }
}

finally guarantees frame.close() runs on both the success and the error path. This is the single most important pattern in a WebCodecs pipeline.
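Auditing for a missing close() is easier with a counter. The following is a debug-only sketch built around a hypothetical trackFrame() helper, which is not part of WebCodecs. It shadows close() on each tracked instance and keeps a running count of frames that have been opened but not yet closed. It only counts frames whose close() your own code calls, such as each input frame in transform(). Don't wrap frames you hand downstream.

```javascript
// Debug-only: count VideoFrames opened but not yet closed (hypothetical helper).
const frameAudit = { open: 0, peak: 0 };

function trackFrame(frame) {
  frameAudit.open++;
  frameAudit.peak = Math.max(frameAudit.peak, frameAudit.open);
  const originalClose = frame.close.bind(frame);
  let counted = false;
  // Shadow close() on this instance so the counter drops exactly once,
  // even if close() is called more than once.
  frame.close = () => {
    if (!counted) {
      counted = true;
      frameAudit.open--;
    }
    originalClose();
  };
  return frame;
}

// Usage inside transform(): trackFrame(frame) on entry, then log
// frameAudit.open periodically. A count that keeps climbing means a leak.
```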

Backpressure: Why Slow Transforms Silently Drop Frames

MediaStreamTrackProcessor does not propagate backpressure upstream. When your TransformStream falls behind, the processor silently drops the oldest frames rather than slowing the camera. You will never see an error, only missing frames. The practical consequence: a transform that takes 50ms per frame on a 30fps source (a 33ms budget) won’t error or queue indefinitely. It will quietly run at roughly 20fps (1000ms / 50ms), with the difference dropped. Inside the transform, TransformStreamDefaultController.desiredSize reflects the readable side’s backpressure state. When it goes negative, the readable side is over its high-water mark and the consumer is behind. Treat that as a cheap guard rather than a drop detector, though. With TransformStream’s default readable high-water mark of 0, transform() is normally invoked only after the consumer has pulled, so desiredSize is usually 0 when your code runs. Frames the processor discards upstream never reach the transform at all:

const filter = new TransformStream({
  async transform(frame, controller) {
    try {
      if (controller.desiredSize !== null && controller.desiredSize < 0) {
        // Consumer is behind. Drop this frame intentionally
        // instead of falling further behind.
        return;
      }
      // ...filter work...
      controller.enqueue(newFrame);
    } finally {
      frame.close();
    }
  },
});
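The processor's drops can be detected from the frames that do arrive. VideoFrame.timestamp is in microseconds, so a gap between consecutive frames well above the nominal interval means frames were discarded in between. The following is a minimal sketch that assumes a fixed nominal frame rate you pass in. makeDropCounter, expectedFps, and the 1.5× tolerance are all illustrative choices.

```javascript
// Estimate upstream drops from gaps between consecutive timestamps (microseconds).
function makeDropCounter(expectedFps, tolerance = 1.5) {
  const intervalUs = 1_000_000 / expectedFps;
  let lastTs = null;
  let dropped = 0;

  return {
    // Call with frame.timestamp at the top of transform(), before any early return.
    observe(timestampUs) {
      if (lastTs !== null) {
        const gap = timestampUs - lastTs;
        if (gap > intervalUs * tolerance) {
          // A gap of about 3 intervals means about 2 frames never arrived.
          dropped += Math.round(gap / intervalUs) - 1;
        }
      }
      lastTs = timestampUs;
    },
    get dropped() {
      return dropped;
    },
  };
}
```

Camera frame rates can vary (in low light, for example), so treat the count as an estimate. Because it counts frames that never reached the transform, it measures the processor's drops directly.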

When you detect backpressure, you have two levers. Drop intentionally — skip the current frame, as above, so a deliberate cadence replaces silent random loss. Or reduce the input: request a lower resolution or framerate from getUserMedia via MediaTrackConstraints, or call track.applyConstraints() to step down at runtime. Lowering resolution cuts per-frame pixel work directly and is usually the most effective fix for a CPU-bound filter.
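The second lever, stepping down at runtime with track.applyConstraints(), could look like the sketch below. The quality ladder and the stepDown() name are hypothetical. The constraints use `ideal` values, which let the browser pick the closest mode the camera supports instead of rejecting with an OverconstrainedError.

```javascript
// Hypothetical quality ladder, highest first.
const QUALITY_LADDER = [
  { width: 1280, height: 720, frameRate: 30 },
  { width: 960, height: 540, frameRate: 30 },
  { width: 640, height: 360, frameRate: 24 },
  { width: 640, height: 360, frameRate: 15 },
];

// Move one rung down the ladder; returns the level actually in effect.
async function stepDown(track, currentLevel) {
  const next = Math.min(currentLevel + 1, QUALITY_LADDER.length - 1);
  if (next === currentLevel) return currentLevel; // already at the floor
  const { width, height, frameRate } = QUALITY_LADDER[next];
  try {
    await track.applyConstraints({
      width: { ideal: width },
      height: { ideal: height },
      frameRate: { ideal: frameRate },
    });
    return next;
  } catch (err) {
    console.warn('applyConstraints failed; keeping current level', err);
    return currentLevel;
  }
}
```

Because each rung cuts the pixel count, the per-frame cost of a CPU-bound filter drops roughly in proportion.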

Workers: Why Do All VideoFrame Work in a Single Worker?

Do all VideoFrame work in a single worker. When a VideoFrame is transferred across worker boundaries via postMessage, the sending side’s reference is closed automatically, and any further attempt to use it there (drawing, copying, or cloning it) throws. That kind of ownership bug is hard to trace across worker message queues. Frames inside transferred streams behave differently. They are serialized, which clones them, so each side holds a frame that must be closed explicitly. Mix the two and you get the early-close failure:

controller.enqueue(frame);
frame.close(); // Too early — enqueue is async; the frame may still be in flight

Because controller.enqueue() is asynchronous with respect to the consuming worker, closing the sender’s reference too early causes serialization failures, while never closing it causes the leak-then-freeze described above. Keep the whole MediaStreamTrackProcessor → TransformStream → VideoTrackGenerator chain inside one worker and you avoid the ownership problem entirely. (For getting encoded chunks off the device over WebTransport or data channels, see the webrtcHacks pipeline series; that’s a topic of its own.)

When you do hand a frame to a worker — to feed the pipeline, not to split it — transfer it explicitly and stop touching it on the sending side:

// Main thread
worker.postMessage({ frame }, { transfer: [frame] });
// `frame` is now neutered here. Do not read or close it on the main thread.

After a transfer, the receiving worker owns the frame and is responsible for closing it. The sending thread must treat its reference as gone.
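On the receiving side, the same try/finally pattern applies. The following is a sketch of a worker message handler. handleFrameMessage and processFrame are hypothetical names, and processFrame may be synchronous or asynchronous.

```javascript
// worker.js: receiving side of worker.postMessage({ frame }, { transfer: [frame] }).
// After the transfer this worker owns the frame and must close it on every path.
async function handleFrameMessage(event, processFrame) {
  const { frame } = event.data;
  try {
    await processFrame(frame);
  } finally {
    frame.close(); // runs on success and when processFrame throws
  }
}

// In a real dedicated worker:
// self.onmessage = (event) => handleFrameMessage(event, processFrame);
```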

Encoding for Transmission or Recording

A VideoEncoder compresses raw VideoFrame objects into EncodedVideoChunk objects, delivered through an output callback for recording or transmission. Configure it with a codec string, dimensions, bitrate, and framerate:

const chunks = [];
const encoder = new VideoEncoder({
  output: (chunk, metadata) => {
    // chunk.type is 'key' or 'delta'; chunk has timestamp, duration, byteLength
    chunks.push(chunk);
  },
  error: (e) => console.error('encoder error', e),
});

encoder.configure({
  codec: 'vp8',          // or e.g. 'avc1.42001f' for H.264 baseline
  width: 640,
  height: 480,
  bitrate: 1_000_000,
  framerate: 30,
});

The output callback gives you an EncodedVideoChunk plus optional metadata; the chunk carries its type ('key' or 'delta'), timestamp, duration, and the encoded bytes. For codec strings, see the WebCodecs codec registry and MDN’s codec guide rather than guessing at AVC profile strings.
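You don't have to guess whether a codec string will work on the current device. You can ask with VideoEncoder.isConfigSupported(), which resolves to an object with a boolean `supported` field. In the sketch below, pickEncoderConfig and the candidate list are illustrative. The Encoder parameter exists only so the helper can be exercised without a browser.

```javascript
// Return the first candidate config the browser reports as supported, or null.
async function pickEncoderConfig(configs, Encoder = globalThis.VideoEncoder) {
  if (typeof Encoder !== 'function') return null; // no WebCodecs in this context
  for (const config of configs) {
    try {
      const { supported } = await Encoder.isConfigSupported(config);
      if (supported) return config;
    } catch {
      // A malformed config rejects with a TypeError; treat it as unsupported.
    }
  }
  return null;
}

const candidates = [
  { codec: 'avc1.42001f', width: 640, height: 480, bitrate: 1_000_000, framerate: 30 },
  { codec: 'vp8', width: 640, height: 480, bitrate: 1_000_000, framerate: 30 },
];
// const config = await pickEncoderConfig(candidates);
// if (config) encoder.configure(config);
```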

Request a keyframe with encoder.encode(frame, { keyFrame: true }) (note the capital F) when you need an intra frame, such as at stream start, after a seek, or at a recovery point — encoding every frame as a keyframe defeats inter-frame compression entirely and will significantly increase your bitrate. The option spelling is documented in MDN’s Using the WebCodecs API guide.
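Keyframe requests can be combined with two other details in the same call site. encode() works from its own clone of the frame, so the caller still closes the frame it passed in. The encoder also exposes encodeQueueSize, the number of pending encode() calls, which you can use to skip frames when the encoder itself falls behind. In the sketch below, KEYFRAME_INTERVAL and MAX_ENCODE_QUEUE are illustrative values, not recommendations.

```javascript
const KEYFRAME_INTERVAL = 150; // one keyframe every 5 s at 30 fps (illustrative)
const MAX_ENCODE_QUEUE = 2;    // illustrative threshold

let frameCount = 0;

// Encode with a fixed keyframe cadence; returns false if the frame was skipped.
function encodeWithCadence(encoder, frame) {
  try {
    // A growing encode queue means the encoder is behind; queueing more adds latency.
    if (encoder.encodeQueueSize > MAX_ENCODE_QUEUE) return false;
    const keyFrame = frameCount % KEYFRAME_INTERVAL === 0;
    encoder.encode(frame, { keyFrame });
    frameCount++;
    return true;
  } finally {
    frame.close(); // encode() keeps its own clone, so this is always safe
  }
}
```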

Recovering from a Closed Encoder

When VideoEncoder’s error callback fires and the encoder transitions to the 'closed' state, it cannot be reused. VideoEncoder.reset() exists for non-terminal cases, but recovery from a closed encoder means constructing a new instance and calling configure() again with the same parameters. Check state before every encode() and rebuild on close:

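let encoder = makeEncoder(); // makeEncoder(): your own factory that constructs and configures a VideoEncoder

function encodeFrame(frame, keyFrame = false) {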
  if (encoder.state === 'closed') {
    encoder = makeEncoder();   // construct + configure a fresh VideoEncoder
  }
  if (encoder.state === 'configured') {
    encoder.encode(frame, { keyFrame });
  }
}

Guarding encode() with a state check and a rebuild path is what keeps a long-running session alive through a transient codec error.
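The makeEncoder() factory that encodeFrame() calls could look like the sketch below. ENCODER_CONFIG reuses the values from the configure() example, and encodedChunks is a hypothetical stand-in for your muxer or transport. Keeping the config in one place means a rebuilt encoder is configured identically.

```javascript
const ENCODER_CONFIG = {
  codec: 'vp8',
  width: 640,
  height: 480,
  bitrate: 1_000_000,
  framerate: 30,
};

const encodedChunks = []; // hypothetical sink; swap in your muxer or transport

function makeEncoder() {
  const enc = new VideoEncoder({
    output: (chunk) => encodedChunks.push(chunk),
    // By the time this fires the encoder is 'closed'; encodeFrame() rebuilds it.
    error: (e) => console.error('encoder error; will rebuild', e),
  });
  enc.configure(ENCODER_CONFIG); // state becomes 'configured' immediately
  return enc;
}
```

After a rebuild, passing keyFrame: true on the first frame is a reasonable way to let downstream decoders resynchronize.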

Browser Support in 2026

WebCodecs support splits by interface, and treating it as a single version number is the mistake every stale tutorial makes. The core VideoEncoder/VideoFrame interfaces are widely available; the Insertable Streams pieces — MediaStreamTrackProcessor and VideoTrackGenerator — ship on a different, slower timeline.

| Interface | Chrome / Edge | Firefox | Safari |
|---|---|---|---|
| VideoEncoder / VideoFrame (core WebCodecs) | 94+ | 130+ | 16.4+ |
| MediaStreamTrackProcessor | 94+ | Not supported | 18+ |
| VideoTrackGenerator | Not supported | Not supported | 18+ |
| MediaStreamTrackGenerator (non-standard) | 94+ | Not supported | Not supported |

Verified against MDN browser-compat data for VideoEncoder and MediaStreamTrackProcessor. The blanket “Safari doesn’t support WebCodecs” caveat in most tutorials is both outdated and imprecise: Safari has shipped core WebCodecs since 16.4, with expanded codec support (including HEVC) in Safari 17.4. What Firefox lacks, and what Safari added only in version 18 and only inside dedicated workers, is the Insertable Streams capture/output layer. So the camera → filter → display pipeline above runs today in Chromium (with MediaStreamTrackGenerator standing in for VideoTrackGenerator) and in Safari 18+ when implemented in a dedicated worker. On Firefox you can still encode and decode frames, but you have to source them another way.

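The practical takeaway: feature-detect per interface, not per browser. Check for MediaStreamTrackProcessor and VideoEncoder separately, and check in the context where the code will run. Safari exposes the Insertable Streams interfaces only inside a dedicated worker, so a main-thread check of window.MediaStreamTrackProcessor reports them as missing there. Keep a Canvas/requestVideoFrameCallback fallback for the capture layer where the Insertable Streams pieces are missing.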
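A per-interface check can be sketched as follows. detectVideoPipelineSupport is a hypothetical helper, and the scope parameter defaults to globalThis, so the same function works on the main thread and inside a worker. The current Media Capture Transform draft exposes MediaStreamTrackProcessor and VideoTrackGenerator to dedicated workers, and Chromium additionally exposes its interfaces on window. Run the check in each context you plan to use.

```javascript
// Report each layer separately for the given global scope (window or worker).
function detectVideoPipelineSupport(scope = globalThis) {
  const has = (name) => typeof scope[name] === 'function';
  return {
    // Core WebCodecs: encode/decode and raw frames.
    codecs: has('VideoEncoder') && has('VideoDecoder') && has('VideoFrame'),
    // Insertable Streams capture layer.
    trackProcessor: has('MediaStreamTrackProcessor'),
    // Output layer: the spec interface, or Chromium's non-standard predecessor.
    trackGenerator: has('VideoTrackGenerator'),
    legacyTrackGenerator: has('MediaStreamTrackGenerator'),
  };
}

// const support = detectVideoPipelineSupport();
// if (!support.trackProcessor) { /* fall back to Canvas + requestVideoFrameCallback */ }
```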

Debugging Checklist

The main failure modes in a WebCodecs pipeline (dropped frames, runaway memory, latency spikes, and a dead encoder) each have a distinct symptom and a direct diagnostic step.

| Symptom | Likely cause | Diagnostic step |
|---|---|---|
| Progressive stutter, then freeze; rest of page responsive | frame.close() missing on some path, so finite media resources are exhausted | Audit every VideoFrame read or constructed for exactly one close(); confirm try/finally coverage |
| Frames missing, no errors in console | Slow transform; processor silently dropping the oldest frames | Compare consecutive frame.timestamp values inside the transform; gaps well above the frame interval mean frames were dropped upstream |
| Latency climbs over time | Slow per-frame filter eating the frame budget | Measure per-step duration; compare against your framerate’s budget (33ms at 30fps) |
| Encoder stops producing chunks | VideoEncoder entered the 'closed' state after an error | Check encoder.state before each encode(); rebuild on 'closed' |
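The "measure per-step duration" step can be automated by wrapping a transform function. The following sketch records how long each call takes and counts calls that exceed the frame budget of 1000 / fps milliseconds, about 33.3 ms at 30 fps. timeTransform and the stats fields are hypothetical names, and the injectable clock exists only for testing.

```javascript
// Wrap a transform(frame, controller) function with per-call timing.
function timeTransform(transformFn, fps, now = () => performance.now()) {
  const budgetMs = 1000 / fps;
  const stats = { calls: 0, overBudget: 0, maxMs: 0, totalMs: 0 };

  async function timed(frame, controller) {
    const start = now();
    try {
      return await transformFn(frame, controller);
    } finally {
      const elapsed = now() - start;
      stats.calls++;
      stats.totalMs += elapsed;
      stats.maxMs = Math.max(stats.maxMs, elapsed);
      if (elapsed > budgetMs) stats.overBudget++;
    }
  }

  return { timed, stats, budgetMs };
}

// Usage:
// const { timed, stats } = timeTransform(myTransform, 30);
// const filter = new TransformStream({ transform: timed });
// setInterval(() => console.log(stats.overBudget, stats.maxMs.toFixed(1)), 1000);
```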

The stutter-then-freeze signature is worth recognizing on sight. Session replays of WebCodecs-based features reliably surface this pattern: smooth video that plays normally, then begins dropping frames visibly, then freezes entirely while the rest of the UI stays interactive. That is the visible signature of finite media resource exhaustion from unclosed frames — the replay shows the symptom clearly, but the cause is invisible without knowing to look for an unclosed frame somewhere in the pipeline code.

To measure true end-to-end latency — from camera capture to display — encode the frame’s timestamp as a pixel overlay before the pipeline and decode it from the rendered output via requestVideoFrameCallback. As a calibration point, the webrtcHacks pipeline benchmark (March 2023) reported these per-frame costs:

| Step | Duration |
|---|---|
| Background removal | 22ms |
| Overlay addition | 1ms |
| Encoding | 8ms |
| Decoding | 1ms |
| Display | 38ms |

Your numbers will vary by hardware and filter complexity. The notable result is that display alone accounts for ~38ms — the dominant term — which means a filter that fits comfortably within a 30fps budget can still feel laggy if you don’t account for the display tail. Measure the whole path, not just your transform.

Conclusion

The WebCodecs pipeline shape, MediaStreamTrackProcessor → TransformStream → VideoTrackGenerator, is small enough to fit in one code block, but the gap between a demo and a shippable feature is entirely in the failure modes: closing every frame, detecting silent backpressure, keeping the whole chain in one worker, recovering from a closed encoder, and feature-detecting per interface rather than per browser. Start from the try/finally example at the top of this article, add the desiredSize check and the encoder state guard, and you have a pipeline that survives the cases the happy-path tutorials never reach.

FAQs

When should I use Canvas2D versus WebGL or WebGPU for a WebCodecs filter?

Use Canvas2D for color transforms and simple compositing, where ctx.filter strings or modest getImageData loops fit the frame budget. Reach for WebGL or WebGPU when per-pixel cost dominates. They keep the frame on the GPU (WebGPU via importExternalTexture, WebGL by passing the VideoFrame to texImage2D) and avoid the CPU read-back that getImageData forces. At full resolution, that read-back is usually the bottleneck, so a GPU path is the fix for heavy per-pixel work like chroma keying.

Why do my VideoFrames close unexpectedly when I pass them between workers?

Transferring a VideoFrame across a worker boundary via postMessage automatically closes the sending side's reference, so any further attempt to use it on the sender (drawing, copying, or cloning it) throws. This differs from frames inside transferred streams, which are serialized and cloned and require explicit closing on both sides. To avoid the ownership bug, keep the whole pipeline in one worker. If you do transfer a frame, treat the sender's reference as gone and let the receiving worker own and close it.

Does the camera-to-filter-to-display pipeline work in Firefox?

Not fully. Firefox 130 and later support the core VideoEncoder and VideoFrame interfaces, but it does not support the Insertable Streams capture and output layer, meaning MediaStreamTrackProcessor and VideoTrackGenerator are unavailable. You can encode and decode frames in Firefox, but you must source frames another way, such as a Canvas with requestVideoFrameCallback. Feature-detect per interface by checking window.MediaStreamTrackProcessor and window.VideoEncoder separately rather than testing for the browser.

What's the difference between VideoEncoder.reset() and rebuilding the encoder?

VideoEncoder.reset() handles non-terminal cases, clearing pending work on an encoder that is still usable. It cannot recover an encoder that has transitioned to the closed state after an error fires, because a closed encoder cannot be reconfigured or reused. Recovery from closed means constructing a new VideoEncoder instance and calling configure() again with the same parameters. Check encoder.state before every encode() and rebuild when it reads closed.
