
Embed live captions in a broadcast

Bringing translated captions into a broadcast graphics pipeline using Loquira's translation output.

Last updated · May 17, 2026 · 4 min read

Broadcasters want captions that are clean, translated, and delivered into the same graphics pipeline that handles lower-thirds and tickers. (For background on the distinction, see Live captions vs live translation.) This guide covers the latency characteristics of Loquira’s translation pipeline and how to work with its output today.

Latency budget

Broadcasters operate within strict timing constraints. Every frame has a scheduled position. Understanding Loquira’s end-to-end latency helps plan for caption placement in the broadcast stack.

Stage	Latency
Speech recognition (Deepgram Nova-3)	~300 ms
Translation (Gemini)	~250 ms
Text-to-speech synthesis	~200 ms
Total end-to-end	~750 ms

At roughly 750 ms, the total is well within the delay buffer most broadcasters maintain for live events (typically 3–10 seconds for legal review and profanity delay). A live-to-air route is therefore realistic for news coverage, press conferences, and live event broadcasting.
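The budget arithmetic can be sketched in a few lines of Python. This is only an illustration: the stage figures are the approximate values from the table, while the function names, the 25 fps frame rate, and the 3-second buffer are assumptions chosen for the example rather than anything Loquira provides.

```python
import math

# Approximate per-stage latencies from the table above, in milliseconds.
STAGE_LATENCY_MS = {
    "speech_recognition": 300,
    "translation": 250,
    "text_to_speech": 200,
}


def total_latency_ms(stages=STAGE_LATENCY_MS):
    """Sum the per-stage latencies into an end-to-end figure."""
    return sum(stages.values())


def latency_in_frames(latency_ms, fps):
    """Convert a latency to whole video frames, rounding up."""
    return math.ceil(latency_ms * fps / 1000)


def buffer_headroom_ms(delay_buffer_s, stages=STAGE_LATENCY_MS):
    """Time left in the broadcast delay buffer after pipeline latency."""
    return delay_buffer_s * 1000 - total_latency_ms(stages)


if __name__ == "__main__":
    # 25 fps and a 3-second delay are assumed values for illustration.
    print(total_latency_ms())          # 750
    print(latency_in_frames(750, 25))  # 19
    print(buffer_headroom_ms(3))       # 2250
```

Even with the shortest typical delay buffer of 3 seconds, about 2.25 seconds of headroom remain after the pipeline latency is accounted for.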

Working with translated output today

Loquira’s listener view displays translated text and audio in real time. For broadcast integration, two approaches are currently available:

Post-session caption overlay. After the session ends, export the transcript in SRT or WebVTT format (see Transcripts and exports for all available formats). Import the file into your editing or playout system to burn captions into the recorded broadcast. This is the most reliable method and works with any graphics pipeline.

Listener view as reference. Open the Loquira audience view on a dedicated device and position it off-screen. A caption operator watches the translated text and enters captions manually into the graphics system. This introduces human latency but gives full editorial control over timing and visibility.

Direct caption feed integration into broadcast graphics systems (OBS, vMix, CasparCG) is on the product roadmap.
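For the post-session overlay approach, the cue times in an exported SRT file may not line up with the recorded programme, for example if the recording started before the session did. The sketch below shifts every cue by a fixed offset. It assumes standard SRT timestamps (HH:MM:SS,mmm) and a known offset; the function name shift_srt and the sample cue are hypothetical and not part of any Loquira tooling.

```python
import re

# Matches a standard SRT timestamp such as 00:01:02,345.
TIMESTAMP = re.compile(r"(\d{2}):(\d{2}):(\d{2}),(\d{3})")


def _shift_timestamp(match, offset_ms):
    """Return the matched timestamp moved by offset_ms, clamped at zero."""
    h, m, s, ms = (int(g) for g in match.groups())
    total = ((h * 60 + m) * 60 + s) * 1000 + ms + offset_ms
    total = max(total, 0)
    h, rem = divmod(total, 3_600_000)
    m, rem = divmod(rem, 60_000)
    s, ms = divmod(rem, 1000)
    return f"{h:02d}:{m:02d}:{s:02d},{ms:03d}"


def shift_srt(srt_text, offset_ms):
    """Shift all cue times in SRT text by offset_ms (may be negative)."""
    shifted = []
    for line in srt_text.splitlines():
        # Only timing lines contain the "-->" separator; cue text is untouched.
        if "-->" in line:
            line = TIMESTAMP.sub(lambda m: _shift_timestamp(m, offset_ms), line)
        shifted.append(line)
    return "\n".join(shifted)


if __name__ == "__main__":
    sample = "1\n00:00:01,000 --> 00:00:03,500\nGood evening.\n"
    # Assumed offset: the recording began 2.5 seconds before the session.
    print(shift_srt(sample, 2500))
```

Only lines containing the cue separator are rewritten, so caption text that happens to include a time-like string is left alone. The same idea applies to WebVTT, which uses a period instead of a comma before the milliseconds.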

Failure modes to plan for

Network loss on the presenter device. Translation stops immediately. Have a fallback graphic (“Live translation temporarily unavailable”) ready.
Audio dropout at the source. The recognizer will not produce captions for silence. Brief the on-air talent to fill rather than wait.
Language switching mid-broadcast. Possible, but it introduces a 1–2 second gap while the translation pipeline re-warms. Switch only between segments.