
Curate transcripts after the event

Cleaning, attributing, and distributing multilingual transcripts so the post-event document holds up under scrutiny.

Last updated · May 16, 2026

A raw Loquira transcript captures every word the speech engine recognised: complete sentences, sentence fragments, false starts, repeated filler words, and cross-talk artefacts. It is an accurate record of what the engine heard. It is not a publishable document. For the available download formats and export options, see Transcripts and exports.

This guide covers the curation pass — the minimum cleanup that turns a raw transcript into a document suitable for distribution, quotation, and archival.

The minimum cleanup pass

A curated transcript should pass three tests:

A reader can identify who said what.
The text flows as written prose, not as disfluent speech.
No sensitive material appears in the distributed version.

Step 1: Speaker attribution. The raw transcript records utterances as a single stream. If multiple speakers were active, add a speaker label at the first utterance of each speaker and whenever the speaker changes. Use the speaker’s name or role: “Alina Novak (CEO):” or “Moderator:”. For press conferences, identify journalists by outlet if permission has been granted: “Question — Le Monde:”.

Step 2: Paragraph breaks and structure. The raw transcript arrives as a block of timed segments. Insert paragraph breaks at natural topic transitions. If the event’s agenda covered three topics, the transcript should have three sections. Mark each topic shift with a heading annotation in square brackets: “[Transition to Q&A]”.

Step 3: Clean up disfluencies. Remove repeated filler words (um, uh, you know, like, sort of). The speech engine faithfully reproduces every utterance including these. A curated version serves the reader better by omitting them. Do not correct grammar, rephrase sentences, or alter the speaker’s meaning. The transcript is a record, not a rewrite.
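Steps 1 and 3 can be sketched in a few lines of Python. This is an illustration under assumptions, not a Loquira feature: the `(speaker, text)` segment shape is hypothetical, and the speaker values would come from the curator's own notes, since the raw transcript is a single stream. Only "um" and "uh" are removed automatically. "You know", "like", and "sort of" can carry meaning ("I like it"), so the sketch flags them for manual review instead of deleting them.

```python
import re
from typing import Iterable

# Fillers that are almost never meaningful: removed with their commas.
FILLERS = re.compile(r",?\s*\b(?:um+|uh+)\b,?", re.IGNORECASE)
# Fillers that can also be real words: reported, never removed.
AMBIGUOUS = re.compile(r"\b(?:you know|like|sort of)\b", re.IGNORECASE)


def strip_fillers(text: str) -> tuple[str, list[str]]:
    """Return the cleaned text and the ambiguous fillers left for review."""
    cleaned = FILLERS.sub("", text)
    cleaned = re.sub(r"\s{2,}", " ", cleaned).strip()
    flagged = [m.group(0) for m in AMBIGUOUS.finditer(cleaned)]
    return cleaned, flagged


def label_speakers(segments: Iterable[tuple[str, str]]) -> list[str]:
    """Label each turn once and merge consecutive segments by one speaker."""
    lines: list[str] = []
    previous = None
    for speaker, text in segments:
        if speaker != previous:
            # Speaker changed (or first utterance): start a labelled line.
            lines.append(f"{speaker}: {text}")
            previous = speaker
        else:
            # Same speaker continues: extend the current line.
            lines[-1] += " " + text
    return lines
```

The filler pass leaves capitalisation alone, so a sentence that began with a removed "Um," still needs its first letter fixed by hand. Nothing here rephrases or corrects grammar, in keeping with the rule that the transcript is a record, not a rewrite.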

Reconciling translated transcripts against the original

When a session had multiple output languages active, each language transcript is an independent rendering of the original speech. A direct back-translation of the French transcript into English will not match the English original word-for-word — translation introduces legitimate variation in phrasing, idiom handling, and sentence structure.

How to reconcile for distribution:

Distribute the original-language transcript as the authoritative version.
Distribute each translated transcript alongside it, labelled clearly: “French translation (machine-generated)”. For a worked example of post-event distribution, see the transcript section in Host a town hall.
Do not attempt to manually harmonise the translations with the original. The variation is inherent to the translation process and does not indicate errors.

If a specific passage must be identical across all language versions, such as a policy statement, a legal disclaimer, or a key quote, verify the translation of that passage separately and annotate the transcript if needed. Most events never need this, but it is essential in regulatory or compliance contexts.

Redaction for sensitive material

Before distributing a transcript externally, review it for sensitive content that should not appear in the published version. For retention policies and how Loquira handles session data, see Privacy and data handling.

What to look for:

Personally identifiable information (phone numbers, email addresses, home addresses) spoken during the event. The speech engine captures these accurately.
Off-the-record remarks made during on-the-record segments. A speaker may transition from on-the-record to off-the-record mid-sentence.
Commercially sensitive forward-looking statements that were cleared for the room but not for external distribution.

Redaction method: Replace the sensitive passage with a bracketed description: “[Redacted — commercially sensitive]” or “[Personal information removed]”. Do not edit the raw transcript in place or circulate it to show what was redacted, because the original text is still there in the same position. Create a separate redacted file and distribute only that.
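A first pass over the personal-information category can be automated. The sketch below is a hedged example, not a Loquira tool: the two regular expressions are deliberately loose assumptions and will also catch other long digit sequences (an ISO date, for instance), and the `_redacted` file suffix is a hypothetical naming choice. Off-the-record remarks and commercially sensitive statements still need a human reader.

```python
import re
from pathlib import Path

PLACEHOLDER = "[Personal information removed]"
# Loose patterns for illustration only; review every replacement.
EMAIL = re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+")
PHONE = re.compile(r"\+?\d[\d\s().-]{7,}\d")


def redact_pii(text: str) -> tuple[str, int]:
    """Replace email addresses and phone-like numbers; return text and count."""
    # Emails first, so digits inside an address are not read as a phone number.
    text, n_email = EMAIL.subn(PLACEHOLDER, text)
    text, n_phone = PHONE.subn(PLACEHOLDER, text)
    return text, n_email + n_phone


def write_redacted_copy(raw_path: Path) -> Path:
    """Write a separate redacted file next to the raw one; the raw file is untouched."""
    out_path = raw_path.with_name(f"{raw_path.stem}_redacted{raw_path.suffix}")
    redacted, _ = redact_pii(raw_path.read_text(encoding="utf-8"))
    out_path.write_text(redacted, encoding="utf-8")
    return out_path
```

The returned count is a quick check for the curator: if it is zero on a transcript where contact details were read aloud, the patterns missed something and the passage needs manual redaction.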

Archiving conventions for long-term records

Organisations that run weekly or monthly Loquira sessions accumulate a transcript archive. Without naming conventions, the archive becomes unusable within a few quarters.

Recommended archive structure:

/transcripts/
  YYYY/
    YYYY-MM-DD_event-name/
      YYYY-MM-DD_event-name_en.txt
      YYYY-MM-DD_event-name_fr.txt
      YYYY-MM-DD_event-name_ja.txt
      YYYY-MM-DD_event-name_metadata.json

The metadata JSON file stores session-level information: speaker name, event type, duration, number of listeners per language, and any curator notes (e.g. “Q&A segment missing — microphone was off during Q&A”).
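The naming convention above is easy to generate rather than type by hand. A minimal sketch, assuming the archive root is a local directory: the helper names are hypothetical, and so are the metadata keys, since this guide lists what the file should contain but not a schema.

```python
import json
from datetime import date
from pathlib import Path


def event_dir(root: Path, day: date, event_name: str) -> Path:
    """Build <root>/YYYY/YYYY-MM-DD_event-name for one event."""
    return root / f"{day.year:04d}" / f"{day.isoformat()}_{event_name}"


def write_metadata(root: Path, day: date, event_name: str, metadata: dict) -> Path:
    """Create the event folder if needed and write its metadata JSON file."""
    folder = event_dir(root, day, event_name)
    folder.mkdir(parents=True, exist_ok=True)
    path = folder / f"{folder.name}_metadata.json"
    # ensure_ascii=False keeps names and notes readable in any language.
    path.write_text(
        json.dumps(metadata, indent=2, ensure_ascii=False), encoding="utf-8"
    )
    return path
```

A call such as `write_metadata(Path("transcripts"), date(2026, 5, 16), "town-hall", {"event_type": "all-hands", "curator_notes": "..."})` would produce `transcripts/2026/2026-05-16_town-hall/2026-05-16_town-hall_metadata.json`. The ISO date prefix makes folders sort chronologically in any file browser.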

Retention decisions per event:

Not every transcript needs to be kept indefinitely. Establish a retention category for each event type:

Event type	Retention	Example
Board meetings	Permanent	Annual shareholder meeting
Internal all-hands	2 years	Quarterly town hall
Press conferences	1 year	Product launch
Weekly stand-ups	90 days	Engineering sync
Test sessions	30 days	Dry run before an event

Apply retention to each event folder as a whole, not to individual files, so that every language version and the metadata file expire together. A script that checks folder creation dates against the retention policy can automate cleanup.
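The retention check might look like the sketch below. It makes two assumptions worth stating. First, it reads the event date from the `YYYY-MM-DD` prefix of the folder name instead of the filesystem creation date, because creation time is not exposed consistently across operating systems. Second, the event-type keys are hypothetical labels for the rows of the table, with "2 years" approximated as 730 days.

```python
from datetime import date, timedelta

# Hypothetical keys for the retention table; None means keep permanently.
RETENTION_DAYS = {
    "board-meeting": None,
    "all-hands": 730,        # 2 years, approximated
    "press-conference": 365,
    "stand-up": 90,
    "test-session": 30,
}


def is_expired(folder_name: str, event_type: str, today: date) -> bool:
    """Return True if an event folder has outlived its retention period."""
    # An unknown event type raises KeyError, so nothing is deleted by default.
    days = RETENTION_DAYS[event_type]
    if days is None:
        return False
    # Folder names start with the event date: YYYY-MM-DD_event-name.
    event_day = date.fromisoformat(folder_name[:10])
    return today - event_day > timedelta(days=days)
```

The function only returns a decision. Keeping deletion as a separate, explicit step lets a cleanup job run in report-only mode first and list what it would remove before anything is lost.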