Transcription Export Formats: TXT, SRT, VTT, JSON, CSV — What to Choose When
Transcription is done. Which format should you download? TXT is the simplest, but you lose timestamps and structured speaker information. JSON carries everything but requires processing. SRT is made for subtitles, not for archiving. The choice of format determines what you can do with the transcript next, and what you lose for good.
Why Format Matters More Than It Seems
Formats are not just different wrappers for the same content. Each carries different data, and information missing from the file cannot be reconstructed.
The transcription model produces more than just text. Each word has a timestamp (when it was spoken), a confidence score (how certain the model is), and a speaker identifier (who said it). This information exists in the machine output — but not every export format preserves it.
Example: you download TXT without timestamps. Three months later you need subtitles for the video, which means SRT with timestamps. The TXT contains no timestamps, so they cannot be added after the fact. You must run the transcription again; the only way to avoid that is to download a timestamped format from the start.
Golden rule: download in the richest format your environment can handle. Simplifying is always possible. Adding missing data is not.
Five Formats and Their Uses
TXT — Plain Text
TXT is text and nothing else: no timestamps, no confidence scores, no structured speaker identification. If speakers are marked at all, it is only with a plain-text label or separator. The file opens in any text editor without special software.
Suitable for: reading the transcript, manual editing in a text editor or CMS, inserting into a document, show notes, simple readable archive.
Not suitable for: subtitles (timestamps missing), programmatic integration (structure missing), archiving with metadata for later processing.
SRT — Standard for Video Subtitles
SubRip Subtitle (SRT) is the most widely used subtitle format, accepted by virtually every video platform and player. Each segment consists of a sequence number, start and end timestamps, and the text. The format is simple and text-based, with no additional metadata.
1
00:00:01,000 --> 00:00:04,200
Good afternoon, welcome to today's lecture.

2
00:00:04,800 --> 00:00:06,100
Thank you for the invitation.
Suitable for: video subtitles on YouTube, VLC, DaVinci Resolve, Premiere Pro; a rough, time-synchronised timeline of the transcript.
Not suitable for: archiving (transcript metadata, confidence score, speaker identification missing), text analysis (sequence numbers and timestamps complicate reading).
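The segment layout described above (sequence number, timing line, text) is regular enough to read with a few lines of code. The following is a minimal sketch, not a full SubRip parser: the `Cue` class and the `parse_srt` function are names invented for this illustration, and the sketch assumes well-formed input with blank lines between segments.

```python
import re
from dataclasses import dataclass


@dataclass
class Cue:
    index: int    # sequence number from the file
    start: float  # seconds
    end: float    # seconds
    text: str


_TIME = re.compile(r"(\d+):(\d{2}):(\d{2}),(\d{3})")


def _seconds(stamp: str) -> float:
    """Convert an SRT timestamp such as 00:00:04,200 to seconds."""
    match = _TIME.fullmatch(stamp.strip())
    if match is None:
        raise ValueError(f"not an SRT timestamp: {stamp!r}")
    h, m, s, ms = map(int, match.groups())
    return h * 3600 + m * 60 + s + ms / 1000


def parse_srt(srt_text: str) -> list[Cue]:
    """Parse SRT text into cues; blocks with fewer than three lines are skipped."""
    cues = []
    # Segments are separated by one or more blank lines.
    for block in re.split(r"\n\s*\n", srt_text.strip()):
        lines = block.splitlines()
        if len(lines) < 3:
            continue
        start, end = lines[1].split("-->")
        cues.append(Cue(int(lines[0]), _seconds(start), _seconds(end), "\n".join(lines[2:])))
    return cues


SAMPLE_SRT = """1
00:00:01,000 --> 00:00:04,200
Good afternoon, welcome to today's lecture.

2
00:00:04,800 --> 00:00:06,100
Thank you for the invitation.
"""

for cue in parse_srt(SAMPLE_SRT):
    print(cue)
```

Note what the resulting objects do not contain: there is no field for confidence or speaker, because the file never stored them.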
VTT — Web Standard
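WebVTT (Web Video Text Tracks) is the W3C-defined format for timed text in web players. Its structure is similar to SRT, with a mandatory WEBVTT header and a period instead of a comma before the milliseconds. It additionally supports inline text formatting (bold, italic), comments (NOTE blocks), speaker identification through voice tags, and per-cue settings such as position and alignment.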
WEBVTT

00:00:01.000 --> 00:00:04.200
Good afternoon, welcome to today's lecture.
Suitable for: web video playback (the HTML5 <track> element), web accessibility (captions as required by WCAG 2.1), YouTube as an alternative to SRT, custom web players.
Advantages over SRT: native browser support without plugins, richer formatting options, metadata.
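Because the two layouts are so close, turning an existing SRT file into WebVTT is mostly mechanical. The sketch below is one simple way to do it, assuming well-formed SRT input; `srt_to_vtt` is a hypothetical helper name. It adds the header and changes the millisecond separator on timing lines. The SRT sequence numbers are left in place, where WebVTT treats them as optional cue identifiers.

```python
import re

# Matches HH:MM:SS,mmm so the comma can be replaced by a period.
_TIMING = re.compile(r"(\d{2}:\d{2}:\d{2}),(\d{3})")


def srt_to_vtt(srt_text: str) -> str:
    """Convert SRT text to WebVTT text."""
    out = ["WEBVTT", ""]  # the header must be followed by a blank line
    for line in srt_text.strip().splitlines():
        if "-->" in line:
            # Only timing lines are rewritten; commas in the text stay untouched.
            line = _TIMING.sub(r"\1.\2", line)
        out.append(line)
    return "\n".join(out) + "\n"


SAMPLE_SRT = """1
00:00:01,000 --> 00:00:04,200
Good afternoon, welcome to today's lecture.

2
00:00:04,800 --> 00:00:06,100
Thank you for the invitation.
"""

print(srt_to_vtt(SAMPLE_SRT))
```

The conversion adds nothing that SRT did not already hold: voice tags, formatting, and cue settings would have to come from a richer source.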
JSON — Complete Data
JSON carries everything the transcription model produces: words, word-level timestamps, confidence score, speaker identifiers, transcript metadata (language, model, recording length, processing parameters).
{
  "words": [
    {"text": "Good", "start": 1.0, "end": 1.2, "confidence": 0.98, "speaker": "SPEAKER_1"},
    {"text": "afternoon", "start": 1.2, "end": 1.6, "confidence": 0.99, "speaker": "SPEAKER_1"}
  ]
}
Suitable for: programmatic integration into applications, custom analytics tools, archiving with complete data for later processing, conversion to any other format, import into QDA tools (MAXQDA, ATLAS.ti) via custom script.
Not suitable for: manual reading or editing without a specialist tool (JSON is readable but impractical for direct work).
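A short script is usually enough to make the JSON workable. The sketch below assumes the field names from the example above (`words`, `text`, `start`, `end`, `confidence`, `speaker`); a real export may name or nest them differently. The third word in the sample and the 0.8 threshold are made up for illustration. It shows two typical operations: merging words into speaker turns and listing words the model was unsure about.

```python
import json

SAMPLE = """
{"words": [
  {"text": "Good", "start": 1.0, "end": 1.2, "confidence": 0.98, "speaker": "SPEAKER_1"},
  {"text": "afternoon", "start": 1.2, "end": 1.6, "confidence": 0.99, "speaker": "SPEAKER_1"},
  {"text": "Thanks", "start": 4.8, "end": 5.3, "confidence": 0.62, "speaker": "SPEAKER_2"}
]}
"""


def speaker_turns(words):
    """Merge consecutive words by the same speaker into one turn."""
    turns = []
    for w in words:
        if turns and turns[-1]["speaker"] == w["speaker"]:
            turns[-1]["text"] += " " + w["text"]
            turns[-1]["end"] = w["end"]
        else:
            turns.append({"speaker": w["speaker"], "start": w["start"],
                          "end": w["end"], "text": w["text"]})
    return turns


def low_confidence_words(words, threshold=0.8):
    """Return words whose confidence is below the (illustrative) threshold."""
    return [w for w in words if w["confidence"] < threshold]


words = json.loads(SAMPLE)["words"]
for turn in speaker_turns(words):
    print(f'{turn["speaker"]}: {turn["text"]}')
print([w["text"] for w in low_confidence_words(words)])
```

Neither operation is possible on TXT or SRT, since both depend on per-word speaker and confidence data.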
CSV — For Spreadsheet Processing
CSV is a tabular format: rows correspond to transcript segments, columns carry individual attributes (start time, end time, text, speaker, confidence). The file opens in Excel or Google Sheets.
Suitable for: tabular analysis across transcripts, statistical processing (confidence distribution, segment length, speaker share), reports, database import.
Not suitable for: subtitles, direct reading of transcript text, archiving with hierarchical metadata.
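To make the tabular layout concrete, the sketch below writes two segments to CSV with Python's standard `csv` module and then computes speaking time per speaker from the resulting table. The column names follow the attributes listed above but are an assumption, as are the speaker labels and confidence values in the sample rows; an actual export may use different headers.

```python
import csv
import io
from collections import defaultdict

FIELDS = ["start", "end", "text", "speaker", "confidence"]

ROWS = [
    {"start": 1.0, "end": 4.2, "text": "Good afternoon, welcome to today's lecture.",
     "speaker": "SPEAKER_1", "confidence": 0.97},
    {"start": 4.8, "end": 6.1, "text": "Thank you for the invitation.",
     "speaker": "SPEAKER_2", "confidence": 0.95},
]


def rows_to_csv(rows) -> str:
    """Serialise segment dicts to CSV text, one row per segment."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=FIELDS, lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buf.getvalue()


def speaking_time(csv_text: str) -> dict:
    """Sum segment durations (in seconds) per speaker."""
    totals = defaultdict(float)
    for row in csv.DictReader(io.StringIO(csv_text)):
        totals[row["speaker"]] += float(row["end"]) - float(row["start"])
    return dict(totals)


csv_text = rows_to_csv(ROWS)
print(csv_text)
print(speaking_time(csv_text))
```

The `csv` module quotes the first text field automatically because it contains a comma, which is the kind of detail that hand-built CSV tends to get wrong.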
Decision Overview
| I need to... | Format | Why |
|---|---|---|
| Read the transcript | TXT | Simplest, opens everywhere |
| Add subtitles to a video | SRT | Most widely supported standard |
| Add subtitles on the web | VTT | Native HTML5 standard |
| Integrate into application | JSON | Complete structured data |
| Analyse in a spreadsheet | CSV | Tabular format |
| Archive completely | JSON | Everything in one place |
| Create other formats from one source | JSON | Source format for conversions |
What Can and Cannot Be Converted
Converting from a richer format to a simpler one is always possible. In the other direction the data is simply not there, and it cannot be reconstructed.
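Conversions that are always possible from JSON: JSON → TXT (structure is dropped, text is kept), JSON → SRT (the timestamps are in the JSON), JSON → VTT (same as SRT), JSON → CSV (a tabular extract of the JSON data). Each target receives everything it is able to hold; nothing has to be invented.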
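As an illustration of the downward direction, the sketch below derives both SRT and TXT from word-level data. It assumes the word fields shown in the JSON example earlier; the grouping rule (a new subtitle on speaker change, on a pause longer than `max_gap` seconds, or when the line would exceed `max_chars` characters) and its default values are arbitrary choices for this example, not a rule of any particular tool.

```python
def srt_timestamp(seconds: float) -> str:
    """Format seconds as HH:MM:SS,mmm."""
    total_ms = round(seconds * 1000)
    hours, rest = divmod(total_ms, 3_600_000)
    minutes, rest = divmod(rest, 60_000)
    secs, ms = divmod(rest, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{ms:03d}"


def words_to_cues(words, max_gap=0.6, max_chars=42):
    """Group words into subtitle-sized cues."""
    cues = []
    for w in words:
        last = cues[-1] if cues else None
        if (
            last is not None
            and w["speaker"] == last["speaker"]
            and w["start"] - last["end"] <= max_gap
            and len(last["text"]) + 1 + len(w["text"]) <= max_chars
        ):
            last["text"] += " " + w["text"]
            last["end"] = w["end"]
        else:
            cues.append({"speaker": w["speaker"], "start": w["start"],
                         "end": w["end"], "text": w["text"]})
    return cues


def cues_to_srt(cues) -> str:
    """Render cues as SRT: number, timing line, text, blank line between cues."""
    blocks = [
        f"{i}\n{srt_timestamp(c['start'])} --> {srt_timestamp(c['end'])}\n{c['text']}"
        for i, c in enumerate(cues, start=1)
    ]
    return "\n\n".join(blocks) + "\n"


def cues_to_txt(cues) -> str:
    """Render cues as plain text with a speaker label per line."""
    return "\n".join(f"{c['speaker']}: {c['text']}" for c in cues) + "\n"


words = [
    {"text": "Good", "start": 1.0, "end": 1.2, "confidence": 0.98, "speaker": "SPEAKER_1"},
    {"text": "afternoon", "start": 1.2, "end": 1.6, "confidence": 0.99, "speaker": "SPEAKER_1"},
    {"text": "Thank", "start": 4.8, "end": 5.1, "confidence": 0.97, "speaker": "SPEAKER_2"},
    {"text": "you", "start": 5.1, "end": 5.3, "confidence": 0.99, "speaker": "SPEAKER_2"},
]

cues = words_to_cues(words)
print(cues_to_srt(cues))
print(cues_to_txt(cues))
```

Both outputs come from the same source, and the confidence values are dropped along the way because neither target format has a place for them.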
Incomplete conversion (caution): TXT → JSON (timestamps are missing and cannot be added). SRT → JSON (confidence and metadata are missing and cannot be added). CSV → JSON (depends on the columns: if timestamps are present, timed segments can be rebuilt; if not, they cannot).
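The CSV case can be checked programmatically before attempting a conversion. The sketch below is an assumption-laden example: the column names (`start`, `end`, `text`, `speaker`, `confidence`) and the output key `segments` are hypothetical and would need to match the real files. It refuses to convert when the timing columns are absent and copies optional columns only when they hold a value.

```python
import csv
import io
import json

REQUIRED = {"start", "end", "text"}


def csv_to_json(csv_text: str) -> str:
    """Rebuild timed segments from CSV, or fail if timing columns are missing."""
    reader = csv.DictReader(io.StringIO(csv_text))
    missing = REQUIRED - set(reader.fieldnames or [])
    if missing:
        raise ValueError(f"cannot rebuild timed segments, missing columns: {sorted(missing)}")
    segments = []
    for row in reader:
        segment = {"start": float(row["start"]), "end": float(row["end"]), "text": row["text"]}
        # Optional attributes are copied only if the column exists and is filled in.
        if row.get("speaker"):
            segment["speaker"] = row["speaker"]
        if row.get("confidence"):
            segment["confidence"] = float(row["confidence"])
        segments.append(segment)
    return json.dumps({"segments": segments}, ensure_ascii=False, indent=2)


WITH_TIMES = "start,end,text,speaker\n1.0,4.2,Good afternoon.,SPEAKER_1\n"
TEXT_ONLY = "text\nGood afternoon.\n"

print(csv_to_json(WITH_TIMES))
try:
    csv_to_json(TEXT_ONLY)
except ValueError as err:
    print(err)
```

Even the successful case yields segment-level data only: word-level timestamps that were never written to the CSV stay lost.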
Practical consequence: JSON is the safest archival format. TXT, SRT, or CSV can always be generated from JSON. JSON cannot be reconstructed from TXT or SRT.
Czech Transcription System exports to TXT, JSON, SRT, CSV, and VTT. Archiving recommendation: keep JSON as the primary format, since it preserves the complete results including per-word confidence, timestamps, and speaker information. For subtitles, export SRT or VTT. For editing in a text editor, export TXT.
How export formats serve subtitle creation is described in the transcription and subtitles overview A11. The web interface and API offer different options for configuring the export format A26.
Sources:
- WebVTT W3C specification [w3.org/TR/webvtt1/]
- RFC 8259 — JSON format [tools.ietf.org/html/rfc8259]
- WCAG 2.1, Success Criterion 1.2.2 Captions [w3.org/TR/WCAG21/]
- SubRip SRT specification — original documentation