Transcription

Transcription for Subtitles: Synchronisation, Line Length, and Readability Standards

Subtitles are not simply a transcript of spoken words. They are visual elements with their own standards: line length, display speed, synchronisation with speech. Automatic transcription will save you hours, but without knowing these rules, the final subtitles still will not work well.


Transcription and Subtitles — Two Different Products

Subtitles exist in three basic types. Open subtitles (burned-in, hardcoded) are part of the image — they cannot be turned off and are shown to all viewers. Typical for social media, teasers, and content shared without settings options. Closed captions (files in SRT or VTT format) are optional — the viewer can turn them on or off. Standard for online video (YouTube, Netflix), educational platforms, and accessibility purposes for deaf and hard-of-hearing viewers. Simultaneous captions (live captioning) are a special case for live broadcasts and conferences requiring real-time delivery.

Why transcription does not automatically produce subtitles: a transcript captures words in time, but does not account for line length, reading speed, or visual layout. One paragraph of transcript typically corresponds to 3–8 subtitle blocks with different time codes. A subtitle block must not persist on screen longer than the corresponding speech — nor disappear too quickly.
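The split from transcript to timed blocks can be sketched in a few lines of Python. This is a minimal illustration, assuming word-level timestamps arrive as (text, start, end) tuples; output from a real transcription service must first be mapped into this shape:

```python
# Sketch: group word-level transcript data into subtitle blocks.
# Assumes words as (text, start_s, end_s) tuples; the character budget
# per block is a parameter (42 per line in common guidelines).

def segment(words, max_chars=42):
    """Group timed words into blocks of at most max_chars characters."""
    blocks, current, start = [], [], None
    for text, w_start, w_end in words:
        candidate = " ".join(t for t, *_ in current + [(text,)])
        if current and len(candidate) > max_chars:
            # Flush the current block: (start, end, text)
            blocks.append((start, current[-1][2], " ".join(t for t, *_ in current)))
            current, start = [], None
        if start is None:
            start = w_start
        current.append((text, w_start, w_end))
    if current:
        blocks.append((start, current[-1][2], " ".join(t for t, *_ in current)))
    return blocks

words = [("Good", 12.5, 12.7), ("morning,", 12.8, 13.1),
         ("welcome", 13.2, 13.6), ("to", 13.7, 13.8),
         ("the", 13.9, 14.0), ("lecture.", 14.1, 14.6)]
print(segment(words, max_chars=20))
```

Six words become three blocks, each carrying its own start and end time, which is exactly the transformation a plain transcript lacks.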


Readability Standards — International Guidelines

Subtitle standards are not arbitrary. They are based on research into reading speed and visual cognition. Knowing them is a prerequisite for subtitles that viewers can actually read in time.

Line Length

BBC Subtitle Guidelines set a maximum of 42 characters per line for English text. The Netflix Timed Text Style Guide works with the same figure. The same principle applies for other languages — but Czech words are on average longer than English, so fewer words fit on a single line.

Examples: "customer support" is 16 characters in English but 20 in Czech; "meeting room" is 12 characters, its Czech equivalent 17. The same sentence simply demands more space on a Czech subtitle line — the subtitler must plan for more blocks or more aggressive shortening.

Standard: a maximum of 2 lines of 42 characters each.
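One way to enforce this limit is to wrap candidate text and reject anything needing more than two lines. A sketch using Python's standard textwrap module:

```python
import textwrap

def fits_block(text, max_chars=42, max_lines=2):
    """Wrap subtitle text to at most max_lines of max_chars characters.
    Returns the wrapped text, or None if it does not fit and must be
    shortened or split into a further block."""
    lines = textwrap.wrap(text, width=max_chars)
    return "\n".join(lines) if len(lines) <= max_lines else None

print(fits_block("Today we will look at the basics of speech transcription."))
```

Note that textwrap breaks on whitespace only; real subtitle segmentation also prefers breaks at clause boundaries, which needs language-aware logic.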

Reading Speed

17–20 characters per second is the standard subtitle reading speed for an average adult viewer. Netflix allows up to 20 characters/second for content aimed at adult audiences (17 for children's content). At that speed, a full two-line block of up to 84 characters needs roughly 4–5 seconds on screen.

If the speaker talks faster than the subtitle speed allows, subtitles must shorten and paraphrase — not transcribe verbatim. This is a fundamental difference from transcription: a subtitler actively edits for readability. Díaz Cintas and Remael (2007) consider this ability a core subtitler competency.
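Reading speed is conventionally measured in characters per second (CPS): visible characters, including spaces, divided by display time. A minimal sketch computing it for one block:

```python
def chars_per_second(text, start_s, end_s):
    """Reading speed of one subtitle block: visible characters
    (including spaces) divided by display duration in seconds."""
    return len(text) / (end_s - start_s)

# 37 characters shown for 2.7 seconds:
cps = chars_per_second("Good morning, welcome to the lecture.", 12.5, 15.2)
print(round(cps, 1))  # about 13.7 characters per second
```

A value over the target threshold signals a block that must be extended, split, or paraphrased down.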

Minimum and Maximum Display Time

Minimum: 1.0–1.5 seconds; a shorter block vanishes before the viewer can register it. Maximum: 7 seconds for one block; anything longer drifts out of sync with the speech. Gap between blocks: at least 2 frames (83 ms at 24 fps; 67 ms at 30 fps). Without a gap, the eye does not register that the subtitle has changed.
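These limits are straightforward to check programmatically. A sketch, assuming blocks as (start, end, text) tuples in seconds and the thresholds stated above:

```python
def timing_issues(blocks, min_dur=1.0, max_dur=7.0, min_gap=0.083):
    """Flag blocks that violate duration limits or lack the minimum
    gap after the preceding block. blocks: list of (start_s, end_s, text)."""
    issues = []
    for i, (start, end, _) in enumerate(blocks):
        dur = end - start
        if dur < min_dur:
            issues.append((i, f"too short: {dur:.2f}s"))
        if dur > max_dur:
            issues.append((i, f"too long: {dur:.2f}s"))
        if i and start - blocks[i - 1][1] < min_gap:
            issues.append((i, "no gap after previous block"))
    return issues

blocks = [(12.5, 15.2, "Good morning, welcome to the lecture."),
          (15.2, 15.9, "Today we will look at")]
print(timing_issues(blocks))  # second block: too short, and no gap
```

Subtitle editors run exactly these checks automatically, but the logic is simple enough to apply in a batch script over many files.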


Synchronisation — How SRT and VTT Files Work

SRT Format

SRT (SubRip Subtitle) is the most widely used subtitle format. It is accepted by all platforms and players. Each block has a number, start and end timestamp, and text:


1
00:00:12,500 --> 00:00:15,200
Good morning, welcome to the lecture.

2
00:00:15,600 --> 00:00:18,900
Today we will look at the basics
of speech transcription.

The 400 ms gap between the end of the first and start of the second block meets the standard.
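Any scripted work with SRT files starts with timestamp arithmetic: converting "HH:MM:SS,mmm" to seconds and back. A small sketch:

```python
import re

def parse_srt_time(ts):
    """Convert an SRT timestamp 'HH:MM:SS,mmm' to seconds."""
    h, m, s, ms = map(int, re.match(r"(\d+):(\d+):(\d+),(\d+)", ts).groups())
    return h * 3600 + m * 60 + s + ms / 1000

def format_srt_time(seconds):
    """Convert seconds back to 'HH:MM:SS,mmm' (comma, not dot, in SRT)."""
    ms = round(seconds * 1000)
    h, rest = divmod(ms, 3_600_000)
    m, rest = divmod(rest, 60_000)
    s, ms = divmod(rest, 1000)
    return f"{h:02}:{m:02}:{s:02},{ms:03}"

gap = parse_srt_time("00:00:15,600") - parse_srt_time("00:00:15,200")
print(round(gap, 3))          # the 400 ms gap from the example above
print(format_srt_time(15.6))  # round-trips back to "00:00:15,600"
```

The comma as decimal separator is SRT-specific; writing a dot instead is a common bug that some players silently tolerate and others reject.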

VTT Format

WebVTT is the standard for web players and HTML5. Its structure is similar to SRT, but additionally allows metadata, subtitle positioning on screen, and styles. For accessibility subtitles on web platforms, VTT is the preferred format.
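Because the two formats are so close, a minimal SRT-to-VTT conversion amounts to adding the header and switching the decimal comma to a dot. A sketch that deliberately ignores VTT styling and positioning features:

```python
import re

def srt_to_vtt(srt_text):
    """Minimal SRT -> WebVTT conversion: prepend the WEBVTT header and
    replace the decimal comma in timestamps with a dot. Block numbers
    are legal in VTT as cue identifiers, so they can stay."""
    body = re.sub(r"(\d{2}:\d{2}:\d{2}),(\d{3})", r"\1.\2", srt_text)
    return "WEBVTT\n\n" + body

srt = """1
00:00:12,500 --> 00:00:15,200
Good morning, welcome to the lecture.
"""
print(srt_to_vtt(srt))
```

Going the other direction requires stripping header, metadata, and styling, which is why dedicated tools are preferable for anything beyond simple files.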

For a detailed look at export formats and their specific uses, see A22.

Where Automatic Synchronisation Fails

Fast speech causes blocks to "pile up" — text length exceeds the available display time. Speaker overlaps disrupt synchronisation: the algorithm synchronises one voice, the other is lost. Pauses in speech without pauses in the image (musical transitions, B-roll shots) cause a subtitle block to persist on screen longer than it should — and synchronisation drifts from the action. Word-level timestamps have ±200 ms accuracy — critical transitions must be fine-tuned manually.


Workflow for Creating Subtitles from Automatic Transcription

Step 1 — Automatic Transcription with Timestamps

Choose a transcription service with word-level timestamp support (Google STT, Deepgram, AssemblyAI, Whisper large). Segment-level timestamps (the whole segment without breakpoints) are insufficient for precise subtitling. Czech Transcription System exports timestamps as part of JSON output, from which SRT or VTT can be generated.
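What the JSON-to-SRT step looks like depends entirely on the service's schema. A sketch assuming a hypothetical shape with a "words" array of word/start/end fields; real exports use different field names and sometimes different units, so the parsing must be adapted:

```python
import json

# Hypothetical word-level JSON shape; field names ("word", "start", "end")
# are an assumption, not any particular service's actual schema.
data = json.loads("""
{"words": [
  {"word": "Good",     "start": 12.5, "end": 12.7},
  {"word": "morning,", "start": 12.8, "end": 13.1}
]}
""")

def to_srt_time(seconds):
    """Seconds -> SRT timestamp 'HH:MM:SS,mmm'."""
    ms = round(seconds * 1000)
    h, rest = divmod(ms, 3_600_000)
    m, rest = divmod(rest, 60_000)
    s, ms = divmod(rest, 1000)
    return f"{h:02}:{m:02}:{s:02},{ms:03}"

words = data["words"]
text = " ".join(w["word"] for w in words)
block = f"1\n{to_srt_time(words[0]['start'])} --> {to_srt_time(words[-1]['end'])}\n{text}\n"
print(block)
```

The block's start comes from the first word and its end from the last — which is exactly why segment-level timestamps are insufficient: they give no breakpoints inside the segment to split on.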

Step 2 — Export to SRT or VTT

Direct export: most transcription tools offer direct export to SRT or VTT. Via JSON: a more flexible approach allowing custom formatting and processing. [See A22 — export formats]

Step 3 — Editing in a Subtitle Editor

Subtitle Edit (free, Windows / Wine on Linux) is the most comprehensive freeware subtitle editor. It offers automatic checking of line lengths, reading speed (words per minute), synchronisation and standards compliance. It also includes a Waveform view for visually aligning subtitles with the audio track. https://nikse.dk/subtitleedit

Aegisub (free, multiplatform) is the standard for fansubs and advanced subtitling with finer typography control.

CapCut, Descript: integrated tools for video creators — simpler, fewer precision controls, but adequate for basic projects.

Always fix: lines longer than 42 characters (trim or rephrase), blocks shorter than 1.0 s, blocks without a gap after the preceding one.

Step 4 — Review in Video Context

Play the video with subtitles all the way through, or at least at randomly selected points. What looks correct in a text editor may not work on screen: text that lags the speech or vanishes too soon, a subtitle covering an important part of the image, a block persisting after the corresponding utterance ends.


Accessibility Subtitles

Closed captions for deaf viewers also describe audio events that deaf users cannot perceive: [laughter], [music], [thunder], [phone ringing]. Relevant legislation and standards (WCAG 2.1 Success Criterion 1.2.2) require captions for pre-recorded audio content on websites. Broadcasters are subject to their own national accessibility requirements for linear broadcast.


Conclusion

Automatic transcription does not produce subtitles — it gives you raw material. Readability standards and correct synchronisation are what turn a transcript into real subtitles. Learning them takes an hour; it pays off for every video.

Practical minimum: export the transcript to SRT, open it in Subtitle Edit, check line lengths and reading speed (the "Check timing" or "Check characters per second" function), play the video from the start. Three steps that will save even automatically generated subtitles.

For an overview of export formats, see A22; for creating subtitles for educational content and accessibility, see A32.


Sources

  1. BBC Subtitle Guidelines. https://bbc.co.uk/commissioning/tv/production/articles/titles-credits-and-subtitles
  2. Netflix Timed Text Style Guide. https://partnerhelp.netflixstudios.com/hc/en-us/articles/217350977
  3. Díaz Cintas, J. & Remael, A. (2007). Audiovisual Translation: Subtitling. Routledge. [ISBN 978-1-900650-98-4]
  4. W3C WCAG 2.1 — Success Criterion 1.2.2: Captions (Prerecorded). https://w3.org/WAI/WCAG21/Understanding/captions-prerecorded
  5. Subtitle Edit — documentation. https://nikse.dk/subtitleedit