Transcription for Journalism: From Interview Recording to a Quotable Sentence

March 24, 2026 · 5 min read

A journalist records an interview and needs a quotable sentence within two hours. Manual transcription of an hour of recording takes at least three hours. Automatic transcription produces a draft in ten minutes. What comes next, and where the line between editing and distortion lies, is a journalistic question, not a technical one.

What a Journalist Needs from a Transcript

Journalistic transcription has different priorities from academic or archival work. Four basic requirements:

Speaker identification — who said what is the basis of direct speech. Attributing a statement to the wrong respondent is a factual error, not a stylistic one. In an interview with two or more speakers, attribution of each turn is a prerequisite for a usable result.

Quotable sentences: comprehensible, grammatically acceptable, and accurately capturing the meaning of the statement. Verbatim spontaneous speech may be technically faithful but hard to read: "Well, I mean, you know, how should I put it, it just doesn't quite work, it's simply not possible." The journalist needs "It simply doesn't work," together with an awareness of what can and cannot be changed.

Timestamps: time markers for locating a specific point in the recording. If the content of a quotation is disputed, it is enough to go to the marked time and check the recording.

Speed — a journalist works under deadline. Three hours to transcribe an hour of recording is a luxury that newsrooms rarely have.

Verbatim or Editorial Transcription?

The choice depends on the genre and context, not personal preference.

Verbatim transcription preserves everything: slips of the tongue, self-corrections by the speaker, filler words, incomplete sentences, hesitation. It is indispensable when the precise wording of a statement has legal or factual implications — court citations, disputed claims in investigative stories, statements the respondent may challenge. Verbatim transcription is evidentiary material: what was said, not what the speaker intended to convey.

Editorial transcription is the standard for interviews, features, and commentary. It removes filler words and tidies sentences grammatically while preserving the meaning and style of the statement. Such a transcript serves the final quotation: the reader, not the recording.

Automatic transcription delivers a verbatim basis. The journalist decides what to make of it.

What Automation Saves

Time is the most direct benefit.

Manual transcription of one hour of recording: an experienced transcriber needs three to six hours, depending on speech rate, recording quality, and topic complexity.

Automatic transcription + editing: draft in 10 minutes. Editing (checking the most important quotations, proper names, and transitions, and assigning names to speakers) takes 30 to 60 minutes per hour of recording. Total: roughly 40 to 70 minutes of work instead of three to six hours.
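The estimates above can be checked with a few lines of arithmetic. The sketch below uses only the figures quoted in this section (10 minutes for the draft, 30 to 60 minutes of editing, three to six hours of manual work per hour of recording); the constant and function names are illustrative.

```python
# Estimates from this section, in minutes per hour of recording.
DRAFT_MIN = 10
EDIT_MIN = (30, 60)       # editing the automatic draft: low, high
MANUAL_MIN = (180, 360)   # manual transcription: three to six hours


def automated_total(draft=DRAFT_MIN, edit=EDIT_MIN):
    """Return (low, high) total minutes for the draft plus editing."""
    return (draft + edit[0], draft + edit[1])


def time_saved(manual=MANUAL_MIN, automated=None):
    """Return (smallest, largest) saving in minutes against manual work."""
    if automated is None:
        automated = automated_total()
    # Smallest saving: fastest manual run versus slowest automated run.
    return (manual[0] - automated[1], manual[1] - automated[0])


low, high = automated_total()
print(f"Automated workflow: {low} to {high} minutes")
```

With these inputs the automated route comes out at 40 to 70 minutes per hour of recording. Actual savings depend on recording quality and on how many quotations need checking.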

Diarisation as a basis for attribution: a transcript with automatic speaker separation (Speaker 1 / Speaker 2) gives the journalist a starting structure. Assigning names is manual work taking minutes — the journalist knows who was in the interview. The result is a transcript with clear attribution of each turn as working material for editing.
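As a rough illustration of the naming step, the sketch below replaces generic diarisation labels with real names. The Turn structure and its field names are assumptions made for this example, not the format of any particular transcription tool.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Turn:
    """One speaker turn from a diarised transcript (hypothetical structure)."""
    speaker: str   # diarisation label, e.g. "Speaker 1"
    start: float   # start time in seconds
    text: str


def assign_names(turns, names):
    """Return new turns with labels replaced; unknown labels stay unchanged."""
    return [replace(t, speaker=names.get(t.speaker, t.speaker)) for t in turns]


def unnamed_labels(turns, names):
    """List the labels that still need a name, so no turn goes unattributed."""
    return sorted({t.speaker for t in turns} - set(names))


turns = [
    Turn("Speaker 1", 0.0, "Thank you for your time."),
    Turn("Speaker 2", 3.2, "Glad to be here."),
]
named = assign_names(turns, {"Speaker 1": "Journalist", "Speaker 2": "Respondent"})
```

Keeping the original turns untouched and returning new ones means the automatic labels stay available if a speaker turns out to have been misidentified.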

Timestamps for verification: click a marked time → the player jumps to exactly the point in the recording. Verifying a disputed quotation takes seconds, not minutes of rewinding.
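The same lookup can be done outside a player. The sketch below searches timestamped segments for a phrase and returns the start time in seconds, which can then be passed to whatever audio player is in use. The segment fields ("start", "speaker", "text") are assumed for illustration.

```python
def find_quote(segments, phrase):
    """Return the start time (seconds) of the first segment containing phrase.

    Matching ignores case and runs of whitespace. Returns None if the phrase
    is not found within a single segment.
    """
    needle = " ".join(phrase.lower().split())
    for seg in segments:
        haystack = " ".join(seg["text"].lower().split())
        if needle in haystack:
            return seg["start"]
    return None


segments = [
    {"start": 12.0, "speaker": "Respondent", "text": "We arrived on Friday."},
    {"start": 95.5, "speaker": "Respondent", "text": "It simply doesn't work."},
]
position = find_quote(segments, "simply doesn't work")
```

A quotation that spans two segments will not be found by this simple version; searching for a distinctive half of the sentence is usually enough to locate it.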

Journalistic Ethics of Transcription: What You May and May Not Do

Direct speech has unwritten rules in media practice that derive from professional ethics standards.

Acceptable edits:

Removing filler words — "um", "you know", "well", "right" — is standard practice. This does not change the meaning; it significantly improves the readability of the quotation.

Correcting obvious slips that the speaker themselves corrected: "We arrived on Thursday — on Friday." The speaker self-corrected immediately; transcribing "on Friday" is faithful to the intent of the statement.

Grammatical tidying while preserving meaning and style: "Well, I, actually, because I was saying, right, that it isn't so" → "I was saying it isn't so." Meaning preserved; quotation readable.
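A first pass over hesitation sounds can be automated, as in the minimal sketch below. The filler list is an assumption and deliberately short: words such as "well", "right", or "you know" also occur as ordinary speech ("Do you know the minister?"), so they are left to the journalist. The output is a draft for review, not a finished quotation.

```python
import re

# Hypothetical starter list: pure hesitation sounds only.
FILLERS = ("um", "uh", "erm")

# A filler as a whole word, plus the commas and spaces that set it off.
_FILLER_RE = re.compile(
    r",?\s*\b(?:" + "|".join(re.escape(f) for f in FILLERS) + r")\b,?\s*",
    flags=re.IGNORECASE,
)


def strip_fillers(text):
    """Return text with the listed fillers removed, as a draft for review."""
    cleaned = _FILLER_RE.sub(" ", text)
    cleaned = re.sub(r"\s+([.,!?;:])", r"\1", cleaned)   # no space before punctuation
    cleaned = re.sub(r"\s{2,}", " ", cleaned).strip()    # collapse leftover spaces
    if text[:1].isupper() and cleaned:
        # Restore the sentence-initial capital if a leading filler was removed.
        cleaned = cleaned[0].upper() + cleaned[1:]
    return cleaned


draft = strip_fillers("Um, we arrived, uh, on Friday.")
```

Matching on word boundaries keeps words that merely contain a filler, such as "sum" or "human", intact.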

Lines that must not be crossed:

Changing meaning: adding or removing a negation, condition, or qualification. "We do not support this solution" and "we are willing to consider this solution" are different statements, even if the speaker used both formulations during the interview.

Taking out of context: quoting in isolation a sentence that, in the context of the full answer, means the opposite or something different. This is manipulation even if the wording is verbatim.

Synthesising statements: composing one quotation from two sentences taken from different points in the interview and presenting it as a single uninterrupted direct speech. Acceptable only when the join is explicitly marked with an ellipsis: "[quotation] ... [quotation]."
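Some of these risks can be flagged mechanically before a human check. The sketch below compares an edited quotation with the verbatim passage it came from and reports words that were added and negations that were dropped. The negation list is illustrative and English-only, and the check cannot judge context, so a flag means "look again", not "error".

```python
import re

# Illustrative list; extend it for the language and style of your interviews.
NEGATION_WORDS = {"not", "no", "never", "none", "nothing", "nobody", "without"}


def tokenize(text):
    """Lowercase the text and split it into words, keeping contractions whole."""
    normalized = text.lower().replace("\u2019", "'")  # curly to straight apostrophe
    return re.findall(r"\w+(?:'\w+)*", normalized)


def is_negation(word):
    return word in NEGATION_WORDS or word.endswith("n't")


def review_edit(verbatim, edited):
    """Flag words added to the quotation and negations missing from it."""
    source, result = tokenize(verbatim), tokenize(edited)
    source_set, result_set = set(source), set(result)
    return {
        "added": [w for w in result if w not in source_set],
        "dropped_negations": sorted(
            {w for w in source if is_negation(w) and w not in result_set}
        ),
    }


flags = review_edit("We do not support this solution.", "We support this solution.")
```

An empty report does not clear a quotation: a sentence can be word-for-word accurate and still misrepresent the answer it was taken from.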

Verifying Quotations

Best practice before publication: send the most important quotations to the respondent for correction. This protects both the journalist and the editorial team from disputes. It is standard practice especially in investigative journalism, where statements are sensitive.

Practical Workflow: From Recording to Publication

1. Record the interview → automatic transcript with diarisation enabled
2. Quick check of the transcript: the most important quotations, names of respondents and institutions, numbers and dates
3. Assign names to speakers (Speaker 1 = journalist, Speaker 2 = respondent)
4. Mark quotations planned for use in the text
5. Edit marked quotations within the bounds of acceptable editorial changes
6. Archive the recording and transcript as source material

Export for journalistic work: TXT for editing in a text editor or CMS. JSON with timestamps for locating a specific point in the recording during verification. Timestamped TXT as a compromise — readable, with time markers.
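If only the JSON export is at hand, the timestamped TXT variant can be produced from it. The sketch below assumes the JSON is a list of segments with "start" (seconds), "speaker", and "text" fields; this layout is a guess for illustration, so check the field names in the actual export before relying on it.

```python
import json


def format_timestamp(seconds):
    """Format a time in seconds as HH:MM:SS, dropping fractions of a second."""
    hours, remainder = divmod(int(seconds), 3600)
    minutes, secs = divmod(remainder, 60)
    return f"{hours:02d}:{minutes:02d}:{secs:02d}"


def to_timestamped_txt(json_text):
    """Render a JSON transcript as one '[HH:MM:SS] Speaker: text' line per segment."""
    segments = json.loads(json_text)
    return "\n".join(
        f"[{format_timestamp(seg['start'])}] {seg['speaker']}: {seg['text']}"
        for seg in segments
    )


sample = '[{"start": 3725.4, "speaker": "Respondent", "text": "It works."}]'
print(to_timestamped_txt(sample))
```

The resulting text can be pasted into an editor or CMS while still pointing back to the exact place in the recording.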

Czech Transcription System transcribes an interview with diarisation — the result divides statements by speaker and includes timestamps. Export in TXT or JSON format enables direct work in a text editor. Timestamps make it easy to verify a specific quotation in the recording without replaying the entire interview.

How to handle more complex transcription conditions — multiple speakers or poor recording — is explained in the diarisation overview A04 and audio preparation guide A12. What an editor may do with a transcript is elaborated in the article on automatic transcription and language editing A21.

Sources:

SPJ Code of Ethics — Society of Professional Journalists [spj.org/ethicscode.asp]