What to Prepare Before Ordering Transcription — A Checklist
Transcription ordered without preparation most often ends in one of three ways: the transcriber comes back with questions that cause delays; the result is technically correct but does not meet requirements; or a sensitive recording ends up somewhere it should not be.
Fifteen minutes of preparation prevents these situations. Below is a checklist covering the four areas that determine the outcome. It applies both when commissioning a human transcriber and when using an automatic tool.
Block 1 — Recording Preparation
Verify Audibility
Before sending, play the recording at three points: beginning, middle, and end. What to look for: is the speaker audible without excessive amplification? Is background noise approximately constant, or are there sudden disruptive sounds? Do two people talk simultaneously for more than brief passages?
If quality is notably low, notify the transcriber in advance. The transcription will proceed, but error rates will be higher and review will take longer.
File Format
The most universal formats are MP3 and WAV; all major transcription tools and agencies accept them. Stereo files transcribe the same as mono, but an uncompressed stereo WAV is roughly twice the size of its mono equivalent. Most tools process video files (MP4, MOV) directly; with agencies or freelancers, confirm in advance that they accept video.
Bitrate above 128 kbps for MP3 does not improve transcription accuracy — it unnecessarily increases file size and upload time.
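The format check can be partly automated for WAV files. The sketch below uses Python's standard `wave` module to report channel count, sample rate, and duration; the function name `describe_wav` is illustrative, and the approach assumes an uncompressed PCM WAV file (MP3 and video files would need a different library).

```python
import wave

def describe_wav(path):
    """Return channel count, sample rate (Hz), and duration (s) of a PCM WAV file."""
    with wave.open(path, "rb") as wav:
        channels = wav.getnchannels()
        rate = wav.getframerate()
        frames = wav.getnframes()
    return {
        "channels": channels,              # 1 = mono, 2 = stereo
        "sample_rate": rate,
        "duration_s": frames / rate,
    }
```

A result with `channels` equal to 2 suggests the file could be converted to mono before upload if size matters.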
Naming and Organisation
Consistent file naming saves hours of navigating transcripts: 2024-10-15_interview-smith.mp3 is significantly better than IMG_20241015_154233.m4a. Send multiple files as a ZIP or via shared storage — not as dozens of email attachments. If recordings follow a sequence, number them explicitly (01_, 02_, ...).
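One way to keep names consistent is to generate them rather than type them. The following is a minimal sketch that follows the date-plus-label pattern shown above; the function name `recording_name` and its parameters are hypothetical, not part of any tool.

```python
import re
from datetime import date

def recording_name(recorded_on, label, ext, seq=None):
    """Build a name such as 2024-10-15_interview-smith.mp3, optionally prefixed 01_."""
    # Lowercase the label and collapse anything non-alphanumeric into hyphens.
    slug = re.sub(r"[^a-z0-9]+", "-", label.lower()).strip("-")
    prefix = f"{seq:02d}_" if seq is not None else ""
    return f"{prefix}{recorded_on.isoformat()}_{slug}.{ext}"

print(recording_name(date(2024, 10, 15), "Interview Smith", "mp3"))
# prints: 2024-10-15_interview-smith.mp3
```

Passing `seq=1` adds the explicit sequence prefix for recordings that belong to a series.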
Block 2 — Specifying Requirements
Transcription Type
This is the most commonly overlooked specification and the one that causes the greatest mismatch between the order and the result:
- Verbatim: every syllable exactly as spoken — filler words, false starts, incomplete sentences. Used for court protocols, phonetic research, or language corpora.
- Edited: filler words and excessive repetitions removed, punctuation adjusted for readability, factual content preserved. Suitable for the vast majority of business and research transcriptions.
Default assumptions differ among transcribers and tools — this choice must be specified.
Speakers
Provide the approximate number of speakers. If names should appear in the transcript, supply a list: Speaker 1 = John Smith, Speaker 2 = Jane Doe. If speakers are not identified, agree on codes (Speaker A, B) or on transcription without attribution.
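If a tool returns only generic labels, the supplied name list can be applied afterwards. The sketch below assumes each speaker turn starts a line with a label of the form `Speaker 1:`; that layout and the function name `apply_speaker_names` are assumptions, so adjust the pattern to the actual output.

```python
import re

def apply_speaker_names(transcript, names):
    """Replace line-initial labels such as 'Speaker 1:' using the names mapping."""
    def swap(match):
        label = match.group(1)
        # Labels without a mapping entry are left unchanged.
        return names.get(label, label) + ":"
    return re.sub(r"^(Speaker \w+):", swap, transcript, flags=re.MULTILINE)

names = {"Speaker 1": "John Smith", "Speaker 2": "Jane Doe"}
text = "Speaker 1: Good morning.\nSpeaker 2: Hello."
print(apply_speaker_names(text, names))
```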
Language and Foreign-Language Passages
Are there foreign-language passages in the recording? Should they be transcribed in the original language, translated, or marked as [in English] without transcription? Are technical terms in English preserved in the original or localised?
Output Format
| Need | Format |
|---|---|
| Plain text for further processing | TXT |
| Formatted document | DOCX (Word) |
| Subtitles for video | SRT or VTT |
| Machine processing or database | JSON with timestamps |
| Spreadsheet overview | CSV |
Also specify: are timestamps needed? At what granularity — for each word, each sentence, or each speaker segment? How should paragraphs be divided?
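To illustrate what timestamp granularity means in practice, here is a minimal sketch that turns speaker-segment timestamps into SRT subtitle blocks. The input shape, a list of `(start_seconds, end_seconds, text)` tuples, is an assumption; real tools export their own structures.

```python
def srt_timestamp(seconds):
    """Format seconds as HH:MM:SS,mmm, the timestamp layout used by SRT."""
    total_ms = round(seconds * 1000)
    hours, rest = divmod(total_ms, 3_600_000)
    minutes, rest = divmod(rest, 60_000)
    secs, ms = divmod(rest, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{ms:03d}"

def to_srt(segments):
    """Build SRT text from (start, end, text) tuples; blocks are numbered from 1."""
    blocks = []
    for index, (start, end, text) in enumerate(segments, start=1):
        blocks.append(f"{index}\n{srt_timestamp(start)} --> {srt_timestamp(end)}\n{text}\n")
    return "\n".join(blocks)

print(to_srt([(0.0, 2.5, "Good morning."), (2.5, 5.0, "Hello.")]))
```

Word-level timestamps would produce far more blocks than sentence-level or segment-level ones, which is why the granularity is worth agreeing on up front.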
Glossary
Supply a list of proper names, institution names, products, and technical terms. A plain text file with a word list is sufficient. The impact is direct: it reduces error rates for less frequent words and eliminates questions.
Block 3 — Data Protection
What the Recording Contains
A voice recording of an identifiable person is personal data under GDPR. If the recording also contains health information, trade secrets, or other sensitive information, the transcription method must match the sensitivity of the content.
Cloud Tool — What to Verify
Three questions before sending a sensitive recording to a cloud tool:
- Where are the servers? With EU servers, the data stays in the EU and no international transfer arises. US servers require verification of the legal basis for the data transfer.
- How long does the tool retain files? Can they be deleted on request?
- Are recordings used for model training? For commercial transcription tools, the answer is usually no — but verify this in the terms of service, do not assume.
Choosing a Method Based on Sensitivity
- Low sensitivity (public lectures, non-confidential recordings): cloud transcription without special measures
- Medium sensitivity (corporate meetings, research interviews): EU servers, verify retention conditions
- High sensitivity (healthcare records, legal files, confidential business information): local transcription without sending files, or a human transcriber with an NDA
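For teams that handle many recordings, the three levels above can be written down as a simple lookup so the choice is made the same way every time. This is only a sketch of the list above; the key names and the function `recommended_method` are illustrative.

```python
# Mapping taken directly from the three sensitivity levels listed above.
METHOD_BY_SENSITIVITY = {
    "low": "cloud transcription without special measures",
    "medium": "cloud transcription on EU servers, retention conditions verified",
    "high": "local transcription, or a human transcriber under an NDA",
}

def recommended_method(sensitivity):
    """Return the suggested transcription method for 'low', 'medium', or 'high'."""
    try:
        return METHOD_BY_SENSITIVITY[sensitivity.lower()]
    except KeyError:
        raise ValueError(f"unknown sensitivity level: {sensitivity!r}") from None

print(recommended_method("High"))
```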
Block 4 — Reviewing the Delivered Transcript
Formal Compliance with the Order
Before diving into content review, check basic compliance: Does transcript length correspond to recording length? (Rough estimate: 1 minute of audio → approximately 100–150 words.) Is the output format correct? Are timestamps present if they were requested?
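The length check is simple arithmetic and can be scripted. The sketch below applies the rough 100 to 150 words per minute estimate given above; the function names are hypothetical, and a result outside the range is a prompt to look closer, not proof of an error.

```python
def expected_word_range(duration_s, low_wpm=100, high_wpm=150):
    """Return the (min, max) word count expected for a recording of duration_s seconds."""
    minutes = duration_s / 60
    return round(minutes * low_wpm), round(minutes * high_wpm)

def length_plausible(transcript, duration_s):
    """True if the transcript's word count falls inside the expected range."""
    low, high = expected_word_range(duration_s)
    return low <= len(transcript.split()) <= high

print(expected_word_range(30 * 60))
# prints: (3000, 4500)
```

Slow speakers, long pauses, or very fast dialogue can legitimately fall outside the range, so the thresholds are parameters rather than fixed values.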
Six Targeted Review Checkpoints
Systematic review focusing on locations with the highest error probability:
- Proper names and institution names — spelling, capitalisation
- Numbers, dates, and abbreviations — compare with recording
- Punctuation in compound sentences — verify commas before subordinate clauses and question marks
- Speaker transitions — verify correct attribution of statements
- Technical terms from the supplied glossary — targeted text search
- Homophonic substitutions in critical passages — play back passages with factual claims
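The targeted glossary search from the list above can be sketched as follows. The function counts whole-word, case-insensitive occurrences of each glossary term; the name `glossary_hits` and the sample terms are made up for illustration.

```python
import re

def glossary_hits(transcript, terms):
    """Count case-insensitive whole-word occurrences of each glossary term."""
    counts = {}
    for term in terms:
        # Lookarounds keep 'Nowak' from matching inside a longer word.
        pattern = re.compile(r"(?<!\w)" + re.escape(term) + r"(?!\w)", re.IGNORECASE)
        counts[term] = len(pattern.findall(transcript))
    return counts

text = "Dr. Nowak from Acme Labs met Nowak again."
print(glossary_hits(text, ["Nowak", "Acme Labs", "Zeta-9"]))
# prints: {'Nowak': 2, 'Acme Labs': 1, 'Zeta-9': 0}
```

A term with zero hits either was never spoken or was transcribed under a different spelling; those are the passages worth replaying first.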
Feedback for Corrections
Specific feedback accelerates correction and improves its accuracy: "speaker name missing in passage 12:30–14:00" is far better than "the transcript is incomplete." For larger volumes, mark the problematic passages in the document and supply your notes as comments.
Quick Overview — Print Before Ordering
| Area | Check |
|---|---|
| Recording Preparation | |
| Audibility verified at three points | ☐ |
| Format compatible with recipient | ☐ |
| Files named consistently | ☐ |
| Long recordings split | ☐ |
| Requirements | |
| Transcription type specified (verbatim / edited) | ☐ |
| Number and names of speakers provided | ☐ |
| Foreign-language passages addressed | ☐ |
| Output format and timestamps specified | ☐ |
| Terminology glossary provided | ☐ |
| Data Protection | |
| Content sensitivity assessed | ☐ |
| Server location verified (if relevant) | ☐ |
| Consent of recorded persons confirmed | ☐ |
| Result Review | |
| Transcript length matches recording | ☐ |
| Format is correct | ☐ |
| Six review checkpoints completed | ☐ |
Sources:
- EUR-Lex: Regulation (EU) 2016/679 (GDPR) — https://eur-lex.europa.eu/eli/reg/2016/679/oj