Transcription in Academic Settings: From Dictated Notes to Focus Group Analysis
Academic uses of transcription span a range from quick personal notes to methodologically rigorous transcription of a focus group with six participants. Each scenario has different requirements for accuracy, conventions, and format. This article maps where automation helps and where academic methodology demands more than a machine can deliver.
Four Academic Scenarios
Personal Notes and Dictation
A researcher dictates thoughts, text fragments, notes from the literature. Requirements are the lowest of all scenarios: reasonable accuracy, fast results. The transcription output is a draft for further processing — the editor is the author themselves, who knows the content and can spot errors.
Suitable approach: any model with acceptable accuracy for standard speech. Minimal editing required.
Individual Research Interview
A semi-structured or in-depth interview with two speakers: researcher and respondent. Requirements are higher: verbatim transcription (false starts, hesitations, and filler words are part of the data), speaker identification, and accuracy sufficient for analytical validity.
Diarization with two speakers is reliable; assigning names to speakers is a manual step. The automated output provides a solid foundation for editing (see A10).
Focus Group
Four to eight participants, a moderator, dynamic group discussion. The greatest challenge for automation: rapid speaker turns, simultaneous speech (crosstalk), and voices similar in volume or vocal profile.
Diarization with multiple speakers is less reliable than with two participants. Automatic transcription provides raw material rather than a near-final draft: the analyst must review the transcript critically against the recording, not merely edit it.
Archival Recordings
Historical recordings, old conference presentations, oral history interviews. Variable audio quality, outdated equipment, sometimes damaged recordings. Automation accuracy is lower — but as a starting point for manual editing, the result is still significantly more efficient than transcribing from scratch.
Methodological Requirements of Academic Transcription
In academic transcription, methodological decisions are part of the research protocol — not merely technical settings.
Verbatim Conventions
False starts, filler words, and pauses are treated as data in many research traditions, not noise to be removed. Conversation analysis works with precise pause durations. Discourse analysis captures hedging and reformulations as part of argumentation strategy.
Automatic transcription captures verbatim data to some extent — but not systematically or completely. Filler words may be removed or normalized. For strictly verbatim transcription, manual review and supplementation are necessary.
Jefferson Notation
A conversation-analytic symbol system for transcribing speech: (0.5) marks a half-second pause, [ marks the onset of overlapping speech, .hh signals an inhalation, ((comment)) is an observer's note. Automatic transcription does not produce this notation — Jefferson Notation is exclusively a manual contribution by the analyst.
For research that requires Jefferson Notation, automation serves only as a first draft of the text. The analyst adds symbolic notation in a second pass.
CHAT Format
A structured format from the CHILDES project for language research (MacWhinney, 2000): a header with metadata, speaker codes (*CHI, *MOT, *ADU), special codes for errors, repairs, and non-verbal behaviour. Automation does not produce CHAT; the transcription output must be converted to CHAT manually or via a script.
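Such a conversion script can be sketched as follows. This is a minimal illustration, not a complete CHAT exporter: it emits only @Begin, @Languages, @Participants, speaker tiers, and @End, while real CHAT files need further headers (@ID, @Media) and validation with CLAN. The segment schema (dicts with "speaker" and "text" keys) and the speaker-code mapping are assumptions for the example.

```python
def to_chat(segments: list[dict], participants: dict[str, str]) -> str:
    """Render transcript segments as a minimal CHAT skeleton (sketch only).

    segments: list of {"speaker": ..., "text": ...} dicts (assumed schema).
    participants: maps internal speaker ids to CHAT codes, e.g.
    {"R1": "INV", "R2": "PAR"} — the mapping is chosen by the researcher.
    """
    lines = ["@Begin", "@Languages:\teng"]
    lines.append("@Participants:\t" + ", ".join(
        f"{code} {speaker_id}" for speaker_id, code in participants.items()))
    for seg in segments:
        # Each utterance becomes a speaker tier: *CODE:<tab>text
        lines.append(f"*{participants[seg['speaker']]}:\t{seg['text']}")
    lines.append("@End")
    return "\n".join(lines)
```

The output is plain text, so it can be inspected and corrected by hand before any CLAN processing.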
Formats for Import into QDA Tools
Transcription output feeds into qualitative data analysis tools — and the format must match the tool.
MAXQDA imports TXT and DOCX. A transcript with consistent speaker labels (Speaker R1: ...) allows filtering utterances by speaker during coding. Timestamps as part of the text (HH:MM:SS) enable linking back to the recording.
ATLAS.ti imports TXT, DOCX, and PDF. It supports linking to an audio or video file — coding by direct listening, not just from text. Timestamps in the transcript enable precise synchronization.
NVivo imports TXT, DOCX, and PDF. Automatic speaker recognition from a formatted transcript works when the formatting is consistent.
JSON as an archival format is the ideal source file for custom import scripts: per-word timestamps enable precise synchronization with the recording, and speaker identifiers assign each utterance to a speaker automatically. From JSON, any format for a specific tool can be generated.
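A small script along these lines can turn the JSON archive into a speaker-labeled TXT file with HH:MM:SS timestamps, the shape MAXQDA and NVivo can parse. The JSON schema here (a "segments" list with "speaker", "start" in seconds, and "text") is an assumed example; real transcription outputs vary and the key names would need adjusting.

```python
import json

def seconds_to_hms(seconds: float) -> str:
    """Format a time offset in seconds as HH:MM:SS."""
    s = int(seconds)
    return f"{s // 3600:02d}:{(s % 3600) // 60:02d}:{s % 60:02d}"

def segments_to_lines(segments: list[dict]) -> list[str]:
    """Render one '[HH:MM:SS] Speaker X: text' line per segment."""
    return [
        f"[{seconds_to_hms(seg['start'])}] Speaker {seg['speaker']}: {seg['text']}"
        for seg in segments
    ]

def json_to_labeled_txt(json_path: str, txt_path: str) -> None:
    """Convert an archived JSON transcript (assumed schema) to labeled TXT."""
    with open(json_path, encoding="utf-8") as f:
        segments = json.load(f)["segments"]
    with open(txt_path, "w", encoding="utf-8") as f:
        f.write("\n".join(segments_to_lines(segments)) + "\n")
```

Keeping the formatting logic in a separate function makes it easy to swap the output shape for a different tool without touching the file handling.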
GDPR in Research with Human Subjects
Research transcription involves recordings of specific respondents — and that has GDPR implications.
Informed consent for recording is a standard part of research protocols involving human subjects. This consent must address or be extended to cover: transcription of the recording by automated systems, processing via cloud APIs (data transfer outside the institution), the method and duration of storing recordings and transcripts, and the right to have the recording and transcript deleted.
Anonymization and pseudonymization are standard practice: label respondents with codes (R1, R2, R3) instead of names. Remove identifiers from the transcript content where possible without losing research meaning. Transcripts shared within the research team should be pseudonymized.
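The code-for-name substitution can be scripted as a first pass. This sketch only replaces name variants the researcher lists explicitly; it does no named-entity recognition, so a manual review for missed identifiers remains necessary. The name map is a hypothetical example.

```python
import re

def pseudonymize(text: str, name_map: dict[str, str]) -> str:
    """Replace listed respondent names with codes (R1, R2, ...).

    name_map: every variant of a name that appears in the transcript,
    e.g. {"Jana Novakova": "R1", "Jana": "R1"}. Matching is
    case-insensitive but otherwise literal — no fuzzy matching.
    """
    # Replace longer variants first so "Jana Novakova" wins over "Jana".
    for name in sorted(name_map, key=len, reverse=True):
        text = re.sub(re.escape(name), name_map[name], text, flags=re.IGNORECASE)
    return text
```

Running the pass before transcripts are shared within the team keeps the name-to-code key confined to the primary researcher.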
For research involving highly sensitive content (trauma, crime, health data), consider local processing: Local Whisper keeps all data inside the institution's environment (see A18).
Czech Transcription System transcribes research interviews with diarization — the output separates utterances by speaker. Export in JSON as the primary archive with full metadata; TXT or DOCX with speaker labels for import into MAXQDA or NVivo. Timestamps in the transcript serve for quick verification of citations against the recording.
Specifics of research interview transcription — verbatim conventions and citation format — are described in more detail in A10. Diarization for focus groups is technically more challenging than for two speakers (A04). GDPR in the context of sensitive research data requires special attention (A18).
Sources:
- Jefferson, G. (2004). Glossary of transcript symbols with an introduction. In Lerner, G. H. (ed.), Conversation Analysis: Studies from the First Generation. John Benjamins.
- MacWhinney, B. (2000). The CHILDES Project: Tools for Analyzing Talk. 3rd edition. Lawrence Erlbaum Associates. [childes.talkbank.org]
- GDPR Art. 9 — special categories of personal data [eur-lex.europa.eu]