Transcription

Research Interview and Transcription: Methodological Requirements and Practical Solutions

Research transcription is not merely a technical matter — it is a methodological decision. The way you capture a respondent's spoken words determines what analysis can discover from them. This article connects the methodological demands of qualitative research with the reality of automatic transcription tools.


Transcription as a Methodological Choice

Before choosing a tool, you must decide what kind of transcription you need. Verbatim and clean transcription are different products for different analytical purposes — confusing them leads to loss of research-relevant information.

Verbatim transcription preserves everything: pauses with their duration, repetitions, incomplete sentences, slips of the tongue, laughter, intonational features. Example: "So um (2.5) I actually- I actually changed my mind." Used in conversation analysis, discourse analysis, narrative research, and anywhere the manner of expression is part of the data, not merely its container.

Clean transcription is grammatically corrected, filler words removed, sentences completed. Example of the same utterance: "I changed my mind." Appropriate for thematic analysis, grounded theory, or research where the content of statements matters, not the way they are formulated.

The key question before starting: what kind of transcription does the analysis require? Answering this before choosing a tool saves hours of work and potential methodological challenges.


Transcription Conventions — How Research Captures Spoken Language

Transcription conventions are standardised symbol systems for capturing spoken language. Automatic transcription follows no convention on its own — which is why it is typically a first draft, not a final transcript.

Jefferson notation is the standard for conversation analysis (CA). It was developed by Gail Jefferson in the 1970s and 1980s. Key symbols: (2.5) = timed pause of 2.5 seconds; = = latching, i.e. immediate uptake without a pause; [ ] = overlapping speech; >text< = faster pace; °text° = quieter passage; .hh = audible in-breath. CA research requires this convention for reproducibility — without it, re-analysis by other researchers is impossible (Jefferson, 2004).
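For documenting a project's convention (or building small checking tools on top of it), the symbols above can be kept as a simple lookup table. A minimal Python sketch, limited to the symbols listed here:

```python
# Glossary of the Jefferson symbols mentioned above — an illustrative
# subset, not the full notation.
JEFFERSON_SYMBOLS = {
    "(2.5)": "timed pause, duration in seconds",
    "=": "latching: immediate uptake without pause",
    "[ ]": "overlapping speech",
    ">text<": "faster pace than surrounding talk",
    "°text°": "quieter passage",
    ".hh": "audible in-breath",
}

def describe(symbol: str) -> str:
    """Look up a symbol's meaning, or report it as undocumented."""
    return JEFFERSON_SYMBOLS.get(symbol, "undocumented symbol")
```

A table like this doubles as the documentation a research team is expected to keep for its convention.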

CHAT format (Codes for the Human Analysis of Transcripts) is part of the CHILDES database — a system for research into language development, aphasia, and bilingualism. MacWhinney (2000) describes this format in detail and the TalkBank database enables sharing transcripts in this standard.

For thematic analysis or organisational research, simpler conventions usually suffice: speaker labels (R1, R2, I = interviewer), square brackets for unintelligible passages, bold for emphasis. Consistency within your own convention matters more than system complexity. If a research team introduces its own convention, it must be documented and applied uniformly.
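Consistency within a convention can also be checked mechanically. A minimal sketch, assuming the speaker labels R1/R2/I mentioned above and a `LABEL: utterance` line format (both are one possible project convention, not a standard):

```python
import re

ALLOWED_LABELS = {"R1", "R2", "I"}  # assumed project convention

def check_labels(transcript: str) -> list[int]:
    """Return line numbers whose speaker label is missing or unknown."""
    bad = []
    for n, line in enumerate(transcript.splitlines(), start=1):
        if not line.strip():
            continue  # blank lines carry no label
        m = re.match(r"^(\w+):", line)
        if m is None or m.group(1) not in ALLOWED_LABELS:
            bad.append(n)
    return bad
```

Running this over each transcript before analysis catches stray labels (e.g. an undeclared R3) that would otherwise surface only during coding.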


Automatic Transcription in Research — Benefits and Limits

Time Savings as the Main Argument

Manual transcription of 60 minutes of a research interview takes an experienced transcriber 3–6 hours. Automatic transcription processes the recording in minutes; the result then requires 30–90 minutes of verification and application of the transcription convention. For a research team transcribing dozens of interviews, this saves tens of hours per project.

What Automatic Transcription Does Not Capture

Precise pause duration — Jefferson notation (2.5) requires exact measurement in seconds. The transcription model does not mark pauses in text; the researcher must add them manually by listening.

Intonational features — rising intonation, emphasis, quieter passages are encoded in Jefferson notation. The model does not mark them in text.

Speaker overlaps — simultaneous speech is present in the recording, but the transcript separates it only approximately. Precise overlap marking with [ ] must be added manually.

Unintelligible passages — the model transcribes even unintelligible speech, guessing at a plausible word. The researcher must mark such places [unintelligible] and return to the recording.
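Some of the missing detail can be partially recovered when the tool exports word-level timestamps (several ASR systems, including Whisper variants, can do this). A hedged sketch that inserts Jefferson-style timed pauses wherever the gap between words exceeds a threshold; the `(word, start, end)` tuple format is an assumption about the export, and the result still needs verification by listening:

```python
def insert_pauses(words, threshold=0.2):
    """words: list of (text, start_sec, end_sec) tuples from an
    ASR export (assumed format). Inserts Jefferson-style pause
    marks such as '(2.5)' into the running text."""
    parts = []
    prev_end = None
    for text, start, end in words:
        if prev_end is not None:
            gap = start - prev_end
            if gap >= threshold:
                parts.append(f"({gap:.1f})")  # pause in seconds
        parts.append(text)
        prev_end = end
    return " ".join(parts)
```

For example, words timed ("So", 0.0, 0.3), ("um", 0.4, 0.6), ("I", 3.1, 3.2) yield "So um (2.5) I" — close to the verbatim example earlier, though the researcher must still confirm each pause against the audio.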

Recommended Procedure: Automatic Transcription as First Draft

Step 1: Run automatic transcription → obtain a text base in minutes.

Step 2: Play the recording and verify the transcript passage by passage — not just reading the result, but actively listening.

Step 3: Add the transcription convention (pauses, overlaps, intonation) according to methodological requirements.

Step 4: Mark unintelligible passages, correct names, terminology, and respondents' proper nouns.

Result: a transcript meeting the methodological standards of the research in a fraction of the original time.


Ethics and Data Protection

Informed Consent and Cloud Processing

When using a cloud transcription service, data leaves the research institution and is processed by a third party. The participant's informed consent must account for this — if the consent does not inform about automatic processing, it may constitute a violation of the conditions under which it was granted.

Recommended consent wording: "The recording will be processed by an automated transcription system of a third party ([provider name], servers in [location]). Data will be deleted after processing in accordance with the provider's privacy policy."

GDPR (EU Regulation 2016/679) classifies an audio recording as personal data. Processing personal data by a third party requires a Data Processing Agreement (DPA) or equivalent contractual arrangement. For research institutions: consult with the institution's DPO (Data Protection Officer).

Anonymisation in Transcription

Replace respondents' names with codes (R1, R2) or pseudonyms. Replace places, workplaces, and identifying circumstances with general terms — [organisation name], [large Moravian city]. Anonymisation happens in the text transcript; the recording may remain under a pseudonym in access-controlled storage.
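Part of this replacement can be scripted, though scripted substitution supplements rather than replaces a manual anonymisation pass. A minimal sketch; the mapping below is purely illustrative:

```python
import re

# Illustrative mapping from identifying terms to anonymised codes —
# in practice this is built per project and reviewed manually.
REPLACEMENTS = {
    "Jana Nováková": "R1",
    "Brno": "[large Moravian city]",
    "Masaryk University": "[organisation name]",
}

def anonymise(text: str) -> str:
    """Replace each mapped term with its code, whole words only."""
    for term, code in REPLACEMENTS.items():
        text = re.sub(rf"\b{re.escape(term)}\b", code, text)
    return text
```

Whole-word matching avoids corrupting words that merely contain a mapped term; inflected forms (common in Czech) still need to be added to the mapping or caught by hand.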

Beware of the mosaic effect: the combination of seemingly innocent information (field, position, gender, city, years of experience) may identify the respondent. Anonymisation must be thorough and should be consulted with the institution's ethics board.

Local Processing for Sensitive Research

Local Whisper — transcription on the researcher's own computer or server — resolves situations where cloud processing is not acceptable for legal or ethical reasons. Data never leaves the institution. The trade-off: lower accuracy than proprietary cloud models, and technical configuration is required. Local versus cloud processing is discussed in more detail in A37.


Citing Transcripts in Academic Work

How to Cite a Respondent's Statement in Text

In text: direct quotation with speaker label and reference to the recording.

Example: "'The whole project was set up wrongly from the start,' stated respondent R3 (research interview, 14 March 2025, segm. 12:34–12:41)."

In the methods section: describe the transcription method (verbatim vs. clean), who transcribed or what tool was used, how the transcript was verified, and what transcription convention was applied.

Methodological Transparency

Example formulation for the methods section: "Interviews were transcribed automatically using [tool, version]. Transcripts were verified by listening by the researcher and corrected. The resulting transcripts are available in full in the appendix (without identifying information)."

Research reproducibility depends on the ability to trace the path from data to claims. A transparent description of transcription is part of methodological integrity.

Versioning and Archiving

Transcript v.1 = automatic output. Transcript v.2 = after listening verification. Transcript v.3 = after adding convention and anonymisation. Retain all versions in case of audit, peer review, or re-analysis.

For archiving: TXT or DOCX for readability; JSON for preserving metadata (timestamps, speakers); versioned file management system.
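The versioning scheme above can be reflected directly in file names and a JSON sidecar with metadata. A minimal sketch; the naming pattern is one possible convention, not a standard:

```python
import json

def versioned_name(interview_id: str, version: int, ext: str = "txt") -> str:
    """E.g. 'R1_2025-03-14' with version 2 -> 'R1_2025-03-14_v2.txt'."""
    return f"{interview_id}_v{version}.{ext}"

def sidecar(interview_id: str, version: int, note: str) -> str:
    """JSON metadata kept next to each transcript version."""
    return json.dumps({
        "interview": interview_id,
        "version": version,
        "note": note,  # e.g. "after listening verification"
    }, ensure_ascii=False)
```

Keeping v1 through v3 side by side, each with its sidecar, makes the audit trail from automatic output to anonymised final transcript explicit.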


Conclusion

Automatic transcription fundamentally changes the research process — not by relaxing methodological demands, but by significantly accelerating their fulfilment. The researcher obtains a text base quickly; time that would otherwise go into manual transcription can be devoted to analysis.

Conditions: know the methodological requirements of your research before choosing a tool; ensure informed consent covers automatic processing; verify the transcript by listening; transparently describe the procedure in the methods section of the work.

For research in academic settings generally — transcription in the context of focus groups, note dictation, and academic work — see A23.


Sources

  1. Jefferson, G. (2004). Glossary of transcript symbols with an introduction. In G. Lerner (Ed.), Conversation Analysis: Studies from the First Generation. John Benjamins.
  2. MacWhinney, B. (2000). The CHILDES Project: Tools for Analyzing Talk (3rd ed.). Lawrence Erlbaum. https://talkbank.org
  3. Kvale, S., & Brinkmann, S. (2009). InterViews: Learning the Craft of Qualitative Research Interviewing (2nd ed.). SAGE.
  4. GDPR — Regulation of the European Parliament and of the Council (EU) 2016/679. https://eur-lex.europa.eu/legal-content/EN/TXT/?uri=CELEX%3A32016R0679