Filler Words in Transcription: How They Arise, Why They Matter, and When to Keep Them
"So, um, I would actually say that..." — everyone says it, few people notice it. Filler words are a natural part of spoken language, but in a transcript they can obscure meaning and make reading difficult. The question is not just how to remove them, but whether to remove them at all. It depends entirely on what the transcript is for.
What Filler Words Are and Where They Come From
Linguists use various terms: filled pauses, discourse markers, disfluencies. In English, the category covers filled pauses such as "um" and "uh" and discourse markers such as "so," "actually," "like," "you know," "I mean," "right," and "basically." It also covers word repetitions, incomplete sentences, and self-corrections: "I went, I mean she went there."
These phenomena are not mistakes. They serve concrete functions in spoken language: they hold the floor while the speaker formulates a thought; they signal to listeners that the utterance has not ended; they can indicate uncertainty or sensitivity about a topic; they maintain contact with the audience. Clark and Fox Tree (2002) argued in a landmark study that "uh" and "um" are not random: speakers use them deliberately to announce an upcoming delay, with "uh" tending to precede a shorter delay and "um" a longer one.
Why do filler words come as such a surprise in a transcript? Spoken language is produced in real time. Errors, hesitations, and self-corrections are the norm in spontaneous speech — and the listener's brain filters them out automatically during live conversation. Transcription removes that filter. It freezes spoken language into text and reveals what went unnoticed during listening.
Why Filler Words Are Problematic in Transcripts
A paragraph full of fillers is hard to read. "Um, so actually I think, uh, yeah, that project, you know, was kind of, um, interesting" — this sentence says little, takes up a lot of space, and requires significantly more concentration from the reader than the equivalent without fillers.
In journalistic transcription, quotability is essential. A direct quote from a respondent must faithfully reflect what they said — but it must also be readable. "Um, so, you know, I think that project was actually kind of, um, interesting" is difficult to use as a journalistic quote.
Fillers also complicate automated text processing. Tools for information extraction, summarization, or machine translation work better with clean sentences. An excess of fillers skews statistical models and reduces output quality.
The practical impact: heavy filler content can lengthen a transcript by an estimated 10–20% with no added informational value. A longer transcript means more editing time, more space in the document, and a less readable result.
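To check how a figure like this applies to your own material, you can measure the share of filled pauses directly. The following is a minimal sketch, not a calibrated metric: the `FILLED_PAUSES` list and the `filler_share` function are illustrative names, and only unambiguous sounds are counted, so discourse markers such as "you know" are ignored.

```python
import re

# Illustrative list of unambiguous filled pauses; extend it for your own recordings.
FILLED_PAUSES = {"um", "uh", "erm", "hmm"}


def filler_share(text: str) -> float:
    """Return the fraction of word tokens that are filled pauses."""
    words = re.findall(r"[a-z']+", text.lower())
    if not words:
        return 0.0
    fillers = sum(1 for word in words if word in FILLED_PAUSES)
    return fillers / len(words)


sample = (
    "Um, so actually I think, uh, yeah, that project, "
    "you know, was kind of, um, interesting"
)
print(f"Filled pauses: {filler_share(sample):.0%} of all words")
```

Counting words rather than characters keeps the measure independent of punctuation, but it understates the total when discourse markers are also fillers in context.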
How Algorithms Detect Filler Words
Filler word detection is a separate machine learning task. Models trained on labeled data — each word tagged as "filler" or "content" — can distinguish filled pauses from content words in context.
But they run into a fundamental problem: the word "actually" is a filler in "Actually, um, it's complicated," but content in "That's actually the correct solution." The algorithm needs context — and context in natural language is complicated.
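The ambiguity can be made concrete with a toy tagger. This is a hand-written heuristic, not a trained model: it assumes that an ambiguous word is a filler only when it opens a sentence and is set off by a comma. The word lists and the `tag_fillers` function are hypothetical, and real systems learn such context from labeled data.

```python
import re

FILLED_PAUSES = {"um", "uh"}                         # almost always fillers
AMBIGUOUS = {"actually", "like", "so", "basically"}  # depends on context
SENTENCE_END = {".", "!", "?"}
PUNCTUATION = SENTENCE_END | {","}


def tag_fillers(text):
    """Tag each word as 'filler' or 'content' using a crude positional rule."""
    tokens = re.findall(r"[A-Za-z']+|[.,!?]", text)
    tags = []
    for i, tok in enumerate(tokens):
        if tok in PUNCTUATION:
            continue  # punctuation is used as context only
        low = tok.lower()
        prev_tok = tokens[i - 1] if i > 0 else None
        next_tok = tokens[i + 1] if i + 1 < len(tokens) else None
        opens_sentence = prev_tok is None or prev_tok in SENTENCE_END
        if low in FILLED_PAUSES:
            tag = "filler"
        elif low in AMBIGUOUS and opens_sentence and next_tok == ",":
            tag = "filler"
        else:
            tag = "content"
        tags.append((tok, tag))
    return tags


print(tag_fillers("Actually, um, it's complicated."))
print(tag_fillers("That's actually the correct solution."))
```

With this rule, "Actually" is tagged as a filler in the first sentence and as content in the second. The rule fails quickly on real speech (a sentence-initial "So, the budget is approved" may well be content), which is why context-aware models are used instead.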
The confidence score (the certainty a speech recognition model assigns to each transcribed word) can serve as an auxiliary signal. Filled pauses like "um" typically receive low scores because the model is uncertain whether it heard a word or just a sound. But a low score is not a reliable indicator on its own: a correctly transcribed word in a noisy section of the recording can also receive one.
Detection models trained on English filler words may not perform well on other languages: "uh" and "um" have a different acoustic profile than their equivalents elsewhere. Where automatic detection falls short, systems that allow configuration of expressions to be corrected or removed give the user direct control: you can define a list of words to be removed from the transcript or replaced.
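One way to use the score without over-trusting it is to let it route words to human review rather than to deletion. The sketch below assumes a hypothetical `Word` record with a per-word confidence between 0 and 1 and an arbitrary threshold of 0.5; actual transcription tools expose scores in their own formats and ranges.

```python
from dataclasses import dataclass


@dataclass
class Word:
    text: str
    confidence: float  # hypothetical per-word score in [0, 1]


FILLED_PAUSES = {"um", "uh"}
LOW_CONFIDENCE = 0.5  # arbitrary threshold, chosen for illustration only


def classify(word: Word) -> str:
    """Combine a lexicon match with the confidence score."""
    if word.text.lower() in FILLED_PAUSES:
        return "filler"
    if word.confidence < LOW_CONFIDENCE:
        # Could be a real word in a noisy passage: flag it, do not delete it.
        return "review"
    return "content"


words = [Word("um", 0.31), Word("the", 0.97), Word("budget", 0.42)]
for w in words:
    print(w.text, classify(w))
```

Here "budget" has a low score but is sent to review instead of being dropped, which reflects the caveat that a low score alone does not identify a filler.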
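A user-defined list of this kind could be applied roughly as follows. This is a sketch under stated assumptions: the `RULES` mapping, its entries, and the `apply_rules` function are hypothetical and do not describe the configuration format of any particular product.

```python
import re

# Hypothetical user configuration: expression -> replacement ("" means remove).
RULES = {
    "um": "",
    "uh": "",
    "you know": "",
    "gonna": "going to",
}


def apply_rules(text: str, rules: dict) -> str:
    """Remove or replace configured expressions, matching whole words only."""
    # Longest expressions first, so multi-word entries are handled before short ones.
    for expr in sorted(rules, key=len, reverse=True):
        replacement = rules[expr]
        word = r"\b" + re.escape(expr) + r"\b"
        if replacement == "":
            # Also drop the commas that set the filler off from the sentence.
            pattern = r",?\s*" + word + r",?"
        else:
            pattern = word
        text = re.sub(pattern, replacement, text, flags=re.IGNORECASE)
    text = re.sub(r"\s{2,}", " ", text).strip()
    # Assumes the text starts a sentence; restore the capital letter.
    return text[:1].upper() + text[1:]


print(apply_rules("Um, the release is, uh, gonna slip, you know.", RULES))
```

Whole-word matching keeps "um" from being stripped out of words like "album". The list is applied blindly, so context-dependent words such as "actually" are better left out of it or reviewed by hand.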
When to Remove Filler Words — and When Not To
The right decision depends on the purpose of the transcript. There is no universal answer.
Journalistic Quote — With Discretion
The ethical rule in journalism: a quote must faithfully capture the meaning of what was said, but need not be verbatim as long as removing the fillers does not distort that meaning. A bracketed ellipsis, "(...)" or "[...]", marks omissions.
What is acceptable: removing "um" and "so" at the beginning of a sentence where they serve only as time-fillers with no informational value. What is not acceptable: changing words, shortening a sentence so it loses its original meaning, or combining parts of different utterances without marking the edits.
A direct quote must hold up as a faithful record of what the respondent said. Editing for readability is accepted practice in journalism — falsification is not.
Academic Research — When Fillers Are Data
For conversation analysis, discourse analysis, or narrative research, fillers are fully valid data. "Um" before an answer to a sensitive question may signal hesitation or a taboo topic. "So" at the beginning of a turn signals taking the floor. The way a respondent formulates their answer — with hesitations, self-corrections, digressions — is part of the data, not noise.
In these disciplines, verbatim transcription is used, following conventions such as Jefferson notation that record precise pause durations, intonation, and overlaps. Removing fillers would destroy research-relevant information.
For thematic analysis or content research — where the interest is what the respondent said, not how they said it — a clean transcript without fillers is entirely appropriate.
Corporate Meeting Transcription — Clean Text Wins
In a meeting recording the priority is content: decisions, action items, deadlines, and conclusions. Journalistic quotability is not on the agenda. A clean transcript without fillers is significantly more readable — and automatic removal of pre-defined expressions is fully justified here.
Transcription for Subtitles — Space Is Limited
In subtitles, space is constrained (approximately 42 characters per line). Fillers are almost always omitted. The exception is a documentary film or programme where the authenticity of the speaker's delivery is part of the intention; in that case the subtitler may choose to retain fillers and accept longer subtitle blocks.
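The cost of fillers in subtitles is easy to see by wrapping the same sentence with and without them. The sketch below uses the 42-character limit mentioned above and assumes two lines per subtitle block; that second limit, like the `to_subtitle_blocks` function, is an illustrative assumption rather than a rule taken from a specific style guide.

```python
import textwrap

MAX_CHARS_PER_LINE = 42   # approximate limit mentioned in the text
MAX_LINES_PER_BLOCK = 2   # assumption: two-line subtitle blocks


def to_subtitle_blocks(text):
    """Wrap text into lines and group the lines into subtitle blocks."""
    lines = textwrap.wrap(text, width=MAX_CHARS_PER_LINE)
    return [
        lines[i:i + MAX_LINES_PER_BLOCK]
        for i in range(0, len(lines), MAX_LINES_PER_BLOCK)
    ]


verbatim = (
    "Um, so actually I think, uh, yeah, that project, "
    "you know, was kind of, um, interesting"
)
clean = "I think that project was kind of interesting"

print(len(to_subtitle_blocks(verbatim)), "block(s) with fillers")
print(len(to_subtitle_blocks(clean)), "block(s) without fillers")
```

Under these assumptions the verbatim sentence spills into a second block, while the cleaned version fits in one. Real subtitling also has to respect timing and reading speed, which this sketch ignores.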
Conclusion
Filler words are not the enemy. They are a natural part of how we speak — a signal that thinking is happening in real time. The problem arises when they appear in a medium they were not intended for: readable text.
Three questions before editing a transcript:
- Is the transcript serving as a journalistic quote or as a record for content analysis?
- Does removing the fillers change the meaning of what was said?
- Must the final text hold up as direct speech, or is it a paraphrase?
If the text must stand as verbatim speech or feeds an analysis of delivery, keep them. If it is about content and readability, remove them. And if you are unsure, keep both versions: the original with fillers as a backup, the edited version as the working document.
Sources
- Clark, H.H. & Fox Tree, J.E. (2002). Using uh and um in spontaneous speaking. Cognition, 84(1), 73–111. doi:10.1016/S0010-0277(02)00017-3
- Hough, J. & Schlangen, D. (2017). Joint incremental disfluency detection and utterance segmentation from speech. EACL 2017.
- Jefferson, G. (2004). Glossary of transcript symbols with an introduction. In G. Lerner (Ed.), Conversation Analysis: Studies from the First Generation. John Benjamins.
- Society of Professional Journalists — Code of Ethics. https://www.spj.org/ethicscode.asp