Custom Terminology in Transcription: How to Prepare a Glossary of Names, Acronyms and Terms
A transcript can look "good" at first glance and still be unusable in practice because it garbles a guest's name, a company name, or an internal acronym. Often the problem is not hundreds of errors but a few dozen terms that recur in your recordings and that you care about getting right.
A terminology list is a simple preparation step that targets exactly this class of errors. It will not fix everything. It will not overcome noise or echo. But when prepared well, it saves editing time and improves factual accuracy precisely where people notice mistakes immediately.
Why Transcription Fails on Names and Proper Nouns
Everyday language is relatively "easy" for modern transcription systems: it is repetitive, well-covered in training data, and has predictable context. Proper names, brands, and internal terms are the opposite.
Typical problem categories:
- Personal names (similar-sounding names being confused, uncommon surnames)
- Company and institution names (with or without special characters, acronyms in various forms)
- Acronyms (KPI, OKR, NDA) that the system sometimes transcribes as a common word
- Internal names (projects, team names, system names) that make no sense outside your context
The most dangerous error is not a visible typo but a silent substitution: the resulting word looks perfectly correct, yet it is a different word. On a read-through it passes unnoticed, and only later do you discover that the transcript "looked right" but shifted the meaning.
What a Terminology List Is and What to Expect from It
A terminology list is not a dictionary, and it is not a rule that corrects entire sentences. It is a list of the terms that matter to you, which the transcription system should prioritize during recognition.
What It Typically Improves
- recurring names in a single recording (guest name, product name),
- acronyms and internal names,
- domain terms that would otherwise end up as a similar-sounding common word.
What It Typically Does Not Improve
- noise, echo, and compression artifacts,
- overlapping speech (when multiple people talk simultaneously),
- situations where the speaker is far from the microphone or barely audible.
When the problem is in the recording itself, a term list will not save it. In that case, improving recording conditions is more effective (see article A12).
How to Prepare a Terminology List (Practical Steps)
1) Collect Terms from Practice, Not from Memory
The fastest approach is to start from what you already have:
- list of people (team, guests, clients),
- company and product names from internal documents,
- project and system names,
- domain terms that recur in your conversations.
Start with a small list covering the most frequent terms. Add more only based on actual errors in transcripts. This prevents the list from growing to hundreds of entries that ultimately add ambiguity rather than clarity.
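One way to seed the list from material you already have is to count recurring acronym-like tokens in internal documents. The following is a minimal Python sketch of that idea, not a feature of any transcription tool; the function name `candidate_acronyms`, the 2-to-6-letter rule, and the `min_count` threshold are illustrative assumptions.

```python
import re
from collections import Counter


def candidate_acronyms(text: str, min_count: int = 2) -> list[tuple[str, int]]:
    """Return acronym-like tokens that occur at least `min_count` times."""
    # Heuristic (assumption): an acronym is a standalone run of 2-6 capitals.
    tokens = re.findall(r"\b[A-Z]{2,6}\b", text)
    counts = Counter(tokens)
    # most_common() sorts by frequency, highest first.
    return [(term, n) for term, n in counts.most_common() if n >= min_count]


notes = "The KPI review covered OKR targets. KPI owners signed the NDA. OKR and KPI."
print(candidate_acronyms(notes))
```

Terms that appear only once (here, NDA) are left out, which matches the advice to start with the most frequent terms and add the rest only when they actually cause errors.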
2) Choose One Spelling and Stick with It
With names and acronyms, people often write them differently:
- with or without special characters,
- uppercase vs. lowercase,
- hyphen vs. space,
- acronym vs. full name.
Choose one preferred form and maintain it in the list. In practice, consistency matters more than "perfect spelling" because the goal is to have uniform notation in the final text.
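If you also want to enforce the preferred form in the finished text, a small post-processing pass can map known variants onto it. The sketch below assumes you do this yourself after transcription; the `PREFERRED` mapping, its example entries, and the function name `normalize_spelling` are hypothetical.

```python
import re

# Hypothetical mapping: preferred spelling -> variants seen in transcripts.
PREFERRED = {
    "OKR": ["okr", "O.K.R."],
    "Data-Hub": ["data hub", "datahub"],
}


def normalize_spelling(text: str, preferred: dict[str, list[str]]) -> str:
    """Replace every listed variant with its preferred spelling."""
    for canonical, variants in preferred.items():
        # Try longer variants first so a short one cannot pre-empt a longer one.
        for variant in sorted(variants, key=len, reverse=True):
            # Lookarounds keep the match from firing inside a longer word.
            pattern = r"(?<!\w)" + re.escape(variant) + r"(?!\w)"
            # A function replacement inserts `canonical` literally.
            text = re.sub(pattern, lambda _m: canonical, text, flags=re.IGNORECASE)
    return text


print(normalize_spelling("Our okr for the data hub and the Datahub team.", PREFERRED))
```

Matching is case-insensitive, so one lowercase variant covers all capitalizations of the same spelling.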
3) Account for Inflected Forms (Relevant for Morphologically Rich Languages)
In languages with grammatical case systems (such as Czech, German, or Finnish), names change form depending on their grammatical role. A surname may appear in several distinct forms within a single conversation.
What to do:
- If your transcription workflow uses only base forms, start with the base form.
- If it supports multiple forms, add the most common inflections.
In general, start simple and add variants only based on actual results. The impact of morphological complexity on transcription is discussed in article A02.
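The two options above can be kept in one source by storing each base form together with its common inflections and deciding at export time whether to include them. This is a minimal sketch under that assumption; the function name `expand_terms` and the Czech surname used as sample data are illustrative.

```python
def expand_terms(entries: dict[str, list[str]], include_inflections: bool) -> list[str]:
    """Flatten base forms (and optionally their inflections) into one term list."""
    terms: list[str] = []
    for base, inflections in entries.items():
        if base not in terms:
            terms.append(base)
        if include_inflections:
            for form in inflections:
                if form not in terms:  # avoid duplicate entries
                    terms.append(form)
    return terms


# Sample data: a Czech surname with three of its inflected forms.
entries = {"Novák": ["Nováka", "Novákovi", "Novákem"]}
print(expand_terms(entries, include_inflections=False))
print(expand_terms(entries, include_inflections=True))
```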
4) Handle Multi-Word Names and Acronyms as Pairs
For institutions and long names, two forms are typically used in practice:
- the full name ("Department of Health and Human Services"),
- the acronym ("HHS").
If both appear in your recordings, include both in the list. The same applies to names that people shorten, but be careful: too many similar variants create confusion instead of reducing it.
Recommended List Format
If you do not have a specific format requirement, simplicity works best:
- one term per line,
- grouped by type (names / companies / acronyms / terms),
- minimal duplicates.
For teams, a practical rule works well: add a term only when it repeatedly shows up as a problem in transcripts, or when it is genuinely important for the result.
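As an illustration of the format described above, the sketch below reads a plain-text list with one term per line and group headers in square brackets. The bracket syntax, the sample entries, and the function name `parse_term_list` are assumptions made for this example, not a format required by any particular tool.

```python
def parse_term_list(raw: str) -> dict[str, list[str]]:
    """Parse a grouped, one-term-per-line list and drop duplicate terms."""
    groups: dict[str, list[str]] = {}
    current = "ungrouped"  # used for terms that appear before any header
    seen: set[str] = set()
    for line in raw.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith("[") and line.endswith("]"):
            current = line[1:-1].strip().lower()
            groups.setdefault(current, [])
            continue
        key = line.casefold()
        if key in seen:
            continue  # keep the first spelling, skip later duplicates
        seen.add(key)
        groups.setdefault(current, []).append(line)
    return groups


SAMPLE = """
[names]
Jana Nováková
Tomáš Dvořák

[acronyms]
KPI
OKR
kpi
"""
print(parse_term_list(SAMPLE))
```

Duplicates are detected case-insensitively, so the first spelling entered is the one that survives.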
How to Use Terminology During Transcription and Verify It Helped
The general process is simple: attach the terminology list to the transcription job (or configure it during processing) and then verify the result.
In Transkripce, a terminology list can be provided when processing a recording so that the system better captures names, acronyms, and domain terms.
A Quick Check That Saves the Most Time
After transcription, do two quick things:
1) search the text for your 10-20 most important terms (names, titles, acronyms),
2) check whether they are consistent and correct.
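The two-step check can be scripted for a plain-text transcript. The following is a minimal sketch, assuming the transcript is available as a string; the function name `check_terms` and the sample text are illustrative.

```python
import re
from collections import Counter


def check_terms(transcript: str, terms: list[str]) -> dict[str, Counter]:
    """For each term, count every spelling found by a case-insensitive search."""
    report: dict[str, Counter] = {}
    for term in terms:
        # Lookarounds ensure the term is matched as a whole word or phrase.
        pattern = r"(?<!\w)" + re.escape(term) + r"(?!\w)"
        found = re.findall(pattern, transcript, flags=re.IGNORECASE)
        report[term] = Counter(found)
    return report


transcript = "The KPI report is due. Our kpi owner is Dana. OKR planning starts Monday."
for term, spellings in check_terms(transcript, ["KPI", "OKR", "NDA"]).items():
    if not spellings:
        print(f"{term}: not found")
    elif len(spellings) > 1:
        print(f"{term}: inconsistent spellings {dict(spellings)}")
    else:
        print(f"{term}: ok")
```

An empty result means the term never appears in its expected form (it may have been transcribed as something else), and more than one spelling signals inconsistent capitalization.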
If a term is still wrong:
- add a spelling variant (if reasonable),
- check whether the problem is in the audio (noise, echo, distant microphone),
- consider simplification (e.g., using the preferred acronym instead of the full name in the text).
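When a term is missing from the transcript, it helps to see what was written in its place before deciding on a spelling variant. The sketch below uses Python's standard `difflib.get_close_matches` to list similar-looking words; the function name `near_misses` and the `cutoff` value are assumptions, and the approach only covers single-word terms.

```python
import difflib
import re


def near_misses(transcript: str, term: str, cutoff: float = 0.7) -> list[str]:
    """Return words in the transcript that resemble `term` but are not identical."""
    words = sorted(set(re.findall(r"\w+", transcript)))
    # get_close_matches ranks candidates by similarity ratio, best first.
    matches = difflib.get_close_matches(term, words, n=5, cutoff=cutoff)
    return [word for word in matches if word != term]


text = "We spoke with Jana Novakova about the rollout."
print(near_misses(text, "Nováková"))
```

A near miss such as a name written without diacritics is a reasonable candidate for a spelling variant; if nothing similar turns up, the cause is more likely in the audio.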
Maintaining the List in a Team
A terminology list is most effective when it is short, up to date, and someone takes ownership of it.
A practical routine that works without heavy bureaucracy:
- one list owner (a role, not necessarily a specific person),
- add terms only for repeated errors or high importance,
- occasional review: remove outdated names and merge duplicates.
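The occasional review can be partly automated if you record when each term last appeared in a recording. That bookkeeping is an assumption of this sketch, as are the function name `review_list` and the 180-day threshold.

```python
from datetime import date, timedelta


def review_list(
    last_seen: dict[str, date], today: date, max_age_days: int = 180
) -> tuple[list[str], list[str]]:
    """Merge case/whitespace duplicates, then split terms into (keep, stale)."""
    merged: dict[str, tuple[str, date]] = {}
    for term, seen in last_seen.items():
        key = " ".join(term.split()).casefold()
        if key not in merged:
            merged[key] = (term, seen)
        elif seen > merged[key][1]:
            # Keep the first spelling, but remember the most recent date.
            merged[key] = (merged[key][0], seen)
    cutoff = today - timedelta(days=max_age_days)
    keep = [term for term, seen in merged.values() if seen >= cutoff]
    stale = [term for term, seen in merged.values() if seen < cutoff]
    return keep, stale


history = {
    "Project Atlas": date(2024, 1, 10),
    "project atlas": date(2024, 6, 1),
    "Old Vendor": date(2023, 1, 5),
}
keep, stale = review_list(history, today=date(2024, 6, 30))
print("keep:", keep)
print("stale:", stale)
```

The stale list is only a suggestion for the list owner to confirm, since a rarely used term can still be important.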
A terminology list is not a magic solution. It is a lever for a specific class of errors, and those tend to be the most visible ones. When you build it from real practice and keep it clean, it reduces editing time and improves factual accuracy without requiring you to re-transcribe everything manually.