Medical Transcription and Terminology: Why General Models Fall Short
Medical transcription is not transcription "with medical words". It is a distinct technical problem: Latin-derived terminology, context-free abbreviations, and phonetically similar drugs with very different effects. In medicine, a transcription error carries more than linguistic consequences. This article covers where general models fail and what can realistically be done about it.
What Makes Medical Language Challenging
Medical language was formed by combining Latin and Greek roots with systematic morphological endings. The result is words that almost never appear in general transcription training data: cholecystectomy, myocardial infarction, hypertensive crisis, cardiomegaly. A transcription model trained primarily on general conversational content decodes these words phonetically but has too few training examples to produce the correct forms.
The result is unpredictable: the model transcribes into the nearest variant from its vocabulary. "Cholecystectomy" may come out as "choley cyst ectomy" or "koli syst ektomy" — phonetically approximate, medically unusable. And more seriously, the error may not be immediately visible. A word that does not exist is easy to recognise. A word that exists but is wrong can pass unnoticed.
Abbreviations: an Invisible Trap
A doctor dictates: "Patient is on i.v. furosemide b.i.d., saturation 94 percent, s.a.t." The model hears a sequence of sounds without context. "i.v." may come out as "ivy" or "I.V." with incorrect expansion. "b.i.d." may come out as "bid" or garbled letters. Abbreviations in their spoken form are letters or shortened expressions meaningless to a model that does not know their expanded form.
The most common medical abbreviations include: i.v. (intravenous), p.o. (per os — by mouth), s.c. (subcutaneous), b.i.d. (twice daily), t.i.d. (three times daily), IM (intramuscular), PRN (as needed, from Latin pro re nata). For a model without medical training, these are random consonants.
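One pragmatic mitigation is a post-processing step that expands known dictated abbreviations after transcription. The sketch below is illustrative only: the dictionary covers just the abbreviations listed above, and the simple token-level matching is an assumption, not a production rule set.

```python
import re

# Expansion table for the abbreviations discussed above (illustrative subset).
ABBREVIATIONS = {
    "i.v.": "intravenous",
    "p.o.": "per os",
    "s.c.": "subcutaneous",
    "b.i.d.": "twice daily",
    "t.i.d.": "three times daily",
    "im": "intramuscular",
    "prn": "as needed",
}

def expand_abbreviations(text: str) -> str:
    """Replace known abbreviations in a transcript with their expanded forms."""
    def repl(match: re.Match) -> str:
        token = match.group(0)
        # Unknown tokens pass through unchanged.
        return ABBREVIATIONS.get(token.lower(), token)
    # Match dotted forms like "b.i.d." or short bare forms like "PRN".
    pattern = r"\b(?:[a-zA-Z]\.){2,}|\b[a-zA-Z]{2,4}\b"
    return re.sub(pattern, repl, text)

print(expand_abbreviations("Patient is on i.v. furosemide b.i.d."))
# prints: Patient is on intravenous furosemide twice daily
```

A real system would also need to handle ambiguous short words and preserve sentence punctuation; this sketch only shows the dictionary-lookup principle.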
Eponyms
Parkinson's disease, Crohn's disease, Alzheimer's disease — the model must correctly transcribe the surname of a scientist or physician in the correct form combined with "disease" or "syndrome". For well-established eponyms with high occurrence in training data, models usually succeed. For less frequent ones — Meniere's disease, Takayasu's arteritis, Budd-Chiari syndrome — transcription from a general model is unreliable.
Phonetically Similar Drugs
This category carries the highest clinical risk. Metformin (a diabetes drug) and metoprolol (a cardiac drug) begin with the same sounds, so the model must distinguish them from sound and context alone. Cefuroxime and cefotaxime are both cephalosporin antibiotics with an identical opening; tramadol and trandolapril are another such pair. A transcription error here is not merely a linguistic mistake: it is potentially dangerous information in medical documentation that may affect patient care.
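One way to reduce this risk is to flag transcribed drug names that are suspiciously close to a different known drug, so a reviewer is alerted. The sketch below uses plain string similarity from the standard library; the drug list is taken from the examples above, and the 0.5 similarity threshold is an assumed tuning value for illustration (real phonetic confusability would need a phonetic algorithm, not character overlap).

```python
from difflib import SequenceMatcher

# Known drug names from the examples above (illustrative subset).
KNOWN_DRUGS = ["metformin", "metoprolol", "cefuroxime", "cefotaxime",
               "tramadol", "trandolapril"]

def confusable_with(name: str, threshold: float = 0.5) -> list[str]:
    """Return known drugs similar to, but not identical with, the given name."""
    name = name.lower()
    return [d for d in KNOWN_DRUGS
            if d != name and SequenceMatcher(None, name, d).ratio() >= threshold]

# "cefuroxime" and "cefotaxime" share most of their characters,
# so the transcribed name gets flagged for human review.
print(confusable_with("cefuroxime"))
# prints: ['cefotaxime']
```

Character-level similarity catches pairs like cefuroxime/cefotaxime well but can underrate pairs that sound alike while being spelled differently, which is one reason a curated confusable-pairs list is often preferred in practice.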
Why Training Data Is Not Enough
Whisper, one of the most accurate generally available model families, was trained on 680,000 hours of audio data in dozens of languages. Medical dictation in any given language makes up a fraction of a percent of this volume. Models draw on data available on the internet: podcasts, YouTube, media programmes. Medical dictation content is rare there, and where it appears, it tends to come from formal contexts rather than real physician dictation.
The second factor is the nature of dictation itself. A doctor dictates quickly, uses medical abbreviations, and relies on context the model does not have. Transcription models are typically trained on read texts or conversational content; dictation has a different rhythm, different pauses, and a different density of information per utterance.
How to Improve Results
Specialised Medical Models
For English, commercial models trained on healthcare documentation exist: Nuance DAX (Microsoft) and AWS HealthScribe. Because medical language is built into their training data, their results for English are significantly better than those of general models.
A commercially available Czech-language medically specialised transcription model essentially does not exist as of 2025. Research projects on specialised data are underway, but production deployment is lacking.
Terminology List as a Pragmatic Solution
For many languages, the most practical current approach is a terminology list: a collection of preferred terms for a given medical speciality. An oncologist enters a list with diagnoses and drugs specific to oncology. A cardiologist enters cardiological terms.
The merging layer can take this list into account when choosing between variants from different model transcripts. If three models transcribe a word differently and one variant matches an entry in the list, the listed variant takes priority. Result: transcription for terminology in the list improves. Terminology outside the list does not improve.
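The priority rule described above can be sketched as a small word-level merge function: given candidate words from several models, a candidate that appears on the terminology list wins; otherwise a simple majority vote decides. The function name and the word-level granularity are illustrative assumptions; a real merging layer works on aligned transcript segments.

```python
from collections import Counter

def merge_word(candidates: list[str], terminology: set[str]) -> str:
    """Pick one word from per-model candidates, preferring listed terms."""
    listed = [w for w in candidates if w.lower() in terminology]
    if listed:
        return listed[0]          # terminology list takes priority
    # No candidate is on the list: fall back to majority vote.
    return Counter(candidates).most_common(1)[0][0]

terms = {"cholecystectomy", "furosemide"}

# Three models disagree; one variant matches the terminology list and wins.
print(merge_word(["choley cyst ectomy", "cholecystectomy", "kolecystectomy"], terms))
# prints: cholecystectomy

# No match on the list: the most common candidate wins.
print(merge_word(["patient", "patient", "patience"], terms))
# prints: patient
```

This also makes the stated limitation concrete: a term absent from the list gets no boost and falls back to plain voting.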
The list must be maintained and updated — medicine evolves, new drugs are added.
Human Review as a Requirement
None of the available technologies achieves the accuracy that would allow autonomous use of medical transcription without verification. A physician, healthcare assistant, or specialised transcriber must check the transcript before it is incorporated into documentation. Automation is an aid for acceleration — not a substitute for medically competent review.
GDPR and Health Data
Health data is a special category of personal data under Art. 9 GDPR. Transcription of a recording with a patient — consultation, medical history, therapy session — is health data with a stricter processing regime.
Sending a recording to a cloud API means transferring data to third-party servers, usually outside the EU (OpenAI, Deepgram, AssemblyAI have servers primarily in the US). A DPA (Data Processing Agreement) with each transcription service is an obligation, not a choice. For the highest level of protection, local processing — Local Whisper running on your own server — is the only option where data does not leave the healthcare facility's environment.
The combination of a terminology list, ensemble approach, and human review is currently the most pragmatic path to usable medical transcription. Czech Transcription System supports entering a terminology list and processes recordings across multiple models in parallel — for specialised medical terminology this brings improvement over individual models. The result always requires expert review before inclusion in medical documentation.
Transcription of sensitive health recordings via cloud API is a compliance topic — see the GDPR and data security overview A18. Why the Czech language in general places special demands on transcription models is explained from a linguistic perspective in A02.
Sources:
- Radford et al. (2022). Robust Speech Recognition via Large-Scale Weak Supervision (Whisper). doi:10.48550/arXiv.2212.04356
- GDPR Art. 9 — special categories of personal data [eur-lex.europa.eu]