Data Privacy and Security When Transcribing Sensitive Recordings
A recording is personal data, and its content may be even more sensitive: health information, legal records, HR interviews. Sending a recording to a cloud API means the audio leaves your environment and reaches a third party. This article covers what happens to the data there, how that affects your GDPR obligations, and what you must keep under control.
Why Transcription Is a Compliance Topic
A voice recording can identify a natural person by their voice. GDPR defines personal data as any information relating to an identified or identifiable natural person, and a voice recording meets this definition.
The content of a recording may fall into a more strictly protected category of data. Art. 9 GDPR defines special categories of personal data whose processing is prohibited unless one of the exceptions in Art. 9(2) applies, such as explicit consent. Examples include health information (a consultation with a doctor, a therapy session), trade union membership (minutes of a trade union meeting), religious beliefs (pastoral conversations), and data on sexual orientation (HR records). A recording containing such information requires stricter handling than general personal data.
Controller and Processor
Your organisation, which decides why and how transcription is produced, is the data controller. The transcription service that technically processes the recording is the data processor. The obligation under Art. 28 GDPR: conclude a written DPA (Data Processing Agreement) with each transcription service before processing begins.
Without a DPA, processing personal data via API is unlawful — regardless of how good a transcript it returns.
What Happens to Data in a Cloud API
Where the Servers Are
OpenAI (Whisper API), Deepgram, AssemblyAI and ElevenLabs Scribe have servers primarily in the US. Every submission of a recording to their API is a transfer of personal data to a third country outside the European Economic Area. Such a transfer requires appropriate safeguards: a DPA with Standard Contractual Clauses (SCC) or another transfer mechanism approved by the Commission.
Google Cloud (Google STT) offers an EU region — but this depends on account configuration. The default settings may not guarantee processing in the EU. This must be verified and documented.
Local Whisper: processing happens locally on your own server. Data does not leave your environment. This carries the lowest data transfer risk — at the cost of lower accuracy compared to cloud models and higher hardware requirements.
How Long Data Is Retained
Each transcription service has its own data retention policy. The spectrum is wide: zero retention (data deleted immediately after the transcript is returned, available as an explicit setting), 30 days (default for some services for customer support purposes), longer retention for quality review. You must read the Terms of Service and Privacy Policy of each service, activate zero retention where it is available, and document the status in the processing record.
Model Training on Your Data
Critical point: some transcription services may use recordings and transcripts to improve or train models unless an opt-out is explicitly activated. This must be verified in the terms of each service used. For sensitive data — health, legal, HR — deactivating this processing is an obligation, not a choice.
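The three points in this section (server location, retention, training use) can be written down as a small pre-flight check that runs before any recording is submitted. The following is a minimal sketch, not a legal assessment: `ProcessorProfile`, its fields, and `outstanding_issues` are hypothetical names, the EEA flag is a simplification, and the values must come from your own review of each service's contract and account settings.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class ProcessorProfile:
    """Hypothetical record of what you have verified about one transcription service."""
    name: str
    dpa_signed: bool                 # Art. 28 DPA concluded
    processes_in_eea: bool           # verified processing region, not the vendor default
    scc_in_place: bool               # SCC or another approved transfer mechanism
    retention_days: Optional[int]    # 0 = zero retention, None = not yet verified
    training_opt_out: bool           # use of data for model training is disabled


def outstanding_issues(p: ProcessorProfile, sensitive: bool) -> List[str]:
    """Return the open points that should block submission of a recording."""
    issues = []
    if not p.dpa_signed:
        issues.append("no DPA")
    if not p.processes_in_eea and not p.scc_in_place:
        issues.append("third-country transfer without safeguards")
    if p.retention_days is None:
        issues.append("retention not verified")
    elif sensitive and p.retention_days > 0:
        issues.append("zero retention not active")
    if sensitive and not p.training_opt_out:
        issues.append("training opt-out not active")
    return issues


# Example values are made up for illustration, not statements about any real service.
vendor = ProcessorProfile("example-stt", dpa_signed=True, processes_in_eea=False,
                          scc_in_place=True, retention_days=30, training_opt_out=False)
print(outstanding_issues(vendor, sensitive=True))
```

An empty list means nothing in this particular check blocks submission; it does not replace the contractual review itself.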
Practical GDPR Checklist
Before deploying a transcription service for sensitive recordings:
Data Analysis:
- What type of data will be transcribed? General personal data, or recordings likely to contain special categories (health, legal, and HR contexts)?
- Is there a legal basis for processing? (consent, legitimate interest, legal obligation)
Contractual Basis:
- DPA concluded with each transcription service?
- DPA includes SCC for transfers outside the EEA (for US-based services)?
- DPA prohibits use of data for model training?
Technical Measures:
- Identify and document server locations for processing
- Activate zero data retention where available
- Deactivate use of data for model training
- Include transcription process in the Record of Processing Activities (RoPA) under Art. 30 GDPR
Operational Security:
- Pseudonymisation of the recording where possible (removal of names, ID numbers, addresses from content)
- Encryption in transit: HTTPS as a minimum for data transfer
- Access rights: who in your organisation may access transcripts
- Data deletion: systematic deletion of recordings and transcripts after retention period expires
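The pseudonymisation item above can be illustrated on transcript text (removing identifiers from the audio itself is a separate and harder problem). This is a minimal sketch, assuming regular expressions for e-mail addresses and phone numbers plus a caller-supplied list of known names. The function name `pseudonymise`, the patterns, and the placeholder format are illustrative assumptions, and simple patterns like these will miss many identifiers.

```python
import re
from typing import Dict, Iterable, Tuple

EMAIL = re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+")
PHONE = re.compile(r"\+?\d[\d ]{7,}\d")


def pseudonymise(text: str, known_names: Iterable[str]) -> Tuple[str, Dict[str, str]]:
    """Replace identifiers with stable placeholders; return text and the lookup table.

    The lookup table is what makes this pseudonymisation rather than
    anonymisation: store it separately with restricted access.
    """
    mapping: Dict[str, str] = {}

    def placeholder(kind: str, value: str) -> str:
        # Reuse the same placeholder when the same value appears again.
        for token, original in mapping.items():
            if original == value:
                return token
        token = f"[{kind}_{len(mapping) + 1}]"
        mapping[token] = value
        return token

    for name in known_names:
        pattern = re.compile(r"\b" + re.escape(name) + r"\b")
        text = pattern.sub(lambda m: placeholder("NAME", m.group(0)), text)
    text = EMAIL.sub(lambda m: placeholder("EMAIL", m.group(0)), text)
    text = PHONE.sub(lambda m: placeholder("PHONE", m.group(0)), text)
    return text, mapping


redacted, table = pseudonymise(
    "Jana Novak called from +420 601 234 567; write to jana@example.com.",
    known_names=["Jana Novak"],
)
print(redacted)
```

Because the same value always maps to the same placeholder, the transcript stays readable, and the original values can be restored only by someone with access to the lookup table.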
Alternatives for the Highest Level of Protection
Local Transcription
With Local Whisper running on your own server, data does not leave the organisation's environment. For healthcare facilities, law firms, and HR departments processing special categories of personal data, this is the most reliable architecture from a GDPR perspective. The cost is lower accuracy compared to cloud models and an investment in GPU hardware (see A37).
Private Cloud
Transcription infrastructure in your own private cloud — full data control with cloud flexibility. Higher setup and management costs. Relevant for large organisations with ongoing needs and stricter security requirements (healthcare systems, public administration).
Czech Transcription System processes recordings through the cloud APIs of multiple models, with servers in the US (OpenAI, Deepgram, AssemblyAI, ElevenLabs) and, depending on configuration, in the EU (Google Cloud). Transcribing sensitive recordings this way requires a DPA with each of the services involved, activation of zero retention, and verification of the model training terms. For recordings with special categories of personal data, it is worth considering the local option (Local Whisper), which is part of the system and eliminates data transfer outside your environment.
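The choice between cloud and local processing can be made explicit in code, so that a sensitive recording cannot reach a cloud service by accident. The sketch below is deliberately stricter than the recommendation above: it always routes special-category recordings to the local option. The `Sensitivity` levels, the backend labels, and `choose_backend` are hypothetical names for illustration and do not describe the actual interface of any product.

```python
from enum import Enum


class Sensitivity(Enum):
    GENERAL = "general"            # ordinary personal data
    SPECIAL_CATEGORY = "special"   # Art. 9 data: health, union membership, etc.


def choose_backend(sensitivity: Sensitivity, cloud_cleared: bool) -> str:
    """Pick where a recording may be transcribed.

    cloud_cleared means the DPA, transfer safeguards, zero retention and
    training opt-out have all been verified for the cloud service.
    """
    if sensitivity is Sensitivity.SPECIAL_CATEGORY:
        return "local"     # data never leaves your environment
    if cloud_cleared:
        return "cloud"
    return "local"         # fall back to local rather than send unverified


print(choose_backend(Sensitivity.SPECIAL_CATEGORY, cloud_cleared=True))
print(choose_backend(Sensitivity.GENERAL, cloud_cleared=True))
```

Defaulting to the local backend whenever the cloud service has not been cleared keeps the failure mode on the side of lower accuracy rather than an unverified transfer.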
The specifics of health data processing are elaborated in the medical transcription overview A17. Legal transcription has its own data protection requirements A16. An architectural comparison of local and cloud transcription will offer a more detailed look at the security trade-offs A37.
Sources:
- GDPR — Regulation of the European Parliament and of the Council (EU) 2016/679, Arts. 9 and 28 [eur-lex.europa.eu]
- Commission Implementing Decision 2021/914 — Standard Contractual Clauses [eur-lex.europa.eu]