Over 60% of podcast and audio content lacks transcripts
1.2.1 Audio-only and Video-only (Prerecorded)
In Plain Language
[1.2.1 Audio-only and Video-only (Prerecorded), Level A][1] applies to prerecorded media that carries information in only one sensory channel. If the content is audio-only -- a podcast episode, an interview recording, a voice memo embedded via <audio> -- ship a text transcript that presents the same information in correctly sequenced prose, including speaker labels and meaningful non-speech sounds.
If the content is video-only -- a silent product animation, surveillance footage, an assembly demo with no narration embedded via <video> -- ship either a text description of the visual track or a separate audio track that narrates it. Either alternative has to convey the same information a sighted viewer would extract from the pixels[1].
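The audio-track option for video-only content can be sketched as follows. This is a minimal illustration, not a prescribed pattern: the file name assembly-demo-description.mp3 and the heading text are assumptions chosen for the example, and the narration itself still has to cover everything the visual track shows.

```html
<!-- Silent video: carries information only in the visual channel -->
<video controls src="assembly-demo.mp4"></video>

<!-- Hypothetical narrated alternative, placed directly after the video.
     The heading tells users which video the audio describes. -->
<h3>Audio description of the assembly demo</h3>
<audio controls src="assembly-demo-description.mp3"></audio>
```

A text description remains the more broadly useful choice, since it can be read, searched, and rendered on a braille display, but either alternative can meet the criterion.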
Why It Matters
- A screen reader handed an <audio controls src="podcast.mp3"></audio> with no accompanying transcript announces "audio" and exposes play and seek controls. There is no mechanism for the assistive technology to surface the speech inside the file, so a deaf or hard-of-hearing user gets the player chrome and nothing else.
- A blind user landing on a <video> element with a silent visual track hears only the ambient audio (often none) when they press play. Without a text description or a descriptive audio track, the visual information -- on-screen text, demonstrated steps, spatial relationships -- is unrecoverable from the element itself.
- 1.2.1 is the transcript criterion, not the caption criterion. Captions live in 1.2.2 Captions (Prerecorded) and apply to synchronized media where audio and video carry complementary information. A podcast episode is audio-only and falls under 1.2.1; a webinar recording with a talking head and slides is synchronized media and falls under 1.2.2. Shipping captions on a podcast does not satisfy 1.2.1 -- the criterion asks for a time-independent text document that a user can read, search, and navigate with a screen reader[1].
- The W3C defines an acceptable alternative as a "document including correctly sequenced text descriptions of time-based visual and auditory information," with speaker identification and significant non-speech sounds such as applause, laughter, and audience questions where they carry meaning[1]. A partial summary or marketing blurb does not clear the bar.
- Transcripts compound in value beyond the primary audience: they expose the content to full-text search, they index in search engines, and they serve users on metered connections or in quiet environments who cannot play audio. The regulatory requirement targets deaf and blind users; the downstream benefits reach everyone who would rather read than listen.
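The split between the two criteria described above can be illustrated side by side. This is a sketch under assumptions: webinar.mp4 and webinar.en.vtt are hypothetical file names, and the transcript text is abbreviated as in the examples that follow.

```html
<!-- Synchronized media (audio plus video): a caption track addresses 1.2.2 -->
<video controls src="webinar.mp4">
  <track kind="captions" src="webinar.en.vtt" srclang="en" label="English">
</video>

<!-- Audio-only media: a time-independent transcript addresses 1.2.1 -->
<audio controls src="podcast-ep12.mp3"></audio>
<details>
  <summary>Read transcript</summary>
  <p>[Host] Welcome to episode 12...</p>
</details>
```

The caption track is tied to the playback timeline, while the transcript is ordinary page text that can be read without pressing play.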
Examples
✔ Full transcript provided alongside the audio player
<audio controls src="podcast-ep12.mp3"></audio>
<details>
<summary>Read transcript</summary>
<p>[Host] Welcome to episode 12. Today we discuss...</p>
<p>[Guest] Thanks for having me. The key issue is...</p>
</details>
✘ No transcript -- deaf and hard-of-hearing users cannot access this content
<!-- FAILS: no transcript provided -->
<audio controls src="podcast-ep12.mp3"></audio>
<!-- Users who cannot hear the audio have
no way to access the content -->
✔ Text description conveys the same visual information
<video controls src="assembly-demo.mp4"></video>
<div class="transcript">
<h3>Text description</h3>
<p>Step 1: Attach the base plate to the frame
using the four corner bolts.</p>
<p>Step 2: Slide the panel into the guide
rails from left to right.</p>
</div>
✘ No text alternative -- blind users cannot know what the video demonstrates
<!-- FAILS: no text alternative provided -->
<video controls src="assembly-demo.mp4"></video>
<!-- Blind users have no way to understand
what is shown in the video -->
How to Fix It
- Inventory every prerecorded audio-only and video-only asset. Look for <audio> elements, <video> elements whose source files carry no dialogue track, embedded podcast players, silent GIF-to-MP4 animations, and any media that communicates information in a single sensory channel. Live media is out of scope for 1.2.1: live audio-only content falls under 1.2.9 Audio-only (Live), and captions for live synchronized media fall under 1.2.4 Captions (Live).
- For each audio-only file, produce a verbatim transcript. Include speaker labels ([Host], [Guest]), the spoken content in sequence, and non-speech sounds that carry meaning (applause, laughter, audience questions, a door slam that ends the scene)[1]. Background music that is purely atmospheric does not need to be transcribed; a lyric that is part of the content does.
- For each video-only file, write a text description of the visual track -- or produce a narrated audio track. The description has to cover on-screen text, demonstrated actions, spatial relationships, and anything else a sighted viewer would pick up. A reader who never watches the video should finish with the same understanding. One-line summaries ("product spins on turntable") are not enough when the video is teaching a procedure.
- Expose the alternative in the DOM next to the player. A collapsible <details>/<summary> pair immediately after the media element keeps the transcript discoverable without forcing scroll. A visible <section> with a heading works just as well. A link to a separate transcript page is acceptable if the link is adjacent to the player and its accessible name identifies the target media -- avoid "transcript" on its own when a page has multiple players.
- Do not conflate 1.2.1 with 1.2.2. A <track kind="captions"> child of a <video> element satisfies 1.2.2 for synchronized media, not 1.2.1 for audio-only or video-only media. If the asset has only one sensory channel, a time-independent transcript or description is the remediation; a WebVTT caption file on its own is not.
- Keep alternatives in sync with the media. When a podcast episode is re-cut or a demo video is reshot, the transcript or description has to be regenerated. A stale alternative no longer presents the same information as the media, so it fails the criterion and misleads the users who rely on it.
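When transcripts live on separate pages, the link-naming advice above might look like the following on a page with two players. The /transcripts/... URLs, the episode headings, and podcast-ep13.mp3 are hypothetical and only illustrate the pattern.

```html
<h3>Episode 12</h3>
<audio controls src="podcast-ep12.mp3"></audio>
<!-- Link sits directly after its player; the link text names the episode -->
<p><a href="/transcripts/episode-12">Transcript of episode 12</a></p>

<h3>Episode 13</h3>
<audio controls src="podcast-ep13.mp3"></audio>
<p><a href="/transcripts/episode-13">Transcript of episode 13</a></p>
```

Because each link text is unique, a screen reader user browsing a list of links can tell the two transcripts apart without reading the surrounding content.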