Level A

Over 60% of podcast and audio content lacks transcripts

1.2.1 Audio-only and Video-only (Prerecorded)

In Plain Language

1.2.1 Audio-only and Video-only (Prerecorded), Level A[1] applies to prerecorded media that carries information in only one sensory channel. If the content is audio-only -- a podcast episode, an interview recording, a voice memo embedded via <audio> -- ship a text transcript that presents the same information in correctly sequenced prose, including speaker labels and meaningful non-speech sounds.

If the content is video-only -- a silent product animation, surveillance footage, an assembly demo with no narration embedded via <video> -- ship either a text description of the visual track or a separate audio track that narrates it. Either alternative has to convey the same information a sighted viewer would extract from the pixels[1].
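
The audio-track option might look like the sketch below, which pairs the silent video with a separately playable narration. The file name assembly-demo-narration.mp3 and the caption wording are assumptions made for illustration; the point is only that the narration sits next to the video and is labeled so a user can tell what it describes.

```html
<figure>
  <figcaption>Assembly demo (silent video)</figcaption>
  <video controls src="assembly-demo.mp4"></video>
</figure>

<!-- Hypothetical narration file: a spoken description of the visual track -->
<figure>
  <figcaption>Audio narration of the assembly demo</figcaption>
  <audio controls src="assembly-demo-narration.mp3"></audio>
</figure>
```

A text description remains the more flexible choice, since it can be read, searched, and sent to a braille display, but either alternative satisfies the criterion if it conveys the same information.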

Why It Matters

  • An <audio controls src="podcast.mp3"></audio> with no accompanying transcript exposes only a player: a screen reader announces "audio" along with play and seek controls. No assistive technology can surface the speech inside the file, so a deaf or hard-of-hearing user gets the player chrome and nothing else.
  • A blind user who presses play on a video-only <video> element hears nothing, because video-only content has no audio track. Without a text description or a descriptive audio track, the visual information -- on-screen text, demonstrated steps, spatial relationships -- is unrecoverable from the element itself.
  • 1.2.1 is the transcript criterion, not the caption criterion. Captions live in 1.2.2 Captions (Prerecorded) and apply to synchronized media where audio and video carry complementary information. A podcast episode is audio-only and falls under 1.2.1; a webinar recording with a talking head and slides is synchronized media and falls under 1.2.2. Shipping captions on a podcast does not satisfy 1.2.1 -- the criterion asks for a time-independent text document that a user can read, search, and navigate with a screen reader[1].
  • The W3C defines an acceptable alternative as a "document including correctly sequenced text descriptions of time-based visual and auditory information," with speaker identification and significant non-speech sounds such as applause, laughter, and audience questions where they carry meaning[1]. A partial summary or marketing blurb does not clear the bar.
  • Transcripts compound in value beyond the primary audience: they expose the content to full-text search, they index in search engines, and they serve users on metered connections or in quiet environments who cannot play audio. The criterion targets deaf and blind users; the downstream benefits reach everyone who would rather read than listen.
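
To make the 1.2.1 versus 1.2.2 split concrete, the following sketch places the two cases side by side. The file names webinar.mp4 and webinar.en.vtt and the transcript dialogue are invented for illustration; only the pattern is the point.

```html
<!-- Synchronized media (audio + video): captions via <track>, covered by 1.2.2 -->
<video controls src="webinar.mp4">
  <track kind="captions" src="webinar.en.vtt" srclang="en" label="English">
</video>

<!-- Audio-only media: a time-independent transcript, covered by 1.2.1 -->
<audio controls src="podcast-ep12.mp3"></audio>
<details>
  <summary>Read transcript of episode 12</summary>
  <p>[Host] Welcome to episode 12.</p>
  <p>[Audience applause]</p>
  <p>[Guest] Thanks for having me.</p>
  <p>[Audience member] Could you repeat the last point?</p>
</details>
```

The transcript labels each speaker and includes the applause and the audience question as bracketed lines in sequence, which is the kind of non-speech and secondary-speaker information the W3C definition asks for.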

Examples

Do: Provide a transcript for audio-only content

✔ Full transcript provided alongside the audio player

<audio controls src="podcast-ep12.mp3"></audio>

<details>
  <summary>Read transcript</summary>
  <p>[Host] Welcome to episode 12. Today we discuss...</p>
  <p>[Guest] Thanks for having me. The key issue is...</p>
</details>

Don't: Audio-only content with no transcript

✘ No transcript -- deaf and hard-of-hearing users cannot access this content

<!-- FAILS: no transcript provided -->
<audio controls src="podcast-ep12.mp3"></audio>

<!-- Users who cannot hear the audio have
     no way to access the content -->

Do: Provide a text alternative for video-only content

✔ Text description conveys the same visual information

<video controls src="assembly-demo.mp4"></video>

<div class="transcript">
  <h3>Text description</h3>
  <p>Step 1: Attach the base plate to the frame
    using the four corner bolts.</p>
  <p>Step 2: Slide the panel into the guide
    rails from left to right.</p>
</div>

Don't: Video-only content with no alternative

✘ No text alternative -- blind users cannot know what the video demonstrates

<!-- FAILS: no text alternative provided -->
<video controls src="assembly-demo.mp4"></video>

<!-- Blind users have no way to understand
     what is shown in the video -->

How to Fix It

  1. Inventory every prerecorded audio-only and video-only asset. Look for <audio> elements, <video> elements whose source files carry no audio track, embedded podcast players, silent GIF-to-MP4 animations, and any media that communicates information in a single sensory channel. Live media is out of scope for 1.2.1; live audio-only content is covered by 1.2.9 Audio-only (Live), a Level AAA criterion.
  2. For each audio-only file, produce a verbatim transcript. Include speaker labels ([Host], [Guest]), the spoken content in sequence, and non-speech sounds that carry meaning (applause, laughter, audience questions, a door slam that ends the scene)[1]. Background music that is purely atmospheric does not need to be transcribed; a lyric that is part of the content does.
  3. For each video-only file, write a text description of the visual track -- or produce a narrated audio track. The description has to cover on-screen text, demonstrated actions, spatial relationships, and anything else a sighted viewer would pick up. A reader who never watches the video should finish with the same understanding. A one-line summary ("product spins on turntable") is not enough when the video is teaching a procedure.
  4. Expose the alternative in the DOM next to the player. A collapsible <details>/<summary> pair immediately after the media element keeps the transcript discoverable without forcing scroll. A visible <section> with a heading works just as well. A link to a separate transcript page is acceptable if the link is adjacent to the player and its accessible name identifies the target media -- avoid "transcript" on its own when a page has multiple players.
  5. Do not conflate 1.2.1 with 1.2.2. A <track kind="captions"> child of a <video> element satisfies 1.2.2 for synchronized media, not 1.2.1 for audio-only or video-only media. If the asset has only one sensory channel, a time-independent transcript or description is the remediation; a WebVTT caption file on its own is not.
  6. Keep alternatives in sync with the media. When a podcast episode is re-cut or a demo video is reshot, the transcript or description has to be regenerated. A stale alternative no longer presents the same information as the media, so it fails the criterion and misleads the users who rely on it.
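
For the separate-page option in step 4, a minimal sketch of a page with two players follows. The headings, the podcast-ep13.mp3 file, and the /transcripts/ paths are hypothetical; what matters is that each link sits directly after its player and that its text names the episode it belongs to.

```html
<h2>Episode 12</h2>
<audio controls src="podcast-ep12.mp3"></audio>
<p><a href="/transcripts/episode-12.html">Transcript of episode 12</a></p>

<h2>Episode 13</h2>
<audio controls src="podcast-ep13.mp3"></audio>
<p><a href="/transcripts/episode-13.html">Transcript of episode 13</a></p>
```

A screen reader user browsing the page by its list of links hears two distinct names instead of "transcript" twice, so the target of each link is clear without the surrounding context.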

References

  [1] W3C (2023). Understanding Success Criterion 1.2.1: Audio-only and Video-only (Prerecorded). Accessed 2026-04-07. https://www.w3.org/WAI/WCAG22/Understanding/audio-only-and-video-only-prerecorded.html