Level A

Over 99% of video content on the web lacks audio descriptions

1.2.3 Audio Description or Media Alternative (Prerecorded)

In Plain Language

1.2.3 Audio Description or Media Alternative (Prerecorded) is a Level A criterion that applies to prerecorded synchronized media (video with a soundtrack). It gives authors a choice: either provide an audio description track that narrates the important visual information during natural pauses in the dialogue, or provide a full text alternative -- a time-sequenced document that conveys both the audio and the visual content[1].

The two options are not interchangeable in effect, only in conformance. An audio description track sits alongside the video and plays back in sync. A media alternative is a standalone document, structured like a screenplay, that a user can read instead of watching. Either path must reproduce the same information a sighted viewer receives from the visual channel -- not a summary, not a highlights list[1].

This criterion is frequently confused with 1.2.5 Audio Description (Prerecorded) at Level AA, which removes the choice and requires an audio description track regardless of whether a text alternative exists. A team that ships only a text media alternative conforms to 1.2.3 but fails 1.2.5[2].
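
The relationship between the two criteria can be written down as a small decision function. The sketch below is illustrative only: the `VideoAccessibilityAssets` shape and the `checkDescriptionConformance` name are hypothetical, and the `soundtrackConveysVisuals` flag stands for the case, noted under Why It Matters, where the existing soundtrack already conveys all the important visual information. That flag is a human judgment, not something the code can determine.

```typescript
// Hypothetical record of what ships with one prerecorded video.
interface VideoAccessibilityAssets {
  // True when the existing soundtrack already conveys all important visual information.
  soundtrackConveysVisuals: boolean;
  hasAudioDescription: boolean;
  hasMediaAlternative: boolean;
}

interface ConformanceResult {
  sc123: boolean; // 1.2.3, Level A
  sc125: boolean; // 1.2.5, Level AA
}

function checkDescriptionConformance(assets: VideoAccessibilityAssets): ConformanceResult {
  // Either the audio is already complete, or a description track fills the gaps.
  const described = assets.soundtrackConveysVisuals || assets.hasAudioDescription;
  return {
    // 1.2.3 also accepts a media alternative in place of description.
    sc123: described || assets.hasMediaAlternative,
    // 1.2.5 does not: the text alternative is no substitute here.
    sc125: described,
  };
}

const textOnly = checkDescriptionConformance({
  soundtrackConveysVisuals: false,
  hasAudioDescription: false,
  hasMediaAlternative: true,
});
console.log(textOnly); // { sc123: true, sc125: false }
```

The text-only case at the bottom is the one described above: it passes 1.2.3 and fails 1.2.5.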

Why It Matters

  • Visual-only information -- on-screen text, charts, gestures, UI interactions, scene changes, and silent demonstrations -- is unavailable to blind and low-vision users unless it is either narrated in the soundtrack or captured in a parallel text document. A narrator saying "as you can see here" without describing what is on screen leaves a gap that assistive technology cannot fill in.
  • The two conformance paths have very different production economics. An audio description track requires a voice artist, a script timed to existing pauses, and a second audio mix; a media alternative is a text document and can be drafted from the shooting script. For talking-head and panel content where the audio is already self-contained, neither may be needed. For dense visual content (screencasts, product demos, data visualizations) where natural pauses are short, a media alternative is usually cheaper and more complete than squeezing description into gaps.
  • A media alternative is machine-readable. When it is marked up with headings and lists, screen readers navigate it with standard commands; users can search it, translate it, and reformat it; and search engines can index it. A recorded audio description track offers none of these.
  • If the existing soundtrack already conveys all the important visual information -- a lecture where the speaker reads every slide aloud, for example -- no additional audio description or media alternative is required to conform to 1.2.3[1]. The test is whether a listener who cannot see the screen receives the same information, not whether description is technically present.
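
The "as you can see here" gap from the first point can be screened for mechanically before a human review. The sketch below is a rough first pass under stated assumptions: the phrase list in `DEICTIC_PATTERNS` is illustrative rather than any standard set, and `TranscriptLine` and `findVisualReferences` are hypothetical names. A match only marks a line worth checking; a line with no match can still hide undescribed visual content.

```typescript
// Phrases that point at the screen without naming what is there.
// Illustrative assumption: extend or replace this list for your own content.
const DEICTIC_PATTERNS: RegExp[] = [
  /\bas you can see\b/i,
  /\bas shown\b/i,
  /\bnotice how\b/i,
  /\bover here\b/i,
  /\b(click|select|look at) (this|that|here)\b/i,
];

interface TranscriptLine {
  startSeconds: number;
  text: string;
}

// Returns the transcript lines that contain at least one pointing phrase.
function findVisualReferences(lines: TranscriptLine[]): TranscriptLine[] {
  return lines.filter((line) =>
    DEICTIC_PATTERNS.some((pattern) => pattern.test(line.text))
  );
}

const flagged = findVisualReferences([
  { startSeconds: 3, text: "As you can see here, the totals update." },
  { startSeconds: 9, text: "Revenue grew 18% this quarter." },
  { startSeconds: 14, text: "Notice how this changes when I click." },
]);
console.log(flagged.map((line) => line.startSeconds)); // [ 3, 14 ]
```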

Examples

Do: Provide an audio description track for visual content

✔ Audio description track describes visual actions and on-screen text

<video controls>
  <source src="training.mp4" type="video/mp4">
  <track kind="captions" src="training-en.vtt"
         srclang="en" label="English" default>
  <track kind="descriptions" src="training-ad.vtt"
         srclang="en" label="Audio Descriptions">
</video>

<!-- training-ad.vtt -->
WEBVTT

00:00:05.000 --> 00:00:08.000
The instructor points to a flowchart showing
three steps: Input, Process, and Output.

00:00:15.000 --> 00:00:18.000
A bar chart appears comparing results from
2023 and 2024, with 2024 showing 40% growth.

Don't: Show visual-only content with no description

✘ Visual demonstrations referenced in speech but never described -- blind users cannot follow along

<!-- FAILS: visual content not described -->
<video controls src="demo.mp4"></video>

<!-- Narrator says "As you can see here..."
     and "Notice how this changes when I click..."
     but no audio description or text alternative
     explains what is actually being shown -->

Do: Provide a full text alternative describing both audio and visual content

✔ A complete text alternative describes all visual and audio content

<video controls>
  <source src="quarterly-review.mp4" type="video/mp4">
  <track kind="captions" src="review-en.vtt"
         srclang="en" label="English" default>
</video>

<details>
  <summary>Full text alternative for this video</summary>
  <p>[00:00] Title card reads "Quarterly Review Q4 2024."</p>
  <p>[00:05] Presenter stands at podium. Slide behind her
  shows a pie chart: 45% North America, 30% Europe,
  25% Asia Pacific.</p>
  <p>[00:12] Presenter says: "Revenue grew 18% this quarter."
  A bar chart animates showing Q3 at $2.1M and Q4 at $2.5M.</p>
</details>

Don't: Provide a transcript that only covers the audio

✘ Transcript only includes speech -- on-screen actions, diagrams, and visual steps are not described

<!-- FAILS: transcript covers only the audio -->
<video controls src="tutorial.mp4"></video>

<p>Transcript: "Click the button on the right side
to open the settings panel. Then select the option
shown at the top."</p>

<!-- This transcript only includes what was spoken.
     The visual steps, screen layouts, and UI elements
     referenced are never described. A blind user still
     cannot follow the tutorial. -->
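
The difference between an audio-only transcript and a media alternative can be partly checked if the document is kept as structured entries before it is rendered to HTML. The sketch below assumes a hypothetical `AlternativeEntry` shape and `auditMediaAlternative` function; neither comes from WCAG. It only catches two obvious problems (no visual entries at all, and entries out of chronological order). An empty result does not prove the alternative is equivalent to the video.

```typescript
type EntryKind = "speech" | "visual" | "on-screen-text";

// Hypothetical structure: one timestamped entry of the text alternative.
interface AlternativeEntry {
  startSeconds: number;
  kind: EntryKind;
  text: string;
}

// Returns a list of human-readable problems; an empty list means none were detected.
function auditMediaAlternative(entries: AlternativeEntry[]): string[] {
  const problems: string[] = [];

  // A document with only speech entries is a transcript of the audio, nothing more.
  if (!entries.some((entry) => entry.kind !== "speech")) {
    problems.push("Contains no visual or on-screen-text entries.");
  }

  // Entries should be time-sequenced.
  let previousStart = -Infinity;
  entries.forEach((entry, index) => {
    if (entry.startSeconds < previousStart) {
      problems.push(`Entry ${index} is out of chronological order.`);
    }
    previousStart = entry.startSeconds;
  });

  return problems;
}

const audioOnly = auditMediaAlternative([
  { startSeconds: 0, kind: "speech", text: "Click the button on the right side to open the settings panel." },
]);
console.log(audioOnly.length); // 1

const fullAlternative = auditMediaAlternative([
  { startSeconds: 0, kind: "on-screen-text", text: 'Title card reads "Quarterly Review Q4 2024."' },
  { startSeconds: 5, kind: "visual", text: "Pie chart: 45% North America, 30% Europe, 25% Asia Pacific." },
  { startSeconds: 12, kind: "speech", text: "Revenue grew 18% this quarter." },
]);
console.log(fullAlternative.length); // 0
```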

How to Fix It

  1. Audit the soundtrack against the visual channel. Watch each prerecorded video with the picture off. Note every point where the audio refers to something ("here", "this", "as shown") without naming it, every on-screen text frame that is not read aloud, every chart or diagram that is presented silently, and every visual action (a hand gesture, a UI click, a scene change) that carries meaning. That list is the set of gaps the criterion requires you to fill.
  2. Pick a conformance path per video, not per library. For content where the audio is already a complete narration (monologues, interviews, conference talks where the speaker reads their slides), no remediation is required. For dense visual content with tight pauses, choose a media alternative -- writing prose is faster and more accurate than cramming description into a two-second gap. For narrative and instructional video with natural breathing room, an audio description track is the lower-friction path because users do not have to leave the player.
  3. For the audio description path, attach a <track kind="descriptions"> WebVTT file to the <video> element, and confirm that your player actually voices or exposes the cues, because native browser support for this track kind is limited. Cue timestamps should sit inside existing pauses in the main audio -- if the description cannot fit, 1.2.3 allows you to fall back to the media alternative path instead. Prefer a separate, switchable track over mixing the description into the primary soundtrack, so that users who do not need it can turn it off.
  4. For the media alternative path, publish a text document that is a true equivalent, not a summary. Structure it chronologically with timestamps, interleave speech and visual description, transcribe on-screen text verbatim, and describe charts by their actual data values ("bar chart: Q3 $2.1M, Q4 $2.5M"), not by their appearance ("a bar chart is shown"). A reader who cannot access the video must be able to reconstruct what happened. Link the document directly from the video -- a sibling element, a <details> disclosure, or a clearly labeled link -- so a screen-reader user finds it without scanning the page.
  5. Do not confuse a captions file with a media alternative. Captions satisfy 1.2.2 Captions (Prerecorded); they transcribe audio for users who cannot hear it. A caption track that says "Click the button on the right" without describing which button or where it is does not satisfy 1.2.3 for a blind user. The two criteria cover orthogonal channels and require separate artifacts.
  6. Plan for 1.2.5 if the site is targeting Level AA. Level AA removes the choice: an audio description track is required even when a media alternative is already published. If you are writing policy for a procurement process, treat the media alternative as a belt-and-braces addition rather than a substitute for the description track[2].
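
The pause check in step 3 can be sketched as an interval-overlap test. This is a minimal illustration under assumptions: `Interval` and `cuesOverlappingSpeech` are hypothetical names, and the speech intervals are assumed to come from somewhere you trust, such as the caption track's cue timings, which only approximate when people are actually talking and ignore meaningful non-speech sound.

```typescript
// A time span in seconds.
interface Interval {
  start: number;
  end: number;
}

// Returns the description cues that collide with any speech interval.
// Two spans overlap when each one starts before the other ends.
function cuesOverlappingSpeech(descriptionCues: Interval[], speech: Interval[]): Interval[] {
  return descriptionCues.filter((cue) =>
    speech.some((spoken) => cue.start < spoken.end && spoken.start < cue.end)
  );
}

// Hypothetical speech timings; the first two cues sit in the gaps between them.
const speechIntervals: Interval[] = [
  { start: 0, end: 5 },
  { start: 8, end: 15 },
];
const descriptionCues: Interval[] = [
  { start: 5, end: 8 },
  { start: 15, end: 18 },
  { start: 12, end: 14 }, // placed over dialogue on purpose
];

console.log(cuesOverlappingSpeech(descriptionCues, speechIntervals));
// [ { start: 12, end: 14 } ]
```

Cues that come back from this check are candidates for shortening, moving, or, where no pause is long enough, for switching that video to the media alternative path.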

References

  [1] W3C (2023). Understanding Success Criterion 1.2.3: Audio Description or Media Alternative (Prerecorded). W3C. Accessed 2026-04-07. https://www.w3.org/WAI/WCAG22/Understanding/audio-description-or-media-alternative-prerecorded.html
  [2] W3C (2023). Web Content Accessibility Guidelines (WCAG) 2.2. W3C. Accessed 2026-04-07. https://www.w3.org/TR/WCAG22/