AAA-level: a full text alternative is required in addition to captions and audio description
1.2.8 Media Alternative (Prerecorded)
In Plain Language
[1.2.8 Media Alternative (Prerecorded)](https://www.w3.org/WAI/WCAG22/Understanding/media-alternative-prerecorded.html) is a Level AAA criterion that applies to all prerecorded synchronised media (video with an audio track) and to all prerecorded video-only media. It requires a complete text alternative to the media -- a standalone document that conveys every piece of information a sighted, hearing viewer would receive: all dialogue, all non-speech audio, and all visual content including actions, scene changes, on-screen text, and interactive affordances[1].
The mechanism is distinct from the earlier video criteria. [1.2.3 Audio Description or Media Alternative](https://www.w3.org/WAI/WCAG22/Understanding/audio-description-or-media-alternative-prerecorded.html) at Level A lets an author pick one of two paths -- audio description *or* a text alternative[2]. [1.2.5 Audio Description](https://www.w3.org/WAI/WCAG22/Understanding/audio-description-prerecorded.html) at Level AA removes that choice and mandates audio description. At AAA, 1.2.8 layers on top: when 1.2.3 and 1.2.5 have been satisfied with audio description, a full text alternative is still required in addition[1]. It is not a substitute for captions or audio description -- it sits alongside them.
"Complete text alternative" is a specific structural claim, not a rebranded transcript. A dialogue-only transcript fails 1.2.8 because it drops the visual channel. The document has to read like a screenplay or scene-by-scene narration: time-stamped or scene-segmented headings, explicit speaker labels, bracketed descriptions of visual content (actions, settings, charts, on-screen text), bracketed non-speech audio cues (phone ringing, laughter, music stings), and embedded screenshots with descriptive `alt` text only where a visual cannot be conveyed in words alone.
Why It Matters
- Deaf-blind users and users with severe combined vision and hearing loss cannot consume captions (visual) or audio description (auditory). A text alternative rendered through a refreshable braille display is the only channel that delivers both the dialogue and the visual information of a video to these users[1].
- Readers who process written prose more reliably than synchronised audio-video -- including many users with cognitive and learning disabilities -- can pace, re-read, and search a text document in ways a timeline-based player does not allow.
- A structured text alternative is the artefact in the media bundle best suited to search indexing, machine translation, and line-level citation. Captions in WebVTT[3] are time-coded fragments, not a readable document; audio description is an audio track.
- Procurement contexts that require a records-management copy of every published video (archival retention, FOIA response, regulated-industry review) are satisfied by the same document that satisfies 1.2.8, because a full media alternative is by construction a complete textual record of the media.
Examples
```html
<video controls>
  <source src='training.mp4'
          type='video/mp4'>
</video>
<a href='training-full-text.html'>
  Full text alternative
</a>
```
✔ A complete text document covers all dialogue, sounds, and visual information
```html
<video controls>
  <source src="training.mp4"
          type="video/mp4">
  <track kind="captions" src="captions.vtt"
         srclang="en" label="English" default>
</video>
<!-- Full text alternative linked directly
     below the video player -->
<a href="training-full-text.html">
  Full text alternative for training video
</a>
<!-- The text alternative document includes:
     - All spoken dialogue with speaker IDs
     - Description of all visual content
     - Sound effects and music cues
     - On-screen text and graphics
     - Scene and setting descriptions -->
```
```html
<video controls>...</video>
<a href='transcript.txt'>Transcript</a>
```
✘ A dialogue-only transcript omits visual information -- the text alternative must describe everything a viewer would see and hear
```html
<!-- FAILS 1.2.8: transcript only includes
     spoken dialogue, missing all visual
     information -->
<video controls>
  <source src="demo.mp4" type="video/mp4">
</video>
<a href="transcript.txt">Read transcript</a>
<!-- This transcript only contains:
     "Welcome to the demo. Click the blue
     button to continue."
     It omits: the instructor pointing to a
     screen, the diagram shown at 0:45, and
     the on-screen text that appears at 1:12.
     A full text alternative must describe
     ALL visual and auditory content. -->
```
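The dialogue-only failure above can be caught early with a rough lint. The following is a minimal sketch, assuming the text alternative is plain text that marks visual and audio content with bracketed labels such as `[Scene: ...]`, `[Sound: ...]`, `[On-screen text: ...]`, and `[Visual: ...]`. The function names `cue_counts` and `looks_dialogue_only` and the label set are hypothetical, chosen for illustration only.

```python
import re

# Bracketed cue labels assumed for this sketch; adjust to your own house style.
CUE = re.compile(r"^\[(Scene|Sound|On-screen text|Visual):\s*.+\]$", re.IGNORECASE)
VISUAL_LABELS = {"scene", "on-screen text", "visual"}


def cue_counts(text: str) -> dict:
    """Count visual cues, sound cues, and all other non-empty lines."""
    counts = {"visual": 0, "sound": 0, "other": 0}
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        match = CUE.match(line)
        if match is None:
            counts["other"] += 1  # dialogue or any unlabelled line
        elif match.group(1).lower() in VISUAL_LABELS:
            counts["visual"] += 1
        else:
            counts["sound"] += 1
    return counts


def looks_dialogue_only(text: str) -> bool:
    """True when no bracketed visual description appears anywhere in the text."""
    return cue_counts(text)["visual"] == 0
```

A document that passes this check is not necessarily complete. The lint only flags the obvious case in which the visual channel is missing entirely, so a human read-through is still needed.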
Full text alternative excerpt:
```text
[Scene: office setting, morning]
[Sound: phone ringing]
Narrator: "Every day, our support team..."
[On-screen text: 94% satisfaction rate]
[Visual: bar chart comparing 2024-2025 scores]
```
✔ All visual and auditory information is captured in the text document
```html
<!-- Structure of a complete text alternative -->
<article class="media-alternative">
  <h1>Full Text Alternative: Customer Support
    Training Video</h1>
  <section>
    <h2>Scene 1: Introduction (0:00 - 0:45)</h2>
    <p><strong>[Scene: A bright office with rows of
      desks. Morning light through windows.]</strong></p>
    <p><strong>[Sound: Phone ringing, keyboard
      typing]</strong></p>
    <p><strong>Narrator:</strong> "Every day, our
      customer support team handles thousands of
      calls."</p>
    <p><strong>[On-screen text: 94% customer
      satisfaction rate]</strong></p>
    <p><strong>[Visual: Animated bar chart comparing
      2024 and 2025 satisfaction scores, showing an
      increase from 89% to 94%]</strong></p>
  </section>
</article>
```
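If the scenes are kept as structured data, markup of this shape can be generated rather than hand-written, which helps keep headings and bracketed labels consistent. The following is a minimal sketch under that assumption. The `Scene` model, the `BRACKETED` label set, and `render_scene` are hypothetical names for illustration and are not part of WCAG or any library.

```python
from dataclasses import dataclass, field
from html import escape
from typing import List, Tuple

# Labels rendered as bracketed descriptions; any other label is treated as a speaker.
BRACKETED = {"Scene", "Sound", "On-screen text", "Visual"}


@dataclass
class Scene:
    """One scene of the text alternative (hypothetical model)."""
    title: str
    start: str  # e.g. "0:00"
    end: str    # e.g. "0:45"
    entries: List[Tuple[str, str]] = field(default_factory=list)  # (label, text) in reading order


def render_scene(number: int, scene: Scene) -> str:
    """Render one scene as a <section> with a time-ranged <h2> heading."""
    heading = f"Scene {number}: {scene.title} ({scene.start} - {scene.end})"
    parts = ["<section>", f"  <h2>{escape(heading)}</h2>"]
    for label, text in scene.entries:
        if label in BRACKETED:
            parts.append(f"  <p><strong>[{escape(label)}: {escape(text)}]</strong></p>")
        else:
            # Speaker-labelled dialogue, wrapped in quotation marks.
            parts.append(f'  <p><strong>{escape(label)}:</strong> "{escape(text)}"</p>')
    parts.append("</section>")
    return "\n".join(parts)
```

Entries are rendered in list order, so the author remains responsible for sequencing dialogue, sounds, and visual descriptions as they occur in the video.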
```html
<video controls>
  <track kind='captions' ...>
</video>
```
✘ Captions cover only the audio track -- they do not describe visual-only content like charts, actions, or scene details that a full text alternative must include
```html
<!-- FAILS 1.2.8: captions only convey the
     audio track, not visual information -->
<video controls>
  <source src="overview.mp4" type="video/mp4">
  <track kind="captions" src="captions.vtt"
         srclang="en" label="English" default>
</video>
<!-- Captions transcribe dialogue and sound
     effects but do not describe:
     - Visual demonstrations and actions
     - Charts, graphs, and diagrams
     - On-screen text and labels
     - Scene changes and settings
     A separate full text alternative document
     is required to meet 1.2.8. -->
```
How to Fix It
- Inventory every prerecorded media asset in scope. Enumerate each `<video>` or embedded player whose source is prerecorded synchronised media (audio plus video) or prerecorded video-only media; 1.2.8 covers both[1]. Prerecorded audio-only assets are handled by [1.2.1](https://www.w3.org/WAI/WCAG22/Understanding/audio-only-and-video-only-prerecorded.html)[4] and are out of scope for 1.2.8.
- Author a full text alternative document, not a transcript. The document must contain, in reading order: speaker-labelled dialogue, bracketed visual descriptions (setting, action, on-screen text, charts, diagrams), bracketed non-speech audio cues, and a representation of any interactive affordance the video shows (buttons, hyperlinks). A dialogue-only transcript fails because it drops the visual channel; [1.2.2 Captions](https://www.w3.org/WAI/WCAG22/Understanding/captions-prerecorded.html)[5] cover the audio track but do not substitute for the visual content 1.2.8 requires.
- Structure the document by scene or timestamp. Use `<h2>` per scene with an explicit time range ("Scene 1: Introduction (0:00 - 0:45)"). Scene segmentation gives screen-reader users a navigable landmark structure and lets a reader cross-reference back to the video timeline.
- Anchor the link directly to the player. Place an `<a>` with text like "Full text alternative for this video" immediately adjacent to the `<video>` element so the relationship is obvious in both the visual layout and the DOM reading order. A link buried in a separate "Resources" block fails discoverability.
- Bind the text alternative to the video's version. When the source video is re-cut, the text alternative is re-authored in the same change. A stale alternative is worse than none because it misrepresents the current media. Treat the text file as a build artefact of the video, not a separate document.
- Validate by reading the document without the video. The acceptance test is whether a reader who has never seen the video can reconstruct the full experience -- dialogue, who is speaking, what is happening on screen, what data is shown -- from the text alone. If they cannot, the alternative is incomplete regardless of word count.
- Remember 1.2.8 is additive, not alternative. Conforming at AAA means the text alternative is provided in addition to the Level A captions ([1.2.2](https://www.w3.org/WAI/WCAG22/Understanding/captions-prerecorded.html))[5] and the Level AA audio description ([1.2.5](https://www.w3.org/WAI/WCAG22/Understanding/audio-description-prerecorded.html)), even when audio description was already the path chosen to satisfy [1.2.3](https://www.w3.org/WAI/WCAG22/Understanding/audio-description-or-media-alternative-prerecorded.html)[2][1]. Shipping the text alternative does not let you delete the caption track.
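The inventory and link-placement steps above can be partly automated. The following is a minimal sketch using Python's standard `html.parser`, assuming that "adjacent" means the first element after `</video>` is an `<a>` whose text contains the phrase "text alternative". The class `VideoLinkAudit`, the function `videos_missing_text_alternative`, and that phrase are assumptions for illustration; embedded `<iframe>` players are not covered.

```python
from html.parser import HTMLParser


class VideoLinkAudit(HTMLParser):
    """Records, for each <video>, the text of a link that immediately follows it."""

    def __init__(self):
        super().__init__()
        self.results = []       # one dict per <video>: {"src", "next_link_text"}
        self._in_video = False
        self._awaiting = None   # result dict still waiting for its adjacent link
        self._link_text = None  # text chunks of the <a> currently being read

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "video":
            self._in_video = True
            self._link_text = None
            self._awaiting = {"src": attrs.get("src"), "next_link_text": None}
            self.results.append(self._awaiting)
        elif tag == "source" and self._in_video:
            if self._awaiting["src"] is None:
                self._awaiting["src"] = attrs.get("src")
        elif self._in_video or self._awaiting is None or self._link_text is not None:
            return  # inside <video>, nothing pending, or already inside the link
        elif tag == "a":
            self._link_text = []
        else:
            self._awaiting = None  # another element comes first: link is not adjacent

    def handle_data(self, data):
        if self._link_text is not None:
            self._link_text.append(data)

    def handle_endtag(self, tag):
        if tag == "video":
            self._in_video = False
        elif tag == "a" and self._link_text is not None:
            text = " ".join("".join(self._link_text).split())
            self._awaiting["next_link_text"] = text
            self._awaiting = None
            self._link_text = None


def videos_missing_text_alternative(html, phrase="text alternative"):
    """Return the src of each <video> not directly followed by a matching link."""
    audit = VideoLinkAudit()
    audit.feed(html)
    audit.close()
    return [
        r["src"] for r in audit.results
        if phrase not in (r["next_link_text"] or "").lower()
    ]
```

This only confirms that a suitably worded link sits next to each player. Whether the linked document is complete and current still has to be established by reading it without the video.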
References
- [1] W3C (2023). Understanding Success Criterion 1.2.8: Media Alternative (Prerecorded). W3C, Accessed 2026-04-07. https://www.w3.org/WAI/WCAG22/Understanding/media-alternative-prerecorded.html
- [2] W3C (2023). Understanding Success Criterion 1.2.3: Audio Description or Media Alternative (Prerecorded). W3C, Accessed 2026-04-07. https://www.w3.org/WAI/WCAG22/Understanding/audio-description-or-media-alternative-prerecorded.html
- [3] W3C (2019). WebVTT: The Web Video Text Tracks Format. W3C, Accessed 2026-04-07. https://www.w3.org/TR/webvtt1/
- [4] W3C (2023). Understanding Success Criterion 1.2.1: Audio-only and Video-only (Prerecorded). W3C, Accessed 2026-04-07. https://www.w3.org/WAI/WCAG22/Understanding/audio-only-and-video-only-prerecorded.html
- [5] W3C (2023). Understanding Success Criterion 1.2.2: Captions (Prerecorded). W3C, Accessed 2026-04-07. https://www.w3.org/WAI/WCAG22/Understanding/captions-prerecorded.html