Level AAA

Live audio-only streams need a synchronised text channel -- a transcript published after the broadcast does not satisfy 1.2.9

1.2.9 Audio-only (Live)

In Plain Language

1.2.9 Audio-only (Live) is a Level AAA criterion that applies to a narrow but awkward case: a live stream that carries audio without video. Live radio simulcasts, conference call audio bridges, audio-only press briefings, and audio podcast livestreams all fall here. The criterion requires a synchronised text alternative that conveys the spoken content -- and any non-spoken audio essential to understanding -- as the broadcast happens^[1].

1.2.9 sits between two adjacent criteria. 1.2.1 Audio-only and Video-only (Prerecorded) at Level A covers prerecorded audio and is satisfied by a transcript prepared after the fact^[2]. 1.2.4 Captions (Live) at Level AA covers live media that includes video, where the text rides on a caption track attached to the video stream^[3]. 1.2.9 is what is left over: live, no video, so there is no caption track to bind text to. The text has to be delivered through a separate channel -- a live transcript panel, a CART feed, or a script document -- and it has to arrive while the audio is still playing.
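
The boundary between these three criteria can be read as a small decision table. The sketch below is illustrative only: `applicableCriterion` and the `live`, `hasAudio`, and `hasVideo` flags are hypothetical names, not part of WCAG or any library, and the function covers only the three cases compared above.

```javascript
// Illustrative decision table for the three criteria compared above.
// The function name and the shape of `media` are assumptions for this sketch.
function applicableCriterion(media) {
  const { live, hasAudio, hasVideo } = media;

  // Live stream with audio and no video: the case 1.2.9 addresses.
  if (live && hasAudio && !hasVideo) {
    return { sc: "1.2.9", level: "AAA", textChannel: "separate real-time text feed" };
  }
  // Live media that includes video: text rides with the video as captions.
  if (live && hasAudio && hasVideo) {
    return { sc: "1.2.4", level: "AA", textChannel: "live captions" };
  }
  // Prerecorded audio-only: a transcript prepared after the fact is enough.
  // (1.2.1 also covers prerecorded video-only, which is omitted here.)
  if (!live && hasAudio && !hasVideo) {
    return { sc: "1.2.1", level: "A", textChannel: "transcript" };
  }
  // Anything else is outside the scope of this comparison.
  return null;
}
```

For a radio simulcast, `applicableCriterion({ live: true, hasAudio: true, hasVideo: false })` lands in the first branch, which is the "live, no video" remainder described above.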

Why It Matters

Deaf and hard-of-hearing users have no auditory channel into a live audio-only stream. Without a synchronised text alternative, the broadcast is inaccessible while it is happening -- a transcript posted hours later is a different artifact, not an accommodation for the live event^[1].
Audio-only broadcasts strip out the visual cues -- speaker identity, slides, lip movement -- that hard-of-hearing users normally lean on to disambiguate speech. The text alternative carries the entire load.
Users in sound-suppressed environments (open-plan offices, libraries, shared transit) cannot turn audio on. A live text channel lets them follow the broadcast on the same schedule as everyone else.
People following a broadcast in a second language often read more easily than they parse unfamiliar speech. A real-time text channel reduces the cognitive load of live audio for non-native speakers.
Public-sector live streams -- council meetings, agency briefings, emergency announcements -- carry statutory accessibility obligations in many jurisdictions, and "we will post the recording with captions tomorrow" does not discharge the duty to make the live event accessible.

Examples

Do: Provide real-time text alongside live audio

<audio controls>
  <source src='live-stream' type='audio/mpeg'>
</audio>
<div role='log' aria-live='polite' aria-label='Live transcript'>
  ...real-time text here...
</div>

✔ A live text feed updates in real time as the speaker talks

<audio controls>
  <source src="live-stream"
          type="audio/mpeg">
</audio>

<!-- Live transcript panel updates via
     real-time captioning service or
     speech-to-text API -->
<div role="log"
     aria-live="polite"
     aria-label="Live transcript">
  <p>[Speaker: Director Smith]
     Good morning. Today we are announcing...</p>
</div>
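
The comment in the markup above leaves open how new lines reach the panel. One possible shape is sketched below, assuming the captioning service hands the page one line at a time. `appendTranscriptLine` and the `{ speaker, text }` line shape are hypothetical, not part of any captioning API.

```javascript
// Hypothetical helper for the transcript panel shown above.
// `doc` is passed in explicitly so the function can be exercised outside a browser.
function appendTranscriptLine(doc, logEl, line) {
  const p = doc.createElement("p");
  // textContent, not innerHTML: caption text is never parsed as markup.
  p.textContent = line.speaker
    ? `[Speaker: ${line.speaker}] ${line.text}`
    : line.text;
  // New children of a role="log" region can be announced by assistive
  // technology without moving the user's focus.
  logEl.appendChild(p);
  return p;
}

// In a page (selector assumed to match the markup above):
//   const log = document.querySelector('[role="log"]');
//   appendTranscriptLine(document, log,
//     { speaker: "Director Smith", text: "Good morning." });
```

Appending a new paragraph per line, rather than rewriting the whole panel, keeps earlier text in place for users who read more slowly than the speaker talks.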

Don't: Offer only a post-event transcript

<audio controls>...</audio>

<p>Transcript available after event.</p>

✘ A transcript published after the broadcast does not help users who need access during the live event

<!-- FAILS 1.2.9: no real-time text
     alternative during the live broadcast -->
<audio controls>
  <source src="live-briefing"
          type="audio/mpeg">
</audio>
<p>A full transcript will be posted
   within 24 hours after the event.</p>

<!-- Users who are deaf or hard of hearing
     are excluded from the live event.
     A real-time text alternative is required
     while the audio is being broadcast. -->

Do: Use a prepared script delivered alongside the live audio

<audio controls>...</audio>
<a href='prepared-script.html'>Follow along with the script</a>

✔ When the live audio follows a script, providing that script in real time is a valid text alternative

<audio controls>
  <source src="live-address"
          type="audio/mpeg">
</audio>

<!-- When the speaker follows a prepared
     script, the script itself can serve as
     the real-time text alternative -->
<a href="prepared-script.html">
  Follow along with the prepared script
</a>

<!-- Note: if the speaker deviates from
     the script, a live captioning service
     should supplement the prepared text
     to capture unscripted remarks. -->

Don't: Provide only a summary or agenda instead of full text

<audio controls>...</audio>

<p>Agenda: 1) Budget 2) Hiring</p>

✘ A summary or agenda does not convey the actual spoken content -- users miss the details of what is being said

<!-- FAILS 1.2.9: a topic list is not
     a text alternative for the spoken
     content -->
<audio controls>
  <source src="live-meeting"
          type="audio/mpeg">
</audio>
<p>Today's agenda:</p>
<ol>
  <li>Budget review</li>
  <li>Hiring update</li>
  <li>Q&amp;A</li>
</ol>

<!-- An agenda tells users what topics
     will be discussed but not what is
     actually being said. A text alternative
     must convey the actual spoken content
     in real time. -->

How to Fix It

Inventory the audio-only live surfaces. Anything that streams live without a video track is in scope: audio-only conference bridges, radio-style livestreams, audio podcast livestreams, telephone press briefings rebroadcast on the web. Live media that includes video belongs to 1.2.4, not here^[3].
Engage a CART (Communication Access Realtime Translation) provider for unscripted content. A trained stenographer transcribes the spoken audio with a few seconds of latency and pushes the text to a feed your page can consume. The W3C Understanding document describes this kind of human-operated live captioning service for live audio-only content; human operators handle speaker changes, accents, and script deviations that automatic speech recognition (ASR) tends to mishandle^[1].
Treat ASR as a fallback, not a primary mechanism. Browser speech-recognition APIs and cloud transcription services produce a usable text stream for clean speech in a major language, but error rates climb fast on technical vocabulary, multiple speakers, and noisy inputs. Where ASR is the only option, surface a confidence indicator and route a human corrector into the loop for high-stakes broadcasts.
Use the prepared script when the speaker is reading from one. If the live audio follows a written statement -- a keynote, a press release read aloud, a scripted announcement -- the script itself is a valid synchronised text alternative under 1.2.9, provided it is published in time for users to read along. Pair the script with live captioning for any unscripted segments such as Q&A^[1].
Render the text in a live region next to the player. Mount the transcript panel adjacent to the <audio> element and mark it as role="log" with aria-live="polite" so screen readers announce new lines as they arrive without preempting the user. Avoid aria-live="assertive" -- it interrupts and quickly becomes hostile on a fast feed.
Validate end-to-end latency and fidelity. Measure the delay from spoken word to rendered text on the page, not just from the captioning service to your endpoint. The W3C Understanding document treats trained-operator transcripts as the fidelity bar; targets in the few-seconds range keep the text track usable for live participation^[1].
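
The confidence indicator and the latency check from the steps above could be combined into one per-segment assessment, sketched below. Everything named here is an assumption for illustration: the segment fields (`text`, `spokenAt`, `confidence`), the 5-second and 0.8 thresholds, and the function name do not come from WCAG or from any particular captioning service.

```javascript
// Sketch only: segment fields and thresholds are assumptions, not the
// format of any particular captioning or speech-to-text service.
const MAX_LATENCY_MS = 5000; // a "few-seconds range" target, chosen for illustration
const MIN_CONFIDENCE = 0.8;  // hypothetical cut-off for flagging ASR output

function assessSegment(segment, renderedAt) {
  // End-to-end latency: from the moment of speech to text on the page,
  // not just from the captioning service to the endpoint.
  const latencyMs = renderedAt - segment.spokenAt;
  // Human-produced (CART) segments may carry no confidence score at all,
  // so only numeric scores below the cut-off are flagged.
  const lowConfidence =
    typeof segment.confidence === "number" &&
    segment.confidence < MIN_CONFIDENCE;
  return {
    latencyMs,
    late: latencyMs > MAX_LATENCY_MS,
    lowConfidence,
  };
}
```

A page might call `assessSegment(segment, Date.now())` right after rendering each line, style low-confidence lines differently, and log late ones. This only works if `spokenAt` comes from a clock that is reasonably in sync with the viewer's, which is a further assumption to verify.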

References

[1] W3C (2023). Understanding Success Criterion 1.2.9: Audio-only (Live). W3C, Accessed 2026-04-07. https://www.w3.org/WAI/WCAG22/Understanding/audio-only-live.html
[2] W3C (2023). Understanding Success Criterion 1.2.1: Audio-only and Video-only (Prerecorded). W3C, Accessed 2026-04-07. https://www.w3.org/WAI/WCAG22/Understanding/audio-only-and-video-only-prerecorded.html
[3] W3C (2023). Understanding Success Criterion 1.2.4: Captions (Live). W3C, Accessed 2026-04-07. https://www.w3.org/WAI/WCAG22/Understanding/captions-live.html