Level AAA

1.2.6 Sign Language (Prerecorded)

In Plain Language

1.2.6 Sign Language (Prerecorded) requires a sign language interpretation of all prerecorded audio in synchronized media[1]. It sits at Level AAA in WCAG 2.2[2] and is distinct from the captions requirement at 1.2.2 (Level A)[3]: captions are a text track; sign language is a video of a human signer translating the spoken content.

The mechanism the criterion addresses is linguistic, not sensory. Signed languages (ASL, BSL, Auslan, LSF, and others) are full natural languages with their own grammar, syntax, and lexicon -- they are not word-for-word encodings of the surrounding written language. For many Deaf users, a signed language is a first language and written English (or French, or Japanese) is a second language acquired through schooling. Captions deliver the audio as text in that second language, so comprehension depends on the user's reading fluency in it. A signed interpretation delivers the same content in the user's first language, and also carries prosody, emphasis, and affect through facial grammar and body movement that a caption track cannot encode[1].

Why It Matters

  • Signed languages are distinct first languages with their own grammar -- they are not transliterated English. The W3C Understanding document states plainly that people whose primary language is a signed language "sometimes have limited reading ability" and therefore need an interpretation to access synchronized media on equal terms with hearing users[1].
  • Captions satisfy 1.2.2 at Level A but do not satisfy 1.2.6. The two criteria serve overlapping but non-identical audiences: late-deafened users and hard-of-hearing users often prefer captions; Deaf users whose first language is signed often prefer an interpreter. Both tracks should ship together.
  • Sign language carries information captions cannot. Facial expression, eye gaze, mouth morphemes, and the speed and size of a sign are grammatical features -- they encode tense, negation, question form, and emphasis. Flattening the audio to text removes that channel entirely.
  • The criterion applies to prerecorded synchronized media only. Live media is out of scope: WCAG 2.2 has no success criterion for live sign language interpretation, and the live-media criteria are 1.2.4 (Captions, Live, Level AA) and 1.2.9 (Audio-only, Live, Level AAA)[2].

Examples

Do: Embed a sign language interpreter in the video

✔ Video includes a picture-in-picture sign language interpreter

<!-- Video includes a sign language interpreter
     in a picture-in-picture overlay -->
<video controls>
  <source src="presentation-with-sli.mp4"
          type="video/mp4">

  <!-- Captions for users who prefer text -->
  <track kind="captions"
         src="captions.vtt"
         srclang="en"
         label="English" default>
</video>

<!-- The sign language interpreter appears in
     a picture-in-picture window within the video,
     visible throughout the presentation. The
     interpreter is large enough to see clearly
     and positioned so they do not obscure
     critical visual content. -->

Don't: Rely on captions alone as a substitute for sign language

✘ Captions meet 1.2.2 but do not satisfy 1.2.6 -- sign language interpretation is required for AAA compliance

<!-- FAILS 1.2.6: captions alone do not meet
     the sign language requirement -->
<video controls>
  <source src="announcement.mp4" type="video/mp4">
  <track kind="captions" src="captions.vtt"
         srclang="en" label="English" default>
</video>

<!-- Captions satisfy 1.2.2 (Level A) but do not
     satisfy 1.2.6 (Level AAA). Many Deaf users
     whose primary language is a signed language
     find captions harder to follow than a signed
     interpretation. Sign language interpretation
     must be provided in addition to captions. -->

Do: Provide a synchronized sign language video alongside the main content

✔ A separate synchronized sign language video plays alongside the main content

<div class="media-pair">
  <!-- Main video content -->
  <video id="main-video" controls>
    <source src="lecture.mp4" type="video/mp4">
    <track kind="captions" src="captions.vtt"
           srclang="en" label="English" default>
  </video>

  <!-- Synchronized sign language interpretation -->
  <video id="sli-video"
         aria-label="Sign language interpretation">
    <source src="lecture-sli.mp4" type="video/mp4">
  </video>
</div>

<!-- JavaScript synchronizes playback so the
     interpreter video stays in sync with the
     main video. The sign language video should
     be large enough for users to clearly see
     hand shapes and facial expressions. -->
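
The synchronization mentioned in the comment above could be sketched roughly as follows. This is a minimal sketch under stated assumptions, not a definitive implementation: the function name `syncSignerVideo` and the 0.25-second drift threshold are hypothetical choices made for illustration. It relies only on the standard media element events `play`, `pause`, `seeked`, `timeupdate`, and `ratechange`.

```javascript
// Mirrors playback of the main video onto a separate signer video.
// Both arguments are expected to behave like HTMLMediaElement objects.
// The function name and default drift threshold are illustrative assumptions.
function syncSignerVideo(main, signer, maxDriftSeconds = 0.25) {
  // The signer track carries no audio the user needs, so keep it muted.
  signer.muted = true;

  // Re-align only when the two clocks have drifted past the threshold,
  // which avoids a visible stutter on every timeupdate event.
  const alignIfDrifted = () => {
    if (Math.abs(signer.currentTime - main.currentTime) > maxDriftSeconds) {
      signer.currentTime = main.currentTime;
    }
  };

  // After a user seek, always jump the signer to the new position.
  const alignNow = () => {
    signer.currentTime = main.currentTime;
  };

  const onPlay = () => {
    alignIfDrifted();
    // play() returns a promise in browsers; swallow a rejected start here.
    const started = signer.play();
    if (started && typeof started.catch === "function") {
      started.catch(() => {});
    }
  };
  const onPause = () => signer.pause();
  const onRateChange = () => {
    signer.playbackRate = main.playbackRate;
  };

  const handlers = [
    ["play", onPlay],
    ["pause", onPause],
    ["seeked", alignNow],
    ["timeupdate", alignIfDrifted],
    ["ratechange", onRateChange],
  ];
  for (const [type, handler] of handlers) {
    main.addEventListener(type, handler);
  }

  // Return a cleanup function that detaches every listener.
  return () => {
    for (const [type, handler] of handlers) {
      main.removeEventListener(type, handler);
    }
  };
}
```

On the page above it might be called as `syncSignerVideo(document.getElementById("main-video"), document.getElementById("sli-video"))`. Driving the signer from the main video's events leaves the user with a single set of controls, which is why the signer video in the example carries no `controls` attribute.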

Don't: Make the sign language interpreter too small or poorly lit

Interpreter window: 50 x 50 pixels, low contrast, partially cropped

Users cannot distinguish hand shapes or read facial expressions at that size.

✘ Sign language interpretation must be clearly visible -- large enough to read hand shapes and facial expressions

<!-- FAILS: interpreter is too small to be useful -->
<div style="position: relative">
  <video controls>
    <source src="webinar.mp4" type="video/mp4">
  </video>

  <!-- Interpreter overlay at 50x50 pixels -->
  <div style="position: absolute;
              bottom: 5px; right: 5px;
              width: 50px; height: 50px">
    <video src="sli.mp4" autoplay muted></video>
  </div>
</div>

<!-- The interpreter window must be large enough
     for users to clearly see hand shapes, finger
     spelling, and facial expressions. A tiny
     overlay fails to provide meaningful access.
     Working floor used here (a rule of thumb,
     not a figure set by WCAG): at least 1/6 of
     the total video area, with good lighting
     and a plain background. -->
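
The size guidance above can be turned into a quick arithmetic check. The sketch below assumes the one-sixth figure from this page as a working floor; it is not a threshold defined by WCAG, and the function names and the `{ width, height }` object shape are hypothetical, introduced only for illustration.

```javascript
// Fraction of the player area occupied by the signer window.
// Both arguments are plain { width, height } objects in CSS pixels.
function signerAreaRatio(player, signer) {
  const playerArea = player.width * player.height;
  const signerArea = signer.width * signer.height;
  if (!(playerArea > 0) || !(signerArea >= 0)) {
    throw new RangeError("Dimensions must be positive numbers");
  }
  return signerArea / playerArea;
}

// True when the signer window reaches the working floor.
// The default of 1/6 is this page's rule of thumb, not a WCAG value.
function meetsSizeFloor(player, signer, floor = 1 / 6) {
  return signerAreaRatio(player, signer) >= floor;
}

const player = { width: 1280, height: 720 };

// The failing example: a 50 x 50 overlay covers well under 1% of the frame.
console.log(meetsSizeFloor(player, { width: 50, height: 50 })); // false

// A 480 x 360 inset covers 18.75% of the frame, above the one-sixth floor.
console.log(meetsSizeFloor(player, { width: 480, height: 360 })); // true
```

A ratio check only covers area. Lighting, background contrast, and cropping still need a visual review, ideally on the smallest viewport the player targets.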

How to Fix It

  1. Inventory prerecorded synchronized media. Any asset that pairs a video track with an audio track carrying speech, narration, or dialogue falls in scope for 1.2.6. Silent video and audio-only recordings are out of scope and fall under 1.2.1[3]. Live streams are also out of scope: live captions are covered by 1.2.4 and live audio-only content by 1.2.9.
  2. Engage an interpreter fluent in the target signed language. Signed languages are not universal: ASL and BSL are not mutually intelligible, and Auslan, LSF, JSL, and others are distinct again. Match the interpreter to the audience's signed language, not to the spoken language of the source audio. Certified interpreters (RID in the US, NRCPD in the UK, equivalents elsewhere) are the baseline.
  3. Film the interpreter against a plain, high-contrast background with even lighting. The frame must capture hands, arms, upper torso, and face. Facial grammar -- raised brows for yes/no questions, furrowed brows for wh-questions, mouth morphemes that modify sign meaning -- is not decorative, so the interpreter's face must be clearly legible at the final rendered size.
  4. Pick a delivery mechanism and commit to it. The W3C Understanding document lists two sufficient techniques[1]: a signer included in the primary video stream at production time (G54, typically picture-in-picture), or a separate synchronized signer video that the player displays in a different viewport or overlays on the primary video at runtime (G81). Either satisfies 1.2.6; pick based on your player and authoring pipeline. A linked standalone interpretation file reachable from the media page can supplement these, but it is not one of the listed sufficient techniques.
  5. Size the signer so hand shapes and facial grammar are readable. A 50 x 50 pixel corner overlay fails the criterion in practice even if the technique is technically present -- users cannot parse hand configuration, location, movement, or non-manual markers at that resolution. Reserve roughly one-sixth of the total video area as a working floor, and verify on the smallest viewport the player targets.
  6. Keep the caption track. 1.2.6 does not replace 1.2.2 -- the two criteria cover overlapping but non-identical audiences[3]. Late-deafened users, hard-of-hearing users, users in sound-off environments, and users whose signed language does not match the interpreter on the video all rely on the caption track. Ship captions and a signer together.
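
The inventory in step 1 and the "ship both" rule in step 6 can be combined into a small audit pass. The following is a minimal sketch, assuming a hypothetical asset record with boolean fields (`prerecorded`, `hasVideo`, `hasSpeechAudio`, `hasCaptions`, `hasSignLanguage`); none of these names come from WCAG or from a real media library, and a real inventory would need to populate them from whatever metadata is available.

```javascript
// An asset is in scope for 1.2.6 when it is prerecorded synchronized media:
// a video track paired with an audio track that carries speech.
// The field names are hypothetical and used only for this sketch.
function inScopeFor126(asset) {
  return Boolean(asset.prerecorded && asset.hasVideo && asset.hasSpeechAudio);
}

// Returns one entry per in-scope asset that lacks a signer, captions, or both.
function auditAssets(assets) {
  return assets
    .filter(inScopeFor126)
    .map((asset) => ({
      id: asset.id,
      missingSignLanguage: !asset.hasSignLanguage,
      missingCaptions: !asset.hasCaptions,
    }))
    .filter((entry) => entry.missingSignLanguage || entry.missingCaptions);
}

const report = auditAssets([
  // Prerecorded video with speech and captions, but no signer: flagged.
  { id: "intro", prerecorded: true, hasVideo: true, hasSpeechAudio: true,
    hasCaptions: true, hasSignLanguage: false },
  // Audio-only recording: out of scope for 1.2.6.
  { id: "podcast", prerecorded: true, hasVideo: false, hasSpeechAudio: true,
    hasCaptions: false, hasSignLanguage: false },
  // Live stream: out of scope for 1.2.6.
  { id: "webinar-live", prerecorded: false, hasVideo: true, hasSpeechAudio: true,
    hasCaptions: true, hasSignLanguage: false },
  // Prerecorded video with both captions and a signer: nothing to report.
  { id: "lecture", prerecorded: true, hasVideo: true, hasSpeechAudio: true,
    hasCaptions: true, hasSignLanguage: true },
]);

console.log(report);
```

Only the first asset appears in the report. A pass like this finds gaps in coverage; it says nothing about interpreter quality or signer size, which still need human review.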

References

  [1] W3C (2023). Understanding Success Criterion 1.2.6: Sign Language (Prerecorded). W3C. Accessed 2026-04-07. https://www.w3.org/WAI/WCAG22/Understanding/sign-language-prerecorded.html
  [2] W3C (2023). Web Content Accessibility Guidelines (WCAG) 2.2. W3C. Accessed 2026-04-07. https://www.w3.org/TR/WCAG22/
  [3] W3C (2023). Understanding Success Criterion 1.2.1: Audio-only and Video-only (Prerecorded). W3C. Accessed 2026-04-07. https://www.w3.org/WAI/WCAG22/Understanding/audio-only-and-video-only-prerecorded.html