Level AAA

Screen readers pronounce heteronyms like "lead" and "tear" from spelling and, at best, a rough grammatical guess, so without a pronunciation mechanism the listener can hear the wrong word.

3.1.6 Pronunciation

In Plain Language

3.1.6 Pronunciation (Level AAA) requires a mechanism for identifying the pronunciation of words where meaning depends on pronunciation and context does not resolve it[1]. The target is heteronyms -- words spelled identically but pronounced differently with different meanings, like "lead" the metal versus "lead" to guide, or "tear" from an eye versus "tear" in fabric -- and CJK characters (kanji, hanzi) that take different readings depending on the word.

The usual mechanism is one of these: an HTML <ruby> annotation carrying furigana, pinyin, zhuyin, or IPA; a link from the word to a glossary entry that spells out the pronunciation; an inline phonetic spelling in parentheses; or a link to an audio clip. Any one of them satisfies the criterion for the specific word it covers.
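A minimal TypeScript sketch of those mechanisms, assuming you generate markup from a script. The names `Pronunciation`, `renderPronunciation`, and `escapeHtml` are hypothetical helpers, not part of WCAG or any library, and each case shows one reasonable markup choice rather than required markup. The visible word is always kept, and the ruby case adds `<rp>` parentheses so browsers that do not lay out ruby still show the reading in brackets.

```typescript
// Hypothetical model of the common pronunciation mechanisms.
type Pronunciation =
  | { kind: "ruby"; word: string; reading: string }
  | { kind: "glossary"; word: string; anchorId: string }
  | { kind: "inline"; word: string; phonetic: string; gloss?: string }
  | { kind: "audio"; word: string; audioUrl: string };

// Escape characters that are unsafe in HTML text and attribute values.
function escapeHtml(s: string): string {
  return s
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;")
    .replace(/'/g, "&#39;");
}

// Render one mechanism as an HTML fragment. The visible word is always kept.
function renderPronunciation(p: Pronunciation): string {
  const word = escapeHtml(p.word);
  switch (p.kind) {
    case "ruby":
      // <rp> parentheses appear only in browsers that do not lay out ruby.
      return `<ruby>${word}<rp>(</rp><rt>${escapeHtml(p.reading)}</rt><rp>)</rp></ruby>`;
    case "glossary":
      // The entry at #anchorId must itself spell out the pronunciation.
      return `<a href="#${escapeHtml(p.anchorId)}">${word}</a>`;
    case "inline": {
      // e.g. lead (/lɛd/, the metal); the gloss part is optional.
      const parts = p.gloss ? [p.phonetic, p.gloss] : [p.phonetic];
      return `${word} (${escapeHtml(parts.join(", "))})`;
    }
    case "audio":
      // audioUrl is wherever you host the clip (an assumption of this sketch).
      return `${word} (<a href="${escapeHtml(p.audioUrl)}">hear pronunciation</a>)`;
  }
}
```

For example, `renderPronunciation({ kind: "inline", word: "lead", phonetic: "/lɛd/", gloss: "the metal" })` returns `lead (/lɛd/, the metal)`. The glossary and audio cases only produce the link; the target entry or clip still has to carry the pronunciation.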

Why It Matters

  • Screen readers pronounce heteronyms from spelling, not meaning. Given "The lead pipe," a text-to-speech engine looks the spelling up in its lexicon, perhaps applies a part-of-speech guess, and speaks whichever pronunciation comes out; if it guesses wrong, the blind user hears a different word than the sighted reader sees, and the sentence stops making sense.
  • In Japanese, Chinese, and Korean content, the same character can take multiple readings (for example, the kanji 生 has dozens). Without <ruby> furigana or pinyin annotations, a screen reader falls back on its dictionary and guesses, and proper nouns and rare compounds are where it guesses wrong most often; a learner may have no way at all to pick the reading the author intended.
  • Readers with cognitive and learning disabilities and readers decoding a second language cannot always use surrounding context to infer the intended pronunciation, so the word becomes a comprehension dead end even where fluent sighted readers would recover.
  • In legal, medical, and educational content, the wrong reading is more than awkward: "minute" (sixty seconds) versus "minute" (tiny), or "wound" (an injury) versus "wound" (wrapped around), changes the meaning of a clause or an instruction, and the mechanism is the one place the author's intended reading is written down.
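To make the first point concrete, here is a toy TypeScript model of spelling-keyed lookup. It is a deliberate simplification under stated assumptions, not how any particular screen reader or speech engine works (real engines add part-of-speech guessing and other heuristics): each spelling maps to a list of pronunciations, and without a hint the first entry wins. The optional `senseHints` map stands in for an author-supplied pronunciation mechanism; `lexicon`, `transcribe`, and the IPA entries are illustrative only.

```typescript
// Toy model only: real engines add part-of-speech guessing and other heuristics.
type LexEntry = { ipa: string; sense: string };

// Lexicon keyed by spelling; a heteronym has more than one entry.
const lexicon: ReadonlyMap<string, readonly LexEntry[]> = new Map<string, readonly LexEntry[]>([
  ["the", [{ ipa: "ðə", sense: "article" }]],
  ["lead", [
    { ipa: "liːd", sense: "to guide" }, // listed first, so it wins by default
    { ipa: "lɛd", sense: "the metal" },
  ]],
  ["pipe", [{ ipa: "paɪp", sense: "tube" }]],
]);

// Pick one pronunciation per word. Without a sense hint the first entry wins,
// which is the spelling-only guess described above; unknown words get a "?" prefix.
function transcribe(
  sentence: string,
  senseHints: ReadonlyMap<string, string> = new Map<string, string>(),
): string[] {
  return sentence
    .toLowerCase()
    .split(/[^a-z]+/)
    .filter((w) => w.length > 0)
    .map((w) => {
      const entries = lexicon.get(w);
      if (!entries || entries.length === 0) return `?${w}`;
      const hint = senseHints.get(w);
      const chosen = entries.find((e) => e.sense === hint) ?? entries[0];
      return chosen ? chosen.ipa : `?${w}`;
    });
}
```

`transcribe("The lead pipe")` yields the "to guide" reading for "lead"; passing `new Map([["lead", "the metal"]])` as the second argument switches it to the metal.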

Examples

Do: Provide inline pronunciation for ambiguous words

The study tested for the presence of lead/lɛd/ (the metal) in drinking water.

✔ Ruby annotation provides pronunciation to distinguish the homograph

<p>The study tested for the presence of
  <ruby>lead<rp>(</rp><rt>/l&#x025B;d/</rt><rp>)</rp></ruby>
  (the metal) in drinking water.</p>
<!-- Ruby annotation shows pronunciation; <rp> adds fallback parentheses -->
Don't: Ambiguous word with no pronunciation cue

The lead was found to exceed safe limits in the sample.

✘ Is this lead (the metal) or lead (the advantage)? No pronunciation mechanism provided

<!-- FAILS: no way to determine pronunciation -->
<p>The lead was found to exceed safe limits
  in the sample.</p>
<!-- "lead" is ambiguous without context -->
Do: Link to a glossary with pronunciation

The patient was asked to read the consent form before the procedure.

✔ The link points to a glossary entry that spells out the pronunciation; the title tooltip is only a bonus, since keyboard and touch users cannot reliably reach it

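<p>The patient was asked to
  <a href="#glossary-read"
     title="Pronunciation: /ri&#x02D0;d/">read</a>
  the consent form before the procedure.</p>
<!-- The glossary entry carries the pronunciation; the title tooltip is only a hint -->
<dl>
  <dt id="glossary-read">read</dt>
  <dd>Pronounced /ri&#x02D0;d/ (rhymes with "need"): to look at and take in written text.</dd>
</dl>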
Don't: Technical homograph with no disambiguation

The bass was measured at 40 Hz during the sound check.

✘ Is this bass (low-frequency sound) or bass (the fish)? A sighted reader can lean on "40 Hz," but a text-to-speech engine works from spelling and may still pronounce the fish

<!-- FAILS: no pronunciation mechanism -->
<p>The bass was measured at 40 Hz during
  the sound check.</p>
<!-- "bass" could be the fish or the sound -->

How to Fix It

  1. Find the heteronyms and multi-reading characters in your content. In English prose, scan for words whose pronunciation flips with meaning: lead, read, bass, bow, tear, wind, object, record, refuse, produce, conduct, minute, close, desert. In CJK content, scan for characters with multiple common readings where surrounding text does not pin the reading (proper nouns are the usual offender).
  2. Use <ruby> for CJK content. Wrap the base text in <ruby> and put the reading in <rt>: <ruby>東京<rt>とうきょう</rt></ruby> for Japanese furigana, <ruby>北京<rt>Běijīng</rt></ruby> for pinyin. Screen-reader support varies: some announce both the base text and the reading, others only one of them. Visually, the reading sits above or beside the base text, depending on the writing mode and the user agent.
  3. Use inline phonetic spelling or a parenthetical for English heteronyms. The lightest fix is a parenthetical gloss: lead (/lɛd/, the metal) or tear (/tɛər/, as in fabric). IPA is the precise notation, but a plain-language rhyme ("rhymes with bed") also satisfies the criterion and is easier for non-linguists to read.
  4. Link critical terms to a pronunciation glossary. When the same ambiguous term recurs across a document -- drug names, legal terms, scientific jargon -- link each instance to a glossary entry that carries the phonetic spelling and, ideally, an audio clip. One glossary, many links, no inline noise in the running text.
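  5. Do not rely on aria-label as the pronunciation mechanism. On a plain <span> or <div>, aria-label is prohibited (ARIA 1.2 does not allow naming the generic role), so assistive technology may ignore it. Where it does apply, such as on a link or button, a phonetic respelling replaces the visible word in the accessible name: that can break 2.5.3 Label in Name for speech-input users, who say the visible word, and braille users read the respelling instead of the real word. Ruby, inline phonetics, and glossary links all leave the visible text intact.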
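Step 1 can be partly automated. The TypeScript sketch below (the `findHeteronyms` name and the watch list copied from step 1 are illustrative assumptions) flags whole-word, case-insensitive matches with their character offsets so a human can decide which ones context already resolves. It only produces candidates for review: it cannot tell which occurrences are actually ambiguous, it misses inflected forms such as "leads" or "tears" unless you add them, and it does not handle CJK text.

```typescript
// Watch list from step 1; extend it for your own domain (drug names, legal terms).
const HETERONYMS: ReadonlySet<string> = new Set([
  "lead", "read", "bass", "bow", "tear", "wind", "object", "record",
  "refuse", "produce", "conduct", "minute", "close", "desert",
]);

interface Hit {
  word: string;  // the word as written in the text
  index: number; // character offset of the match
}

// Return every watch-list word in `text`, matched case-insensitively, with its offset.
function findHeteronyms(text: string, watch: ReadonlySet<string> = HETERONYMS): Hit[] {
  const hits: Hit[] = [];
  const wordRe = /[A-Za-z]+/g; // whole ASCII words only
  let m: RegExpExecArray | null;
  while ((m = wordRe.exec(text)) !== null) {
    if (watch.has(m[0].toLowerCase())) {
      hits.push({ word: m[0], index: m.index });
    }
  }
  return hits;
}
```

For instance, `findHeteronyms("The lead was found; read the record.")` reports "lead" at offset 4, "read" at 20, and "record" at 29.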
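Step 4's "one glossary, many links" pattern could be generated roughly as follows. This is a sketch under stated assumptions: the `GlossaryEntry` shape, the `glossary-bass` style of id, and the `linkTerms` and `renderGlossary` helpers are hypothetical, the input is plain text rather than existing HTML, and matching is whole-word and case-insensitive. Every occurrence of a term links to a single `<dl>` entry that spells out the pronunciation.

```typescript
// Hypothetical glossary record; the id doubles as the link target.
interface GlossaryEntry {
  id: string;       // e.g. "glossary-bass"
  term: string;     // the ambiguous word as written
  phonetic: string; // IPA or a plain-language respelling
  meaning: string;
}

// Escape characters that are unsafe in HTML text and attribute values.
function escapeHtml(s: string): string {
  return s
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;");
}

// Wrap every whole-word, case-insensitive occurrence of a term in a link to its entry.
// The capture group makes split() keep the words alongside the text between them.
function linkTerms(text: string, entries: readonly GlossaryEntry[]): string {
  const byTerm = new Map(entries.map((e) => [e.term.toLowerCase(), e] as const));
  return text
    .split(/([A-Za-z]+)/)
    .map((part) => {
      const entry = byTerm.get(part.toLowerCase());
      return entry
        ? `<a href="#${escapeHtml(entry.id)}">${escapeHtml(part)}</a>`
        : escapeHtml(part);
    })
    .join("");
}

// Render the single glossary that all the links point at.
function renderGlossary(entries: readonly GlossaryEntry[]): string {
  const items = entries.map(
    (e) =>
      `  <dt id="${escapeHtml(e.id)}">${escapeHtml(e.term)}</dt>\n` +
      `  <dd>Pronounced ${escapeHtml(e.phonetic)}: ${escapeHtml(e.meaning)}</dd>`,
  );
  return `<dl>\n${items.join("\n")}\n</dl>`;
}
```

With a single "bass" entry, `linkTerms("The Bass & the drums", entries)` returns `The <a href="#glossary-bass">Bass</a> &amp; the drums`, and `renderGlossary(entries)` emits the matching `<dt id="glossary-bass">` entry.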
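Step 5 can also be checked mechanically. The sketch below works on a simplified, hypothetical `ElementInfo` record rather than a live DOM, and `GENERIC_TAGS` lists only a few common elements whose implicit ARIA role is generic; treat it as a lint heuristic, not a conformance test. It flags `aria-label` on those elements, where ARIA 1.2 prohibits naming, and any accessible name that no longer contains the visible text, which is the situation 2.5.3 Label in Name is concerned with for links and other controls.

```typescript
// Simplified, hypothetical element record; a real audit would walk the DOM.
interface ElementInfo {
  tag: string;         // lowercase tag name, e.g. "span" or "a"
  visibleText: string; // text a sighted user sees
  ariaLabel?: string;  // aria-label value, if present
}

// A few common elements whose implicit ARIA role is generic (not an exhaustive list).
const GENERIC_TAGS: ReadonlySet<string> = new Set(["span", "div", "b", "i"]);

// Lowercase and collapse whitespace so the comparison ignores case and spacing.
function normalizeLabel(s: string): string {
  return s.toLowerCase().replace(/\s+/g, " ").trim();
}

// Return human-readable warnings for aria-label used as a pronunciation mechanism.
function auditPronunciationLabel(el: ElementInfo): string[] {
  const issues: string[] = [];
  if (el.ariaLabel === undefined) return issues;
  if (GENERIC_TAGS.has(el.tag)) {
    issues.push(`aria-label on <${el.tag}>: naming is prohibited on the generic role and may be ignored`);
  }
  if (!normalizeLabel(el.ariaLabel).includes(normalizeLabel(el.visibleText))) {
    issues.push(`accessible name "${el.ariaLabel}" does not contain the visible text "${el.visibleText}"`);
  }
  return issues;
}
```

For instance, a link whose visible text is "lead" but whose aria-label is "led" gets one warning; the same pair on a `<span>` gets two.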

References

  [1] W3C (2023). Understanding Success Criterion 3.1.6: Pronunciation. Accessed 2026-04-07. https://www.w3.org/WAI/WCAG22/Understanding/pronunciation.html