Level AAA

Screen readers resolve heteronyms like "lead" and "tear" from spelling alone, so without a pronunciation mechanism the wrong phoneme reaches the listener

3.1.6 Pronunciation

In Plain Language

3.1.6 Pronunciation (Level AAA) requires a mechanism for identifying the pronunciation of words where meaning depends on pronunciation and context does not resolve it^[1]. The target is heteronyms -- words spelled identically but pronounced differently with different meanings, like "lead" the metal versus "lead" to guide, or "tear" from an eye versus "tear" in fabric -- and CJK characters (kanji, hanzi) that take different readings depending on the word.

The mechanism is typically one of four things: an HTML <ruby> annotation carrying furigana, pinyin, zhuyin, or IPA; a link from the word to a glossary entry that spells out the pronunciation; an inline phonetic spelling in parentheses; or a link to an audio clip. Any one of them satisfies the criterion for the specific word it covers.

Why It Matters

Screen readers resolve heteronyms from spelling, not meaning. Given "The lead pipe," a text-to-speech engine picks one phoneme based on its lexicon and ships it to the listener; if the engine guesses wrong, the blind user hears a different word than the sighted reader sees, and the sentence stops parsing.
In Japanese, Chinese, and Korean content, the same character can take multiple readings (for example, the kanji 生 has dozens). Without <ruby> furigana or pinyin annotations, a screen reader or a learner has no reliable way to pick the reading the author intended.
Readers with cognitive and learning disabilities and readers decoding a second language cannot always use surrounding context to back out the intended pronunciation, so the word becomes a comprehension dead end even when sighted fluent readers would recover.
In legal, medical, and educational content, the wrong reading is not just awkward -- "minute" (sixty seconds) versus "minute" (tiny), or "wound" (an injury) versus "wound" (wrapped around), changes the meaning of a clause or an instruction, and the pronunciation mechanism is the reader's only record of the author's intent.

Examples

Do: Provide inline pronunciation for ambiguous words

The study tested for the presence of lead/lɛd/ (the metal) in drinking water.

✔ Ruby annotation provides pronunciation to distinguish the homograph

<p>The study tested for the presence of
  <ruby>lead<rt>/l&#x025B;d/</rt></ruby>
  (the metal) in drinking water.</p>
<!-- Ruby annotation shows pronunciation -->

Don't: Ambiguous word with no pronunciation cue

The lead was found to exceed safe limits in the sample.

✘ Is this lead (the metal) or lead (the advantage)? No pronunciation mechanism provided

<!-- FAILS: no way to determine pronunciation -->
<p>The lead was found to exceed safe limits
  in the sample.</p>
<!-- "lead" is ambiguous without context -->

Do: Link to a glossary with pronunciation

The patient was asked to read the consent form before the procedure.

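✔ Glossary link gives every user a route to the pronunciation; the title attribute is only a supplementary hint, because it is not reliably exposed to keyboard, touch, or screen reader users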

<p>The patient was asked to
  <a href="#glossary-read"
     title="Pronunciation: /ri&#x02D0;d/">
    read
  </a>
  the consent form before the procedure.</p>
<!-- The glossary entry at #glossary-read provides the pronunciation; title is a supplementary hint only -->
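
The link above targets #glossary-read, but the entry itself is not shown. A minimal sketch of what that entry might look like, assuming a definition-list glossary on the same page; the heading text, the wording of the definition, and the audio file path are all placeholders rather than anything prescribed by the criterion:

```html
<!-- Hypothetical glossary section; the id matches href="#glossary-read" -->
<section aria-labelledby="glossary-heading">
  <h2 id="glossary-heading">Pronunciation glossary</h2>
  <dl>
    <dt id="glossary-read">read (present tense)</dt>
    <dd>
      Pronounced /ri&#x02D0;d/, rhymes with "need".
      To look at and understand written text.
      <!-- Optional audio clip; the file path is a placeholder -->
      <a href="audio/read-present.mp3">Listen to the pronunciation (MP3)</a>
    </dd>
  </dl>
</section>
```

Keeping the phonetic spelling as visible text in the entry means the pronunciation does not depend on the title attribute being exposed.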

Don't: Technical homograph with no disambiguation

The bass was measured at 40 Hz during the sound check.

✘ Is this bass (low-frequency sound) or bass (the fish)? A text-to-speech engine may pick the wrong one, and readers who cannot lean on context get no pronunciation cue

<!-- FAILS: no pronunciation mechanism -->
<p>The bass was measured at 40 Hz during
  the sound check.</p>
<!-- "bass" could be the fish or the sound -->
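
For comparison, one possible repair of the failing example, assuming the low-frequency sense is intended. The parenthetical gloss and the rhyme are illustrative choices, not the only acceptable wording:

```html
<!-- Sketch of a fix: inline phonetic spelling right after the word -->
<p>The bass (/be&#x026A;s/, rhymes with "face") was measured at 40 Hz
  during the sound check.</p>
```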

How to Fix It

Find the heteronyms and multi-reading characters in your content. In English prose, scan for words whose pronunciation flips with meaning: lead, read, bass, bow, tear, wind, object, record, refuse, produce, conduct, minute, close, desert. In CJK content, scan for characters with multiple common readings where surrounding text does not pin the reading (proper nouns are the usual offender).
Use <ruby> for CJK content. Wrap the base text in <ruby> and put the reading in <rt>: <ruby>東京<rt>とうきょう</rt></ruby> for Japanese furigana, <ruby>北京<rt>Běijīng</rt></ruby> for pinyin. Screen readers that support ruby surface the reading; visually, the reading sits above or beside the base text per the user agent.
Use inline phonetic spelling or a parenthetical for English heteronyms. The tersest fix is a parenthetical gloss: lead (/lɛd/, the metal) or tear (/tɛər/, as in fabric). IPA is the precise notation, but a plain-language rhyme ("rhymes with bed") also satisfies the criterion and is easier for non-linguists to read.
Link critical terms to a pronunciation glossary. When the same ambiguous term recurs across a document -- drug names, legal terms, scientific jargon -- link each instance to a glossary entry that carries the phonetic spelling and, ideally, an audio clip. One glossary, many links, no inline noise in the running text.
Do not rely on aria-label as the pronunciation mechanism. Overriding the accessible name with a phonetic respelling replaces the visible text in the accessibility tree: on links and controls that breaks 2.5.3 Label in Name for speech-input users, and braille displays output the respelling instead of the real word. aria-label is also unreliable on non-interactive text such as a <span>. Ruby, inline phonetics, and glossary links all leave the visible text intact.
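
The ruby and aria-label steps above can be sketched as follows. This is a minimal illustration under stated assumptions, not a definitive pattern: the <rp> fallback parentheses and the lang attributes are additions that the steps do not mention, and the sample sentences are made up.

```html
<!-- CJK: ruby with <rp> fallback parentheses, which show only in
     user agents that do not support ruby layout -->
<p lang="ja">
  <ruby>東京<rp>(</rp><rt>とうきょう</rt><rp>)</rp></ruby>
</p>
<p lang="zh-Hans">
  <ruby>北京<rp>(</rp><rt>Běijīng</rt><rp>)</rp></ruby>
</p>

<!-- Avoid: the respelling replaces the word for assistive technology
     and is invisible to everyone else -->
<p>Check the pipes for <span aria-label="led">lead</span>.</p>

<!-- Prefer: the visible text stays intact and the gloss is available
     to every reader -->
<p>Check the pipes for lead (/l&#x025B;d/, rhymes with "bed").</p>
```

With the fallback in place, a browser without ruby support would render the first line as 東京(とうきょう) instead of running the base text and the reading together.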

References

[1] W3C (2023). Understanding Success Criterion 3.1.6: Pronunciation. W3C. Accessed 2026-04-07. https://www.w3.org/WAI/WCAG22/Understanding/pronunciation.html