PDF Tagged Structure

For reference only -- not part of a11ybot's automated checks.

What It Is

A PDF's tag tree is a parallel logical structure layered on top of each page's content stream. The content stream is a list of positioned glyphs -- character codes drawn at x,y coordinates on a page -- and carries no semantics: no headings, no paragraphs, no list items, no table cells, and no guaranteed reading order, because glyphs can be drawn in any sequence. The tag tree maps those glyphs to semantic elements: a /Document root containing /H1, /H2, ..., /P, /L, /LI, /Table, /TR, /TH, /TD, /Figure, and /Caption nodes that each point back at the glyph runs they describe. Screen readers walk the tag tree, not the rendered glyphs. PDF/UA (ISO 14289-1), built on the PDF 1.7 / ISO 32000-1 object model, specifies what the tag tree must contain for a document to be considered accessible[1].
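The split between positioned glyphs and the tree that describes them can be sketched in a few lines of Python. This is a toy model under stated assumptions, not a PDF parser: GlyphRun, StructElem, and walk are hypothetical names chosen for illustration, and a real structure element refers to marked content in the page stream rather than holding text directly.

```python
from dataclasses import dataclass, field
from typing import Iterator, List, Tuple, Union


@dataclass
class GlyphRun:
    """Text drawn at a position on the page; carries no meaning of its own."""
    text: str
    x: float
    y: float


@dataclass
class StructElem:
    """A tag-tree node: a structure type plus ordered children."""
    tag: str
    kids: List[Union["StructElem", GlyphRun]] = field(default_factory=list)


def walk(node: StructElem) -> Iterator[Tuple[str, str]]:
    """Depth-first walk yielding (tag, text) pairs in logical order."""
    for kid in node.kids:
        if isinstance(kid, StructElem):
            yield from walk(kid)
        else:
            yield (node.tag, kid.text)


# Hypothetical one-page document: a heading, a paragraph, a one-item list.
doc = StructElem("Document", [
    StructElem("H1", [GlyphRun("Annual Report", 72, 720)]),
    StructElem("P", [GlyphRun("Revenue grew.", 72, 690)]),
    StructElem("L", [
        StructElem("LI", [
            StructElem("Lbl", [GlyphRun("1.", 72, 660)]),
            StructElem("LBody", [GlyphRun("First item", 90, 660)]),
        ]),
    ]),
])

for tag, text in walk(doc):
    print(f"{tag}: {text}")
```

The walk never consults the x,y coordinates: the order and the role of each run come entirely from the tree, which is the property assistive technology relies on.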

Why It Matters

Without a tag tree, a PDF is, to a screen reader, a bag of positioned glyphs. Visual formatting does not substitute for structure: 18-point bold text with extra leading is a heading to a sighted reader and an anonymous run of glyphs to assistive technology; a list typed with hyphens has no /L or /LI nodes; a two-column layout has no defined reading order. The failure mode is mechanical -- PDFs produced by print drivers, image-only OCR, or "export to PDF" paths that do not preserve structure emit a glyph stream with no tag tree at all. Faced with such a file, Acrobat's "Read Out Loud" and most screen readers fall back to a top-to-bottom positional sweep, which interleaves the columns of a multi-column layout and folds sidebars, running headers, and footnotes into the main prose. Tagged structure is the foundation of PDF accessibility: every other PDF remediation -- reading order, table headers, form field labels, image alternative text -- depends on a tag tree existing first.
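A small Python sketch shows why a positional sweep garbles a two-column page. It assumes a deliberately naive fallback (sort by vertical position, then horizontal); the heuristics in real readers differ and are not documented here. Run and positional_sweep are hypothetical names, and the coordinates are invented for the example.

```python
from typing import List, NamedTuple


class Run(NamedTuple):
    text: str
    x: float  # left edge in PDF points
    y: float  # baseline in PDF points; origin is bottom-left, so larger is higher


def positional_sweep(runs: List[Run]) -> List[str]:
    """Naive untagged fallback: top-to-bottom, then left-to-right."""
    return [r.text for r in sorted(runs, key=lambda r: (-r.y, r.x))]


# Listed in the logical reading order a tag tree would record:
# the whole left column first, then the whole right column.
runs = [
    Run("Left column, line 1.", 72, 700),
    Run("Left column, line 2.", 72, 686),
    Run("Right column, line 1.", 320, 700),
    Run("Right column, line 2.", 320, 686),
]

logical_order = [r.text for r in runs]
swept = positional_sweep(runs)

print(logical_order)
print(swept)
```

The swept order alternates between the columns line by line, which is the interleaving a listener hears when no tag tree defines the reading order.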

How It Relates to WCAG

A conformant PDF tag tree is the mechanism that satisfies WCAG 1.3.1 Info and Relationships for PDF documents: the information and relationships conveyed visually must be programmatically determinable, and in a PDF the only programmatic surface for structural relationships is the tag tree[2]. EN 301 549 clause 10 applies WCAG to non-web documents and maps 10.1.3.1 directly onto 1.3.1, requiring that non-web documents satisfy the criterion[3][4]. PDF/UA (ISO 14289-1) prescribes the concrete tag set and structural rules -- which tags are required, how they nest, how artifacts are marked, how alternative text attaches to figures -- that a PDF must follow to meet that criterion[1].

Practical Implications

  • Author in tools that emit a tag tree on export: Word with paragraph styles mapped to heading levels, InDesign with tagged PDF export and an articles panel defining reading order, LaTeX with the accessibility package, or headless HTML-to-PDF pipelines that translate semantic HTML into PDF structure elements.
  • Use heading tags (/H1-/H6) that reflect the document outline. Do not use a larger font as a substitute for a heading tag.
  • Tag lists as /L containing /LI (with /Lbl and /LBody children where the list marker is distinct from the item text). Typed hyphens and bullets produce no list semantics.
  • Tag figures with /Figure and supply alternative text in the element's /Alt entry. Mark decorative rules, page numbers, headers, and footers as /Artifact so screen readers skip them instead of reading them as prose.
  • After export, open the PDF in Acrobat Pro and run the accessibility checker. Fix reading order, tag presence, and alternative text findings in the Tags pane and the Reading Order tool -- not by re-exporting with different visual settings.
  • Target PDF/UA conformance explicitly. Validate with a PDF/UA-aware checker that reports structural conformance rather than tag presence alone.
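
The difference between tag presence and structural conformance can be illustrated with a short Python sketch. It runs three example checks drawn from the points above (heading levels that skip, figures without alternative text, lists whose children are not list items) against an in-memory model. Node and audit are hypothetical names, the model is not read from a real PDF, and these checks are a small illustrative subset rather than the PDF/UA rule set.

```python
import re
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Node:
    """Simplified structure element: tag name, attributes, children."""
    tag: str
    attrs: Dict[str, str] = field(default_factory=dict)
    kids: List["Node"] = field(default_factory=list)


def audit(root: Node) -> List[str]:
    """Return human-readable findings for a few structural problems."""
    findings: List[str] = []
    last_level = 0  # 0 means no heading seen yet

    def visit(node: Node) -> None:
        nonlocal last_level
        match = re.fullmatch(r"H([1-6])", node.tag)
        if match:
            level = int(match.group(1))
            if level > last_level + 1:
                prev = f"H{last_level}" if last_level else "no heading"
                findings.append(f"H{level} after {prev} skips a level")
            last_level = level
        if node.tag == "Figure" and not node.attrs.get("Alt", "").strip():
            findings.append("Figure has no Alt text")
        if node.tag == "L":
            for kid in node.kids:
                # Assumption: only LI (and an optional Caption) are expected here.
                if kid.tag not in ("LI", "Caption"):
                    findings.append(f"L contains {kid.tag}, expected LI")
        for kid in node.kids:
            visit(kid)

    visit(root)
    return findings


# Hypothetical tree in which every element is tagged, yet three rules fail.
doc = Node("Document", kids=[
    Node("H1"),
    Node("H3"),
    Node("Figure"),
    Node("Figure", {"Alt": "Bar chart of revenue by quarter"}),
    Node("L", kids=[Node("LI"), Node("P")]),
])

for finding in audit(doc):
    print(finding)
```

Every element in the sample tree carries a tag, so a presence-only check would pass it; the findings come from how the tags relate to one another.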

Related Clauses

Sources

  • ETSI EN 301 549 v3.2.1, clause 10 (non-web documents)
  • Accessibility Standards Canada, EN 301 549 clause 10 plain-language guidance
  • ISO 14289-1 (PDF/UA-1)
  • ISO 32000-1 (PDF 1.7)
  • W3C Understanding WCAG 2.2: 1.3.1 Info and Relationships