Summary

What this is all about

This article proposes a model of keyboard navigation through a web page. Technically, keyboard navigation can be enabled explicitly, by turning on caret navigation mode, or automatically, when the focus moves into an editable area.

Keyboard navigation is especially important for screen reader users, since the keyboard is the primary tool they use to traverse page content. A typical web page may contain rich text, structural elements (like HTML tables) and UI elements, so the content the user deals with can be very complex. This makes keyboard navigation behavior nontrivial.

Unfortunately, there is no specification that defines this behavior. Browser implementations vary and none are perfect. As a result, some portions of a web page aren't accessible by keyboard alone. For example, it might not be possible to focus UI elements placed within text content while navigating through the page. The rules for navigating content when floating and absolutely positioned content is met along the way are not clear. This situation, coupled with the current implementations, lowers the appeal of keyboard navigation and leads screen reader developers to disable browser-provided navigation and implement their own version. These developers invest great effort because they must ensure that any presented content is accessible by keyboard.

The goal of this document is to provide guidelines as to how the keyboard navigation should work when the user navigates through mixed content on a page.

Content classification

The web page may include rich text like headings and paragraphs, and structural elements like tables and lists. This content doesn't require anything special in terms of keyboard navigation support; the rules are standard and correspond to plain text.

Form controls (like HTML input or HTML button) and focusable elements (either HTML anchor or user-defined widgets) may also be present. The keyboard navigation rules must be refined for this type of content.

The third type is ARIA widgets, i.e. elements having the @role attribute. ARIA widgets are treated as a separate content type since they need to be handled specially while the user navigates through them.

Another type is static elements like HTML image or disabled form controls. Typically, when caret navigation is on, a static element in the navigation sequence is not meaningful for either sighted or screen reader users, so it could be skipped. However, when the user navigates through editable areas, where the user should have full control, static elements should be in the navigation sequence.

The last type is generated content, that is, the content created by the :before and :after pseudo-elements.

Overview

Everything should be achievable

The primary idea is to put rich elements into the navigation order together with the rich text so the user is able to reach anything on the page via the keyboard. This also applies to form controls and generated content (e.g. content of the :after and :before pseudo-elements). The user should be able to select any visible content and copy it to the clipboard.

ARIA widgets

ARIA is all about screen reader support and it doesn't change the content behavior, i.e. it doesn't affect sighted users. On the other hand, ARIA always comes with custom widget development. It's always hard work to create a custom widget and make it behave correctly in the web page context without the browser's help. For example, if a div is used to create a button or link then it's still a div from the browser's point of view. However, these widgets assume different keyboard navigation behavior. It would be really nice if the browser used ARIA as the widget classifier and adapted the behavior to the widget type.

[davidb: I disagree with this; I don't think we should make any assumptions about keyboard behaviour based on @role. I do think we could use @role for helping page traversal patterns]

Control the navigation sequence

Along with the idea of putting everything into the navigation order, the user should be able to skip uninteresting content. To make this happen the user chooses the type of page traversal. For example, navigation by words skips rich elements as long as they are not considered words, while navigation by characters allows the user to traverse everything.

Navigation order

Navigation order is defined within navigation blocks, each a logical union of navigable content. If the caret is inside a navigation block then the user should navigate the block entirely before the caret moves to the next navigation block in the layout. An example of a navigation block is an editable area: the user should traverse it before the caret moves to the next block. Navigation blocks can be nested.

Details

Terminology

The term rich element refers to an element placed into the navigation sequence that is different from text, text containers (like headings and paragraphs) and structural elements.

In short, rich elements can be classified into the following groups.

- in-text elements, i.e. elements behaving mostly as a part of the surrounding text
  - native focusable elements (e.g. HTML:a)
  - user defined focusable elements (e.g. HTML:div with tabindex="0")
  - ARIA widgets like @role="link"
- integral elements, i.e. elements having different navigation rules than rich text
  - elements containing navigable content
    - form controls like HTML input
    - ARIA widgets like @role="textbox"
  - elements not containing navigable content
    - form controls like HTML button or HTML select
    - ARIA widgets like @role="button"
    - disabled form controls and HTML image within the editable area
- static elements
  - disabled form controls and HTML image

Disabled form controls and HTML image are treated differently depending on whether these elements are inside the editable area or not. The term static elements refers to these elements if they are not currently part of editable content.
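
The grouping above can be sketched as a small classifier. This is only an illustration under stated assumptions: the Element record, the Kind names and the classify function are hypothetical and not part of any browser API, and only the tags and roles named in the list are covered.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class Kind(Enum):
    IN_TEXT = "in-text"
    INTEGRAL = "integral"
    STATIC = "static"
    PLAIN = "plain"  # rich text and structural elements, i.e. not a rich element


@dataclass
class Element:
    """Hypothetical, simplified view of a DOM element."""
    tag: str                        # e.g. "a", "div", "input", "img"
    role: Optional[str] = None      # value of @role, if any
    tabindex: Optional[int] = None  # value of @tabindex, if any
    disabled: bool = False
    in_editable_area: bool = False


FORM_CONTROLS = {"input", "button", "select"}  # controls named in the text
IN_TEXT_ROLES = {"link"}
INTEGRAL_ROLES = {"textbox", "button"}


def classify(el: Element) -> Kind:
    # Disabled controls and images are integral inside an editable area
    # and static everywhere else.
    if el.tag == "img" or (el.tag in FORM_CONTROLS and el.disabled):
        return Kind.INTEGRAL if el.in_editable_area else Kind.STATIC
    if el.tag in FORM_CONTROLS or el.role in INTEGRAL_ROLES:
        return Kind.INTEGRAL
    if el.tag == "a" or el.role in IN_TEXT_ROLES or el.tabindex is not None:
        return Kind.IN_TEXT
    return Kind.PLAIN


print(classify(Element("div", role="button", tabindex=0)).value)  # integral
print(classify(Element("img")).value)                             # static
```

Checking the role before tabindex is a design choice of this sketch: a div with role="button" and tabindex="0" is grouped with the buttons rather than with generic focusable elements.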

Navigation blocks

Navigation order should be defined by navigation blocks. The user should navigate the block entirely before the caret moves to the next block. For example, if two blocks are visually placed next to each other and the user reaches the end of the last line of the first block then the caret should move to the first line of the next block.

The navigation blocks are defined by layout flow. All content of the normal flow is contained in one navigation block, which may contain nested blocks. In contrast to normal flow content, floated or absolutely positioned content is presented by its own blocks, i.e. each container where the flow is changed is represented by its own navigation block.

The navigation blocks can be nested. For example, if the content contains an editable area then the area is presented by its own block, which is contained in the parent block.

The next block is defined by the page layout. If two blocks overlap then the closer block is used first and the farther block second. If two blocks occupy the same place then z-index is used and the bottom content is excluded from the navigation order.

Navigation blocks are organized in a tree structure. The tab navigation order is defined within the navigation block unless it is changed via the tabindex attribute value.

For example

  <body>
    <p>normal multiline paragraph</p>
    <p style="float: right; color: blue;">floating multiline paragraph</p>
  </body>

which visually can be presented as

normal multiline	floating multiline
paragraph	paragraph

If the user navigates through the normal paragraph by characters then it should be navigated entirely; the caret should move to the floating paragraph only when the user reaches the end of the normal paragraph. Note that the first line of the floating paragraph ("floating multiline") is visually "next" to the first line of the normal paragraph ("normal multiline").
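
As a rough sketch of the rule that a block is navigated entirely before the caret moves on, the block tree can be walked depth first. The Block class and navigation_order function are hypothetical names, and the sketch assumes nested blocks are visited where they appear among the parent's lines.

```python
from dataclasses import dataclass, field


@dataclass
class Block:
    """Hypothetical navigation block: text lines and nested blocks, in order."""
    name: str
    items: list = field(default_factory=list)  # str lines or nested Block objects


def navigation_order(block):
    """Yield (block name, line) pairs, finishing a block before leaving it."""
    for item in block.items:
        if isinstance(item, Block):
            yield from navigation_order(item)
        else:
            yield (block.name, item)


# The floating paragraph from the example gets its own block.
page = Block("normal flow", [
    "normal multiline",
    "paragraph",
    Block("float", ["floating multiline", "paragraph"]),
])

for name, line in navigation_order(page):
    print(name, "|", line)
```

Both lines of the normal paragraph come out before either line of the floating paragraph, even though "floating multiline" sits visually on the first row.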

Rich element as a lexical unit

A rich element should be treated differently than rich text when the user navigates through a navigable area. The primary idea is to make it possible to put the caret immediately before or after the rich element. This makes it possible to select only the rich element, without the rich text that immediately surrounds it. This requirement has special meaning for editable areas, where the user should be able to write text before or after the rich element.

[davib: I'm not sure my edits capture your intended meaning here ^]

To make this happen, special autogenerated empty characters are inserted before and after the rich element. If rich elements are placed one after another then each of them has its own empty characters, i.e. adjacent elements don't share empty characters.

The empty characters are not presented visually but they affect keyboard navigation. They are used to designate the element boundaries when the element is treated as a lexical unit; however, their behavior differs from that of word delimiters like 'space' characters.

Integral element is a word

The integral element should be treated as a word. The term rich word will be used to emphasize the word is a rich element.

If the element doesn't contain any navigable text then its word doesn't have any characters and is referred to as an empty word, otherwise its word consists of all contained characters and it's referred to as a complex word.

For example, HTML button is an empty word since it doesn't contain navigable text. HTML input is a complex word since it allows navigable content. Another example is a non editable container element within the editable area, which is treated as an empty word if caret navigation mode is turned off.

The integral element is surrounded by empty characters.

In the following example

text<button>btn</button><input value="value">text

the conditional notation can be presented as "text|||value|text", where each empty character is marked by the '|' symbol. Both the empty word for the button and the complex word for the input are wrapped by empty characters: the first two '|' symbols belong to the button and the last two to the input.

In-text element as a sentence

Since an in-text element behaves as a part of the rich text, it should be treated as a sentence that consists of all the words of the contained text. At the same time it should be possible to set the caret before/after the element. To meet this requirement, empty characters are added before the first word and after the last word of the sentence to designate the sentence boundaries, as is done for rich words. The term rich sentence will be used to indicate that the sentence represents an in-text element.

For example, the following HTML anchor is treated as a rich sentence consisting of one word

  Click <a>here</a> to see news

Conditional notation can be written as "Click |here| to see news".

Another example of the rich sentence is a non editable container element within the editable area when caret navigation mode is on.
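
The conditional notation used for rich words and rich sentences can be reproduced with a few lines of code. This is a minimal sketch; the notation function and the (kind, content) pairs are hypothetical and only mirror the two examples given in the text.

```python
def notation(nodes):
    """Build the conditional notation for a flat sequence of nodes.

    nodes is a list of (kind, content) pairs where kind is "text",
    "integral" or "in-text". For an integral element, content is its
    navigable text ("" for an empty word such as a button).
    """
    parts = []
    for kind, content in nodes:
        if kind == "text":
            parts.append(content)
        else:
            # Every rich element gets its own pair of empty characters.
            parts.append("|" + content + "|")
    return "".join(parts)


# text<button>btn</button><input value="value">text
print(notation([("text", "text"), ("integral", ""),
                ("integral", "value"), ("text", "text")]))

# Click <a>here</a> to see news
print(notation([("text", "Click "), ("in-text", "here"),
                ("text", " to see news")]))
```

Because adjacent elements do not share empty characters, the button and the input together contribute three consecutive '|' symbols before "value".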

Static element as a character

The static element is treated as a single character when the user navigates through the web page.

The caret position and selection terms

If the caret position is immediately before/after the start/end empty character of the rich element then the caret is immediately before/after the element.

If the caret position is between empty characters of the sibling rich elements (i.e. between the end and start empty characters) then the caret is between the elements.

If the caret position is between the empty characters of an integral element represented by an empty word then the caret is on the element. If the rich element has navigable text then the caret is inside the element.

If the selection contains both empty characters of the rich element then the element is selected entirely.
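
These terms can be made concrete with a token list in which each empty character is an explicit token. The sketch below is an assumption-laden illustration: the ("char", c), ("start", name) and ("end", name) tokens and both function names are hypothetical, and a caret position i means "between token i-1 and token i".

```python
def chars(text):
    return [("char", c) for c in text]


def caret_state(tokens, i):
    """Describe caret position i using the terms defined above."""
    prev = tokens[i - 1] if i > 0 else None
    nxt = tokens[i] if i < len(tokens) else None
    if prev and nxt and prev[0] == "start" and nxt[0] == "end":
        return f"on {prev[1]}"                      # empty word
    if prev and nxt and prev[0] == "end" and nxt[0] == "start":
        return f"between {prev[1]} and {nxt[1]}"
    if nxt and nxt[0] == "start":
        return f"before {nxt[1]}"
    if prev and prev[0] == "end":
        return f"after {prev[1]}"
    # Otherwise find the innermost element still open at this position.
    open_elements = []
    for kind, value in tokens[:i]:
        if kind == "start":
            open_elements.append(value)
        elif kind == "end":
            open_elements.pop()
    return f"inside {open_elements[-1]}" if open_elements else "in text"


def selected_entirely(tokens, lo, hi, name):
    """True if the selection [lo, hi) holds both empty characters of name."""
    span = tokens[lo:hi]
    return ("start", name) in span and ("end", name) in span


# text<button>btn</button><input value="value">text
tokens = (chars("text") + [("start", "button"), ("end", "button"),
                           ("start", "input")] + chars("value")
          + [("end", "input")] + chars("text"))

print(caret_state(tokens, 5))   # on button
print(caret_state(tokens, 6))   # between button and input
print(caret_state(tokens, 9))   # inside input
```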

The caret visual position

Since a rich word or sentence is wrapped by special delimiters, the cursor for the same caret position can be described in two ways: "the caret is immediately before the rich element" and "the caret is immediately after the element preceding the rich element".

If the rich element and the element preceding the rich element are placed visually on different lines then the cursor might be rendered logically in two different locations. The following rules are applied.

- if the rich element follows or precedes a text container then the caret is drawn after/before the text.
- if two rich elements are adjacent then the caret is drawn after the first rich element.

For example,

  hello
  <div role="button" tabindex="0">button1</div>
  <div role="button" tabindex="0">button2</div>

the cursor is drawn after the 'o' if the caret is before the first ARIA button. If the caret is before the second ARIA button then the cursor is drawn after the first ARIA button.

Mapping to AT

Within the accessible of its text container, the accessible of a rich element is exposed as an embedded object character. All characters of the rich word or sentence, including its empty characters, are represented by that embedded object character. The empty character should be exposed to AT as a certain character. This character should be non-pronounceable so that AT does not need any additional special support.

Keyboard interaction with the rich element

The rich element's behavior on keyboard input is the same as usual unless the global keyboard navigation rules conflict with the element's behavior. In this case the element should explicitly prevent the global action, i.e. if the element performs certain actions on the pressed key (e.g. the HTML select element changes the selected option on up/down keys) then it cancels the default event action.

If the in-text element is driven by global navigation rules (e.g. HTML:a or ARIA controls) then nothing special should be done. However, if an ARIA control handles and processes keyboard events then it should take care to prevent the default action explicitly.

For example, if a text field processes keyboard events to implement caret navigation then it should prevent the default action only while the caret can still be moved within the text field. Then, when the caret reaches the end of the text field, the global keyboard navigation applies and the caret moves out of the text field to the next keyboard navigable content.
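
The text field case can be modelled without a DOM. In this minimal sketch the TextField class, its on_key method and the dispatch function are hypothetical; returning True from on_key stands in for cancelling the default event action, and the key names are only labels.

```python
from dataclasses import dataclass


@dataclass
class TextField:
    """Hypothetical widget that implements its own caret navigation."""
    value: str
    caret: int = 0

    def on_key(self, key):
        """Return True if the key was consumed (default action prevented)."""
        if key == "ArrowRight" and self.caret < len(self.value):
            self.caret += 1
            return True
        if key == "ArrowLeft" and self.caret > 0:
            self.caret -= 1
            return True
        return False  # leave the key to the global keyboard navigation


def dispatch(field, key):
    """Report who handled the key: the field itself or global navigation."""
    return "field" if field.on_key(key) else "global"


field = TextField("ab")
print([dispatch(field, "ArrowRight") for _ in range(3)])
```

The third right arrow press is not consumed because the caret is already at the end of the value, so global navigation can move the caret out of the field.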

Keyboard navigation

Navigation by words

The following rules assume the user navigates forward by words (ctrl or alt (option) + left/right arrow key depending on platform and text direction). The rules are reversed if the user moves backward.

Integral element

If the caret position is somewhere in the middle of the rich text and the integral element is next on the way (i.e. its rich word is the next word) then the caret should be set before the rich element. If the caret is before the integral element then the element should be skipped and the caret should be set before the next word.

For example,

Enter <input value="number"> in pixels.

If the initial caret position is before the 'Enter' word then the caret should be moved before the input element, then before the 'in' word and then before the 'pixels' word.

If the caret is on an integral element presented by an empty word then the caret should be set before the next word. The rich text container where the element lives should be focused. For example, if the HTML button is focused (it means the caret is on the element) then the caret should be set before the word following the button.

If the caret is inside of navigable text of the integral element then the user should traverse the navigable text entirely and then the text container where the element lives should be focused and the caret should be set before the next word after the element's word.

In-text element

If the in-text element is next on the way (i.e. the first word of its sentence is the next word) then the caret should be set before the element. If the caret is before the element then it should be moved before the beginning of the second word of the sentence. If the sentence consists of one word then the caret should be moved before the word following the sentence.

For example,

See <a>this report</a> for more details.

If the initial caret position is before the 'See' word then the caret should be set before the empty character of the 'this' word, then before the 'report' word and then before the 'for' word.

In this way the in-text element is processed as normal text.

If the caret is in the middle of the rich sentence then the user should navigate the whole sentence by words. For example, if the caret is inside of the anchor then the caret should be moved by words through the anchor text.

If the caret is inside the focusable in-text element and the caret goes out of the element then the area where the element is placed should be focused.
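
The two examples in this section can be checked with a small sketch that lists the positions the caret stops at while moving forward by words. The word_stops function and its labels are hypothetical; the sketch assumes words are separated by whitespace and that the caret starts before the first word.

```python
def word_stops(segments):
    """List what the caret is placed before on each forward word step.

    segments is a list of (kind, content) pairs where kind is "text",
    "integral" or "in-text". An integral element is one stop and is then
    skipped as a whole; an in-text element is traversed word by word,
    with the first stop before its leading empty character ('|').
    """
    stops = []
    for kind, content in segments:
        if kind == "integral":
            stops.append("[" + content + "]")
        elif kind == "in-text":
            words = content.split()
            if words:
                stops.append("|" + words[0])
                stops.extend(words[1:])
        else:
            stops.extend(content.split())
    return stops[1:]  # the caret already sits before the first stop


# Enter <input value="number"> in pixels.
print(word_stops([("text", "Enter "), ("integral", "input"),
                  ("text", " in pixels.")]))

# See <a>this report</a> for more details.
print(word_stops([("text", "See "), ("in-text", "this report"),
                  ("text", " for more details.")]))
```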

Navigation by characters

If the user navigates by characters (left/right arrow keys) and a rich element is encountered on the way then

- if the element is focusable then it should be focused

and

- if the element is an in-text element or contains navigable text then the caret should be set before the first character of the element's content
- otherwise the caret should be set on the element, i.e. placed between the empty characters.

Visually, the caret set on a non-focusable element might look like a dashed blue border around the element. For example, when the caret is on an HTML button then the button is focused; when the caret is on a disabled HTML input (which doesn't contain navigable text) within the editable area then a border should appear around it to show the element is current.

If the integral element is focused and the user navigates by characters then

- the text container should be focused and the caret position should be set after the element if the pressed key isn't processed by the element
- otherwise the element's action should be performed.

For example, if an HTML button and an HTML image are placed one after another within an editable area

text<button>btn1</button><img src="">

then right arrow key presses should traverse "text" by characters, then focus the button, then focus the text container and set the caret position between the button and the image, then make the image current, then focus the text container and set the caret position after the image.

If the in-text element is focused and the user navigates by characters then the caret should be moved consecutively to the end of the element's content, then set after the element, and the text container should take the focus.

For example, if an HTML anchor and an HTML button are placed one after another
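
The sequence of right arrow presses described here can be written out as a trace. This is a sketch only: right_arrow_trace is a hypothetical helper, it handles plain text and empty-word integral elements, and it assumes no element processes the right arrow key itself.

```python
def right_arrow_trace(items):
    """Return one entry per right arrow press.

    items holds plain strings and (name, focusable) tuples for integral
    elements that have no navigable text.
    """
    trace = []
    for item in items:
        if isinstance(item, str):
            trace.extend(f"caret after {ch!r}" for ch in item)
        else:
            name, focusable = item
            # First press puts the caret on the element...
            trace.append(f"{name} focused" if focusable else f"{name} current")
            # ...and the next press moves it out, right after the element.
            trace.append(f"text container focused, caret after {name}")
    return trace


# text<button>btn1</button><img src=""> inside an editable area
for step in right_arrow_trace(["text", ("button", True), ("img", False)]):
    print(step)
```

The position "after button" in the trace is the same caret position as "between button and image" in the text, since the image immediately follows the button.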

<a href="">link</a><button>btn</button>

then the caret should be moved through the "link" text, then set after the anchor element, and then the button should be focused.

Up/down navigation

When the user navigates up/down the lines (up/down arrow keys) then the caret should be placed before the character visually placed above/below the visual caret position (anchor position).

If an integral element is placed above/below the anchor position then the caret should be set between the empty characters of the element, i.e. the integral element should be focused or selected.

If a static element is placed above/below the anchor position then the caret should be set before or after the element depending on which edge of the element is closer to the anchor position.

If the caret is on the element (between its empty characters) then the caret should be placed before the character on the line above/below that is closest to the start edge of the element.

If the integral element is focused and it has its own processing of up/down arrow keys then up/down navigation is not permitted. In this case up/down arrow + ctrl (option) keys should be used instead.

For example,

<button>click</button><br>
click

If the caret position is between the 'cl' and 'ick' substrings and the user presses the up arrow key then the button should be focused. Then if the user presses the down arrow key then the caret should be moved before the 'click' word.

Home/end navigation

When the user presses home/end keys then the caret should be moved to the beginning/end of the current line.

When the user presses home/end + ctrl (option) keys then the caret should be moved to the beginning/end of the navigation block.

Tab navigation

If the element is focused then pressing tab should navigate to the next focusable element in tab order. The next focusable element in tab order can be either inside or outside of the navigation block the element belongs to. This requirement applies to both integral elements and in-text elements.

For example, if an editable area contains two buttons (an HTML button and an ARIA button) and there is one button outside of the editable area

<div contentEditable="true">
  Text<button>btn1</button>text<div role="button" tabindex="0">btn2</div>text
</div>
<button>btn3</button>

and the 1st button is focused then pressing tab should move the focus to the 2nd button and then to the 3rd button.
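
A minimal sketch of this behavior treats a navigation block as a nested list and collects focusable elements in document order. The tab_order and press_tab functions and the ("focusable", name) tuples are hypothetical, and the sketch assumes no tabindex value reorders the sequence.

```python
def tab_order(block):
    """Collect focusable element names in document order.

    A block is a list whose items are text strings, ("focusable", name)
    tuples, or nested lists standing for nested navigation blocks.
    """
    order = []
    for item in block:
        if isinstance(item, list):       # nested navigation block
            order.extend(tab_order(item))
        elif isinstance(item, tuple):    # ("focusable", name)
            order.append(item[1])
    return order


def press_tab(block, focused):
    """Return the element focused after pressing tab, or None at the end."""
    order = tab_order(block)
    i = order.index(focused)
    return order[i + 1] if i + 1 < len(order) else None


page = [
    # editable area with an HTML button and an ARIA button
    ["Text", ("focusable", "btn1"), "text", ("focusable", "btn2"), "text"],
    ("focusable", "btn3"),
]

print(press_tab(page, "btn1"))  # btn2
print(press_tab(page, "btn2"))  # btn3, outside the editable area
```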

Managing the selection

When an integral element participates in a selection then it should always be selected entirely. For example, if the user starts the selection before an integral element having navigable text then the navigable text cannot be selected partially.

If the integral element has its own selection behavior (like an HTML input) while it's focused and the user starts the selection inside of it then there is no way to extend the selection beyond the boundaries of the element.

The in-text element with its content participates in selection as a part of the surrounding content.

Visually, an entirely selected element might have a blue background.

Keyboard selection

When the user holds the selection modifier key (for example, the shift key) and moves through the text then the text container stays focused.

At the same time, if an integral element is encountered on the way then the element is appended to the selection entirely.

If an in-text element is encountered on the way then the element's content is appended consecutively to the selection until the user reaches the end of the element. The focus is on the in-text element while the caret is inside of the element. If the selection is started inside of an in-text element while it is focused then the text container will take the focus once the caret leaves the in-text element content.
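
The rule that an integral element joins the selection as a whole can be sketched as a range adjustment over a token list. Everything here is an illustrative assumption: the ("char", c), ("start", name) and ("end", name) tokens, the snap_selection function, and the simplification that every start/end pair marks an integral element and that the selection started outside of it.

```python
def chars(text):
    return [("char", c) for c in text]


def snap_selection(tokens, lo, hi):
    """Grow the selection [lo, hi) so it never cuts through an element."""
    opened = {}
    for i, (kind, name) in enumerate(tokens):
        if kind == "start":
            opened[name] = i
        elif kind == "end":
            start, end = opened.pop(name), i + 1
            # If the selection overlaps the element, take the whole element.
            if lo < end and hi > start:
                lo, hi = min(lo, start), max(hi, end)
    return lo, hi


# ab<input value="xy">cd
tokens = (chars("ab") + [("start", "input")] + chars("xy")
          + [("end", "input")] + chars("cd"))

print(snap_selection(tokens, 1, 4))  # (1, 6): "b" plus the entire input
print(snap_selection(tokens, 0, 2))  # (0, 2): selection stops before the input
```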

Clipboard operations

If the rich element is selected entirely then it can be copied to the clipboard. Both its text representation and the element itself should be copied to the clipboard as different MIME types.

If the in-text element is selected partially then the selected text is copied to the clipboard. The element itself is not copied.

If the rich element is focused then clipboard operations work as usual.

If the rich element is pasted from the clipboard into an editable area then it is pasted as an element.
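
The copy rules can be summarized as a payload keyed by MIME type. This is a sketch under assumptions: the clipboard_payload function is hypothetical, and the choice of "text/html" for the element and "text/plain" for its text representation is an illustration, since the text only says that different MIME types are used.

```python
def clipboard_payload(element_markup, selected_text, selected_entirely):
    """Return a {MIME type: data} mapping for a copy operation."""
    if selected_entirely:
        # Both the element itself and its text representation are copied.
        return {"text/html": element_markup, "text/plain": selected_text}
    # A partially selected in-text element contributes only its text.
    return {"text/plain": selected_text}


print(clipboard_payload('<a href="">link</a>', "link", True))
print(clipboard_payload('<a href="">link</a>', "li", False))
```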
