What this is all about
This article proposes a model of keyboard navigation through a web page. Technically, keyboard navigation can be enabled explicitly, by turning on caret navigation mode, or automatically, when focus moves into an editable area.
Keyboard navigation has special meaning for screen reader users, since the keyboard is the primary tool they use to traverse page content. A typical web page may contain rich text, structural elements (like HTML tables) and UI elements, so the content the user deals with is very complex. This makes keyboard navigation behavior nontrivial.
Unfortunately, there is no specification that defines this behavior. Browser implementations vary and none is perfect. As a result, some portions of a web page are not accessible by keyboard alone. For example, it might not be possible to focus UI elements placed within text content while navigating through the page. The rules that define content navigation when floating and absolutely positioned content is encountered along the way are not clear. This situation, coupled with the current implementations, lowers the appeal of keyboard navigation and leads screen reader developers to disable browser-provided navigation and implement their own version. These developers expend great effort because they must ensure that any presented content is accessible by keyboard.
The goal of this document is to provide guidelines as to how the keyboard navigation should work when the user navigates through mixed content on a page.
The web page may include rich text like headings and paragraphs, and structural elements like tables and lists. This content doesn't require anything special in terms of keyboard navigation support; the rules are standard and correspond to plain text.
Form controls (like HTML input or HTML button) and focusable elements (either HTML anchor or user-defined widgets) may also be present. The keyboard navigation rules must be refined for this type of content.
The third type is ARIA widgets, i.e. elements having a @role attribute. ARIA widgets are treated as a separate content type since they need to be dealt with specially while the user navigates through them.
The fourth type is static elements like HTML image or disabled form controls. Typically, when caret navigation is on, a static element in the navigation sequence is of no use to sighted or screen reader users, so it can be skipped. However, when the user navigates through editable areas, where the user should have full control, static elements should be in the navigation sequence.
The last type is generated content, that is, content created by the :before and :after pseudo-elements.
Everything should be achievable
The primary idea is to put rich elements into the navigation order together with the rich text so that the user can reach anything on the page via the keyboard. This also applies to form controls and generated content (e.g. content of the :after and :before pseudo-elements). The user should be able to select any visible content and copy it into the clipboard.
ARIA is about support for assistive technology (AT) such as screen readers, voice recognition technology, alternate input devices and magnifiers, which use heuristics to determine how to present a web page, or any application, to the user. In general, ARIA doesn't change content behavior and therefore doesn't affect users who don't use an AT.
On the other hand, ARIA always comes with custom widget development. It is always hard work to create a custom widget and make it behave correctly in the web page context without help from the browser. For example, if a div is used to create a button or a link then it is still a div from the browser's point of view. However, these widgets assume different keyboard navigation behavior. It would be really nice if the browser used ARIA as a widget classifier and adjusted the behavior to the widget type.
This proposal doesn't suggest that the browser make any assumptions about a widget's JS keyboard handling based on its ARIA role, since this is out of the scope of this proposal and might be considered incompatible with the custom widget development process. At the same time, it does suggest relying on the ARIA role to support page traversal patterns.
Along with the idea of putting everything into the navigation order, the user should be able to skip uninteresting content. To make this happen, the user chooses the type of page traversal. For example, navigation by words skips rich elements as long as they are not considered words, while navigation by characters allows the user to traverse everything.
Navigation order is defined within a navigation block, a logical union of navigable content. If the caret is inside a navigation block then the user should navigate the block entirely before the caret moves to the next navigation block in the layout.
- Character
A basic navigation lexical unit that defines the minimal step the user can perform when navigating through the content. It is used to build higher level navigation lexical units.
- Empty character
- Integral element
- In-text element
- Navigation block
A high level lexical unit that represents a logical union of elements grouped by the same navigation flow. It defines the minimal step when the user navigates by blocks.
- Rich element
- Rich text
A text or text containers containing rich elements or styled text.
- Sentence
A lexical unit consisting of words and delimiter characters.
- Static element
- Structural element
An element used to arrange the web content into visual and logical structure like HTML lists or tables.
- Word
A navigation lexical unit constructed from a sequence of characters that does not contain delimiter characters. It defines the minimal step the user can perform when navigating through the content by words.
[gaidukov: Is it necessary to add definitions for terms 'text element' and 'non text element'?]
[gaidukov: It would be nice to add simple example for each definition]
Rich element terminology
The rich element term refers to an element placed into the navigation sequence that is different from the text, text containers (like headings and paragraphs) and structural elements.
In short, rich elements can be classified by the following groups:
- in-text elements, i.e. elements behaving mostly as a part of the surrounding text
  - native focusable elements (e.g. HTML:a)
  - user defined focusable elements (e.g. HTML:div with tabindex="0")
  - ARIA widgets like @role="link"
- integral elements, i.e. elements having different navigation rules than rich text
  - elements containing navigable content
    - form controls like HTML input
    - ARIA widgets like @role="textbox"
  - elements not containing navigable content
    - form controls like HTML button or HTML select
    - ARIA widgets like @role="button"
    - disabled form controls and HTML image within the editable area
- static elements
  - disabled form controls and HTML image
Disabled form controls and HTML images are treated differently depending on whether these elements are inside the editable area or not. The term static elements refers to these elements if they are not currently part of editable content.
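The classification above can be sketched as a small function. This is only an illustration under stated assumptions: the Element record, the classify function and the three role/tag sets are hypothetical names introduced here, and the sets contain only the examples named in the list, not every HTML control or ARIA role.

```python
from dataclasses import dataclass
from typing import Optional

# Assumption: only the examples named in the list are covered here.
IN_TEXT_ROLES = {"link"}
INTEGRAL_ROLES = {"textbox", "button"}
FORM_CONTROLS = {"input", "button", "select"}


@dataclass
class Element:
    tag: str                        # lowercase HTML tag name
    role: Optional[str] = None      # value of the @role attribute, if any
    tabindex: Optional[int] = None  # value of the tabindex attribute, if any
    disabled: bool = False
    in_editable_area: bool = False


def classify(el: Element) -> str:
    """Return 'static', 'integral', 'in-text' or 'text' (not a rich element)."""
    # Disabled form controls and images are integral inside editable
    # content and static everywhere else.
    if el.tag == "img" or (el.tag in FORM_CONTROLS and el.disabled):
        return "integral" if el.in_editable_area else "static"
    if el.role in INTEGRAL_ROLES or el.tag in FORM_CONTROLS:
        return "integral"
    if el.role in IN_TEXT_ROLES or el.tag == "a" or el.tabindex is not None:
        return "in-text"
    return "text"
```

For instance, classify(Element("div", role="button")) yields 'integral' even though the element is a plain div from the browser's point of view, which is the kind of role-based classification the proposal asks for.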
Navigation order should be defined by navigation blocks. The user should navigate the block entirely before the caret moves to the next block. For example, if two blocks are visually placed next to each other and the user reaches the end of the last line of the first block then the caret should move to the first line of the next block.
The navigation blocks can be defined by
- layout flow controlled by CSS position or float properties;
- section elements like nav or footer elements;
- editable areas like HTML input element or div@contentEditable="true".
If section elements aren't used on the web page then all content of the normal flow is contained in one navigation block, which may contain nested blocks. In contrast to normal flow content, floated content and absolutely positioned content are represented by their own blocks, i.e. each container where the flow changes is represented by its own navigation block.
The navigation blocks can be nested. For example, if the content contains an editable area then the area is represented by its own block, which is contained in the parent block.
The next block is defined by the page layout. If two blocks overlap then the closer block is used first and the farther one next. If two blocks occupy the same place then z-index is used: the bottom content is excluded from the navigation order.
Navigation blocks are organized in a tree structure. Tab navigation order is defined within the navigation block unless it is changed via the tabindex attribute value.
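The three sources of navigation blocks listed above can be written as a single predicate. This is a minimal sketch: starts_navigation_block and its parameters are hypothetical names, the section tag set holds only the two tags the text mentions, and only position "absolute" is checked because the text does not discuss other positioning schemes.

```python
# Assumption: the text names only nav and footer as section elements.
SECTION_TAGS = {"nav", "footer"}


def starts_navigation_block(tag, position="static", float_="none",
                            content_editable=False):
    """Return True if an element opens its own navigation block."""
    # Layout flow changed by CSS position or float.
    if position == "absolute" or float_ != "none":
        return True
    # Section elements.
    if tag in SECTION_TAGS:
        return True
    # Editable areas: an HTML input or a contentEditable container.
    return tag == "input" or content_editable
```

An element for which the predicate is false stays in the block of its nearest ancestor that satisfies it, which gives the nesting described above.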
Rich element as a lexical unit
A rich element should be treated differently from rich text when the user navigates through a navigable area. The primary difference is that it should be possible to put the caret immediately before or after the rich element. This makes it possible to select only the rich element, as if it were not immediately surrounded by rich text. This requirement has special meaning for editable areas, where the user should be able to write text before or after the rich element.
[gaidukov: term 'navigable area' is not defined]
To make this happen, special autogenerated empty characters are inserted before and after the rich element. If rich elements are placed one after another then each of them keeps its own pair of empty characters, i.e. the elements don't share empty characters. For instance, if there are two rich elements next to each other then the sequence is (empty)(rich)(empty)(empty)(rich)(empty).
Empty characters designate the element boundaries when the element is treated as a lexical unit; however, their behavior differs from that of word delimiters such as 'space' characters.
Empty characters are not presented visually but they affect keyboard navigation. They do not affect the DOM; they are stored in the browser rendering model instead. They are also exposed to AT.
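The wrapping rule can be illustrated with a short sketch that builds the navigation sequence from a list of content items. The function name wrap_rich_elements, the (kind, value) item shape and the "(empty)" marker string are assumptions made for this example; a real implementation would keep this in the browser rendering model.

```python
EMPTY = "(empty)"  # stand-in marker for the autogenerated empty character


def wrap_rich_elements(items):
    """Build the navigation token sequence.

    items is a list of (kind, value) pairs where kind is 'text' or 'rich'.
    Every rich element gets its own pair of empty characters, so adjacent
    rich elements never share one.
    """
    tokens = []
    for kind, value in items:
        if kind == "rich":
            tokens += [EMPTY, f"({value})", EMPTY]
        else:
            tokens.append(value)
    return tokens


sequence = wrap_rich_elements([("rich", "rich"), ("rich", "rich")])
print("".join(sequence))  # (empty)(rich)(empty)(empty)(rich)(empty)
```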
Integral element is a word
[gaidukov: I guess it depends on focus. If focus is outside of the integral element then it is a word. If focus is inside the integral element then it is a sentence in the common case]
If the element doesn't contain any navigable text then its word doesn't have any characters and is referred to as an empty word, otherwise its word consists of all contained characters and it's referred to as a complex word.
For example, HTML button is an empty word since it doesn't contain navigable text. HTML input is a complex word since it allows navigable content. Another example is a non editable container element within the editable area, which is treated as an empty word if caret navigation mode is turned off.
The integral element is surrounded by empty characters.
In-text element as a sentence
Since an in-text element behaves as a part of the rich text, it should be treated as a sentence that consists of all the words of the contained text. At the same time it should be possible to set the caret before/after the element. To meet this requirement, empty characters are added before the first word and after the last word of the sentence to designate the sentence boundaries, as is done for rich words. The term rich sentence is used for the sentence of an in-text element.
Static element as a character
The caret position and selection terms
If the caret position is immediately before/after the start/end empty character of the rich element then the caret is immediately before/after the element.
If the caret position is between empty characters of the sibling rich elements (i.e. between the end and start empty characters) then the caret is between the elements.
If the caret position is between empty characters of the integral element represented by an empty word then the caret is on the element. If the rich element has navigable text then the caret is inside the element.
If the selection contains both empty characters of the rich element then the element is selected entirely.
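These terms can be expressed over caret offsets. The sketch below assumes a flattened sequence in which every character, including each empty character, occupies one offset; RichSpan, describe_caret and is_selected_entirely are hypothetical names introduced for illustration.

```python
from dataclasses import dataclass


@dataclass
class RichSpan:
    start: int  # caret offset just before the start empty character
    end: int    # caret offset just after the end empty character

    @property
    def is_empty_word(self):
        # Only the two empty characters, no navigable text between them.
        return self.end - self.start == 2


def describe_caret(caret, span):
    """Name the caret position relative to one rich element."""
    if caret == span.start:
        return "immediately before"
    if caret == span.end:
        return "immediately after"
    if span.start < caret < span.end:
        return "on" if span.is_empty_word else "inside"
    return "elsewhere"


def is_selected_entirely(sel_start, sel_end, span):
    """True if the selection contains both empty characters of the element."""
    return sel_start <= span.start and span.end <= sel_end
```

For two sibling rich elements the offset where the first span ends equals the offset where the second starts, so the same caret offset is reported as "immediately after" the first and "immediately before" the second, which corresponds to the caret being between the elements.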
The caret visual position
Since a rich word or sentence is wrapped in special delimiters, the same caret position can be referred to in two ways: "the caret is immediately before the rich element" and "the caret is immediately after the element preceding the rich element".
If the rich element and the element preceding the rich element are placed visually on different lines then the cursor could logically be rendered in either of two locations. The following rules apply.
- if the rich element follows or precedes the text container then the caret is drawn after/before the text.
- if two rich elements are nested then the caret is drawn after the first rich element.
Mapping to AT
Within the accessible of its text container, the accessible of a rich element is represented by an embed character. All characters of the rich word or sentence, including its empty characters, are contained in the embed character. The empty character should be exposed to AT as a specific character so that AT can get correct text offsets and announce them while the user navigates through the web page. For example, given 'text<button>btn</button>text', a screen reader can read 't', 'e', 'x', 't', 'before button', 't', 'e', 'x', 't'.
At the same time it should be an unpronounceable character so that text containing empty characters can be read without special support.
Keyboard interaction with the rich element
The rich element's behavior on keyboard input is the same as usual unless the global keyboard navigation rules conflict with the element's behavior. In this case the element should explicitly prevent the global action, i.e. if the element performs certain actions on the pressed key (e.g. an HTML select element changes the selected option on up/down keys) then it cancels the default event action.
If the in-text element is driven by global navigation rules (e.g. HTML:a or ARIA controls) then nothing special needs to be done. However, if the ARIA control handles and processes keyboard events then it should take care to prevent the default action explicitly.
For example, if a text field processes keyboard events to implement caret navigation then it should prevent the default action only while the caret can still be moved within the text field. Thus, when the caret reaches the end of the text field, global keyboard navigation applies and the caret moves out of the text field to the next keyboard-navigable content.
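The text field case can be modelled as a handler that reports whether it consumed the key. This is a sketch, not browser code: handle_arrow_key is a hypothetical function, the key names follow the DOM KeyboardEvent.key values, and the returned flag stands for a call to preventDefault in a real widget.

```python
def handle_arrow_key(key, caret, text_length):
    """Return (new_caret, default_prevented) for a custom text field.

    The field consumes the key only while the caret can still move inside
    it. At either end the key is left unhandled so that global keyboard
    navigation can move the caret out of the field.
    """
    if key == "ArrowRight" and caret < text_length:
        return caret + 1, True
    if key == "ArrowLeft" and caret > 0:
        return caret - 1, True
    return caret, False
```

With a three-character field, pressing the right arrow at offset 2 moves the caret to offset 3 and cancels the default action, while pressing it again at offset 3 leaves the event for the global navigation rules.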
The following conditions assume that the user navigates forward by words (ctrl or alt (option) + left/right arrow key, depending on platform and text direction). The rules are mirrored if the user moves backward.
When processing an integral element:
- if the next word is the word of an integral element (i.e. the caret is on the word preceding the element word) then the caret should be set before the word;
- if the caret is before the word then the element should be skipped and the caret should be set before the next word;
- if the caret is on the word then the user should traverse the text of the word entirely (in the case when the integral element is represented by a complex word) and then the caret should be set before the word that follows it.
If the caret is on the word of a focusable element and moves out of it then the rich text container of the element should be focused. For example, if an HTML button is focused (which means the caret is on the element) and the user moves to the next word then the caret should be set before the word following the button.
When processing an in-text element:
- if the next word is the first word of the sentence of an in-text element then the caret should be placed directly before the element;
- if the caret is placed directly before the first word of the sentence, then a move to the next word would place the caret directly in front of the second word of the sentence;
- if the caret is placed directly before the first word of the sentence, and the sentence contains only one word, then a move to the next word would result in placing the caret directly in front of the word following the sentence.
In this way the in-text element is processed as normal text.
If the caret is in the middle of the rich sentence then the user should navigate the whole sentence by words. For example, if the caret is inside an anchor then the caret should be moved by words through the anchor text.
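The word rules for integral and in-text elements can be summarized by listing the caret stops that forward word navigation visits when it starts outside any integral element. The sketch below is a simplified model: word_stops and the (kind, text) item shape are assumptions, and words are split on whitespace only.

```python
def word_stops(items):
    """List the caret stops visited by forward word navigation.

    items is a list of (kind, text) pairs with kind in
    {'text', 'integral', 'in-text'}. An integral element contributes one
    stop and is then skipped as a whole; an in-text element contributes
    one stop per contained word, like normal text.
    """
    stops = []
    for kind, text in items:
        if kind == "integral":
            stops.append(f"before <{text}>")
        else:
            stops.extend(f"before '{word}'" for word in text.split())
    return stops


page = [("text", "see the"), ("integral", "OK"),
        ("in-text", "home page"), ("text", "now")]
for stop in word_stops(page):
    print(stop)
```

In this model the stop before the first word of an in-text element is the position directly before the element, so the anchor "home page" is entered and left exactly like a two-word sentence.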
If the caret is inside the focusable in-text element and the caret goes out of the element then the area where the element is placed should be focused. If the caret is outside the focusable in-text element and the caret goes into it then the element should be focused.
If the user navigates by characters (left/right arrow keys) and the rich element is encountered on the way then
- if the element is focusable then it should be focused
- if the element is an in-text element or contains navigable text then the caret should be set before the first character of the element's content
- otherwise the caret should be set on the element, i.e. placed between the empty characters.
Visually, the caret set on a non-focusable element might look like a dashed blue border around the element. For example, when the caret is on an HTML button then the button is focused; when the caret is on a disabled HTML input (which doesn't contain navigable text) within the editable area then the border should appear around it to show that the element is current.
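The character navigation rules above can be read as two independent decisions: whether to focus the element, and where to put the caret. The sketch below follows that reading, which is an assumption suggested by the HTML button example; Rich and enter_by_character are hypothetical names.

```python
from dataclasses import dataclass


@dataclass
class Rich:
    focusable: bool
    in_text: bool
    has_navigable_text: bool


def enter_by_character(el):
    """Return (focus_element, caret_target) when the caret reaches el."""
    if el.in_text or el.has_navigable_text:
        caret_target = "before first character"
    else:
        # Placed between the empty characters of the element.
        caret_target = "on element"
    return el.focusable, caret_target


html_button = Rich(focusable=True, in_text=False, has_navigable_text=False)
disabled_input = Rich(focusable=False, in_text=False, has_navigable_text=False)
print(enter_by_character(html_button))     # (True, 'on element')
print(enter_by_character(disabled_input))  # (False, 'on element')
```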
If an integral element is focused and the user navigates by characters then
- the text container should be focused and the caret should be set after the element if the pressed key isn't processed by the element
- otherwise the element's action should be performed.
If an in-text element is focused and the user navigates by characters then the caret should move consecutively to the end of the element's content, then be set after the element, and the text container should take the focus.
When the user navigates up/down the lines (up/down arrow keys) then the caret should be placed before the character visually placed above/below the starting visual caret position (anchor position).
If an integral element is placed above/below the anchor position then the caret should be set between empty characters of the element, i.e. integral element should be focused or selected.
If a static element is placed above/below the anchor position then the caret should be set before or after the element, depending on which edge of the element is closer to the anchor position.
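The edge choice for a static element reduces to a distance comparison. This sketch assumes left-to-right text, so that "before" corresponds to the left edge, and it resolves a tie in favour of "before"; both choices and the function name caret_side_for_static are assumptions, since the text does not specify them.

```python
def caret_side_for_static(anchor_x, left_edge, right_edge):
    """Pick the caret side for a static element above/below the anchor.

    Coordinates are horizontal positions in the same unit. Assumes
    left-to-right text; a tie goes to 'before'.
    """
    distance_to_start = abs(anchor_x - left_edge)
    distance_to_end = abs(anchor_x - right_edge)
    return "before" if distance_to_start <= distance_to_end else "after"
```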
If the caret is on the element (between its empty characters) then the caret should be placed before the element that is closest to the start edge of the element.
If an integral element is focused and it has its own processing of up/down arrow keys then up/down navigation is not permitted. In this case up/down arrow + ctrl (option) keys should be used instead.
When the user presses the home/end keys the caret should move to the beginning/end of the current line. Note that the line is defined within structural elements; for instance, when structural elements are nested (like nested HTML tables) the line is the row of the innermost table containing the caret.
When the user presses home/end + ctrl (option) keys the caret should move to the beginning/end of the navigation block.
[surkov: I find handy if repeat home/end(ctrl) would move the caret by lines/blocks. Any way now repeat presses do nothing and we don't have an ability to move by lines/blocks]
If the element is focused then pressing tab should navigate to the next focusable element in tab order. The next focusable element in tab order can be either inside or outside the navigation block the element belongs to. This requirement applies to both integral elements and in-text elements.
Managing the selection
When an integral element participates in a selection then it should always be selected entirely (atomically). For example, if the user starts the selection before an integral element having navigable text then the navigable text cannot be selected partially.
If the integral element has its own selection behavior (like an HTML input) while it's focused and the user starts the selection inside it then there is no way to extend the selection beyond the boundaries of the element.
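Atomic selection of integral elements can be sketched as snapping a raw selection range to element boundaries. The example assumes a selection that starts outside the element, caret offsets over a flattened sequence, and integral spans that do not nest; snap_selection is a hypothetical name.

```python
def snap_selection(sel_start, sel_end, integral_spans):
    """Expand a selection so every touched integral element is whole.

    integral_spans is a list of (start, end) caret offsets, one pair per
    integral element, including its empty characters. Spans are assumed
    not to overlap each other.
    """
    start, end = sel_start, sel_end
    for span_start, span_end in integral_spans:
        overlaps = span_start < end and start < span_end
        if overlaps:
            start = min(start, span_start)
            end = max(end, span_end)
    return start, end
```

A selection from offset 2 to 6 that reaches into an integral element occupying offsets 4 to 9 becomes (2, 9), while a selection that only touches the element boundary at offset 4 is left unchanged.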
The in-text element with its content participates in selection as a part of surrounding content.
Visually, an entirely selected element might have blue background.
When the user holds the selection modifier key (for example, the shift key) and moves through the text, the text container stays focused.
At the same time, if an integral element is encountered on the way then the element is appended to the selection in its entirety.
If an in-text element is encountered on the way then the element's content is appended consecutively to the selection until the user reaches the end of the element. The focus is on the in-text element while the caret is inside the element. If the selection is started inside an in-text element while it is focused then the text container takes the focus once the caret leaves the in-text element's content.
If the rich element is selected entirely then it can be copied to the clipboard. Both its text representation and the element itself should be copied to the clipboard as different MIME types.
If the in-text element is selected partially then the selected text is copied to the clipboard. The element itself is not copied.
If the rich element is focused then clipboard operations work as usual.
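The two copy cases can be sketched as a function that builds the clipboard payload. The text only requires "different MIME types"; the concrete types text/plain and text/html, the dictionary shape and the name clipboard_payload are assumptions made for this example.

```python
def clipboard_payload(markup, text, selected_entirely):
    """Return a mapping from MIME type to the data placed on the clipboard.

    markup is the serialized element, text is the selected text. The
    element itself is offered only when it was selected entirely.
    """
    payload = {"text/plain": text}
    if selected_entirely:
        payload["text/html"] = markup
    return payload
```

Copying a fully selected anchor would then offer both its text and its markup, while copying part of the anchor text would offer the text alone.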
If the rich element is pasted from clipboard into editable area then it is pasted as an element.