This draft describes how current desktop and mobile browsers, together with their screen readers, handle accessible browsing, and how a possible accessible browsing solution from our side should compare.
Browsers both on the desktop and on mobile platforms have adopted certain commonalities in navigation and presentation of web pages. All desktop screen readers for the blind present a web page in a linear fashion, with the visual column layout removed. The order of presentation is the order in which elements are present in the HTML source (and in the derived DOM and accessibility trees). On Windows, this is done in what screen readers commonly refer to as virtual buffers. On the GNOME Desktop and on the Mac, the DOM and/or accessibility trees are traversed in a similar fashion.
In all these instances, users can use sequential navigation, such as down-arrowing through a page, to reach all information in the order specified in the HTML. Semantic information such as links, headings, form fields, graphics, and landmarks is given along with the actual text via text-to-speech.
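The linear presentation described above amounts to a depth-first walk of the tree in document order. The following is only a minimal sketch of that idea, not any screen reader's actual implementation; the `Node` class, its `role` and `text` fields, and the `linearize` function are hypothetical names chosen for illustration.

```python
from dataclasses import dataclass, field
from typing import Iterator, List


@dataclass
class Node:
    """Hypothetical stand-in for a DOM or accessibility tree node."""
    role: str                       # e.g. "heading", "link", "text"
    text: str = ""
    children: List["Node"] = field(default_factory=list)


def linearize(node: Node) -> Iterator[str]:
    """Yield one announcement per node that carries text, in document order."""
    if node.text:
        # Semantic information is spoken together with the text itself.
        yield node.text if node.role == "text" else f"{node.role}: {node.text}"
    for child in node.children:
        yield from linearize(child)


# A tiny example page: a heading followed by a section with a link and text.
page = Node("document", children=[
    Node("heading", "News"),
    Node("section", children=[
        Node("link", "Read more"),
        Node("text", "Plain paragraph text."),
    ]),
])

announcements = list(linearize(page))
```

Each down-arrow press would then correspond to moving one step forward in `announcements`, regardless of how the page is laid out visually.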
On iOS devices, this linear navigation is also possible. However, due to the 2D nature of the touch screen, users actually have more freedom to navigate, since they can touch a web page virtually anywhere and therefore even get a feel for the layout of the page as specified by the CSS. While the one-finger sweep right and left VoiceOver gestures still traverse the page as described above, touching and dragging gives the element that is currently under the finger, regardless of DOM order. Especially on pages where users know their way around, this can significantly speed up navigation.
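Exploration by touch can be thought of as a hit test against the rendered layout rather than a step through the document order. The sketch below illustrates that difference under stated assumptions: the `Box` class, the pixel coordinates, and the rule that later elements are painted on top are all hypothetical, and this is not how VoiceOver is known to be implemented.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Box:
    """Hypothetical on-screen element with a bounding rectangle in pixels."""
    label: str
    x: int
    y: int
    width: int
    height: int

    def contains(self, px: int, py: int) -> bool:
        return (self.x <= px < self.x + self.width
                and self.y <= py < self.y + self.height)


def element_at(boxes: List[Box], px: int, py: int) -> Optional[Box]:
    """Return the element under the finger, or None if nothing is there.

    Assumption: when rectangles overlap, the later element in the list
    is painted on top and therefore wins.
    """
    hit = None
    for box in boxes:
        if box.contains(px, py):
            hit = box
    return hit


# The sidebar link comes first in document order but sits on the right.
boxes = [
    Box("sidebar link", 200, 0, 120, 40),
    Box("main heading", 0, 0, 200, 40),
]

touched_right = element_at(boxes, 250, 10)
touched_left = element_at(boxes, 10, 10)
touched_nowhere = element_at(boxes, 500, 500)
```

A sequential sweep would reach the sidebar link before the main heading, while a touch on the left edge of the screen lands on the heading directly.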
All desktop screen readers, Symbian browsing implementations and also the iOS devices implement a means of quickly navigating to certain elements. This is done by single-letter navigation, or, in the case of the iOS devices, through the VoiceOver rotor, which can be set to jump to certain types of elements with a one-finger sweep up and down. Common elements these features navigate to are:
- links, sometimes also specifically visited and not visited links
- form fields, either generally or, in some desktop cases, specific ones such as the next textbox, combobox, radio button, checkbox, or button. Textboxes and textareas are not distinguished; they are treated as one unit.
- landmarks, both WAI-ARIA and HTML5.
- list items
- block quotes
So, regardless of where one currently is on the page, jumping to the next or previous heading is done with just one keystroke (or one sweep of the finger). Whether this skips ten links or a hundred does not matter; it simply goes to the next heading. Especially on pages with good HTML structure, or on pages the user already knows, this tremendously speeds up navigation.
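Quick navigation of this kind can be sketched as a search through the linearized page for the nearest element of the requested type. This is an illustrative sketch only; the `(role, text)` tuples, the role names, and the `find_next` function are hypothetical and not taken from any particular screen reader.

```python
from typing import List, Optional, Tuple

# Hypothetical linearized page: (role, text) pairs in document order.
Item = Tuple[str, str]


def find_next(items: List[Item], current: int, role: str,
              forward: bool = True) -> Optional[int]:
    """Return the index of the nearest item with `role` after (or, when
    `forward` is False, before) `current`, or None if there is none."""
    step = 1 if forward else -1
    index = current + step
    while 0 <= index < len(items):
        if items[index][0] == role:
            return index
        index += step
    return None


items = [
    ("heading", "Top stories"),
    ("link", "Story one"),
    ("link", "Story two"),
    ("link", "Story three"),
    ("heading", "Sports"),
    ("link", "Scores"),
]

next_heading = find_next(items, 0, "heading")  # passes over the three links
previous_heading = find_next(items, 5, "heading", forward=False)
no_more_headings = find_next(items, 4, "heading")
```

The same function serves a keystroke on the desktop or a rotor sweep on a touch screen; only the input that triggers it differs.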
The forms mode concept
On Windows, where each screen reader implements virtual buffers for performance reasons, the screen readers also implement something like a forms mode or focus mode. In that mode, the user interacts directly with the browser, whereas in virtual mode, they interact with their screen reader, which in turn tells the browser whether it should refocus, scroll, etc., as needed.
On GNOME, Mac OS and iOS, this is not the case. When the user is in a form field that can be interacted with (for example, typing into a text field), quick navigation is automatically turned off and the typed letters are passed to the field.
Symbian uses a mixed approach that closely resembles the Windows screen reader approach but is not quite the same, since the Symbian implementations even put up their own dialogs to enter text and then reinsert that text into the browser themselves.
On iOS, activating a form field is seamless: the keyboard pops up and one can start typing.
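The automatic switching described for GNOME, Mac OS and iOS can be sketched as a key dispatcher that checks where focus currently is before deciding what a key press means. This is a minimal sketch under assumptions: the key bindings in `QUICK_NAV_KEYS`, the role names in `EDITABLE_ROLES`, and the `handle_key` function are hypothetical and only illustrate the idea.

```python
from typing import Optional

# Hypothetical mapping of quick-navigation keys to element types.
QUICK_NAV_KEYS = {"h": "heading", "k": "link", "f": "form field"}

# Roles assumed (for this sketch) to accept typed characters.
EDITABLE_ROLES = {"textbox", "combobox"}


def handle_key(key: str, focused_role: Optional[str]) -> str:
    """Decide what a key press should do, based on the focused element."""
    if focused_role in EDITABLE_ROLES:
        # Inside an editable field, quick navigation is off:
        # the key is passed through as typed text.
        return f"type:{key}"
    target = QUICK_NAV_KEYS.get(key)
    if target is not None:
        return f"jump:{target}"
    return "ignore"


outside_field = handle_key("h", None)
inside_field = handle_key("h", "textbox")
unbound_key = handle_key("z", "link")
```

Because the decision depends only on the focused element, the user never has to toggle a mode explicitly, which is the behaviour the Windows forms mode approach lacks.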
For accessible mobile Firefox
If we go the route of implementing our own voicing solution, we need:
- Means of exploration, at least sequentially, maybe even spatially, as on iOS devices
- Means of quick navigation. This is absolutely necessary for efficient use of web apps on mobile devices!
- A seamless interaction model both inside and outside of form elements. To the user, this should be completely transparent: we should always know where the user is and which paradigm to use when keys are pressed, etc.