Accessibility/TextImplementation

From MozillaWiki

Purpose

Provide a text interface implementation that is consistent with keyboard navigation.

Current state

The text of an accessible is exposed as a string containing embedded object characters, each of which points to a nested accessible. Nested accessibles may themselves be text accessibles. To get the next character or word, the AT has to crawl the tree until it reaches a real character or word. This approach is suboptimal, but the main problem is that the text obtained by tree inspection is not necessarily consistent with keyboard navigation.
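To make the crawling concrete, here is a minimal Python sketch of the current model. `HyperText`, `embedded_object_at` and `crawl_chars` are hypothetical names, not Gecko or AT APIs; the sketch assumes that text leaves are inlined into their parent's text, that every other child is represented by one U+FFFC (OBJECT REPLACEMENT CHARACTER), and that an embedded character can be resolved to the child it stands for.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Iterator, Union

EMBEDDED_CHAR = "\ufffc"  # U+FFFC OBJECT REPLACEMENT CHARACTER


@dataclass
class HyperText:
    """Hypothetical accessible: children are text leaves (str) or nested HyperText."""
    children: list[Union[str, HyperText]] = field(default_factory=list)

    def text(self) -> str:
        # Text leaves are inlined; each nested accessible becomes one embedded char.
        return "".join(c if isinstance(c, str) else EMBEDDED_CHAR for c in self.children)

    def embedded_object_at(self, offset: int) -> HyperText:
        # Resolve the embedded char at `offset` to the nested accessible it stands for.
        pos = 0
        for child in self.children:
            length = len(child) if isinstance(child, str) else 1
            if pos <= offset < pos + length:
                if isinstance(child, str):
                    raise ValueError("offset is not an embedded object character")
                return child
            pos += length
        raise IndexError(offset)


def crawl_chars(acc: HyperText) -> Iterator[str]:
    """Yield real characters, descending into every embedded object as an AT must today."""
    for offset, ch in enumerate(acc.text()):
        if ch == EMBEDDED_CHAR:
            yield from crawl_chars(acc.embedded_object_at(offset))
        else:
            yield ch
```

For `text<a>link</a>text`, `HyperText(["text", HyperText(["link"]), "text"]).text()` contains a single embedded character, and only `crawl_chars` recovers "textlinktext".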

Let's consider two examples:

text<a>link</a>text

and

text<a>a link</a>text

Both are exposed as "textXtext", where X is the embedded object character. A text implementation relying on the embedded character approach exposes three words in each case. From the keyboard navigation point of view, however, these examples contain one and two words respectively. Unless the AT makes its own assumptions about word boundaries based on tree inspection, there is no way to get words consistently with keyboard navigation.
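The mismatch between the two examples can be reproduced with two toy word splitters. This Python sketch is a simplification: `words_embedded` and `words_flattened` are hypothetical helpers, and it assumes that whitespace is the only word boundary keyboard navigation uses here, while the embedded-character approach also breaks at, and counts, each embedded object.

```python
import re

EMBEDDED_CHAR = "\ufffc"  # U+FFFC, the embedded object character


def words_embedded(text: str) -> list[str]:
    """Split exposed text, treating each embedded char as a word boundary and as an opaque word."""
    return re.findall(r"\ufffc|[^\s\ufffc]+", text)


def words_flattened(text: str) -> list[str]:
    """Split fully unrolled text on whitespace only (a simplified keyboard-navigation model)."""
    return text.split()


exposed = "text" + EMBEDDED_CHAR + "text"  # both examples are exposed this way
print(len(words_embedded(exposed)))             # 3 for either example
print(len(words_flattened("textlinktext")))     # 1 for text<a>link</a>text
print(len(words_flattened("texta linktext")))   # 2 for text<a>a link</a>text
```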

Proposal

Recursively unroll embedded characters into one string for every text accessible. The HyperLink interface still provides access to the direct embedded objects.

For example,

<body>
<p>hello</p><p>text<a>link</a></p>
</body>

has the text "hellotextlink"; the container accessible (the document accessible in this case) has two embedded objects with offsets [0, 5) and [5, 13). The second paragraph has the text "textlink" and one embedded object at [4, 8).
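The offsets in this example can be checked with a small model of the proposal. This is a minimal Python sketch assuming a hypothetical `Node` type whose children are text leaves (plain strings) or nested accessibles; `unrolled_text` and `embedded_objects` are illustrative names, not the HyperLink interface itself.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Union


@dataclass
class Node:
    """Hypothetical accessible: children are text leaves (str) or nested accessibles."""
    name: str
    children: list[Union[str, Node]] = field(default_factory=list)


def unrolled_text(node: Node) -> str:
    # Recursively replace every embedded object with its own unrolled text.
    return "".join(c if isinstance(c, str) else unrolled_text(c) for c in node.children)


def embedded_objects(node: Node) -> list[tuple[str, int, int]]:
    """Direct embedded objects with half-open [start, end) offsets into the unrolled text."""
    result = []
    offset = 0
    for child in node.children:
        length = len(child) if isinstance(child, str) else len(unrolled_text(child))
        if not isinstance(child, str):
            result.append((child.name, offset, offset + length))
        offset += length
    return result


# <body><p>hello</p><p>text<a>link</a></p></body>
p2 = Node("p2", ["text", Node("a", ["link"])])
body = Node("body", [Node("p1", ["hello"]), p2])
print(unrolled_text(body))     # hellotextlink
print(embedded_objects(body))  # [('p1', 0, 5), ('p2', 5, 13)]
print(embedded_objects(p2))    # [('a', 4, 8)]
```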

Implementation

Keep a tree representation of the text, meaning that no combined text is stored at each level. The accessible text interface encapsulates the recursive tree traversal needed to expose the unified text.
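One way to read "no combined text for each level" is that only leaves hold characters and every query descends the tree. The Python sketch below illustrates that under stated assumptions: `Node`, `length`, `char_at` and `text_between` are hypothetical names, and lengths are recomputed by traversal rather than cached.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Union


@dataclass
class Node:
    """Hypothetical accessible: only leaves (str) hold text; containers store no combined string."""
    children: list[Union[str, Node]] = field(default_factory=list)


def length(item: Union[str, Node]) -> int:
    # Length of a subtree's unified text, computed by traversal.
    return len(item) if isinstance(item, str) else sum(length(c) for c in item.children)


def char_at(node: Node, offset: int) -> str:
    """Character at `offset` of the unified text, found by descending into the right child."""
    if offset < 0:
        raise IndexError("offset out of range")
    for child in node.children:
        n = length(child)
        if offset < n:
            return child[offset] if isinstance(child, str) else char_at(child, offset)
        offset -= n
    raise IndexError("offset out of range")


def text_between(item: Union[str, Node], start: int, end: int) -> str:
    """Unified text in [start, end), gathered from the leaves that overlap the range."""
    if isinstance(item, str):
        return item[max(start, 0):max(end, 0)]
    parts = []
    for child in item.children:
        n = length(child)
        if end > 0 and start < n:  # child overlaps the requested range
            parts.append(text_between(child, start, end))
        start -= n  # shift the range into the next child's coordinates
        end -= n
    return "".join(parts)


body = Node([Node(["hello"]), Node(["text", Node(["link"])])])
print(char_at(body, 9))           # l
print(text_between(body, 3, 9))   # lotext
```

Recomputing `length` at every step costs time proportional to the subtree size; caching subtree lengths would avoid that, but the proposal does not say how lengths are kept.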

Cached text or live rendered text?

For text accessibles we cache their text representation; some accessibles, such as list bullets, get their text from the frame tree. The primary purpose of the cached text is firing text change events: layout doesn't tell us what was changed, so we are forced to calculate the text difference on the a11y side.
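Calculating the text difference on the a11y side can be illustrated by comparing the cached text with the new text. This Python sketch uses a simple common-prefix/common-suffix trim that yields one removal and one insertion; it is not necessarily the diff Gecko uses, and `TextChange` and `text_diff` are hypothetical names.

```python
from typing import NamedTuple, Optional


class TextChange(NamedTuple):
    offset: int    # where the change starts in both old and new text
    removed: str   # text to report in a "text removed" event (may be empty)
    inserted: str  # text to report in a "text inserted" event (may be empty)


def text_diff(old: str, new: str) -> Optional[TextChange]:
    """Find the changed span by trimming the common prefix and common suffix."""
    if old == new:
        return None
    limit = min(len(old), len(new))
    prefix = 0
    while prefix < limit and old[prefix] == new[prefix]:
        prefix += 1
    suffix = 0
    # Stop before the suffix would overlap the prefix in the shorter string.
    while (suffix < limit - prefix
           and old[len(old) - 1 - suffix] == new[len(new) - 1 - suffix]):
        suffix += 1
    return TextChange(prefix, old[prefix:len(old) - suffix], new[prefix:len(new) - suffix])


print(text_diff("hello world", "hello brave world"))
# TextChange(offset=6, removed='', inserted='brave ')
```

A non-empty `removed` would be reported as a text removed event and a non-empty `inserted` as a text inserted event at `offset`.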

Currently the methods that return the whole text or text by characters operate on the cached text. The methods that return text by words and lines operate on the frame tree, because the cached text representation doesn't know what a word or a line is.

The cached text and the rendered text may be out of sync because rendered text changes are processed asynchronously. That means all methods should operate on the same text. Ideally that would be the cached text, because it keeps the results of text methods consistent with text events; for example, you don't get inserted text before the text inserted event is fired. That doesn't look like a big deal, though, because tree insertions have the same issue: the AT can operate on an inserted subtree before it is notified about its insertion.

If we operate on cached text, then we need to:

  1. share word detection logic with layout
  2. cache line breaks (this makes sense especially for soft line breaks, which are defined entirely on the layout side).
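Point 2 could look roughly like the following Python sketch of a line-break cache. It assumes, hypothetically, that layout reports the start offset of every line (soft wrap or hard break) whenever it reflows, so line queries can be answered from the cached text with a binary search; `CachedLines`, `line_at` and `line_text` are illustrative names only.

```python
from __future__ import annotations

import bisect
from dataclasses import dataclass


@dataclass
class CachedLines:
    """Hypothetical cache: the accessible's cached text plus sorted line start offsets."""
    text: str
    line_starts: list[int]  # sorted; line_starts[0] == 0; includes soft line breaks

    def line_at(self, offset: int) -> tuple[int, int]:
        """Return the [start, end) range of the line containing `offset`."""
        if not 0 <= offset <= len(self.text):
            raise IndexError(offset)
        i = bisect.bisect_right(self.line_starts, offset) - 1
        end = self.line_starts[i + 1] if i + 1 < len(self.line_starts) else len(self.text)
        return self.line_starts[i], end

    def line_text(self, offset: int) -> str:
        start, end = self.line_at(offset)
        return self.text[start:end]


# "the quick brown fox" soft-wrapped by layout after "the quick ".
lines = CachedLines("the quick brown fox", [0, 10])
print(lines.line_text(3))   # the quick (trailing space included)
print(lines.line_text(12))  # brown fox
```

Keeping line starts as a sorted list makes lookups logarithmic, but the list has to be refreshed from layout on every reflow, which is exactly the synchronization cost discussed above.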

If we operate on live text, then we need to make sure it is safe and stable when the AT operates on accessible objects from its own process. What happens if it calls into us in the middle of a frame tree update?