Accessibility/CacheTheWorld: Difference between revisions
(Fix backlog query so it doesn't include resolved bugs.) |
(Rename "About" section to "What?". Add "Why?" section. Add Bugzilla sections for m2 and m3.) |
Revision as of 03:17, 25 February 2022
What?
Firefox's current architecture for multi-process accessibility suffers from severe performance issues and is costly and difficult to maintain due to the massively different and specialised approaches necessary on different operating systems. In addition, it is currently impossible to support built-in Windows accessibility tools such as Narrator and Windows Speech Recognition. This project aims to re-architect our multi-process accessibility support to cache the entire accessibility trees for all content processes within the parent process.
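To make the idea concrete, here is a minimal conceptual sketch of a parent-process cache. It is not Gecko code: the class names, fields and methods (`CachedAccessible`, `ParentProcessCache`, `push_update`, and so on) are hypothetical, and the assumption that content processes push updates asynchronously is ours for illustration. The point is only that, once the tree is cached, queries from assistive technology can be answered from local data rather than by calling into a content process.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class CachedAccessible:
    """Hypothetical snapshot of one accessibility node, held in the parent process."""
    role: str
    name: str = ""
    child_ids: list[int] = field(default_factory=list)


class ParentProcessCache:
    """Hypothetical cache keyed by (content process id, node id)."""

    def __init__(self) -> None:
        self._nodes: dict[tuple[int, int], CachedAccessible] = {}

    def push_update(self, process_id: int, node_id: int, node: CachedAccessible) -> None:
        # Assumed to be called whenever a content process reports a new or changed node.
        self._nodes[(process_id, node_id)] = node

    def remove_process(self, process_id: int) -> None:
        # Drop everything cached for a content process that has gone away.
        self._nodes = {k: v for k, v in self._nodes.items() if k[0] != process_id}

    def node_count(self) -> int:
        return len(self._nodes)

    def name_of(self, process_id: int, node_id: int) -> str:
        # Answered entirely from cached data: no call into the content process.
        return self._nodes[(process_id, node_id)].name

    def child_names(self, process_id: int, node_id: int) -> list[str]:
        node = self._nodes[(process_id, node_id)]
        return [self._nodes[(process_id, child)].name for child in node.child_ids]


cache = ParentProcessCache()
cache.push_update(1, 0, CachedAccessible("document", "Example page", [1, 2]))
cache.push_update(1, 1, CachedAccessible("heading", "Title"))
cache.push_update(1, 2, CachedAccessible("link", "More"))
print(cache.child_names(1, 0))
```

In this sketch a query never blocks on another process, which is the property the project is after; keeping the cached data fresh is the hard part and is not modelled here.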
Why?
This will allow us to address several problems that are difficult or impossible to fix with the current architecture:
- Performance, especially on Windows. While performance is acceptable for daily usage in most cases, there are many use cases which are far from delightful and some which are still completely unusable. Performance with the JAWS screen reader, which is used heavily in enterprise, is sluggish at best. A great deal of work has been done over the past few years to improve this, but we're approaching a point where we will not be able to improve this any further with the current architecture. Because software other than assistive technology uses accessibility APIs (e.g. Windows touch, East Asian input methods, enterprise SSO tools), this can impact even users without disabilities. The proposed new architecture will allow us to significantly improve performance across all operating systems. See bug 1737192 for a list of performance bugs which we expect will be fixed (or at least improved) by Cache the World.
- Stability. Because the current architecture makes heavy use of synchronous IPC, there is a high risk of deadlocks between accessibility and other Firefox components. While all known cases have been addressed as they have been discovered, the underlying cause remains and future instances of this problem are very likely. In addition, our use of obscure COM features on Windows has resulted in stability problems which are extremely difficult to diagnose and fix. One of these remains a problem today, despite months of investigation, and forces some NVDA screen reader users to kill Firefox (or worse, power off their computers) every few hours. The proposed new architecture will not suffer from these inherent stability risks and known problems. See bug 1737193 for a list of stability/reliability bugs which we expect will be fixed (or at least improved) by Cache the World.
- Complexity and cost. Our existing architecture is necessarily very different on each operating system, making it extremely complex and difficult to maintain. For example, the IPC layer on Windows (~8,000 lines of code) has an entirely different architecture to other platforms. This also means that maintenance is very costly, especially when implementing support for new operating systems (e.g. Android, Mac) or major Gecko architectural initiatives (e.g. Fission). Furthermore, our use of esoteric operating-system-specific features, especially on Windows (where we depend on a whole separate ~11,000-line module), makes it very difficult for this work to be distributed across the team because of the highly specific expertise required. In the proposed new architecture, most of the "heavy lifting" will be done in cross-platform code, decreasing complexity and maintenance cost.
- Support for other built-in accessibility tools. It is impossible to support Windows Narrator and Windows Speech Recognition with our current architecture. The proposed new architecture will allow for this. As these built-in tools rise in popularity, we do not want Firefox to be left behind.
Meeting Notes
Jamie, Morgan, and Eitan meet weekly to discuss this project. You can find the meeting notes in this Google Doc.
Roadmap
Given the large scope of the project, we are breaking the project down into quarterly milestones. Each milestone will aim to support a set of user scenarios. This roadmap is in the early stages and subject to significant change. It will be updated as milestones become clearer, with future milestones being less well defined than earlier ones.
Milestone 0: December 2021
Initial proof of concept.
All testing will be performed with the NVDA screen reader, for two reasons:
- Cache the World is all or nothing on Windows. That makes it easy to determine where we're at regarding real world usage.
- The performance benefits are most necessary and noticeable on Windows.
In milestone 0, the following capabilities will be provided:
- Reading and navigating a page with text, links and headings.
- Plain text editing: reading the line of text when focused; backspacing; moving the caret by line, word and character.
- Access to formatting information: font, bold/italic, etc.
- Access to screen coordinates on simple pages.
- Loading very large pages will be at least 10x faster with the cache than without.
Test Scenarios
- Do a Google search, navigate the results using heading navigation and follow a result link.
- Fill out the form for a Google Advanced Search.
- Go to https://www.reaper.fm. Check the formatting of the “This is REAPER.” text (which isn’t a heading even though it should be) and confirm that the font size is reported as bigger than the paragraph of text below it.
- Build Gecko in the background. Do a Google search. Verify that the browser does not become unresponsive.
- Load https://searchfox.org/mozilla-central/source/layout/base/nsCSSFrameConstructor.cpp. The page should be usable in under 10 seconds.
- Open Gmail, find a message in the inbox, open it, read it, return to the inbox.
- Compose a message in Gmail containing text, a link, a bulleted list and a block quote. Read back through the message.
- Open Slack, use the quick switcher to switch to a channel, read some messages, write a message.
Bugzilla
Note that the roadmap wasn't created until late in milestone 0, so many bugs are missing below.
| ID | Summary | Assigned to | Status |
|---|---|---|---|
| 1735955 | Cached bounds all 0s for many (most?) elements on Google search | James Teh [:Jamie] | RESOLVED |
| 1739050 | If the focused Accessible is moved, the RemoteAccessible is recreated but focus isn't fired on it (AKA broken Google Search box on Windows + CTW) | James Teh [:Jamie] | RESOLVED |
| 1741792 | Cache the caret | James Teh [:Jamie] | RESOLVED |
| 1742902 | Fix window emulation when the cache is enabled | James Teh [:Jamie] | RESOLVED |
| 1742915 | Cache tag object attribute | James Teh [:Jamie] | RESOLVED |
| 1742917 | Implement StartOffset for RemoteAccessible and LinkIndexAtOffset for HyperTextAccessibleBase | James Teh [:Jamie] | RESOLVED |
| 1746827 | Crash in [@ PLDHashTable::Search \| mozilla::a11y::RemoteAccessibleBase<T>::MinValue] | James Teh [:Jamie] | RESOLVED |
7 Total; 0 Open (0%); 7 Resolved (100%); 0 Verified (0%);
Milestone 1: March 2022
Android.
The primary focus of this milestone is getting the cache working for Android. Mozilla aims to implement Fission for Android in the first half of 2022. Modifying the existing multi-process architecture to support Fission on Android would require significant engineering effort. Rather than investing in a solution which we would throw away once the cache is implemented, we will instead switch Android to use the cache and extend the cache to include functionality required by Android.
In milestone 1, the following capabilities will be provided:
- The cache will support GroupPosition.
- TextLeafRange will support word end and line end boundaries, which are needed for Android text navigation.
- Pivot will support navigating text using TextLeafRange, which is needed for Android text navigation.
- Cached screen bounds will be updated appropriately when scrolling.
- Android will use the cache for all functionality except hit testing.
- As an interim solution, Android will use the existing async IPDL mechanism for hit testing, updated to target the call at the correct document to handle OOP iframes. (Synchronous hit testing in the core cache will take longer to implement and will be done in a future milestone.)
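As an illustration of the word end and line end boundaries mentioned above, here is a simplified sketch. It is not how TextLeafRange is implemented: it treats the text as one flat string (whereas TextLeafRange, as its name suggests, presumably works across text leaf nodes), uses whitespace as the only word separator, and the function names `next_word_end` and `next_line_end` are hypothetical.

```python
def next_word_end(text: str, offset: int) -> int:
    """Return the first index after `offset` where a word ends.

    A word end is the position just past a non-whitespace character that is
    followed by whitespace or by the end of the text.
    """
    i = offset + 1
    while i < len(text):
        if not text[i - 1].isspace() and text[i].isspace():
            return i
        i += 1
    return len(text)


def next_line_end(text: str, offset: int) -> int:
    """Return the index of the next line break at or after `offset`,
    or the end of the text if there is none."""
    i = text.find("\n", offset)
    return len(text) if i == -1 else i


sample = "Hello big world\nnext line"
print(next_word_end(sample, 0), next_line_end(sample, 0))
```

Navigating by word or line then amounts to repeatedly asking for the next boundary from the current position. Real text also needs locale-aware word breaking and layout-dependent line breaks, which this sketch ignores.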
Test Scenarios
- TBD: simple website reading. News site?
- TBD: filling a form.
- TBD: character/word/line navigation.
- With TalkBack, load https://www.nvaccess.org/. Navigate to the embedded video in four different ways: item navigation, explore by touch, controls navigation and TalkBack search. Activate the Play button to play the video.
Bugzilla
40 Total; 0 Open (0%); 40 Resolved (100%); 0 Verified (0%);
Milestone 2: June 2022
TBD. Opt-in user preview. Enable in Nightly?
Bugzilla
45 Total; 0 Open (0%); 43 Resolved (95.56%); 2 Verified (4.44%);
Milestone 3: September 2022
TBD. Beta experiment? Release experiment?
Bugzilla
45 Total; 0 Open (0%); 43 Resolved (95.56%); 2 Verified (4.44%);
Backlog
19 Total; 19 Open (100%); 0 Resolved (0%); 0 Verified (0%);
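The Bugzilla lists on this page come from saved searches, and each ends with a totals line. As a rough sketch of how such a list could be reproduced outside the wiki, the code below builds a Bugzilla REST search URL and formats a totals line from a list of bug statuses. The `quicksearch` and `include_fields` parameters are part of the Bugzilla REST API, but the whiteboard tag `[ctw-m2]`, the function names, and the rule that anything neither RESOLVED nor VERIFIED counts as open are assumptions made for illustration. No network request is made.

```python
from urllib.parse import urlencode

# Bugzilla REST search endpoint on bugzilla.mozilla.org.
BUGZILLA_SEARCH = "https://bugzilla.mozilla.org/rest/bug"


def milestone_query_url(whiteboard_tag: str) -> str:
    """Build a search URL for all bugs carrying the given whiteboard tag."""
    params = {
        "quicksearch": f"ALL whiteboard:[{whiteboard_tag}]",
        "include_fields": "id,summary,assigned_to,status",
    }
    return f"{BUGZILLA_SEARCH}?{urlencode(params)}"


def totals_line(statuses: list[str]) -> str:
    """Format a totals line in the style used on this page."""
    total = len(statuses)
    resolved = statuses.count("RESOLVED")
    verified = statuses.count("VERIFIED")
    # Assumption: every other status counts as open.
    still_open = total - resolved - verified

    def pct(n: int) -> str:
        if total == 0:
            return "0%"
        # Two decimal places, with trailing zeros trimmed (100.00 -> 100).
        return f"{100 * n / total:.2f}".rstrip("0").rstrip(".") + "%"

    return (
        f"{total} Total; {still_open} Open ({pct(still_open)}); "
        f"{resolved} Resolved ({pct(resolved)}); "
        f"{verified} Verified ({pct(verified)});"
    )


print(milestone_query_url("ctw-m2"))
print(totals_line(["RESOLVED"] * 43 + ["VERIFIED"] * 2))
```

Fetching the URL and reading the `status` field of each returned bug would supply the input for `totals_line`.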