Accessibility/Experiment1 feedback

From MozillaWiki

Experiment 1: Video Accessibility

The Specification

A first specification for extending the HTML5 video element to support out-of-band subtitles (and other time-aligned text) using the itext approach was developed in July 2009. It builds on a previous proposal and on several other proposals made at the WHATWG.

The Implementation

An implementation of this specification was developed.

It includes a specification of the <itext> elements, which reference out-of-band time-aligned text files, and a JavaScript implementation of the proposed JavaScript API for the <itext> elements.

It also uses a skin for the video player, because the video controls had to be extended; this can be ignored when discussing the specification.

The video used is "Elephants Dream", for which a large number of subtitles are available in SRT format, in different languages and in different character sets.

The demo is here.

The demo works in Safari (with XiphQT installed), in Chrome (experimental build), in Opera (experimental build), and in Firefox.

To make use of the textual audio annotations, you will typically need to install a screen reader such as JAWS, NVDA, or Fire Vox. JAWS and NVDA seem to expose some bugs with the JavaScript-updated aria-live attributes, but otherwise read the textual audio annotations nicely.

The software is available from here.

(Please note that the demo has bugs; its purpose is to support discussion of the concepts.)


Features of the Implementation

The demo:

  • contains four different types of time-aligned text: subtitles, captions, chapters, and textual audio annotations
  • extends the video controls with a menu button for the time-aligned text tracks which enables the user to switch between different languages for the different tracks
  • the textual audio annotations are written into a div element with aria-live set, so that screen readers actually read them out; this div sits behind the video, invisible to everyone else
  • the chapters are displayed as text on top of the video
  • the subtitles and captions are displayed as overlays at the bottom of the video
  • these three display mechanisms are intended as the default display for these kinds of tracks; a Web developer who wants to place the text elsewhere on screen could override them with a stylesheet


Bugs / missing features / limitations of the demo:

  • the "delay" functionality of the specification has not been implemented yet
  • only SRT files have been used to implement time-aligned text functionality
  • subtitles and captions currently overlap each other in the display space
  • several time-aligned text categories (KTV, TIK, NB, META, TRX and LRC) have not been implemented / demonstrated yet
  • currently selecting a different track through the menu doesn't work very well
  • switching off tracks that have been activated is not possible yet
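Since the demo reads only SRT files, it may help to see what that step involves. The following is a minimal TypeScript sketch, not the demo's actual code: `Cue` and `parseSrt` are hypothetical names, the input is assumed to be well-formed and already decoded from its original character set, and any formatting tags inside the cue text are left untouched.

```typescript
// Minimal SRT parsing sketch (hypothetical names; not the demo's code).
// Assumes well-formed input that has already been decoded to a JS string.

interface Cue {
  start: number; // seconds
  end: number;   // seconds
  text: string;  // cue text, lines joined with "\n"
}

// "HH:MM:SS,mmm --> HH:MM:SS,mmm"; a "." millisecond separator is also accepted.
const TIMING =
  /^(\d+):(\d{2}):(\d{2})[,.](\d{3})\s*-->\s*(\d+):(\d{2}):(\d{2})[,.](\d{3})/;

function toSeconds(h: string, m: string, s: string, ms: string): number {
  return Number(h) * 3600 + Number(m) * 60 + Number(s) + Number(ms) / 1000;
}

function parseSrt(source: string): Cue[] {
  const cues: Cue[] = [];
  // Normalise CRLF/CR line endings, then split the file on blank lines.
  const blocks = source.replace(/\r\n?/g, "\n").trim().split(/\n\s*\n/);
  for (const block of blocks) {
    const lines = block.split("\n");
    // Usually line 0 is the numeric index and line 1 the timing line;
    // searching for the timing line also tolerates a missing index.
    const timingAt = lines.findIndex((line) => TIMING.test(line));
    if (timingAt < 0) continue; // skip blocks without a timing line
    const m = TIMING.exec(lines[timingAt]);
    if (m === null) continue;
    cues.push({
      start: toSeconds(m[1], m[2], m[3], m[4]),
      end: toSeconds(m[5], m[6], m[7], m[8]),
      text: lines.slice(timingAt + 1).join("\n"),
    });
  }
  return cues;
}
```

Each resulting cue is a plain start/end/text triple, which is also the shape a DOM-level representation of time-aligned text would need to expose.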


Thoughts / Feedback

SP = Silvia Pfeiffer, PJ = Philip Jagenstedt

  • SP: the distinction between captions and subtitles may not be necessary
  • SP: the HTML specification could be improved by including an extra hierarchical element, such as itextlist. This increases complexity, but makes the creation of the selection menu much easier.

<video ...>
  <itextlist category="CC" activelang="de">
    <itext src="caption.de.srt" lang="de" type="text/srt" />
    <itext src="caption.en.srt" lang="en" type="text/srt" />
    <itext src="caption.it.srt" lang="it" type="text/srt" />
  </itextlist>
  <itextlist category="TAD" activelang="en">
    <itext src="audioann.de.srt" lang="de" type="text/srt" charset="ISO-8859" />
    <itext src="audioann.en.srt" lang="en" type="text/srt" charset="ISO-8859" />
  </itextlist>
</video>
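To illustrate SP's point about the selection menu: with the extra grouping element, each itextlist maps directly onto one submenu, and activelang marks the checked entry. This sketch works on a hypothetical in-memory model of the markup rather than on real DOM elements; `ItextList`, `MenuEntry` and `buildMenu` are illustrative names, not part of the proposal.

```typescript
// Hypothetical in-memory model of the proposed markup; field names mirror
// the attributes in the example (category, activelang, src, lang).
interface ItextTrack {
  src: string;
  lang: string;
}

interface ItextList {
  category: string;   // e.g. "CC" or "TAD"
  activelang: string; // language selected by default within this list
  tracks: ItextTrack[];
}

interface MenuEntry {
  label: string;    // shown to the user; here simply the language code
  src: string;      // track file to load when the entry is chosen
  checked: boolean; // true for the list's active language
}

// One submenu per itextlist: the grouping is already given by the markup,
// so no extra bookkeeping is needed to find sibling tracks or the active one.
function buildMenu(lists: ItextList[]): Map<string, MenuEntry[]> {
  const menu = new Map<string, MenuEntry[]>();
  for (const list of lists) {
    menu.set(
      list.category,
      list.tracks.map((t) => ({
        label: t.lang,
        src: t.src,
        checked: t.lang === list.activelang,
      })),
    );
  }
  return menu;
}
```

Without itextlist, a script building the same menu would first have to group the individual itext elements itself and work out which one is active in each group; that is the complexity the extra element moves into the markup.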

  • PJ: can we make this fit in with, or replace, the addCueRange/removeCueRanges interface? Basically, I believe it should be possible, using a DOM interface, to add the same timed text ranges that would result from letting the browser parse SRT. The only difference between itext and the cue ranges interface is that one is associated with text while the other uses callbacks. The allText property would need to be replaced with another representation in which both the times and the text (or callbacks) can be created, modified and deleted. Something like an array of

interface MediaTimeRange {
  attribute double start;
  attribute double end;
  attribute DOMString text;
  attribute Function onenter;
  attribute Function onleave;
};
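A minimal TypeScript sketch of how a player might drive such ranges. It assumes ranges are half-open intervals [start, end) in seconds and that a helper, here the hypothetical `updateRanges`, is called on each time update with the previous and current playback positions. A range can carry text, callbacks, or both, which is the unification of itext and cue ranges described above.

```typescript
// Plain-object version of the MediaTimeRange idea; callbacks are optional.
interface MediaTimeRange {
  start: number; // seconds
  end: number;   // seconds (exclusive in this sketch)
  text: string;
  onenter?: () => void;
  onleave?: () => void;
}

// Half-open interval check: active from start up to, but not including, end.
function isActive(r: MediaTimeRange, t: number): boolean {
  return t >= r.start && t < r.end;
}

// Hypothetical helper: fire callbacks for ranges whose state changed between
// two playback positions and return the text of the ranges active now.
function updateRanges(
  ranges: MediaTimeRange[],
  previous: number,
  current: number,
): string[] {
  // Fire all leave callbacks before any enter callbacks, so a handler that
  // swaps displayed text removes the old text before the new text appears.
  for (const r of ranges) {
    if (isActive(r, previous) && !isActive(r, current)) r.onleave?.();
  }
  for (const r of ranges) {
    if (!isActive(r, previous) && isActive(r, current)) r.onenter?.();
  }
  return ranges.filter((r) => isActive(r, current)).map((r) => r.text);
}
```

A range that begins and ends entirely between two updates (for example, one skipped over by a seek) fires neither callback in this sketch; a real implementation would need to decide how to handle that case.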

  • PJ: Then the delay method would become somewhat redundant; it would be better to handle delay by rewriting the times via the DOM, which also allows fixing drift, not just a constant delay. currentText also wouldn't be needed.
  • PJ: Making enabled writable would remove the need for enable()/disable().
  • PJ: Relying on the user agent's preferred-language setting has failed miserably so far: most users just leave it at the default. Sites are forced to use explicit language selection or to guess the language from the IP address, and I expect it would be no different for this feature. I honestly don't know what a good solution for this is.
  • PJ: For scripts, the charset attribute is ignored for cross-origin scripts, because interpreting something under a different charset than intended can give different results. The cross-origin problem is probably more relevant when it comes to allText/currentText/MediaTimeRange. I'm not sure that verifying that the resource is in fact of a supported type is enough, as that would still allow web sites to read subtitle files from the client's intranet if they can guess the URL. Imagine if the full text of http://internal/secret-talk-transcript.srt were available for all to see through this API.
  • more feedback encouraged!
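PJ's suggestion of rewriting times instead of providing a delay method can be sketched as one linear map, t' = scale * t + offset, applied to every range: scale = 1 gives a constant delay, and scale other than 1 also corrects linear drift. `retime` is a hypothetical helper, and clamping negative times to zero is an assumption of this sketch.

```typescript
// Hypothetical helper: rewrite the times of existing ranges in place using
// t' = scale * t + offset. scale = 1 is a constant delay of `offset` seconds;
// scale != 1 additionally corrects linear drift.
function retime<T extends { start: number; end: number }>(
  ranges: T[],
  scale: number,
  offset: number,
): void {
  for (const r of ranges) {
    // Clamping at 0 is an assumption of this sketch: a range shifted entirely
    // before the start of the media collapses to [0, 0] instead of going negative.
    r.start = Math.max(0, r.start * scale + offset);
    r.end = Math.max(0, r.end * scale + offset);
  }
}
```

Because the ranges are ordinary objects, the same call works whether they came from a parsed SRT file or were added by script, which is why a separate delay method would become redundant.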