Accessibility/Experiment1 feedback: Difference between revisions

Revision as of 16:02, 27 July 2009

Experiment 1: Video Accessibility

The Specification

A first specification for how to extend the HTML5 video element to support out-of-band subtitles (and other time-aligned text) using the itext approach was developed in July 2009. It is based on a previous proposal and on several other proposals at WHATWG.

The Implementation

An implementation of this specification was developed.

It includes a specification of the <itext> elements that reference out-of-band time-aligned text files. It also includes a JavaScript implementation of the proposed JavaScript API for the <itext> elements.

It also uses a skin for the video player, because the video controls needed to be extended; this can be ignored for the purposes of discussing the specification.

The video used is "Elephants Dream", for which a large number of subtitles in different languages and character sets are available in srt format.
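For readers unfamiliar with srt, the sketch below shows roughly how such a file maps to time-aligned text. It is not the demo's parser: the function names parseSrtTime and parseSrt are made up for illustration, and it assumes the common layout of a counter line, a timing line of the form HH:MM:SS,mmm --> HH:MM:SS,mmm, and one or more text lines, with blocks separated by blank lines.

```javascript
// Convert an srt timestamp such as "00:01:02,500" into seconds.
function parseSrtTime(stamp) {
  const match = /^(\d+):(\d{2}):(\d{2})[,.](\d{3})$/.exec(stamp.trim());
  if (!match) {
    throw new Error("Unrecognised srt timestamp: " + stamp);
  }
  const [, hours, minutes, seconds, millis] = match;
  return (
    Number(hours) * 3600 +
    Number(minutes) * 60 +
    Number(seconds) +
    Number(millis) / 1000
  );
}

// Split an srt document into { start, end, text } cues (times in seconds).
function parseSrt(source) {
  const cues = [];
  // Normalise line endings, then split on one or more blank lines.
  const blocks = source.replace(/\r\n?/g, "\n").trim().split(/\n{2,}/);
  for (const block of blocks) {
    const lines = block.split("\n");
    const timingIndex = lines.findIndex((line) => line.includes("-->"));
    if (timingIndex === -1) continue; // skip blocks without a timing line
    const [startStamp, endStamp] = lines[timingIndex].split("-->");
    cues.push({
      start: parseSrtTime(startStamp),
      end: parseSrtTime(endStamp),
      text: lines.slice(timingIndex + 1).join("\n"),
    });
  }
  return cues;
}
```

The sketch deliberately rejects timing lines that carry anything after the end timestamp, and it expects text that has already been decoded from the file's character set.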

The demo is here.

The demo works in Safari (with XiphQT installed), in Opera (experimental build), and in Firefox. Not sure about Chrome.

To make use of the textual audio annotations, you will typically need to install a screen reader such as JAWS, NVDA, or firevox. JAWS and NVDA seem to expose some bugs with the JavaScript-updated aria-live attributes, but otherwise read the textual audio annotations nicely.

The software is available from here.

(Please note that there are bugs in the demo, but that the idea is to discuss the concepts.)

Features of the Implementation

The demo:

contains four different types of time-aligned text: subtitles, captions, chapters, and textual audio annotations

extends the video controls with a menu button for the time-aligned text tracks which enables the user to switch between different languages for the different tracks

maps the textual audio annotations into an aria-live activated div element, so that they are read out by screen readers; this div sits behind the video, invisible to everyone else

displays the chapters as text on top of the video

displays the subtitles and captions as overlays at the bottom of the video

these three display mechanisms are meant to be the defaults for these kinds of tracks; a Web developer who wants to place the text elsewhere on screen could override them with a stylesheet
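The aria-live mapping in the list above might look roughly like the following. This is a sketch and not the demo's code: activeCueText, updateLiveRegion and createLiveRegion are hypothetical names, cues are assumed to be { start, end, text } objects with times in seconds, and both the "assertive" politeness level and the off-screen positioning are assumptions (the demo places its div behind the video instead).

```javascript
// Return the text of all cues active at the given time, joined by newlines.
function activeCueText(cues, time) {
  return cues
    .filter((cue) => time >= cue.start && time < cue.end)
    .map((cue) => cue.text)
    .join("\n");
}

// Write the active text into the region only when it changes, so that a
// screen reader is not asked to announce the same annotation repeatedly.
// Returns true when the region was updated.
function updateLiveRegion(region, cues, time) {
  const text = activeCueText(cues, time);
  if (region.textContent !== text) {
    region.textContent = text;
    return true;
  }
  return false;
}

// Browser helper: create a div that screen readers announce on change.
// The styling here is an assumption, not what the demo does.
function createLiveRegion(doc) {
  const region = doc.createElement("div");
  region.setAttribute("aria-live", "assertive");
  region.style.position = "absolute";
  region.style.left = "-10000px";
  return region;
}
```

In a page, updateLiveRegion would typically be called from a timeupdate listener on the video element, passing video.currentTime as the time.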

Bugs / missing features / limitations of the demo:

the "delay" functionality of the specification has not been implemented yet

only srt files have been used to implement time-aligned text functionality

subtitles and captions currently overlap each other in the display space

several time-aligned text categories (KTV, TIK, NB, META, TRX and LRC) have not been implemented / demonstrated yet

selecting a different track through the menu currently doesn't work very well

switching off tracks that have been activated is not possible yet

Thoughts / Feedback

SP = Silvia Pfeiffer

PJ = Philip Jagenstedt

GF = Geoff Freed/WGBH

SP: the distinction between captions and subtitles may not be necessary

GF: The distinction between captions and subtitles is definitely necessary, especially if you're planning to follow the North American nomenclature (which it appears you are going to do). Subtitles are for hearing people; they're on-screen text that reflects a translation of the original audio into another language. Captions are for people who are deaf or hard-of-hearing; they are on-screen text that reflects the same language as the original audio. Captions also contain additional information (speaker cues, music indicators, placement of text) not normally found in subtitles.

SP: the HTML specification could be improved by including an extra hierarchical element, such as itextlist. This allows all time-aligned text categories to be handled in the same way with itext, but provides a selection mechanism for the alternative tracks. category is a required attribute.

 <video ...>

 <itextlist category="CC" activelang="de">
  <itext src="caption.de.srt" lang="de" type="text/srt" />
  <itext src="caption.en.srt" lang="en" type="text/srt" />
  <itext src="caption.it.srt" lang="it" type="text/srt" />
 </itextlist>

 <itextlist category="TAD" activelang="en">
  <itext src="audioann.de.srt" lang="de" type="text/srt" charset="ISO-8859" />
  <itext src="audioann.en.srt" lang="en" type="text/srt" charset="ISO-8859" />
 </itextlist>

 </video>
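To make the selection mechanism concrete, here is a sketch of how a script might pick the active track of each itextlist. It works on plain objects that mirror the markup above rather than on real DOM elements, and the names captionList, selectActiveItext and activeSources are hypothetical; the behaviour when no itext matches activelang (no track is selected) is also an assumption.

```javascript
// Plain-object stand-in for one <itextlist> element and its <itext> children.
const captionList = {
  category: "CC",
  activelang: "de",
  items: [
    { src: "caption.de.srt", lang: "de", type: "text/srt" },
    { src: "caption.en.srt", lang: "en", type: "text/srt" },
    { src: "caption.it.srt", lang: "it", type: "text/srt" },
  ],
};

// Return the entry whose lang matches the list's activelang,
// or null when no entry matches.
function selectActiveItext(list) {
  const match = list.items.find((item) => item.lang === list.activelang);
  return match === undefined ? null : match;
}

// Map each category to the src of its active track, skipping
// categories for which nothing is selected.
function activeSources(lists) {
  const result = {};
  for (const list of lists) {
    const active = selectActiveItext(list);
    if (active !== null) {
      result[list.category] = active.src;
    }
  }
  return result;
}
```

Because every category goes through the same function, adding a further category needs no extra selection logic, which is the point of grouping the itext elements.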

PJ: Can we make this fit in with or replace the addCueRange/removeCueRanges interface? Basically, I believe it should be possible to use a DOM interface to add the same timed text ranges that would result from letting the browser parse SRT. The only difference between itext and the cue ranges interface is that one is associated with text while the other uses callbacks. The allText property would need to be replaced with another representation where both the times and the text (or callbacks) can be created/modified/deleted. Something like an array of:

 interface MediaTimeRange {
  attribute double start;
  attribute double end;
  attribute DOMString text;
  attribute Function onenter;
  attribute Function onleave;
 };
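A rough sketch of how an array of such ranges could drive both text and callbacks is given below. It is only an illustration of the idea: createRangeDispatcher is a hypothetical name, ranges are plain objects with the attributes listed above, intervals are treated as half-open (start inclusive, end exclusive), and passing the range itself to the callbacks is an assumption.

```javascript
// Create an update function over MediaTimeRange-like objects. Calling the
// returned function with the current playback time fires onenter when the
// time moves into a range and onleave when it moves out again.
function createRangeDispatcher(ranges) {
  const active = new Set();
  return function update(time) {
    for (const range of ranges) {
      const inside = time >= range.start && time < range.end;
      if (inside && !active.has(range)) {
        active.add(range);
        if (typeof range.onenter === "function") range.onenter(range);
      } else if (!inside && active.has(range)) {
        active.delete(range);
        if (typeof range.onleave === "function") range.onleave(range);
      }
    }
  };
}
```

A subtitle renderer would then just be a pair of callbacks that show and hide range.text, while script-only cue ranges would leave text empty.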

SP reply: I suppose, similar to Ian's proposal for extending srt to support karaoke and lyrics, it could also be extended with functions for onenter and onleave.

PJ: Then the delay method would become somewhat redundant; it would be better to handle this by rewriting the times via the DOM, which also allows fixing drift, not just a constant delay. currentText also wouldn't be needed.

PJ: I think that the fetch and error mechanism might be overkill. How about letting the UA decide if/when to download the resources? You might still want a fetched property, I guess, but we might reuse the complete property and onload event from the img element.
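As an illustration of rewriting the times instead of calling a delay method, the sketch below maps every time t to t * scale + offset. The function name retime and its option names are hypothetical; a constant delay uses only offset, while drift between the text and the video would be corrected with scale. Times are clamped at zero, which is an assumption.

```javascript
// Return new ranges with start and end mapped through t * scale + offset.
// The input array and its objects are left untouched.
function retime(ranges, { offset = 0, scale = 1 } = {}) {
  return ranges.map((range) => ({
    ...range,
    start: Math.max(0, range.start * scale + offset),
    end: Math.max(0, range.end * scale + offset),
  }));
}
```

For example, retime(ranges, { offset: 0.5 }) shows every cue half a second later, and retime(ranges, { scale: 0.5 }) compresses the track to half its duration.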

PJ: Making enabled writable would remove the need for enable()/disable().

PJ: Depending on the user agent's preferred language setting has failed miserably so far; most users just leave it as the default. Sites are forced to use explicit language selection or guess the language based on IP, and I expect it would be no different for this feature. I honestly don't know what a good solution for this is.

SP reply: By providing a selection mechanism through the "display" attribute, it is possible for the Web developer to override the preferred language setting. Further, the user can do explicit selection through the menu.
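One way to combine these sources of information is sketched below. The order of precedence (explicit menu choice by the user, then the Web developer's override, then the user agent's preferred languages) is an assumption for illustration and is not taken from the specification; matchLanguage and resolveLanguage are hypothetical names.

```javascript
// Find the first available language that matches the wanted tag, either
// exactly or by primary subtag (so "de-AT" can fall back to "de").
function matchLanguage(available, wanted) {
  const want = wanted.toLowerCase();
  const exact = available.find((lang) => lang.toLowerCase() === want);
  if (exact !== undefined) return exact;
  const primary = want.split("-")[0];
  const loose = available.find(
    (lang) => lang.toLowerCase().split("-")[0] === primary
  );
  return loose === undefined ? null : loose;
}

// Try the user's menu choice, then the page author's override, then the
// user agent's preferred languages in order. Returns null if none match.
function resolveLanguage(
  available,
  { userChoice = null, authorChoice = null, preferred = [] } = {}
) {
  for (const wanted of [userChoice, authorChoice, ...preferred]) {
    if (wanted === null) continue;
    const match = matchLanguage(available, wanted);
    if (match !== null) return match;
  }
  return null;
}
```

In a browser, the preferred list could be taken from navigator.languages; as PJ notes, that list is often left at its default, which is why it comes last here.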

PJ: For scripts, the charset attribute is ignored for cross-origin because interpreting something under a different charset than intended can give different results. The cross-origin problem is probably more relevant when it comes to allText/currentText/MediaTimeRange. I'm not sure if verifying that the resource is in fact a supported type is enough, as that would still allow web sites to read subtitle files from the intranet of the client if they can guess the URL. Imagine if the full text of http://internal/secret-talk-transcript.srt was available for all to see through this API.
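The concern can be made concrete with a small sketch of one possible policy: the text of a track is only handed to scripts when the resource has the same origin as the page. This is an illustration of the problem rather than a proposal from the specification; isSameOrigin and scriptReadableText are hypothetical names, and http://example.org/ stands in for an arbitrary public site.

```javascript
// True when the (possibly relative) resource URL resolves to the same
// origin (scheme, host and port) as the page URL.
function isSameOrigin(resourceUrl, pageUrl) {
  return new URL(resourceUrl, pageUrl).origin === new URL(pageUrl).origin;
}

// Hand the text to scripts only for same-origin resources. A user agent
// could still render cross-origin text without exposing it to script.
function scriptReadableText(track, pageUrl) {
  return isSameOrigin(track.src, pageUrl) ? track.text : null;
}
```

With this policy, a page at http://example.org/video.html could read caption.en.srt from its own server, but would get null for http://internal/secret-talk-transcript.srt.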

SP reply: srt does not have a specification for charset, so the server can only guess the correct charset to provide with the srt file. Thus, IMHO the only means by which a Web developer can provide the correct charset for an srt file is by providing it in such an attribute. If that could be avoided, I'd be all for it.

More feedback is encouraged!
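A sketch of what honouring such an attribute could involve is shown below, using the standard TextDecoder interface. It assumes the bytes of the srt file have already been fetched; decodeSrtBytes is a hypothetical name, and falling back to UTF-8 when no charset is given is an assumption. Note that TextDecoder only accepts labels it recognises, such as "iso-8859-1", and throws a RangeError for others.

```javascript
// Decode the raw bytes of an srt file using the charset named by the page
// author, falling back to UTF-8 when no charset is supplied.
function decodeSrtBytes(bytes, charset) {
  const decoder = new TextDecoder(charset || "utf-8");
  return decoder.decode(bytes);
}
```

The decoded string can then be parsed into cues in the same way regardless of how the file was originally encoded.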

