Accessibility/Experiment1 feedback
Experiment 1: Video Accessibility
The Specification
A first specification for how to extend the HTML5 video element to support out-of-band subtitles (and other time-aligned text) using the itext approach was developed in July 2009. It is based on a previous proposal and on several other proposals at WHATWG.
The Implementation
An implementation of this specification was developed.
It includes a specification of the <itext> elements that reference out-of-band time-aligned text files. It also includes a JavaScript implementation of the proposed JavaScript API for the <itext> elements.
It also includes use of a skin for the video player because of the need to extend the video controls - this can be ignored for the purposes of discussion of the specification.
The video used is "Elephants Dream", for which a large number of subtitles in different languages are available in srt format and in different character sets.
The demo is here.
The demo works in Safari (with XiphQT installed), in Opera (experimental build), and in Firefox. Whether it works in Chrome is not known.
To make use of the textual audio annotations, you will typically need to install a screen reader such as JAWS, NVDA, or Fire Vox. JAWS and NVDA seem to expose some bugs with the JavaScript-updated aria-live attributes, but otherwise read the textual audio annotations nicely.
The software is available from here.
(Please note that there are bugs in the demo, but that the idea is to discuss the concepts.)
Features of the Implementation
The demo:
- contains four different types of time-aligned text: subtitles, captions, chapters, and textual audio annotations
- extends the video controls with a menu button for the time-aligned text tracks which enables the user to switch between different languages for the different tracks
- the textual audio annotations are mapped into an aria-live activated div element, such that they are indeed read out by screen-readers; this div sits behind the video, invisible to everyone else
- the chapters are displayed as text on top of the video
- the subtitles and captions are displayed as overlays at the bottom of the video
- these three display mechanisms are intended as the default display mechanisms for these kinds of tracks; a Web developer who wants to place the text elsewhere on screen could override them in a stylesheet
Bugs / missing features / limitations of the demo:
- the "delay" functionality of the specification has not been implemented yet
- only srt files have been used to implement time-aligned text functionality
- subtitles and captions currently overlap each other in the display space
- several time-aligned text categories (KTV, TIK, NB, META, TRX and LRC) have not been implemented / demonstrated yet
- currently selecting a different track through the menu doesn't work very well
- switching off tracks that have been activated is not yet possible
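Since the demo relies on srt files only, a rough sketch of what turning such a file into timed cues involves may help the discussion. This is not the demo's code: the function names `parseSrtTime` and `parseSrt` and the `{ start, end, text }` cue shape are assumptions made for illustration, and the sketch handles only plain srt blocks (counter, timing line, text) without any positioning extensions.

```javascript
// Convert an srt timestamp such as "00:01:02,500" to seconds.
function parseSrtTime(stamp) {
  const match = /^(\d+):(\d{2}):(\d{2})[,.](\d{3})$/.exec(stamp.trim());
  if (!match) {
    throw new Error("Bad srt timestamp: " + stamp);
  }
  const [, h, m, s, ms] = match;
  return Number(h) * 3600 + Number(m) * 60 + Number(s) + Number(ms) / 1000;
}

// Parse srt text into an array of { start, end, text } cues (hypothetical shape).
function parseSrt(source) {
  const cues = [];
  // Normalise line endings, then split into blocks on blank lines.
  const blocks = source.replace(/\r\n?/g, "\n").trim().split(/\n{2,}/);
  for (const block of blocks) {
    const lines = block.split("\n");
    // The first line is normally a counter; skip it if present.
    const timingIndex = lines[0].includes("-->") ? 0 : 1;
    const timing = lines[timingIndex];
    if (!timing || !timing.includes("-->")) {
      continue; // Not a cue block; ignore it.
    }
    const [start, end] = timing.split("-->").map((t) => parseSrtTime(t));
    cues.push({ start, end, text: lines.slice(timingIndex + 1).join("\n") });
  }
  return cues;
}
```

The sketch assumes the text has already been decoded with the right character set, which is exactly the step the charset discussion on this page is about.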
Thoughts / Feedback
SP = Silvia Pfeiffer
PJ = Philip Jagenstedt
GF = Geoff Freed/WGBH
- SP: the distinction between captions and subtitles may not be necessary
- GF: The distinction between captions and subtitles is definitely necessary, especially if you're planning to follow the North American nomenclature (which it appears you are going to do). Subtitles are for hearing people; they're on-screen text that reflects a translation of the original audio into another language. Captions are for people who are deaf or hard-of-hearing; they are on-screen text that reflects the same language as the original audio. Captions also contain additional information (speaker cues, music indicators, placement of text) not normally found in subtitles.
- SP: yes, that was the reason there is a distinction. While there is definitely a difference between the audiences of captions and subtitles and their needs, I wonder if they need a technical distinction: they are both displayed on-screen and typically in the same location. Translations can exist for both subtitles and captions. The only reason to keep the distinction is that both a subtitle file and a caption file may be available in the same language. However, they should be alternatives, not additions. So, it might make sense to somehow group them together.
- GF: Still, captions are not subtitles and subtitles are not captions. Even if you've got Spanish subtitles and Spanish captions, they're different because the captions will contain information that the subtitles won't. Really, the only thing they have in common is that they are text. The technical distinction can be made by identifying captions with one type of metadata and subtitles with another. Place them in different GUI menus, as well.
- SP: the HTML specification could be improved by including an extra hierarchical element, such as itextlist. This allows all time-aligned text categories to be handled in the same way with itext, but provides a selection mechanism for the alternative tracks. category is a required attribute.
<video ...>
  <itextlist category="CC" activelang="de">
    <itext src="caption.de.srt" lang="de" type="text/srt" />
    <itext src="caption.en.srt" lang="en" type="text/srt" />
    <itext src="caption.it.srt" lang="it" type="text/srt" />
  </itextlist>
  <itextlist category="TAD" activelang="en">
    <itext src="audioann.de.srt" lang="de" type="text/srt" charset="ISO-8859" />
    <itext src="audioann.en.srt" lang="en" type="text/srt" charset="ISO-8859" />
  </itextlist>
</video>
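To illustrate the "alternatives, not additions" idea behind itextlist, here is a minimal sketch of how one track per category could be chosen from the markup above. It models the elements as plain objects rather than DOM nodes; the object shape, the function name `selectActiveTracks`, and the choice to return null when no track matches activelang are all assumptions, not part of the proposal.

```javascript
// Plain-object model of the markup above (hypothetical shape, not a DOM API).
const lists = [
  {
    category: "CC",
    activelang: "de",
    tracks: [
      { src: "caption.de.srt", lang: "de" },
      { src: "caption.en.srt", lang: "en" },
      { src: "caption.it.srt", lang: "it" },
    ],
  },
  {
    category: "TAD",
    activelang: "en",
    tracks: [
      { src: "audioann.de.srt", lang: "de" },
      { src: "audioann.en.srt", lang: "en" },
    ],
  },
];

// For each itextlist, pick the track whose lang matches activelang.
// Tracks within one list are alternatives, so at most one is chosen per category.
function selectActiveTracks(itextlists) {
  const active = {};
  for (const list of itextlists) {
    const match = list.tracks.find((track) => track.lang === list.activelang);
    active[list.category] = match ? match.src : null; // null: nothing to show
  }
  return active;
}
```

Under this model, switching language through the menu would amount to changing activelang on one list, which leaves the other categories untouched.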
- PJ: can we make this fit in with or replace the addCueRange/removeCueRanges interface? Basically, I believe it should be possible to use a DOM interface to add the same timed text ranges that would result from letting the browser parse SRT. The only difference between itext and the cue ranges interface is that one is associated with text while the other uses callbacks. The allText property would need to be replaced with another representation where both the times and the text (or callbacks) can be created/modified/deleted. Something like an array of
interface MediaTimeRange {
  attribute double start;
  attribute double end;
  attribute DOMString text;
  attribute Function onenter;
  attribute Function onleave;
}
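As a rough sketch of how an array of such ranges could serve both purposes (text display and callbacks), the function below compares the previous and current playback times and fires onenter/onleave accordingly. The name `updateRanges`, the half-open interval [start, end), and passing the range to the callback are assumptions for illustration; a range that lies entirely between the two times (for example after a seek) fires nothing in this simple version.

```javascript
// Fire onenter/onleave for ranges as playback moves from previousTime to
// currentTime, and return the text of every range active at currentTime.
// Ranges are plain objects with the fields of the MediaTimeRange idea above.
function updateRanges(ranges, previousTime, currentTime) {
  const activeTexts = [];
  for (const range of ranges) {
    const wasActive = previousTime >= range.start && previousTime < range.end;
    const isActive = currentTime >= range.start && currentTime < range.end;
    if (!wasActive && isActive && range.onenter) {
      range.onenter(range);
    }
    if (wasActive && !isActive && range.onleave) {
      range.onleave(range);
    }
    if (isActive && range.text) {
      activeTexts.push(range.text);
    }
  }
  return activeTexts;
}
```

A page script would call this from the video element's timeupdate handler, remembering the last time it saw; the returned texts would take the place of currentText.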
- SP reply: I suppose, similar to Ian's proposal for extending srt to support karaoke and lyrics, it could also be extended with functions for onenter and onleave.
- PJ: Then, the delay method would become somewhat redundant; it is better to handle this by rewriting the times via DOM (which also allows fixing drift, not just constant delay). currentText also wouldn't be needed.
- PJ: I think that the fetch and error mechanism might be overkill, how about letting the UA decide if/when to download the resources? You might still want a fetched property I guess, but we might reuse the complete property and onload event from the img element.
- PJ: Making enabled writable would remove the need for enable()/disable().
- PJ: Depending on the user agent's preferred language setting has failed miserably so far - most users just leave it as the default. Sites are forced to use explicit language selection or guess the language based on IP; I expect it would be no different for this feature. I honestly don't know what a good solution for this is.
- SP reply: By providing a selection mechanism through the "display" attribute, it is possible for the Web developer to override the preferred language setting. Further, the user can do explicit selection through the menu.
- PJ: For scripts, the charset attribute is ignored for cross-origin because interpreting something under a different charset than intended can give different results. The cross-origin problem is probably more relevant when it comes to allText/currentText/MediaTimeRange. I'm not sure if verifying that the resource is in fact a supported type is enough, as that would still allow web sites to read subtitle files from the intranet of the client if they can guess the URL. Imagine if the full text of http://internal/secret-talk-transcript.srt was available for all to see through this API.
- SP reply: srt does not have a specification for charset, so the server can only guess the correct charset to provide with the srt file. Thus, IMHO the only means by which a Web developer can provide the correct charset to use for an srt file is by providing it in such an attribute. If that could be avoided, I'd be all for it.
- more feedback encouraged!
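PJ's suggestion above, that rewriting cue times can replace a dedicated delay method and also correct drift, can be sketched as a single linear mapping of the times. The function name `retimeCues` and the `{ start, end, text }` cue shape are assumptions; the sketch returns new objects rather than modifying anything through the DOM.

```javascript
// Return new cues with every time mapped through t -> t * scale + offset.
// An offset alone models a constant delay; a scale other than 1 corrects
// linear drift (for example, text timed against a slightly different frame rate).
function retimeCues(cues, offset, scale = 1) {
  return cues.map((cue) => ({
    ...cue,
    start: cue.start * scale + offset,
    end: cue.end * scale + offset,
  }));
}
```

For example, `retimeCues(cues, 0.5)` would show every cue half a second later, while a scale slightly above or below 1 would stretch or compress the whole timeline.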