Accessibility/HTML5 captions v2

Specification of the itext element (Second Version)


The feedback on the first version prompted a thorough rework of the specification.

Please leave feedback on this second version at https://wiki.mozilla.org/Accessibility/Experiment2_feedback.

The main change is the introduction of a grouping-level element between the media element and the itext element. This has several advantages: the itext elements inside this element are treated as alternative tracks for display in one region, at most one of them can be active at any point in time, a default display style can be associated with them, and event handlers for entering and leaving a text element can be associated with them. The only disadvantage is that two new elements, rather than one, now extend an already large HTML5 specification.

Other feedback on attributes, events and on how to deal with in-line tracks has also been taken into account, so this specification takes a big step towards a more universal approach.

Below are the specifications of the two new proposed elements: itextlist and itext, plus further suggestions on how browsers should deal with them.

The itextlist element

Categories:

   Metadata content.
   Flow content.
   Phrasing content.

Contexts in which this element may be used:

   In a video or audio element that is a child of a body element.

Content model:

   One or more itext elements that are alternatives for the display area
   of the itextlist element.

Content attributes:

   Global attributes (including id, class, title and style)

DOM interface:

   interface HTMLItextListElement : HTMLElement {
              attribute DOMString category;
              attribute DOMString active;
              attribute DOMString name;
     // event handlers
              attribute Function onenter;
              attribute Function onleave;
   };

The itextlist element allows authors to provide a list of alternative information text tracks that relate to a video or audio element. These tracks provide e.g. captions or subtitles in alternative languages.

1. Attributes

The itextlist groups itext elements of the same category together. The "category" attribute describes what function the informative text represents and can be one of the following:

  • CC: closed captions (for the deaf)
  • SUB: subtitles (for i18n)
  • TAD: textual audio descriptions (for the blind; to be used as braille or through TTS)
  • KTV: karaoke
  • TIK: ticker text
  • AR: active regions
  • NB: semantic annotations, including speech bubbles and director comments
  • META: metadata, mostly machine-readable
  • TRX: transcripts / scripts
  • LRC: lyrics
  • LIN: linguistic markup
  • CUE: cue points, DVD style chapter markers and similar navigational landmarks

The "active" attribute describes which itext element is active and can have the values "none", "auto", or the id of a contained itext element. In case of an error, no itext element will be active. The default value is "auto", which means an itext track is selected based on the browser settings. These include the browser's language setting and its accessibility settings: for blind users, for example, no text is displayed, but if a TAD track is available, it will be activated in the selected language.
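The selection rules above can be sketched as a small function. This is only an illustration under stated assumptions: resolveActiveTrack, the { id, lang } track objects and the preferredLanguages list are hypothetical stand-ins for the itext elements and the browser's language settings, and accessibility settings are ignored.

```javascript
// Hypothetical helper: resolves the "active" attribute of an itextlist to one track.
// `tracks` is an array of { id, lang } objects standing in for itext elements;
// `preferredLanguages` stands in for the browser's prioritised language list.
function resolveActiveTrack(active, tracks, preferredLanguages) {
  // The default value of the attribute is "auto".
  const value = active === undefined || active === null ? "auto" : active;
  if (value === "none") return null;
  if (value === "auto") {
    for (const wanted of preferredLanguages) {
      // Compare primary subtags so that "en-US" matches a track with lang "en".
      const primary = wanted.toLowerCase().split("-")[0];
      const hit = tracks.find(
        (t) => t.lang.toLowerCase().split("-")[0] === primary
      );
      if (hit) return hit;
    }
    return null; // assumption: without a language match nothing is activated
  }
  // Otherwise the value names an itext id; an unknown id is an error,
  // and in case of an error no itext element is active.
  return tracks.find((t) => t.id === value) || null;
}
```

For example, resolveActiveTrack("auto", tracks, ["fr", "de-AT"]) picks the German track when no French one exists.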

The "name" attribute is optional. It specifies the name that should be used in a menu that is created when there are several itextlist elements and each itextlist represents a submenu.

2. CSS default styling based on the category attribute

Text itself:

 color: white;
 opacity: 100%;
 text-align: center;


 displayed on top of the video or above the audio element
 centered on the width of the video or audio element  
 aligned at the bottom of the video or audio element
 above the controls if visible
 background-color: #333333;

TAD area:

 visibility: hidden;
 aria-live: assertive;

CUE area:

 displayed above the video or audio element
 centered on the width of the video or audio element  
 aligned with the top of the video or audio element
 background-color: #333333;

others TBD

3. Extension of video controls

The presence of itextlist and itext elements should cause the browser to extend the video or audio controls with a menu from which the user can select, activate, or deactivate the available subtitle tracks. This menu must also be created upon parsing of a binary audio/video file that includes such tracks.
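As a rough sketch of the data a browser might derive for such a menu, the following function turns a list of itextlist descriptions into submenus. The function name buildTrackMenu and the { category, name, tracks } input shape are hypothetical, and falling back to the category code when no name is given is an assumption.

```javascript
// Hypothetical data shape: each itextlist is { category, name, tracks: [{ id, lang }] }.
function buildTrackMenu(itextlists) {
  return itextlists.map((list) => ({
    // Assumption: use the category code as the label when no name attribute is set.
    label: list.name || list.category,
    items: [
      // An extra entry that lets the user deactivate all tracks of this list.
      { label: "none", value: "none" },
      // One entry per itext element; the value is what "active" would be set to.
      ...list.tracks.map((t) => ({ label: t.lang, value: t.id })),
    ],
  }));
}
```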

4. Event handlers

An active itext element consists of a series of text elements, each with a start and an end time. At most one such text element is displayed per itextlist at any time: the last element for which the audio or video element's currentTime lies between its start and end time.

As a new such text element is displayed, the itextlist element's registered onenter callback function is called, if one has been registered.

As the end of such a text element is reached, the itextlist element's registered onleave callback function is called, if one has been registered.

These functions allow cues to be associated with text elements, e.g. the display of a special offer, or moving to another sentence in a full text transcript of a video.
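The enter/leave behaviour can be illustrated with a small tracker that is fed the media element's currentTime. This is a sketch, not a normative algorithm: createCueTracker and the { start, end, text } segment shape are hypothetical, and times are assumed to be in seconds.

```javascript
// Hypothetical tracker: `segments` is an array of { start, end, text } in seconds.
function createCueTracker(segments, onenter, onleave) {
  let current = null;
  // Call the returned function whenever the media element's currentTime changes.
  return function update(currentTime) {
    let next = null;
    for (const seg of segments) {
      // Semi-open interval [start, end); a later match overrides an earlier one,
      // so the last matching text element wins.
      if (currentTime >= seg.start && currentTime < seg.end) next = seg;
    }
    if (next === current) return current; // nothing changed
    if (current && onleave) onleave(current);
    if (next && onenter) onenter(next);
    current = next;
    return current;
  };
}
```

A page script could pass callbacks that, for instance, highlight the matching sentence in a transcript on enter and remove the highlight on leave.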

The itext element

Categories:

   Metadata content.
   Flow content.
   Phrasing content.

Contexts in which this element may be used:

   In an itextlist element that is a child of an audio or video element.

Content model:


Content attributes:

   Global attributes (including id, class and style)

DOM interface:

   interface HTMLItextElement : HTMLElement {
              attribute DOMString src;
              attribute DOMString lang;
              attribute DOMString type;
              attribute DOMString charset;
              attribute float     delay;
              attribute unsigned long stretch;
     readonly attribute boolean fetched;
     readonly attribute ItextError error;
     readonly attribute HTMLCollection allText;
     readonly attribute DOMString langName;
     DOMString currentText(in float currentTime);
   };

The itext element allows authors to include a link to an external file that contains informative text about the video. The external resource is expected to consist of a sequence of time intervals with associated text and potentially layout, styling, and animation information for the text. The text is displayed as the parent audio or video element goes through its time interval, i.e. the parent's currentTime has reached the start time of the interval but has not yet reached the end time of the interval (a semi-open interval: [start,end) ).

1. Interpreting the itext resource

The src attribute gives the address of the external itext resource. The value of the attribute must be a valid URL identifying a text resource of the type given by the type attribute, if the attribute is present, or of the type "text/srt", if the attribute is absent. This attribute is required to enable the user agent to pick the correct parser for the file, even if it only receives a "text/plain" resource.

The type attribute gives the format of the data as a MIME type. If the attribute is present, its value must be a valid MIME type, optionally with parameters. The type parameter must not be specified. (The default, which is used if the attribute is absent, is "text/srt".) [RFC2046]

NOTE: text/srt will need to be registered as a MIME type, and the format itself will need to be standardised.
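To make the default format concrete, here is a minimal sketch of a parser for the commonly used SubRip (srt) layout: a counter line, a timing line of the form "hh:mm:ss,mmm --> hh:mm:ss,mmm", and one or more text lines, with blocks separated by blank lines. Since the format is not standardised, this layout is an assumption, and parseSrt and parseSrtTime are hypothetical names.

```javascript
// Converts an SRT timestamp such as "00:01:02,500" into seconds.
function parseSrtTime(stamp) {
  const m = /^(\d+):(\d{2}):(\d{2})[,.](\d{3})$/.exec(stamp.trim());
  if (!m) throw new Error("Invalid SRT timestamp: " + stamp);
  return Number(m[1]) * 3600 + Number(m[2]) * 60 + Number(m[3]) + Number(m[4]) / 1000;
}

// Parses SRT text into an array of { start, end, text } segments (times in seconds).
function parseSrt(source) {
  // Normalise line endings, then split into blocks at blank lines.
  const blocks = source.replace(/\r\n?/g, "\n").trim().split(/\n{2,}/);
  const segments = [];
  for (const block of blocks) {
    const lines = block.split("\n");
    // The first line is normally a counter; the timing line contains "-->".
    const timingIndex = lines.findIndex((line) => line.includes("-->"));
    if (timingIndex === -1) continue; // skip malformed or empty blocks
    const [from, to] = lines[timingIndex].split("-->");
    segments.push({
      start: parseSrtTime(from),
      // Ignore anything after the end timestamp on the timing line.
      end: parseSrtTime(to.trim().split(/\s+/)[0]),
      text: lines.slice(timingIndex + 1).join("\n"),
    });
  }
  return segments;
}
```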

The lang attribute, if present, gives the language of the linked resource. The value must be a valid RFC 3066 language code. [RFC3066] User agents will use this attribute to select between, e.g., all itext elements given for a video or audio element that belong to the same category, but represent different languages. A user agent that discovers, upon fetching the resource, that the language information associated with the resource differs from the given lang will set an error code on the element.

The charset attribute gives the character encoding of the external text resource. If the attribute is set, its value must be a valid character encoding name, must be the preferred name for that encoding, and must match the encoding given in the charset parameter of the Content-Type metadata of the external file, if any. [IANACHARSET] This attribute is needed because many formats, text/srt in particular, do not declare their character encoding within the resource, so the user agent has no other way of knowing how to interpret and represent the characters. If the attribute is not given, a default of UTF-8 is assumed, unless the document itself indicates its charset.
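The effect of the charset attribute can be sketched with the standard TextDecoder API. The helper name decodeItextBytes is hypothetical; note also that TextDecoder treats the label "ISO-8859-1" as windows-1252, which agrees with ISO-8859-1 for the printable characters used here.

```javascript
// Hypothetical helper: decodes the fetched bytes of an itext resource
// using the value of the charset attribute.
function decodeItextBytes(bytes, charset) {
  // UTF-8 is the default when no charset attribute is given.
  const decoder = new TextDecoder(charset || "utf-8");
  return decoder.decode(bytes);
}
```

The same byte 0xFC decodes to "ü" under ISO-8859-1 but is invalid as UTF-8 on its own, which is why the user agent needs to be told the encoding.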

2. Itext fetching

An itext resource is not automatically fetched as the element is parsed, since there may be a sizeable number of external resources to retrieve for an individual video or audio element. It is only fetched if the parent itextlist element activates it.

Fetching an itext resource means following the src URL and retrieving the resource. Fetching the external resource must not delay the audio or video. The user agent will work with the fetched itext resource as soon as it is retrieved.

The "fetched" attribute will signify if the fetching process has finished.
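The lazy fetching described above can be modelled as follows. This is a sketch under assumptions: createItextTrack is a hypothetical name, the loader is injected as loadResource(url, callback) so that the example does not depend on any particular network API, and "fetched" is assumed to become true once fetching has finished, whether or not it succeeded.

```javascript
// Hypothetical model of an itext element with lazy fetching.
// `loadResource(url, callback)` is an injected loader (an assumption);
// it is expected to call back with (error, text) at some later point.
function createItextTrack(src, loadResource) {
  const track = { src, fetched: false, error: null, text: null, requested: false };
  // Called by the parent itextlist when this track becomes the active one.
  track.activate = function () {
    if (track.requested) return; // fetch at most once
    track.requested = true;
    loadResource(src, (err, text) => {
      if (err) {
        track.error = err;
      } else {
        track.text = text;
      }
      track.fetched = true; // fetching has finished
    });
  };
  return track;
}
```

Nothing is requested when the track object is created; only activate() triggers the loader, and playback code never has to wait for the callback.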

3. Itext display

An activated itext resource displays its content in a screen area provided by the browser based on default styling. If the active itext resource changes, the displayed text will switch to the new resource as soon as the browser can manage without interrupting any of its other display requirements (e.g. audio/video playback).

4. Itext adjustments

The itext resource is synchronised to its parent audio or video element through the audio or video's currentTime attribute. Sometimes, synchronisation can be off.

The delay and stretch attributes enable the publisher to correct the timing information inside an itext resource. The "delay" attribute starts the itext resource with a delay, and the "stretch" attribute applies a constant scaling factor to every timestamp of the itext resource to compensate for drift. "delay" is given in seconds with 0 being the default, and "stretch" is given as a percentage with 100% the default.
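One possible reading of these two attributes is the linear mapping below. The text does not fix the order in which delay and stretch are applied, so scaling first and shifting second is an assumption, and adjustTime is a hypothetical name.

```javascript
// Hypothetical helper: maps a timestamp from the itext resource onto the media timeline.
// delaySeconds defaults to 0 and stretchPercent to 100, matching the attribute defaults.
function adjustTime(resourceTime, delaySeconds = 0, stretchPercent = 100) {
  // Assumption: stretch scales the resource time first, then the delay shifts it.
  return resourceTime * (stretchPercent / 100) + delaySeconds;
}
```

With delay="3" and stretch="97%", a text segment stamped at 100 seconds in the resource would be shown at about 100 seconds of media time under this reading (97 seconds after scaling, plus the 3 second delay).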

5. Itext errors

The error attribute contains the last error that may have appeared in relation to the itext resource.

interface ItextError {

 const unsigned short ITEXT_ERR_ABORTED = 1; // fetching aborted
 const unsigned short ITEXT_ERR_NETWORK = 2; // network error
 const unsigned short ITEXT_ERR_PARSE = 3;   // parsing error of itext resource
 const unsigned short ITEXT_ERR_SRC_NOT_SUPPORTED = 4; // unsuitable itext resource
 const unsigned short ITEXT_ERR_LANG = 5;    // language mismatch
 readonly attribute unsigned short code;

};

6. Other itext attributes

The allText attribute allows access to all the text segments as extracted from the active itext resource.

The langName attribute exposes the full language name for the itext resource, such that a JavaScript developer can display it in a menu. The lang attribute of the itext elements themselves contains the actual language code which is more appropriate for automated processing.

7. Itext text extraction

The currentText(currentTime) function returns the current text segment from the itext resource, i.e. the text that is active at the parent's currentTime attribute value.
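A minimal sketch of such a lookup, written as a free function over an array of segments rather than as a method on the element. The { start, end, text } segment shape is hypothetical, and returning an empty string when no segment is active is an assumption.

```javascript
// Hypothetical lookup: `segments` is an array of { start, end, text } in seconds.
function currentText(segments, currentTime) {
  let text = ""; // assumption: empty string when no segment is active
  for (const seg of segments) {
    // Semi-open interval [start, end); the last matching segment wins.
    if (currentTime >= seg.start && currentTime < seg.end) text = seg.text;
  }
  return text;
}
```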


1. Simple subtitles example

 <video src="video.ogv" controls>
   <itextlist category="SUB">
     <itext src="sub_en.srt" lang="en"/>
     <itext src="sub_de.srt" lang="de"/>
     <itext src="sub_fr.srt" lang="fr"/>
     <itext src="sub_jp.srt" lang="ja"/>
   </itextlist>
 </video>

The default type is "text/srt" and the default charset is "UTF-8".

The default active track is selected according to the language setting of the browser, if a match can be found among the browser's prioritised languages.

2. Caption example with diverse formats

 <video src="video.ogv" controls>
   <itextlist category="CC" active="none">
     <itext src="caption_en.xml" lang="en" type="application/ttaf+xml"/>
     <itext src="caption_de.srt" lang="de" charset="ISO-8859-1" delay="3" stretch="97%"/>
     <itext src="caption_fr.smil" lang="fr" type="application/smil+xml"/>
     <itext src="caption_jp.ssa" lang="ja" type="application/x-ssa"/>
   </itextlist>
 </video>

This example references srt, dfxp, smil and ssa files. It is still an open question whether a browser would want to support all of these formats, but this specification allows for the possibility.

The srt file uses a different charset. It is also delayed by 3 seconds relative to the start of the video file and stretched to 97% to compensate for a constant drift in timing between the video and the caption file.

Also note that none of the caption tracks is active by default; one has to be turned on through user interaction. The browser could possibly override this for a deaf user.

3. Textual audio description example

 <video src="video.ogv" aria-label="test video" title="test video" controls>
   <itextlist category="TAD" active="tad_en">
     <itext id="tad_en" src="tad_en.srt" lang="en"/>
     <itext id="tad_de" src="tad_de.srt" lang="de"/>
     <itext id="tad_fr" src="tad_fr.srt" lang="fr"/>
     <itext id="tad_jp" src="tad_jp.srt" lang="ja"/>
   </itextlist>
 </video>

The active textual audio description is the English track. It will be decoded and the text segments made available, but they won't be visually displayed because the default style for the TAD area sets visibility to hidden. However, if a screen reader is available, the assertive aria-live attribute will force text changes to be read out.

Also note how the video element now has an aria-label attribute, which will also be read out by a screen reader upon tabbing onto a video element. The menu that is being created from the itextlist and itext elements will also need to be made accessible.

4. Chapter markers with cue ranges example

 <video src="video.ogv" controls>
   <itextlist category="CUE" onenter="showChapterImg()" onleave="removeChapterImg()">
     <itext src="chapters_en.srt" lang="en"/>
     <itext src="chapters_de.srt" lang="de"/>
     <itext src="chapters_fr.srt" lang="fr"/>
     <itext src="chapters_jp.srt" lang="ja"/>
   </itextlist>
 </video>

A JavaScript function is called upon entering or leaving a new chapter.

5. Multiple itextlist elements

 <video src="video.ogv" aria-label="test video" title="test video" controls>
   <itextlist category="SUB" name="subtitles">
     <itext src="sub_en.srt" lang="en"/>
     <itext src="sub_de.srt" lang="de"/>
     <itext src="sub_fr.srt" lang="fr"/>
     <itext src="sub_jp.srt" lang="ja"/>
   </itextlist>
   <itextlist category="TAD" active="tad_en" name="spoken transcript">
     <itext id="tad_en" src="tad_en.srt" lang="en"/>
     <itext id="tad_de" src="tad_de.srt" lang="de"/>
     <itext id="tad_fr" src="tad_fr.srt" lang="fr"/>
     <itext id="tad_jp" src="tad_jp.srt" lang="ja"/>
   </itextlist>
 </video>

Because the controls attribute is specified on the video element, a menu is created with "subtitles" and "spoken transcript" as the submenu items to select from. Only one track can be active at any point in time within an itextlist. In addition, the browser should add a "none" entry to allow all itext elements to be deactivated.