Accessibility/Experiment2 feedback

From MozillaWiki

Experiment 2 feedback

The Specification

A second specification for how to extend the HTML5 video element to support out-of-band subtitles (and other time-aligned text) using the itext approach was developed in October 2009. It is based on the feedback received on v1, on a previous proposal and on several other proposals at WHATWG.

Thoughts / Feedback

SP = Silvia Pfeiffer/Xiph, Mozilla

JJ = Jack Jansen/CWI

SD = Sam Dutton/BBC

GF = Geoff Freed, WGBH/NCAM

  • SP: associating the category with the itextlist rather than itext has the disadvantage of not being able to group captions, subtitles, lyrics and similar "line 23" content within one display area and make them alternatives. OTOH, it is only a small overhead to stop them all from displaying at the same time.

  • JJ: I get the impression that all the functionality you need is already available in various specs, and could be re-used easily without inventing new syntax.

    The category and name functionality could be picked up from XHTML role, if I'm not mistaken. The itext/itextlist is SMIL par and text (or ref). Both of these specs are modularized, so you should be able to pick up just the pieces you need. In the namespaced XML world you would then be done; in the HTML world you would need a bit of extra work to import things into your spec.
SP reply: JJ, you are right: there are plenty of existing syntax elements in other specifications that could be tweaked, adapted and possibly re-used. However, none of them really fit.
“category” is very different to “role” – it is the category of time-aligned text we are talking about, and there is a limited list of categories that is part of the spec.
“name” could be replaced by “title” or something else – I am not particularly fussed about this though I needed it as an attribute rather than as content model, which would have been more obvious.
I am also consciously refraining from re-implementing SMIL. I do not want the full complexity of the “seq” and “par” elements. Also, the “text” or “ref” elements do not compare to “itext” which references a particular type of interactive text files similar to how “img” references particular types of image files.
Further, HTML doesn’t do namespaces, so every adoption from another standard would need to be replicated into HTML anyway. And since there is not an exact match between the needs that itext and itextlist express and those provided by other specs, I’d rather avoid that complexity.
The important thing here, though, is that we have looked at existing syntaxes and have learnt from them, so even though there is no direct re-use, there is indeed conceptual re-use and learning.
  • SD: I'm also interested in how timed-event 'fragments' might be handled — SMIL seems oriented to complete presentations — and the possibility of live timed events: for example, subtitling of live broadcasts, or content pushed or pulled in addition to live streaming, such as commentary and additional content broadcast during a concert.
We've also been looking at how to implement custom timed 'events' in a flexible way (though states might be a better word). For example, a carousel widget could listen for chapter events emitted by a video:

{
  "start": 20.00,
  "end": 30.00,
  "sender": "video#myVideo",
  "type": "chapter",
  "title": "Single-celled organisms",
  "description": "Single-celled microorganisms began to develop 3-4 billion years ago.",
  "src": "single_cell.jpg",
  "href": ""
}
(Data here is shown as a JavaScript object literal, but could be in other formats. The main thing is that the event object properties can have any name and any value type.)
Alternatively, an element could emit timed CSS, HTML (or even JavaScript) events:
{
  "start": 5.00,
  "end": 10.00,
  "sender": "video#myVideo",
  "event": "timeupdate",
  "receiver": "div#subtitle",
  "type": "HTML",
  "value": "It is half past nine and we've only just passed Sheffield"
}
Subtitles could be made bold for a few seconds like this:
{
  "start": 62.13,
  "end": 65.29,
  "sender": "video#myVideo",
  "event": "timeupdate",
  "receiver": "div#subtitle",
  "type": "CSS",
  "value": {"font-weight": "bold"}
}
I was also trying to say that it would be good to be able to listen for enter and leave events as well as being able to attach event handler callbacks to onenter and onleave — if only to encourage coders to move JavaScript out of HTML.
Reply SP: Re SMIL: yes, it is oriented towards multi-media presentations where the timeline is in control – that is not how Web pages work, which are essentially static text content enriched with interactive and media elements. Thus the poor fit.
Reply SP: Re live timed events: I think it is possible to point the video src url to a live broadcast, which then gets updated continuously. It might make sense to turn off the controls for such an element. I also don’t see a problem in attaching a subtitle file that is continuously updated in an itext element to such a live video source. The text could continue to be pushed/pulled. With javascript, it would also be possible to continue pulling other content, such as images or other text.
Reply SP: Re timed events: Yes, the addition of enter and leave events makes sense to complement the onenter and onleave callbacks.
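
As a rough illustration of the enter and leave events discussed above, the following sketch compares two successive playback positions and reports which timed events were entered or left in between. The helper name diffCues and the shape of the returned objects are assumptions made for this example, not part of the specification; in a browser the two positions would typically come from successive "timeupdate" events on the video element.

```javascript
// Hypothetical helper (not part of any spec): given a list of timed events,
// the previous playback position and the current one (both in seconds),
// return the "enter" and "leave" notifications that should be emitted.
function diffCues(cues, prevTime, currTime) {
  // A cue is active from its start (inclusive) to its end (exclusive).
  const isActive = (cue, t) => t >= cue.start && t < cue.end;
  const events = [];
  for (const cue of cues) {
    const wasActive = isActive(cue, prevTime);
    const nowActive = isActive(cue, currTime);
    if (!wasActive && nowActive) events.push({ type: "enter", cue });
    if (wasActive && !nowActive) events.push({ type: "leave", cue });
  }
  return events;
}

// Sample data taken from the object literals in SD's comment.
const timedEvents = [
  { start: 20.0, end: 30.0, type: "chapter", title: "Single-celled organisms" },
  { start: 62.13, end: 65.29, type: "CSS", value: { "font-weight": "bold" } },
];

// Playback moves from 19.5s to 20.2s: the chapter is entered.
const entered = diffCues(timedEvents, 19.5, 20.2);
// Playback moves from 29.9s to 30.1s: the chapter is left.
const left = diffCues(timedEvents, 29.9, 30.1);
console.log(entered[0].type, left[0].type);
```

Note that this sketch only looks at the two end points, so a seek that jumps over an entire event (for example from 19s to 31s) produces no notifications. Whether such a jump should emit enter and leave is a design decision the discussion above leaves open.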

  • GF: It's important to have a style option that reveals TAD, which would be useful for visually impaired viewers who prefer to both see and hear text.
Reply SP: There are obviously multiple display options - the idea is to provide a useful default. Adaptations should indeed be possible.
Reply GF: One other point to consider re hiding TAD: some screen readers will not announce updates to regions marked with visibility:hidden (or display:none), so a TAD default of visibility:hidden and aria-live="assertive" will be ignored by these screen readers. A way around this is to make the content visible off-screen, with markup such as the following:

{ position: absolute;
overflow: hidden;
top: -999px;
width: 1px;
height: 1px; }

The content remains off-screen so it can't be seen, but a screen reader can see it and announce the updates.

  • GF: As I've stated before, I question the use of SRT as the default.
Reply SP: Nothing excludes other formats, in particular DFXP. The simplest option should always be possible, though, and should be the minimum requirement.
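
To show why SRT counts as the simplest option, here is a minimal sketch of an SRT parser. It assumes the common SRT layout: a numeric index line, a timing line of the form HH:MM:SS,mmm --> HH:MM:SS,mmm, one or more text lines, and a blank line between cues. The function names parseTimestamp and parseSrt are chosen for this example, and the sketch ignores extensions such as inline styling tags or positioning hints.

```javascript
// Convert an SRT timestamp such as "00:01:02,130" into seconds.
function parseTimestamp(stamp) {
  const match = /^(\d+):(\d{2}):(\d{2})[,.](\d{3})$/.exec(stamp.trim());
  if (!match) throw new Error("Bad SRT timestamp: " + stamp);
  const [, h, m, s, ms] = match;
  return Number(h) * 3600 + Number(m) * 60 + Number(s) + Number(ms) / 1000;
}

// Parse SRT text into an array of { index, start, end, text } objects.
function parseSrt(text) {
  return text
    .replace(/\r\n?/g, "\n") // normalise line endings
    .trim()
    .split(/\n{2,}/) // cues are separated by blank lines
    .filter((block) => block.includes("-->")) // skip anything without a timing line
    .map((block) => {
      const lines = block.split("\n");
      const [startStamp, endStamp] = lines[1].split("-->");
      return {
        index: Number(lines[0]),
        start: parseTimestamp(startStamp),
        end: parseTimestamp(endStamp),
        text: lines.slice(2).join("\n"),
      };
    });
}

// Sample input; the first cue reuses the subtitle text from SD's example.
const sampleSrt = [
  "1",
  "00:00:05,000 --> 00:00:10,000",
  "It is half past nine",
  "and we've only just passed Sheffield",
  "",
  "2",
  "00:01:02,130 --> 00:01:05,290",
  "Second cue",
  "",
].join("\n");

const parsedCues = parseSrt(sampleSrt);
console.log(parsedCues.length, parsedCues[0].start, parsedCues[0].end);
```

The format carries only timing and text, which is what makes it easy to support as a baseline and also why it cannot express multiple display regions.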

  • GF: Is it possible to override the default settings of the browser? How would this be accomplished?
Reply SP: I think you were referring to overriding the default styling settings of the browser. I was hoping that the display elements could be made accessible to the Web page, i.e. introduced into the DOM, and thus be stylable by the Web developer. However, the idea is that browser preferences will be able to overrule what a developer has set, since the developer does not know what requirements a user has, e.g. HoH or VI.
  • GF: What about displaying text in multiple regions simultaneously? For example, French subtitles in the upper third and English captions in the lower third? Or, as is not uncommon in broadcast captions, the display of two sets of captions simultaneously (e.g., when two people are speaking at the same time).
Reply SP: These place more complex requirements on the captioning format than SRT is able to meet and would need to be provided through a different format - I believe DFXP caters for it. If the existing DFXP -> HTML5 test implementation is capable of this, I would suggest using that format for such a requirement.
Reply GF: I would consider this an important requirement for a default format, especially to support the broadcast example I describe above.
  • GF: Also, what about providing both CC and TAD at the same time? The use of one should not exclude the use of the other.
Reply SP: That is already possible and demonstrated by my demos. If you have a screenreader installed, try out . Different categories of time-aligned text are supposed to be additional to each other, not alternative.

  • GF: It would be useful to be able to activate more than one track at a time: for example, to be able to display both English and German text simultaneously. How would this affect the area occupied by the player?
Reply SP: After some experimentation with simultaneous display of alternate subtitles it seemed to me that display of more than 2 languages at the same time is impractical. Even two subtitle tracks at the same time are almost useless, since the default display is on top of the video and would occupy an excessively large space. For other types of time-aligned text, such as captions or TAD, the display of more than one language generally doesn't make sense. So, for now, the specification does not cater for such a requirement in the default display. If somebody wants to display two tracks, they would need to write some javascript to enable that.
Reply GF: I wouldn't necessarily write this off so quickly as an obscure edge case. It isn't unheard of to combine foreign-language subtitles with captions, especially when you consider that captions can be used to insert non-speech cues, such as sound effects or speaker changes when it isn't obvious on screen, etc.
Reply SP: I haven't written it off, just didn't think it was a mainstream need. There is always the possibility for this to be activated with some extra javascript code.
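
The "extra javascript code" mentioned above could start from something like the following sketch, which works out what each of several tracks should show at a given playback time. The function name activeCueText, the track names and the cue data are all made up for illustration; writing each result into its own positioned element on the page is left out.

```javascript
// Hypothetical helper: for every track, collect the text of all cues that
// are active at the given playback time (in seconds).
function activeCueText(tracks, time) {
  const result = {};
  for (const [name, cues] of Object.entries(tracks)) {
    result[name] = cues
      .filter((cue) => time >= cue.start && time < cue.end)
      .map((cue) => cue.text);
  }
  return result;
}

// Illustrative data: French subtitles alongside English captions that
// include a non-speech cue.
const demoTracks = {
  "subtitles-fr": [{ start: 0, end: 4, text: "Bonjour" }],
  "captions-en": [
    { start: 0, end: 4, text: "Hello" },
    { start: 2, end: 3, text: "[door slams]" },
  ],
};

const shownAt = activeCueText(demoTracks, 2.5);
console.log(JSON.stringify(shownAt));
```

A page script could call such a function on each time update and render, say, the subtitle track in the upper third and the caption track in the lower third, which corresponds to the layout GF describes.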