Accessibility/Captioning Work Plan

From MozillaWiki

To do:

  1. Separate into phase 1 and phase 2. How should we split? Captioning vs. audio description? Or, make technical choices in phase 1 and create test cases/docs etc. in phase 2?
  2. I suggest that the work plan say in bold for each item, what the deliverable is, e.g. "Deliverable: test cases"
  3. Add technology specifics -- any specifics we need
  4. Add points for audio description
  5. Describe the true complexity of the problem.
  6. Question: is this really a grant to fix captioning in Ogg? If other formats already have captioning built-in, what else might need to be done? How does this relate to the greater HTML 5 effort?


There is both a container and a set of codecs involved for each video format. Typically there's a video codec and an audio codec. The captioning format can be thought of as a timed text codec.

For a given container, you need a defined mapping for muxing a data stream for a given codec into the container. For network effects, that mapping also needs to be supported by tools and other video players. Typically, such mappings don't exist for all codec/container combinations, and there are established combinations that work.

The format choice is driven by the video codec: the container is then whichever one is typically paired with the chosen video codec.

The choice of captioning format then depends on what's conventional for the container.

In theory, given a muxing rule, you can put any video codec and any captioning format in any container. In practice, each video codec tends to have a conventional native container, so the video codec dictates the container. Different containers in turn have different conventional timed text formats, and those timed text formats might not have muxing rules for non-native containers.
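The relationship between containers, codecs and muxing rules can be sketched as a small lookup table. This is only an illustration in Python, not a survey of real muxing specifications: the entries cover just the combinations named on this page, and the names `MUXING_RULES`, `can_mux` and `containers_for` are hypothetical.

```python
# Hypothetical table: container -> codecs that have a defined muxing rule.
# Only the combinations named on this page are listed; real containers
# support more codecs than this.
MUXING_RULES = {
    "Ogg": {"Theora", "Vorbis", "CMML"},
    "MP4": {"H.264", "3GPP Timed Text"},
}


def can_mux(container: str, codec: str) -> bool:
    """Return True if the table defines a muxing rule for this pair."""
    return codec in MUXING_RULES.get(container, set())


def containers_for(codec: str) -> list[str]:
    """List the containers that have a muxing rule for the given codec."""
    return sorted(c for c, codecs in MUXING_RULES.items() if codec in codecs)


if __name__ == "__main__":
    print(can_mux("Ogg", "CMML"))
    print(can_mux("Ogg", "3GPP Timed Text"))
    print(containers_for("H.264"))
```

A missing entry in such a table corresponds to the "no muxing rule for a non-native container" case: the combination is not impossible, but someone would have to define and popularize the mapping first.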

Gecko will embed an Ogg-specific playback framework called liboggplay. It only supports the Ogg container format.

The Flash plug-in effectively provides a specialized playback framework for .flv and .mp4 containers.

Typically, desktop environments come with a more general timed media playback framework. These frameworks can load extension libraries that enable support for various containers and codecs.

Desktop | Framework
Windows | DirectShow
Mac OS X | QuickTime
Gnome | GStreamer
KDE | Phonon
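One way to picture the split between the embedded Ogg framework and the desktop frameworks is a simple dispatch function. The sketch below is an assumption made for illustration, not a description of how Gecko actually selects a backend; `DESKTOP_FRAMEWORKS` and `pick_backend` are hypothetical names.

```python
# Hypothetical mapping from desktop environment to its usual media framework,
# taken from the table above.
DESKTOP_FRAMEWORKS = {
    "Windows": "DirectShow",
    "Mac OS X": "QuickTime",
    "Gnome": "GStreamer",
    "KDE": "Phonon",
}


def pick_backend(container: str, desktop: str) -> str:
    """Choose a playback backend for a container on a given desktop.

    Assumption for illustration only: Ogg goes to the embedded liboggplay,
    and every other container is handed to the desktop's own framework.
    """
    if container == "Ogg":
        return "liboggplay"
    try:
        return DESKTOP_FRAMEWORKS[desktop]
    except KeyError:
        raise ValueError(f"no known media framework for desktop {desktop!r}")
```

Under this assumption, any captioning support has to work in both branches: once inside the embedded Ogg path, and once through whatever timed text support the desktop framework exposes.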

Example: Ogg and MP4 are containers, whereas Theora and H.264 are codecs. GStreamer and QuickTime are both timed media frameworks, each of which can play various container/codec combinations. Ogg, Theora and CMML are a natural match. MP4, H.264 and 3GPP TT are a natural match. While you technically 'could' define a way to put 3GPP TT inside Ogg, the result might not interoperate well with authoring tools and other players because the combination is unusual.

Container | Codecs | Authoring tools | Natural captioning format
Ogg | Theora (video), Vorbis (audio) | | CMML
MP4 | H.264 | | 3GPP Timed Text

Note: SubRip is external to the video container and can be used with any format. The main known disadvantage of this is blah, blah. It would make sense to use this if blah.
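Because a SubRip file lives outside the container, a player has to load it separately and match its cues against the playback position. The following is a minimal sketch of that idea, assuming the common SubRip layout (a counter line, a `HH:MM:SS,mmm --> HH:MM:SS,mmm` timing line, then one or more text lines, with blank lines between cues). The names `Cue`, `parse_srt` and `active_cues` are hypothetical, and the sketch ignores styling tags and malformed input beyond skipping blocks that have no timing line.

```python
import re
from dataclasses import dataclass

# SubRip timestamps look like 00:01:02,500 (hours:minutes:seconds,milliseconds).
TIMING = re.compile(
    r"(\d+):(\d{2}):(\d{2}),(\d{3})\s*-->\s*(\d+):(\d{2}):(\d{2}),(\d{3})"
)


@dataclass
class Cue:
    start_ms: int
    end_ms: int
    text: str


def _to_ms(h: str, m: str, s: str, ms: str) -> int:
    return ((int(h) * 60 + int(m)) * 60 + int(s)) * 1000 + int(ms)


def parse_srt(data: str) -> list[Cue]:
    """Parse SubRip text into cues; blocks without a timing line are skipped."""
    cues = []
    # Cues are separated by one or more blank lines.
    for block in re.split(r"\n\s*\n", data.replace("\r\n", "\n").strip()):
        lines = block.split("\n")
        for i, line in enumerate(lines):
            match = TIMING.match(line.strip())
            if match:
                g = match.groups()
                text = "\n".join(lines[i + 1:])
                cues.append(Cue(_to_ms(*g[:4]), _to_ms(*g[4:]), text))
                break
    return cues


def active_cues(cues: list[Cue], position_ms: int) -> list[Cue]:
    """Return the cues that should be on screen at the given playback time."""
    return [c for c in cues if c.start_ms <= position_ms < c.end_ms]
```

A player would call `active_cues` on each time update from the video and render whatever it returns. Nothing in this path depends on the container or codec of the video itself.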

Work plan for Captioning

  1. Determine which captioning format should be supported in Mozilla for the natively-supported Ogg video. This needs to take into account the extremely complex map of video formats and players today (see above).
  2. Determine which subset of that format is the most crucial. This can save the Mozilla developers a good deal of work, because captioning formats are complex. Some of the complexity is necessary for Mozilla support and some is not.
  3. Work with HTML 5, web browser development and captioning communities to ensure that the solution will be accepted. We don't want different solutions in each browser. That would either mean one browser would need to redo their work, or that caption developers would have to deal with incompatible solutions in different browsers.
  4. Explore the need to support the following features and ensure support when found necessary:
    1. social caption creation (This poses very different requirements than the idea of making video files intrinsically accessible. Hsivonen 09:06, 4 August 2008 (UTC)) aaronlev Henri also mentioned that potential legal issues could affect technical issues, but we aren't sure. It would be good if WGBH had some background to help understand this as well while devising a captioning solution.
    2. metadata indicating changes in captioning language for search and Braille. (Google seems to be doing better by ignoring author-entered language metadata. Is rendering foreign words into Braille a strong enough use case to justify the complexity of supporting this and authoring with this data? Hsivonen 09:06, 4 August 2008 (UTC))
    3. semantics and style, etc. (This seems like a pretty big departure from baseline established by TV captioning. Hsivonen 09:06, 4 August 2008 (UTC)) aaronlev I'm not sure -- there are some higher level things such as embedding of a musical note graphic to indicate music. I believe that captioning is moving toward expressing more complex background information.
  5. Ensure the captioning solution is compatible with current authoring tools and, if possible, video conversion tools, so that current and future content can easily use the solution.
  6. Ensure the solution is compatible with both existing media repurposed for the Internet (i.e., originating in broadcast and cable TV environments and physical media like DVDs and theatrical motion pictures) and media originally developed for Internet distribution, including user-generated content.
  7. Ensure the solution incorporates the expressed needs and preferences of Internet-based media users with sensory disabilities. (( Aaron: what are these? Can we express these up front? WGBH must already know this info, e.g. why would it change based on internet vs. broadcast? ))
  8. Ensure all solutions, documentation and tests developed are friendly to open source contributors and clean of known IP conflicts
  9. Participate in the relevant deliberations, meetings, standards development activities and proposed work products of HTML 5 WG.
  10. Determine what can be done about supporting captioning when an external back end (GStreamer, QuickTime, DirectShow) is in use. (MP4 containers are the most likely external back end case, so 3GPP Timed Text is a potential format candidate in that case.)
  11. Build a complete set of open-licensed documentation and test cases for developers and content creators. In general, reach out to developers implementing captioning solutions for the web and assure that issues of captioning (for deaf and hard-of-hearing people) and description (for blind and visually impaired people) are taken into account and are well-understood.
  12. Test solutions and file bugs in databases for each browser to drive the necessary work. Attach relevant test cases and documentation. Make sure the developers know what to fix.

Work Plan for Audio Description


Success Criteria

  1. A complete set of documentation and test cases for captioning and audio description, without unnecessary IP restrictions, is available
  2. Mozilla and should implement the proposed solution for both captioning and audio description, in a manner which maximizes usability. For example, there should be a consistent UI for turning captions on or off, no matter what the video format being used is.
  3. Authoring tools are available which support the solutions
  4. At least one mainstream source of video content on the web (e.g. Wikimedia) has some content which supports the proposed solution for captioning and audio description