Accessibility/Video Codecs
== Ogg Background ==
Ogg is a container format for time-continuous data: it interleaves audio tracks, video tracks, and any other time-continuously sampled data streams into one flat file.
Ogg is being developed by Xiph.org. Xiph.org has not yet decided how subtitles and captions are best supported inside Ogg.
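As background for the media mapping discussion below, the page structure that every Ogg stream is divided into can be sketched as a C struct. The field layout follows RFC 3533; the struct is for illustration only, since real parsers read the fields byte by byte to avoid alignment and endianness pitfalls.

```c
#include <stdint.h>

/* Layout of the fixed part of an Ogg page header (per RFC 3533).
 * All multi-byte fields are little-endian; the segment table and the
 * packet data of the interleaved streams follow directly after it. */
typedef struct {
    char     capture_pattern[4];       /* always "OggS"                         */
    uint8_t  stream_structure_version; /* currently 0                           */
    uint8_t  header_type_flag;         /* 0x01 continued, 0x02 BOS, 0x04 EOS    */
    uint64_t granule_position;         /* codec-specific timestamp for the page */
    uint32_t bitstream_serial_number;  /* identifies the logical stream         */
    uint32_t page_sequence_number;     /* allows detection of lost pages        */
    uint32_t crc_checksum;             /* CRC32 over the whole page             */
    uint8_t  number_page_segments;     /* length of the following segment table */
    /* uint8_t segment_table[number_page_segments]; then the packet data        */
} ogg_page_header;
```

Any time-continuous codec, including a future text codec, is carried in exactly these pages; what distinguishes one codec from another at the container level is only its media mapping, discussed below.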
Another project called OGM branched the Ogg codebase and implemented support for a wide range of proprietary and non-free codecs. For example, OGM files most often carry video encoded in the MPEG-4 ASP format and audio in Vorbis or AC-3, together with subtitles in SRT, SSA or VobSub format.
Ogg itself supports CMML, the Continuous Media Markup Language developed by the proposed grantee, which is more HTML-like and includes support for hyperlinks. It has been considered for captions and subtitles, but the specifications are still in development. Whether CMML, or an extended version of CMML, is a useful solution to our problems has to be re-assessed against the current accessibility requirements. A very interesting implementation and use of many of the CMML/Annodex ideas is MetavidWiki (see http://metavid.ucsc.edu/), an open-source, wiki-style social annotation authoring tool.
Further, Ogg recently added Kate, a codec for karaoke with animated text and images. Kate defines its own file format for specifying karaoke and animations. Experience from the implementation of Kate needs to feed into an accessibility solution for Ogg; we may even work together with the Kate developer to solve this larger problem.
There is currently no implementation of W3C Timed Text (TT) for Ogg or of any of the other subtitling formats.
== What is involved with creating a caption format for Ogg ==
Including a text stream such as W3C TT in Ogg requires the definition of a so-called media mapping:
- definition of the format of the codec bitstream, i.e. what packets of data are encoded and how they fit into Ogg pages
- definition of the format of the codec header pages
This enables multiplexing of the data stream into the Ogg container.
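As a concrete illustration of what such a media mapping has to pin down, the following is a purely hypothetical beginning-of-stream (BOS) header packet for an Ogg text codec. The magic string, field names and field sizes are invented for this sketch; the actual specification would fix them.

```c
#include <stdint.h>

/* Hypothetical BOS header packet for an Ogg text codec.  Everything
 * here is illustrative: the real media mapping would define the magic
 * string, the field set and how data packets map onto Ogg pages. */
typedef struct {
    char     magic[8];          /* identifies the codec in the BOS page         */
    uint8_t  version_major;     /* bitstream version, for future revisions      */
    uint8_t  version_minor;
    uint32_t granule_rate_num;  /* interpretation of granule positions:         */
    uint32_t granule_rate_den;  /*   ticks per second, so players can seek      */
    uint16_t track_count;       /* number of language/category descriptors that */
                                /*   follow in secondary header packets         */
} hypothetical_text_bos_header;
```

Existing Ogg codecs such as Vorbis, Theora, CMML and Kate all follow this general pattern: a BOS packet that identifies the codec and its granule interpretation, then secondary header packets, then data packets. A text codec or an embedding of W3C TT would be expected to do the same.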
At this stage we are not clear which format to use. The solution may be found in one format that supports all the requirements, or in multiple formats together with a framework for how to implement these formats and interrelate them with each other. We may need to cover new ground, create a new specification, and promote it into the different relevant communities. Or the best solution may be to support one of the existing subtitling formats and extend it to cover other accessibility needs.
Also, there needs to be a recommendation for how to render the different text codecs on screen, so that a standard means of display is achieved for each type. For example, audio annotations are not to be displayed as text, but rather rendered through a text-to-speech engine.
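In code terms, such a recommendation amounts to a dispatch from track category to rendering path. The following minimal sketch uses invented category names and stub renderers purely to illustrate the idea; the real categories and their handling would come out of the requirements work.

```c
#include <stdio.h>

/* Hypothetical text-track categories and the rendering each maps to.
 * All names here are invented for illustration. */
typedef enum {
    TRACK_CAPTION,          /* overlay as styled text on the video       */
    TRACK_SUBTITLE,         /* overlay as plain text on the video        */
    TRACK_AUDIO_ANNOTATION  /* never displayed: routed to text-to-speech */
} text_track_category;

/* Stubs standing in for a real overlay renderer and TTS engine. */
static void display_overlay(const char *text) { printf("[overlay] %s\n", text); }
static void speak_with_tts(const char *text)  { printf("[tts] %s\n", text); }

static void render_text_event(text_track_category cat, const char *text)
{
    switch (cat) {
    case TRACK_CAPTION:
    case TRACK_SUBTITLE:
        display_overlay(text);
        break;
    case TRACK_AUDIO_ANNOTATION:
        speak_with_tts(text);
        break;
    }
}

int main(void)
{
    render_text_event(TRACK_CAPTION, "[door creaks open]");
    render_text_event(TRACK_AUDIO_ANNOTATION, "A man enters the dark room.");
    return 0;
}
```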
As for the implementation of support for a new text codec, there is a whole swag of software to be extended.
To enable support for a text codec in Firefox requires extensions to liboggz, liboggplay and the Mozilla adaptation code. The adaptation code lives in the implementation of the WHATWG HTMLMediaElement, HTMLAudioElement and HTMLVideoElement. Basically, the implementation of those DOM objects uses liboggplay functions to decode the data and obtain the video and audio frames, so liboggplay would need functions to extract the decoded caption information as well. On top of the liboggplay support, there needs to be some way of exposing this information to web developers, that is, DOM methods or events; implementing those would build on the liboggplay functionality that would be added.
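To make the liboggplay side concrete, a hypothetical extension might look roughly like the following header sketch. None of these types or functions exist in liboggplay today (apart from the opaque OggPlay handle); they merely indicate the shape of API that the Mozilla adaptation code, and in turn the new DOM methods or events, would build on.

```c
#include <stddef.h>

typedef struct _OggPlay OggPlay;  /* opaque player handle as used in liboggplay */

/* Hypothetical decoded text cue, handed from liboggplay to the
 * adaptation code.  Field names are invented for this sketch. */
typedef struct {
    double      start_time;   /* presentation time of the cue, in seconds */
    double      end_time;     /* time at which the cue should disappear   */
    const char *text;         /* decoded UTF-8 caption text               */
} OggPlayTextCue;

/* Report how many text (caption/subtitle) tracks the stream carries. */
int oggplay_get_num_text_tracks(OggPlay *player);

/* Select a text track for decoding; -1 disables text decoding. */
int oggplay_set_text_track(OggPlay *player, int track);

/* Copy the cues that became active since the previous call into `cues`
 * and return their number; the DOM layer would turn these into events
 * or expose them through methods on the media element. */
int oggplay_get_text_cues(OggPlay *player, OggPlayTextCue *cues, size_t max_cues);
```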
To enable desktop use requires support in the OggDSF DirectShow filter, in the XiphQT QuickTime components, and in mplayer, vlc, gstreamer, phonon, and xine. Support in these media frameworks has a follow-on effect: it also creates support for video players, video applications, and web browsers such as Safari that rely on native platform media frameworks to decode video.
To enable authoring requires support in ffmpeg, ffmpeg2theora, and further GUI authoring applications yet to be determined.