< [[Accessibility/Video Accessibility|Video Accessibility]]


= Report of the Video A11y Grant Progress 2008 =

By Silvia Pfeiffer

The three months are over during which the Mozilla Foundation provided me with a grant towards analysing the status of accessibility for the HTML5 <code>&lt;video></code> and <code>&lt;audio></code> elements, particularly with a view towards Ogg support. This post provides a summary of my findings and recommendations on how to progress video a11y in Firefox, as well as a list of actual progress already made. One of the biggest achievements is that there is now a mailing list at Xiph for video a11y and that this community will continue working on these issues, if slowly.

== Background: Video Accessibility study ==

The study took a broad view on what constitutes "accessibility" for audio and video, including and beyond means of providing access to people with disabilities. The study analysed means of generally attaching textual information to audio and video, and enabling search engines with better access to these textual representations. The requirements document is available [[Accessibility/Video a11y requirements|in the Mozilla wiki]].

One particular aim of the study was to recommend means for delivering accessibility features inside the Ogg container format for the open Ogg Theora & Vorbis formats. Since Ogg Theora/Vorbis has been adopted by Firefox as the baseline codec for the audio and video elements, Ogg plays a major role when delivering accessibility features into the Web browser. This also goes beyond mere Web delivery of accessibility features and will have an effect on a larger number of media applications. This is important since the creation of accessibility content for audio and video formats cannot just happen for the Web. It needs to be supported by an ecosystem of applications around the audio and video content, including in particular authoring applications, but also off-line playback applications.

== Results and Recommendations of the Video Accessibility study ==

First of all one has to recognise that some accessibility data is in a text format (e.g. closed captions), while other accessibility data is actually supplementary media data that accompanies the core video or audio content. Examples of such non-text accessibility data are open captions (i.e. captions that are burnt into the video data), bitmap captions (i.e. graphics files that are blended on top of the video data), or audio descriptions (i.e. descriptive spoken annotations that are aligned with pauses in the original audio track). Most non-text accessibility data actually has a textual representation: closed captions, for example, come as text that can easily be turned on or off by the video player. Also, textual audio descriptions can be rendered by a screen reader or through a braille device.

=== 1. Text vs Non-Text accessibility data ===

Also please note that we recommend development of a server-side dynamic content adaptation scheme that allows the browser to request - on behalf of its user - adequate accessibility tracks together with the content. This is described in more detail in section 6 below.

=== 3. Dealing with Out-of-band Time-aligned Text ===

In the [[Accessibility/Video a11y requirements|accessibility requirements document]] and the [http://wiki.xiph.org/index.php/OggText OggText proposal], a large number of categories of time-aligned text that have been seen in online video and audio applications were identified:
* CC: closed captions (for the hearing impaired)
* SUB: subtitles

For most of these categories, proprietary formats are being used by the companies that support them. Only closed captions and subtitles, as well as lyrics and linguistic transcripts, have widely used open text formats.

The majority of subtitles and captions that are available on the Internet right now are provided in text files that are separate from the video or audio file, mostly in SubRip .srt or SubViewer .sub files (prepared by the fansubbing community). A few now come as XML files in SMIL, 3GPP TimedText .ttxt, W3C TimedText DFXP, or CMML. Song lyrics come in the Lyrics Displayer .lrc file format. Linguistic transcripts come in the Transcriber .trs file format. Several JavaScript libraries exist for creating ticker text from text files that contain sequences of div elements. Many other time-aligned text file formats exist.
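
For reference, SubRip is a simple plain-text format: each cue consists of a running number, a start and end time with millisecond precision, and one or more lines of caption text, for example:

<pre>
1
00:00:01,000 --> 00:00:04,200
Hello, and welcome to the demonstration.

2
00:00:05,500 --> 00:00:09,000
Cues are separated by blank lines and
may span multiple lines of text.
</pre>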


A typical media player such as MPlayer or VLC plays back the subtitles for a video file by allowing the user to open a subtitle file in parallel to the video. The media player then synchronises the playback of the subtitles and renders them on top of the video file. QuickTime and Windows Media Player do not have this functionality, but rely on subtitle tracks being delivered inside the audio or video file.
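
A Web page can emulate this player behaviour for the HTML5 <code>&lt;video></code> element with a small amount of JavaScript. The sketch below is illustrative only: it assumes the cues have already been parsed out of an .srt file into an array, and that the page contains a video element and a caption div with the ids used here.

<pre>
// Minimal caption overlay for the HTML5 video element (illustrative sketch).
// Assumes cues were parsed from an .srt file into objects with start/end
// times in seconds and the caption text.
var cues = [
  { start: 1.0, end: 4.2, text: "Hello, and welcome to the demonstration." },
  { start: 5.5, end: 9.0, text: "Captions are swapped on every timeupdate." }
];

var video   = document.getElementById("v");        // <video id="v" src="video.ogv" controls>
var caption = document.getElementById("caption");  // a div positioned over the video

video.addEventListener("timeupdate", function() {
  var t = video.currentTime, text = "";
  for (var i = 0; i < cues.length; i++) {
    if (t >= cues[i].start && t <= cues[i].end) {
      text = cues[i].text;
      break;
    }
  }
  caption.textContent = text;  // an empty string clears the caption display
}, false);
</pre>

This is essentially the approach taken by the jquery.srt library listed in the implementations section below.
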
An alternative way of encapsulating TDHT (Timed Divs HTML) into Ogg is to map it to Ogg Kate, which encapsulates all required resources inside an Ogg container to make it a compact format. Ogg Kate is however not Web-friendly, and a display of Ogg Kate in a Web browser involves mapping it back to HTML.

When looking at which in-line time-aligned text codecs to support in Ogg, one should also look at what existing media players (outside the Web) are able to decode. There is actually support for decoding Kate and CMML Ogg tracks in typical open source media players like MPlayer or VLC. However, not much content has been produced other than test content that uses these formats to provide subtitles, captions, or other time-aligned text. The most uptake that either format has achieved is as an export format for [http://www.metavid.org/ Metavid]. Mostly, the solutions for using subtitles with Ogg have been to use srt files and have the media players synchronise them.

'''Recommendation 6:''' Implement support for srt as a simple format in Ogg, i.e. full OggSRT support. This includes an encoding and decoding library for OggText codecs, implementation of the mapping of srt into OggText, creation of a few srt and OggSRT example files, as well as mapping of srt to HTML, and display in Firefox. Further, there should be transcoding tools for other simple formats to srt. Details of some of these steps need to be worked out.
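
The exact srt-to-HTML mapping is one of the details still to be worked out. Purely as an illustration, each cue could be turned into a timed div element in the spirit of the Timed Divs HTML proposal referenced below; the function name and the data-* attributes here are assumptions for this sketch, not part of any specification:

<pre>
// One possible shape of an srt-to-HTML mapping (illustrative only):
// each cue becomes a div carrying its timing, ready for styling via CSS.
function cueToHTML(cue) {
  var text = cue.text.replace(/&/g, "&amp;")
                     .replace(/</g, "&lt;")
                     .replace(/\n/g, "<br>");
  return '<div class="cue" data-start="' + cue.start + '"' +
         ' data-end="' + cue.end + '">' + text + '</div>';
}
</pre>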

'''Recommendation 8:''' One aim must be to handle in-line text codecs in the same way as out-of-band time-aligned text files once they hit the Web browser. In analogy to recommendation 3, this will also require a mapping to HTML (and CSS) for Ogg-encapsulated text codecs.

=== 6. Content Selection and Adaptation for Time-aligned Text Codecs ===

The same functionality is required for all the other text categories, for example for the above-mentioned sign language and audio annotation tracks.

To make this available, the Web browser needs to have the ability to gain a full picture of the tracks available for a video or audio resource before being able to choose the ones requested through the user's preferences. The text tracks may come from the video or audio resource itself, or through "text" tracks inside the "video" element. To gain information about the text tracks available inside an Ogg file, a browser can start downloading the first few bytes of the Ogg file, which contain the header data and therefore the description of the tracks available inside it. For Ogg, the skeleton track knows which text tracks are available. However, we need to add the category description into skeleton to allow per-category content selection.
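
As a sketch of this header-fetching step, a script could use an HTTP Range request to retrieve just the beginning of the file. The URL, the byte range, and the inspectOggHeaders function are placeholders for illustration; this also assumes the server honours Range requests:

<pre>
// Fetch only the start of an Ogg file: the header pages (including the
// skeleton track) that describe all tracks live at the beginning.
var xhr = new XMLHttpRequest();
xhr.open("GET", "http://example.com/video.ogv", true);
xhr.setRequestHeader("Range", "bytes=0-16383");  // first 16 KB only
// Classic trick to receive binary data undamaged as a JavaScript string:
xhr.overrideMimeType("text/plain; charset=x-user-defined");
xhr.onreadystatechange = function() {
  if (xhr.readyState === 4 && (xhr.status === 206 || xhr.status === 200)) {
    // A demuxer would now parse the header pages to list the available
    // tracks and their categories.
    inspectOggHeaders(xhr.responseText);  // hypothetical parsing function
  }
};
xhr.send();
</pre>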


In a general audio or video file that is not Ogg, typically downloading the first few bytes will also provide the header data and thus the description of the available tracks. However, an encapsulated text track does not generally expose the category types of the text tracks. The Xiph community has developed [http://wiki.xiph.org/index.php/ROE ROE], the XML-based Rich Open multitrack media Exposition file format. ROE is essentially a textual description of the tracks that are composed inside an Ogg file and can be used generically to describe the tracks available inside any audio or video file. Thus, a Web browser could first download the ROE file associated with an audio or video resource to gain a full picture of all the available content tracks. It can then decide which ones to display.
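
To give a flavour of the idea, a ROE-style track description might look as follows. Note that the element and attribute names in this sketch are assumptions for illustration only; the authoritative schema is on the linked Xiph wiki page:

<pre>
<!-- Illustrative sketch only: names are assumed, not taken from the ROE spec -->
<ROE>
  <body>
    <track provides="video" src="video.ogv#track=theora"/>
    <track provides="audio" src="video.ogv#track=vorbis" lang="en"/>
    <track provides="CC" src="captions_en.srt" lang="en"/>
    <track provides="SUB" src="subtitles_fr.srt" lang="fr"/>
  </body>
</ROE>
</pre>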

'''Recommendation 11:''' Experiment with and develop a specification for server-side content adaptation for in-line, out-of-band, and mixed time-aligned text tracks.

== The Road Ahead ==

* http://wiki.xiph.org/index.php/Timed_Divs_HTML: a proposal for an encompassing format
* http://www.linux.com/feature/149988: article announcing the video a11y work
* http://blog.gingertech.net/2008/12/12/attaching-subtitles-to-html5-video/: a blog post explaining how to attach subtitles out-of-band to the HTML5 video element
* http://blog.gingertech.net/2008/11/19/embedding-time-aligned-text-into-ogg/: a blog post explaining OggText

The following implementations related to video accessibility are available:

* http://v2v.cc/~j/jquery.srt/ : JavaScript for captions in HTML5 (only works in Firefox 3.5)
* http://metavid.org/w/extensions/MetavidWiki/skins/mv_embed/example_usage/sample_timed_text.php : Wikipedia implementation of out-of-band HTML5 video element proposal
* http://www.w3.org/2008/12/dfxp-testsuite/web-framework/START.html : W3C Timed Text working group implementation of DFXP in HTML5 (only works in Firefox 3.5)
* a first, as yet unpublished implementation of OggText

* http://www.foms-workshop.org/foms2009/pmwiki.php/Main/TimedText: Foundations of Open Media Software workshop, January 2009 - brought Opera into the discussion

== Conclusions ==