Accessibility/Video a11y Study08

By Silvia Pfeiffer


The three months are over during which the Mozilla Foundation provided me with a grant towards analysing the status of accessibility for the HTML5 <video> and <audio> elements, particularly with a view towards Ogg support. This post provides a summary of my findings and recommendations on how to progress video a11y in Firefox, as well as a list of actual progress already made. One of the biggest achievements is that there is now a mailing list at Xiph for video a11y and that this community will continue working on these issues, if slowly.




== Background: Video Accessibility study ==


The study took a broad view on what constitutes "accessibility" for audio and video, including but also going beyond means of providing access to people with disabilities. The study analysed means of generally attaching textual information to audio and video, and of giving search engines better access to these textual representations. The requirements document is available [https://wiki.mozilla.org/Accessibility/Video_a11y_requirements in the Mozilla wiki].


One particular aim of the study was to recommend means for delivering accessibility features inside the Ogg container format for the open Ogg Theora & Vorbis formats. Since Ogg Theora/Vorbis has been adopted by Firefox as the baseline codec for the audio and video tags, Ogg plays a major role when delivering accessibility features into the Web browser. This also goes beyond mere Web delivery of accessibility features and will have an effect on a larger number of media applications. This is important since the creation of accessibility content for audio and video formats cannot just happen for the Web. It needs to be supported by an ecosystem of applications around the audio and video content, including in particular authoring applications, but also off-line playback applications.




== Results and Recommendations of the Video Accessibility study ==


First of all one has to recognise that some accessibility data is in a text format (e.g. closed captions), while other accessibility data is actually supplementary media data that accompanies the core video or audio content. Examples of such non-text accessibility data are open captions (i.e. captions that are burnt into the video data), bitmap captions (i.e. graphics files that are blended on top of the video data), or audio descriptions (i.e. descriptive spoken annotations that are aligned with pauses in the original audio track). Most non-text accessibility data actually has a textual representation: closed captions for example come as text that can easily be turned on or off by the video player. Also, textual audio descriptions can be rendered by a screen reader or through a braille device.




=== 2. Dealing with Non-Text accessibility data ===


It needs to be understood that sign language is, for the hearing impaired, often their first language, while transcribed spoken speech is often their first foreign language. Similarly, for the vision impaired, listening to natural speech in audio descriptions is a lot more relaxing than listening to screen readers or reading braille. Also, there is already a large amount of audio and video accessibility data available that is not in textual format, e.g. the bitmaps used for captions on DVDs. It would be a shame to exclude such data from being used on the Web.


Considering these circumstances, it is critical to enable the association of non-text captions, sign language and audio annotations with video.


Existing sign language video or audio descriptions usually come as part of the video - either burnt directly into the given audio or video track (e.g. picture-in-picture sign language video, or open captions), or as a separate track. QuickTime, MPEG2 or MPEG4 are container formats that are typically used to encapsulate such extra tracks with the original audio or video file. Ogg is capable of the same multi-track encapsulation and provides synchronisation between the tracks. The Ogg skeleton headers can further provide a clear indication of the available tracks inside an Ogg file, which can be used to enable media players to offer audio and video track selection to a user.


'''Recommendation 2:''' Non-text accessibility data, such as spoken audio descriptions, should be multiplexed into the Ogg container format, where media players (including Web browsers) will be able to identify them and offer them to users for decoding. It is further recommended that speech-only accessibility tracks should be encoded using Speex, while video should be encoded using Theora. With graphics, there is currently no clearly recommendable codec in Xiph - probably the best one to use is Ogg Kate or Theora, but OggMNG or OggSpots are options, too. Note that the Xiph community may develop and recommend more appropriate codecs for time-aligned graphics in the future.


Also please note that we recommend development of a server-side dynamic content adaptation scheme that allows the browser to request - on behalf of its user - adequate accessibility tracks together with the content. This is described in more detail in section 6 below.






In the [https://wiki.mozilla.org/Accessibility/Video_a11y_requirements accessibility requirements document] and the [http://wiki.xiph.org/index.php/OggText OggText proposal], a large number of categories of time-aligned text that have been seen in online video and audio applications were identified:
* CC: closed captions (for the hearing impaired)
* SUB: subtitles
* TAD: textual audio descriptions (for the vision impaired; to be used as braille or through TTS)
* KTV: karaoke
* TIK: ticker text
A typical media player such as mplayer or vlc plays back the subtitles for a video file by allowing the user to open a subtitle file in parallel to the video. The media player then synchronises the playback of the subtitles and renders them on top of the video file. QuickTime and WindowsMediaPlayer do not have this functionality, but rely on subtitle tracks being delivered inside the audio or video file.


The ability to dynamically associate a time-aligned text file with an audio or video file at the moment of playback is very powerful. It has allowed the creation of a whole community of subtitling fans, the fansubbers, who provide accessibility to almost all movies and feature films.


To provide such functionality inside a Web browser, it is necessary to be able to specify out-of-band time-aligned text files together with the video, for example through markup like the following:
<pre>
<video src="http://example.com/video.ogv" controls>
   <itext category="CC" lang="en" src="caption.srt" style=""></itext>
   <itext category="SUB" lang="fr" src="http://translation_webservice/fr/caption.srt" style="valign: top;"></itext>
</video>
</pre>
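
For reference, a file like caption.srt in the example above is a plain SubRip (srt) file: numbered cues, each with a start and end time and the cue text. A minimal example (the cue text here is, of course, made up) looks like this:

<pre>
1
00:00:01,000 --> 00:00:04,200
Hi, and welcome to this demo video.

2
00:00:04,500 --> 00:00:08,000
This is the second caption cue.
</pre>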


How this will actually work is as yet unclear. One approach is to render the out-of-band text files as HTML straight into the DOM of the current Web page. This raises security issues. Another approach is to render them into a kind of iframe, i.e. a separate security context. Also, SVG has a "text" element that serves a similar purpose, so the specification could be aligned with that.


There are experimental implementations in javascript of this proposal, one for srt through [http://metavid.org/w/extensions/MetavidWiki/skins/mv_embed/example_usage/sample_timed_text.php Wikipedia] and one for dfxp through the [http://www.w3.org/2008/12/dfxp-testsuite/web-framework/START.html W3C TimedText working group]. Both map the respective out-of-band file into the DOM of the current Web page.
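
To illustrate the general approach such implementations take - this is a simplified sketch only, not the Metavid or W3C TimedText code - a script can fetch the out-of-band srt file, parse it into cues, and toggle the active cues in an overlay as the video plays:

<pre>
<video id="v" src="http://example.com/video.ogv" controls></video>
<div id="caption"></div>
<script>
// Parse an srt string into a list of {start, end, text} cues (times in seconds).
function parseSrt(srt) {
  var blocks = srt.split(/\r?\n\r?\n/).filter(function (b) { return b.replace(/\s/g, "") !== ""; });
  return blocks.map(function (block) {
    var lines = block.split(/\r?\n/);
    var t = lines[1].match(/(\d+):(\d+):(\d+),(\d+) --> (\d+):(\d+):(\d+),(\d+)/);
    function secs(h, m, s, ms) { return h * 3600 + m * 60 + s + ms / 1000; }
    return {
      start: secs(+t[1], +t[2], +t[3], +t[4]),
      end:   secs(+t[5], +t[6], +t[7], +t[8]),
      text:  lines.slice(2).join("<br>")
    };
  });
}

var video = document.getElementById("v");
var overlay = document.getElementById("caption");
var cues = [];

// Fetch the out-of-band caption file (same-origin in this sketch).
var xhr = new XMLHttpRequest();
xhr.open("GET", "caption.srt", true);
xhr.onreadystatechange = function () {
  if (xhr.readyState === 4 && xhr.status === 200) cues = parseSrt(xhr.responseText);
};
xhr.send();

// On every time update, show the cues that are active at the current playback time.
// A real implementation would have to sanitise the cue text before inserting it into the DOM.
video.addEventListener("timeupdate", function () {
  var now = video.currentTime;
  var active = cues.filter(function (c) { return c.start <= now && now <= c.end; });
  overlay.innerHTML = active.map(function (c) { return c.text; }).join("<br>");
}, false);
</script>
</pre>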
Seeing the number of text categories identified above, it would make sense to have only one time-aligned text file format that can produce them all and is flexible enough to allow even further new time-aligned text ideas to be realised.


Of all the formats that were analysed, DFXP is the most flexible format and is capable of producing multiple categories. It still needs to be determined whether all of the given categories can be supported by DFXP, since DFXP was developed as an exchange format for subtitles and captions in particular. In any case, DFXP is not optimal for a Web-based time-aligned text format, since it redefines a lot of HTML, SMIL and CSS constructs, rather than re-using existing HTML, javascript and CSS. The re-definition was necessary because DFXP was developed as a generic format for time-aligned text that needs to work in any situation, including outside the Web. However, for purposes of the Web (and for many Web-capable media players), reuse of HTML, CSS and javascript would lead to a format for which it would be much easier to provide implementations.


An idea for such a format is being discussed in the Xiph community. It runs under the name of TDHT (timed divs in HTML). A simple example file looks like this:
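
(The file below is an illustrative sketch of the TDHT idea rather than a verbatim copy of the Xiph draft: an ordinary HTML page whose divs carry start and end timing attributes. Attribute names and details may differ in the actual proposal.)

<pre>
<html>
 <head>
  <title>TDHT example: English subtitles</title>
 </head>
 <body>
  <div start="00:00:01.000" end="00:00:04.200">
   <p>Hi, and welcome to this demo video.</p>
  </div>
  <div start="00:00:04.500" end="00:00:08.000">
   <p>Each div is displayed between its start and end time.</p>
  </div>
 </body>
</html>
</pre>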
This format tries to incorporate what we learnt from analysing existing time-aligned text requirements, while making implementation easy in a Web-friendly environment, since it is a normal HTML file with minimal changes.


One concern that has been raised with TDHT is that HTML may be too comprehensive a format for the needs of time-aligned text. Time-aligned text is predominantly text that should be stylable by the Website that is using it. HTML however is notoriously bad at separating data from styling, unless CSS is used exclusively. It may be better to create a format that is more similar to RSS than to HTML in its simplicity.


TDHT and DFXP are the current solutions for out-of-band time-aligned text. Other options for such comprehensive time-aligned text solutions need to be analysed and experimented with.


'''Recommendation 4:''' WRT DFXP: analyse whether DFXP is capable of supporting all the identified text categories by creating a collection of test files. This will help understand the capabilities and limitations of DFXP better.
As is the case with non-text accessibility data, time-aligned text can also be encoded into media files. The advantage is that the text representation of the video is actually part of the video file and thus this metadata doesn't get lost when the files are shared further. Also, synchronisation between the media data and the text codecs is a given.


Ogg, QuickTime, FLV, MPEG4, and 3GPP are containers that are typically used to encapsulate such extra tracks with the original audio-visual file. All of these are capable of encapsulating 3GPP TimedText, which is a subpart of DFXP.


Ogg currently supports CMML and Kate as text codecs.


Just like out-of-band time-aligned text comes in multiple formats, in-line text codecs do, too. This study motivated the Xiph community to define a framework for mapping any type of text codec into Ogg through the so-called [http://wiki.xiph.org/index.php/OggText OggText mapping]. A first implementation of this format exists for encapsulating srt files into Ogg.


In a Web framework (such as Firefox) it doesn't make sense to support all possible in-line text codecs. Instead, it is useful to only support one comprehensive format, or at most a simple format (like srt) and a comprehensive format (like TDHT or Kate). Then, an existing format can be transcoded to one of these two formats for in-line encoding and Web delivery. For example, DFXP could be mapped to TDHT before encapsulation into Ogg.


A mapping of TDHT into Ogg is being defined in the Xiph community as OggTDHT. It is a simple extension of OggText and provides a generic in-line time-aligned text format. One has to be aware though that some resources that are required to render TDHT, such as images, javascript files, or CSS files, would not be encapsulated into an Ogg TDHT track, but would either continue to exist out-of-band or would need to be encapsulated in separate tracks.
An alternative way of encapsulating TDHT into Ogg is to map it to Ogg Kate, which encapsulates all required resources inside an Ogg container to make it a compact format. Ogg Kate is however not Web-friendly, and a display of Ogg Kate in a Web browser involves mapping it back to HTML.


When looking at which in-line time-aligned text codecs to support in Ogg, one should also look at what existing media players (outside the Web) are able to decode. There is actually support for decoding of Kate and CMML Ogg tracks in typical open source media players like mplayer or vlc. However, not much content has been produced other than test content that uses these formats to provide subtitles, captions, or other time-aligned text. The most uptake that either format has achieved is as an export format for Metavid: http://www.metavid.org/. Mostly, the solutions for using subtitles with Ogg have been to use srt files and have the media players synchronise them.


'''Recommendation 6:''' Implement support for srt as a simple format in Ogg, e.g. full OggSRT support. This includes an encoding and decoding library for OggText codecs, implementation of the mapping of srt into OggText, creation of a few srt and OggSRT example files, as well as mapping of srt to HTML, and display in Firefox. Further, there should be transcoding tools for other simple formats to srt. Details of some of these steps need to be worked out.
The same functionality is required for all the other text categories, for example for the above-mentioned sign language and audio annotation tracks.


To make this available, the Web browser needs to have the ability to gain a full picture of the tracks available for a video or audio resource before being able to choose the ones requested through the user's preferences. The text tracks may come from the video or audio resource itself, or through "text" tracks inside the "video" tag. To gain information about the text tracks available inside an Ogg file, a browser can start downloading the first few bytes of the Ogg file, which contain the header data and therefore the description of the tracks available inside it. For Ogg, the skeleton track knows which text tracks are available. However, we need to add the category description into skeleton to allow per-category content selection.
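
To illustrate with the itext markup sketched earlier (a proposal only, and with made-up file names), a page could list several categories of text tracks and leave it to the browser to activate only the ones that match the user's accessibility preferences:

<pre>
<video src="http://example.com/video.ogv" controls>
   <!-- a browser with "English captions" in the user preferences would activate only the CC track -->
   <itext category="CC"  lang="en" src="captions_en.srt"></itext>
   <itext category="SUB" lang="de" src="subtitles_de.srt"></itext>
   <itext category="TAD" lang="en" src="audiodescription_en.srt"></itext>
</video>
</pre>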


In a general audio or video file that is not Ogg, typically downloading the first few bytes will also provide the header data and thus the description of the available tracks. However, an encapsulated text track does not generally expose the category types of the text tracks. The Xiph community has developed [http://wiki.xiph.org/index.php/ROE ROE], the XML-based Rich Open multitrack media Exposition file format. ROE is essentially a textual description of the tracks that are composed inside an Ogg file and can be used generically to describe the tracks available inside any audio or video file. Thus, a Web browser could first download the ROE file associated with an audio or video resource to gain a full picture of all the available content tracks. It can then decide which ones to display.
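
As a rough sketch of the kind of description ROE provides - the element and attribute names below are my reading of the Xiph wiki page and may not match the current draft exactly - a resource with a Theora video track, a Vorbis audio track and an English caption track could be described as:

<pre>
<ROE>
 <body>
  <track id="v" provides="video">
   <mediaSource src="http://example.com/video.ogv" content-type="video/theora"/>
  </track>
  <track id="a" provides="audio">
   <mediaSource src="http://example.com/video.ogv" content-type="audio/vorbis"/>
  </track>
  <track id="cc" provides="CC" lang="en">
   <mediaSource src="http://example.com/captions_en.srt" content-type="text/x-srt"/>
  </track>
 </body>
</ROE>
</pre>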
== Achievements ==


During the study, some progress was already made towards the four areas of work, in particular towards the second and third.


Here is a list of the documents that have been created as a result of the video accessibility study:
== Conclusions ==


The aim of the study was to "deliver a well thought-out recommendation for how to support the different types of accessibility needs for audio and video, including a specification of the actual file format(s) to use. At minimum, an Ogg solution for captions and subtitles is expected, and a means towards including sign language video tracks, audio annotations, transcripts, scripts, story boards, karaoke, metadata, and semantic annotations is proposed."


This aim of the grant proposal was achieved with great success. In fact, we have gone beyond this aim and created a community at Xiph to continue addressing these issues. And we have gone far beyond a recommendation by also creating initial specifications that address each of the four identified areas of work, in particular:
* how to include subtitles into Web pages with a &lt;video> element,
* how to encapsulate time-aligned text into Ogg, and
* a format for the richer time-aligned text data.


In the next step, Mozilla should look at implementing srt support in the Web browser, and in parallel further analyse the richer time-aligned text categories and their needs. In collaboration with the Xiph community, srt in Ogg should also be addressed.

Given all this, the grant was a great success and the resulting study points the way forward.
