Accessibility/Video Text Format: Difference between revisions

Accessibility/Video Text Format (view source)

Revision as of 07:01, 5 August 2010

1,546 bytes added , 5 August 2010

removed flexibility in <cue>s, added JSON example, some edits

Silviapfeiffer

@@ Line 1: / Line 1: @@
 == A HTML-based media markup language for HTML5 ==
-This page introduces a HTML-based time-aligned text markup for audio and video. It is particularly targeted for use with HTML5 audio and video elements, but can be used in stand-alone applications.
+This page introduces a HTML-based time-aligned (or time-synchronized) text markup for audio and video. It is particularly targeted for use with HTML5 audio and video elements, but can be used in stand-alone applications.
 The new markup is called "Web Media Markup Language" (WMML) and has a mime type of text/wmml.
-The main motivation for creating this markup is to create a text format for specifying captions, subtitles, karaoke, and similar time-aligned text which work by reusing existing Web technologies such as CSS and HTML. It does so by creating a new file format, but re-using existing HTML5 elements that are appropriate. Only a small number of elements are introduced that do not currently exist in HTML5.
+The main motivation for creating this markup is to create a text format for specifying captions, subtitles, karaoke, and similar time-aligned text which work by reusing CSS and HTML. It does so by creating a new file format, but re-using existing HTML5 elements that are appropriate. In particular the innerHTML parser of HTML5 will be reused for the main markup. Only a small number of elements are introduced that do not currently exist in HTML5.
 The new elements are not an extension to HTML5 and are not planned to be. There are hooks into HTML through the [http://www.whatwg.org/specs/web-apps/current-work/multipage/video.html#timed-tracks TimedTracks] API in HTML5 by which the WMML elements are exposed to the Web page that bears the media resource and the link to the WMML document. Some of these HTML5 APIs are only objects in HTML5, but are actual elements in WMML.
-The aim behind this way of defining WMML is to create a format that can reuse existing HTML5 snippet parsing, rather than having to invent a completely new parsing model. A WMML parser will only consist of a small amount of new parsing code and rely on an existing HTML5 snippet parser to provide for the bulk of its parsing needs. Also, the reuse of CSS will allow reuse of existing implementations for styling and positioning. This should vastly help Web browsers to implement support for WMML, even and particularly including the richer features.
+The aim behind this way of defining WMML is to create a format that can reuse existing HTML5 snippet parsing, rather than having to implement a completely new parser. A WMML parser will only consist of a small amount of new parsing code and rely on an existing HTML5 snippet parser to provide for the bulk of its parsing needs. Also, the reuse of CSS will allow reuse of existing implementations for styling and positioning. This should vastly help Web browsers to implement support for WMML, even and particularly including the richer features.
-Note: A WMML document is a non-HTML document that contains HTML elements but is not an XML-with-namespaces document. This is on purpose so as to allow reuse of HTML snippet parsing.
+Note: A WMML document is an XML-ish document that contains HTML elements but is not an XML-with-namespaces document. This is deliberate: it allows reuse of CSS and HTML snippet parsing without taking on the issues of XML namespaces and XSL-FO.
-Also note that as it stands WMML is not XML conformant, since it has some relaxed parsing rules (e.g. <cue> doesn't have to be explicitly closed - it will be implicitly closed by the next <cue> element). Stricter parsing rules could however be introduced.
@@ Line 34: / Line 32: @@
 HTML5 defines a [http://www.whatwg.org/specs/web-apps/current-work/multipage/video.html#timed-track-api timed track API] for cues and the list of cues inside a WMML document maps neatly onto the TimedTrackCueList interface.
-If not given otherwise, the default rendering region for a WMML resource that is related to a video is the video viewport, and inside that the bottom part. Alternatively, the top, right, and left viewport regions are possible rendering regions, too. Further, the cues could be rendered by a Web page outside the video element, but such information is decided by the rendering Web page and not the WMML file itself. The Web page's setting will also always overrule any settings provided in the WMML file.
+If not given otherwise, the default rendering region for a WMML resource that is related to a video is a CSS box with the dimensions of the video viewport, overlaid on the video viewport, and inside that the bottom part. Alternatively, the top, right, and left viewport regions are possible rendering regions, too. Further, the cues could be rendered by a Web page outside the video element, but such information is decided by the rendering Web page and not the WMML file itself. The Web page's setting will always overrule any settings provided in the WMML file.
 In this example, the cues are rendered at 10s and 20s as an overlay onto the bottom area of the video viewport.
@@ Line 41: / Line 39: @@
 . A formatted and positioned example
-This is an example with two 10 sec long text cues provided in the default language "en-US" which are placed at the top third of the video.
+This is an example with two 10 sec long text cues provided in the default language "en-US" which are placed at the center of the top third of the video.
 <pre>
 <!DOCTYPE wmml>
-<wmml lang="en-US">
+<wmml lang="en-US" profile="innerHTML">
    <head>
      <style type="text/css">
@@ Line 76: / Line 74: @@
 The cue elements c1 and c2 are formatted - the first one with red color, a different font, and a background transparency of 50%. The second one has spans that are italicised.
+The Web page could decide to overrule the rendering target to some other location on screen. This would be provided in the style element of the <track> element through which the WMML resource is linked.
 === The elements of WMML ===
@@ Line 86: / Line 87: @@
 * supports [http://www.whatwg.org/specs/web-apps/current-work/multipage/elements.html#global-attributes global attributes] that <html> supports, too
 * additionally supports the following attribute:
-** kind: the kind of track that this document provides, see [http://www.whatwg.org/specs/web-apps/current-work/multipage/video.html#attr-track-kind HTML5 kind attribute]
+** kind: the semantic kind of track that this document provides, see [http://www.whatwg.org/specs/web-apps/current-work/multipage/video.html#attr-track-kind HTML5 kind attribute]
+** lang: the language in which this document is provided
+** profile: specifies the format used in the cues and thus the parser that should be used; values include "plainText", "innerHTML", "JSON", "any" (other formats can be developed); default is "innerHTML"
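The profile values listed above can be read as selecting one parser per document. The following Python sketch illustrates that dispatch under stated assumptions: the names `parse_cue` and `strip_markup` are hypothetical and not part of WMML, "plainText" is approximated by dropping all tags with the standard library's `html.parser`, and "innerHTML" content is returned unparsed because a browser would hand it to its own HTML5 snippet parser.

```python
import json
from html.parser import HTMLParser


class _TextOnly(HTMLParser):
    """Collects character data and drops every tag."""

    def __init__(self):
        super().__init__()
        self.chunks = []

    def handle_data(self, data):
        self.chunks.append(data)


def strip_markup(source):
    """Approximate the "plainText" profile: keep text, ignore markup."""
    parser = _TextOnly()
    parser.feed(source)
    parser.close()
    return "".join(parser.chunks)


def parse_cue(source, profile="innerHTML"):
    """Hypothetical dispatcher: pick a cue parser from the profile value."""
    if profile == "plainText":
        return strip_markup(source)
    if profile == "JSON":
        return json.loads(source)
    if profile in ("innerHTML", "any"):
        # "any" is not parsed at all; "innerHTML" would be passed on to an
        # HTML5 snippet parser, which this sketch does not implement.
        return source
    raise ValueError("unknown profile: " + profile)
```

The default of `"innerHTML"` mirrors the default stated for the attribute; unknown values raise an error here, which is a choice of this sketch and not something the proposal specifies.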
 ==== the &lt;head> element ====
@@ Line 103: / Line 106: @@
 * is analogous to the [http://www.whatwg.org/specs/web-apps/current-work/multipage/grouping-content.html#the-div-element HTML &lt;div> element] and supports all of the attributes and content elements of &lt;div>, in particular all [http://www.whatwg.org/specs/web-apps/current-work/multipage/content-models.html#flow-content flow content] (which includes &lt;ruby>).
-* <cue> elements cannot appear inside <cue> elements; it is possible to introduce a parsing rule for <cue> that is similar to &lt;dl>/&lt;dd> where the opening of a new <cue> element implicitly closes the previous one. In this way, it is impossible to write a nested <cue> element - the parser always turns it into something non-nested.
+* what actually is used inside a cue is defined by the @profile attribute of the <wmml> element
+** "plainText": will be parsed by ignoring all markup if any
+** "innerHTML": will be parsed by the HTML5 snippet parser
+** "JSON": will be parsed as JSON
+** "any": will not be parsed but just regarded as any text
+* <cue> elements cannot appear inside <cue> elements
 * it has the following additional attributes:
 ** start (float, optional): the start time of the cue (in relation to a media resource that is externally specified in a HTML media element); if missing, start=0 is assumed
 ** end (float, optional): the end time of the cue; if missing, it implicitly ends with the start of the next cue or at the end of the resource; thus, if time-overlapping cues are needed, specification of the end attribute is required
-** voice (optional): a string identifying the voice with which the cue is associated (as defined in [http://www.whatwg.org/specs/web-apps/current-work/multipage/video.html#timed-track-cue-voice-identifier the HTML5 specification]
 ** width/height: per cue width/height in %
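The defaulting rules for the start and end attributes can be sketched in Python as follows. This is an illustration only: `parse_time` and `resolve_times` are hypothetical names, cues are modelled as plain dictionaries in document order, and the acceptance of both plain seconds and the "HH:MM:SS.ss" form used in the examples on this page is an assumption of the sketch.

```python
def parse_time(value):
    """Convert 'HH:MM:SS.ss' or a plain number of seconds to float seconds."""
    seconds = 0.0
    for part in str(value).split(":"):
        seconds = seconds * 60 + float(part)
    return seconds


def resolve_times(cues, duration):
    """Return (start, end) in seconds for each cue, filling in defaults.

    cues: list of dicts with optional 'start' and 'end' keys, in document order.
    duration: length of the media resource in seconds.
    """
    # A missing start is treated as 0.
    starts = [
        parse_time(cue["start"]) if cue.get("start") is not None else 0.0
        for cue in cues
    ]
    resolved = []
    for index, cue in enumerate(cues):
        if cue.get("end") is not None:
            end = parse_time(cue["end"])
        elif index + 1 < len(cues):
            # A missing end falls back to the start of the next cue ...
            end = starts[index + 1]
        else:
            # ... or to the end of the resource for the last cue.
            end = duration
        resolved.append((starts[index], end))
    return resolved
```

Because a missing end is derived from the next cue's start, cues that overlap in time can only be expressed by giving the end attribute explicitly, as the attribute description notes.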
@@ Line 128: / Line 135: @@
 The [http://www.whatwg.org/specs/web-apps/current-work/multipage/links.html#selectors HTML element selectors as introduced for HTML5] are also applicable here.
-With the use of attributes, CSS selectors can be applied e.g. to all cues that belong to a certain speaker, like this: cue[voice="speaker1"] { ... } .
+With the use of attributes, CSS selectors can be applied e.g. to all cues that belong to a certain speaker, like this: cue[class="speaker1"] { ... } .
 === Rendering ===
-The WMML file's &lt;cue> elements are not rendered into an existing HTML page, but rather a WMML file creates its own iframe-like new nested browsing context. It is linked to the parent HTML page through a [http://www.whatwg.org/specs/web-apps/current-work/multipage/video.html#the-track-element track element] that is inserted as a child of the video element. Creation of a nested browsing context is important because a WMML file can come from a different URI than the Web page and thus for security reasons and for general base URI computations a nested browsing context is the better approach with the DOM nodes of the hosting page and the DOM nodes of the WMML document in different owner documents. That way, the hosting document has the security origin of its own URL and the WMML document has the security origin of its URL.
+The WMML file's &lt;cue> elements are not rendered into an existing HTML page, but rather a WMML file creates its own iframe-like new nested browsing context. It is linked to the parent HTML page through a [http://www.whatwg.org/specs/web-apps/current-work/multipage/video.html#the-track-element track element] that is inserted as a child of the video element. Creation of a nested browsing context is important because a WMML file can come from a different domain than the Web page and thus for security reasons and for general base URI computations a nested browsing context is the better approach with the DOM nodes of the hosting page and the DOM nodes of the WMML document in different owner documents. That way, the hosting document has the security origin of its own URL and the WMML document has the security origin of its URL.
-As the browser plays the video, it must render the WMML &lt;cue> tags in sync. As the start time of a <cue> tag is reached, the <cue> tag is made activate, and it is made inactive as the <cue> tag's end time is reached. If no start time is given, the start is assumed to be 0, and if no end time is given, the cue ends with the start of the next one or at the end of the resource.
+As the browser plays the video, it must render the WMML &lt;cue> tags in sync. As the start time of a <cue> tag is reached, the <cue> tag is made active, and it is made inactive as the <cue> tag's end time is reached. If no start time is given, the start is assumed to be 0, and if no end time is given, the cue ends with the start of the next one or at the end of the resource.
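The activation rule in the paragraph above can be illustrated with a short Python sketch. It assumes cue times have already been resolved to seconds, represents each cue as a hypothetical `(start, end, text)` tuple, and treats the interval as half-open so that a cue is no longer active once its end time is reached; the proposal does not spell out the boundary behaviour, so that detail is an assumption.

```python
def active_cues(cues, current_time):
    """Return the text of every cue that is active at current_time.

    cues: iterable of (start, end, text) tuples with times in seconds.
    A cue is active from its start time up to, but not including, its end.
    """
    return [text for start, end, text in cues if start <= current_time < end]


# Sample data taken from the timings used in the examples on this page.
sample = [
    (15.0, 17.95, "At the left we can see..."),
    (18.16, 20.08, "At the right we can see the..."),
]
```

Returning a list instead of a single cue allows for time-overlapping cues, which the format permits when end times are given explicitly.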
-The content of WMML cue elements is made available to the HTML page that includes the WMML file and the media resource through the [http://www.whatwg.org/specs/web-apps/current-work/multipage/video.html#timed-track-api timed track API in HTML]. In particular, the getCueAsHTML and getCueAsSource API calls will provide a copy of the DOM subtree for the <cue>. You lose style information that was being applied by <style> elements in the WMML document, but since the main reason for the JavaScript API is to run your own styles, this is acceptable. The returned content needs to be sanitized in case a malicious cue contains a <script> element.
+The content of WMML cue elements is made available to the HTML page that includes the WMML file and the media resource through the [http://www.whatwg.org/specs/web-apps/current-work/multipage/video.html#timed-track-api timed track API in HTML]. In particular, the getCueAsHTML and getCueAsSource API calls will provide a copy of the DOM subtree for the <cue>. You lose style information that was being applied by <style> elements in the WMML document, but since the main reason for the JavaScript API is to run your own styles, this is acceptable. The returned content may need to be sanitized in case a malicious cue contains a <script> element.
 === Concrete Examples ===
@@ Line 146: / Line 153: @@
 <pre>
 <!DOCTYPE wmml>
-<wmml lang="en-US" kind="subtitles">
+<wmml lang="en-US" kind="subtitles" profile="plainText">
    <cuelist>
      <cue start="00:00:15.00" end="00:00:17.95">At the left we can see...</cue>
@@ Line 160: / Line 167: @@
 <pre>
 <!DOCTYPE wmml>
-<wmml lang="de_DE" kind="subtitles">
+<wmml lang="de_DE" kind="subtitles" profile="innerHTML">
    <head>
      <style type="text/css">
@@ Line 177: / Line 184: @@
      <cue id="c1" start="00:00:15.00" end="00:00:17.95">Auf der <i>linken</i> Seite sehen wir...</cue>
      <cue id="c2" start="00:00:18.16" end="00:00:20.08">Auf der <b>rechten</b> Seite sehen wir die....</cue>
-     <cue id="c3" start="00:00:20.11" end="00:00:21.96" style="color: red;">...die <a href="http://orange.blender.org/blog/creative-commons-license-2/">Enthaupter</a>.</cue>
+     <cue id="c3" start="00:00:20.11" end="00:00:21.96" style="color: red;">...die <a href="http://members.chello.nl/j.kassenaar/elephantsdream/subtitles.html">Enthaupter</a>.</cue>
      <cue id="c4" start="00:00:21.99" end="00:00:24.36">Alles ist <mark>sicher</mark>.&lt;br/>Vollkommen sicher.</cue>
    </cuelist>
@@ Line 209: / Line 216: @@
 <!DOCTYPE wmml>
 <wmml lang="en-US" kind="captions">
+  <head>
+    <style type="text/css">
+      cue.Proog {
+        font-style: italic;
+      }
+    </style>
+  </head>
    <cuelist>
-     <cue start="00:00:15.00" end="00:00:17.95" voice="Proog">At the left we can see...</cue>
+     <cue start="00:00:15.00" end="00:00:17.95" class="Proog">At the left we can see...</cue>
-     <cue start="00:00:18.16" end="00:00:20.08" voice="Proog">At the right we can see the...</cue>
+     <cue start="00:00:18.16" end="00:00:20.08" class="Proog">At the right we can see the...</cue>
-     <cue start="00:00:20.11" end="00:00:21.96" voice="Proog">...the head-snarlers&lt;br/>[Whizzing noises]</cue>
+     <cue start="00:00:20.11" end="00:00:21.96" class="Proog">...the head-snarlers&lt;br/>[Whizzing noises]</cue>
-     <cue start="00:00:21.99" end="00:00:24.36" voice="Proog">Everything is safe.&lt;br/>Perfectly safe.</cue>
+     <cue start="00:00:21.99" end="00:00:24.36" class="Proog">Everything is safe.&lt;br/>Perfectly safe.</cue>
    </cuelist>
 </wmml>
@@ Line 284: / Line 296: @@
 <pre>
 <!DOCTYPE wmml>
-<wmml lang="en-US" kind="chapters">
+<wmml lang="en-US" kind="chapters" profile="plainText">
    <cuelist>
      <cue start="00:00:00.00" end="00:00:18.00">Introductory Titles</cue>
@@ Line 302: / Line 314: @@
      <cue start="00:00:18.01" end="00:01:10.00"><img src="plugs.png"/> The Jack Plugs</cue>
      <cue start="00:01:10.01" end="00:02:30.00"><img src="birds.png"/> Robotic Birds</cue>
+  </cuelist>
+</wmml>
+</pre>
+JSON in cues:
+<pre>
+<!DOCTYPE wmml>
+<wmml lang="en-US" kind="chapters" profile="JSON">
+  <cuelist>
+    <cue start="00:00:10.00" end="00:00:20.00">
+         {
+           "title": "Chapter 2",
+           "description": "Some blah relating to chapter 2",
+           "image": "/images/chapter2.png"
+         }
+    </cue>
+    <cue start="00:00:20.00" end="00:00:30.00">
+         {
+           "title": "Chapter 3",
+           "description": "Chapter 3 blah",
+           "image": "/images/chapter3.png"
+         }
+    </cue>
    </cuelist>
 </wmml>
@@ Line 310: / Line 343: @@
 <pre>
 <!DOCTYPE wmml>
-<wmml lang="en-US" kind="descriptions">
+<wmml lang="en-US" kind="descriptions" profile="plainText">
    <cuelist>
      <cue start="00:00:00.00" end="00:00:05.00">The orange open movie project presents</cue>
@@ Line 325: / Line 358: @@
 <pre>
 <!DOCTYPE wmml>
-<wmml lang="en-US" kind="metadata">
+<wmml lang="en-US" kind="metadata" profile="JSON">
    <head>
      <title>Really Achieving Your Childhood Dreams</title>
@@ Line 341: / Line 374: @@
 </wmml>
 </pre>
 === Differences to other proposed formats for use in HTML5 ===
-Other formats have been proposed to be used as out-of-the-box supported markup for external time-aligned text documents for HTML5 media elements. The most popular examples are SRT, WebSRT, and TTML (former DFXP).
+Other formats have been proposed to be used as baseline formats for external time-aligned text documents for HTML5 media elements. The most popular examples are SRT, WebSRT, and DFXP/TTML.
 The main difference between SRT and WMML is that WMML is HTML-like and thus requires more markup. But that is offset by the ability to easily extend WMML with existing HTML and CSS features.
-WebSRT tries to extend SRT with features that have been deemed [http://wiki.whatwg.org/wiki/Timed_tracks required for a collection of use cases around captions, subtitles, and karaoke]. While this results in a fairly dense document definition, it also has the drawback that it is not easily extensible to slightly new applications, such as overlays on videos with ads, or captions with images, icons, or hyperlinks in them. Further, WebSRT is not a XML/HTML-based markup and thus requires implementation of a new parsing unit into Web browsers. Such new parsing code should be kept to a minimum, while continuing to provide flexibility of what can be displayed in time-synchronisation with videos.
+WebSRT tries to extend SRT with features that have been deemed [http://wiki.whatwg.org/wiki/Timed_tracks required for a collection of use cases around captions, subtitles, and karaoke]. In its current definition, it is a platform that allows for plain text, minimal markup, and arbitrary content. Thus, without added innerHTML support, it has the drawback that it is not natively extensible to new HTML-conformant applications, such as overlays on videos with ads, or captions with images, icons, or hyperlinks in them. Further, WebSRT supports only a small subset of CSS while introducing some new functionality of its own, in particular for layout and positioning. While not as complex as XSL-FO, it shares the drawback of requiring the implementation of another layout approach. Finally, WebSRT is not an XML/HTML-based markup and thus requires a new parsing unit in Web browsers.
-TTML has tried to be such a format. It is XML-based and has CSS-like formatting instructions. However, it has diverged too much from HTML/CSS that it is not easily possible to reuse existing HTML & CSS parsing code to interprete a TTML document. At the time of its definition, it seemed like a sensible thing to do in order to stay in sync with XHTML, with XML namespaces and with XSL-FO, but in the modern HTML5 space, these have proven to be a hinderance to implementation in modern Web browsers.
+TTML has tried to be an XML format that supports traditional XHTML approaches. It has CSS-like formatting instructions. However, it is sufficiently different from HTML/CSS that it is not easily possible to reuse existing HTML & CSS parsing code to interpret a TTML document. At the time of its definition, it seemed like a sensible thing to do in order to stay in sync with XHTML, with XML namespaces and with XSL-FO, but in the modern HTML5 space, these have proven to be a hindrance to implementation in modern Web browsers.
-WMML provides a solution to this situation. It is very similar to HTML and reuses CSS for formatting and styling. It tries to be as simple as possible with what it introduces newly. It references HTML and CSS for the bulk of its functionality, which makes it easily extensible, since any new functionality introduced into HTML and CSS is available to WMML, too.
+WMML provides a solution to this situation. It is very similar to HTML and reuses CSS for formatting and styling. It tries to be as simple as possible in what it newly introduces. It references HTML and CSS for the bulk of its functionality, which makes it easily extensible, since any new functionality introduced into HTML and CSS is available to WMML, too. In addition, we've adopted WebSRT's idea of allowing other types of content in the cues: plain text, JSON, and any other content. The @profile attribute makes sure that applications that only want to support one type of content can identify such files.
 Note that WMML is an improvement over a previous experiment with [http://wiki.xiph.org/index.php/Timed_Divs_HTML timed divs]. WMML moves away from re-using existing HTML tags for a different purpose (&lt;body>, &lt;div>) and it introduces a &lt;t> element to allow for Karaoke. That latter is an optional addition.
@@ Line 368: / Line 403: @@
 Web browsers should be able to implement support for WMML fairly easily, since they already have support for most of the required CSS and HTML functionalities.
-For (manual) authoring of WMML document it is expected that authors exert constraint in the actual elements they use. The reason is that the more elements of HTML are being used in WMML documents, the less usable the WMML document becomes to players that do not support Web technologies. Over time, increasing amounts of HTML elements may be supported by authoring tools and stand-alone players, so can be used in typical WMML documents.
+For (manual) authoring of WMML documents it is expected that authors exert restraint in the actual elements they use. One reason is that the more features one overlays on video, the less useful the video becomes, so there is usability pressure for the restraint. Also, the more HTML elements are used in WMML documents, the less usable those documents become to players that do not support Web technologies. Over time, an increasing number of HTML elements may be supported by authoring tools and stand-alone players, and can then be used in typical WMML documents.
 Since many new players are already capable of parsing HTML pages, implementation of support for WMML in stand-alone players may not be much of an issue.
-As for the authoring side of WMML documents: for hand-coding, WMML is a bit more verbose than e.g. SRT. It is frequently pointed out that the XML-based caption format USF ([http://en.wikipedia.org/wiki/Universal_Subtitle_Format Universal Subtitle Format]) as it was defined by Matroska developers never achieved any uptake. Reasoning is that the fansubbing community refused to author documents in such a verbose format. However, there was never any support implemented for USF for more than the basic features in any media player, thus the verbose overhead had a big impact and the features were never visible.
+As for the authoring side of WMML documents: for hand-coding, WMML is a bit more verbose than e.g. SRT. It is frequently pointed out that the XML-based caption format USF ([http://en.wikipedia.org/wiki/Universal_Subtitle_Format Universal Subtitle Format]) as it was defined by Matroska developers never achieved any uptake. The reasoning is that the fansubbing community refused to author documents in such a verbose format. However, support for more than the basic features of USF was never implemented in any media player or authoring application, which probably had a lot more to do with the lack of uptake than the verbosity of the format.
 The situation with WMML is different though, since it's not built completely new from scratch. If all Web browsers support WMML and its advanced features, then authors understand the usefulness of the verbosity. Also, because WMML would reuse HTML parsers, all features would be available immediately in a Web browser without having to wait for player developers to catch up. Exporting to WMML from a subtitling or captioning creation application also wouldn't be hard, at least for the most fundamental needs - and it would provide for all the features of advanced formats, too. Finally, stand-alone players that consider implementation of support for WMML will look at it in the context of also implementing support for HTML documents - something increasingly useful to media players (as exemplified in iTunes etc). Thus, there is no additional overhead (or only minimal overhead) in implementing WMML.
 Ultimately, the aim of a new Web caption format should be to enable new people to author captions. By creating a format that is so similar to HTML that it is trivial to author in for any Web developer, we can suddenly recruit all the Web developers of the world as captioners. This is a much more important aim than the comparatively easy challenge of convincing existing captioners to export their files into another new file format.
