Video/Adaptive Bandwidth

From MozillaWiki

Overview

This page attempts to capture the use cases and possible options for supporting adaptive bandwidth with web video and Firefox.

Stakeholders

  • Video producers
  • Hosting companies
  • Video platform providers
  • Browser vendors
  • Codec producers

Use cases

People want the ability to have the client automatically pick the right bandwidth stream for a single video asset. Current systems range from Apple's HTTP Live Streaming to Microsoft's Smooth Streaming to non-HTTP approaches like those built into Flash and other proprietary video delivery systems.

Lots of people want the client to stream only as fast as the video requires, and to stop buffering while the player is paused. (We've heard this from multiple parties.) This is about saving money: viewers often watch only a little of the content, but buffering an entire video can be expensive.

People are also asking to use this as a poor man's way to obstruct easy downloading of content, or at least to ensure a download takes as long as playing the actual content does. Many of the streaming options listed below are also driven by this.

Decisions about what stream to use can be based on a number of different factors:

  • What the device supports (WebM vs. Theora vs. H.264)
  • What the device can actually decode (mobile vs. desktop)
  • How fast the device is decoding (e.g. mobile devices and low-end desktops can't render HD H.264 at high profiles)
  • Buffering depth (reported available bandwidth might have nothing to do with what's achievable because of latency or proxy constraints)
  • Other issues (provisioning, or pay-for-HD content providers)
  • The user might want to manually downgrade, or upgrade and buffer
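The decision factors above could be combined in client-side selection logic along these lines. This is a hedged sketch only: the names (StreamVariant, ClientState, pickStream) are illustrative, not any real browser API, and the 1.2× bandwidth safety margin is an arbitrary assumption.

```typescript
interface StreamVariant {
  codec: string;        // e.g. "theora", "vp8", "h264"
  bandwidth: number;    // bits/sec needed for smooth playback
  height: number;       // vertical resolution
}

interface ClientState {
  supportedCodecs: string[];   // what the device supports at all
  maxDecodableHeight: number;  // what it can decode in real time
  measuredBandwidth: number;   // observed throughput, bits/sec
  userMaxHeight?: number;      // manual downgrade chosen by the user
}

// Pick the highest-bandwidth variant the client supports, can decode,
// and can download faster than real time (with a safety margin).
function pickStream(variants: StreamVariant[], c: ClientState): StreamVariant | null {
  const usable = variants.filter(v =>
    c.supportedCodecs.includes(v.codec) &&
    v.height <= c.maxDecodableHeight &&
    v.height <= (c.userMaxHeight ?? Infinity) &&
    v.bandwidth * 1.2 <= c.measuredBandwidth);
  if (usable.length === 0) return null;
  return usable.reduce((best, v) => v.bandwidth > best.bandwidth ? v : best);
}
```

Note that buffering depth and decode speed would have to be measured continuously; a one-shot decision like this only covers the initial stream choice.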

Requirements

  • Easy to deploy on existing HTTP servers without custom software
  • Allows people who want to build custom software to do so for big sites / service providers
  • Works in the range of current containers + codecs
  • Allows the client to make stream decisions based on CPU usage and bandwidth
  • Client can stop downloading when the video is paused
  • Client is smart about downloading only as much as it needs
  • Client can optionally buffer well ahead of the current playback position, for users who want deep buffering on unreliable connections.
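The download-throttling requirements above (stop on pause, fetch only what's needed, allow opt-in deep buffering) can be sketched as a simple policy function. All names here are hypothetical, and the 30-second read-ahead target is an assumed default, not anything specified.

```typescript
interface PlayerState {
  paused: boolean;
  position: number;           // current playback time, seconds
  bufferedUntil: number;      // seconds of media already downloaded
  deepBufferingOptIn: boolean; // user asked to buffer far ahead
}

const READ_AHEAD_SECONDS = 30; // assumed default read-ahead target

// How many more seconds of media the client should fetch right now.
function secondsToFetch(s: PlayerState): number {
  if (s.paused && !s.deepBufferingOptIn) return 0;  // stop when paused
  if (s.deepBufferingOptIn) return Infinity;        // buffer way ahead
  const ahead = s.bufferedUntil - s.position;
  return Math.max(0, READ_AHEAD_SECONDS - ahead);
}
```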

Current systems

Apple's live streaming over HTTP

Specified in an IETF draft, this system uses a manifest file that describes bandwidths plus a set of chopped-up MPEG files that the video client can jump between depending on bandwidth requirements. Each chunk of video must be aligned on segment boundaries, and the client knows enough to stitch different chunks together into what looks like a single video experience.

The nice thing about this system is that it doesn't require a smart server, and the client can reload the manifest file from time to time to pick up "new" clips. That is, you can actually build a live stream from this system.

At least to some degree this is working around the MPEG file format as well since that format doesn't support the live use case. (See Monty's post about the Ogg format.)
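For concreteness, a variant playlist in the draft's format lists `#EXT-X-STREAM-INF` tags (with a `BANDWIDTH` attribute) followed by the URI of each variant's playlist. A minimal parsing sketch, handling only that one attribute (the real format carries many more), might look like this; the function name is illustrative, not from any library.

```typescript
interface Variant { bandwidth: number; uri: string; }

// Parse an HLS-style variant playlist: each #EXT-X-STREAM-INF line
// describes the variant named by the following (non-comment) line.
function parseVariantPlaylist(text: string): Variant[] {
  const lines = text.split("\n").map(l => l.trim()).filter(l => l.length > 0);
  const variants: Variant[] = [];
  for (let i = 0; i < lines.length; i++) {
    const m = lines[i].match(/^#EXT-X-STREAM-INF:.*BANDWIDTH=(\d+)/);
    if (m && i + 1 < lines.length && !lines[i + 1].startsWith("#")) {
      variants.push({ bandwidth: Number(m[1]), uri: lines[i + 1] });
    }
  }
  return variants;
}
```

The client would fetch whichever variant playlist suits its measured bandwidth, then fetch that playlist's media segments in order.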

Microsoft smooth streaming

Part of Silverlight. Need to explore the details.

Experience IIS Smooth Streaming: http://www.iis.net/media/experiencesmoothstreaming

Ogg proposal

This was an idea put together by some of the Mozilla people while working on the Ogg index format. Basically, the index keeps file offsets not only for keyframes in the current file, but also for keyframes in files that contain the same content encoded at a different bandwidth. If the index is at the front of the file, this means you can quickly jump from one file to another.

The upside is that it doesn't require a separate manifest file, and switching between streams is very quick because it's based on byte offsets.

The downside is that it's not clear how this would work in the live streaming case.
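The cross-file index could be modeled along these lines: each keyframe entry carries a byte offset into every encoded version of the content, so the client can jump between files at a keyframe. These structures are hypothetical and not the actual Ogg index format.

```typescript
interface IndexEntry {
  time: number;      // keyframe timestamp, seconds
  offsets: number[]; // byte offset of this keyframe in each bitrate's file
}

// Find the byte offset to resume from in the target bitrate's file:
// the last keyframe at or before the current playback time.
function switchOffset(index: IndexEntry[], time: number, target: number): number | null {
  let best: IndexEntry | null = null;
  for (const e of index) {
    if (e.time <= time && (best === null || e.time > best.time)) best = e;
  }
  return best ? best.offsets[target] : null;
}
```

A live stream is awkward here precisely because this index has to exist up front, ahead of the data it describes.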

Adobe

Need to investigate; probably runs over their non-HTTP system.

Servers

Another option is to have servers that do the switching for you. This should be possible with Ogg Theora using chained streams: the server swaps in a lower-bandwidth version of the stream, and if the client supports chaining, it can seamlessly switch from one bandwidth to another with a pretty simple client implementation.

Matroska might support this as well.

Codec and container limitations and opportunities

  • VP8 requires no setup headers
  • VP8 can switch resolutions in the middle of a bitstream
  • Theora requires setup headers
  • Theora uses the same setup headers as long as the resolution + framerate are the same.
  • Vorbis requires setup headers
  • Vorbis uses different setup headers for different bitrates
  • Matroska requires that video and audio packets align with a specific timestamp (?)
  • Ogg does not require that video and audio packets align; it's up to the decoder to figure it out.

Theora thoughts from Tim:

You can also [send variable bitrates] on the server side with Theora so long as all of the versions have the same resolution and framerate, merely by allowing stream swaps at every keyframe. This avoids re-transmitting headers. libtheora currently uses the same headers for all bitrates and in 2-pass mode will always place keyframes at the same places if you encode using the same 1st pass stats.

You could do a similar thing for live streaming, but that requires a new software interface to get the keyframes in the same place in each stream.
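Since all bitrate versions share the same keyframe placement in Tim's scheme, the server only needs to find the next shared keyframe to perform a swap. A tiny sketch of that lookup, with a hypothetical helper name (this is not a libtheora API):

```typescript
// Given the shared keyframe schedule (frame indices, sorted ascending)
// and the frame currently being sent, return the next frame index at
// which the server may swap to a different-bitrate stream.
function nextSwapPoint(keyframes: number[], currentFrame: number): number | null {
  for (const k of keyframes) {
    if (k >= currentFrame) return k;
  }
  return null; // no keyframe left; finish on the current stream
}
```

In 2-pass mode with shared first-pass stats, the keyframe schedule is identical across encodes, so this lookup is valid for every bitrate at once.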

Vorbis thoughts from Matthew:

I think it's reasonable to use a single audio bitrate. Apple's FAQ for HTTP Live Streaming says "for seamless transitions between alternate streams, the audio portion of the stream should be identical in all versions". I'm fairly sure the guidelines for producing streams for use with Smooth Streaming say something similar.

Then again, if we need to support multiple audio tracks for other reasons (e.g. multiple languages), we'll need to implement the same code we'd need to support switching bitrates on audio streams.

Candidate Approaches

Implement DASH Subset

One option is to implement a subset of the DASH manifest spec directly in the browser, with the browser implementing support for everything: caching, seeking, stitching, adaptation decisions, etc. roc: Netflix told us what subset they wanted, but I can't remember what it was anymore. Maybe Josh remembers.

This is not as flexible as the other approaches and may require more implementation effort for the browser, but it's probably the easiest way for individual authors to get adaptive content on the Web.

Scripted Implementation Using ProcessedMediaStream

With ProcessedMediaStream (which is being implemented for many use-cases as well as this one), Web authors can schedule clips to be played one after another with seamless switching from clip to clip. Each clip would be an individual media element. The author can choose to have each clip refer to a different downloaded resource, or have them refer to different seek offsets in a single resource, or some combination. Audio and video can be obtained through different elements, or the same element, while preserving A/V sync.

This would put most of the implementation burden on client-side script, but would give script a great deal of control over the decision-making. We'd probably need to expose additional statistics APIs on media elements to support adaptation decisions.
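The script's per-clip decision might use statistics like the ones the text says media elements would need to expose. A hedged sketch, where the stat fields and function names are invented stand-ins rather than real APIs, and the 2× / 1.1× thresholds are arbitrary assumptions:

```typescript
interface ClipStats {
  bytesDownloaded: number;
  downloadSeconds: number;
  droppedFrameRatio: number; // fraction of frames the decoder dropped
}

// Choose the index of the next clip's bitrate from a sorted list of
// available bitrates (bits/sec), given stats from the last clip.
function nextBitrateIndex(bitrates: number[], current: number, s: ClipStats): number {
  const throughput = (s.bytesDownloaded * 8) / s.downloadSeconds;
  if (s.droppedFrameRatio > 0.1) {
    return Math.max(0, current - 1); // CPU-bound: step down regardless of bandwidth
  }
  if (throughput > bitrates[current] * 2 && current + 1 < bitrates.length) {
    return current + 1; // plenty of headroom: step up one level
  }
  if (throughput < bitrates[current] * 1.1) {
    return Math.max(0, current - 1); // barely keeping up: step down
  }
  return current;
}
```

The script would then point the next clip's media element at the chosen bitrate's resource (or seek offset) before its scheduled start time.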

Scripted Implementation Using appendBytes()

Aaron Colwell at Google is working on an API that allows scripts to feed compressed data streams into the decoder, giving script control over transport and caching. There are some problems with this approach: e.g. it requires disabling (or knowledge of) the browser's internal caching, and it requires knowledge of the behavior of the browser's demuxer during seeking.

Implementation Strategy

...