Gecko:MediaStreamLatency

Minimizing the end-to-end latency of WebRTC is very important. Minimizing the latency of Web Audio, for example in the case where a script wants to start playing an AudioBuffer immediately, is also very important. What do we need to do with MediaStreamGraph to support this?

On this page I'm trying to work out estimates of the current latency of our MediaStreams setup, and how we can improve the situation.

== Current State ==

The MediaStreamGraph runs every 10ms (modulo scheduling issues); let's call this T_graph. Let's assume the graph processing time per tick is T_proc, hopefully just a few ms; let's say 3ms. On my laptop the cubeb callback in nsBufferedAudioStream runs every 25ms; let's call this T_cubeb.

On the WebRTC "send" side, we have a microphone producing callbacks periodically (T_mike, 4ms in current code, I think). (I'll ignore driver processing delay, which we can't control.) That data is immediately appended to the input queue of a SourceMediaStream. At the next MediaStreamGraph tick, that data is moved to the output side of the SourceMediaStream or, probably soon, a wrapping TrackUnionStream. At that moment it can be observed by a MediaStreamListener. Let's define the overall send latency T_send as the longest delay between when a sample is recorded and when it reaches the network stack via a MediaStreamListener. Assuming pessimal timing of the MediaStreamGraph tick, that's (for the first sample in a chunk)

 T_send = T_mike + T_graph + T_proc

In my example, this is 4 + 10 + 3 = 17ms. The ideal latency would be just T_mike = 4ms.
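
To make the arithmetic concrete, here is a minimal C++ sketch using the example numbers above (the constant names are just for illustration, not Gecko identifiers):

 // Worst-case send latency with the example numbers above.
 #include <cstdio>
  
 int main() {
   const int T_mike  = 4;   // ms between microphone callbacks (current Windows code)
   const int T_graph = 10;  // ms between MediaStreamGraph ticks
   const int T_proc  = 3;   // ms of graph processing per tick (assumed)
  
   const int T_send = T_mike + T_graph + T_proc;                    // pessimal tick timing
   std::printf("T_send = %d ms, ideal = %d ms\n", T_send, T_mike);  // 17 ms vs 4 ms
   return 0;
 }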

On the "receive" side, at each MediaStreamGraph tick we pull the latest data from the network stack (via NotifyPull) and copy it to the end of the output nsAudioStream. The worst-case latency before data is picked up by the libcubeb callback (ignoring scheduling problems) happens when a sample has to wait T_graph for the graph to tick, then wait T_proc for graph processing, and then, once appended to the nsAudioStream buffer, wait at most T_cubeb to be consumed and played. (I'll ignore driver processing time, which we can't control.)

 T_recv = T_graph + T_proc + T_cubeb

In my example, this is 38ms. The ideal latency would be to pull the latest audio from the network stack in each libcubeb callback, i.e. T_cubeb = 25ms.
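
The receive side in the same style (again, purely illustrative names):

 // Worst-case receive latency with the example numbers above.
 #include <cstdio>
  
 int main() {
   const int T_graph = 10;  // ms a sample may wait for the next graph tick
   const int T_proc  = 3;   // ms of graph processing per tick (assumed)
   const int T_cubeb = 25;  // ms a sample may sit in the nsAudioStream buffer
  
   const int T_recv = T_graph + T_proc + T_cubeb;
   std::printf("T_recv = %d ms, ideal = %d ms\n", T_recv, T_cubeb);  // 38 ms vs 25 ms
   return 0;
 }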

The same latency analysis applies to playing an AudioBuffer.

== Desirable State ==

I propose running MediaStreamGraph processing every 10ms normally, but *also* running it during the libcubeb callback to "catch up" and produce the rest of the data needed when the libcubeb callback fires. On Windows 7, putting a sleep(10ms) in the libcubeb callback doesn't seem to hurt. On the receive side, this will mean that the data returned to the libcubeb callback will always include the latest audio available from the network stack, reducing the latency to T_cubeb + T_proc = 28ms.
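
A rough sketch of the shape this could take. Everything below (ToyGraphDriver, RunIteration, ReadProducedAudio) is made up for illustration and is not existing Gecko or libcubeb API; the only point is that the audio callback drives one extra graph iteration before handing data to the hardware:

 // Hypothetical sketch of "catch up inside the audio callback".
 // ToyGraphDriver and its methods are placeholders; the stubs stand in
 // for real MediaStreamGraph processing and buffering.
 #include <algorithm>
 #include <cstdint>
 #include <cstring>
  
 struct ToyGraphDriver {
   // Stand-in for one graph iteration: would pull the latest network
   // audio, run processing (~T_proc) and append to the output buffer.
   void RunIteration(uint32_t aFrames) { mProducedFrames += aFrames; }
  
   // Stand-in for draining produced audio into the hardware buffer.
   uint32_t ReadProducedAudio(int16_t* aBuffer, uint32_t aFrames) {
     uint32_t n = std::min(aFrames, mProducedFrames);
     std::memset(aBuffer, 0, n * sizeof(int16_t));  // real code would copy samples
     mProducedFrames -= n;
     return n;
   }
  
   uint32_t mProducedFrames = 0;
 };
  
 // Shaped roughly like a libcubeb data callback (simplified to mono S16).
 long AudioCallback(ToyGraphDriver* aDriver, int16_t* aOutput, long aFrames) {
   // Timer-driven iterations still run every 10ms elsewhere; here we run
   // one extra iteration so the buffer handed to the hardware includes
   // the freshest audio available from the network stack.
   aDriver->RunIteration(static_cast<uint32_t>(aFrames));  // "catch up", costs ~T_proc
  
   uint32_t got = aDriver->ReadProducedAudio(aOutput, static_cast<uint32_t>(aFrames));
   if (got < static_cast<uint32_t>(aFrames)) {
     // Underrun: pad with silence rather than block the audio thread.
     std::memset(aOutput + got, 0, (aFrames - got) * sizeof(int16_t));
   }
   return aFrames;
 }
  
 int main() {
   ToyGraphDriver driver;
   int16_t buffer[1200];  // ~25ms of 48kHz mono, roughly one T_cubeb's worth
   AudioCallback(&driver, buffer, 1200);
   return 0;
 }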

In fact we can do a little better by moving our NotifyPull calls to run as late as possible during MediaStreamGraph processing. If the network audio is not being processed, the NotifyPull(s) can happen after any WebAudio processing has been done. This effectively reduces T_proc to almost nothing. Let's call it 1ms, so we've reduced our latency to 26ms, almost the minimum allowed by libcubeb.
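
A sketch of that ordering within one iteration, again with made-up names; the idea is simply that network-fed streams which go straight to output are pulled only after the expensive processing has already run:

 // Hypothetical iteration order that defers NotifyPull as late as possible.
 // Stream and its members are illustrative placeholders, not Gecko classes.
 #include <vector>
  
 struct Stream {
   bool mPulledFromNetwork = false;     // e.g. fed by the WebRTC receive path
   bool mFeedsOtherProcessing = false;  // consumed by WebAudio nodes, etc.
   void NotifyPull() { /* ask the producer for its latest audio */ }
   void Process()    { /* mixing, WebAudio graph work, ... */ }
 };
  
 void RunIteration(std::vector<Stream*>& aStreams) {
   // 1. Pull sources whose output feeds further processing; these can't wait.
   for (Stream* s : aStreams) {
     if (s->mPulledFromNetwork && s->mFeedsOtherProcessing) {
       s->NotifyPull();
     }
   }
  
   // 2. Do the expensive work (WebAudio, mixing): this is most of T_proc.
   for (Stream* s : aStreams) {
     s->Process();
   }
  
   // 3. Pull the network-fed streams that go straight to output *last*, so
   //    the audio handed to the output stream is as fresh as possible and
   //    the effective T_proc in the latency sum shrinks to ~1ms.
   for (Stream* s : aStreams) {
     if (s->mPulledFromNetwork && !s->mFeedsOtherProcessing) {
       s->NotifyPull();
     }
   }
 }
  
 int main() {
   std::vector<Stream*> streams;  // would hold the graph's streams
   RunIteration(streams);
   return 0;
 }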

Scripts playing AudioBuffers won't be so amenable to that optimization, since we need to process all messages from the DOM; we'll take the full T_proc and end up with a latency of T_cubeb + T_proc = 28ms.

On the send side, to minimize latency we really should change the input API to be pull-based (see the sketch below). The current code for Windows actually pulls from Win32 every 4ms on a dedicated thread. If we let the MediaStreamGraph do the pull instead, we can reduce the latency to T_graph + T_proc. As above, we can minimize T_proc for streams that aren't involved in other processing, so we can get a latency of T_graph + T_proc = 11ms on the send side. To do better, we'd have to lower T_graph or identify cases where we can provide a "fast path" that gives the network stack consumer access to samples as soon as they're queued for the SourceMediaStream. However, I think we should focus on trying to achieve the goals already laid out here.
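
For the pull-based input, one possible shape (all names below are hypothetical stand-ins, not the real Gecko classes): the graph asks the capture device for exactly the frames it needs during its own iteration, rather than the device pushing data every 4ms from a dedicated thread.

 // Hypothetical sketch of a pull-based capture API.
 #include <cstdint>
 #include <vector>
  
 struct AudioInputDevice {
   // The graph asks for exactly the frames it needs for this iteration.
   void Pull(std::vector<int16_t>& aOut, uint32_t aFrames) {
     aOut.assign(aFrames, int16_t(0));  // real code would read from the driver
   }
 };
  
 struct SourceStream {  // stand-in for the SourceMediaStream input queue
   void AppendAudio(const std::vector<int16_t>& aSamples) {
     mBuffered.insert(mBuffered.end(), aSamples.begin(), aSamples.end());
   }
   std::vector<int16_t> mBuffered;
 };
  
 // One graph iteration: captured audio lands in the stream in the same tick
 // that a MediaStreamListener (and thus the network stack) can observe it,
 // so the send latency becomes T_graph + T_proc instead of
 // T_mike + T_graph + T_proc.
 void RunIteration(AudioInputDevice& aMike, SourceStream& aStream,
                   uint32_t aFramesPerTick) {
   std::vector<int16_t> captured;
   aMike.Pull(captured, aFramesPerTick);
   aStream.AppendAudio(captured);
   // ... rest of graph processing, then listeners see the new data ...
 }
  
 int main() {
   AudioInputDevice mike;
   SourceStream stream;
   RunIteration(mike, stream, 480);  // 10ms of 48kHz mono
   return 0;
 }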