Media/WebRTC Audio Issues
Audio issues in getUserMedia and WebRTC: (bug numbers and details to be added)
- 1 Sampling Rate issues
- 2 Audio Latency -- bug 785584
- 3 Need to add a TrackUnion to streams output from PeerConnection
- 4 AEC location & quality (bug 916331)
- 5 Dynamic input/output changes
- 6 Testing Audio
Sampling Rate issues
bug 886886 -- This causes a 0.23% drift in audio and a buffer buildup (added delay) in MediaStreamGraph. It is only an issue when a sampling frequency of 44100 is used, in particular on Windows when that is the default or user-selected rate. Note that some laptops and other devices come configured this way, or may have been reconfigured by the user for 44100Hz.
- We believe this issue is also affecting B2G
- Since we're requesting 16000 currently (and perhaps/probably 32000 or 48000 in the future), it makes sense to do the resample at the 44/44.1->16000 point rather than add a second resample. Bug 886886 has a patch to handle this using the Speex resampler in our tree. There are also patches to port upcoming work from webrtc.org using their sinc resampler; however, those are far more extensive and not upliftable.
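The 0.23% figure can be checked with a little arithmetic. The sketch below assumes the mismatch is between a true device rate of 44100Hz and a nominal 44000Hz (which the "44/44.1" shorthand suggests); the function names are illustrative and do not come from the Gecko tree.

```python
def drift_fraction(actual_hz: float, assumed_hz: float) -> float:
    """Fraction of the produced samples that are surplus when audio produced
    at actual_hz is consumed as if it had been produced at assumed_hz."""
    return (actual_hz - assumed_hz) / actual_hz


def buildup_ms(actual_hz: float, assumed_hz: float, elapsed_s: float) -> float:
    """Milliseconds of audio that pile up in a buffer after elapsed_s seconds,
    assuming nothing drains the surplus (hypothetical worst case)."""
    surplus_samples = (actual_hz - assumed_hz) * elapsed_s
    # The surplus is played out at the assumed rate, so convert with that rate.
    return 1000.0 * surplus_samples / assumed_hz


if __name__ == "__main__":
    print(f"drift: {100 * drift_fraction(44100, 44000):.2f}%")
    print(f"buildup after 60 s: {buildup_ms(44100, 44000, 60):.1f} ms")
```

Under those assumptions the drift comes out at about 0.23%, and roughly 136 ms of extra audio accumulates per minute if nothing drains it.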
Long-term clock-rate mismatches and drift
bug 884365 -- Because input, MediaStreamGraph (MSG) and output clocks may all be mismatched and/or change slightly over time, the code has to handle controlling delay and possible underflow. This normally isn't a problem when fed to a PeerConnection, as the other side will compensate, but for getUserMedia uses we have to care.
- Inputs need to either sync to the MSG frequency (currently the system clock, eventually the output clock), or they need to have optional resampling (see above about long-term mismatches)
- If the 44/44.1 issue is dealt with, then any adjustments to the resampling ratio for handling this (mismatch/buffering/drift) would likely be small enough and slow enough that we would not need to pass raw data (though perhaps there's a tiny quality loss for the far-end listener).
- Google tells us they have not yet dealt with this issue in Chrome.
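One way to picture the small, slow ratio adjustments mentioned above is a proportional controller that nudges the resampling ratio according to how far the buffer is from a target level. This is only a sketch under stated assumptions: the function names, the gain, the clamp and the 20 ms target are all hypothetical, and it does not use the Speex resampler API or any Gecko code.

```python
def adjusted_ratio(nominal_ratio, buffered_ms, target_ms,
                   gain=0.0005, max_adjust=0.005):
    """Return an output/input resampling ratio nudged toward target_ms.

    A buffer that is too full lowers the ratio, so each output sample
    consumes slightly more input and the buffer drains.
    """
    error_ms = buffered_ms - target_ms
    adjust = max(-max_adjust, min(max_adjust, gain * error_ms))
    return nominal_ratio * (1.0 - adjust)


def simulate(seconds, in_hz, out_hz=16000.0, target_ms=20.0, control=True):
    """Simulate 10 ms ticks and return the final buffer level in ms.

    in_hz is the true input rate; out_hz is the fixed consumption rate.
    """
    tick_s = 0.010
    buffered = target_ms * out_hz / 1000.0  # input samples, start at target
    for _ in range(int(round(seconds / tick_s))):
        buffered_ms = 1000.0 * buffered / out_hz
        ratio = adjusted_ratio(1.0, buffered_ms, target_ms) if control else 1.0
        buffered += in_hz * tick_s              # samples captured this tick
        buffered -= (out_hz * tick_s) / ratio   # input consumed by resampler
    return 1000.0 * buffered / out_hz


if __name__ == "__main__":
    # Input clock running 0.05% fast relative to the consumer.
    print("uncontrolled:", simulate(30, 16008.0, control=False))
    print("controlled:  ", simulate(30, 16008.0, control=True))
```

In this toy model the uncontrolled buffer keeps growing, while the controlled one settles close to the target with a ratio change of well under 0.1%.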
Extra clock domains in MediaStreamGraph
MediaStreamGraph is currently clocked on the system clock; ongoing work is moving it to be clocked on the audio output clock. This will reduce total delay in MSG. Note that the output clock frequency may drift and may also change suddenly when the output is re-routed, and the code needs to adapt smoothly to this.
Basic sample rate low
Right now, we're clocking everything at 16000Hz; we should be using higher clock rates.
- The "L16" pseudo-codec in GIPS only supports 8000/16000/32000 sampling rates (including in 3.30)
Audio Latency -- bug 785584
Increases in delay and loss of sync -- bug 879213
- Clock-domain mismatches need a resampler to avoid possible latency buildup --
- Underflows in MSG cause MSG to "slip" the stream, such that later data is permanently delayed. bug 901831, bug 901539. A resampler can treat slips as just clock jitter -- bug 908834, though if the underflow is serious enough we may need to simply drop audio.
- Reducing load on MSG in callbacks (NotifyPush(), NotifyQueuedTrackChanges()) will reduce the odds of MSG underflowing. See bug 884365 for reduction in the largest CPU consumer (Opus Encode + AEC). Also see bug 901831 for an odd Windows-only two-browser interaction.
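The "slip" behaviour described above can be illustrated with a toy queue model. This is a simplified sketch, not MediaStreamGraph's actual logic: it assumes one fixed-size chunk is played per tick, that silence is played when the queue is empty, and that late chunks still arrive afterwards. The function name is hypothetical.

```python
def playout_depths(arrivals_per_tick):
    """Play one chunk per tick from a queue fed by arrivals_per_tick.

    Returns (depths, slips): the queue depth after each tick, and the number
    of ticks on which the queue was empty and silence was played instead.
    """
    depth = 0
    slips = 0
    depths = []
    for arrived in arrivals_per_tick:
        depth += arrived
        if depth > 0:
            depth -= 1   # normal playout of one chunk
        else:
            slips += 1   # underflow: silence is played, time moves on
        depths.append(depth)
    return depths, slips


if __name__ == "__main__":
    # Two ticks with no data, then the late chunks arrive in a burst.
    depths, slips = playout_depths([1, 1, 0, 0, 3, 1, 1])
    print(depths, slips)
```

In this model every slip leaves one extra chunk queued for the rest of the stream, which is the permanent delay; dropping the late chunks or resampling slightly faster are the two ways to get it back.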
MediaStreamGraph fundamental latency and backend output latency (see Gecko:MediaStreamLatency)
- Because MediaStreamGraph reclocks and plays out from the MSG, it has to keep a minimal buffering level to avoid underflow. This adds 15-30ish ms of input-side latency (and output latency, though output clocking MSG will reduce that). Note that correcting this may require different audio streams for internal versus PeerConnection/"realtime" streams. See patch on
- Investigate any remaining latency issues
Need to add a TrackUnion to streams output from PeerConnection
We can get persistent delay if the output of a PeerConnection gets blocked. The patch for this has been r-'d and needs a re-design. bug 832881
AEC location & quality (bug 916331)
The AEC should be in getUserMedia() to have the option of cancelling audio from multiple PeerConnections (so A doesn't hear the echo of B (and vice-versa) when both are talking to C). This will also allow other audio from the browser to be cancelled. Currently the AEC only cancels audio in the same PeerConnection. bug 694814
- Google apparently has not moved the AEC yet either.
- Recent cubeb changes (in FF 28) reduced output delay and broke the AEC; bug 974537 makes the expected delays lower and platform-specific
- If you still have a problem, see WebRTC AEC Tuning for how to adjust these, and report the problem
Dynamic input/output changes
We need to support hot-(un)plugging headsets, preferably without requiring the headset to be unplugged in order to send audio to speakers/etc. bug 827146
We need to support audio output routing (at least to support "ringing" from main speakers while in-call/video/etc audio goes to headset).
Testing Audio
- Interactive data - patches to log latency and extract it to a graph or graph it live - bug xxxxxx
- Automated tests - Find some way to test audio quality and delay in automated testing - bug xxxxxx