Media/WebRTC Audio Issues
Audio issues in getUserMedia and WebRTC: (bug numbers and details to be added)
- Sampling Rate issues
- 44/44.1KHz mismatch -- bug 886886
This causes a 0.23% drift in audio and a buffer buildup in MediaStreamGraph (delay). This is only an issue when a sampling frequency of 44100 Hz is used, in particular on Windows when the user/default has that selected. Note that some laptops and other devices come configured this way, or may have been reconfigured by the user for 44100 Hz.
- We believe this issue is also affecting B2G
- Since we're requesting 16000 currently (and perhaps/probably 32000 or 48000 in the future), it makes sense to handle this at the existing 44/44.1->16000 resample point rather than adding a second resample. Bug 886886 has a patch to handle this using the Speex resampler in our tree. There are also patches to port upcoming work from webrtc.org using their sinc resampler; however, those are far more extensive and not upliftable.
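The 0.23% figure above can be checked with a little arithmetic. The following is only a sketch, and it assumes the device really delivers 44100 samples per second while the consumer drains them at a nominal 44000 samples per second; the function names are hypothetical and are not taken from the tree or from bug 886886.

```python
# Assumption: the device delivers ACTUAL_RATE_HZ samples per second, but the
# consumer treats the stream as ASSUMED_RATE_HZ and drains it at that rate.
ACTUAL_RATE_HZ = 44100
ASSUMED_RATE_HZ = 44000


def drift_fraction(actual_hz, assumed_hz):
    """Relative rate error, as a fraction of the actual rate."""
    return (actual_hz - assumed_hz) / actual_hz


def extra_delay_ms(actual_hz, assumed_hz, elapsed_s):
    """Delay (ms) represented by the samples left undrained after elapsed_s."""
    surplus_samples = (actual_hz - assumed_hz) * elapsed_s
    return 1000.0 * surplus_samples / assumed_hz


if __name__ == "__main__":
    drift = drift_fraction(ACTUAL_RATE_HZ, ASSUMED_RATE_HZ)
    delay = extra_delay_ms(ACTUAL_RATE_HZ, ASSUMED_RATE_HZ, 60)
    print(f"drift: {100 * drift:.2f}%")          # drift: 0.23%
    print(f"delay after 60 s: {delay:.0f} ms")   # delay after 60 s: 136 ms
```

Under that assumption the surplus is 100 samples per second, so the buffered delay grows by a bit over 2 ms every second until something drops or resamples the excess.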
- Long-term clock-rate mismatches and drift -- bug 884365
Because input, MediaStreamGraph (MSG) and output clocks may all be mismatched and/or change slightly over time, the code has to handle controlling delay and possible underflow. This normally isn't a problem when fed to a PeerConnection, as the other side will compensate, but for getUserMedia uses we have to care.
- Inputs need to either sync to the MSG frequency (system now, to be output clock), or they need to have optional resampling (see above about long-term mismatches)
- If the 44/44.1 issue is dealt with, then any adjustments to the resampling ratio for handling this (mismatch/buffering/drift) would likely be small enough and slow enough that we would not need to pass raw data (though perhaps there's a tiny quality loss for the far-end listener).
- Google tells us they have not yet dealt with this issue in Chrome.
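One way to picture "adjustments to the resampling ratio" is a controller that watches the buffer depth and nudges how fast the resampler consumes input. The sketch below is a toy simulation under assumed numbers (20 ms target depth, 10 ms steps, a proportional gain chosen for illustration); every name in it is hypothetical, and it is not the design proposed in bug 884365.

```python
def consumption_scale(buffered_ms, target_ms, gain_per_ms=1e-4, max_adjust=0.005):
    """Scale factor for how fast the resampler consumes input.

    Above 1.0 drains a too-full buffer; below 1.0 lets a too-empty one refill.
    The adjustment is clamped so the ratio only ever changes slightly.
    """
    error_ms = buffered_ms - target_ms
    adjust = max(-max_adjust, min(max_adjust, gain_per_ms * error_ms))
    return 1.0 + adjust


def simulate(input_clock_error, seconds, target_ms=20.0, step_ms=10.0,
             compensate=True):
    """Return the buffer depth (ms) after `seconds` of a mismatched input clock.

    input_clock_error is the fractional rate error of the input clock
    relative to the consumer, e.g. 0.0005 for an input running 500 ppm fast.
    """
    buffered_ms = target_ms
    steps = round(seconds * 1000.0 / step_ms)
    for _ in range(steps):
        arrived_ms = step_ms * (1.0 + input_clock_error)
        scale = consumption_scale(buffered_ms, target_ms) if compensate else 1.0
        consumed_ms = step_ms * scale
        buffered_ms = max(0.0, buffered_ms + arrived_ms - consumed_ms)
    return buffered_ms


if __name__ == "__main__":
    # Input clock 500 ppm fast, two minutes of audio.
    print("uncompensated depth (ms):", simulate(0.0005, 120, compensate=False))
    print("compensated depth (ms):  ", simulate(0.0005, 120))
```

Without compensation the depth keeps growing for as long as the mismatch lasts, while the proportional controller settles at a fixed offset from the target. A real implementation would probably also need smoothing of the buffer measurement and some handling of sudden rate changes.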
- Extra clock domains in MediaStreamGraph
MediaStreamGraph currently is clocked on the system clock; ongoing work is moving it to be clocked on the audio output clock. This will reduce total delay in MSG. Note that the output clock frequency may both drift and also suddenly change when the output is re-routed, and the code needs to adapt smoothly to this.
- Basic sample rate low
Right now, we're clocking everything at 16000 Hz; we should be using higher clock rates.
- The "L16" pseudo-codec in GIPS only supports 8000/16000/32000 sampling rates (including in 3.30)
- MediaStreamGraph fundamental latency
- Because MediaStreamGraph reclocks and plays out from the MSG, it has to keep a minimal buffering level to avoid underflow. This adds 15-30ish ms of input-side latency (and output latency, though output clocking MSG will reduce that). Note that correcting this may require different audio streams for internal versus PeerConnection/"realtime" streams.
- Need to add a TrackUnion to streams output from PeerConnection
- We can get persistent delay if the output of a PeerConnection gets blocked. The patch for this has been r-'d and needs a re-design.
- AEC location & quality
- The AEC should be in getUserMedia() to have the option of cancelling audio from multiple PeerConnections (so A doesn't hear the echo of B (and vice-versa) when both are talking to C). This will also allow other audio from the browser to be cancelled. Currently the AEC only cancels audio in the same PeerConnection.
- Google apparently has not moved the AEC yet either.
- The pre-AEC resampler could be higher quality (currently it's linear)
- Google tells us the pre-AEC resampler quality doesn't matter much as it works only on the far-end sound, and in practice causes little cancellation-quality loss.
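For readers unfamiliar with what a "linear" resampler means here, the sketch below shows plain linear interpolation between neighbouring input samples. It is an illustration only: `resample_linear` is a hypothetical helper, not the pre-AEC resampler code in the tree, and it makes no attempt at the filtering a higher-quality resampler would do.

```python
def resample_linear(samples, src_rate_hz, dst_rate_hz):
    """Resample a list of samples by linear interpolation.

    Each output sample is a weighted average of the two input samples that
    surround its position on the input timeline.
    """
    if not samples:
        return []
    step = src_rate_hz / dst_rate_hz  # input samples advanced per output sample
    out_len = int((len(samples) - 1) / step) + 1
    out = []
    for n in range(out_len):
        pos = n * step
        i = int(pos)
        frac = pos - i
        # Clamp the right-hand neighbour at the end of the input.
        nxt = samples[min(i + 1, len(samples) - 1)]
        out.append(samples[i] * (1.0 - frac) + nxt * frac)
    return out


if __name__ == "__main__":
    # Doubling the rate inserts the midpoint between each pair of samples.
    print(resample_linear([0.0, 1.0, 2.0, 3.0], 2, 4))
    # [0.0, 0.5, 1.0, 1.5, 2.0, 2.5, 3.0]
```

Linear interpolation is cheap but applies no anti-aliasing or reconstruction filter, which is the quality trade-off at issue in this item.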
- Dynamic input/output changes
- We need to support hot-(un)plugging headsets, preferably without requiring the headset to be unplugged in order to send audio to speakers/etc
- We need to support audio output routing (at least to support "ringing" from main speakers while in-call/video/etc audio goes to headset).
- Investigate any remaining latency issues
- Identifying any additional issues ASAP is critical
- Find some way to test audio quality and delay in automated testing