Media/WebRTC Audio Issues: Difference between revisions

Audio issues in getUserMedia and WebRTC: (bug numbers and details to be added)


* '''Sampling Rate issues'''
== '''Sampling Rate issues''' ==
** 44/44.1kHz mismatch -- <strike>{{bug|886886}}</strike><br>This causes a 0.23% drift in audio and a buffer buildup (added delay) in MediaStreamGraph.  It only occurs when a sampling frequency of 44100Hz is used, in particular on Windows when the user or the default configuration has selected it.  Note that some laptops and other devices come configured this way, or may have been reconfigured by the user for 44100Hz.
=== 44/44.1kHz mismatch ===
*** We believe this issue is also affecting B2G
<strike>{{bug|886886}}</strike> -- This causes a 0.23% drift in audio and a buffer buildup (added delay) in MediaStreamGraph.  It only occurs when a sampling frequency of 44100Hz is used, in particular on Windows when the user or the default configuration has selected it.  Note that some laptops and other devices come configured this way, or may have been reconfigured by the user for 44100Hz.
*** Since we're currently requesting 16000Hz (and perhaps/probably 32000Hz or 48000Hz in the future), it makes sense to resample once, at the 44/44.1->16000 point, rather than adding a second resample.  Bug 886886 has a patch to handle this using the Speex resampler in our tree.  There are also patches to port upcoming work from webrtc.org using their sinc resampler; however, those are far more extensive and not upliftable.
* We believe this issue is also affecting B2G
** Long-term clock-rate mismatches and drift -- {{bug|884365}}<br>Because input, MediaStreamGraph (MSG) and output clocks may all be mismatched and/or change slightly over time, the code has to handle controlling delay and possible underflow.  This normally isn't a problem when fed to a PeerConnection, as the other side will compensate, but for getUserMedia uses we have to care.
* Since we're currently requesting 16000Hz (and perhaps/probably 32000Hz or 48000Hz in the future), it makes sense to resample once, at the 44/44.1->16000 point, rather than adding a second resample.  Bug 886886 has a patch to handle this using the Speex resampler in our tree.  There are also patches to port upcoming work from webrtc.org using their sinc resampler; however, those are far more extensive and not upliftable.
*** Inputs need to either sync to the MSG frequency (currently the system clock, later the output clock), or they need to have optional resampling (see above about long-term mismatches)
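The 0.23% figure and the resulting buffer buildup can be checked with a little arithmetic. The following is a minimal sketch, assuming the device delivers 44100Hz while the consumer treats the stream as 44000Hz; the function names are hypothetical and are not taken from the patch in bug 886886.

```python
from fractions import Fraction


def drift_percent(actual_hz, assumed_hz):
    """Relative clock error, in percent, when audio captured at
    actual_hz is treated as if it were sampled at assumed_hz."""
    return 100.0 * (actual_hz - assumed_hz) / assumed_hz


def buildup_ms(actual_hz, assumed_hz, elapsed_s):
    """Milliseconds of extra audio that accumulate after elapsed_s
    seconds if the surplus frames are never dropped or resampled away."""
    surplus_frames = (actual_hz - assumed_hz) * elapsed_s
    return 1000.0 * surplus_frames / assumed_hz


def single_step_ratio(input_hz, output_hz):
    """Exact output/input ratio for one direct resampling step
    (for example 44100 -> 16000) instead of two chained resamples."""
    return Fraction(output_hz, input_hz)


if __name__ == "__main__":
    print(drift_percent(44100, 44000))
    print(buildup_ms(44100, 44000, 60))
    print(single_step_ratio(44100, 16000))
```

Under these assumptions the drift is about 0.227%, which amounts to roughly 136 ms of added delay per minute, and the direct 44100->16000 ratio reduces to 160/441.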
=== Long-term clock-rate mismatches and drift ===
*** If the 44/44.1 issue is dealt with, any adjustments to the resampling ratio for handling this (mismatch/buffering/drift) would likely be small enough and slow enough that we would not need to pass raw data (though perhaps there's a tiny quality loss for the far-end listener).
{{bug|884365}} -- Because input, MediaStreamGraph (MSG) and output clocks may all be mismatched and/or change slightly over time, the code has to handle controlling delay and possible underflow.  This normally isn't a problem when fed to a PeerConnection, as the other side will compensate, but for getUserMedia uses we have to care.
*** Google tells us they have not yet dealt with this issue in Chrome.
* Inputs need to either sync to the MSG frequency (currently the system clock, later the output clock), or they need to have optional resampling (see above about long-term mismatches)
** Extra clock domains in MediaStreamGraph<br>MediaStreamGraph currently is clocked on the system clock; ongoing work is moving it to be clocked on the audio output clock.  This will reduce total delay in MSG.  Note that the output clock frequency may both drift and also suddenly change when the output is re-routed, and the code needs to adapt smoothly to this.
* If the 44/44.1 issue is dealt with, any adjustments to the resampling ratio for handling this (mismatch/buffering/drift) would likely be small enough and slow enough that we would not need to pass raw data (though perhaps there's a tiny quality loss for the far-end listener).
** Basic sample rate low<br>Right now, we're clocking everything at 16000Hz; we should be using higher clock rates.
* Google tells us they have not yet dealt with this issue in Chrome.
*** The "L16" pseudo-codec in GIPS only supports 8000/16000/32000 sampling rates (including in 3.30)
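One way to picture the small, slow ratio adjustments discussed for long-term drift is a clamped proportional controller driven by buffer fill. This is only an illustrative sketch, not the approach taken in bug 884365: the function name, the gain and the clamp value are all assumptions, and the buffer is assumed to sit on the input side of the resampler while output is pulled at a fixed rate.

```python
def adjusted_ratio(nominal_ratio, buffered_frames, target_frames,
                   gain=1e-6, max_adjust=0.005):
    """Return a slightly corrected output/input resampling ratio.

    nominal_ratio   -- output_rate / input_rate with ideal clocks
    buffered_frames -- input frames currently waiting in the buffer
    target_frames   -- buffer level we want to hold (hypothetical)
    gain            -- correction per frame of error (hypothetical)
    max_adjust      -- clamp on the relative correction (hypothetical)
    """
    error = buffered_frames - target_frames
    # Clamp so the correction stays small and changes slowly.
    adjust = max(-max_adjust, min(max_adjust, gain * error))
    # A buffer that is too full means input arrives faster than it is
    # consumed, so produce fewer output frames per input frame.
    return nominal_ratio * (1.0 - adjust)


if __name__ == "__main__":
    nominal = 16000 / 44100
    print(adjusted_ratio(nominal, buffered_frames=1500, target_frames=1000))
```

Keeping the correction clamped to a fraction of a percent reflects the point above that, once the 44/44.1 mismatch is handled separately, the remaining adjustments should be small.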
=== Extra clock domains in MediaStreamGraph ===
* '''Audio Latency''' -- {{Bug|785584}}
MediaStreamGraph currently is clocked on the system clock; ongoing work is moving it to be clocked on the audio output clock.  This will reduce total delay in MSG.  Note that the output clock frequency may both drift and also suddenly change when the output is re-routed, and the code needs to adapt smoothly to this.
** Increases in delay and loss of sync -- {{bug|879213}}
=== Basic sample rate low ===
*** Clock-domain mismatches need a resampler to avoid possible latency buildup -- {{bug|884365}}
Right now, we're clocking everything at 16000Hz; we should be using higher clock rates.
*** Underflows in MSG cause MSG to "slip" the stream, such that later data is permanently delayed. {{bug|901831}}, {{bug|901539}}.  A resampler can treat slips as just clock jitter -- {{bug|884365}}, though if the underflow is serious enough we may need to simply drop audio.
* The "L16" pseudo-codec in GIPS only supports 8000/16000/32000 sampling rates (including in 3.30)
*** Reducing load on MSG in callbacks (NotifyPush(), NotifyQueuedTrackChanges()) will reduce the odds of MSG underflowing.  See {{bug|884365}} for a reduction in the largest CPU consumer (Opus Encode + AEC).  Also see {{bug|901831}} for an odd Windows-only two-browser interaction.
== '''Audio Latency''' -- {{Bug|785584}} ==
** MediaStreamGraph fundamental latency and backend output latency (see [[Gecko:MediaStreamLatency]])
=== Increases in delay and loss of sync -- {{bug|879213}} ===
*** Because MediaStreamGraph reclocks and plays out from the MSG, it has to keep a minimal buffering level to avoid underflow.  This adds roughly 15-30 ms of input-side latency (and output latency, though output clocking MSG will reduce that).  Note that correcting this may require different audio streams for internal versus PeerConnection/"realtime" streams.  See patch on {{bug|884365}}
* Clock-domain mismatches need a resampler to avoid possible latency buildup -- {{bug|884365}}
** Investigate any remaining latency issues
* Underflows in MSG cause MSG to "slip" the stream, such that later data is permanently delayed. {{bug|901831}}, {{bug|901539}}.  A resampler can treat slips as just clock jitter -- {{bug|884365}}, though if the underflow is serious enough we may need to simply drop audio.
* '''Need to add a TrackUnion to streams output from PeerConnection'''
* Reducing load on MSG in callbacks (NotifyPush(), NotifyQueuedTrackChanges()) will reduce the odds of MSG underflowing.  See {{bug|884365}} for a reduction in the largest CPU consumer (Opus Encode + AEC).  Also see {{bug|901831}} for an odd Windows-only two-browser interaction.
** We can get persistent delay if the output of a PeerConnection gets blocked.  The patch for this has been r-'d and needs a re-design.  {{Bug|832881}}
=== MediaStreamGraph fundamental latency and backend output latency (see [[Gecko:MediaStreamLatency]]) ===
* '''AEC location & quality'''
* Because MediaStreamGraph reclocks and plays out from the MSG, it has to keep a minimal buffering level to avoid underflow.  This adds roughly 15-30 ms of input-side latency (and output latency, though output clocking MSG will reduce that).  Note that correcting this may require different audio streams for internal versus PeerConnection/"realtime" streams.  See patch on {{bug|884365}}
** The AEC should be in getUserMedia() to have the option of cancelling audio from multiple PeerConnections (so A doesn't hear the echo of B, and vice versa, when both are talking to C). This will also allow other audio from the browser to be cancelled.  Currently the AEC only cancels audio in the same PeerConnection.  {{Bug|694814}}
* Investigate any remaining latency issues
*** Google apparently has not moved the AEC yet either.
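To put the 15-30 ms of MSG buffering mentioned above in terms of audio frames, the conversion below may help. It is a small sketch with a hypothetical helper name; the 16000Hz rate is the one the page says is currently in use, and 48000Hz is one of the possible future rates it mentions.

```python
def ms_to_frames(latency_ms, sample_rate_hz):
    """Number of audio frames that correspond to latency_ms of
    buffering at the given sample rate."""
    return int(round(latency_ms * sample_rate_hz / 1000.0))


if __name__ == "__main__":
    for rate in (16000, 48000):
        low = ms_to_frames(15, rate)
        high = ms_to_frames(30, rate)
        print(rate, low, high)
```

At 16000Hz this works out to 240-480 frames of buffering, and at 48000Hz to 720-1440 frames.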
== '''Need to add a TrackUnion to streams output from PeerConnection''' ==
** The pre-AEC resampler could be higher quality (currently it's linear)
We can get persistent delay if the output of a PeerConnection gets blocked.  The patch for this has been r-'d and needs a re-design.  {{Bug|832881}}
*** Google tells us the pre-AEC resampler quality doesn't matter much as it works only on the far-end sound, and in practice causes little cancellation-quality loss.
== '''AEC location & quality''' ==
* '''Dynamic input/output changes'''
The AEC should be in getUserMedia() to have the option of cancelling audio from multiple PeerConnections (so A doesn't hear the echo of B, and vice versa, when both are talking to C). This will also allow other audio from the browser to be cancelled.  Currently the AEC only cancels audio in the same PeerConnection.  {{Bug|694814}}
** We need to support hot-(un)plugging headsets, and preferably should not require a headset to be unplugged in order to send audio to speakers/etc.  {{Bug|827146}}
* Google apparently has not moved the AEC yet either.
** We need to support audio output routing (at least to support "ringing" from main speakers while in-call/video/etc audio goes to headset).
== '''Dynamic input/output changes''' ==
* '''Find some way to test audio quality and delay in automated testing'''
=== Hot-plug ===
We need to support hot-(un)plugging headsets, and preferably should not require a headset to be unplugged in order to send audio to speakers/etc.  {{Bug|827146}}
=== Audio routing ===
We need to support audio output routing (at least to support "ringing" from main speakers while in-call/video/etc audio goes to headset).
== Testing Audio ==
Find some way to test audio quality and delay in automated testing