On the send side, to minimize latency we really should change the input API to be pull-based. The current code for Windows actually pulls from Win32 every 4ms on a dedicated thread. If we let the MediaStreamGraph do the pull instead, we can reduce latency to T_graph and T_proc. As above we can minimize T_proc for streams that aren't involved in other processing, so we can get a latency of T_graph + T_proc = 11ms on the receive side. To do better, we'd have to lower T_graph or identify cases where we can provide a "fast path" that gives the network stack consumer access to samples as soon as they're queued for the SourceMediaStream. However, I think we should focus on trying to achieve the goals already laid out here.
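As a rough illustration of the budget above, the following sketch compares a push model (a dedicated thread polling every 4ms) with a pull model driven by the graph. It assumes the polling thread can add up to one polling period in the worst case. The split of the 11ms figure into T_graph and T_proc is an assumption made only so that the sum matches; the function and constant names are hypothetical.

```python
def send_latency_ms(t_graph_ms, t_proc_ms, t_poll_ms=0.0):
    """Worst-case send-side latency in milliseconds.

    t_poll_ms is the period of a dedicated polling thread (push model).
    Pass 0 for a pull model, where the graph fetches the input itself.
    """
    if min(t_graph_ms, t_proc_ms, t_poll_ms) < 0:
        raise ValueError("durations must be non-negative")
    return t_poll_ms + t_graph_ms + t_proc_ms


# Assumed values: only the 11 ms sum and the 4 ms poll period come from
# the text; the individual T_graph / T_proc numbers are illustrative.
T_GRAPH_MS = 10.0
T_PROC_MS = 1.0
T_POLL_MS = 4.0

push = send_latency_ms(T_GRAPH_MS, T_PROC_MS, T_POLL_MS)
pull = send_latency_ms(T_GRAPH_MS, T_PROC_MS)
print(f"push: {push} ms, pull: {pull} ms")
```

Under these assumptions the pull model saves exactly the polling period, which is why lowering T_graph is the only remaining lever short of a fast path.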
== Latency targets per platform ==
=== Linux ===
We have two backends in cubeb:
;ALSA: Pretty rudimentary. Most distros no longer use ALSA directly, although some people still choose to.
;PulseAudio: Advertised as being as good as ALSA (I could confirm that on my setup, but read below). It gives us automatic latency adjustments in case of underrun. We can also query the actual latency (as opposed to the requested latency), so we can react.
Both have been tested on a ThinkPad W530 and on a ThinkPad T420; the latency is great (under 10ms). I just got a 300-euro netbook I can test with, but I need a 32-bit build to check. I am sure that minimal latency relies heavily on good interaction between:
- The soundcard kernel driver
- ALSA
- Pulse
I believe my setup happens to have a good driver (plus a fast CPU and such), and results will be completely different with other hardware and software environments.
I know that some Pulse releases in the wild are buggy and make the latency grow, so we should make sure we are able to detect that.
Other browsers achieve 40ms; we want to be at least as good.
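One possible way to detect a backend whose latency grows is to compare the latency it reports against the latency we requested. The sketch below works on plain numbers and is not tied to any PulseAudio or cubeb call; the function name, the `slack` threshold and the sample values are all assumptions for illustration.

```python
from statistics import median

TARGET_MS = 40.0  # figure quoted above for other browsers


def check_latency(requested_ms, measured_ms, target_ms=TARGET_MS, slack=1.5):
    """Classify a chronological run of measured latencies (in ms).

    The run is split into an early half and a late half, and the median of
    each half is used so that a single outlier does not trigger a flag.
    """
    if requested_ms <= 0 or len(measured_ms) < 2:
        raise ValueError("need a positive request and at least two samples")
    half = len(measured_ms) // 2
    early = median(measured_ms[:half])
    late = median(measured_ms[half:])
    return {
        "late_median_ms": late,
        "inflated": late > slack * requested_ms,  # far above what we asked for
        "growing": late > slack * early,          # drifting upwards over time
        "over_target": late > target_ms,          # worse than the 40 ms goal
    }


healthy = check_latency(10.0, [9.5, 10.2, 10.0, 9.8])
drifting = check_latency(10.0, [10.0, 11.0, 30.0, 45.0, 60.0, 80.0])
print(healthy)
print(drifting)
```

A real implementation would feed this from whatever latency query the backend exposes and decide how to react (for example, by reopening the stream); that part is left out here.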
=== Windows ===
We have only one backend in cubeb. It uses winmm, which is not designed for low latency at all. We need a new backend built on WASAPI to achieve good performance.
WASAPI is in the 30ms range, and can go lower if:
- we are running on decent audio hardware
- we use the Windows API to increase the audio thread priority
It is possible to go lower still if we request exclusive access to the hardware, which is something WASAPI exposes in its API.
On my ThinkPad T420, I could bring the latency down to 512 frames at 48kHz (about 10.7ms), using a DAW that has a WASAPI backend. I tried increasing the CPU load and could make it underrun a bit, but nothing terrible (it would go unnoticed during a WebRTC call). At 1024 frames, I could not detect any underrun by ear.
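The buffer sizes above translate to milliseconds as frames divided by the sample rate. A minimal sketch of that conversion (the function name is hypothetical):

```python
def buffer_latency_ms(frames, sample_rate_hz):
    """Duration of a buffer of `frames` audio frames, in milliseconds."""
    if frames < 0 or sample_rate_hz <= 0:
        raise ValueError("frames must be >= 0 and sample rate must be > 0")
    return 1000.0 * frames / sample_rate_hz


for frames in (512, 1024):
    ms = buffer_latency_ms(frames, 48000)
    print(f"{frames} frames at 48000 Hz = {ms:.2f} ms")
```

This only accounts for one buffer of the given size; any additional buffering in the driver or the system mixer comes on top of it.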
WASAPI is available on Windows Vista, Windows 7 and Windows 8. People using a stock Windows XP with a normal sound card won't get great latencies. The only way around it is to use a super low-level API that takes exclusive access of the hardware (basically bypassing the system's mixer and talking directly to the kernel).
=== MacOS ===
No problems at all: we can bring the latency super low (12.5ms is what we do at the moment, and lower is doable). No glitches are noticeable, latency is stable, and everything works fine under high CPU load without underruns.
=== Android ===
On Android < 4.1 we won't be able to do anything great, see <http://code.google.com/p/android/issues/detail?id=3434>. Latencies of 100ms or more are expected, and there is nothing we can do to lower that.
On Android >= 4.1, on some devices (currently the Galaxy Nexus, Nexus 4 and Nexus 10, that is, high-end devices whose specs are controlled by Google), you can achieve around 8ms using what is called FastMixer. Basically, it is a trade-off: you give up resampling (that is, you are bound to use the hardware's preferred sample rate), effects, and other goodies. You also have to use a specific buffer size. We don't care about those restrictions, because we just want a PCM interface, so it is all great.
The good thing is that we can detect at runtime whether the device can run at low latency, so we can do it right now.
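The constraints described above could be checked along these lines once the device's native parameters are known. This is a sketch of the decision logic only: the class and field names are hypothetical, the device values are made up, and it assumes that the required buffer size is exactly the device's native buffer size. How the native values are obtained from the platform is left out.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DeviceAudioInfo:
    """Hypothetical description of what the device reports at runtime."""
    native_sample_rate_hz: int
    native_buffer_frames: int
    low_latency_capable: bool


@dataclass(frozen=True)
class StreamRequest:
    """Hypothetical description of the stream we want to open."""
    sample_rate_hz: int
    buffer_frames: int
    uses_effects: bool = False


def fast_path_eligible(device, request):
    """True if the request respects every restriction listed above."""
    return (
        device.low_latency_capable
        # No resampling: we must run at the hardware's preferred rate.
        and request.sample_rate_hz == device.native_sample_rate_hz
        # Assumption: the required buffer size is the native one.
        and request.buffer_frames == device.native_buffer_frames
        # No effects on the stream.
        and not request.uses_effects
    )


# Made-up device values, for illustration only.
device = DeviceAudioInfo(48000, 240, low_latency_capable=True)
print(fast_path_eligible(device, StreamRequest(48000, 240)))
print(fast_path_eligible(device, StreamRequest(44100, 240)))
```

If the check fails, the stream would simply fall back to the normal, higher-latency path.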
=== B2G ===
No idea.
== Plan of action ==
TODO
== Bugs to file ==
TODO