Media/WebAudio
- 2015 Audio {output,input}/MSG roadmap
In no particular order: some tasks are blocked by other things (work from other
teams, spec, etc.), some we can start doing now.
Comments on the expected difficulty, and on whether the task is blocked, are at
the end of each point in [brackets].
- Web Audio API suspend/resume
Gaia is hacking around the lack of this API for now, but we need it: there have
been three or four high-profile battery consumption regressions because of its
absence. The spec is now finished, so we are good to implement.
[not too hard]
- Possible audio output device switch fixes for Windows
We need to add some code to handle audio output device switching in
Windows/WASAPI. (This might be done by the end of 2014.) The behavior depends on
the driver and the Windows version, so some testing is needed beforehand.
[not too hard]
- Web Audio API Audio Worker
This is _very_ important to implement quickly, and correctly, as soon as the spec is done.
I've got famous people ready to do crazy demos for us when we have it ready.
[hard, blocked on spec]
- Audio input and output code consolidation
These days, our audio input and our audio output code are two completely
different code bases: the audio input is buried down in webrtc-land, and the
audio output code is in `libcubeb`.
This causes a number of issues (ranked from least problematic to most important):
- We need to carry patches on top of the `webrtc.org` codebase to make it fit our
needs
- We don't know the code base well (certainly not as well as something we'd have
written)
- It's a bit harder (in terms of plumbing) to use platform AECs (we know OSX's
is not great, but it could become better)
- It's hard to recover from over/under-run, and communicate clear and up-to-date
timing and latency figures to the AEC.
- There are issues on certain platforms (OS X) where, when you don't have
full-duplex input and output and you change the output device, the output
callback does not get called, and you then get a loop-back latency buildup (this
is important for gUM + Web Audio, and to make sure the AEC still works optimally
without having a massive buffer). The same problem might exist on other
platforms, notably certain combinations of Windows hardware and version
- We can't do full-duplex audio streams (input + output in the same callback),
so:
  - We miss latency and performance optimizations:
    - For loopback: gUM -> Web Audio API -> speakers, which is becoming rather
      common
    - In general, because full-duplex means only one IPC round-trip between the
      audio client (Firefox) and the server (pulse/wasapi/etc.).
  - We can't have perfect clock correlation between input and output, which is
    very problematic for the AEC (it works now, it could be much better)
This is quite a lot of work, but it will have great benefits. It would require
writing the input side of `libcubeb` and ditching the webrtc `audio_device`
module. I'd rather do that while WebRTC is not too mainstream, because the change
is big and dangerous, but the timeline looks more difficult every day: the window
is closing fast. There is no other way, though, and we will have to do it at some
point.
[quite hard and sizeable]
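The loop-back latency buildup mentioned in the list above (the output callback
not being called after a device change) can be illustrated with a toy model.
This is only a sketch, not our actual code: the sample rate, the callback size,
the function name `simulate_loopback` and the idea of a single frame counter
standing in for the input-to-output buffer are all assumptions made for the
example.

```python
SAMPLE_RATE = 48000        # assumed sample rate, in Hz
FRAMES_PER_CALLBACK = 480  # assumed callback size: 10 ms of audio


def simulate_loopback(total_callbacks, stalled_output):
    """Return the buffered latency (in ms) after each input callback.

    `stalled_output` is a set of callback indices during which the output
    callback is (hypothetically) not called, e.g. during a device change.
    """
    queued_frames = 0
    latency_ms = []
    for i in range(total_callbacks):
        # The input callback always delivers one block of frames.
        queued_frames += FRAMES_PER_CALLBACK
        # The output callback consumes one block, unless it is stalled.
        if i not in stalled_output:
            queued_frames -= min(queued_frames, FRAMES_PER_CALLBACK)
        latency_ms.append(1000 * queued_frames / SAMPLE_RATE)
    return latency_ms


# Output stalls for three callbacks; the extra 30 ms never drains afterwards.
print(simulate_loopback(8, {2, 3, 4}))
# [0.0, 0.0, 10.0, 20.0, 30.0, 30.0, 30.0, 30.0]
```

In this model the extra latency stays in the buffer once the output resumes,
because input and output then run at the same rate again. With a single
full-duplex callback, input and output could not get out of step this way.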
- Monitoring stream feedback for AEC
Some platforms (PulseAudio, Windows) allow a process to access the output of the
system mixer. Feeding that back to the AEC would obviously be better than using
the output of the MSG mixer.
[not too hard]
- Audio devices selection
The W3C is in the process of defining an API to let the user choose which
device is used for playback and recording by the various APIs of the Web
platform (`HTMLMediaElement`, Web Audio API, gUM).
While we have some code for audio input device enumeration, we have no code
whatsoever for the output side (we always assume that we want to use the default
device). It would also be good to have a pairing feature for input and output
devices (e.g. matching the headset mic with the headset headphones).
This needs more spec work before we can expose anything to authors, but it
involves a lot of platform-specific and plumbing work.
[medium, need platform specific work, can be parallelized, somewhat blocked on
spec]
- Multiple MSG per process
(Related to the previous point about audio device selection)
If we let authors choose the audio output device, then, because the MSG is driven
by the audio callbacks, we will need to write code to make sure MSGs can
communicate with each other (because you can connect multiple `AudioContext`s
together using MediaStreams).
This will also be necessary to implement the upcoming "deep-buffer" option that is
coming to the Web Audio API, and that will be very useful to save battery.
[medium]
- PulseAudio on Firefox OS
We should really look (again) into it. I know `mwu` has a proof-of-concept. I also
know that one of the maintainers of PulseAudio, Arun Raghavan (FordPrefect on IRC,
who conveniently idles in ##media and went as far as fixing Firefox bugs for us),
has tried it with great success:
- <http://arunraghavan.net/2012/01/pulseaudio-vs-audioflinger-fight/>
(audioflinger is the internal name of the Android audio stack)
- <http://arunraghavan.net/2012/04/pulseaudio-on-android-part-2/>
As you can see, CPU usage, power usage, and latency are _way_ better using Pulse
(at the expense of 400kB of memory on the client, while saving 3.7MB on the
server). We should redo those measurements to make sure.
PulseAudio has been used successfully on Nokia phones in the past.
Downsides/possible issues: this would require talking to Qualcomm and getting
them to certify it. Also, if we go with their AEC, we might want to get it
plumbed to Pulse, but we would be able to access the monitoring streams. I don't
know if we have checked the quality of their AEC against WebRTC's, though.
[hard]
- Video sources pulled by the compositor
We have an issue where, on some platforms, the latency is too high (more than
1000/60 ms, about 16.7 ms, between callbacks), and we are not able to paint all
the frames when they are due (we seem to have hacked around it for now). iirc roc
has ideas on how to do that. It should also reduce the video latency.
[no idea]
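To make the 1000/60 ms figure concrete, here is a small sketch that flags the
gaps between callbacks that are longer than one frame at 60 Hz. The helper name
`late_gaps` and the sample timestamps are made up for the example; real callback
timings would have to be measured on each platform.

```python
FRAME_INTERVAL_MS = 1000 / 60  # one frame at 60 Hz, about 16.67 ms


def late_gaps(callback_times_ms):
    """Return the gaps between consecutive callbacks that exceed one frame."""
    gaps = [
        later - earlier
        for earlier, later in zip(callback_times_ms, callback_times_ms[1:])
    ]
    return [gap for gap in gaps if gap > FRAME_INTERVAL_MS]


# Hypothetical callback timestamps, in milliseconds.
print(late_gaps([0, 10, 30, 45, 70]))  # [20, 25]
```

Any gap reported here means at least one frame could not be handed over before
it was due, at least while frames are pushed from these callbacks.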
- MSG optimizations
When randomly profiling Firefox during WebRTC calls/gUM applications/Web Audio
API applications, I see that the MSG is using too much CPU compared to what it
should use. Since the MSG is pretty central to our overall real-time media
story, it should logically be something that we optimize.
For example, I think that investing around a week in optimizing the MSG would
yield CPU usage wins comparable to adding support for a hardware AEC. I have
notes somewhere on things we could do (somewhat low-hanging fruit) and on the
expected CPU usage gains (in a specific scenario, of course, since different
scenarios stress the MSG code in different ways).
[not too hard]
- Sandbox hardening
Audio input and output streams are somewhat syscall-heavy, and do scary things
with buffers all the time. Depending on the technique the security people are
going to use/are using, we might need to change some code on our side. Because
we have non-traditional requirements on latency, the classic solutions ("just
use ipdl") might not work.
We should look into mutexes and other synchronization primitives.
This is the page I used as a reference, but I'm not sure if it's up to date:
<Sandbox>
[no idea]