Media/WebAudio: Difference between revisions

(Page generated from https://etherpad.mozilla.org/2015-audio-input-output-msg-roadmap by Special:ImportFromEtherpad)
 
(update summary)
 
(14 intermediate revisions by 2 users not shown)
Line 1: Line 1:
# 2015 Audio {output,input}/MSG roadmap


In no particular order: some tasks are blocked by other things (work from other<br />
teams, spec, etc.), some we can start doing now.


Some comments about the expected difficulty and whether it's blocked is at the<br />
'''Web Audio Perf Parity meta bug:''' https://bugzilla.mozilla.org/show_bug.cgi?id=webaudioperf_parity <br/>
end of each point in [brackets].
'''All open Web Audio bugs:''' http://mzl.la/1HQZkmU <br/>
'''NOTE:''' If a bug doesn't have a priority (P1 to P5), we are not planning to fix it by the end of Q3.  If anyone thinks it needs to be fixed, please ping padenot and/or mreavy.<br/>
'''WebRTC and Web Audio plans coming out of Whistler:'''  https://docs.google.com/document/d/1eJLEenV4T5R5uiattNXU4PIy9w7hUvysj3lvkZPJVIM/edit#heading=h.hit4o9naa62o  -- This wiki will be updated to reflect the new info in this doc
----------------------
== Web Audio API performance improvement, phase 2 (Q3) ==
* Firefox 44 is our target release for "improved web audio performance" (as good or better than our competition)
** Web Audio perf parity bug, everything higher than Rank=14, is the target: {{Bug|1189514}}
** Benchmarks: http://ouija.allizom.org/grafana/index.html#/dashboard/file/webaudio.json 
*** (Mac and Android benchmarks are having trouble running in automation.  Investigating why with dminor.)


## Web Audio API suspend/resume


Gaia is hacking around the lack of this API for now, but we need it, there has<br />
----------------------
been around 3-4 high-profile battery consumption regression because of that.<br />
== Web Audio API performance improvement, phase 1 (Q2) ==
The spec is now finished, so we are good to implement.


[not too hard]
'''Done''':
* {{bug|1050645}} - We now have automatic benchmarks running on CI, on windows, mac and linux, where we benchmark our performance against Chrome. Unfortunately, this was put in place after some optimization landed, so we don't see those improvements. We need to have it running on mobile, it should work but does not.
* {{bug|1140448}} - AudioParam is now very fast. This is very apparent in benchmarks
* {{bug|926838}} - The FFT code is now at least twice as fast on ARM. Same, obvious in benchmark, or simply running any app that uses a ConvolverNode
* {{bug|1140450}} - Resampling complexity has been lowered. This is a 50% decrease in complexity, and leads to wins accross the a number of benchmarks. Very obvious performance increase on the benchmarks. The students working on this are currently exploring ways of making it even faster. This is optimizing one of the most popular AudioNode of the Web Audio API.
* Various other optimizations of the internal MediaStreamGraph code, leading to less visible optimizations
* {{bug|1127188}} - when a document goes away (for example when the a page is reloaded), AudioContext are now aggressively prevented from doing any further processing, improving the experience for developers that often reload their page, and saving resources when navigating away from pages with Web Audio API code
* {{bug|1157137}} - ScriptProcessorNode would sometimes have a bug where internal latency would build up
* {{bug|1169321}} - We need the benchmarks to run on android, there is a bug preventing us to see the results
* {{bug|1157768}} - This is the same as 926838, but for x86, optimizing the most expensive node of the Web Audio API.
* {{bug|1140450}} - This is still not closed, the team is exploring alternative ways to be even more efficient. (This is a P2; so it's not a "must-have" for our first phase improvements, but we still expect to land it in Fx 42.)


## Possible audio output device switch fixes for Windows


We need to add some code to handle audio output device switching in<br />
windows/WASAPI. (This might be done by the end of 2014). This is driver/windows<br />
version dependent, so some testing is needed beforehand.


[not too hard]


## Web Audio API Audio Worker


This is _very_ important to implement quickly and right when the spec is done.<br />
----------------------
I've got famous people ready to do crazy demos for us when we have it ready.


[hard, blocked on spec]
== 2015 Web Audio ({output,input},MSG) Development Plan ==


## Audio input and output code consolidation
Some comments about the expected difficulty and whether it's blocked is at the end of each point in [brackets].


These days, our audio input and our audio output code are two completely<br />
* Web Audio API Audio Worker
different code bases: the audio input in buried down in webrtc-land, and the<br />
** This is _very_ important to implement quickly and right when the spec is done.
audio output code is in `libcubeb`.
** We've got famous people ready to do crazy demos for us when we have it ready.<br/>
 
** [hard, need last clarifications from the spec, which should happen soon]
This cause a number of issues (ranked in least problematic to most important):<br />
* Audio input and output code consolidation
- We need to carry patches on top of the `webrtc.org` codebase to make it fit our<br />
** These days, our audio input and our audio output code are two completely different code bases: the audio input in buried down in webrtc-land, and the audio output code is in `libcubeb`.
need<br />
** This cause a number of issues (ranked in least problematic to most important):<br />
- We don't know the code base well (certainly not as well as something we'd have<br />
*** We need to carry patches on top of the `webrtc.org` codebase to make it fit our needs<br />
written)<br />
*** We don't know the code base well (certainly not as well as something we'd have written)<br />
- It's a bit harder (in terms of plumbing) to use platform AECs (we know OSX's<br />
*** It's a bit harder (in terms of plumbing) to use platform AECs (we know OSX's  is not great, but it could become better)<br />
  is not great, but it could become better)<br />
*** It's hard to recover from over/under-run, and communicate clear and up-to-date timing and latency figures to the AEC.<br />
- It's hard to recover from over/under-run, and communicate clear and up-to-date<br />
*** There are issues on certain platforms (osx) where when you don't have full-duplex input and outputs and change output device, the output callback does not get called and then you get a loop-back latency buildup (this is important for gUM + Web Audio, and to make sure the AEC still works optimally  without having a massive buffer). This same problem might exist on other platforms, notably certain combination of windows hardware + version
timing and latency figures to the AEC.<br />
***  We can't do full-duplex audio streams (input + output in the same callback),  so:<br />
- There are issues on certain platforms (osx) where when you don't have<br />
*** We miss latency and performance optimizations:<br />
full-duplex input and outputs and change output device, the output callback<br />
*** For loopback: gUM -&gt; Web Audio API -&gt; speakers, which is becoming rather common<br />
does not get called and then you get a loop-back latency buildup (this is<br />
*** In general, because full-duplex means only one IPC round-trip between the audio client (Firefox) and the server (pulse/wasapi/etc.).<br />
important for gUM + Web Audio, and to make sure the AEC still works optimally<br />
*** We can't have perfect clock correlation between input and output, which is very problematic for the AEC (it works now, it could be much better)
  without having a massive buffer). This same problem might exist on other<br />
**This is quite some work, but will have great benefits. It would require writing the input side of `libcubeb`, and ditching the webrtc `audio_device` module. I'd rather do that while WebRTC is not too mainstream because it's big and dangerous, but this looks more difficult in terms of timeline every day, the window is closing fast. There is no other way and we will have to do it at some point, though. <br />
platforms, notably certain combination of windows hardware + version<br />
** [quite hard and sizeable]  
- We can't do full-duplex audio streams (input + output in the same callback),<br />
* Monitoring stream feedback for AEC
  so:<br />
** Some platforms (PulseAudio, Windows) allow a process to access the output of the system mixer. Feeding that back to the AEC would be obviously better that using the output of the MSG mixer.
  - We miss latency and performance optimizations:<br />
** [not too hard]
  - For loopback: gUM -&gt; Web Audio API -&gt; speakers, which is becoming rather<br />
* Audio devices selection
common<br />
** The W3C is in the process of doing an API to let the user choose what will be the device for playback and recording on the various API on the Web platform (`HTMLMediaElement`, Web Audio API, gUM).
  - In general, because full-duplex means only one IPC round-trip between the<br />
** While we have some code for the audio input device enumeration, we have no code whatsoever for the output side (we always imply that we want to use the default device). Also it would be good to have pairing feature for input and output devices (e.g. matching the headset mic with the headset headphones).
audio client (Firefox) and the server (pulse/wasapi/etc.).<br />
** This needs more spec work before we can expose anything to authors, but is a lot of platform-specific and plumbing work.
- We can't have perfect clock correlation between input and output, which is<br />
** [medium, need platform specific work, can be parallelized, somewhat blocked on spec]
very problematic for the AEC (it works now, it could be much better)
* Multiple MSG per process
 
** (Related to the previous point about audio device selection)<br />
This is quite some work, but will have great benefits. It would require writing<br />
** If we let authors choose the audio output device, and because the MSG is driven by the audio callbacks, we will need to write code to make sure MSG can communicate with each other (because you can connect multiple AudioContext together using MediaStreams).
the input side of `libcubeb`, and ditching the webrtc `audio_device` module. I'd<br />
** This will also be necessary to implement the upcoming &quot;deep-buffer&quot; option that is arriving to Web Audio API, and that will be very useful to save battery.
rather do that while WebRTC is not too mainstream because it's big and dangerous,<br />
**[medium]<br />
but this looks more difficult in terms of timeline every day, the window is<br />
* PulseAudio on Firefox OS
closing fast. There is no other way and we will have to do it at some point,<br />
** We should really look (again) into it. I know `mwu` has a proof-of-concept, I also know one of the maintainers of PulseAudio, Arun Raghavan (FordPrefect on irc) (who conveniently idles in ##media and went as far as fixing Firefox bugs for us)  
though.
*** has tried it with great success:<br />
 
*** &lt;[http://arunraghavan.net/2012/01/pulseaudio-vs-audioflinger-fight/ http://arunraghavan.net/2012/01/pulseaudio-vs-audioflinger-fight/]&gt;<br /> (audioflinger is the internal name of the Android audio stack)<br />
[quite hard and sizeable]
*** &lt;[http://arunraghavan.net/2012/04/pulseaudio-on-android-part-2/ http://arunraghavan.net/2012/04/pulseaudio-on-android-part-2/]&gt;
 
** As you see, CPU usage, power usage, latency are _way_ better using Pulse (at the expense of 400kB of memory on the client, while saving 3.7MB on the server). We should redo those measurements to make sure.
## Monitoring stream feedback for AEC
** Historically, PulseAudio has been used on Nokia phones successfully in the past. Downsides/possible issues: This would require talking to Qualcomm and getting them to certify that. Also if we go with their AEC, we might want to get it plumbed to Pulse, but we would be able to access the monitoring streams. I don't know if we have checked the quality of their AEC against WebRTC's, though.
 
** [hard]<br />
Some platforms (PulseAudio, Windows) allow a process to access the output of the<br />
* Video sources pulled by the compositor
system mixer. Feeding that back to the AEC would be obviously better that using<br />
** We have an issue where on some platforms, the latency is too high (more than 1000/60ms between callbacks), and we are not able to paint the all the frame when they are due (we seem to have hacked around it for now). iirc roc has ideas on how to do that. It should also reduce the video latency.
the output of the MSG mixer.
** [no idea of difficulty yet -- thoughts welcome]
 
* Sandbox hardening
[not too hard]
** Audio input and output stream are somewhat syscall heavy, and do scary things with buffers all the time. Depending on the technique the security people are going to use/are using, we might need to change some code on our side. Because we have non-traditional requirements on latency, the classic solutions (&quot;just use ipdl&quot;) might not work.  
 
** We should look into mutexes and other synchronization primitives
## Audio devices selection
** The page we used as a reference, but not sure if it's up to date: &lt;[[Sandbox]]&gt;
 
* MSG optimizations
The W3C is in the process of doing an API to let the user choose what will be<br />
** Remove blocking from MSG
the device for playback and recording on the various API on the Web platform<br />
** [Landed in Fx 43].<br />
(`HTMLMediaElement`, Web Audio API, gUM).
*  Web Audio API suspend/resume
 
** Gaia is hacking around the lack of this API for now, but we need it, there has been around 3-4 high-profile battery consumption regression because of that.<br />
While we have some code for the audio input device enumeration, we have no code<br />
** The spec is now finished, so we are good to implement.
whatsoever for the output side (we always imply that we want to use the default<br />
** [Landed in Fx 40]
device). Also it would be good to have pairing feature for input and output<br />
* Possible audio output device switch fixes for Windows
devices (e.g. matching the headset mic with the headset headphones).
** We need to add some code to handle audio output device switching in windows/WASAPI. This is driver/windows version dependent, so some testing is needed beforehand.  
 
** [Landed in Fx 38]
This needs more spec work before we can expose anything to authors, but is a lot<br />
of platform-specific and plumbing work.
 
[medium, need platform specific work, can be parallelized, somewhat blocked on<br />
spec]
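
For the pairing idea above, one possible approach is to match devices on a shared identifier. The sketch below assumes device descriptions shaped like the `MediaDeviceInfo` objects of the Media Capture and Streams spec, where devices that belong to the same physical hardware share a `groupId`. The `DeviceInfo` interface, the `pairedOutput` helper and the sample devices are hypothetical; they only illustrate the matching logic, not how the platform code would do it.

```typescript
// Shape mirrors the spec's MediaDeviceInfo; plain objects keep the sketch
// runnable outside a browser.
interface DeviceInfo {
  deviceId: string;
  groupId: string;
  kind: "audioinput" | "audiooutput" | "videoinput";
  label: string;
}

// Hypothetical helper: find the audio output that shares a physical device
// (same groupId) with the given input, if there is one.
function pairedOutput(input: DeviceInfo, devices: DeviceInfo[]): DeviceInfo | undefined {
  return devices.find(d => d.kind === "audiooutput" && d.groupId === input.groupId);
}

// Made-up sample data standing in for an enumeration result.
const sampleDevices: DeviceInfo[] = [
  { deviceId: "in-1", groupId: "headset", kind: "audioinput", label: "Headset Microphone" },
  { deviceId: "out-1", groupId: "builtin", kind: "audiooutput", label: "Built-in Speakers" },
  { deviceId: "out-2", groupId: "headset", kind: "audiooutput", label: "Headset Earphones" },
];

console.log(pairedOutput(sampleDevices[0], sampleDevices)?.label); // Headset Earphones
```

When no output shares the input's `groupId`, the helper returns `undefined`, which a caller could treat as "fall back to the default device".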
 
## Multiple MSG per process
 
(Related to the previous point about audio device selection)
 
If we let authors choose the audio output device, and because the MSG is driven<br />
by the audio callbacks, we will need to write code to make sure MSG can<br />
communicate with each other (because you can connect multiple AudioContext<br />
together using MediaStreams).
 
This will also be necessary to implement the upcoming &quot;deep-buffer&quot; option that is<br />
arriving to Web Audio API, and that will be very useful to save battery.
 
[medium]
 
## PulseAudio on Firefox OS
 
We should really look (again) into it. I know `mwu` has a proof-of-concept, I also<br />
know one of the maintainers of PulseAudio, Arun Raghavan (FordPrefect on irc)<br />
(who conveniently idles in ##media and went as far as fixing Firefox bugs for us)<br />
has tried it with great success:
 
- &lt;[http://arunraghavan.net/2012/01/pulseaudio-vs-audioflinger-fight/ http://arunraghavan.net/2012/01/pulseaudio-vs-audioflinger-fight/]&gt;<br />
(audioflinger is the internal name of the Android audio stack)<br />
- &lt;[http://arunraghavan.net/2012/04/pulseaudio-on-android-part-2/ http://arunraghavan.net/2012/04/pulseaudio-on-android-part-2/]&gt;
 
As you see, CPU usage, power usage, latency are _way_ better using Pulse (at the<br />
expense of 400kB of memory on the client, while saving 3.7MB on the server). We<br />
should redo those measurements to make sure.
 
Historically, PulseAudio has been used on Nokia phones successfully in the past.
 
Downsides/possible issues: This would require talking to Qualcomm and getting<br />
them to certify that. Also if we go with their AEC, we might want to get it<br />
plumbed to Pulse, but we would be able to access the monitoring streams. I don't<br />
know if we have checked the quality of their AEC against WebRTC's, though.
 
[hard]
 
## Video sources pulled by the compositor
 
We have an issue where on some platforms, the latency is too high (more than<br />
1000/60ms between callbacks), and we are not able to paint the all the frame<br />
when they are due (we seem to have hacked around it for now). iirc roc has<br />
ideas on how to do that. It should also reduce the video latency.
 
[no idea]
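
To make the 1000/60 ms figure concrete: at a 60 Hz refresh rate a new frame is due every 1000/60, or about 16.7 ms, so any callback period longer than that means some frames cannot be delivered on time. The sketch below works through that arithmetic. It assumes the callbacks in question are audio callbacks; the buffer sizes and the 44100 Hz sample rate are illustrative values, not measurements from any platform, and the helper names are hypothetical.

```typescript
// Time available between two paints at the given refresh rate, in ms.
function frameBudgetMs(refreshRateHz: number): number {
  return 1000 / refreshRateHz;
}

// Time between two audio callbacks for a given buffer size, in ms
// (buffer size and sample rate are assumed, illustrative inputs).
function callbackPeriodMs(bufferFrames: number, sampleRateHz: number): number {
  return (bufferFrames / sampleRateHz) * 1000;
}

// True when callbacks arrive less often than frames are due.
function missesFrameDeadline(
  bufferFrames: number,
  sampleRateHz: number,
  refreshRateHz: number = 60,
): boolean {
  return callbackPeriodMs(bufferFrames, sampleRateHz) > frameBudgetMs(refreshRateHz);
}

console.log(frameBudgetMs(60).toFixed(2));     // 16.67
console.log(missesFrameDeadline(512, 44100));  // false (about 11.6 ms per callback)
console.log(missesFrameDeadline(1024, 44100)); // true (about 23.2 ms per callback)
```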
 
## MSG optimizations
 
When randomly profiling Firefox during WebRTC calls/gUM applications/Web Audio<br />
API applications, I see that the MSG is using too much CPU compared to what it<br />
should use. Since the MSG is pretty central to our overall real-time media<br />
story, it should logically be something that we optimize.
 
For example, I think that investing around week in optimizing the MSG would be<br />
as interesting in CPU usage wins as adding support for an hardware AEC. I have<br />
notes somewhere on things we could do (somewhat low-hanging fruits), and the<br />
CPU usage gain (of course during a specific scenario, since different scenarios<br />
stress the MSG code in different ways).
 
[not too hard]
 
## Sandbox hardening
 
Audio input and output stream are somewhat syscall heavy, and do scary things<br />
with buffers all the time. Depending on the technique the security people are<br />
going to use/are using, we might need to change some code on our side. Because<br />
we have non-traditional requirements on latency, the classic solutions (&quot;just<br />
use ipdl&quot;) might not work.
 
We should look into mutexes and other synchronization primitives
 
The page I used as a reference, but I'm not sure if it's up to date:<br />
&lt;[[Sandbox]]&gt;
 
[no idea]

Latest revision as of 15:10, 23 September 2015


Web Audio Perf Parity meta bug: https://bugzilla.mozilla.org/show_bug.cgi?id=webaudioperf_parity
All open Web Audio bugs: http://mzl.la/1HQZkmU
NOTE: If a bug doesn't have a priority (P1 to P5), we are not planning to fix it by the end of Q3. If anyone thinks it needs to be fixed, please ping padenot and/or mreavy.
WebRTC and Web Audio plans coming out of Whistler: https://docs.google.com/document/d/1eJLEenV4T5R5uiattNXU4PIy9w7hUvysj3lvkZPJVIM/edit#heading=h.hit4o9naa62o -- This wiki will be updated to reflect the new info in this doc


Web Audio API performance improvement, phase 2 (Q3)

  • Firefox 44 is our target release for "improved web audio performance" (as good as or better than our competition)
    • Web Audio perf parity bug, everything higher than Rank=14, is the target: bug 1189514
    • Benchmarks: http://ouija.allizom.org/grafana/index.html#/dashboard/file/webaudio.json
      • (Mac and Android benchmarks are having trouble running in automation. Investigating why with dminor.)



Web Audio API performance improvement, phase 1 (Q2)

Done:

  • bug 1050645 - We now have automatic benchmarks running on CI, on Windows, Mac and Linux, where we benchmark our performance against Chrome. Unfortunately, this was put in place after some optimizations landed, so we don't see those improvements. We need to have it running on mobile; it should work but does not.
  • bug 1140448 - AudioParam is now very fast. This is very apparent in the benchmarks.
  • bug 926838 - The FFT code is now at least twice as fast on ARM. This is also obvious in the benchmarks, or when simply running any app that uses a ConvolverNode.
  • bug 1140450 - Resampling complexity has been lowered. This is a 50% decrease in complexity, and leads to very obvious wins across a number of benchmarks. The students working on this are currently exploring ways of making it even faster. This is optimizing one of the most popular AudioNodes of the Web Audio API.
  • Various other optimizations of the internal MediaStreamGraph code, leading to less visible optimizations
  • bug 1127188 - When a document goes away (for example when a page is reloaded), AudioContexts are now aggressively prevented from doing any further processing, improving the experience for developers who often reload their page, and saving resources when navigating away from pages with Web Audio API code.
  • bug 1157137 - ScriptProcessorNode would sometimes have a bug where internal latency would build up
  • bug 1169321 - We need the benchmarks to run on Android; there is a bug preventing us from seeing the results.
  • bug 1157768 - This is the same as 926838, but for x86, optimizing the most expensive node of the Web Audio API.
  • bug 1140450 - This is still not closed; the team is exploring alternative ways to be even more efficient. (This is a P2, so it's not a "must-have" for our first phase improvements, but we still expect to land it in Fx 42.)




2015 Web Audio ({output,input},MSG) Development Plan

Comments about the expected difficulty of each point, and whether it is blocked, are at the end of each point in [brackets].

  • Web Audio API Audio Worker
    • This is _very_ important to implement quickly and right when the spec is done.
    • We've got famous people ready to do crazy demos for us when we have it ready.
    • [hard, need last clarifications from the spec, which should happen soon]
  • Audio input and output code consolidation
    • These days, our audio input and our audio output code are two completely different code bases: the audio input is buried down in webrtc-land, and the audio output code is in `libcubeb`.
    • This causes a number of issues (ranked from least problematic to most important):
      • We need to carry patches on top of the `webrtc.org` codebase to make it fit our needs
      • We don't know the code base well (certainly not as well as something we'd have written)
      • It's a bit harder (in terms of plumbing) to use platform AECs (we know OSX's is not great, but it could become better)
      • It's hard to recover from over/under-run, and communicate clear and up-to-date timing and latency figures to the AEC.
      • There are issues on certain platforms (OS X) where, when you don't have full-duplex input and output and change the output device, the output callback does not get called and you then get a loop-back latency buildup (this is important for gUM + Web Audio, and to make sure the AEC still works optimally without having a massive buffer). The same problem might exist on other platforms, notably certain combinations of Windows hardware and version.
      • We can't do full-duplex audio streams (input + output in the same callback), so:
        • We miss latency and performance optimizations:
          • For loopback: gUM -> Web Audio API -> speakers, which is becoming rather common
          • In general, because full-duplex means only one IPC round-trip between the audio client (Firefox) and the server (pulse/wasapi/etc.).
      • We can't have perfect clock correlation between input and output, which is very problematic for the AEC (it works now, it could be much better)
    • This is quite some work, but will have great benefits. It would require writing the input side of `libcubeb`, and ditching the webrtc `audio_device` module. I'd rather do that while WebRTC is not too mainstream because it's big and dangerous, but this looks more difficult in terms of timeline every day: the window is closing fast. There is no other way, though, and we will have to do it at some point.
    • [quite hard and sizeable]
  • Monitoring stream feedback for AEC
    • Some platforms (PulseAudio, Windows) allow a process to access the output of the system mixer. Feeding that back to the AEC would obviously be better than using the output of the MSG mixer.
    • [not too hard]
  • Audio devices selection
    • The W3C is in the process of defining an API to let the user choose which device is used for playback and recording by the various APIs on the Web platform (`HTMLMediaElement`, Web Audio API, gUM).
    • While we have some code for audio input device enumeration, we have no code whatsoever for the output side (we always imply that we want to use the default device). It would also be good to have a pairing feature for input and output devices (e.g. matching the headset mic with the headset headphones).
    • This needs more spec work before we can expose anything to authors, but it is a lot of platform-specific and plumbing work.
    • [medium, need platform specific work, can be parallelized, somewhat blocked on spec]
  • Multiple MSG per process
    • (Related to the previous point about audio device selection)
    • If we let authors choose the audio output device, and because the MSG is driven by the audio callbacks, we will need to write code to make sure multiple MSGs can communicate with each other (because you can connect multiple AudioContexts together using MediaStreams).
    • This will also be necessary to implement the upcoming "deep-buffer" option that is coming to the Web Audio API, and that will be very useful to save battery.
    • [medium]
  • PulseAudio on Firefox OS
    • We should really look (again) into it. I know `mwu` has a proof-of-concept, and I also know that one of the maintainers of PulseAudio, Arun Raghavan (FordPrefect on irc), who conveniently idles in ##media and went as far as fixing Firefox bugs for us, has tried it with great success:
      • http://arunraghavan.net/2012/01/pulseaudio-vs-audioflinger-fight/ (audioflinger is the internal name of the Android audio stack)
      • http://arunraghavan.net/2012/04/pulseaudio-on-android-part-2/
    • As you can see, CPU usage, power usage and latency are _way_ better using Pulse (at the expense of 400kB of memory on the client, while saving 3.7MB on the server). We should redo those measurements to make sure.
    • PulseAudio has been used successfully on Nokia phones in the past. Downsides/possible issues: This would require talking to Qualcomm and getting them to certify that. Also, if we go with their AEC, we might want to get it plumbed to Pulse, but we would be able to access the monitoring streams. I don't know if we have checked the quality of their AEC against WebRTC's, though.
    • [hard]
  • Video sources pulled by the compositor
    • We have an issue where, on some platforms, the latency is too high (more than 1000/60 ms between callbacks), and we are not able to paint all the frames when they are due (we seem to have hacked around it for now). iirc roc has ideas on how to do that. It should also reduce the video latency.
    • [no idea of difficulty yet -- thoughts welcome]
  • Sandbox hardening
    • Audio input and output streams are somewhat syscall heavy, and do scary things with buffers all the time. Depending on the technique the security people are going to use/are using, we might need to change some code on our side. Because we have non-traditional requirements on latency, the classic solutions ("just use ipdl") might not work.
    • We should look into mutexes and other synchronization primitives.
    • The page we used as a reference (we are not sure it is up to date): <Sandbox>
  • MSG optimizations
    • Remove blocking from MSG
    • [Landed in Fx 43].
  • Web Audio API suspend/resume
    • Gaia is hacking around the lack of this API for now, but we need it: there have been around 3-4 high-profile battery consumption regressions because of that.
    • The spec is now finished, so we are good to implement.
    • [Landed in Fx 40]
  • Possible audio output device switch fixes for Windows
    • We need to add some code to handle audio output device switching in Windows/WASAPI. This is dependent on the driver and Windows version, so some testing is needed beforehand.
    • [Landed in Fx 38]
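
As an illustration of the suspend/resume item above, here is a minimal sketch of how a page might stop audio processing while it is hidden, which is the kind of battery saving that item is about. `suspend()` and `resume()` are the promise-returning `AudioContext` methods from the Web Audio API spec; the `SuspendableContext` interface and the `syncToVisibility` helper are hypothetical names, introduced only so the logic can be exercised outside a browser.

```typescript
// Minimal subset of AudioContext used here; a real AudioContext satisfies it.
interface SuspendableContext {
  readonly state: string; // "suspended" | "running" | "closed" in the spec
  suspend(): Promise<void>;
  resume(): Promise<void>;
}

// Hypothetical helper: suspend the context when the page is hidden and
// resume it when the page becomes visible again. Returns the resulting state.
async function syncToVisibility(ctx: SuspendableContext, hidden: boolean): Promise<string> {
  if (ctx.state === "closed") {
    return ctx.state; // a closed context can be neither suspended nor resumed
  }
  if (hidden && ctx.state === "running") {
    await ctx.suspend();
  } else if (!hidden && ctx.state === "suspended") {
    await ctx.resume();
  }
  return ctx.state;
}

// In a page, this could be wired up roughly as follows (assuming an
// existing AudioContext named audioCtx):
//   document.addEventListener("visibilitychange", () => {
//     void syncToVisibility(audioCtx, document.hidden);
//   });
```

Checking `state` before each call keeps the helper idempotent, so repeated visibility events do not issue redundant suspend or resume requests.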