Media/WebAudio



Web Audio Perf Parity meta bug: https://bugzilla.mozilla.org/show_bug.cgi?id=webaudioperf_parity
All open Web Audio bugs: http://mzl.la/1HQZkmU
NOTE: If a bug doesn't have a priority (P1 to P5), we are not planning to fix it by the end of Q3. If anyone thinks it needs to be fixed, please ping padenot and/or mreavy.
WebRTC and Web Audio plans coming out of Whistler: https://docs.google.com/document/d/1eJLEenV4T5R5uiattNXU4PIy9w7hUvysj3lvkZPJVIM/edit#heading=h.hit4o9naa62o -- This wiki will be updated to reflect the new info in this doc


Web Audio API performance improvement, phase 2 (Q3)



Web Audio API performance improvement, phase 1 (Q2)

Done:

  • bug 1050645 - We now have automatic benchmarks running on CI on Windows, Mac, and Linux, benchmarking our performance against Chrome. Unfortunately, this was put in place after some optimizations had landed, so those improvements don't show up in the results. We also need it running on mobile; it should work there, but currently does not.
  • bug 1140448 - AudioParam is now very fast. This is very apparent in benchmarks.
  • bug 926838 - The FFT code is now at least twice as fast on ARM. This is also obvious in benchmarks, or simply when running any app that uses a ConvolverNode.
  • bug 1140450 - Resampling complexity has been lowered. This is a 50% decrease in complexity and leads to wins across a number of benchmarks, with a very obvious performance increase. The students working on this are currently exploring ways of making it even faster. This optimizes one of the most popular AudioNodes of the Web Audio API.
  • Various other optimizations of the internal MediaStreamGraph code, leading to smaller, less visible wins.
  • bug 1127188 - When a document goes away (for example when a page is reloaded), its AudioContexts are now aggressively prevented from doing any further processing. This improves the experience for developers who reload their page often, and saves resources when navigating away from pages with Web Audio API code.
  • bug 1157137 - Fixed a bug where ScriptProcessorNode internal latency would sometimes build up.
  • bug 1169321 - The benchmarks need to run on Android; a bug is preventing us from seeing the results there.
  • bug 1157768 - This is the same as bug 926838, but for x86: optimizing the most expensive node of the Web Audio API.
  • bug 1140450 - This is still not closed; the team is exploring alternative ways to be even more efficient. (This is a P2, so it's not a must-have for our first phase of improvements, but we still expect to land it in Fx 42.)




2015 Web Audio ({output,input},MSG) Development Plan

Some comments about the expected difficulty of each item, and whether it is blocked, are at the end of each point in [brackets].

  • Web Audio API Audio Worker
    • This is _very_ important to implement quickly and correctly as soon as the spec is done.
    • We've got famous people ready to do crazy demos for us when we have it ready.
    • [hard, need last clarifications from the spec, which should happen soon]
  • Audio input and output code consolidation
    • These days, our audio input and audio output code are two completely different code bases: the audio input code is buried down in webrtc-land, and the audio output code is in `libcubeb`.
    • This causes a number of issues (ranked from least to most problematic):
      • We need to carry patches on top of the `webrtc.org` codebase to make it fit our needs
      • We don't know the code base well (certainly not as well as something we'd have written)
      • It's a bit harder (in terms of plumbing) to use platform AECs (we know OSX's is not great, but it could become better)
      • It's hard to recover from over/under-run, and communicate clear and up-to-date timing and latency figures to the AEC.
      • There are issues on certain platforms (OS X) where, without full-duplex input and output, changing the output device means the output callback does not get called, and loop-back latency builds up (this is important for gUM + Web Audio, and to make sure the AEC still works optimally without a massive buffer). The same problem might exist on other platforms, notably certain combinations of Windows hardware and version.
      • We can't do full-duplex audio streams (input + output in the same callback), so we miss latency and performance optimizations:
        • For loopback: gUM -> Web Audio API -> speakers, which is becoming rather common
        • In general, because full duplex means only one IPC round-trip between the audio client (Firefox) and the server (pulse/wasapi/etc.)
      • We can't have perfect clock correlation between input and output, which is very problematic for the AEC (it works now, but it could be much better)
    • This is quite some work, but it will have great benefits. It would require writing the input side of `libcubeb` and ditching the webrtc `audio_device` module. I'd rather do this while WebRTC is not too mainstream, because it's big and dangerous, but that looks more difficult in terms of timeline every day; the window is closing fast. There is no other way, though, and we will have to do it at some point. (A sketch of what a full-duplex callback could look like follows this item.)
    • [quite hard and sizeable]
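To make the full-duplex point concrete, here is a minimal sketch of what a unified stream callback could look like once the input side of `libcubeb` exists. The names here (`duplex_data_callback`, `loopback_callback`) are hypothetical, not existing libcubeb API; the point is that a single callback delivers the captured buffer and requests the playback buffer at the same time, so both directions share one clock and one IPC round-trip.

```cpp
#include <cstring>

// Hypothetical full-duplex callback shape: one invocation per audio
// quantum carries both directions. Returns the number of frames
// written to `output`.
typedef long (*duplex_data_callback)(void* user_ptr,
                                     const float* input, // nframes of captured audio
                                     float* output,      // room for nframes of playback
                                     long nframes);

// Example: the gUM -> Web Audio API -> speakers loopback path mentioned
// above. With a single callback there is no extra buffering between the
// two directions, and the AEC can be handed exactly matching near-end and
// far-end timestamps.
long loopback_callback(void* /*user_ptr*/, const float* input,
                       float* output, long nframes) {
  std::memcpy(output, input, static_cast<size_t>(nframes) * sizeof(float)); // mono
  return nframes;
}
```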
  • Monitoring stream feedback for AEC
    • Some platforms (PulseAudio, Windows) allow a process to access the output of the system mixer. Feeding that back to the AEC would obviously be better than using the output of the MSG mixer. (See the PulseAudio sketch after this item.)
    • [not too hard]
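A minimal sketch of reading a sink's monitor source with PulseAudio's blocking `pa_simple` API (the sink name is an example and would be discovered at runtime; a real implementation would use the asynchronous API on a dedicated thread):

```cpp
#include <pulse/simple.h>
#include <pulse/error.h>
#include <cstdio>
#include <cstdint>

int main() {
  // 16-bit stereo at 44.1kHz for the example; the AEC would want
  // whatever rate the graph runs at.
  pa_sample_spec spec = { PA_SAMPLE_S16LE, 44100, 2 };
  int error = 0;

  // Every PulseAudio sink exposes a "<sink name>.monitor" source carrying
  // the mixed output of that sink, i.e. what the system actually plays.
  // That is a better far-end reference for the AEC than our own MSG mix.
  pa_simple* s = pa_simple_new(
      nullptr, "aec-monitor", PA_STREAM_RECORD,
      "alsa_output.pci-0000_00_1b.0.analog-stereo.monitor", // example name
      "far-end reference", &spec, nullptr, nullptr, &error);
  if (!s) {
    std::fprintf(stderr, "pa_simple_new: %s\n", pa_strerror(error));
    return 1;
  }

  int16_t buf[441 * 2]; // 10ms of stereo audio, a typical AEC frame size
  for (int i = 0; i < 100; i++) {
    if (pa_simple_read(s, buf, sizeof(buf), &error) < 0) break;
    // Feed `buf` to the echo canceller as the far-end signal here.
  }
  pa_simple_free(s);
  return 0;
}
```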
  • Audio devices selection
    • The W3C is in the process of designing an API to let the user choose the playback and recording devices used by the various APIs on the Web platform (`HTMLMediaElement`, Web Audio API, gUM).
    • While we have some code for audio input device enumeration, we have no code whatsoever for the output side (we always assume the default device). It would also be good to have a pairing feature for input and output devices (e.g. matching the headset mic with the headset headphones).
    • This needs more spec work before we can expose anything to authors, but it is also a lot of platform-specific and plumbing work. (A sketch of the plumbing side follows this item.)
    • [medium, need platform specific work, can be parallelized, somewhat blocked on spec]
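As an illustration of the plumbing side, here is a sketch of the shape such an enumeration/pairing API could take in our code. None of this exists today; the types and names are hypothetical, and each platform backend (wasapi, pulse, coreaudio, opensl) would implement `enumerate_devices`.

```cpp
#include <string>
#include <vector>

enum class Direction { Input, Output };

struct DeviceInfo {
  std::string id;            // stable identifier for a device
  std::string friendly_name; // what would eventually be surfaced to content
  std::string group_id;      // same physical hardware (headset mic + headset
                             // headphones), enabling the pairing feature
  bool is_default;
};

// Hypothetical entry point, implemented per platform.
std::vector<DeviceInfo> enumerate_devices(Direction dir);

// Pairing: prefer the output device on the same physical hardware as the
// selected input, falling back to the default output.
std::string paired_output_for(const DeviceInfo& input,
                              const std::vector<DeviceInfo>& outputs) {
  for (const auto& out : outputs) {
    if (out.group_id == input.group_id) return out.id;
  }
  for (const auto& out : outputs) {
    if (out.is_default) return out.id;
  }
  return std::string();
}
```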
  • Multiple MSG per process
    • (Related to the previous point about audio device selection)
    • If we let authors choose the audio output device, then, because an MSG is driven by its audio callbacks, we will need to write code so that MSGs can communicate with each other (you can connect multiple AudioContexts together using MediaStreams). See the sketch after this item.
    • This will also be necessary to implement the upcoming "deep-buffer" option that is arriving in the Web Audio API, which will be very useful for saving battery.
    • [medium]
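One plausible way for two callback-driven graphs to exchange audio without blocking either real-time thread is a single-producer/single-consumer ring buffer, with the consumer filling underruns with silence instead of waiting. A minimal sketch of the technique (an illustration, not the actual MSG code):

```cpp
#include <atomic>
#include <cstddef>
#include <vector>

// SPSC ring buffer: graph A's audio callback writes, graph B's audio
// callback reads. Neither side takes a lock, which matters because both
// run on real-time audio threads. Indices grow monotonically; modulo
// indexing maps them into the storage.
class SpscRing {
 public:
  explicit SpscRing(size_t capacity) : buf_(capacity) {}

  // Producer side: returns how many frames were actually queued.
  size_t write(const float* data, size_t frames) {
    size_t w = write_idx_.load(std::memory_order_relaxed);
    size_t r = read_idx_.load(std::memory_order_acquire);
    size_t free_space = buf_.size() - (w - r);
    size_t n = frames < free_space ? frames : free_space;
    for (size_t i = 0; i < n; i++) buf_[(w + i) % buf_.size()] = data[i];
    write_idx_.store(w + n, std::memory_order_release);
    return n;
  }

  // Consumer side: anything not available is filled with silence, so the
  // consuming graph never waits on the producing one.
  size_t read(float* data, size_t frames) {
    size_t r = read_idx_.load(std::memory_order_relaxed);
    size_t w = write_idx_.load(std::memory_order_acquire);
    size_t avail = w - r;
    size_t n = frames < avail ? frames : avail;
    for (size_t i = 0; i < n; i++) data[i] = buf_[(r + i) % buf_.size()];
    for (size_t i = n; i < frames; i++) data[i] = 0.0f;
    read_idx_.store(r + n, std::memory_order_release);
    return n;
  }

 private:
  std::vector<float> buf_;
  std::atomic<size_t> write_idx_{0};
  std::atomic<size_t> read_idx_{0};
};
```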
  • PulseAudio on Firefox OS
    • We should really look into it (again). I know `mwu` has a proof of concept, and I also know one of the maintainers of PulseAudio, Arun Raghavan (FordPrefect on IRC), who conveniently idles in ##media and has gone as far as fixing Firefox bugs for us.
    • In past measurements, CPU usage, power usage, and latency were _way_ better using Pulse (at the expense of 400kB of memory on the client, while saving 3.7MB on the server). We should redo those measurements to make sure.
    • PulseAudio has been used successfully on Nokia phones in the past. Downsides/possible issues: this would require talking to Qualcomm and getting them to certify it. Also, if we go with their AEC, we might want to get it plumbed into Pulse, but we would then be able to access the monitoring streams. I don't know if we have checked the quality of their AEC against WebRTC's, though.
    • [hard]
  • Video sources pulled by the compositor
    • We have an issue where, on some platforms, latency is too high (more than 1000/60 ms between callbacks) and we are not able to paint all the frames when they are due (we seem to have hacked around it for now). IIRC roc has ideas on how to do this. It should also reduce video latency.
    • [no idea of difficulty yet -- thoughts welcome]
  • Sandbox hardening
    • Audio input and output streams are somewhat syscall-heavy and do scary things with buffers all the time. Depending on the technique the security people are using or are going to use, we might need to change some code on our side. Because we have non-traditional latency requirements, the classic solutions ("just use ipdl") might not work.
    • We should look into mutexes and other synchronization primitives
    • The page we used as a reference (not sure if it's up to date): <Sandbox>
  • MSG optimizations
    • Remove blocking from MSG
    • [Landed in Fx 43].
  • Web Audio API suspend/resume
    • Gaia is hacking around the lack of this API for now, but we need it; there have been around 3-4 high-profile battery consumption regressions because of this.
    • The spec is now finished, so we are good to implement.
    • [Landed in Fx 40]
  • Possible audio output device switch fixes for Windows
    • We need to add some code to handle audio output device switching in Windows/WASAPI. This is driver- and Windows-version-dependent, so some testing is needed beforehand. (A sketch of the notification mechanism follows this item.)
    • [Landed in Fx 38]
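For reference, the usual WASAPI mechanism is an `IMMNotificationClient` registered with the device enumerator; when the default render device changes, the rendering loop tears down and reopens its stream on the new endpoint. A trimmed sketch (COM refcounting, error handling, and the actual stream reopen are elided):

```cpp
#include <windows.h>
#include <mmdeviceapi.h>
#include <atomic>

class DeviceChangeListener : public IMMNotificationClient {
 public:
  std::atomic<bool> device_changed{false};

  HRESULT STDMETHODCALLTYPE OnDefaultDeviceChanged(EDataFlow flow, ERole role,
                                                   LPCWSTR) override {
    if (flow == eRender && role == eConsole) {
      device_changed.store(true); // polled by the rendering loop
    }
    return S_OK;
  }
  HRESULT STDMETHODCALLTYPE OnDeviceStateChanged(LPCWSTR, DWORD) override { return S_OK; }
  HRESULT STDMETHODCALLTYPE OnDeviceAdded(LPCWSTR) override { return S_OK; }
  HRESULT STDMETHODCALLTYPE OnDeviceRemoved(LPCWSTR) override { return S_OK; }
  HRESULT STDMETHODCALLTYPE OnPropertyValueChanged(LPCWSTR, const PROPERTYKEY) override { return S_OK; }

  // IUnknown, reduced to the minimum for a statically owned listener.
  HRESULT STDMETHODCALLTYPE QueryInterface(REFIID, void** out) override {
    *out = this;
    return S_OK;
  }
  ULONG STDMETHODCALLTYPE AddRef() override { return 1; }
  ULONG STDMETHODCALLTYPE Release() override { return 1; }
};

// Registration, done once when the stream is created:
//   IMMDeviceEnumerator* enumerator = /* CoCreateInstance(MMDeviceEnumerator) */;
//   static DeviceChangeListener listener;
//   enumerator->RegisterEndpointNotificationCallback(&listener);
```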