Media/multichannel

From MozillaWiki

Overview

Current Overall Status

  1. Multiple channel support on Windows, Linux, and OS X is finished
  2. Improving the mixing modules
  3. Surveying how to play 5.1 audio on Android and how to implement multi-channel support there
  4. Surveying a good way to add telemetry to collect use cases of 5.1 audio

Roadmap

The implementation is divided into the following phases.

  1. Basic support
    1. Implement basic support for each backend
      1. Implement multiple channel support in cubeb on Windows
      2. Support multiple channels in Firefox on Windows, based on the preceding cubeb work
      3. Repeat the work for OS X, Linux, and other platforms.
    2. Implement a mixing module beyond the backends
      1. Implement the defined mixing mechanisms (e.g., 3F2 downmix)
      2. Design alternative mechanisms for the undefined conversions
    3. Testing: Design a reliable test for multiple channel support (needs to be integrated with the testing improvement plan)
      1. Fake a virtual audio device on Windows
      2. Intercept the audio output data and verify it via the faked device.
      3. Repeat the work for OS X, Linux, and other platforms.
  2. Add telemetry to collect use cases of 5.1 audio
    1. Record information about the user's audio devices
    2. Record the channel count and layout of the raw decoded audio data
  3. Integration with device-switching support (this has not been implemented yet)
  4. Fancy feature support
    1. stereo-to-5.1 simulation

Integration of cubeb and gecko

Gecko's AudioConverter downmixes audio data that has more than two channels into stereo or mono before passing it to cubeb, because cubeb has no multi-channel support. Once cubeb implements multi-channel support and a mixing mechanism, there is no need to downmix first in Gecko.

We should move the downmix mechanism from Gecko into cubeb one platform at a time. For example, once multi-channel support is implemented on Windows, we should disable the downmix mechanism in Gecko on Windows and let cubeb do that job. The downmix in Gecko can be removed only after multi-channel support is implemented on every backend.

Integration with third-party library

Cubeb relies on third-party audio libraries. While adding multiple channel support, we found some things we could contribute to them:

  • PulseAudio: Add 5.1-to-stereo downmixing mechanism
  • speexdsp: Fix warning(cast between unsigned and signed) in resampler.c

Timeline

Evaluate dates here.

Status tracking

Cubeb

ID Summary Product Component Resolution Assigned to Depends on Blocks Whiteboard Target milestone
1300024 Support audio 5.1 on Android Core Audio/Video: cubeb C.M.Chang[:chunmin] 1073786, 1286101 ---
1325023 Add more channel layout to cubeb Core Audio/Video: cubeb 1368938 1073786 ---
1349474 Support more AudioChannelLayoutTag than kAudioChannelLayoutTag_UseChannelDescriptions in cubeb_audiounit Core Audio/Video: cubeb C.M.Chang[:chunmin] 1073786 ---
1368938 Support audio 5.1 with non-SMPTE layout devices Core Audio/Video: cubeb Paul Adenot (:padenot) 1619726 1073786, 1325023 ---
1474175 Wrong audio channel selection on multi channel audio interfaces (MacOS) Core Audio/Video: Playback C.M.Chang[:chunmin] 1073786 ---
1619726 Use the new Rust audio mixer in cubeb windows Core Audio/Video: cubeb C.M.Chang[:chunmin] 1073786, 1368938 ---
1627827 Audio goes wrong channel with Motu 828 mkii Core Audio/Video: cubeb C.M.Chang[:chunmin] 1628132 1073786 [media-audio] ---

7 Total; 7 Open (100%); 0 Resolved (0%); 0 Verified (0%);


Playback

ID Summary Product Component Resolution Assigned to Depends on Blocks Whiteboard Target milestone
1073786 [meta] Multiple channel support for Cubeb Core Audio/Video: cubeb C.M.Chang[:chunmin] 1300024, 1325023, 1349474, 1368938, 1474175, 1619726, 1627827, 1300018, 1300021, 1300023, 1318628, 1339723, 1343788, 1552928 1300455 ---
1287672 Add telemetry for 5.1 audio Core Audio/Video: Playback C.M.Chang[:chunmin] 1339362 1300455 ---

2 Total; 2 Open (100%); 0 Resolved (0%); 0 Verified (0%);


Multiple Channel Support

Supported channels

To support more channels beyond stereo (which has left and right channels), we need to define which channels we use:

Code Channel Name
M Mono
L Left (Front Left)
R Right (Front Right)
C Center (Front Center)
LS Left Surround (Side Left)
RS Right Surround (Side Right)
RLS Rear Left Surround (Back Left)
RC Rear Center (Back Center)
RRS Rear Right Surround (Back Right)
LFE Low Frequency Effects

Channel Layout

The channel layout specifies the order of input/output channel data in the audio buffer. For example, if the layout is stereo, then there are two channels of data: the first is the left channel and the second is the right channel. Channel layouts have various definitions, but SMPTE's format is the most common:

Name Channels
DUAL-MONO L R
DUAL-MONO-LFE L R LFE
MONO M
MONO-LFE M LFE
STEREO L R
STEREO-LFE L R LFE
3F L R C
3F-LFE L R C LFE
2F1 L R RC
2F1-LFE L R LFE RC
3F1 L R C RC
3F1-LFE L R C LFE RC
2F2 L R LS RS
2F2-LFE L R LFE LS RS
3F2 L R C LS RS
3F2-LFE L R C LFE LS RS
3F3R-LFE L R C LFE RC LS RS
3F4-LFE L R C LFE RLS RRS LS RS
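
The layout table above can be written down as plain data. The following is a minimal sketch, not cubeb's actual representation: the names SMPTE_LAYOUTS and deinterleave are hypothetical, and the buffer is assumed to be interleaved (one sample per channel, frame after frame), which the text above does not state explicitly.

```python
# SMPTE layouts from the table above, as ordered tuples of channel codes.
# The dictionary name and the helper below are hypothetical.
SMPTE_LAYOUTS = {
    "DUAL-MONO": ("L", "R"),
    "DUAL-MONO-LFE": ("L", "R", "LFE"),
    "MONO": ("M",),
    "MONO-LFE": ("M", "LFE"),
    "STEREO": ("L", "R"),
    "STEREO-LFE": ("L", "R", "LFE"),
    "3F": ("L", "R", "C"),
    "3F-LFE": ("L", "R", "C", "LFE"),
    "2F1": ("L", "R", "RC"),
    "2F1-LFE": ("L", "R", "LFE", "RC"),
    "3F1": ("L", "R", "C", "RC"),
    "3F1-LFE": ("L", "R", "C", "LFE", "RC"),
    "2F2": ("L", "R", "LS", "RS"),
    "2F2-LFE": ("L", "R", "LFE", "LS", "RS"),
    "3F2": ("L", "R", "C", "LS", "RS"),
    "3F2-LFE": ("L", "R", "C", "LFE", "LS", "RS"),
    "3F3R-LFE": ("L", "R", "C", "LFE", "RC", "LS", "RS"),
    "3F4-LFE": ("L", "R", "C", "LFE", "RLS", "RRS", "LS", "RS"),
}


def deinterleave(buffer, layout_name):
    """Split an interleaved sample buffer into per-channel lists.

    Assumption: samples are interleaved in the order given by the layout.
    """
    channels = SMPTE_LAYOUTS[layout_name]
    count = len(channels)
    if len(buffer) % count != 0:
        raise ValueError("buffer length is not a multiple of the channel count")
    # Channel i owns every count-th sample starting at offset i.
    return {code: list(buffer[i::count]) for i, code in enumerate(channels)}
```

For instance, deinterleaving a four-sample STEREO buffer yields two samples for L and two for R.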

Mixing

When the number of audio input channels is different from the number of audio output channels, we need to convert the audio input data to fit the audio output's configuration.

Downmix

When there are more input channels than output channels, we need to compress the audio input data. This conversion is called downmix (downward mixing). Table 2 in ITU-R BS.775-3 defines equations to convert audio from 3F2 to 1F (mono), 2F (stereo), 3F, 2F1, 3F1 and 2F2. We can simply add an LFE value to expand the downmix matrix, which allows it to convert audio from 3F2-LFE (5.1 surround sound) to 1F (mono), 2F (stereo), 3F, 2F1, 3F1 and 2F2 and their LFE variants.
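
As an illustration of the matrix approach, here is a minimal sketch of a 3F2-LFE to stereo downmix. The L, R, C, LS and RS coefficients follow the 2/0 row of the ITU-R BS.775 downmix table (front channels at 1.0, center and surrounds at about 0.7071). The LFE coefficient of 0.0 is an assumption made for this sketch, as are the function and constant names. The sketch does no normalization, so summed samples can exceed the input range.

```python
import math

INV_SQRT_2 = 1 / math.sqrt(2)  # about 0.7071

# Rows are output channels (L, R). Columns are input channels in
# 3F2-LFE order: L, R, C, LFE, LS, RS.
# Assumption: the LFE column is 0.0, which drops the LFE signal.
DOWNMIX_3F2_LFE_TO_STEREO = [
    [1.0, 0.0, INV_SQRT_2, 0.0, INV_SQRT_2, 0.0],
    [0.0, 1.0, INV_SQRT_2, 0.0, 0.0, INV_SQRT_2],
]


def downmix_frame(frame, matrix):
    """Apply a downmix matrix to one frame (one sample per input channel)."""
    if any(len(row) != len(frame) for row in matrix):
        raise ValueError("matrix width must match the input channel count")
    return [sum(coeff * sample for coeff, sample in zip(row, frame))
            for row in matrix]


def downmix(interleaved, matrix):
    """Downmix an interleaved buffer frame by frame."""
    in_channels = len(matrix[0])
    out = []
    for start in range(0, len(interleaved), in_channels):
        out.extend(downmix_frame(interleaved[start:start + in_channels], matrix))
    return out
```

A frame that carries signal only on C ends up at equal level on both output channels, while a frame that carries signal only on LFE becomes silence under the assumed coefficient.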

Upmix

When there are fewer input channels than output channels, we need to expand the audio input data.

Bypass

When the number of input channels equals the number of output channels but the layouts differ, we need to bypass the audio input data.

Mixing Policy

There are three mechanisms for mixing:

  1. Specific conversion
  2. Mapping data by channel name
  3. Bypassing data by channel index(fallback plan)

Each time we upmix or downmix, we try the mechanisms in the order above. That is, we try a specific conversion first. If it works, the job is done. Otherwise, we next try mapping the data by channel name. If that still doesn't work, we bypass the channel data by index. The final mechanism is our fallback plan and should always work.
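
The ordering described above can be sketched as a small dispatcher. This is only an illustration under assumed conventions: mix_with_policy is a hypothetical name, and each mechanism is assumed to be a function that returns the converted frame, or None when it cannot handle the given pair of layouts.

```python
def mix_with_policy(frame, in_layout, out_layout, mechanisms):
    """Try each mechanism in order and return the first result.

    Assumption: a mechanism returns None to signal that it does not
    apply, so the next mechanism in the list is tried.
    """
    for mechanism in mechanisms:
        result = mechanism(frame, in_layout, out_layout)
        if result is not None:
            return result
    # The last mechanism is meant to be a fallback that always works.
    raise RuntimeError("no mechanism handled the conversion")


# Placeholder mechanisms, hypothetical and for illustration only.
def declines(frame, in_layout, out_layout):
    return None


def keeps_leading_channels(frame, in_layout, out_layout):
    return list(frame[:len(out_layout)])


result = mix_with_policy(
    [0.1, 0.2, 0.3], ("L", "R", "C"), ("L", "R"),
    [declines, keeps_leading_channels],
)
```

In a real mixer the list would hold the specific conversion, the mapping by channel name, and the bypass by channel index, in that order.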

Specific conversion

Some conversions have their own definitions, so we need to implement them. For example, Table 2 in ITU-R BS.775-3 defines the downmix equations from 3F2 to 1F, 2F, 3F, 2F1, 3F1 and 2F2.

Mapping data by channel name

In most cases, the input and output data can be mapped by their layout settings. For example, to downmix from 3F (L, R, C) to stereo (L, R), we only need to pass the first two input channels to the output.
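
A minimal sketch of mapping by channel name follows. Two details are assumptions of this sketch and not stated above: the mechanism is taken to apply only when one layout's channel set contains the other's, and output channels with no matching input are filled with silence (0.0). The function name is hypothetical.

```python
def map_by_channel_name(frame, in_layout, out_layout):
    """Map one frame from in_layout to out_layout by channel code.

    Returns None when the channel names do not line up, so a caller
    can fall back to another mechanism.
    """
    if len(frame) != len(in_layout):
        raise ValueError("frame size must match the input layout")
    in_names, out_names = set(in_layout), set(out_layout)
    # Assumption: mapping applies only if one layout contains the other.
    if not (out_names <= in_names or in_names <= out_names):
        return None
    by_name = dict(zip(in_layout, frame))
    # Assumption: output channels missing from the input get silence.
    return [by_name.get(code, 0.0) for code in out_layout]
```

With this sketch, 3F to stereo keeps L and R and drops C, stereo to 3F adds a silent C, and stereo to mono returns None because M matches neither L nor R.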

Bypassing data by channel index(fallback plan)

Some cases are not covered by the above mechanisms. The downmix from stereo (L, R) to mono (M) is an example: there is no spec and no matching channel for this conversion. In particular, WASAPI can support some mismatched speaker settings, such as 6 channels with a stereo layout (stereo should only have 2 channels). We need a fallback plan for such cases.

The simplest plan is to convert data by channel index. If the input has 2 channels and the output has 1 channel, we just pass the first channel to the output and ignore the other.
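
The index-based fallback could look like the sketch below. The function name is hypothetical, and filling extra output channels with silence when the output is wider than the input is an assumption; the text above only describes the case where input channels are dropped.

```python
def bypass_by_index(frame, out_channels):
    """Copy input channel i to output channel i, ignoring layouts."""
    copied = list(frame[:out_channels])
    # Assumption: output channels beyond the input width are silent.
    return copied + [0.0] * (out_channels - len(copied))
```

Because it never inspects channel names, this mechanism produces an output for any pair of channel counts, which is what makes it usable as the last resort.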

An alternative is to define matrices to compress or expand the audio data. However, the number of combinations is too large; it is nearly impossible to define matrices for all of them. A feasible approach is to define matrices only for some common cases, such as converting 3 to 8 channels down to 2 channels.

Testing

If we could fake a virtual audio device on each platform, we could intercept the output data through the faked devices and verify it, which would make it possible to test all layouts. MSDN has some articles about virtual audio devices, and Microsoft/Windows-driver-samples might serve as a reference. The testing approach is discussed separately.

The advantages of faking layouts:

  1. Check whether the sound is really playing on the right channels
  2. Check that the mixed audio is correct
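
One way the first check could work is to play a probe signal on a single channel and confirm that the intercepted data has its energy on the same channel. The sketch below covers only the signal generation and the check. Capturing data from a faked device is platform-specific and not shown, and both function names are hypothetical.

```python
def make_channel_probe(n_channels, active, frames, amplitude=0.5):
    """Build an interleaved buffer with signal on one channel only."""
    buffer = [0.0] * (n_channels * frames)
    for frame in range(frames):
        # Alternate the sign so the probe is not a constant offset.
        sample = amplitude if frame % 2 == 0 else -amplitude
        buffer[frame * n_channels + active] = sample
    return buffer


def loudest_channel(interleaved, n_channels):
    """Return the index of the channel carrying the most energy."""
    energy = [0.0] * n_channels
    for i, sample in enumerate(interleaved):
        energy[i % n_channels] += sample * sample
    return max(range(n_channels), key=energy.__getitem__)
```

A test would send the probe for each channel in turn and compare the loudest channel in the captured buffer with the one that was driven.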
