Media/multichannel
Overview
Current Overall Status
- Multiple channel support on Windows, Linux and OS X is finished
- Improving mixing modules
- Surveying how to play 5.1 audio on Android and how to implement multi-channel support there
- Surveying a good way to add telemetry to collect 5.1 audio use cases
Roadmap
There are three phases for the implementation.
- Basic support
- Implement basic support for each backend
- Implement multiple channel support in cubeb on Windows
- Support multiple channels in Firefox on Windows, building on the cubeb work above
- Repeat the work for OSX, Linux, ... and other platforms.
- Implement mixing module beyond backends
- Implement the defined mixing mechanisms (e.g., 3F2 downmix)
- Design alternative mechanisms for the undefined conversions
- Testing: Design a reliable test for multiple channel support (needs to be integrated with the testing improvement plan)
- Fake a virtual audio device on Windows
- Intercept the audio output data to verify via the faked device.
- Repeat the work for OSX, Linux, ... and other platforms.
- Implement basic support for each backend
- Add telemetry to collect 5.1 audio use cases
- Record the information about the user's audio devices
- Record the channel counts and layout of the raw decoded audio data
- Integration with device-switching support (this has not been implemented yet)
- Fancy feature support
- stereo-to-5.1 simulation
Integration of cubeb and gecko
Gecko's AudioConverter downmixes audio data that has more than two channels into stereo or mono before passing it to cubeb, because cubeb has no multi-channel support. Once cubeb implements multi-channel support and a mixing mechanism, there is no need to downmix first in Gecko.
We should move the downmix mechanism from Gecko into cubeb one platform at a time. For example, once multi-channel support is implemented on Windows, we should disable the downmix mechanism in Gecko on Windows and let cubeb do that job. The downmix in Gecko cannot be removed entirely until multi-channel support is implemented on every backend.
Integration with third-party library
Cubeb relies on third-party audio libraries. While adding multiple channel support, we found some things we could contribute to them.
- PulseAudio: Add 5.1-to-stereo downmixing mechanism
- speexdsp: Fix warning (cast between unsigned and signed) in resampler.c
Timeline
Evaluate dates here.
Status tracking
Cubeb
ID | Summary | Product | Component | Resolution | Assigned to | Depends on | Blocks | Whiteboard | Target milestone |
---|---|---|---|---|---|---|---|---|---|
1300024 | Support audio 5.1 on Android | Core | Audio/Video: cubeb | | C.M.Chang[:chunmin] | | 1073786, 1286101 | | --- |
1368938 | Support audio 5.1 with non-SMPTE layout devices | Core | Audio/Video: cubeb | | Paul Adenot (:padenot) | 1619726 | 1073786, 1325023 | | --- |
1474175 | Wrong audio channel selection on multi channel audio interfaces (MacOS) | Core | Audio/Video: Playback | | C.M.Chang[:chunmin] | | 1073786 | | --- |
1619726 | Use the new Rust audio mixer in cubeb windows | Core | Audio/Video: cubeb | | C.M.Chang[:chunmin] | | 1073786, 1368938 | | --- |
1627827 | Audio goes wrong channel with Motu 828 mkii | Core | Audio/Video: cubeb | | C.M.Chang[:chunmin] | 1628132 | 1073786 | [media-audio] | --- |
5 Total; 5 Open (100%); 0 Resolved (0%); 0 Verified (0%);
Playback
ID | Summary | Product | Component | Resolution | Assigned to | Depends on | Blocks | Whiteboard | Target milestone |
---|---|---|---|---|---|---|---|---|---|
1073786 | [meta] Multiple channel support for Cubeb | Core | Audio/Video: cubeb | | C.M.Chang[:chunmin] | 1300024, 1368938, 1474175, 1619726, 1627827, 1300018, 1300021, 1300023, 1318628, 1325023, 1339723, 1343788, 1349474, 1552928 | 1300455 | | --- |
1287672 | Add telemetry for 5.1 audio | Core | Audio/Video: Playback | | C.M.Chang[:chunmin] | 1339362 | 1300455 | | --- |
2 Total; 2 Open (100%); 0 Resolved (0%); 0 Verified (0%);
Multiple Channel Support
Supported channels
To support channels beyond stereo (which has only left and right channels), we need to define the channels we use:
Code | Channel Name |
---|---|
M | Mono |
L | Left (Front Left) |
R | Right (Front Right) |
C | Center (Front Center) |
LS | Left Surround (Side Left) |
RS | Right Surround (Side Right) |
RLS | Rear Left Surround (Back Left) |
RC | Rear Center (Back Center) |
RRS | Rear Right Surround (Back Right) |
LFE | Low Frequency Effects |
Channel Layout
The channel layout specifies the order of the input/output channel data in the audio buffer. For example, if the layout is stereo, each frame holds data for two channels: the first sample is for the left channel and the second is for the right channel. There are various channel layout definitions, but SMPTE's format is the most common:
Name | Channels (in buffer order) |
---|---|
DUAL-MONO | L, R |
DUAL-MONO-LFE | L, R, LFE |
MONO | M |
MONO-LFE | M, LFE |
STEREO | L, R |
STEREO-LFE | L, R, LFE |
3F | L, R, C |
3F-LFE | L, R, C, LFE |
2F1 | L, R, RC |
2F1-LFE | L, R, LFE, RC |
3F1 | L, R, C, RC |
3F1-LFE | L, R, C, LFE, RC |
2F2 | L, R, LS, RS |
2F2-LFE | L, R, LFE, LS, RS |
3F2 | L, R, C, LS, RS |
3F2-LFE | L, R, C, LFE, LS, RS |
3F3R-LFE | L, R, C, LFE, RC, LS, RS |
3F4-LFE | L, R, C, LFE, RLS, RRS, LS, RS |
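To make the role of the layout concrete, here is a minimal Python sketch (not cubeb code) that splits an interleaved buffer into per-channel data using a layout's channel order. The `LAYOUTS` dictionary and the `deinterleave` helper are hypothetical names introduced for illustration, and only a few layouts from the table are included.

```python
# Hypothetical layout table: channel order per layout name, taken from the
# table in this section. Only a few entries are shown.
LAYOUTS = {
    "STEREO": ["L", "R"],
    "3F2": ["L", "R", "C", "LS", "RS"],
    "3F2-LFE": ["L", "R", "C", "LFE", "LS", "RS"],
}


def deinterleave(buffer, layout_name):
    """Split an interleaved sample buffer into one list per named channel."""
    channels = LAYOUTS[layout_name]
    count = len(channels)
    if len(buffer) % count != 0:
        raise ValueError("buffer length is not a whole number of frames")
    # Sample i of every frame belongs to the i-th channel of the layout.
    return {name: buffer[i::count] for i, name in enumerate(channels)}


# Two stereo frames: (L=0.1, R=0.2) followed by (L=0.3, R=0.4).
planes = deinterleave([0.1, 0.2, 0.3, 0.4], "STEREO")
```

With the stereo layout, every even-indexed sample lands in `L` and every odd-indexed sample in `R`; the same buffer read with a different layout would be assigned to different channel names, which is why the layout must be known before mixing.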
Mixing
When the number of audio input channels is different from the number of audio output channels, we need to convert the audio input data to fit the audio output's configuration.
Downmix
When there are more input channels than output channels, we need to compress the audio input data. This conversion is called downmix (downward mixing). Table 2 in ITU-R BS.775-3 defines equations to convert audio from 3F2 to 1F (mono), 2F (stereo), 3F, 2F1, 3F1 and 2F2. We can extend the downmix matrix by adding an LFE term, which allows converting audio from 3F2-LFE (5.1 surround sound) to 1F (mono), 2F (stereo), 3F, 2F1, 3F1 and 2F2, and to their LFE variants. Here is the code example.
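As an illustration only (this is a sketch, not cubeb's implementation), the 3F2-LFE to stereo case might look like the following in Python. The 0.7071 weights for the center and surround channels are the commonly cited ITU-R BS.775 values for the 3F2 to 2F downmix and should be checked against the specification. The function name and the `lfe_gain` parameter are assumptions: the specification does not define an LFE coefficient, so the sketch defaults to dropping the LFE channel. Clipping and normalization are not handled.

```python
import math

# Weight applied to the center and surround channels when folding 3F2 into
# stereo (1/sqrt(2), about 0.7071).
A = 1 / math.sqrt(2)


def downmix_3f2_lfe_to_stereo(frame, lfe_gain=0.0):
    """Downmix one frame in 3F2-LFE order (L, R, C, LFE, LS, RS) to (L, R).

    lfe_gain is an assumed extension of the matrix; 0.0 discards the LFE.
    """
    l, r, c, lfe, ls, rs = frame
    left = l + A * c + A * ls + lfe_gain * lfe
    right = r + A * c + A * rs + lfe_gain * lfe
    return (left, right)


# A signal present only in the center channel ends up equally in both outputs.
center_only = downmix_3f2_lfe_to_stereo((0.0, 0.0, 1.0, 0.0, 0.0, 0.0))
```

Each output row is one row of the downmix matrix; adding the LFE term only appends one more column to that matrix, which is the extension described above.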
Upmix
When there are fewer input channels than output channels, we need to expand the audio input data.
Bypass
When the input and output have the same number of channels but different layouts, we need to bypass the audio input data.
Mixing Policy
There are three mechanisms for mixing:
- Specific conversion
- Mapping data by channel name
- Bypassing data by channel index(fallback plan)
Each time we upmix or downmix, we try the mechanisms in the order above. We try specific conversion first; if it applies, the job is done. Otherwise, we try mapping the data by channel name. If that still doesn't work, we bypass the data by channel index. The last mechanism is our fallback plan and should always work.
Specific conversion
Some conversions have their own definitions, so we need to implement them. For example, Table 2 in ITU-R BS.775-3 defines the downmix equations from 3F2 to 1F, 2F, 3F, 2F1, 3F1 and 2F2.
Mapping data by channel name
In most cases, the input and output data can be mapped by their layout settings. For example, to downmix from 3F (L, R, C) to stereo (L, R), we only need to pass the first two input channels to the output.
Bypassing data by channel index(fallback plan)
Some cases are not covered by the mechanisms above. The downmix from stereo (L, R) to mono (M) is an example: there is no spec and no matching channel for this conversion. In particular, WASAPI can support some mismatched speaker settings, such as 6 channels with a stereo layout (stereo should have only 2 channels). We need a fallback plan for such cases.
The simplest plan is to convert the data by channel index. If the input has 2 channels and the output has 1 channel, we pass the first channel to the output and ignore the other.
An alternative is to define matrices to compress or expand the audio data. However, the number of combinations is too large, and it is nearly impossible to define matrices for all of them. A feasible approach is to define matrices for a subset of common cases, such as 3 to 8 channels down to 2 channels.
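The three mechanisms and their ordering can be sketched in Python as follows. This is a simplified illustration under stated assumptions, not cubeb's implementation: all function names are hypothetical, layouts are plain lists of channel codes, the only specific conversion included is 3F2 to stereo, name mapping is assumed to apply only when one layout's channel set contains the other's, and missing output channels are filled with silence.

```python
def specific_conversion(frame, in_layout, out_layout):
    """Mechanism 1: conversions with their own definition (only 3F2 -> stereo here)."""
    if in_layout == ["L", "R", "C", "LS", "RS"] and out_layout == ["L", "R"]:
        l, r, c, ls, rs = frame
        a = 0.7071
        return [l + a * c + a * ls, r + a * c + a * rs]
    return None  # No defined conversion for this pair.


def map_by_name(frame, in_layout, out_layout):
    """Mechanism 2: copy each output channel from the same-named input channel."""
    ins, outs = set(in_layout), set(out_layout)
    if not (outs <= ins or ins <= outs):
        return None  # Assumed rule: no clean name mapping, e.g. (L, R) -> (M).
    by_name = dict(zip(in_layout, frame))
    return [by_name.get(name, 0.0) for name in out_layout]


def bypass_by_index(frame, out_channels):
    """Mechanism 3 (fallback): copy channel i to channel i, drop or zero-fill the rest."""
    out = list(frame[:out_channels])
    return out + [0.0] * (out_channels - len(out))


def mix(frame, in_layout, out_layout):
    """Try the mechanisms in order; the index fallback always produces a result."""
    # A frame whose size disagrees with its layout (e.g. 6 channels reported
    # as stereo) cannot be interpreted by name, so it goes to the fallback.
    if len(frame) == len(in_layout):
        for mechanism in (specific_conversion, map_by_name):
            out = mechanism(frame, in_layout, out_layout)
            if out is not None:
                return out
    return bypass_by_index(frame, len(out_layout))


three_front = mix([0.5, 0.25, 0.75], ["L", "R", "C"], ["L", "R"])  # by name
stereo_to_mono = mix([0.5, 0.25], ["L", "R"], ["M"])  # by index
```

Because the last mechanism never fails, `mix` always returns a frame with the requested number of output channels, even for mismatched inputs such as a 6-channel frame labeled with a stereo layout.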
Testing
If we could fake a virtual audio device on each platform, we could intercept the output data through the faked device to verify it, which would make it possible to test all layouts. MSDN has some articles about Virtual Audio Devices, and Microsoft/Windows-driver-samples might serve as a reference. The testing discussion can be found here
The advantages of faking layouts:
- Check whether the sound is really playing on the right channels
- Check that the mixed audio is correct
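As a rough idea of what the first check could look like once output data can be captured, the Python sketch below measures the level of each channel in an interleaved buffer and reports which channels carry signal. The capture itself is simulated here; how a faked device would actually deliver the buffer is not covered, and the function names and the threshold value are assumptions.

```python
import math


def channel_rms(buffer, channels):
    """Return the RMS level of each channel in an interleaved buffer."""
    frames = len(buffer) // channels
    return [
        math.sqrt(sum(s * s for s in buffer[i::channels]) / frames)
        for i in range(channels)
    ]


def active_channels(buffer, channels, threshold=1e-3):
    """Return the indices of channels whose RMS level exceeds the threshold."""
    levels = channel_rms(buffer, channels)
    return [i for i, level in enumerate(levels) if level > threshold]


# Simulated capture: 480 frames of 6-channel audio at 48 kHz with a 440 Hz
# tone on channel index 2 (C in the 3F2-LFE order) and silence elsewhere.
captured = []
for n in range(480):
    sample = math.sin(2 * math.pi * 440 * n / 48000)
    captured.extend([0.0, 0.0, sample, 0.0, 0.0, 0.0])

playing_on = active_channels(captured, 6)
```

A test built this way would play a tone on one known channel, capture the device output, and assert that only the expected channel index is active.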