User:Roc/AudioBufferProposal

From MozillaWiki

Overview

This proposal aims to solve the data-race problem with AudioBuffer's mutable Float32Arrays, in a way that provides a high degree of compatibility with existing API usage, but avoids requiring memory copies in almost all cases (even for code using existing APIs).

Specification Changes

Text in italics is non-normative.

AudioBuffer is extended with one new method:

partial interface AudioBuffer {
  void copyChannelDataTo(long channelNumber, unsigned long start, unsigned long length, Float32Array destination);
};

The copyChannelDataTo method copies a range of samples from the specified channel of the AudioBuffer to the destination array. If channelNumber is greater than or equal to the AudioBuffer's number of channels, start plus length is greater than the AudioBuffer's length, or length is greater than the destination array's length, an INDEX_SIZE_ERR exception must be thrown.

Note: This method can be used to fill part of an array by passing in a Float32Array that's a view onto the larger array.

Note: When reading data from an AudioBuffer's channels, and the data can be processed in chunks, copyChannelDataTo should be preferred to calling getChannelData and accessing the resulting array, because it may avoid unnecessary memory allocation and copying.
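
For illustration, here is a minimal stand-in model of the proposed copyChannelDataTo semantics in plain JavaScript. FakeAudioBuffer is a hypothetical class, not part of any API; INDEX_SIZE_ERR is modeled as a RangeError. It also demonstrates the subarray-view trick from the first note.

```javascript
// Hypothetical stand-in for an AudioBuffer, to illustrate the proposed
// copyChannelDataTo semantics. Not a real Web Audio interface.
class FakeAudioBuffer {
  constructor(channels) {
    this._channels = channels;        // one Float32Array per channel
    this.length = channels[0].length; // frames per channel
  }
  copyChannelDataTo(channelNumber, start, length, destination) {
    const source = this._channels[channelNumber];
    // The proposed bounds checks: an out-of-range read or an undersized
    // destination throws (INDEX_SIZE_ERR in the spec, RangeError here).
    if (start + length > this.length || length > destination.length) {
      throw new RangeError("INDEX_SIZE_ERR");
    }
    destination.set(source.subarray(start, start + length));
  }
}

// Filling the middle of a larger array by passing a view onto it:
const buf = new FakeAudioBuffer([Float32Array.from([1, 2, 3, 4])]);
const big = new Float32Array(8);
buf.copyChannelDataTo(0, 1, 2, big.subarray(3, 5)); // copies samples 2, 3
```

Because the destination is just a Float32Array, no new allocation is needed on the read path; the caller can reuse one scratch array across chunks.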

An internal operation "acquire the contents of an AudioBuffer" is invoked when the contents of an AudioBuffer are needed by some API implementation. This operation returns immutable channel data to the invoker. When an "acquire the contents" operation occurs, run the following steps:

  1. If any of the AudioBuffer's ArrayBuffers have been neutered, abort these steps and return zero-length channel data buffers to the invoker.
  2. Neuter all ArrayBuffers for arrays previously returned by getChannelData on this AudioBuffer.
  3. Retain the underlying data buffers from those ArrayBuffers and return references to them to the invoker.
  4. Attach ArrayBuffers containing copies of the data to the AudioBuffer, to be returned by the next call to getChannelData.

Note: These steps describe only the observable behavior. The entire operation can usually be implemented without copying channel data. In particular, the last step should be performed lazily at the next getChannelData call (if there is one; there often won't be). That means a sequence of consecutive "acquire the contents" operations with no intervening getChannelData (e.g. multiple AudioBufferSourceNodes playing the same AudioBuffer) can be implemented with no allocations or copying.

Note: Implementations can perform an additional optimization: if getChannelData is called on an AudioBuffer for which fresh ArrayBuffers have not yet been allocated, and all invokers of previous "acquire the contents" operations on that AudioBuffer have stopped using its data, the raw data buffers can be recycled for the new ArrayBuffers, avoiding any reallocation or copying of the channel data.
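
The observable steps above can be modeled in a few lines of plain JavaScript. ObservableModel is an illustrative name, and real ArrayBuffer neutering is not simulated; the model performs step 4 eagerly, whereas (per the note above, and the Implementation Sketch section) a real implementation would defer the copy until getChannelData is actually called.

```javascript
// Illustrative model of the observable "acquire the contents" behavior.
class ObservableModel {
  constructor(channelData) {
    this._data = channelData; // one Float32Array per channel
  }
  getChannelData(i) {
    return this._data[i];
  }
  acquireContents() {
    const acquired = this._data;               // steps 2-3: take the buffers
    this._data = acquired.map(a => a.slice()); // step 4: attach fresh copies
    // Step 2's neutering means previously returned arrays must no longer
    // be readable or writable; that is not enforceable in this model.
    return acquired;                           // immutable data for the invoker
  }
}
```

The key observable property: the invoker receives the original data, while any later getChannelData call sees a fresh (equal-valued) array, so page script can no longer race with audio processing.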

The "acquire the contents of an AudioBuffer" operation is invoked in the following cases:

  • When AudioBufferSourceNode.start is called, it acquires the contents of the node's buffer. If the operation fails, nothing is played.
  • When a ConvolverNode's buffer is set to an AudioBuffer while the node is connected to an output node, or a ConvolverNode is connected to an output node while its buffer is set to an AudioBuffer, it acquires the contents of the AudioBuffer.
  • When dispatch of an AudioProcessingEvent completes, it acquires the contents of its outputBuffer.

Note: For AudioBufferSourceNode and ConvolverNode, a node "acquires the contents of an AudioBuffer" at exactly the moment Jer's proposal would make the buffer immutable by virtue of being associated with that node.

Implementation Sketch

For clarity, here's one way to implement this efficiently. This section is non-normative and would not need to be included in the spec.

Each AudioBuffer is in one of two states: arrays-neutered and arrays-not-neutered. Normally it starts in the arrays-not-neutered state. When in the arrays-neutered state, its ArrayBuffers are neutered and the AudioBuffer holds a reference to its channel data buffers in an ImmutableBufferSet, which is thread-safe and can be shared with the parts of the Web Audio implementation that consume AudioBuffer data. When in the arrays-not-neutered state, its ArrayBuffers hold the channel data.

In an "acquire the contents" operation, do the following steps:

  1. If in the arrays-neutered state, return a reference to the ImmutableBufferSet and abort these steps.
  2. If any of the AudioBuffer's ArrayBuffers have been neutered, abort these steps and return zero-length channel data buffers to the invoker.
  3. Neuter the AudioBuffer's ArrayBuffers.
  4. Retain their underlying data buffers and package them into a new ImmutableBufferSet (by reference).
  5. Change to the arrays-neutered state.
  6. Return a reference to the ImmutableBufferSet.

In getChannelData, do the following steps:

  1. If in the arrays-neutered state and the only reference to the ImmutableBufferSet is the AudioBuffer's own reference, create new ArrayBuffers adopting the data from the ImmutableBufferSet and enter the arrays-not-neutered state.
  2. If still in the arrays-neutered state, copy data from the ImmutableBufferSet to form the contents of new ArrayBuffers and enter the arrays-not-neutered state.
  3. Proceed as normal.

In copyChannelDataTo, do these steps:

  1. If in the arrays-neutered state, copy data from the ImmutableBufferSet to the destination array.
  2. If in the arrays-not-neutered state, copy data from the appropriate ArrayBuffer to the destination array.
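
The whole two-state sketch can be expressed as a plain JavaScript model. SketchAudioBuffer and the refCount field are illustrative; a real implementation would neuter actual ArrayBuffers, share the ImmutableBufferSet across threads, and have invokers release their references when done (releasing is elided here).

```javascript
// Channel data shared by reference; treated as immutable once acquired.
class ImmutableBufferSet {
  constructor(channels) {
    this.channels = channels; // one Float32Array per channel
    this.refCount = 1;        // the AudioBuffer's own reference
  }
}

class SketchAudioBuffer {
  constructor(channels) {
    this._channels = channels; // valid in the arrays-not-neutered state
    this._immutable = null;    // valid in the arrays-neutered state
  }
  get neutered() { return this._immutable !== null; }

  // "Acquire the contents": hand out a shared reference; no copying.
  acquireContents() {
    if (!this.neutered) {
      this._immutable = new ImmutableBufferSet(this._channels); // steps 3-5
      this._channels = null; // step 3: "neuter" our arrays
    }
    this._immutable.refCount++; // the invoker's reference
    return this._immutable;     // steps 1 and 6
  }

  getChannelData(i) {
    if (this.neutered) {
      if (this._immutable.refCount === 1) {
        // Step 1: nobody else holds the data, so adopt it with no copy.
        this._channels = this._immutable.channels;
      } else {
        // Step 2: still shared, so copy into fresh arrays.
        this._channels = this._immutable.channels.map(c => c.slice());
      }
      this._immutable = null; // back to arrays-not-neutered
    }
    return this._channels[i]; // step 3: proceed as normal
  }

  copyChannelDataTo(i, start, length, destination) {
    // Readable in either state; bounds checks elided for brevity.
    const src = this.neutered ? this._immutable.channels[i] : this._channels[i];
    destination.set(src.subarray(start, start + length));
  }
}
```

Note how consecutive acquireContents calls with no intervening getChannelData return the same ImmutableBufferSet and copy nothing, matching the lazy-copy note in the Specification Changes section.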

Advocacy

A good implementation following the advice above will not allocate or copy buffers of channel data any more than an implementation of the "freely share memory" proposal, except when an application calls getChannelData on an AudioBuffer that is "in use" ("associated with a live AudioNode", in Jer's proposal). If such an application writes to the returned array, that is deprecated behavior under any proposal, but this proposal defines more predictable results than the "freely share memory" proposal. If such an application only reads the returned array, it can probably be modified to use copyChannelDataTo instead, which will reduce the memory overhead to a negligible level.

This proposal relies heavily on ArrayBuffer neutering. Some people want to avoid the use of neutering, but the TAG declined to endorse that position when asked.

WebKit and Blink developers have indicated that they'll keep around the webkit-prefixed AudioContext API for a long time. This proposal provides substantial compatibility with that API, which has value for authors and browser implementers, especially given the amount of content already written to that API.

I argue that this proposal is no more complicated than other proposals. Compared to Jer's proposal, this proposal has less API surface, and less complexity for Web developers who don't need to read or write channel data after an AudioBuffer has been used, or who are not very concerned about performance (since they don't have to think about node liveness). For developers who are very concerned about performance, and who want to modify channel data between uses of an AudioBuffer, the proposals are very similar: they'll have to think about node liveness. (Jer's proposal throws an exception when writing data to an in-use AudioBuffer, which is informative for Web authors who want to avoid copies; with my proposal we can provide a similar alert through Web developer tools.) My proposal may or may not be more complicated for implementers: Jer's proposal requires implementations to track precisely when an AudioBuffer is associated with a live AudioNode, whereas this proposal does not.

Compared to the "freely share memory" proposal, assuming that proposal is fully fleshed out to define what authors and implementations are allowed to do, this proposal is almost the same for Web developers. The only extra complexity for Web developers is that they should call copyChannelDataTo to read channel contents instead of reading from getChannelData arrays, if an AudioBuffer could be in use. (Web developers writing to getChannelData arrays should ensure that the AudioBuffer is not in use, under both proposals.) For implementers, the "freely share memory" proposal is probably less complex overall, although the "ArrayBuffer is neutered while we're playing it" problem requires a tricky solution under that proposal, whereas this proposal solves it easily. And of course, implementers who don't wish to rely on undefined C++ behavior, or who wish to target non-shared-memory hardware, will have a hard time with the "freely share memory" proposal, or be forced to make copies at inopportune times.