Audio Data API

* Al MacDonald ([http://twitter.com/f1lt3r @F1LT3R])
* Yury Delendik
* Ricard Marxer ([http://twitter.com/ricardmp @ricardmp])
===== Other Contributors =====
* Ted Mielczarek
* Felipe Gomes
===== Status =====
'''This is a work in progress.''' This document reflects the current thinking of its authors, and is not an official specification. The original goal of this specification was to experiment with web audio data on the way to creating a more stable recommendation. The authors hoped that this work, and the ideas it generated, would eventually find their way into Mozilla and other HTML5 compatible browsers. Both of these goals are within reach now, with work ramping up in [https://bugzilla.mozilla.org/show_bug.cgi?id=490705 this Mozilla bug], and the announcement of an official [http://www.w3.org/2005/Incubator/audio/ W3C Audio Incubator Group] chaired by one of the authors.

The continuing work on this specification and API can be tracked here, and in [https://bugzilla.mozilla.org/show_bug.cgi?id=490705 bug 490705]. Comments, feedback, and collaboration are all welcome. You can reach the authors on irc in the [irc://irc.mozilla.org/audio #audio channel] on irc.mozilla.org.

===== Version =====
This is the second major version of this API (referred to by the developers as audio13)--the previous version is available here. The primary improvements and changes are:

* Removal of '''mozSpectrum''' (i.e., native FFT calculation) -- will be done in JS now.
* Added WebGL Arrays (i.e., fast, typed, native float arrays) for the event framebuffer as well as '''mozWriteAudio()'''.
* Native array interfaces instead of using accessors and IDL array arguments.
* No zero padding of audio data occurs anymore. All frames are exactly 4096 elements in length.
* Added '''mozCurrentSampleOffset()'''.
* Removed undocumented position/buffer methods on the audio element.
* Added '''mozChannels''', '''mozRate''', and '''mozFrameBufferLength''' to the '''loadedmetadata''' event.

Demos written for the previous version are '''not''' compatible, though can be made to be quite easily. See details below.
== API Tutorial ==
We have developed a proof of concept, experimental build of Firefox (see [[#Obtaining_Code_and_Builds|builds provided below]]) which extends the HTMLMediaElement (e.g., affecting <video> and <audio>) and implements the following basic API for reading and writing raw audio data:
===== Reading Audio =====
Audio data is made available via an event-based API. As the audio is played, and therefore decoded, each frame is passed to content scripts for processing after being written to the audio layer--hence the name, '''AudioWritten'''. Playing and pausing the audio both affect the streaming of this raw audio data as well.
Consumers of this raw audio data register two callbacks on the &lt;audio&gt; or &lt;video&gt; element in order to consume this data:
<pre>
<audio src="song.ogg" onloadedmetadata="audioInfo(event);" onaudiowritten="audioWritten(event);"></audio>
</pre>
The '''LoadedMetadata''' event is a standard part of HTML5, and has been extended to provide more detailed information about the audio stream. Specifically, developers can obtain the number of channels and the sample rate (samples per second) of the audio. This event is fired once as the media resource is first loaded, and is useful for interpreting or writing the audio data. The '''AudioWritten''' event provides two pieces of data. The first is a framebuffer (i.e., an array) containing sample data for the current frame. The second is the time (e.g., milliseconds) for the start of this frame. The following is an example of how both events might be used:
<pre>
var channels, rate, frameBufferLength, samples;
function audioInfo(event) {
channels = event.mozChannels;
rate = event.mozRate;
frameBufferLength = event.mozFrameBufferLength;
}

function audioWritten(event) {
var samples = event.mozFrameBuffer;
var time = event.mozTime;

for (var i = 0, slen = samples.length; i < slen; i++) {
// Do something with the audio data as it is played.
processSample(samples[i], channels, rate);
}
}
</pre>
===== Complete Example: Visualizing Audio Spectrum =====
This example calculates and displays FFT spectrum data for the playing audio:
[[File:fft.png]]
<pre>
<!DOCTYPE html>
<html>
<head>
<title>JavaScript Spectrum Example</title>
</head>
<body>
<audio src="song.ogg"
controls="true"
onloadedmetadata="loadedMetadata(event);"
onaudiowritten="audioWritten(event);"
style="width: 512px;">
</audio>
<div><canvas id="fft" width="512" height="200"></canvas></div>
<script>
var canvas = document.getElementById('fft'),
ctx = canvas.getContext('2d'),
fft, channels;

function loadedMetadata(event) {
channels = event.mozChannels;
var rate = event.mozRate,
frameBufferLength = event.mozFrameBufferLength;

fft = new FFT(frameBufferLength / channels, rate);
}
function audioWritten(event) {
var fb = event.mozFrameBuffer,
signal = new Float32Array(fb.length / channels),
magnitude;

for (var i = 0, fbl = fb.length / 2; i < fbl; i++ ) {
// Assuming interlaced stereo channels,
// need to split and merge into a stereo-mix mono signal
signal[i] = (fb[2*i] + fb[2*i+1]) / 2;
}

fft.forward(signal);
// Clear the canvas before drawing spectrum
ctx.clearRect(0,0, canvas.width, canvas.height);
for (var i = 0; i < fft.spectrum.length; i++ ) {
// multiply spectrum by a zoom value
magnitude = fft.spectrum[i] * 4000;
// Draw rectangle bars for each frequency bin
ctx.fillRect(i * 4, canvas.height, 3, -magnitude);
}
}
 
// FFT from dsp.js, see below
var FFT = function(bufferSize, sampleRate) {
this.bufferSize = bufferSize;
this.sampleRate = sampleRate;
this.spectrum = new Float32Array(bufferSize/2);
this.real = new Float32Array(bufferSize);
this.imag = new Float32Array(bufferSize);
this.reverseTable = new Uint32Array(bufferSize);
this.sinTable = new Float32Array(bufferSize);
this.cosTable = new Float32Array(bufferSize);
 
var limit = 1,
bit = bufferSize >> 1;
 
while ( limit < bufferSize ) {
for ( var i = 0; i < limit; i++ ) {
this.reverseTable[i + limit] = this.reverseTable[i] + bit;
}
 
limit = limit << 1;
bit = bit >> 1;
}
 
for ( var i = 0; i < bufferSize; i++ ) {
this.sinTable[i] = Math.sin(-Math.PI/i);
this.cosTable[i] = Math.cos(-Math.PI/i);
}
};
 
FFT.prototype.forward = function(buffer) {
var bufferSize = this.bufferSize,
cosTable = this.cosTable,
sinTable = this.sinTable,
reverseTable = this.reverseTable,
real = this.real,
imag = this.imag,
spectrum = this.spectrum;
 
if ( bufferSize !== buffer.length ) {
throw "Supplied buffer is not the same size as defined FFT. FFT Size: " +
bufferSize + " Buffer Size: " + buffer.length;
}
 
for ( var i = 0; i < bufferSize; i++ ) {
real[i] = buffer[reverseTable[i]];
imag[i] = 0;
}
 
var halfSize = 1,
phaseShiftStepReal,
phaseShiftStepImag,
currentPhaseShiftReal,
currentPhaseShiftImag,
off,
tr,
ti,
tmpReal,
i;
 
while ( halfSize < bufferSize ) {
phaseShiftStepReal = cosTable[halfSize];
phaseShiftStepImag = sinTable[halfSize];
currentPhaseShiftReal = 1.0;
currentPhaseShiftImag = 0.0;
 
for ( var fftStep = 0; fftStep < halfSize; fftStep++ ) {
i = fftStep;
 
while ( i < bufferSize ) {
off = i + halfSize;
tr = (currentPhaseShiftReal * real[off]) - (currentPhaseShiftImag * imag[off]);
ti = (currentPhaseShiftReal * imag[off]) + (currentPhaseShiftImag * real[off]);
 
real[off] = real[i] - tr;
imag[off] = imag[i] - ti;
real[i] += tr;
imag[i] += ti;
 
i += halfSize << 1;
}
 
tmpReal = currentPhaseShiftReal;
currentPhaseShiftReal = (tmpReal * phaseShiftStepReal) - (currentPhaseShiftImag * phaseShiftStepImag);
currentPhaseShiftImag = (tmpReal * phaseShiftStepImag) + (currentPhaseShiftImag * phaseShiftStepReal);
}
 
halfSize = halfSize << 1;
}
 
i = bufferSize/2;
while(i--) {
spectrum[i] = 2 * Math.sqrt(real[i] * real[i] + imag[i] * imag[i]) / bufferSize;
}
};
</script>
</body>
</html>
</pre>
===== Writing Audio =====
It is also possible to set up an audio element for raw writing from script (i.e., without a ''src'' attribute). Content scripts can specify the audio stream's characteristics, then write audio frames using the following methods:
<code>mozSetup(channels, sampleRate, volume)</code>
<pre>
// Create a new audio element
var audioOutput = new Audio();
// Set up audio element with 2 channel, 44.1KHz audio stream, volume set to full.
audioOutput.mozSetup(2, 44100, 1);
</pre>
<code>mozWriteAudio(buffer)</code>
<pre>
// Write samples using a JS Array
var samples = [0.242, 0.127, 0.0, -0.058, -0.242, ...];
audioOutput.mozWriteAudio(samples);

// Write samples using a Typed Array
var samples = new Float32Array([0.242, 0.127, 0.0, -0.058, -0.242, ...]);
audioOutput.mozWriteAudio(samples);
</pre>
<code>mozCurrentSampleOffset()</code>
<pre>
// Get current position of the underlying audio stream, measured in samples written.
var currentSampleOffset = audioOutput.mozCurrentSampleOffset();
</pre>
Since the '''AudioWritten''' event and the '''mozWriteAudio()''' method both use '''Float32Array''', it is possible to take the output of one audio stream and pass it directly (or process first and then pass) to a second:
<pre>
<audio id="a1"
       src="song.ogg"
       onloadedmetadata="loadedMetadata(event);"
       onaudiowritten="audioWritten(event);"
       controls="controls">
</audio>
<script>
var a1 = document.getElementById('a1'),
a2 = new Audio();

function loadedMetadata(event) {
// Mute a1 audio.
a1.volume = 0;
// Setup a2 to be identical to a1, and play through there.
a2.mozSetup(event.mozChannels, event.mozRate, 1);
}
function audioWritten(event) {
// Write the current frame to a2
a2.mozWriteAudio(event.mozFrameBuffer);
}
</script>
</pre>
 
Audio data written using the '''mozWriteAudio()''' method needs to be written at a regular interval in equal portions, in order to keep a little ahead of the current sample offset (the current sample offset of the hardware can be obtained with '''mozCurrentSampleOffset()'''), where "a little" means something on the order of 500ms of samples. For example, if working with 2 channels at 44100 samples per second, with a writing interval of 100ms and a pre-buffer of 500ms, one would write an array of (2 * 44100 / 10) = 8820 samples per interval, keeping the total number of samples written below (currentSampleOffset + 2 * 44100 / 2).
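The following is a minimal sketch of that write-ahead pattern (the variable names, the interval, and the silent buffer are illustrative only; '''audioOutput''' is assumed to have been set up with '''mozSetup(2, 44100, 1)''' as in the earlier example). The complete tone generator below shows a full working version:
<pre>
// Illustrative sketch only: keep roughly 500ms of samples buffered ahead of the hardware.
var channels = 2,
rate = 44100,
portionSize = channels * rate / 10,  // 100ms of samples = 8820
prebufferSize = channels * rate / 2, // 500ms of samples
samplesWritten = 0;

setInterval(function() {
// Write another portion only while we are less than ~500ms ahead of the playback position.
while (audioOutput.mozCurrentSampleOffset() + prebufferSize >= samplesWritten) {
audioOutput.mozWriteAudio(new Float32Array(portionSize)); // silence, for illustration
samplesWritten += portionSize;
}
}, 100);
</pre>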
===== Complete Example: Creating a Web Based Tone Generator =====
<pre>
<body>
<input type="text" size="4" id="freq" value="440"><label for="hz">Hz</label>
<button onclick="generateWaveform()">set</button>
<button onclick="start()">play</button>
<button onclick="stop()">stop</button>
<script type="text/javascript">
var sampleRate = 44100,
portionSize = sampleRate / 10,
prebufferSize = sampleRate / 2,
freq = undefined; // no sound
var audio = new Audio();
audio.mozSetup(1, sampleRate, 1);
var currentWritePosition = 0;
function getSoundData(t, size) {
var soundData = new Float32Array(size);
if (freq) {
var k = 2 * Math.PI * freq / sampleRate;
for (var i = 0; i < size; i++) {
soundData[i] = Math.sin(k * (i + t));
}
}
return soundData;
}
function writeData() {
while (audio.mozCurrentSampleOffset() + prebufferSize >= currentWritePosition) {
var soundData = getSoundData(currentWritePosition, portionSize);
audio.mozWriteAudio(soundData);
currentWritePosition += portionSize;
}
}
// initial write
writeData();
var writeInterval = Math.floor(1000 * portionSize / sampleRate);
setInterval(writeData, writeInterval);

function start() {
freq = parseFloat(document.getElementById("freq").value);
}
function stop() {
freq = undefined;
}
</script>
</body>
</pre>
== DOM Implementation ==
===== nsIDOMNotifyAudioMetadataEvent =====
Audio metadata is provided via custom properties of the media element's '''loadedmetadata''' event. This event occurs once when the browser first acquires information about the media resource. The event details are as follows:

* '''Event''': LoadedMetadata
* '''Event handler''': onloadedmetadata

The '''LoadedMetadataEvent''' is defined as follows:
<pre>
interface nsIDOMNotifyAudioMetadataEvent : nsIDOMEvent
{
  readonly attribute unsigned long mozChannels;
  readonly attribute unsigned long mozRate;
  readonly attribute unsigned long mozFrameBufferLength;
};
</pre>
The '''mozChannels''' attribute contains the number of channels in this audio resource (e.g., 2). The '''mozRate''' attribute contains the number of samples per second that will be played, for example 44100. The '''mozFrameBufferLength''' attribute contains the number of samples that will be returned in each '''AudioWritten''' event. This number is a total for all channels (e.g., 2 channels * 2048 samples = 4096 total).
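As an illustration (not part of the API itself), the per-channel frame length and the duration of one frame can be derived from these three values; the helper name below is hypothetical:
<pre>
function describeStream(event) {
// Hypothetical helper: derive per-channel figures from the loadedmetadata values.
var samplesPerChannel = event.mozFrameBufferLength / event.mozChannels; // e.g., 4096 / 2 = 2048
var frameDuration = samplesPerChannel / event.mozRate; // e.g., 2048 / 44100 ~= 0.046 seconds
return { samplesPerChannel: samplesPerChannel, frameDuration: frameDuration };
}
</pre>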
===== nsIDOMNotifyAudioWrittenEvent =====
Audio data is made available via the '''AudioWritten''' event, which is defined as follows:
<pre>
interface nsIDOMNotifyAudioWrittenEvent : nsIDOMEvent
{
  // mozFrameBuffer is really a Float32Array, via dom_quickstubs
  readonly attribute nsIVariant    mozFrameBuffer;
  readonly attribute unsigned long mozTime;
};
</pre>
The '''mozFrameBuffer''' attribute contains a typed array ('''Float32Array''') with the raw audio data (float values) obtained from decoding a single frame of audio. This is of the form <nowiki>[left, right, left, right, ...]</nowiki>. All audio frames are normalized to a length of '''4096'''. ''Note:'' this size may change in future versions of this API in order to more properly deal with sample rate and channel variations.
The '''mozTime''' attribute contains an unsigned integer representing the time in milliseconds since the start.
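Since the framebuffer is interlaced as described above, a consumer that needs separate channels has to de-interleave it. A minimal sketch for the stereo case (the function name is illustrative):
<pre>
// Split an interlaced [left, right, left, right, ...] framebuffer into two channel arrays.
function splitStereo(frameBuffer) {
var samplesPerChannel = frameBuffer.length / 2,
left = new Float32Array(samplesPerChannel),
right = new Float32Array(samplesPerChannel);
for (var i = 0; i < samplesPerChannel; i++) {
left[i] = frameBuffer[2*i];
right[i] = frameBuffer[2*i + 1];
}
return [left, right];
}
</pre>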
===== nsIDOMHTMLAudioElement additions =====
Audio write access is achieved by adding three new methods to the HTML media element:
<pre>
void mozSetup(in long channels, in long rate, in float volume);
void mozWriteAudio(array); // array is Array() or Float32Array()
unsigned long long mozCurrentSampleOffset();
</pre>
The '''mozSetup()''' method allows an &lt;audio&gt; or &lt;video&gt; element to be set up for writing from script. This method '''must''' be called before '''mozWriteAudio()''' can be called, since an audio stream has to be created for the media element. It takes three arguments:
# '''channels''' - the number of audio channels (e.g., 2)
# '''rate''' - the number of samples per second in the audio stream (e.g., 44100)
# '''volume''' - the initial volume to use (e.g., 1.0)
The choices made for '''channels''' and '''rate''' are significant, because they determine the frame size you must use when passing data to '''mozWriteAudio()'''. That is, you must either pass an array with 0 elements--similar to flushing the audio stream--or enough data for each channel specified in '''mozSetup()'''.
The '''mozSetup()''' method, if called more than once, will recreate a new audio stream (destroying an existing one if present) with each call. Thus it is safe to call this more than once, but unnecessary.
The '''mozWriteAudio()''' method can be called after '''mozSetup()'''. It allows audio data to be written directly from script. It takes one argument:
# '''array''' - this is a JS Array (i.e., new Array()) or a typed float array (i.e., new Float32Array()) containing the audio data (floats) you wish to write. It must be 0 or N (where N % channels == 0) elements in length, otherwise a DOM error occurs.

The '''mozCurrentSampleOffset()''' method can be called after '''mozSetup()'''. It returns the current position (measured in samples) of the audio stream. This is useful when determining how much data to write with '''mozWriteAudio()'''.

All of '''mozWriteAudio()''', '''mozCurrentSampleOffset()''', and '''mozSetup()''' will throw exceptions if called out of order, or if audio frame sizes do not match.
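To make the length rule concrete, here is a small illustrative sketch, assuming a stream created earlier with '''mozSetup(2, 44100, 1)''' as in the writing example above:
<pre>
// audioOutput was set up with 2 channels, so array lengths must satisfy N % 2 == 0.
audioOutput.mozWriteAudio([]);                     // OK: 0 elements, similar to flushing the stream
audioOutput.mozWriteAudio([0.1, -0.1, 0.2, -0.2]); // OK: 4 elements, 4 % 2 == 0
// audioOutput.mozWriteAudio([0.1, -0.1, 0.2]);    // Would fail: 3 % 2 != 0, causing a DOM error
var offset = audioOutput.mozCurrentSampleOffset(); // current playback position, in samples
</pre>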
== Additional Resources ==
* [http://code.bocoup.com/audio-data-api/builds/firefox-3.7a5pre.en-US.linux-i686-audio-data-api-11e.tar.bz2 Linux 32-bit - Build 11e]
* [http://code.bocoup.com/audio-data-api/builds/firefox-3.7a5pre.en-US.linux-i686-audio-data-api-12.tar.bz2 Linux 32-bit - Build 12] (Uses new WebGL Float Arrays. Examples need to be updated.)
 
A version of Firefox combining [https://bugzilla.mozilla.org/show_bug.cgi?id=508906 Multi-Touch screen input from Felipe Gomes] and audio data access from David Humphrey can be downloaded [http://gul.ly/5q here].
A number of working demos have been created, including:
 
'''NOTE:''' ''If you try to run demos created with the original API using a build that implements the new API, you may encounter [https://bugzilla.mozilla.org/show_bug.cgi?id=560212 bug 560212]. We are aware of this, as is Mozilla, and it is being investigated.''
 
==== Demos Working on Current API ====
* FFT visualization (calculated with js)
** http://weare.buildingsky.net/processing/dsp.js/examples/fft.html
 
* Beat Detection (also showing use of WebGL for 3D visualizations)
** http://cubicvr.org/CubicVR.js/bd3/BeatDetektor1HD-13a.html (video [http://vimeo.com/11345262 here])
** http://cubicvr.org/CubicVR.js/bd3/BeatDetektor2HD-13a.html (video of older version [http://vimeo.com/11345685 here])
** http://cubicvr.org/CubicVR.js/bd3/BeatDetektor3HD-13a.html (video [http://www.youtube.com/watch?v=OxoFcyKYwr0&fmt=22 here])
 
* Writing Audio from JavaScript, Digital Signal Processing
** Csound shaker instrument ported to JavaScript via Processing.js http://scotland.proximity.on.ca/dxr/tmp/audio/shaker/
 
==== Demos Needing to be Updated to New API ====
 
** http://weare.buildingsky.net/processing/dft.js/audio.new.html (video [http://vimeo.com/8525101 here])
** http://ondras.zarovi.cz/demos/audio/
* Beat Detection (also showing use of WebGL for 3D visualizations)
** http://cubicvr.org/CubicVR.js/BeatDetektor1HD.html (video [http://vimeo.com/11345262 here])
** http://cubicvr.org/CubicVR.js/BeatDetektor2HD.html (video [http://vimeo.com/11345685 here])
** http://weare.buildingsky.net/processing/beat_detektor/beat_detektor.html
** http://code.bocoup.com/processing-js/3d-fft/viz.xhtml
** JS Multi-Oscillator Synthesizer http://weare.buildingsky.net/processing/dsp.js/examples/synthesizer.html (video [http://vimeo.com/11411533 here])
** JS IIR Filter http://weare.buildingsky.net/processing/dsp.js/examples/filter.html (video [http://vimeo.com/11335434 here])
** Csound shaker instrument ported to JavaScript via Processing.js http://scotland.proximity.on.ca/dxr/tmp/audio/shaker/
** API Example: [http://code.bocoup.com/audio-data-api/examples/inverted-waveform-cancellation Inverted Waveform Cancellation]
** API Example: [http://code.bocoup.com/audio-data-api/examples/stereo-splitting-and-panning Stereo Splitting and Panning]
** Biquad filter http://www.ricardmarxer.com/audioapi/biquad/ (demo by Ricard Marxer)
** Interactive Audio Application, Bloom http://code.bocoup.com/bloop/color/bloop.html (video [http://vimeo.com/11346141 here] and [http://vimeo.com/11345133 here])
 
=== Third Party Discussions ===
 
A number of people have written about our work, including:
 
* http://ajaxian.com/archives/amazing-audio-sampling-in-javascript-with-firefox
* http://createdigitalmusic.com/2010/05/03/real-sound-synthesis-now-an-open-standard-in-the-browser/
* http://www.webmonkey.com/2010/05/new-html5-tools-make-your-browser-sing-and-dance/
* http://www.wired.co.uk/news/archive/2010-05/04/new-html5-tools-give-adobe-flash-the-finger
* http://hacks.mozilla.org/2010/04/beyond-html5-experiments-with-interactive-audio/
* http://schepers.cc/?p=212