== Defining an Enhanced API for Audio (Draft Recommendation) ==

===== Version =====

This is the second major version of this API; the previous version is available here. The primary improvements and changes are:

* Removal of mozSpectrum (i.e., native FFT calculation)
* Use of so-called WebGL arrays (i.e., fast, typed, native float arrays) for the event framebuffer, as well as for mozWriteAudio().
* Native array interfaces instead of using accessors and IDL array arguments.
* No zero padding of audio data occurs anymore. All frames are exactly 4096 elements in length.

Demos written for the previous version are '''not''' compatible, though they can be updated quite easily, as sketched below. See details in the following sections.
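
For example, where code for the previous version read samples through the accessor interface (as <code>spectrum.item(i)</code> does in the spectrum example below), it can now index the typed array directly:

<pre>
// Previous version: accessor-based IDL array interface
var sample = event.mozFrameBuffer.item(i);

// This version: native typed array (Float32Array), indexed directly
var sample = event.mozFrameBuffer[i];
</pre>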

===== Abstract =====

The HTML5 specification introduces the <audio> and <video> media elements, and with them the opportunity to dramatically change the way we integrate media on the web. The current HTML5 media API provides ways to play and get limited information about audio and video, but gives no way to programmatically access or create such media. We present a new extension to this API, which allows web developers to read and write raw audio data.

===== Authors =====

* David Humphrey ([http://twitter.com/humphd @humphd])
* Corban Brook ([http://twitter.com/corban @corban])
* Al MacDonald ([http://twitter.com/f1lt3r @F1LT3R])
* Yury Delendik ([http://twitter.com/notmasteryet @notmasteryet])

===== Other Contributors =====

* Thomas Saunders
* Ted Mielczarek
* Felipe Gomes
* Ricard Marxer ([http://twitter.com/ricardmp @ricardmp])

===== Status =====

'''This is a work in progress.''' This document reflects the current thinking of its authors, and is not an official specification. The original goal of this specification was to experiment with web audio data on the way to creating a more stable recommendation. The authors hoped that this work, and the ideas it generated, would eventually find their way into Mozilla and other HTML5-compatible browsers. Both of these goals are now within reach, with work ramping up in [https://bugzilla.mozilla.org/show_bug.cgi?id=490705 this Mozilla bug], and the announcement of an official [http://www.w3.org/2005/Incubator/audio/ W3C Audio Incubator Group] chaired by one of the authors.

The continuing work on this specification and API can be tracked here, and in [https://bugzilla.mozilla.org/show_bug.cgi?id=490705 the bug]. Comments, feedback, and collaboration are all welcome. You can reach the authors on IRC in the [irc://irc.mozilla.org/audio #audio channel] on irc.mozilla.org.

== API Tutorial ==

We have developed a proof-of-concept, experimental build of Firefox (builds provided below) which extends the HTMLMediaElement (e.g., affecting <video> and <audio>) and implements the following basic API for reading and writing raw audio data:

===== Reading Audio =====

Audio data is made available via an event-based API. As the audio is played, and therefore decoded, each frame is passed to content scripts for processing after being written to the audio layer, hence the name '''AudioWritten'''. Playing and pausing the audio affect the streaming of this raw audio data as well.

Consumers of this raw audio data register a callback on the <audio> or <video> element like so:

<pre>
<audio src="song.ogg" onaudiowritten="audioWritten(event);"></audio>
</pre>

The AudioWritten event provides two pieces of data. The first is a framebuffer (i.e., an array) containing the sample data for the current frame. The second is the time (in milliseconds) for the start of this frame.

<pre>
var samples;

function audioWritten(event) {
  samples = event.mozFrameBuffer;
  var time = event.mozTime;

  for (var i = 0, slen = samples.length; i < slen; i++) {
    processSample(samples[i]);
  }
}
</pre>

===== Complete Example: Visualizing Audio Spectrum =====

This example uses the native FFT data from <code>mozSpectrum</code>, which was part of the previous version of this API and has been removed in this one (see the version notes above), to display the frequency spectrum in a canvas. A script-based replacement is sketched after the listing:

[[File:fft.png]]

<pre>
<!DOCTYPE html>
<html>
  <head>
    <title>JavaScript Spectrum Example</title>
  </head>
  <body>
    <audio src="song.ogg"
           controls="true"
           onaudiowritten="audioWritten(event);"
           style="width: 512px;">
    </audio>

    <div><canvas id="fft" width="512" height="200"></canvas></div>

    <script>
      var spectrum;

      var canvas = document.getElementById('fft');
      var ctx = canvas.getContext('2d');

      function audioWritten(event) {
        spectrum = event.mozSpectrum;

        var specSize = spectrum.length, magnitude;

        // Clear the canvas before drawing spectrum
        ctx.clearRect(0, 0, canvas.width, canvas.height);

        for (var i = 0; i < specSize; i++) {
          magnitude = spectrum.item(i) * 4000; // multiply spectrum by a zoom value

          // Draw rectangle bars for each frequency bin
          ctx.fillRect(i * 4, canvas.height, 3, -magnitude);
        }
      }
    </script>
  </body>
</html>
</pre>
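
Since <code>mozSpectrum</code> has been removed in this version, the spectrum now has to be computed in script from the raw samples in <code>event.mozFrameBuffer</code>. The sketch below shows a deliberately naive way of doing so (<code>computeSpectrum()</code> is our own hypothetical helper, not part of this API); a real application would use an optimized FFT, such as the one in dsp.js linked in the demos section below:

<pre>
// computeSpectrum() is a hypothetical helper, not part of this API.
// It computes numBins magnitude values from raw samples using a naive
// DFT, which is O(N * numBins); real code should use a proper FFT.
function computeSpectrum(samples, numBins) {
  var N = samples.length;
  var spectrum = new Float32Array(numBins);
  for (var k = 0; k < numBins; k++) {
    var re = 0, im = 0;
    for (var n = 0; n < N; n++) {
      var angle = 2 * Math.PI * k * n / N;
      re += samples[n] * Math.cos(angle);
      im -= samples[n] * Math.sin(angle);
    }
    spectrum[k] = Math.sqrt(re * re + im * im) / N;
  }
  return spectrum;
}
</pre>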

===== Writing Audio =====

It is also possible to set up an audio element for raw writing from script (i.e., without a ''src'' attribute). Content scripts can specify the audio stream's characteristics, then write audio frames using the following methods:

<code>mozSetup(channels, sampleRate, volume)</code>

<pre>
var audioOutput = new Audio();
audioOutput.mozSetup(2, 44100, 1);
</pre>

<code>mozWriteAudio(buffer)</code>

<pre>
// Using a JS Array
var samples = [0.242, 0.127, 0.0, -0.058, -0.242, ...];
audioOutput.mozWriteAudio(samples);

// Using a Typed Array
var samples = new Float32Array([0.242, 0.127, 0.0, -0.058, -0.242, ...]);
audioOutput.mozWriteAudio(samples);
</pre>
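
As described in the DOM implementation section below, passing a zero-length array is also legal, and is similar to flushing the audio stream:

<pre>
// A zero-length array is legal, and is similar to flushing the stream.
audioOutput.mozWriteAudio([]);
</pre>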

Since the AudioWritten event and the mozWriteAudio() method both use Float32Array, it is possible to take the output of one audio stream and pass it, directly or after processing, to a second:

<pre>
// Create a new audio stream for writing
var aout = new Audio();
aout.mozSetup(2, 44100, 1);

function audioWritten(event) {
  var samples = event.mozFrameBuffer;

  // Do any filtering, signal processing, etc.
  // (process() stands in for your own per-sample DSP code.)
  for (var i = 0; i < samples.length; i++) {
    samples[i] = process(samples[i]);
  }

  aout.mozWriteAudio(samples);
}
</pre>

===== Complete Example: Creating a Web Based Tone Generator =====

This example creates a simple tone generator, and plays the resulting tone. The writeData() function runs every 10 ms and writes ceil(freq / 100) periods of the waveform, which at 44100 samples per second roughly matches the 441 samples consumed in that interval.

<pre>
<!DOCTYPE html>
<html>
  <head>
    <title>JavaScript Audio Write Example</title>
  </head>
  <body>
    <input type="text" size="4" id="freq" value="440"><label for="freq">Hz</label>
    <button onclick="generateWaveform()">set</button>
    <button onclick="start()">play</button>
    <button onclick="stop()">stop</button>

    <script type="text/javascript">
      var sampledata = [];
      var freq = 440;
      var interval = -1;
      var audio;

      function writeData() {
        // Write enough periods of the waveform to cover the 10 ms
        // between timer callbacks (~441 samples at 44100 Hz).
        var n = Math.ceil(freq / 100);
        for (var i = 0; i < n; i++) {
          audio.mozWriteAudio(sampledata);
        }
      }

      function start() {
        if (interval != -1) {
          return; // already playing
        }
        audio = new Audio();
        audio.mozSetup(1, 44100, 1);
        interval = setInterval(writeData, 10);
      }

      function stop() {
        if (interval != -1) {
          clearInterval(interval);
          interval = -1;
        }
      }

      function generateWaveform() {
        freq = parseFloat(document.getElementById("freq").value);
        // We're playing at 44.1kHz, so figure out how many samples
        // will give us one full period.
        var samples = 44100 / freq;
        sampledata = Array(Math.round(samples));
        for (var i = 0; i < sampledata.length; i++) {
          sampledata[i] = Math.sin(2 * Math.PI * (i / sampledata.length));
        }
      }

      generateWaveform();
    </script>
  </body>
</html>
</pre>

== DOM Implementation ==

===== nsIDOMNotifyAudioWrittenEvent =====

Audio data is made available via the following event:

* '''Event''': AudioWrittenEvent
* '''Event handler''': onaudiowritten

The '''AudioWrittenEvent''' is defined as follows:

<pre>
interface nsIDOMNotifyAudioWrittenEvent : nsIDOMEvent
{
  readonly attribute nsIVariant mozFrameBuffer;
  readonly attribute unsigned long mozTime;
};
</pre>

The '''mozFrameBuffer''' attribute contains a typed array (Float32Array) holding the raw audio data (float values) obtained from decoding a single frame of audio. The data is interleaved, of the form <nowiki>[left, right, left, right, ...]</nowiki>. All audio frames are normalized to a length of 4096.
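
Given that interleaved layout, a stereo frame can be split into per-channel arrays in script; with 4096-element frames this yields 2048 samples per channel. A minimal sketch (<code>splitChannels()</code> is a hypothetical helper, not part of this API):

<pre>
// Split an interleaved stereo framebuffer into left/right channel arrays.
// splitChannels() is a hypothetical helper, not part of this API.
function splitChannels(frameBuffer) {
  var half = frameBuffer.length / 2;
  var left = new Float32Array(half);
  var right = new Float32Array(half);
  for (var i = 0; i < half; i++) {
    left[i] = frameBuffer[2 * i];
    right[i] = frameBuffer[2 * i + 1];
  }
  return { left: left, right: right };
}
</pre>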

The '''mozTime''' attribute contains an unsigned integer representing the time in milliseconds since the start of the audio stream.

===== nsIDOMHTMLMediaElement additions =====

Audio write access is achieved by adding two new methods to the HTML media element:

<pre>
void mozSetup(in long channels, in long rate, in float volume);

void mozWriteAudio(array);
</pre>

The '''mozSetup()''' method allows an &lt;audio&gt; element to be set up for writing from script. This method '''must''' be called before '''mozWriteAudio()''' can be called, since an audio stream has to be created for the media element. It takes three arguments:

# '''channels''' - the number of audio channels (e.g., 2)
# '''rate''' - the audio's sample rate (e.g., 44100 samples per second)
# '''volume''' - the initial volume to use (e.g., 1.0)

The choices made for '''channels''' and '''rate''' are significant, because they determine the frame size you must use when passing data to '''mozWriteAudio()'''. That is, you must either pass an array with 0 elements (similar to flushing the audio stream), or enough data for each channel specified in '''mozSetup()'''.

The '''mozWriteAudio()''' method can be called after '''mozSetup()'''. It allows audio data to be written directly from script. It takes one argument:

# '''array''' - this is a JS Array (e.g., new Array()) or a typed array (e.g., new Float32Array()) containing the audio data (floats) you wish to write. It must be 0 or N (where N % channels == 0) elements in length, otherwise a DOM error occurs.

Both '''mozWriteAudio()''' and '''mozSetup()''' will throw exceptions if called out of order, or if audio frame sizes do not match.
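
As a concrete illustration of these constraints, the following sketch (<code>writeStereo()</code> is a hypothetical helper, not part of this API) interleaves two per-channel arrays so the resulting length always satisfies N % channels == 0 for a stream set up with two channels:

<pre>
// writeStereo() is a hypothetical helper, not part of this API.
// It interleaves two equal-length channel arrays into the
// [left, right, left, right, ...] form described above, so the total
// length is always a multiple of the channel count (2).
function writeStereo(audioOut, left, right) {
  var interleaved = new Float32Array(left.length * 2);
  for (var i = 0; i < left.length; i++) {
    interleaved[2 * i] = left[i];
    interleaved[2 * i + 1] = right[i];
  }
  audioOut.mozWriteAudio(interleaved);
}
</pre>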

== Additional Resources ==

A series of blog posts documents the evolution and implementation of this API: http://vocamus.net/dave/?cat=25. Another overview, by Al MacDonald, is available [http://weblog.bocoup.com/web-audio-all-aboard here].

=== Obtaining Code and Builds ===

A patch is available in the [https://bugzilla.mozilla.org/show_bug.cgi?id=490705 bug], if you would like to experiment with this API. We have also created builds you can download and run locally:

'''NOTE: the API and implementation are changing rapidly. We aren't able to post builds as quickly as we'd like, but will put them here as changes mature.'''

* [http://scotland.proximity.on.ca/dxr/tmp/firefox-3.7a1pre.en-US.mac.dmg Mac OS X 10.6]
* [http://scotland.proximity.on.ca/dxr/tmp/firefox-3.7a1pre.en-US.mac.dmg-10.5-tgz Mac OS X 10.5]
* [http://scotland.proximity.on.ca/dxr/tmp/firefox-3.7a1pre.en-US.win32.zip Windows 32-bit]
* [http://scotland.proximity.on.ca/dxr/tmp/firefox-3.7a1pre.en-US.linux-i686.tar.bz2 Linux 32-bit]
* [http://code.bocoup.com/audio-data-api/builds/firefox-3.7a5pre.en-US.linux-i686-audio-data-api-11e.tar.bz2 Linux 32-bit - Build 11e]
* [http://code.bocoup.com/audio-data-api/builds/firefox-3.7a5pre.en-US.linux-i686-audio-data-api-12.tar.bz2 Linux 32-bit - Build 12] (Uses new WebGL Float Arrays. Examples need to be updated.)

A version of Firefox combining [https://bugzilla.mozilla.org/show_bug.cgi?id=508906 Multi-Touch screen input from Felipe Gomes] and audio data access from David Humphrey can be downloaded [http://gul.ly/5q here].

=== JavaScript Audio Libraries ===

We have started work on a JavaScript library to make building audio web apps easier. Details are [[Audio Data API JS Library|here]].

=== Working Audio Data Demos ===

A number of working demos have been created, including:

* FFT visualization (calculated with js)
** http://weare.buildingsky.net/processing/dsp.js/examples/fft.html
** http://weare.buildingsky.net/processing/dft.js/audio.new.html (video [http://vimeo.com/8525101 here])

* FFT visualization (calculated with C++ - mozSpectrum)
** http://bocoup.com/core/code/firefox-fft/audio-f1lt3r.html (video [http://vimeo.com/8872704 here])
** http://www.storiesinflight.com/jsfft/visualizer/index.html (Demo by Thomas Sturm)
** http://blog.nihilogic.dk/2010/04/html5-audio-visualizations.html (Demo and API by Jacob Seidelin -- video [http://vimeo.com/11355121 here])
** http://ondras.zarovi.cz/demos/audio/

* Beat Detection (also showing use of WebGL for 3D visualizations)
** http://cubicvr.org/CubicVR.js/BeatDetektor1HD.html (video [http://vimeo.com/11345262 here])
** http://cubicvr.org/CubicVR.js/BeatDetektor2HD.html (video [http://vimeo.com/11345685 here])
** http://weare.buildingsky.net/processing/beat_detektor/beat_detektor.html
** http://code.bocoup.com/processing-js/3d-fft/viz.xhtml

* Visualizing sound using the video element
** http://bocoup.com/core/code/firefox-audio/whale-fft2/whale-fft.html (video [http://vimeo.com/8872808 here])

* Writing Audio from JavaScript, Digital Signal Processing
** Simple Tone Generator http://mavra.perilith.com/~luser/test3.html
** Playing Scales http://bocoup.com/core/code/firefox-audio/html-sings/audio-out-music-gen-f1lt3r.html (video [http://www.youtube.com/watch?v=HLkOgy1yO14&feature=player_embedded here])
** Square Wave Generation http://weare.buildingsky.net/processing/dsp.js/examples/squarewave.html
** Random Noise Generation http://weare.buildingsky.net/processing/dsp.js/examples/nowave.html
** JS Multi-Oscillator Synthesizer http://weare.buildingsky.net/processing/dsp.js/examples/synthesizer.html (video [http://vimeo.com/11411533 here])
** JS IIR Filter http://weare.buildingsky.net/processing/dsp.js/examples/filter.html (video [http://vimeo.com/11335434 here])
** Csound shaker instrument ported to JavaScript via Processing.js http://scotland.proximity.on.ca/dxr/tmp/audio/shaker/
** API Example: [http://code.bocoup.com/audio-data-api/examples/inverted-waveform-cancellation Inverted Waveform Cancellation]
** API Example: [http://code.bocoup.com/audio-data-api/examples/stereo-splitting-and-panning Stereo Splitting and Panning]
** API Example: [http://code.bocoup.com/audio-data-api/examples/mid-side-microphone-decoder/ Mid-Side Microphone Decoder]
** API Example: [http://code.bocoup.com/audio-data-api/examples/ambient-extraction-mixer/ Ambient Extraction Mixer]
** Biquad filter http://www.ricardmarxer.com/audioapi/biquad/ (demo by Ricard Marxer)
** Interactive Audio Application, Bloom http://code.bocoup.com/bloop/color/bloop.html (video [http://vimeo.com/11346141 here] and [http://vimeo.com/11345133 here])

=== Third Party Discussions ===

A number of people have written about our work, including:

* http://ajaxian.com/archives/amazing-audio-sampling-in-javascript-with-firefox
* http://createdigitalmusic.com/2010/05/03/real-sound-synthesis-now-an-open-standard-in-the-browser/
* http://www.webmonkey.com/2010/05/new-html5-tools-make-your-browser-sing-and-dance/
* http://www.wired.co.uk/news/archive/2010-05/04/new-html5-tools-give-adobe-flash-the-finger
* http://hacks.mozilla.org/2010/04/beyond-html5-experiments-with-interactive-audio/