Audio Data API Review Version

== API Tutorial ==


This API extends the HTMLMediaElement and HTMLAudioElement (and therefore affects both <video> and <audio>), adding the following basic functionality for reading and writing raw audio data:


===== Reading Audio =====


Audio data is made available via an event-based API.  As the audio is played, and therefore decoded, sample data is passed to content scripts in a framebuffer for processing after becoming available to the audio layer, hence the name '''MozAudioAvailable'''.  These samples may or may not have been played yet at the time of the event.  The audio samples returned in the event are raw, and have not been adjusted for the mute/volume settings on the media element.  Playing, pausing, and seeking the audio also affect the streaming of this raw audio data.


Users of this API can register two callbacks on the <audio> or <video> element in order to consume this data:
<pre>
<audio id="audio"
       src="song.ogg"
       onloadedmetadata="audioInfo();">
</audio>
</pre>
The media element also exposes three new attributes:

* mozChannels
* mozSampleRate
* mozFrameBufferLength


Prior to the '''LoadedMetadata''' event, accessing these attributes will cause an exception to be thrown, indicating that they are not known, or that there is no audio.  These attributes indicate the '''number of channels''', the audio '''sample rate''', and the '''default size of the framebuffer''' that will be used in '''MozAudioAvailable''' events.  The '''LoadedMetadata''' event is fired once, as the media resource is first loaded, and is useful for interpreting or writing the audio data.
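As an illustration, the audioInfo() handler registered in the markup above might simply cache these values for later use (storing them in globals here is just one convenient choice, not something the API requires):

<pre>
var channels, rate, frameBufferLength;

function audioInfo() {
  var audio = document.getElementById("audio");

  channels          = audio.mozChannels;
  rate              = audio.mozSampleRate;
  frameBufferLength = audio.mozFrameBufferLength;
}
</pre>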


The '''MozAudioAvailable''' event provides two pieces of data.  The first is a framebuffer (i.e., an array) containing decoded audio sample data (i.e., floats).  The second is the time for these samples measured from the start in seconds.  Web developers consume this event by registering an event listener in script like so:
 
<pre>
<audio id="audio" src="song.ogg"></audio>
<script>
  var audio = document.getElementById("audio");
  audio.addEventListener('MozAudioAvailable', someFunction, false);
</script>
</pre>
 
An audio or video element can also be created with script outside the DOM:
 
<pre>
var audio = new Audio();
audio.src = "song.ogg";
audio.addEventListener('MozAudioAvailable', someFunction, false);
audio.play();
</pre>


The following is an example of how both events might be used:


<pre>
function audioAvailable(event) {
  var samples = event.frameBuffer;
  var time    = event.time;

  for (var i = 0; i < frameBufferLength; i++) {
    // Do something with the audio data as it is played, e.g., samples[i]
  }
}
</pre>


Since the '''MozAudioAvailable''' event and the '''mozWriteAudio()''' method both use '''Float32Array''', it is possible to take the output of one audio stream and pass it directly (or process first and then pass) to a second:


<pre>
       src="song.ogg"  
       src="song.ogg"  
       onloadedmetadata="loadedMetadata();"
       onloadedmetadata="loadedMetadata();"
      onmozaudioavailable="audioAvailable(event);"
       controls>
       controls="controls">
</audio>
</audio>
<script>
<script>
Line 303: Line 318:
   writeAudio(frameBuffer);
   writeAudio(frameBuffer);
}
}
a1.addEventListener('a1', audioAvailable, false);


function writeAudio(audio) {
function writeAudio(audio) {
Line 321: Line 337:


Audio data written using the '''mozWriteAudio()''' method needs to be written at a regular interval in equal portions, in order to keep a little ahead of the current sample offset (the hardware's current sample offset can be obtained with '''mozCurrentSampleOffset()'''), where "a little" means something on the order of 500ms of samples.  For example, when working with 2 channels at 44100 samples per second, a writing interval of 100ms, and a pre-buffer equal to 500ms, one would write an array of (2 * 44100 / 10) = 8820 samples per interval, keeping the total written no further ahead than (currentSampleOffset + 2 * 44100 / 2) = currentSampleOffset + 44100 samples.
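As a rough sketch of that pattern (the makeSamples() function below is a hypothetical placeholder for whatever produces or processes the audio; the element is assumed to have been configured for output with mozSetup()):

<pre>
var output = new Audio();
output.mozSetup(2, 44100);                     // 2 channels at 44100 samples/second

var prebufferSize = 2 * 44100 / 2;             // 500ms of samples (44100)
var portionSize   = 2 * 44100 / 10;            // 100ms of samples (8820)
var written       = 0;                         // total samples written so far

setInterval(function() {
  // Stay roughly 500ms ahead of the hardware's current sample offset.
  while (written - output.mozCurrentSampleOffset() < prebufferSize) {
    written += output.mozWriteAudio(makeSamples(portionSize));
  }
}, 100);
</pre>

Using the return value of mozWriteAudio() accounts for the case where the underlying buffer cannot accept the full portion; a real implementation would keep the unwritten remainder and retry it on the next interval.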
===== Other Approaches to Writing Audio =====
'''1) Connect Decode Thread and Worker'''

'''Idea:''' Pass audio data from the decoding thread to a worker for processing. Create a worker thread and pass it to the HTMLMediaElement.  The decode thread gets the worker via the HTMLMediaElement, and passes messages to the worker as audio is decoded, bypassing the main thread.  The worker processes the audio and returns it, allowing audio to be modified before it is played.

'''TODO:'''
* Modify HTMLMediaElement so you can pass it a Worker as the destination for audio data.
* Allow the decoder to get a thread-safe reference to this worker, so it can be used by the decoder thread.  The worker would be handed off to the audio decoding thread as an nsIWorker interface pointer.
* Have the decode thread pass a message to the worker (ideally synchronously) so that the worker can return modified audio data, which then gets played as normal.  Perhaps pass some kind of event object to the worker.  When the worker is done modifying the data, it can call a method on the event object to say "hey, put this data back in the audio decoder thread's buffers now and let the audio decoder thread proceed."

'''Issues:'''
* The worker code was written with the assumption that the code is either running on the main thread or on the worker thread.  Teaching it about a third thread would be necessary.
* May need to add a C++ PostMessage call to the nsIWorker interface, as the current one assumes it's being called through XPConnect.
* How do we get the data back from the worker to the decode thread?
* Keep a reference to the worker in the HTMLMediaElement; use cycle collection to handle lifetimes.

'''Notes:'''
* The DecodeAudioData function in the WebM backend is what you need to decode Vorbis.
* nsOggReader::DecodeAudioData pushes decoded audio data onto the audio queue.  Instead of this, push to the worker.  Maybe shift the adding to the audio queue out of nsOggReader and into the nsBuiltin code.

'''2) Introduce a Callback to Decode Thread'''

'''Idea:''' Allow JS to push data to the audio queue, whether modified data from the decoder or generated data.  Have a callback that gets the data as soon as the decode thread is done with it, and the callback modifies it (synchronously) before it goes into the queue to be played.  This callback could then postMessage to a worker, which processes it on another thread, and then puts it back into the audio queue (audio thread).  The audio thread no longer queues things directly.  A rough sketch of the worker-side messaging appears after these notes.

'''Notes:'''
* This could replace the current mozWriteAudio() method.
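To make the intended flow of approach 2 concrete, here is a purely illustrative sketch of the script-facing half using only standard Worker messaging; the setAudioWorker() hand-off shown in the comments is the proposed (not existing) HTMLMediaElement extension described above:

<pre>
// processor.js -- worker that modifies decoded samples (illustrative only)
onmessage = function(event) {
  var samples = event.data;           // Float32Array of decoded audio
  for (var i = 0; i < samples.length; i++) {
    samples[i] *= 0.5;                // example processing: halve the amplitude
  }
  postMessage(samples);               // hand the modified samples back
};

// Main thread (hypothetical hand-off; no such method exists today):
// var worker = new Worker("processor.js");
// mediaElement.setAudioWorker(worker);   // proposed "pass a Worker to HTMLMediaElement"
</pre>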


===== Complete Example: Creating a Web Based Tone Generator =====