Media recording architecture

MediaRecorder Arch.jpg


  • Vorbis => Theora
  • We cannot use VideoFrameContainer/AudioSystem directly; use an adapter instead


Reference spec


This API allows an application to record media. The application instantiates a MediaRecorder from a MediaStream object, calls start(), and then either calls stop() or waits for the MediaStream to end. The recorded data is encoded in the MIME type format selected by the application and is delivered as Blob data via the dataavailable event. The application can also choose to receive smaller buffers of data at regular intervals.

Media Recorder API


 [Constructor (MediaStream stream)]
 interface MediaRecorder : EventTarget {
   readonly    attribute MediaStream        stream;
   readonly    attribute RecordingStateEnum state;
               attribute EventHandler       ondataavailable;
               attribute EventHandler       onerror;
               attribute EventHandler       onwarning;
   void              start (optional long timeslice);
   void              stop ();
   void              pause ();
   void              resume ();
   void              requestData ();
 };

JavaScript Sample Code

 var mStream;
 var mr;
 function dataavailable(e) {
   //combine the blob data and write to disk
   var encodeData = new Blob([e.data], {type: 'audio/ogg'});
 }
 function start() {
   mr = new MediaRecorder(mStream);
   mr.ondataavailable = dataavailable;
   try {
     mr.start(1000); //every 1000ms, trigger dataavailable event
   } catch (e) {
     //handle start fail exception
   }
 }
 function ex(e) {
   //handle webRTC exception
 }
 function streamcb(astream) {
   mStream = astream;
 }
 //Use WebRTC to get mediaStream
 navigator.mozGetUserMedia({audio:true}, streamcb, ex);

MediaEncoder Draft

API draft

/**
 * Implement a queue-like interface between MediaStream and the encoder,
 * since the encoder may take a long time to process one segment while new
 * segments keep arriving.
 */
class MediaSegmentAdapter {
   enum BufferType { /* ... */ };

   /* AppendSegment may run on the MediaStreamGraph thread to avoid race conditions */
   virtual void AppendSegment(MediaSegment) = 0;
   void SetBufferLength(size_t);
   size_t GetBufferLength();

   MediaSegment DequeueSegment();

   Queue<MediaSegment> mBuffer;

   friend class Encoder;
};
/**
 * A base class for video type adapters, which provides a basic implementation,
 * e.g. copying the frame
 */
class VideoAdapter : MediaSegmentAdapter {
   /* This version deep-copies/color-converts the input buffer into a local buffer */
   void AppendSegment(MediaSegment);
};
/**
 * In FirefoxOS, we have a hardware encoder, and the camera outputs a platform-specific
 * buffer which may give better performance
 */
class GrallocAdapter : MediaSegmentAdapter {
   /* This version |does not| copy frame data, but queues the GraphicBuffer directly,
      which can be used with SurfaceMediaSource or another SurfaceFlinger-compatible
      mechanism for hardware-supported video encoding */
   void AppendSegment(MediaSegment);
};
/**
 * Similar to VideoAdapter. Since audio codecs may need to |collect| enough data
 * before actually encoding it, we may implement a raw buffer of some specific length and
 * collect data into that buffer
 */
class AudioAdapter : MediaSegmentAdapter {
   /* Copy/resample the data into a local buffer */
   void AppendSegment(MediaSegment);
};
/**
 * Add some dequeue-like interface to MediaSegment and make it thread-safe
 * to replace these adapters
 */

/**
 * MediaRecord keeps a state machine and makes sure MediaEncoderListener is in
 * a valid state for data manipulation.
 * This class is responsible for initializing all the components (e.g. container writers, codecs)
 * and linking them together.
 */
class MediaEncoderListener : MediaStreamListener {
   /* Callback functions for MediaStream */
   void NotifyConsumption(MediaStreamGraph, Consumption);
   void NotifyPull(MediaStreamGraph, StreamTime);
   void NotifyBlockingChanged(MediaStreamGraph, Blocking);
   void NotifyHasCurrentData(MediaStreamGraph, bool);
   void NotifyOutput(MediaStreamGraph);
   void NotifyFinished(MediaStreamGraph);
   /* Queue the MediaSegment into the corresponding adapters */
   /* XXX: Is it possible to determine audio-related parameters from this callback?
           Or do we have to query them from MediaStream directly? */
   /* Appending a segment to an adapter needs to block the MediaStreamGraph thread to avoid race conditions,
      and we should schedule one encoding loop if any track is updated */
   void NotifyQueuedTrackChanges(MediaStreamGraph, TrackID, TrackRate, TrackTicks, uint32_t, MediaSegment);

   /* Callback functions to JS */
   void SetDataAvailableCallback();
   void SetErrorCallback();
   void SetMuteTrackCallback();  /*NOT IMPL*/
   void SetPauseCallback();
   void SetPhotoCallback();  /*NOT IMPL*/
   void SetRecordingCallback();
   void SetResumeCallback();
   void SetStopCallback();
   void SetUnmuteTrackCallback();  /*NOT IMPL*/
   void SetWarningCallback();

   enum EncoderState {
       NOT_STARTED,    // Encoder initialized, no data inside
       ENCODING,       // Encoder is working on current data
       DATA_AVAILABLE, // Some encoded data is available
       ENCODED,        // All input tracks stopped (EOS reached)
   };

   /* Status/Data polling function */
   void GetEncodedData(unsigned char* buffer, int length); 
   EncoderState GetEncoderState();
   /* Option query functions */
   nsArray<String> GetSupportMIMEType();
   Pair<int, int> GetSupportWidth();
   Pair<int, int> GetSupportHeight();

   /* Set requested encoder */
   void SetMIMEType();
   void SetVideoWidth();
   void SetVideoHeight();

   /* JS control functions */
   void MuteTrack(TrackID);  /*NOT IMPL*/
   void Pause();
   void Record();
   void RequestData();  /*NOT IMPL*/
   void Resume();
   void SetOptions();
   void Stop();
   void TakePhoto();  /*NOT IMPL*/
   void UnmuteTrack();  /*NOT IMPL*/

   /* Initialize internal state and codecs */
   void Init();

   /* create MediaEncoder for given MediaStream */

   void QueueVideoSegments();
   void QueueAudioSegments();
   void SelectCodec();
   void ConfigCodec();
   void SetVideoQueueSize();
   void SetAudioQueueSize();

   /* data member */
   MediaSegment mVideoSegments;  // Used as a glue between MediaStreamGraph and MediaEncoder
   MediaSegment mAudioSegments;
   Encoder mVideoEncoder;
   Encoder mAudioEncoder;
   MediaEncoder mMediaEncoder;
   Thread mEncoderThread;
};
/**
 * Different codecs usually support some codec-specific parameters which
 * we may take advantage of.
 * Let each implementation provide its own parameter set, and use common
 * params if no special params are requested.
 */
union CodecParams {
   OpusParams opusParams;
   TheoraParams theoraParams;
   MPEG4Params mpeg4Params;
   // etc.
};
/**
 * Base class for general codecs:
 * we generally do not implement codecs ourselves, but we need a generic interface
 * to encapsulate them.
 * For example, if we want to support Opus, we should create an OpusCodec, let
 * it inherit this base class (by inheriting AudioCodec), and implement OpusCodec by
 * using the libopus API.
 */
class Encoder {
   enum EncodingState {
       COLLECTING, /* the encoder is still waiting for enough data to encode */
       ENCODING,   /* there is enough data to encode, but encoding is incomplete */
       ENCODED,    /* some output can be read from this codec */
   };

   virtual nsresult Init() = 0;

   /* Let Encoder set up the buffer length based on codec characteristics,
      e.g. Stagefright video codecs may only use 1 buffer since the buffer may be shared between hardware */

   /* Mimic android::CameraParameter to collect backend codec related params in a general class */
   virtual CodecParams GetParams() = 0;
   virtual nsresult SetParams(CodecParams) = 0;
   /* Start the encoder; if the encoder has its own thread, create the thread here */
   virtual nsresult Encode() = 0;

   /* Read the encoded data from the encoder; check the state before attempting to read, otherwise an error is returned */
   EncodingState GetCurrentState();
   virtual nsresult GetEncodedData(MediaSegment& encData) = 0;

   /* Codec-specific header describing its own type/version/etc. */
   Metadata GetCodecHeader();

   /* Force the encoder to output the currently available data */
   /* XXX: this may be required to support MediaEncoder::Request, but may not be supported by all encoder backends */
   virtual void Flush() = 0;

   MediaSegmentAdapter mQueue;
};
class AudioTrackEncoder : public Encoder {
   /* AudioCodec may need to collect enough buffers before encoding; return COLLECTING as needed */
   virtual EncodingState Encode(MediaBuffer in, void* out, size_t length) = 0;

   bool IsDataEnough();
   void* mLocalBuffer[MIN_FRAMES];
};
class VideoTrackEncoder : public Encoder {
   virtual EncodingState Encode(MediaBuffer in, void* out, size_t length) = 0;
};
class OpusEncoder : public AudioTrackEncoder {
   // Use libopus to encode audio
   // libopus ctx, etc...
};
class TheoraEncoder : public VideoTrackEncoder {
   // Use libtheora to encode video
   // The libtheora encoder is not blocking, thus we have to loop until the frame is complete
};
/**
 * Generic base class for container writers.
 * Similar to MediaCodec; we separate container and codec for future extension.
 */
class MediaWriter {
   void AddTrack(Encoder);

   /* Block until the container packet write is done */
   nsTArray<char> GetPacket();
};
class OggWriter : public MediaWriter {
   // Use libogg to write container
   // libogg context and others
   VideoTrackEncoder mVideoEncoder; // e.g. TheoraEncoder
   AudioTrackEncoder mAudioEncoder; // e.g. OpusEncoder/VorbisEncoder
};
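
The queue-like role that MediaSegmentAdapter plays between the MediaStreamGraph thread and the encoder thread can be sketched as a small bounded, mutex-protected FIFO. This is only an illustration under stated assumptions: Segment and SegmentQueue are hypothetical names that stand in for MediaSegment and the adapter, and dropping the oldest segment when the buffer is full is a policy chosen for the sketch, not something the draft specifies.

```cpp
#include <cassert>
#include <cstddef>
#include <deque>
#include <mutex>
#include <utility>
#include <vector>

// Hypothetical stand-in for MediaSegment: a chunk of raw samples.
struct Segment {
    std::vector<short> samples;
};

// Bounded, thread-safe FIFO between a producer (graph thread) and a
// consumer (encoder thread). Dropping the oldest entry when full is an
// assumption of this sketch.
class SegmentQueue {
public:
    explicit SegmentQueue(std::size_t capacity) : mCapacity(capacity) {
        assert(capacity > 0);
    }

    // Producer side. Returns false if an older segment had to be dropped.
    bool Append(Segment seg) {
        std::lock_guard<std::mutex> lock(mMutex);
        bool dropped = false;
        if (mBuffer.size() == mCapacity) {
            mBuffer.pop_front();
            dropped = true;
        }
        mBuffer.push_back(std::move(seg));
        return !dropped;
    }

    // Consumer side. Returns false when nothing is queued.
    bool Dequeue(Segment& out) {
        std::lock_guard<std::mutex> lock(mMutex);
        if (mBuffer.empty()) {
            return false;
        }
        out = std::move(mBuffer.front());
        mBuffer.pop_front();
        return true;
    }

    std::size_t Length() const {
        std::lock_guard<std::mutex> lock(mMutex);
        return mBuffer.size();
    }

private:
    const std::size_t mCapacity;
    mutable std::mutex mMutex;
    std::deque<Segment> mBuffer;
};
```

Holding the lock only for the push or pop keeps the time the producer thread is blocked short, which matters if the producer is the MediaStreamGraph thread.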

Working flow

  1. MediaEncoder creates the MediaCodecs and the MediaWriter based on the request
    1. MediaEncoder creates MediaSegmentAdapters from the MediaCodecs
    2. MediaEncoder adds the MediaCodecs to the MediaWriter as tracks
  2. MediaEncoder registers a callback with MediaStream
  3. The MediaStreamGraph thread calls back to MediaEncoder when the stream updates
  4. MediaEncoder queues new MediaSegments into each MediaSegmentAdapter based on its type
    1. MediaSegmentAdapter copies/enqueues/color-converts/etc. the data and queues it up
  5. MediaEncoder posts a task to the encoder thread
  6. The encoder thread asks MediaWriter for a packet
    1. If MediaWriter::GetPacket is called for the first time
      1. Get the codec-specific headers first, and produce the header/metadata packet
    2. Otherwise
      1. MediaWriter asks the codecs for encoded data
      2. MediaWriter writes packets
  7. MediaEncoder calls onRecordingCallback with raw data or nsArray
  8. MediaRecord API posts the encoded data blob to the JavaScript layer

NOTE: step 8 is not specified in this API
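
Step 6 of the flow above (header packet on the first GetPacket call, data packets afterwards) can be sketched as follows. ToyEncoder and ToyWriter are hypothetical stand-ins for the Encoder and MediaWriter classes of the draft, and the bracketed string packets are an invented format used only to make the ordering visible; no real container is written.

```cpp
#include <cassert>
#include <string>
#include <utility>
#include <vector>

// Hypothetical encoder: exposes one codec header and a list of
// already-encoded chunks waiting to be muxed.
class ToyEncoder {
public:
    explicit ToyEncoder(std::string header) : mHeader(std::move(header)) {}

    const std::string& GetCodecHeader() const { return mHeader; }

    void PushEncoded(const std::string& chunk) { mPending.push_back(chunk); }

    // Hands all pending encoded chunks to the caller and clears the list.
    std::vector<std::string> TakeEncoded() {
        std::vector<std::string> out;
        out.swap(mPending);
        return out;
    }

private:
    std::string mHeader;
    std::vector<std::string> mPending;
};

// Hypothetical writer: the first GetPacket() call emits the headers of
// every track, later calls emit whatever encoded data the tracks hold.
class ToyWriter {
public:
    void AddTrack(ToyEncoder* encoder) {
        assert(encoder != nullptr);
        mTracks.push_back(encoder);
    }

    std::string GetPacket() {
        std::string packet;
        if (!mHeaderWritten) {
            for (ToyEncoder* track : mTracks) {
                packet += "[hdr:" + track->GetCodecHeader() + "]";
            }
            mHeaderWritten = true;
            return packet;
        }
        for (ToyEncoder* track : mTracks) {
            for (const std::string& chunk : track->TakeEncoded()) {
                packet += "[data:" + chunk + "]";
            }
        }
        return packet;
    }

private:
    std::vector<ToyEncoder*> mTracks;
    bool mHeaderWritten = false;
};
```

In this sketch an empty packet simply means no track had encoded data yet; a real writer would more likely block or report a state instead.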


  • Record media stream audio data; the result can be played by an audio tag
  • MediaRecorder state machine check
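
A state machine check could exercise transitions like the ones below. This is a minimal sketch, assuming three states (inactive, recording, paused) for RecordingStateEnum, which this page does not enumerate; RecorderStateMachine and RunStateMachineCheck are hypothetical names, and invalid calls are reported by a false return value here rather than by an exception.

```cpp
#include <cassert>

// Assumed recorder states; the page does not list the enum values.
enum class RecordingState { Inactive, Recording, Paused };

class RecorderStateMachine {
public:
    RecordingState State() const { return mState; }

    // Each call returns false and leaves the state unchanged when the
    // transition is not valid from the current state.
    bool Start()  { return Transition(RecordingState::Inactive,  RecordingState::Recording); }
    bool Pause()  { return Transition(RecordingState::Recording, RecordingState::Paused); }
    bool Resume() { return Transition(RecordingState::Paused,    RecordingState::Recording); }

    // Stop is valid from both Recording and Paused.
    bool Stop() {
        if (mState == RecordingState::Inactive) {
            return false;
        }
        mState = RecordingState::Inactive;
        return true;
    }

private:
    bool Transition(RecordingState from, RecordingState to) {
        if (mState != from) {
            return false;
        }
        mState = to;
        return true;
    }

    RecordingState mState = RecordingState::Inactive;
};

// Walks one typical session plus a few invalid calls.
void RunStateMachineCheck() {
    RecorderStateMachine sm;
    assert(!sm.Pause());   // cannot pause before start
    assert(sm.Start());
    assert(!sm.Start());   // cannot start twice
    assert(sm.Pause());
    assert(sm.Resume());
    assert(sm.Stop());
    assert(sm.State() == RecordingState::Inactive);
}
```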


  • General codecs do not describe metadata
    • Codec type information has to be written by some other mechanism
  • Some codecs need to collect enough data before producing output, so MediaCodec::GetEncodedData alone is not adequate.
    • The writer should query the MediaCodec state before attempting to read data
    • Some containers force stream interleaving, so we may need some sync mechanism
  • Since the encoder may have pending data, the EOS event may need some extra handling
=> Messaging-related details will not be determined until the real implementation
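
Two of the points above, querying the encoder state before reading and handling data that is still pending at EOS, can be illustrated with a toy encoder that needs a full frame of samples before it produces output. CollectingEncoder is a hypothetical class, the "encoding" is a plain copy standing in for a real codec call, and padding the tail with silence on Flush is one possible EOS policy, assumed here for the sketch.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

enum class EncodingState { Collecting, Encoded };

// Toy audio encoder that can only emit output one full frame at a time.
class CollectingEncoder {
public:
    explicit CollectingEncoder(std::size_t frameSize) : mFrameSize(frameSize) {
        assert(frameSize > 0);
    }

    void Append(const std::vector<short>& samples) {
        mInput.insert(mInput.end(), samples.begin(), samples.end());
    }

    // The writer is expected to check this before trying to read.
    EncodingState GetCurrentState() const {
        return mInput.size() >= mFrameSize ? EncodingState::Encoded
                                           : EncodingState::Collecting;
    }

    // Moves one frame into out and returns true, or returns false while
    // the encoder is still collecting input.
    bool GetEncodedData(std::vector<short>& out) {
        if (GetCurrentState() != EncodingState::Encoded) {
            return false;
        }
        const std::ptrdiff_t n = static_cast<std::ptrdiff_t>(mFrameSize);
        out.assign(mInput.begin(), mInput.begin() + n);
        mInput.erase(mInput.begin(), mInput.begin() + n);
        return true;
    }

    // At EOS, pad the tail with silence up to a whole frame so that no
    // pending samples are stranded.
    void Flush() {
        const std::size_t rem = mInput.size() % mFrameSize;
        if (rem != 0) {
            mInput.resize(mInput.size() + (mFrameSize - rem), 0);
        }
    }

private:
    std::size_t mFrameSize;
    std::vector<short> mInput;
};
```

A writer built on this would loop on GetCurrentState() and call GetEncodedData() only in the Encoded state, then call Flush() once when the input stream reports EOS and drain the remaining frames.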


  • We will only implement the audio-related part at the current stage
  • Some interaction between MediaEncoder and MediaRecorder is undetermined; the affected functions will not be implemented at this stage (marked with /*NOT IMPL*/)