Abstract

The goal of this project is to bring the power of computer vision and image processing to the Web. By extending the Media Capture and Streams specification, web developers can write video processing applications more easily. The primary idea is to incorporate Worker-based JavaScript video processing with MediaStreamTrack. The user's video processing script can perform the actual image processing and analysis frame by frame. We also propose an extension of ImageBitmap to allow better optimization on both the implementation side and the Web developer side. To support the video editor use case, we would like to introduce OfflineMediaContext in the next phase. We also want to explore the concept of WebImage, a Web API that can be hardware accelerated, to improve performance. By accomplishing these APIs step by step, we believe we can greatly improve the competitiveness of the Web platform in image processing and computer vision.

Introduction

For a quick overview of Project FoxEye, please see the files below.
The latest one:

Presentation files from the Whistler Work Week:
- Project FoxEye Status Update: FoxEye
- FoxEye Cross Firefox OS: Use case

Outdated

Presentation file from the Portland Work Week: Project FoxEye Portland Work Week.pdf
Presentation file from the P2PWeb Workshop: Project FoxEye 2015-Feb.pdf
YouTube: https://www.youtube.com/watch?v=TgQWEWiGaO8

The need for image processing and computer vision has been increasing in recent years. The introduction of the video element and media streams in HTML5 is very important, allowing basic video playback and webcam capability. But it is not powerful enough to handle complex video processing and camera applications. In particular, many mobile camera, photo, and video editor applications on Android show their creativity by using libraries such as OpenCV. A goal of this project is to include the capabilities found in modern video production and camera applications, as well as some processing, recognition, and stitching tasks.
This API is inspired by the WebAudio API [1]. Unlike the WebAudio API, however, we try to reach the goal by modifying the existing Media Capture and Streams API. The idea is to add some functions that associate a Worker-based script with a MediaStreamTrack. The Worker's script then runs image processing and/or analysis frame by frame. Since most of the processing work moves to the Worker, the main thread is not blocked.

This project consists of four parts. The first part extends MediaStreamTrack so that it can be associated with a VideoWorker, which provides a way to do image processing frame by frame. The second part is the ImageBitmap extension, which extends the ImageBitmap interface so that JavaScript developers can read the underlying data out of an ImageBitmap and set external data into one, in a set of supported color formats. The third part is OfflineMediaContext, which lets an offline stream render as fast as possible. The last part is WebImage, a computer vision API that can be hardware accelerated. Web developers can use it to build high-performance vision processing.
Thanks to the amazing asm.js and Emscripten work, we also provide an asm.js version of OpenCV called OpenCV.js. Web developers can use it to leverage the power of OpenCV in a simpler way.

Concept

MediaStreamTrack with VideoWorker:

The new design is a simple and minimal change to the current API. By extending MediaStreamTrack and adding a VideoWorker-related API, we let MediaStream support video processing through script code running in a Worker. Below is the draft WebIDL. Please see [2] for the draft specification.

[Constructor(DOMString scriptURL)]
interface VideoWorker : Worker {
    void terminate ();
    [Throws]
    void postMessage (any message, optional sequence<any> transfer);
                attribute EventHandler onmessage;
};

partial interface MediaStreamTrack {
    void             addWorkerMonitor (VideoWorker worker);
    void             removeWorkerMonitor (VideoWorker worker);
    MediaStreamTrack addWorkerProcessor (VideoWorker worker);
    void             removeWorkerProcessor ();
};

interface VideoWorkerGlobalScope : WorkerGlobalScope  {
    [Throws]
    void postMessage (any message, optional sequence<any> transfer);
                attribute EventHandler onmessage;
                attribute EventHandler onvideoprocess;
};

interface VideoProcessEvent : Event {
    readonly    attribute DOMString    trackId;
    readonly    attribute double       playbackTime;
    readonly    attribute ImageBitmap  inputImageBitmap;
    readonly    attribute ImageBitmap? outputImageBitmap;
};
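
As a rough illustration of what a per-frame worker script might do, the sketch below implements a grayscale kernel over raw RGBA pixels. It assumes the frame data has already been read out of inputImageBitmap into a Uint8ClampedArray (the ImageBitmap extension is meant to allow this, but the exact call is not shown here). The names toGrayscale and grayscale-worker.js are hypothetical. The commented wiring uses only names from the draft WebIDL and is untested, since no current browser implements it.

```javascript
// Hypothetical per-frame kernel: convert RGBA pixels to grayscale in place.
// Uses the common Rec. 601 luma weights (0.299, 0.587, 0.114).
function toGrayscale(rgba) {
  for (let i = 0; i + 3 < rgba.length; i += 4) {
    const luma = Math.round(
      0.299 * rgba[i] + 0.587 * rgba[i + 1] + 0.114 * rgba[i + 2]
    );
    rgba[i] = luma;     // red
    rgba[i + 1] = luma; // green
    rgba[i + 2] = luma; // blue
    // rgba[i + 3] (alpha) is left untouched.
  }
  return rgba;
}

// Possible wiring, based on the draft WebIDL (not runnable in current browsers):
//
//   // main thread
//   const worker = new VideoWorker("grayscale-worker.js");
//   const processedTrack = track.addWorkerProcessor(worker);
//
//   // grayscale-worker.js
//   onvideoprocess = (event) => {
//     // Read pixels from event.inputImageBitmap, run toGrayscale(),
//     // and write the result to event.outputImageBitmap.
//   };

// Stand-alone check on a single opaque red pixel.
const onePixel = toGrayscale(new Uint8ClampedArray([255, 0, 0, 255]));
console.log(Array.from(onePixel)); // [ 76, 76, 76, 255 ]
```

A monitor (addWorkerMonitor) would presumably run the same kind of kernel for analysis only, without producing an output frame.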

Example Code

Please check the examples section in MediaStreamTrack with worker.

ImageBitmap extensions

Please see [3] for more information.

WebImage:

Why do we need WebImage? Because performance matters when promoting computer vision on the Web. Most image processing tasks can be accelerated by WebGL, but that is not the case for computer vision. So we need a computer vision Web API that can be hardware accelerated, to help Web developers deliver fast and portable Web pages and applications. That is the motivation for WebImage.
This is a new area that needs to be explored, so we might start by referring to existing related APIs. At the end of 2014, Khronos released a portable, power-efficient vision processing API called OpenVX. This specification might be a good starting point for WebImage.
The following diagram gives a brief overview of the OpenVX concept. It is a graph-node architecture. The role of OpenVX in computer vision is like the role of OpenGL in graphics. The developer just constructs and executes the graph to process incoming images.

OpenVX can be implemented with OpenCL, with OpenGL ES compute shaders, or with C/C++ and SIMD. That means we can support OpenVX on a wide range of Web platforms, from PC to mobile, once an OpenVX engine is provided.
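
The graph-node idea can be sketched in a few lines. The code below is neither the OpenVX API nor a proposed WebImage API. VisionGraph, addNode, and execute are hypothetical names used only to show the construct-then-execute pattern on a tiny grayscale image stored as a flat array.

```javascript
// Hypothetical graph container: nodes are added once, then the whole
// graph is executed for every incoming image.
class VisionGraph {
  constructor() {
    this.nodes = [];
  }

  // Register a processing node; each node maps an image to a new image.
  addNode(name, fn) {
    this.nodes.push({ name, fn });
    return this; // allow chaining
  }

  // Run the nodes in insertion order (a linear graph, for simplicity).
  execute(image) {
    return this.nodes.reduce((current, node) => node.fn(current), image);
  }
}

// Two simple nodes operating on 8-bit grayscale pixels.
const invert = (pixels) => pixels.map((p) => 255 - p);
const threshold = (level) => (pixels) =>
  pixels.map((p) => (p >= level ? 255 : 0));

// Construct the graph once...
const graph = new VisionGraph()
  .addNode("invert", invert)
  .addNode("threshold", threshold(128));

// ...then execute it on an incoming image.
const result = graph.execute([0, 100, 200, 255]);
console.log(result); // [ 255, 255, 0, 0 ]
```

Because the whole graph is declared before any image is processed, an engine is in principle free to decide how and where each node runs. This sketch simply runs the nodes in order on the CPU.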

OfflineMediaContext:

We introduce “OfflineMediaContext” and modify “MediaStream” here to enable offline (as fast as possible) MediaStream processing. When developers want to perform offline MediaStream processing, they need to create a context that holds and keeps the relationship between all the MediaStreams that are going to be processed together.

// typedef unsigned long long DOMTimeStamp;
interface OfflineMediaContext {
    void start(DOMTimeStamp durationToStop);
    attribute EventHandler onComplete;
};
// Add an optional argument into the constructor.
[Constructor (optional OfflineMediaContext context),
 Constructor (MediaStream stream, optional OfflineMediaContext context),
 Constructor (MediaStreamTrackSequence tracks, optional OfflineMediaContext context)]
interface MediaStream : EventTarget {
// No modification.
...
};

OfflineMediaContext is the container for all MediaStreams that are going to be processed together at a non-realtime rate.
OfflineMediaContext is also the object that triggers the non-realtime processing.
OfflineMediaContext should be instantiated first. The MediaStreams that are going to be processed together in the same context can then be instantiated by passing the existing OfflineMediaContext object into the MediaStream constructor. (See the modified MediaStream constructors above.)
The native implementation of OfflineMediaContext holds a non-realtime MediaStreamGraph (MSG), in the same way as the OfflineAudioContext in the WebAudio specification.
The constructors are modified by adding an optional parameter, OfflineMediaContext. In this way, developers are able to associate a MediaStream with an OfflineMediaContext.
If the optional OfflineMediaContext is given, the native implementation of the MediaStream should be hooked to the non-realtime MSG held by the given OfflineMediaContext instead of the global realtime MSG.
If the optional OfflineMediaContext is given, we need to check whether the newly created MediaStream can be processed at an offline rate. If not, the constructor should throw an error. (Constructors are always allowed to throw.)
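
To make the intended flow concrete, here is a minimal simulation. OfflineMediaContext is not implemented in any browser, so MockOfflineMediaContext, MockStream, frameIntervalMs, and onFrame are hypothetical stand-ins that only mimic the shape of the draft interface: create the context first, attach streams to it, then call start and wait for onComplete. Frames are stepped by timestamp in a plain loop, with no waiting on a real-time clock.

```javascript
// Hypothetical stand-in for the proposed OfflineMediaContext.
class MockOfflineMediaContext {
  constructor() {
    this.streams = [];
    this.onComplete = null;
  }

  // Process every attached stream up to durationToStop (milliseconds),
  // advancing a virtual clock instead of waiting for real time.
  start(durationToStop) {
    for (const stream of this.streams) {
      for (let t = 0; t < durationToStop; t += stream.frameIntervalMs) {
        stream.onFrame(t);
      }
    }
    if (this.onComplete) this.onComplete();
  }
}

// Hypothetical stand-in for a MediaStream constructed with a context.
class MockStream {
  constructor(context, frameIntervalMs, onFrame) {
    this.frameIntervalMs = frameIntervalMs;
    this.onFrame = onFrame;
    context.streams.push(this); // associate the stream with the context
  }
}

// 1. Create the context first.
const context = new MockOfflineMediaContext();

// 2. Create a stream bound to that context (one frame every 40 ms).
const timestamps = [];
new MockStream(context, 40, (t) => timestamps.push(t));

// 3. Start processing and observe completion.
let completed = false;
context.onComplete = () => { completed = true; };
context.start(200);
console.log(timestamps, completed); // [ 0, 40, 80, 120, 160 ] true
```

The mock handles streams one after another. A real implementation would drive all streams from the same non-realtime graph.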

OpenCV.js

OpenCV + Emscripten = OpenCV.js
https://github.com/CJKu/opencv

Demo pages

OpenCV.js

This demo should run on any existing Firefox browser on Ubuntu and Mac. Please try the website below.
http://people.mozilla.org/~cku/opencv/

MST with Worker and ImageBitmap

You need a customized build for the demos below. Please clone this repo: https://github.com/kakukogou/gecko-dev/tree/foxeye-master
You also need a web camera for the demos. Build your own Firefox browser, then go to the website below to see the demo.
Source code of the demo:
https://github.com/kakukogou/foxeye-demo
Demo website:
http://people.mozilla.org/~tkuo/foxeye-demo/

Monitor

Processor

Unlimited Potentials

Use Cases

Digital Image Processing (DIP) for camera:
- Face In, see Sony Face In
- Augmented Reality, see IKEA AR
- Camera Panorama
- Fisheye camera
- Comic Effect
- Long term, might need Android Camera HAL 3 to control the camera
  - Smile Snapshot
  - Gesture Snapshot
  - HDR
  - Video Stabilization
- Bar code scanner
Photo and video editing
- Video Editor, see WeVideo on Android
- A faster way for video editing tools.
- Lots of existing image effects can be used for photo and video editing.
- https://www.facebook.com/thanks
Object Recognition in Image (not only Firefox OS, but also the browser):
- Shopping Assistant, see Amazon Firefly
- Face Detection/Tracking
- Face Recognition
- Text Recognition
- Text Selection in Image
  - See http://projectnaptha.com/
- Text Inpainting
- Image Segmentation
- Text translation on image, see Waygo
Duo Camera:
- Natural Interaction (Gesture, Body Motion Tracking)
- Interactive Foreground Extraction

and so on.

Some cool real-world applications we can refer to

Word Lens:
- https://play.google.com/store/apps/details?id=com.questvisual.wordlens.demo
- https://itunes.apple.com/tw/app/word-lens/id383463868?mt=8
Waygo
- http://www.waygoapp.com/
PhotoMath
- https://photomath.net/
Cartoon Camera
- https://play.google.com/store/apps/details?id=com.fingersoft.cartooncamera
Photo Studio
- http://photo-studio.en.uptodown.com/android
Magisto
- https://play.google.com/store/apps/details?id=com.magisto
Adobe Photoshop Express
- http://www.photoshop.com/products/photoshopexpress
Amazon (Firefly app)
- https://play.google.com/store/apps/details?id=com.amazon.mShop.android

Current Status

Meta bug: https://bugzilla.mozilla.org/show_bug.cgi?id=1100203
MediaStream with worker: editor's draft completed; refining the prototype for the review process. See https://bugzilla.mozilla.org/show_bug.cgi?id=1044102
ImageBitmap: in review. See https://bugzilla.mozilla.org/show_bug.cgi?id=801176
ImageBitmap extension: editor's draft completed; refining the prototype for the review process. See https://bugzilla.mozilla.org/show_bug.cgi?id=1141979
OpenCV.js: in progress. See http://people.mozilla.org/~cku/opencv/
OfflineMediaContext: not yet started.
WebImage: not yet started.
Run WebGL on worker: in review. See https://bugzilla.mozilla.org/show_bug.cgi?id=709490
CanvasRenderingContext2D in Worker: waiting for someone to take this bug. See https://bugzilla.mozilla.org/show_bug.cgi?id=801176

Next Phase (2015 H2)

Try to land and standardize MediaStream with worker, ImageBitmap, and the ImageBitmap extension.
Design an automated way to export the JavaScript API for OpenCV.js. Try to upstream it to the OpenCV code base.
Start to work on OfflineMediaContext.
Support product requirements, for example an Instagram-like app, wide-angle panorama, or Fox photo.
Do some exploratory experiments on the WebImage concept.

Beyond 2015

Proof of Concept for WebImage.
A crazy idea: Kernel.js or SPIR.js for JS developers to customize WebImage?
Gestural control API with depth camera? => WebNI (Web Natural Interaction)?

Conclusion

Step by step, we can make a difference. FoxEye is a pioneering project that tries to push the boundary of the Web into a new area: image processing and computer vision. The key factor in this project's success is you, every Web contributor. Your support, promotion, feedback, and comments are the nourishment of this project. Please share this project with every Web developer. Let's bring more and more amazing things to the Web. Let's enrich the Web platform and improve its competitiveness.

References

[1]: "WebAudio Spec", http://www.w3.org/TR/webaudio/
[2]: "Media Capture Stream with Video Worker", http://chiahungtai.github.io/mediacapture-worker/
[3]: "ImageBitmap Extensions", http://kakukogou.github.io/spec-imagebitmap-extension/

Acknowledgements

The whole idea of adopting WebAudio as the reference design for this project came from a conversation with John Lin. Thanks to Robert O'Callahan for his great feedback and comments. Thanks to John Lin and Chia-jung Hung for their useful suggestions and ideas. Also, big thanks to my team members who helped me debug the code.

About Authors

CTai

My name is Chia-hung Tai. I am a senior software engineer in the Mozilla Taipei office. I work on Firefox OS multimedia. Before this job, I gained some experience in OpenCL, NLP (Natural Language Processing), data mining, and machine learning. My IRC nickname is ctai. You can find me in the #media, #mozilla-taiwan, and #b2g channels. You can also reach me via email (ctai at mozilla dot com).

Kaku

Tzuhuo Kuo is an engineer in the Mozilla Taipei office.

CJ Ku

CJ Ku is responsible for the OpenCV.js part.
