User talk:Dead project
Abstract
The goal of this project is to bring the power of computer vision and image processing to the Web. By extending the Media Capture and Streams specification, web developers can write video processing applications more easily. The primary idea is to incorporate Worker-based JavaScript video processing into MediaStreamTrack, so that the user's video processing script can do the actual image processing and analysis frame by frame. We also propose an extension of ImageBitmap to open up more optimization opportunities for both implementers and Web developers. To support the video editor use case, we would like to introduce OfflineMediaContext in the next phase. We also want to explore the concept of WebImage, a Web API that can be hardware accelerated to improve performance. By delivering these APIs step by step, we believe we can greatly improve the competitiveness of the Web platform in image processing and computer vision.
Introduction
For a quick overview of Project FoxEye, please see the files below.
The latest one:
- Presentation file in Whistler Work Week:
Outdated
- Presentation file in Portland Work Week. File:Project FoxEye Portland Work Week.pdf
- Presentation file in P2PWeb WorkShop. File:Project FoxEye 2015-Feb.pdf
- Youtube: https://www.youtube.com/watch?v=TgQWEWiGaO8
The need for image processing and computer vision has been increasing in recent years. The introduction of the video element and media streams in HTML5 is very important, allowing for basic video playback and webcam access. But it is not powerful enough to handle complex video processing and camera applications. In particular, there are many mobile camera, photo and video editor applications on Android that show their creativity by using libraries such as OpenCV. It is a goal of this project to include the capabilities found in modern video production and camera applications as well as some of the processing, recognition and stitching tasks.
This API is inspired by the WebAudio API [1]. Unlike the WebAudio API, we try to reach the goal by modifying the existing Media Capture and Streams API. The idea is to add functions that associate a Worker-based script with a MediaStreamTrack. The Worker's script then runs image processing and/or analysis frame by frame. Since most of the processing work moves to the Worker, the main thread is not blocked.
The project consists of four parts. The first part extends MediaStreamTrack so that it can be associated with a VideoWorker. This part provides a way to do image processing frame by frame. The second part is the ImageBitmap extension. It extends the ImageBitmap interface to allow JavaScript developers to read the underlying data out of an ImageBitmap and to set external data into one, in a set of supported color formats. The third part is OfflineMediaContext, which lets offline streams render as fast as possible. The last part is WebImage, an API for computer vision that can be hardware accelerated. Web developers can use it to build high-performance vision processing.
Thanks to the amazing asm.js and Emscripten work, we also provide an asm.js version of OpenCV called OpenCV.js. Web developers can leverage the power of OpenCV in a simpler way.
Concept
MediaStreamTrack with VideoWorker:
The new design is a simple and minimal change to the current API. By extending MediaStreamTrack and adding a VideoWorker-related API, we let MediaStream support video processing through script code in a Worker. Below is the draft WebIDL code. Please see [2] for the draft specification.
[Constructor(DOMString scriptURL)]
interface VideoWorker : Worker {
    void terminate ();
    [Throws]
    void postMessage (any message, optional sequence<any> transfer);
    attribute EventHandler onmessage;
};
partial interface MediaStreamTrack {
    void addWorkerMonitor (VideoWorker worker);
    void removeWorkerMonitor (VideoWorker worker);
    MediaStreamTrack addWorkerProcessor (VideoWorker worker);
    void removeWorkerProcessor ();
};
interface VideoWorkerGlobalScope : WorkerGlobalScope {
    [Throws]
    void postMessage (any message, optional sequence<any> transfer);
    attribute EventHandler onmessage;
    attribute EventHandler onvideoprocess;
};
interface VideoProcessEvent : Event {
    readonly attribute DOMString trackId;
    readonly attribute double playbackTime;
    readonly attribute ImageBitmap inputImageBitmap;
    readonly attribute ImageBitmap? outputImageBitmap;
};
Example Code
Please check the examples section in MediaStreamTrack with worker.
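As a rough illustration of the kind of per-frame work a processor script might do inside an onvideoprocess handler, here is a minimal sketch. It is not taken from the draft specification: the function name toGrayscale and the use of a raw RGBA buffer are assumptions made for this example (the draft hands the worker ImageBitmap objects through inputImageBitmap and outputImageBitmap), and the two-frame "stream" is simulated with plain arrays so the code runs outside a browser.

```javascript
// Hypothetical sketch: per-frame work a VideoWorker processor might perform.
// Raw RGBA buffers are an assumption of this example; the draft API passes
// ImageBitmap objects to the worker instead.

// Convert one RGBA frame to grayscale using Rec. 601 luma weights.
function toGrayscale(rgba) {
  const out = new Uint8ClampedArray(rgba.length);
  for (let i = 0; i < rgba.length; i += 4) {
    const luma = 0.299 * rgba[i] + 0.587 * rgba[i + 1] + 0.114 * rgba[i + 2];
    out[i] = out[i + 1] = out[i + 2] = luma; // Uint8ClampedArray rounds and clamps on store
    out[i + 3] = rgba[i + 3];                // keep alpha unchanged
  }
  return out;
}

// Simulate a two-frame stream, processed one frame at a time.
const frames = [
  new Uint8ClampedArray([255, 255, 255, 255, 0, 0, 0, 255]), // white pixel, black pixel
  new Uint8ClampedArray([255, 0, 0, 128]),                   // half-transparent red pixel
];
const processed = frames.map(toGrayscale);
```

In a real processor worker, the same function would presumably be called once per VideoProcessEvent, with the result written to the output frame.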
ImageBitmap extensions
Please see [3] for more information.
WebImage:
Why do we need WebImage? Because performance matters when promoting computer vision on the Web. Most image processing tasks can be accelerated by WebGL, but that is not the case for computer vision. So we need a computer vision Web API that can be hardware accelerated, to help Web developers deliver fast and portable Web pages and applications. That is the motivation for WebImage.
This is a new area that needs exploring, so we might start by referring to existing related APIs. At the end of 2014, Khronos released a portable, power-efficient vision processing API called OpenVX. This specification might be a good starting point for WebImage.
The following diagram is a brief concept of OpenVX. It is a graph-node architecture. The role of OpenVX in computer vision is like the role of OpenGL in graphics. The developer just constructs and executes the graph to process incoming images.
OpenVX can be implemented with OpenCL, OpenGL ES with compute shaders, or C/C++ with SIMD. That means we can support OpenVX on a wide range of Web platforms, from PC to mobile, once an OpenVX engine is provided.
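The "construct the graph, then execute it" idea can be sketched in a few lines of JavaScript. This is only a toy under stated assumptions: the Graph class, addNode, execute, and the invert and threshold nodes are hypothetical names invented for this example and are not part of OpenVX or of any WebImage proposal. The graph here is a linear chain, the simplest possible case.

```javascript
// Hypothetical sketch of a graph-node pipeline. None of these names come
// from OpenVX; they only illustrate building a graph first and running it later.
class Graph {
  constructor() {
    this.nodes = [];
  }
  // Append a processing node; each node maps an image to a new image.
  addNode(name, fn) {
    this.nodes.push({ name, fn });
    return this; // allow chaining
  }
  // Run every node in insertion order on the incoming image.
  execute(image) {
    return this.nodes.reduce((img, node) => node.fn(img), image);
  }
}

// Images are flat arrays of gray levels in this toy example.
const invert = (img) => img.map((v) => 255 - v);
const threshold = (img) => img.map((v) => (v >= 128 ? 255 : 0));

const graph = new Graph().addNode("invert", invert).addNode("threshold", threshold);
const result = graph.execute([0, 100, 200, 255]);
```

Because the whole graph is declared before any image is processed, an engine is free to decide how to run it, which is what makes this style a natural fit for hardware acceleration.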
OfflineMediaContext:
We introduce the “OfflineMediaContext” and modify the “MediaStream” here to enable offline (as fast as possible) MediaStream processing. When developers want to perform offline MediaStream processing, they need to create a context that holds and keeps the relationship between all MediaStreams that are going to be processed together.
// typedef unsigned long long DOMTimeStamp;
interface OfflineMediaContext {
    void start(DOMTimeStamp durationToStop);
    attribute EventHandler onComplete;
};
// Add an optional argument into the constructor.
[Constructor (optional OfflineMediaContext context),
 Constructor (MediaStream stream, optional OfflineMediaContext context),
 Constructor (MediaStreamTrackSequence tracks, optional OfflineMediaContext context)]
interface MediaStream : EventTarget {
    // No modification.
    ...
};
- OfflineMediaContext is the holder of all MediaStreams that are going to be processed together at a non-realtime rate.
- OfflineMediaContext is also the object that triggers the non-realtime processing.
- OfflineMediaContext should be instantiated first. MediaStreams that are going to be processed together in the same context can then be instantiated by passing the pre-instantiated OfflineMediaContext object into the MediaStream constructor. (See the modified MediaStream constructor above.)
- The native implementation of OfflineMediaContext holds a non-realtime MediaStreamGraph (MSG), which is just the same as the OfflineAudioContext in the WebAudio specification.
- The constructors are modified by adding an optional parameter, OfflineMediaContext. This way, developers can associate a MediaStream with an OfflineMediaContext.
- If the optional OfflineMediaContext is given, the native implementation of the MediaStream should be hooked to the non-realtime MSG held by the given OfflineMediaContext instead of the global realtime MSG.
- If the optional OfflineMediaContext is given, we need to check whether the newly created MediaStream can be processed at an offline rate. If not, the constructor should throw an error. (Constructors are always allowed to throw.)
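To make the difference between realtime and offline processing concrete, here is a small mock. It is an illustrative sketch only, not the native implementation: MockOfflineMediaContext, addProcessor, the fixed frame interval, and the frameCount field are all assumptions invented for this example. Only the start(durationToStop) and onComplete shape follows the draft interface. The mock advances a virtual clock in a loop instead of waiting on the wall clock, which is the essence of rendering as fast as possible.

```javascript
// Hypothetical mock, not the native implementation: it mirrors only the
// start(durationToStop) / onComplete shape of the draft interface.
class MockOfflineMediaContext {
  constructor(frameIntervalMs) {
    this.frameIntervalMs = frameIntervalMs; // assumed fixed spacing between frames
    this.processors = [];                   // per-frame callbacks (hypothetical)
    this.onComplete = null;
  }
  addProcessor(fn) {
    this.processors.push(fn);
  }
  // Advance a virtual clock without waiting on the wall clock.
  start(durationToStop) {
    let frameCount = 0;
    for (let t = 0; t < durationToStop; t += this.frameIntervalMs) {
      this.processors.forEach((fn) => fn(t));
      frameCount += 1;
    }
    if (this.onComplete) this.onComplete({ frameCount });
  }
}

const context = new MockOfflineMediaContext(40); // 40 ms per frame, i.e. 25 frames per second
const timestamps = [];
context.addProcessor((t) => timestamps.push(t));
let completed = null;
context.onComplete = (e) => { completed = e; };
context.start(1000); // one second of media time, processed immediately
```

One second of media time is consumed in a single synchronous loop here; a realtime graph would instead deliver the same frames spread over one second of wall-clock time.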
OpenCV.js
- OpenCV + Emscripten = OpenCV.js
- https://github.com/CJKu/opencv
Demo pages
OpenCV.js
This demo should run on any existing Firefox browser on Ubuntu and Mac. Please try the website below.
http://people.mozilla.org/~cku/opencv/
MST with Worker and ImageBitmap
You need a customized build for the demos below. Please clone this repo: https://github.com/kakukogou/gecko-dev/tree/foxeye-master
You also need a web camera for the demos. Build your own Firefox browser and go to the website below to see the demo.
Source code of the demo:
https://github.com/kakukogou/foxeye-demo
Demo website:
http://people.mozilla.org/~tkuo/foxeye-demo/
Monitor
Processor
Unlimited Potentials
Use Cases
- Digital Image Processing (DIP) for camera:
- Face In, see Sony Face In
- Augmented Reality, see IKEA AR
- Camera Panorama
- Fisheye camera
- Comic Effect
- Long term, might need Android Camera HAL 3 to control camera
- Smile Snapshot
- Gesture Snapshot
- HDR
- Video Stabilization
- Bar code scanner
- Photo and video editing
- Video Editor, see WeVideo on Android
- A faster way for video editing tools.
- Lots of existing image effects can be used for photo and video editing.
- https://www.facebook.com/thanks
- Object Recognition in Image (not only FX OS, but also browser):
- Shopping Assistant, see Amazon Firefly
- Face Detection/Tracking
- Face Recognition
- Text Recognition
- Text Selection in Image
- Text Inpainting
- Image Segmentation
- Text translation on image, see Waygo
- Duo Camera:
- Natural Interaction (Gesture, Body Motion Tracking)
- Interactive Foreground Extraction
and so on....
Some cool real-world applications we can refer to
- Word Lens
- Waygo
- PhotoMath
- Cartoon Camera
- Photo Studio
- Magisto
- Adobe PhotoShop Express
- Amazon (Firefly app)
Current Status
- Meta bug: https://bugzilla.mozilla.org/show_bug.cgi?id=1100203
- MediaStream with worker: editor draft completed, refining the prototype for review process. See https://bugzilla.mozilla.org/show_bug.cgi?id=1044102
- ImageBitmap: In review process. See https://bugzilla.mozilla.org/show_bug.cgi?id=801176
- ImageBitmap extension: editor draft completed, refining the prototype for review process. See https://bugzilla.mozilla.org/show_bug.cgi?id=1141979
- OpenCV.js: Working on it. See http://people.mozilla.org/~cku/opencv/
- OfflineMediaContext: Not yet started.
- WebImage: Not yet started.
- Run WebGL on worker: in review process. See https://bugzilla.mozilla.org/show_bug.cgi?id=709490
- CanvasRenderingContext2D in Worker: waiting for someone to take this bug. See https://bugzilla.mozilla.org/show_bug.cgi?id=801176
Next Phase(2015 H2)
- Try to land and standardize MediaStream with worker, ImageBitmap and ImageBitmap extension.
- Design an automated way to export the JavaScript API for OpenCV.js. Try to upstream it to the OpenCV code base.
- Start to work on OfflineMediaContext.
- Support product requirements, for example, an Instagram-like app, wide angle panorama or Fox photo.
- Do some exploratory experiments on the WebImage concept.
Beyond 2015
- Proof of Concept for WebImage.
- A crazy idea: Kernel.js or SPIR.js for JS developers to customize WebImage?
- Gestural control API with depth camera? => WebNI (Web Natural Interaction)?
Conclusion
Step by step, we can make a difference. FoxEye is a pioneering project that tries to push the boundary of the Web into a new area: image processing and computer vision. The key factor in this project's success is you, every Web contributor. Your support, promotion, feedback and comments are the nutrition of this project. Please share this project with every Web developer. Let's bring more and more amazing things to the Web. Let's enrich the Web platform and improve its competitiveness.
References
- [1]: "WebAudio Spec", http://www.w3.org/TR/webaudio/
- [2]: "Media Capture Stream with Video Worker", http://chiahungtai.github.io/mediacapture-worker/
- [3]: "ImageBitmap Extensions", http://kakukogou.github.io/spec-imagebitmap-extension/
Acknowledgements
The whole idea of adopting WebAudio as the reference design for this project came from a conversation with John Lin. Thanks to Robert O'Callahan for his great feedback and comments. Thanks to John Lin and Chia-jung Hung for their useful suggestions and ideas. Also, big thanks to my team members who helped me debug the code.
About Authors
CTai
My name is Chia-hung Tai. I am a senior software engineer in the Mozilla Taipei office. I work on Firefox OS multimedia. Before this job, I gained some experience in OpenCL, NLP (Natural Language Processing), Data Mining and Machine Learning. My IRC nickname is ctai. You can find me in the #media, #mozilla-taiwan, and #b2g channels. You can also reach me via email (ctai at mozilla dot com).
Kaku
Tzuhuo Kuo is an engineer in the Mozilla Taipei office.
CJ Ku
CJ Ku is responsible for the OpenCV.js part.