=Abstract=<br />
The goal of this project is to bring the power of computer vision and image processing to the Web. By extending the Media Capture and Streams spec, web developers can write video processing applications in a better way. The primary idea is to incorporate Worker-based JavaScript video processing with MediaStreamTrack. The user's video processing script can do the real image processing and analysis work frame by frame. We also propose an extension of ImageBitmap to allow better optimization possibilities on both the implementation side and the Web developer side. To support the video editor case, we would like to introduce OfflineMediaContext in the next phase. We also want to explore the concept of WebImage, a Web API that can be hardware accelerated, as a chance for performance improvement. By accomplishing these APIs step by step, we believe we can greatly improve the Web platform's competitiveness in the image processing and computer vision area.<br />
<br />
=Introduction=<br />
For a quick understanding of what project FoxEye is, please see the files below:<br><br />
'''The latest one:'''<br />
*Orlando FoxEye session:[https://docs.google.com/presentation/d/14tI15Bvphew764XpOToAeUc5cgXT_kRTbD061ZCL5cY/edit?usp=sharing Orlando FoxEye]<br />
<br />
*FoxEye Briefing: [https://docs.google.com/presentation/d/1Ra5bIeMfSEQi5kd_bGH5Vx-pa9ahpAUTFDxTeBA5L0o/edit?usp=sharing Briefing]<br />
*Presentation files in Whistler Work Week:<br />
**Project FoxEye Status Update: [https://docs.google.com/presentation/d/1vnR5JOWpEgKN3KZGV2SLxscqSnuBOF2tT_dTolLRYI4/edit?usp=sharing FoxEye]<br />
**FoxEye Cross Firefox OS:[https://docs.google.com/presentation/d/1ck32VoikWkkFXkFfNK91S22nGRp9fSEj6Oz6846May4/edit?usp=sharing Use case]<br />
*'''Latest demo in Youtube: [https://www.youtube.com/watch?v=prybkXsTGXY FoxEye 2015 H1 demo]'''<br />
<br />
'''Outdated'''<br />
*Presentation file in Portland Work Week.[[File:Project FoxEye Portland Work Week.pdf]]<br><br />
*Presentation file in P2PWeb WorkShop.[[File:Project FoxEye 2015-Feb.pdf]]<br><br />
*Youtube: https://www.youtube.com/watch?v=TgQWEWiGaO8<br><br />
<br />
The need for image processing and computer vision has been increasing in recent years. The introduction of the video element and media streams in HTML5 is very important, providing basic video playback and WebCam abilities. But it is not powerful enough to handle complex video processing and camera applications. In particular, tons of mobile camera, photo and video editor applications on Android show their creativity by using OpenCV and similar libraries. It is a goal of this project to include the capabilities found in modern video production and camera applications, as well as some of the processing, recognition and stitching tasks. <br />
<br><br />
This API is inspired by the WebAudio API[1]. Unlike the WebAudio API, we try to reach the goal by modifying the existing Media Capture and Streams API. The idea is to add some functions to associate a Worker-based script with a MediaStreamTrack. The script code in the Worker then runs image processing and/or analysis frame by frame. Since we move most of the processing work to the Worker, the main thread will not be blocked.<br />
<br />
[[File:FoxEye - Overview.png|800px]]<br />
<br><br />
Basically, the spirit of this project comes in four parts. The first part extends MediaStreamTrack to associate it with a Worker, which provides a way to do image processing jobs frame by frame. The second part is the ImageBitmap extension, which extends the ImageBitmap interface to allow JavaScript developers to read the underlying data out of, and set external data into, an ImageBitmap in a set of supported color formats. The third part is OfflineMediaContext, which lets an offline stream render as fast as possible. The last part is WebImage, a computer vision API that can be hardware accelerated, which Web developers can use to compose high-performance vision processing.<br />
<br><br />
Thanks to the amazing asm.js and Emscripten work, we also provide an asm.js version of OpenCV called OpenCV.js. Web developers can leverage the power of OpenCV in a simpler way. <br />
<br><br />
<br />
=Design Principle=<br />
*Follow [https://extensiblewebmanifesto.org/ The Extensible Web Manifesto]<br />
The original design of this project was a WebAudio-like design called WebVideo, but it was deprecated because we want to follow The Extensible Web Manifesto. We believe this kind of design is best for the Web. Consider the following examples against the WebVideo design.<br><br />
The computer vision area is more thriving than ever since deep learning was introduced. We can't enumerate video nodes for all kinds of future creativeness. For example, suppose we enumerate everything we can imagine at the current time; there will still be things we can't imagine. Maybe we would miss an area called image sentiment analysis, which tries to classify the emotion of the human face in an image. I take this as an example because I only learned about it recently. So should we add this as a new video node? Take another extreme example: we probably would not argue about face detection or face recognition, but what we talk about now is only the human face. What if someone thinks we need to detect and recognize animal faces? It sounds like a ridiculous idea, right? But it is a real business: a product called "Bistro: A Smart Feeder Recognizes Your Cat's Face" was on Indiegogo. So should we extend the original face detection/recognition node, add a new video node for cats, or ask developers to implement it in a VideoWorker? If we ask them to implement it in a VideoWorker, then we might need to reconsider whether the existing video nodes make sense. Should we move them all into the VideoWorker and implement them in JS? If yes, WebVideo degenerates to only a source node, a video worker node and a destination node. Compared to this degenerated WebVideo, the MediaStream-with-worker design is more elegant. <br />
<br />
*Performance and power consumption do matter<br />
This project is initialized by the FxOS team which means the performance and power consumption for mobile devices are considered from day one.<br />
<br />
=Concept=<br />
==MediaStreamTrack with Worker==<br />
The new design is a simple and minimal change to the current API. By extending MediaStreamTrack and adding a Worker-related API, we can let MediaStream support video processing functionality through script code running in a Worker. Below is the draft WebIDL code. Please see [2] for the draft specification.<br />
<source lang="webidl"><br />
[Constructor(DOMString scriptURL)]<br />
interface VideoMonitor : EventTarget {<br />
attribute EventHandler onvideomonitor;<br />
};<br />
<br />
interface VideoProcessor : EventTarget {<br />
attribute EventHandler onvideoprocess;<br />
};<br />
partial interface MediaStreamTrack {<br />
void addVideoMonitor(VideoMonitor monitor);<br />
void removeVideoMonitor(VideoMonitor monitor);<br />
MediaStreamTrack addVideoProcessor(VideoProcessor processor);<br />
void removeVideoProcessor();<br />
};<br />
<br />
[Exposed=(Window, Worker),<br />
Constructor(DOMString type, optional VideoMonitorEventInit videoMonitorEventInitDict)]<br />
interface VideoMonitorEvent : Event {<br />
readonly attribute DOMString trackId;<br />
readonly attribute double playbackTime;<br />
readonly attribute ImageBitmap? inputImageBitmap;<br />
};<br />
<br />
[Exposed=(Window, Worker),<br />
Constructor(DOMString type, optional VideoProcessorEventInit videoProcessorEventInitDict)]<br />
interface VideoProcessEvent : VideoMonitorEvent {<br />
attribute Promise<ImageBitmap> outputImageBitmap;<br />
};<br />
<br />
</source><br />
<br><br />
Main thread:<br><br />
[[File:NewProjectFoxEye1.png|1024px]]<br><br />
<br />
Worker Thread<br><br />
[[File:Worker - FLOW.png|1024px]]<br><br />
===Example Code ===<br />
Please check the section [http://chiahungtai.github.io/mediacapture-worker/index.html#Examples examples in MediaStreamTrack with worker].<br />
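Below is a minimal sketch of the intended usage, distilled from the draft IDL above. It assumes that VideoProcessor is constructed with a worker script URL (as the constructor annotation suggests for VideoMonitor) and that the Worker receives the per-frame event on its global scope; the script name and the pass-through logic are illustrative only.<br />
<source lang="javascript"><br />
// Main thread: route a camera track through a processing worker.<br />
// 'processor-worker.js' is a hypothetical worker script.<br />
navigator.getUserMedia({video: true, audio: false}, function(stream) {<br />
  var track = stream.getVideoTracks()[0];<br />
  var processor = new VideoProcessor('processor-worker.js');<br />
  // addVideoProcessor() returns a new, processed MediaStreamTrack.<br />
  var processedTrack = track.addVideoProcessor(processor);<br />
  var elem = document.getElementById('videoelem');<br />
  elem.mozSrcObject = new MediaStream([processedTrack]);<br />
  elem.play();<br />
}, null);<br />
<br />
// processor-worker.js: one VideoProcessEvent is dispatched per frame.<br />
onvideoprocess = function(event) {<br />
  var input = event.inputImageBitmap;<br />
  // Fulfill the output promise with the processed frame;<br />
  // here we simply pass the frame through unchanged.<br />
  event.outputImageBitmap = Promise.resolve(input);<br />
};<br />
</source><br />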
<br />
==ImageBitmap extensions==<br />
Please see [2] for more information.<br />
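As a rough illustration, here is a sketch of reading pixels out of an ImageBitmap, assuming the draft's findOptimalFormat()/mappedDataLength()/mapDataInto() methods; the exact names and semantics are still in flux.<br />
<source lang="javascript"><br />
// Sketch only: read the raw pixel data of an ImageBitmap (draft API).<br />
function readPixels(bitmap) {<br />
  // Ask the implementation which supported color format is cheapest to map.<br />
  var format = bitmap.findOptimalFormat();<br />
  var length = bitmap.mappedDataLength(format);<br />
  var buffer = new ArrayBuffer(length);<br />
  // mapDataInto() copies the underlying data into our buffer and resolves<br />
  // with a layout descriptor (stride, channel order, and so on).<br />
  return bitmap.mapDataInto(format, buffer, 0).then(function(layout) {<br />
    return {format: format, layout: layout, data: new Uint8ClampedArray(buffer)};<br />
  });<br />
}<br />
</source><br />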
<br />
==WebImage==<br />
Why do we need WebImage? Because performance matters in bringing computer vision to the Web. Most image processing tasks can be accelerated by WebGL, but that is not the case for the computer vision area. So we need a computer vision Web API that can be hardware accelerated, to help Web developers deliver fast and portable Web pages and applications. That is the motivation of WebImage.<br />
<br><br />
This is a new area that needs to be explored, so we might start by referring to existing related APIs. At the end of 2014, Khronos released a portable, power-efficient vision processing API called OpenVX. This specification might be a good starting point for WebImage.<br />
<br><br />
The following diagram shows a brief concept of OpenVX. It is a graph-node architecture: the role of OpenVX for computer vision is like the role of OpenGL for graphics. The developer just constructs and executes a graph to process incoming images.<br />
<br><br />
[[File:OpenVX-NodeGFX.PNG|600px]]<br><br />
<!--[[File:Project FoxEyeWebImage1.png|800px]]<br>--><br />
OpenVX can be implemented with OpenCL, OpenGL ES with compute shaders, or C/C++ with SIMD. That means we can support OpenVX on a wide range of Web platforms, from PC to mobile, once an OpenVX engine is provided.<br />
<br />
[[File:OpenVX.PNG|600px]]<br><br />
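To make the graph-node idea concrete, below is a purely hypothetical sketch of what a WebImage-style API could look like; none of these names exist in any spec yet.<br />
<source lang="javascript"><br />
// Purely hypothetical WebImage sketch, mirroring OpenVX's graph-node model.<br />
// All names (WebImageContext, createGaussian3x3Node, ...) are illustrative.<br />
var ctx = new WebImageContext();<br />
var graph = ctx.createGraph();<br />
var src = graph.createSourceNode(inputImageBitmap);<br />
var blur = graph.createGaussian3x3Node();<br />
var edges = graph.createCannyEdgeNode(50, 150); // low/high thresholds<br />
src.connect(blur);<br />
blur.connect(edges);<br />
// Verify the graph once, then execute it per frame; the engine is free to<br />
// run it on the GPU (OpenCL, compute shaders) or with SIMD on the CPU.<br />
graph.verify();<br />
graph.process().then(function(outputImageBitmap) { /* use the result */ });<br />
</source><br />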
<br />
==OfflineMediaContext==<br />
We introduce “OfflineMediaContext” and modify “MediaStream” here to enable offline (faster-than-realtime) MediaStream processing. When developers want to perform offline MediaStream processing, they need to form a context which will hold and keep the relationship of all MediaStreams that are going to be processed together.<br />
<br />
<source lang="c++"><br />
<br />
// typedef unsigned long long DOMTimeStamp;<br />
interface OfflineMediaContext {<br />
void start(DOMTimeStamp durationToStop);<br />
attribute EventHandler onComplete;<br />
};<br />
// Add an optional argument into the constructor.<br />
[Constructor (optional OfflineMediaContext context),<br />
Constructor (MediaStream stream, optional OfflineMediaContext context),<br />
Constructor (MediaStreamTrackSequence tracks, optional OfflineMediaContext context)]<br />
interface MediaStream : EventTarget {<br />
// No modification.<br />
...<br />
}<br />
<br />
</source><br />
*OfflineMediaContext is the holding place of all MediaStreams which are going to be processed together at a non-realtime rate.<br />
*OfflineMediaContext is also the object that triggers the non-realtime processing.<br />
*OfflineMediaContext should be instantiated first; the MediaStreams which are going to be processed together in the same context can then be instantiated by passing the pre-instantiated OfflineMediaContext object into the MediaStream constructor. (See the modified MediaStream constructor above.)<br />
*The native implementation of OfflineMediaContext holds a non-realtime MediaStreamGraph, just like the OfflineAudioContext in the WebAudio specification.<br />
*The constructors are modified by adding an optional parameter, OfflineMediaContext. This way, developers are able to associate a MediaStream with an OfflineMediaContext.<br />
*If the optional OfflineMediaContext is given, the native implementation of the MediaStream should be hooked to the non-realtime MSG held by the given OfflineMediaContext instead of the global realtime MSG.<br />
*If the optional OfflineMediaContext is given, we need to check whether the newly created MediaStream can be processed at an offline rate. If not, the constructor should throw an error. (Constructors are always allowed to throw.)<br />
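Below is a minimal sketch of the intended flow under these rules; the input track sequence and the 30-second duration are illustrative assumptions.<br />
<source lang="javascript"><br />
// Sketch only: process a MediaStream faster than real time (draft API).<br />
var offlineCtx = new OfflineMediaContext();<br />
// Hook this stream (e.g. decoded from a video file) to the offline context<br />
// via the extended constructor instead of the global realtime graph.<br />
var stream = new MediaStream(fileTrackSequence, offlineCtx); // assumed input<br />
// ... attach VideoProcessor workers to the stream's tracks here ...<br />
offlineCtx.onComplete = function() {<br />
  // All hooked streams have been rendered as fast as possible.<br />
};<br />
offlineCtx.start(30000); // process 30 seconds of media, then stop<br />
</source><br />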
<br />
==OpenCV.js==<br />
*OpenCV + Emscripten = OpenCV.js<br />
*https://github.com/CJKu/opencv <br />
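Emscripten-compiled libraries are typically driven through Module.cwrap and the Emscripten heap; the sketch below shows that general pattern. The exported function name detect_faces is purely hypothetical, not a real symbol of this build, and pixels/width/height are assumed inputs.<br />
<source lang="javascript"><br />
// Illustrative only: calling into an Emscripten build of OpenCV from JS.<br />
// 'detect_faces' is a hypothetical exported C function.<br />
var detectFaces = Module.cwrap('detect_faces', 'number',<br />
                               ['number', 'number', 'number']);<br />
// Copy RGBA pixels into the Emscripten heap, call into asm.js, clean up.<br />
var ptr = Module._malloc(pixels.length);<br />
Module.HEAPU8.set(pixels, ptr);<br />
var faceCount = detectFaces(ptr, width, height);<br />
Module._free(ptr);<br />
</source><br />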
<!--<br />
==Deprecated Design ==<br />
*Modular Routing<br />
Modular routing allows arbitrary connections between different DIPNode(TBD) objects. Each node can have inputs and/or outputs. A source node has no inputs and a single output. A destination node has one input and no outputs. Other nodes such as filters can be placed between source and destination nodes.<br />
<br />
Here is a example for face detection work on ImageElement:<br><br />
PS. Right now, I haven't finished the work of ScriptNode. So the draw rectangle part is skipped in sample code.<br />
<br>[[File:Project FoxEye1.png|720px]]<br><br />
<big>Example 1:</big><br />
<source lang="javascript"><br />
var context = new DIPContext();<br />
var imageElem = document.getElementById('imgelemsrc');<br />
var source = context.createImageElementSource(imageElem);<br />
var facedetect = context.createFaceDetection();<br />
source.connect(facedetect);<br />
var dest = context.createMediaStreamDestination();<br />
facedetect.connect(dest);<br />
var elem = document.getElementById('videoelem');<br />
elem.mozSrcObject = dest.stream;<br />
elem.play();<br />
<br />
</source><br />
<br />
<br />
Another example to show that some nodes might support callback function to pass more information rather than image.<br />
<br>[[File:Project FoxEye2.png|720px]]<br><br />
<big>Example 2:</big><br />
<source lang="javascript"><br />
var context = new DIPContext();<br />
var imageElem = document.getElementById('imgelemsrc');<br />
var source = context.createImageElementSource(imageElem);<br />
var textdetect = context.createTextDetection();<br />
source.connect(textdetect);<br />
var dest = context.createMediaStreamDestination();<br />
textdetect.connect(dest);<br />
textdetect.ontextrecognized = function (e){<br />
var text = e.recognizedText;<br />
var go2google = document.getElementById('go2Google');<br />
go2google.href = "https://www.google.com.tw/search?q=" + text<br />
var go2IMDB = document.getElementById('go2IMDB');<br />
go2IMDB.href = "http://www.imdb.com/find?ref_=nv_sr_fn&q=" + text<br />
var go2Amazon = document.getElementById('go2Amazon');<br />
go2Amazon.href = "http://www.amazon.com/s/ref=nb_sb_noss?url=search-alias%3Daps&field-keywords=" + text<br />
var go2TranslateEngTC = document.getElementById('go2TranslateEngTC');<br />
go2TranslateEngTC.href = "https://translate.google.com.tw/#en/zh-TW/" + text<br />
}<br />
var elem = document.getElementById('videoelem');<br />
elem.mozSrcObject = dest.stream;<br />
elem.play();<br />
</source><br />
An ideally example to combine ScriptNode with Canvas2DContext.<br><br />
This is an example trying to do on fly camera translation like "Word Lens" and "Waygo".<br><br />
Haven't finish the implementation for this example.<br><br />
<br>[[File:Project FoxEye3.png|720px]]<br><br />
<big>Example 3:</big><br />
<source lang="javascript"><br />
var context = new DIPContext();<br />
var myMediaStream;<br />
navigator.getUserMedia({video:true, audio:false}, function(localMediaStream){<br />
myMediaStream = localMediaStream;<br />
var source = context.createMediaStreamSource(myMediaStream);<br />
var textRecognition = context.createTextRecognition();<br />
source.connect(textRecognition);<br />
var textInpaint = context.createTextInpaint();<br />
textRecognition.connect(textInpaint);<br />
var scriptTranslate = context.createScriptProcessor();<br />
textRecognition.ontextrecognized = function (e){<br />
var text = e.recognizedText;<br />
// Custom parameter -<br />
scriptTranslate.addParameter( "text", text );<br />
}<br />
scriptTranslate.onimageprocess= function (e) {<br />
var text = e.parameters.text;<br />
// Translate to other language....<br />
var newText = Translate(text, "Eng", "TC");<br />
var input = e.inputImage;<br />
var canvas = document.getElementsByTagName('canvas')[0];<br />
var context2D = canvas.getContext('2d');<br />
context2D.drawImage(input, 0, 0);<br />
context2D.strokeText(newText, 0, 0);<br />
// get an empty slate to put the data into<br />
var output = context.createImageData(canvas.width, canvas.height);<br />
e.outputImage = output;<br />
}<br />
textInpaint.connect(scriptTranslate);<br />
var dest = context.createMediaStreamDestination();<br />
scriptTranslate.connect(dest);<br />
var video = document.getElementById('videoelem');<br />
video.mozSrcObject = dest.stream;<br />
}, null);<br />
<br />
</source><br />
--><br />
=Demo pages=<br />
==OpenCV.js==<br />
This demo should run on any existing Firefox browser on Ubuntu and Mac. Please try the website below.<br><br />
http://people.mozilla.org/~cku/opencv/<br />
<br><br />
[[File:OpenCVJS-1.png|1080px]]<br />
<br><br />
[[File:OpenCVJS-2.png|1080px]]<br />
<br><br />
<br />
==MST with Worker and ImageBitmap==<br />
You need a customized build for the demos below. Please clone this repo: https://github.com/kakukogou/gecko-dev/tree/foxeye-master<br><br />
You also need a web camera for the demos. Then build your own Firefox browser and go to the website below to see the demos.<br />
<br><br />
Source code of the demo:<br><br />
https://github.com/kakukogou/foxeye-demo<br />
<br><br />
Demo website:<br><br />
http://people.mozilla.org/~tkuo/foxeye-demo/<br />
===Monitor===<br />
The Monitor is designed to just send events to the Web Worker, without modifying the stream. The left video is from getUserMedia. The right one uses addWorkerMonitor to dispatch each input frame from the left one to a worker. The worker detects the face and passes the face position and the input frame back to the main thread. The script in the main thread then uses both pieces of information to draw the input frame via CanvasRenderingContext2D. <br />
*Case 1: Face detection<br />
This demo shows how to use |addWorkerMonitor| and CanvasRenderingContext2D to overlay a hat on someone's head.<br />
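A sketch of that flow is shown below, written against the draft IDL's onvideomonitor event. It assumes the monitor object exposes a Worker-like message channel, and detectFace(), drawHat() and ctx2d are assumed helpers from the surrounding page, not part of any spec.<br />
<source lang="javascript"><br />
// face-worker.js: inspect each frame, report results to the main thread.<br />
onvideomonitor = function(event) {<br />
  var frame = event.inputImageBitmap;<br />
  var face = detectFace(frame); // assumed helper returning a rectangle<br />
  postMessage({frame: frame, face: face}, [frame]); // transfer the bitmap<br />
};<br />
<br />
// Main thread: draw the frame, then overlay a hat at the face position.<br />
monitor.onmessage = function(msg) {<br />
  ctx2d.drawImage(msg.data.frame, 0, 0);<br />
  drawHat(ctx2d, msg.data.face); // assumed helper<br />
};<br />
</source><br />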
<br><br />
[[File:Monitor face.png|1080px]]<br />
<br><br />
*Case 2: QRCode<br />
[[File:Monitor qrcode.png|1080px]]<br />
<br><br />
<br />
===Processor===<br />
This demo shows how to use |addWorkerProcessor| to create a new MediaStreamTrack and show it in another HTMLVideoElement. There are five kinds of image filters.<br />
<br> <br />
<br><br />
[[File:Processor copy.png|1080px]]<br />
<br><br />
[[File:Processor blur.png|1080px]]<br />
<br><br />
[[File:Processor erode.png|1080px]]<br />
<br><br />
[[File:Processor threshold.png|1080px]]<br />
<br><br />
[[File:Processor invert.png|1080px]]<br />
<br><br />
[[File:Processor gray.png|1080px]]<br />
<br><br />
<br />
<br />
<br />
<!--<br />
==Demo 1: Face tracker==<br />
===Browser:===<br />
*Input comes from HTML Image Element<br />
[[File:ProjectFoxEye BrowserFaceImage.png|720px]]<br />
*Input comes from MediaStream<br />
[[File:ProjectFoxEye BrowserFaceMS.png|720px]]<br />
<br />
===B2G on Flame:===<br />
*Input comes from HTML Image Element<br><br />
[[File:Project FoxEyeFlameFaceImage.jpg|480px]]<br><br />
*Input comes from MediaStream<br><br />
[[File:Project FoxEyeFlameFaceMS.jpg|480px]]<br><br />
<br />
==Demo 2: Text Recognition==<br />
*Browser:<br />
**Input comes from HTML Image Element<br />
[[File:ProjectFoxEye TextImage.png|720px]]<br />
**Input comes from MediaStream<br />
[[File:ProjectFoxEye TextMS.png|720px]]<br />
--><br />
<br />
=Unlimited Potentials=<br />
<!--<br />
According to "Firefox OS User Research Northern India Findings" [4], one of the key table-stake is camera related features. "Ways to provide photo & video editing tools" is what this WebAPI for. So if we can deliver some cool photo & video editing features, we can fulfill one of the needs of our target market.<br><br />
In [4], it mentioned that one of purchase motivators is educate my kids. The features like PhotoMath can satisfy the education part.<br><br />
In long term, if we can integrate text recognition with TTS(text to speech), we can help illiterate person to read words or phrase. That will be very useful features.<br><br />
Also offline text translation in camera might be a killer application too. Waygo and WordLens is two of such applications in Android and iOS.<br><br />
Text Selection in Image is also an interesting feature for browsers. Project Naptha demos some potential functionality based on text selection in images.<br> <br />
--><br />
==FoxEye technology tree==<br />
This is the technology tree of FoxEye. Solid lines indicate dependencies; dashed lines are for performance improvements. The green blocks are possible applications. The blue blocks are API drafts with prototypes. The red blocks are things we know how to do but have not yet started. The yellow blocks are only concepts and need more study. So, step by step, we can provide a photo manager, wide-angle panorama, HDR, and even gesture control.<br />
[[File:Multimedia Platform Team Technology Roadmap - New Page.png |1024px]]<br />
<br />
<br />
==Use Cases==<br />
*Digital Image Processing(DIP) for camera:<br />
**Face In, see [https://www.youtube.com/watch?feature=player_embedded&v=PWZUCfDsFdU Sony Face In]<br />
**Augmented Reality, see [https://www.youtube.com/watch?feature=player_embedded&v=vDNzTasuYEw IKEA AR]<br />
**Camera Panorama, <br />
**Fisheye camera, <br />
**Comic Effect,<br />
**Long term, might need Android Camera HAL 3 to control camera<br />
***Smile Snapshot<br />
***Gesture Snapshot<br />
***HDR<br />
***Video Stabilization<br />
**Bar code scanner<br />
*Photo and video editing<br />
**Video Editor, see [https://www.youtube.com/watch?feature=player_embedded&v=NJ6nYgxcuUk WeVideo on Android]<br />
**A faster way for video editing tools.<br />
**Lots of existing image effects can be used for photo and video editing.<br />
**https://www.facebook.com/thanks<br />
*Object Recognition in Image (not only Firefox OS, but also browsers):<br />
**Shopping Assistant, see [https://www.youtube.com/watch?feature=player_embedded&v=B7cvlWll85Q Amazon Firefly]<br />
**Face Detection/Tracking,<br />
**Face Recognition, <br />
**Text Recognition, <br />
**Text Selection in Image, <br />
***See http://projectnaptha.com/<br />
**Text Inpainting,<br />
**Image Segmentation,<br />
**Text translation on image, see [https://www.youtube.com/watch?feature=player_embedded&v=9UalhhWBPH0 Waygo]<br />
*Duo Camera:<br />
**Natural Interaction (Gesture, Body Motion Tracking)<br />
**Interactive Foreground Extraction<br />
and so on....<br />
<br />
==Some cool real-world applications we can refer to==<br />
*Word Lens: <br />
**https://play.google.com/store/apps/details?id=com.questvisual.wordlens.demo<br />
**https://itunes.apple.com/tw/app/word-lens/id383463868?mt=8<br />
*Waygo<br />
**http://www.waygoapp.com/<br />
*PhotoMath<br />
**https://photomath.net/<br />
*Cartoon Camera<br />
**https://play.google.com/store/apps/details?id=com.fingersoft.cartooncamera<br />
*Photo Studio<br />
**http://photo-studio.en.uptodown.com/android<br />
*Magisto<br />
**https://play.google.com/store/apps/details?id=com.magisto<br />
*Adobe PhotoShop Express<br />
**http://www.photoshop.com/products/photoshopexpress<br />
*Amazon(firefly app)<br />
**https://play.google.com/store/apps/details?id=com.amazon.mShop.android<br />
<br />
<!--<br />
=Task Dependency=<br />
Below is the dependency graph for text recognition work.<br />
<br><br />
[[File:Project FoxEyeTextRecgDependency.png]]<br />
<br><br />
Below is the dependency graph for camera effects work.<br />
<br><br />
[[File:Project FoxEyeCameraDependency.png]]<br />
<br><br />
Below is the dependency graph for video editor work.<br />
<br><br />
[[File:Project FoxEyeVideoEditorDependency.png]]<br />
<br><br />
--><br />
<br />
<!--<br />
=Comparison=<br />
==Canvas2DContext==<br />
Currently, you can do video effect by Canvas2DContext. See the demo made by [4]. The source code looks like below.<br />
<source><br />
function frameConverter(video,canvas) {<br />
<br />
// Set up our frame converter<br />
this.video = video;<br />
this.viewport = canvas.getContext("2d");<br />
this.width = canvas.width;<br />
this.height = canvas.height;<br />
// Create the frame-buffer canvas<br />
this.framebuffer = document.createElement("canvas");<br />
this.framebuffer.width = this.width;<br />
this.framebuffer.height = this.height;<br />
this.ctx = this.framebuffer.getContext("2d");<br />
// Default video effect is blur<br />
this.effect = JSManipulate.blur;<br />
// This variable used to pass ourself to event call-backs<br />
var self = this;<br />
// Start rendering when the video is playing<br />
this.video.addEventListener("play", function() {<br />
self.render();<br />
}, false);<br />
<br />
// Change the image effect to be applied <br />
this.setEffect = function(effect){<br />
if(effect in JSManipulate){<br />
this.effect = JSManipulate[effect];<br />
}<br />
}<br />
<br />
// Rendering call-back<br />
this.render = function() {<br />
if (this.video.paused || this.video.ended) {<br />
return;<br />
}<br />
this.renderFrame();<br />
var self = this;<br />
// Render every 10 ms<br />
setTimeout(function () {<br />
self.render();<br />
}, 10);<br />
};<br />
<br />
// Compute and display the next frame <br />
this.renderFrame = function() {<br />
// Acquire a video frame from the video element<br />
this.ctx.drawImage(this.video, 0, 0, this.video.videoWidth,<br />
this.video.videoHeight,0,0,this.width, this.height);<br />
var data = this.ctx.getImageData(0, 0, this.width, this.height);<br />
// Apply image effect<br />
this.effect.filter(data,this.effect.defaultValues);<br />
// Render to viewport<br />
this.viewport.putImageData(data, 0, 0);<br />
return;<br />
};<br />
};<br />
<br />
// Initialization code<br />
video = document.getElementById("video");<br />
canvas = document.getElementById("canvas");<br />
fc = new frameConverter(video,canvas);<br />
...<br />
// Change the image effect applied to the video<br />
fc.setEffect('edge detection');<br />
<br />
</source><br />
Basically, the idea is use |drawImage| to acquire frame from video and draw it to canvas. Then call |getImageData| to get the data and process the image. After that, put the computed data back to the canvas and display it.<br><br />
<br />
Compare to this approach, the proposed WebAPI has below advantages:<br />
* Not polling mechanism.<br />
** We use callback function to process all frames.<br />
<br />
==node-opencv==<br />
https://github.com/peterbraden/node-opencv<br />
"OpenCV bindings for Node.js. OpenCV is the defacto computer vision library - by interfacing with it natively in node, we get powerful real time vision in js."<br />
The sample codes looks like below:<br />
*You can use opencv to read in image files. Supported formats are in the OpenCV docs, but jpgs etc are supported.<br />
<source lang="javascript"><br />
cv.readImage(filename, function(err, mat){<br />
mat.convertGrayscale()<br />
mat.canny(5, 300)<br />
mat.houghLinesP()<br />
})<br />
</source><br />
*If however, you have a series of images, and you wish to stream them into a stream of Matrices, you can use an ImageStream. Thus:<br />
<source lang="javascript"><br />
var s = new cv.ImageStream()<br />
s.on('data', function(matrix){<br />
matrix.detectObject(haar_cascade_xml, opts, function(err, matches){})<br />
})<br />
ardrone.createPngStream().pipe(s);<br />
</source><br />
<br />
==opencvjs==<br />
https://github.com/blittle/opencvjs<br><br />
It is a project to compile opencv to asm.js. Might be a dead project now.<br />
==Project Naptha==<br />
"Project Naptha automatically applies state-of-the-art computer vision algorithms on every image you see while browsing the web. The result is a seamless and intuitive experience, where you can highlight as well as copy and paste and even edit and translate the text formerly trapped within an image." Quoted from http://projectnaptha.com/ .<br />
===How it works?===<br />
Excerpt from Project Naptha:<br />
The primary feature of Project Naptha is actually the text detection, rather than optical character recognition. The author write a text detection algorithm called Stroke Width Transform, invented by Microsoft Research in 2008, which is capable of identifying regions of text in a language-agnostic manner in WebWorker. Once a user begins to select some text, it scrambles to run character recognition algorithms in order to determine what exactly is being selected. The default OCR engine is a built-in pure-javascript port of the open source Ocrad OCR engine. There’s the option of sending the selected region over to a cloud based text recognition service powered by Tesseract, Google’s (formerly HP’s) award-winning open-source OCR engine which supports dozens of languages, and uses an advanced language model.<br />
--><br />
<!--<br />
=Open Source Library and Licenses =<br />
*OpenCV: OpenCV is an open-source BSD-licensed library that includes several hundreds of computer vision algorithms.<br />
*Tesseract-OCR: Apache License v2.0. Tesseract is probably the most accurate open source OCR engine available.<br />
**Leptonica: Leptonica is a pedagogically-oriented open source site containing software that is broadly useful for image processing and image analysis applications.<br />
<br />
<br>"The Leptonica copyright is less restrictive still. We use the BSD 2-clause license, which is similar to the Apache license but much briefer. Kirk McKusick playfully called the BSD license a copycenter, as differentiated from the usual copyright and the GPL copyleft: "Take it down to the copy center and make as many copies as you want." The BSD restrictions can be approximately summarized as: (1) Don't pretend that you wrote this, and (2) Don't sue us if it doesn't work.<br> For Leptonica, as with Apache and BSD, modifications of the source code that are used in commercial products can be made without any obligation to make them available in open source. (This does happen voluntarily -- it's likely that the majority of issues noted by people working on commercial products are reported back, often with patches.) The Leptonica license only requires that any use of the source, whether in original or modified form, must include the Leptonica copyright notice, and that modified versions must be clearly marked as such. "<br />
<br />
<br />
=What I have done=<br />
*Initialize, plan and implement this project.<br />
*Write a prototype of WebIDL for WebDIP.<br />
**MediaStream as source node and destentaion node for WebDIP.<br />
**For HTMLImageElement part as source node, there is a temporal solution for it.<br />
**Have face detection node. Can be used in MediaStream and HTMLImageElement on both browser and B2G flame.<br />
**Have text detection/recognization node. Can be used in MediaStream and HTMLImageElement on browser.<br />
--><br />
<br />
<!--<br />
=To Do List=<br />
*Extend MediaStreamTrack API? =>CTai<br />
**Need to integrate with Canvas2DContext and WebGL.<br />
*OfflineMediaContext study. =>Kaku(our new hire!!)<br />
*An API for image processing and object detection. =>TBD<br />
**We need such API for B2G privilege applications(or opencv-asm.js for general APPs).<br />
*How to compile OpenCV to asm.js =>Kaku, CJay<br />
**Try to figure out how to pass the ImageBitmap from VideoWroker to OpenCV-asm.js<br />
*Compare native OpenCV/Tesseract with asm.js version. =>TBD<br />
--><br />
<br />
<!--<br />
=Fixme List(Known Issues)=<br />
*OpenCV can't build with STLPort, only support GNUSTL.<br />
**B2G can't build with GNUSTL.<br />
*Text Detection and Recognition can't run on B2G.<br />
**Some OpenCV API use STL as arguments. The unalignment STL will cause runtime error.<br />
*Tesseract-OCR Build<br />
**Use pre-installed Tesseract-OCR now. Maybe we should support source code build of Tesseract-OCR.<br />
*Improve precision rate of text recognition.<br />
**The actual precision rate should be higher than my roughly prototype. Need improve it.<br />
*Separate OCR initialized.<br />
**Prevent redundant initialization.<br />
*Haven't done OpenCL integration in Gecko.<br />
**OpenCV has a lot of OpenCL integration. We should take advantage from it.<br />
*Canvas2DContext, WebGL can't run on worker.<br />
**Need bug 801176 and bug 709490 landed.<br> <br />
*Need ImageBitmap for VideoWorkerEvent.<br />
**Need bug 1044102 landed.<br><br />
--><br />
<br />
=Current Status=<br />
*Meta bug: https://bugzilla.mozilla.org/show_bug.cgi?id=1100203<br />
*MediaStream with worker: editor draft completed, refining the prototype for review process. See https://bugzilla.mozilla.org/show_bug.cgi?id=1044102<br />
*ImageBitmap: In review process. See https://bugzilla.mozilla.org/show_bug.cgi?id=801176<br />
*ImageBitmap extension: editor draft completed, refining the prototype for review process. See https://bugzilla.mozilla.org/show_bug.cgi?id=1141979<br><br />
*OpenCV.js: Working on it. See http://people.mozilla.org/~cku/opencv/<br />
*OfflineMediaContext: Not yet started.<br />
*WebImage: Not yet started.<br />
*Run WebGL on worker: in review process. See https://bugzilla.mozilla.org/show_bug.cgi?id=709490<br><br />
*CanvasRenderingContext2D in Worker: waiting for someone to take this bug. See https://bugzilla.mozilla.org/show_bug.cgi?id=801176<br><br />
=Next Phase(2015 H2)=<br />
*Try to land and standardize MediaStream with worker, ImageBitmap and the ImageBitmap extension. See [3] for how the standardization process works in Mozilla.<br />
*Design an automated way to export JavaScript APIs for OpenCV.js. Try to upstream it to the OpenCV code base.<br />
*Start to work on OfflineMediaContext.<br />
*Support product requirements, for example an Instagram-like app, wide-angle panorama or Fox photo.<br />
*Do some exploratory experiments on the WebImage concept.<br />
*Initialize a sub-project called [https://wiki.mozilla.org/Project_GlovePuppetry Project GlovePuppetry]<br />
<br />
=Beyond 2015=<br />
*Proof of Concept for WebImage.<br />
*A crazy idea of Kernel.js or SPIR.js for JS developers to customize WebImage?<br />
*Gestural control API with depth camera? => WebNI(Web Nature Interaction)?<br />
*[https://wiki.mozilla.org/Project_Cangjie Project Cangjie]<br />
<br />
=Conclusion=<br />
Step by step, we can make things different. FoxEye is a pioneering project trying to push the Web boundary into a new area: image processing and computer vision. The key factor in ensuring this project's success is you, every Web contributor. Your support, your promotion, your feedback and your comments are the nutrition of this project. Please share this project with every Web developer. Let's bring more and more amazing things to the Web. Let's enrich and improve the whole Web platform's competitiveness.<br />
<br />
=References=<br />
*[1]:"WebAudio Spec", http://www.w3.org/TR/webaudio/<br />
*[2]:"Media Capture Stream with Worker", https://w3c.github.io/mediacapture-worker/<br />
*[3]:"Mozilla Standards", https://wiki.mozilla.org/Standards<br />
<br />
=Acknowledgements=<br />
The whole idea of adopting WebAudio as the reference design for this project came from a conversation with John Lin. Thanks to Robert O'Callahan for his great feedback and comments. Thanks to John Lin and Chia-jung Hung for their useful suggestions and ideas. Also, big thanks to my team members who helped me debug the code.<br />
<br />
=About Authors=<br />
==CTai==<br />
My name is [https://tw.linkedin.com/pub/chia-hung-tai/36/8b6/712 Chia-hung Tai]. I am a senior software engineer in the Mozilla Taipei office. I work on Firefox OS multimedia stuff. Before this job, I had some experience in OpenCL, NLP (Natural Language Processing), Data Mining and Machine Learning. My IRC nickname is ctai. You can find me in the #media, #mozilla-taiwan, and #b2g channels. You can also reach me via email (ctai at mozilla dot com).<br />
<br />
==Kaku==<br />
[https://tw.linkedin.com/in/kakukogou Tzuhuo Kuo] is an engineer in the Mozilla Taipei office.<br />
==CJ Ku==<br />
[https://www.linkedin.com/pub/cj-ku/62/55b/a1b CJ Ku] is responsible for OpenCV.js part.</div>Ctaihttps://wiki.mozilla.org/index.php?title=User_talk:Dead_project&diff=1157009User talk:Dead project2016-12-14T06:53:40Z<p>Ctai: /* FoxEye project */ new section</p>
<hr />
<div>=Abstract=<br />
The goal of this project is bringing the power of computer vision and image processing to the Web. By extending the spec of Media Capture and Streams, the web developers can write video processing related applications in better way. The primary idea is to incorporate Worker-based JavaScript video processing with MediaStreamTrack. The user's video processing script can do the real image processing and analysis works frame by frame. We also propose an extension of ImageBitmap to allow better optimization possibility in both implementation and Web developer side. To support video editor case, we would like to introduce OfflineMediaContext in next phase. We also want to explore the concept of WebImage, a hardware accelerated-able Web API for performance improvement chance. By accomplishing these API step by step, we believe we can improve the Web platform competitiveness a lot on image processing and computer vision area.<br />
<br />
=Introduction=<br />
To get a quick understand what is project FoxEye. Please see below file:<br><br />
'''The latest one:'''<br />
*Orlando FoxEye session:[https://docs.google.com/presentation/d/14tI15Bvphew764XpOToAeUc5cgXT_kRTbD061ZCL5cY/edit?usp=sharing Orlando FoxEye]<br />
<br />
*FoxEye Briefing: [https://docs.google.com/presentation/d/1Ra5bIeMfSEQi5kd_bGH5Vx-pa9ahpAUTFDxTeBA5L0o/edit?usp=sharing Briefing]<br />
*Presentation files in Whistler Work Week:<br />
**Project FoxEye Status Update: [https://docs.google.com/presentation/d/1vnR5JOWpEgKN3KZGV2SLxscqSnuBOF2tT_dTolLRYI4/edit?usp=sharing FoxEye]<br />
**FoxEye Cross Firefox OS:[https://docs.google.com/presentation/d/1ck32VoikWkkFXkFfNK91S22nGRp9fSEj6Oz6846May4/edit?usp=sharing Use case]<br />
*'''Latest demo in Youtube: [https://www.youtube.com/watch?v=prybkXsTGXY FoxEye 2015 H1 demo]'''<br />
<br />
'''Outdated'''<br />
*Presentation file in Portland Work Week.[[File:Project FoxEye Portland Work Week.pdf]]<br><br />
*Presentation file in P2PWeb WorkShop.[[File:Project FoxEye 2015-Feb.pdf]]<br><br />
*Youtube: https://www.youtube.com/watch?v=TgQWEWiGaO8<br><br />
<br />
The needs for image processing and computer vision is increasing in recent years. The introduction of video element and media stream in HTML5 is very important, allowing for basic video playback and WebCam ability. But it is not powerful enough to handle complex video processing and camera application. Especially there are tons of mobile camera, photo and video editor applications show their creativity by using OpenCV etc in Android. It is a goal of this project to include the capabilities found in modern video production and camera applications as well as some of the processing, recognition and stitching tasks. <br />
<br><br />
This API is inspired by the WebAudio API[1]. Unlike WebAudio API, we try to reach the goal by modifying existing Media Capture and Streams API. The idea is adding some functions to associate the Woker-based script with MediaStreamTrack. Then the script code of Worker runs image processing and/or analysis frame by frame. Since we move the most of processing work to Worker, the main thread will not be blocked.<br />
<br />
[[File:FoxEye - Overview.png|800px]]<br />
<br><br />
Basically, the spirit of this project is four parts. The first part is extend the MediaStreamTrack to associate a Worker. This part provide a way to do image processing job frame by frame. The second part is ImageBitmap extension. This part extended the ImageBitmap interface to allow JavaScript developer to read underlying data out and set an external data into an ImageBitmap in a set of supported color formats. The third part is OfflineMediaContext. It is for offline stream to render as fast as possible. The last part is WebImage, it is a hardware accelerated-able API on computer vision area. The Web developers can use it to combine high performance vision processing.<br />
<br><br />
Thanks for the amazing asm.js and emscripten work, we also provide an asm.js version of OpenCV called OpenCV.js. The web developers can leverage the power of OpenCV in a simpler way. <br />
<br><br />
<br />
=Design Principle=<br />
*Follow [https://extensiblewebmanifesto.org/ The Extensible Web Manifesto]<br />
The original design of this project is WebAudio like design called WebVideo. But is is deprecated due to we want to follow The Extensible Web Manifesto. We believe this kind of design is best for the Web. Take below examples to against WebVideo design.<bt><br />
Computer vision area is more thriving than ever after deep learning introduced. We can't enumerate video nodes for all kinds of creativeness for the future. For example, suppose we enumerate all we can image in current time. But still something we can't image. Maybe we lost an area called image sentiment analysis. I take this as an example because I just known it recently. Image sentiment analysis is trying to classify the emotion from human face in image. So should we add this as a new video node? Take another extreme example, we think we would not have argument on face detection or face recognition. But what we talk now is only for human face. What if we someone think we need to detect and recognize animal face? Sounds like a ridiculous idea right? But it is a real business. A product called "Bistro: A Smart Feeder Recognizes Your Cat's Face"[2] was on indiegogo. So should we extend the original face detection/recognition node or add a new video node for cat or ask them to implement it in VideoWorker? If we ask them to implement in VideoWorker, then we might need to re-consider are those existing video node make sense? Should we move them all into VideoWorker and implement them in JS? If yse, the WebVideo will degenerate to only source node, video worker node and detestation node. Compare to degenerated WebVideo, the MediaStream with worker design is more elegant. <br />
<br />
*Performance and power consumption do matter<br />
This project is initialized by the FxOS team which means the performance and power consumption for mobile devices are considered from day one.<br />
<br />
=Concept=<br />
==MediaStreamTrack with Worker: ==<br />
The new design is a simple and minimal change for current API. By extending MediaStreamTrack and adding Worker related API, we can let MediaStream be able to support video processing functionality through the script code in Worker. Below is the draft WebIDL codes. Please see [2] for the draft specification.<br />
<source lang="webidl"><br />
[Constructor(DOMString scriptURL)]<br />
<br />
interface VidoeMonitor : EventTarget {<br />
attribute EventHandler onvideomonitor;<br />
};<br />
<br />
interface VideoProcessor : EventTarget {<br />
attribute EventHandler onvideoprocess;<br />
};<br />
partial interface MediaStreamTrack {<br />
void addVideoMonitor(VidoeMonitor monitor);<br />
void removeVideoMonitor(VidoeMonitor monitor);<br />
MediaStreamTrack addVideoProcessor(VidoeProcessor processor);<br />
void removeVideoProcessor();<br />
};<br />
<br />
[Exposed=(Window, Worker),<br />
Constructor(DOMString type, optional VideoMonitorEventInit videoMonitorEventInitDict)]<br />
interface VideoMonitorEvent : Event {<br />
readonly attribute DOMString trackId;<br />
readonly attribute double playbackTime;<br />
readonly attribute ImageBitmap? inputImageBitmap;<br />
};<br />
<br />
[Exposed=(Window, Worker),<br />
Constructor(DOMString type, optional VideoProcessorEventInit videoProcessorEventInitDict)]<br />
interface VideoProcessEvent : VideoMonitorEvent {<br />
attribute promise<ImageBitmap> outputImageBitmap;<br />
};<br />
<br />
</source><br />
<br><br />
Main thread:<br><br />
[[File:NewProjectFoxEye1.png|1024px]]<br><br />
<br />
Worker Thread<br><br />
[[File:Worker - FLOW.png|1024px]]<br><br />
===Example Code ===<br />
Please check the section [http://chiahungtai.github.io/mediacapture-worker/index.html#Examples examples in MediaStreamTrack with worker].<br />
<br />
==ImageBitmap extensions==<br />
Please see [2] for more information.<br />
<br />
==WebImage:==<br />
Why do we need WebImage? Because performance is matter to prompt computer vision to the Web. We need it for performance reason. Most image processing task can be accelerated by WebGL. But it is not the case for computer vision area. So we need a hardware accelerated-able computer vision related web API to help Web developer deliver a fast and portable Web page and application. That is the motivation of WebImage.<br />
<br><br />
This is a new zone needed to explore. So we might start from refer to existing related API. In end of 2014, KHRONOS released a portable, power efficient vision processing API, called OpenVX. This specification might be a good start point for WebImage.<br />
<br><br />
The following diagram is a brief concept of OpenVX. It is a graph-node architecture. The role of OpenVX for computer vision is like the role of OpenGL for graph. The developer just construct and execute the graph to process incoming image.<br />
<br><br />
[[File:OpenVX-NodeGFX.PNG|600px]]<br><br />
<!--[[File:Project FoxEyeWebImage1.png|800px]]<br>--><br />
OpenVX can be implemented by OpenCL, OpenGL|ES with compute shader and C/C++ with SIMD. That means we can support OpenVX in wide range Web platform from PC to mobile once a OpenVX engine is provided.<br />
<br />
[[File:OpenVX.PNG|600px]]<br><br />
<br />
==OfflineMediaContext:==<br />
We introduce the “OfflineMediaContext” and modify the “MediaStream” here to enable the offline (as soon as possible) MediaStream processing. When developers are going to perform the offline MediaStream processing, they need to form a context which will hold and keep the relationship of all MediaStreams that are going to be processed together.<br />
<br />
<source lang="c++"><br />
<br />
// typedef unsigned long long DOMTimeStamp;<br />
interface OfflineMediaContext {<br />
void start(DOMTimeStamp durationToStop);<br />
attribute EventHandler onComplete;<br />
};<br />
// Add an optional argument into the constructor.<br />
[Constructor (optional OfflineMediaContext context),<br />
Constructor (MediaStream stream, optional OfflineMediaContext context),<br />
Constructor (MediaStreamTrackSequence tracks, optional OfflineMediaContext context)]<br />
interface MediaStream : EventTarget {<br />
// No modification.<br />
...<br />
}<br />
<br />
</source><br />
*OfflineMediaContext is the holdplace of all MediaStreams which are going to be processed together in a non-realtime rate.<br />
*OfflineMediaContext is also the object who can trigger the non-realtime processing.<br />
*OfflineMediaContext should be instantiated first and then MediaStreams, which are going to be processed together in the same context, could be instantiated by passing the pre-instantiated OfflineMediaContext object into the constructor of MediaStreams. (See the modified MediaStream constructor below)<br />
*The native implementation of OfflineMediaContext holds a non-realtime MediaStreamGraph, which is just the same as the OfflineAudioContext in WebAudio specification.<br />
*The constructors are modified by adding an optional parameter, OfflineMediaContext. By this way, developers are able to associate a MediaStream to an OfflineMediaContext.<br />
*If the optional OfflineMediaContext is given, the native implementation of the MediaStream should be hooked to the non-realtime MSG hold by the given OfflineMediaContext in stead of the global realtime MSG.<br />
*If the optional OfflineMediaContext is given, we need to check whether the new created MediaStream is able to be processed in offline rate or not. If not, the constructor should throw an error and return a NULL. (Constructors are always allowed to throw.)<br />
<br />
==OpenCV.js==<br />
*OpenCV + Emscripten = OpenCV.js<br />
*https://github.com/CJKu/opencv <br />
<!--<br />
==Deprecated Design ==<br />
*Modular Routing<br />
Modular routing allows arbitrary connections between different DIPNode(TBD) objects. Each node can have inputs and/or outputs. A source node has no inputs and a single output. A destination node has one input and no outputs. Other nodes such as filters can be placed between source and destination nodes.<br />
<br />
Here is a example for face detection work on ImageElement:<br><br />
PS. Right now, I haven't finished the work of ScriptNode. So the draw rectangle part is skipped in sample code.<br />
<br>[[File:Project FoxEye1.png|720px]]<br><br />
<big>Example 1:</big><br />
<source lang="javascript"><br />
var context = new DIPContext();<br />
var imageElem = document.getElementById('imgelemsrc');<br />
var source = context.createImageElementSource(imageElem);<br />
var facedetect = context.createFaceDetection();<br />
source.connect(facedetect);<br />
var dest = context.createMediaStreamDestination();<br />
facedetect.connect(dest);<br />
var elem = document.getElementById('videoelem');<br />
elem.mozSrcObject = dest.stream;<br />
elem.play();<br />
<br />
</source><br />
<br />
<br />
Another example to show that some nodes might support callback function to pass more information rather than image.<br />
<br>[[File:Project FoxEye2.png|720px]]<br><br />
<big>Example 2:</big><br />
<source lang="javascript"><br />
var context = new DIPContext();<br />
var imageElem = document.getElementById('imgelemsrc');<br />
var source = context.createImageElementSource(imageElem);<br />
var textdetect = context.createTextDetection();<br />
source.connect(textdetect);<br />
var dest = context.createMediaStreamDestination();<br />
textdetect.connect(dest);<br />
textdetect.ontextrecognized = function (e){<br />
var text = e.recognizedText;<br />
var go2google = document.getElementById('go2Google');<br />
go2google.href = "https://www.google.com.tw/search?q=" + text<br />
var go2IMDB = document.getElementById('go2IMDB');<br />
go2IMDB.href = "http://www.imdb.com/find?ref_=nv_sr_fn&q=" + text<br />
var go2Amazon = document.getElementById('go2Amazon');<br />
go2Amazon.href = "http://www.amazon.com/s/ref=nb_sb_noss?url=search-alias%3Daps&field-keywords=" + text<br />
var go2TranslateEngTC = document.getElementById('go2TranslateEngTC');<br />
go2TranslateEngTC.href = "https://translate.google.com.tw/#en/zh-TW/" + text<br />
}<br />
var elem = document.getElementById('videoelem');<br />
elem.mozSrcObject = dest.stream;<br />
elem.play();<br />
</source><br />
An ideally example to combine ScriptNode with Canvas2DContext.<br><br />
This is an example trying to do on fly camera translation like "Word Lens" and "Waygo".<br><br />
Haven't finish the implementation for this example.<br><br />
<br>[[File:Project FoxEye3.png|720px]]<br><br />
<big>Example 3:</big><br />
<source lang="javascript"><br />
var context = new DIPContext();<br />
var myMediaStream;<br />
navigator.getUserMedia({video:true, audio:false}, function(localMediaStream){<br />
myMediaStream = localMediaStream;<br />
var source = context.createMediaStreamSource(myMediaStream);<br />
var textRecoginition = context.createTextRecoginition();<br />
source.connect(textRecoginition);<br />
var textInpaint = context.createTextInpaint();<br />
textRecoginition.connect(textInpaint);<br />
var scriptTranslate = context.createScriptProcessor();<br />
textdetect.ontextrecognized = function (e){<br />
var text = e.recognizedText;<br />
// Custom parameter -<br />
scriptTranslate.addParameter( "text", text );<br />
}<br />
scriptTranslate.onimageprocess= function (e) {<br />
var text = e.parameters.text;<br />
// Translate to other language....<br />
var newText = Translate("Eng", "TC");<br />
var input = e.inputImage;<br />
var canvas = document.getElementsByTagName('canvas')[0];<br />
var context2D = canvas.getContext('2d');<br />
context2D.drawImage(input, 0, 0);<br />
context2D.strokeText(newText, 0, 0);<br />
// get an empty slate to put the data into<br />
var output = context.createImageData(canvas.width, canvas.height);<br />
e.outputImage = output;<br />
}<br />
textInpaint.connect(scriptTranslate);<br />
var dest = createMediaStreamDestination();<br />
scriptTranslate.connect(dest);<br />
var video = document.getElementById(‘videoelem’);<br />
video.mozSrcObject = dest.stream;<br />
}, null);<br />
<br />
</source><br />
<br />
=Demo pages=<br />
==OpenCV.js==<br />
This demo should be able to run on every existing Firefox browser on Ubuntu and Mac. Please try below website.<br><br />
http://people.mozilla.org/~cku/opencv/<br />
<br><br />
[[File:OpenCVJS-1.png|1080px]]<br />
<br><br />
[[File:OpenCVJS-2.png|1080px]]<br />
<br><br />
<br />
==MST with Worker and ImageBitmap==<br />
You need a customized build for below demos. Please clone this repo: https://github.com/kakukogou/gecko-dev/tree/foxeye-master<br><br />
You also need a web camera for the demos. Then build your own Firefox browser and go to below website to see the demo.<br />
<br><br />
Source code of the demo:<br><br />
https://github.com/kakukogou/foxeye-demo<br />
<br><br />
Demo website:<br><br />
http://people.mozilla.org/~tkuo/foxeye-demo/<br />
===Monitor===<br />
Monitor is design for just send the event to the Web Worker and no modification. The left one is from getUserMedia. The right one is using addWorkerMonitor to dispatch the input frame from the left one to a worker. The worker will detect the face and pass the face position and the input frame to main thread. Then the script in main thread use both information to draw the input frame via CanvasRenderingContext2D. <br />
*Case 1: Face detection<br />
This demo shows how to use |addWorkerMonitor| and CanvasRenderingContext2D to overlay a hat on someone's head.<br />
<br><br />
[[File:Monitor face.png|1080px]]<br />
<br><br />
*Case 2: QRCode<br />
[[File:Monitor qrcode.png|1080px]]<br />
<br><br />
<br />
===Processor===<br />
This is a demo to show how to use |addWorkerProcessor| to create a new MediaStreamTrack and show it in another HTMLVideoElement. There are 5 kind of image filter.<br />
<br> <br />
<br><br />
[[File:Processor copy.png|1080px]]<br />
<br><br />
[[File:Processor blur.png|1080px]]<br />
<br><br />
[[File:Processor erode.png|1080px]]<br />
<br><br />
[[File:Processor threshold.png|1080px]]<br />
<br><br />
[[File:Processor invert.png|1080px]]<br />
<br><br />
[[File:Processor gray.png|1080px]]<br />
<br><br />
<br />
<br />
<br />
<!--<br />
==Demo 1: Face tracker==<br />
===Browser:===<br />
*Input comes from HTML Image Element<br />
[[File:ProjectFoxEye BrowserFaceImage.png|720px]]<br />
*Input comes from MediaStream<br />
[[File:ProjectFoxEye BrowserFaceMS.png|720px]]<br />
<br />
===B2G on Flame:===<br />
*Input comes from HTML Image Element<br><br />
[[File:Project FoxEyeFlameFaceImage.jpg|480px]]<br><br />
*Input comes from MediaStream<br><br />
[[File:Project FoxEyeFlameFaceMS.jpg|480px]]<br><br />
<br />
==Demo 2: Text Recognition==<br />
*Browser:<br />
**Input comes from HTML Image Element<br />
[[File:ProjectFoxEye TextImage.png|720px]]<br />
**Input comes from MediaStream<br />
[[File:ProjectFoxEye TextMS.png|720px]]<br />
--><br />
<br />
=Unlimited Potentials=<br />
<!--<br />
According to "Firefox OS User Research Northern India Findings" [4], one of the key table-stake is camera related features. "Ways to provide photo & video editing tools" is what this WebAPI for. So if we can deliver some cool photo & video editing features, we can fulfill one of the needs of our target market.<br><br />
In [4], it mentioned that one of purchase motivators is educate my kids. The features like PhotoMath can satisfy the education part.<br><br />
In long term, if we can integrate text recognition with TTS(text to speech), we can help illiterate person to read words or phrase. That will be very useful features.<br><br />
Also offline text translation in camera might be a killer application too. Waygo and WordLens is two of such applications in Android and iOS.<br><br />
Text Selection in Image is also an interesting feature for browser. Project Naptha demos some potential functionality based on yext selection in Image.<br> <br />
--><br />
==FoxEye technology tree==<br />
This is the technology tree of FoxEye. The solid line is the dependency. The dash line is for performance improvement. The green blocks are possible applications. The blue blocks are API draft with prototype. The red block is we know how to do but not yet starting. The yellow blocks are only the concept. Need more study. So step by step, we can provide photo manager, wide angle panorama, HDR, and even gesture control.<br />
[[File:Multimedia Platform Team Technology Roadmap - New Page.png |1024px]]<br />
<br />
<br />
==Use Cases==<br />
*Digital Image Processing (DIP) for camera:<br />
**Face In, see [https://www.youtube.com/watch?feature=player_embedded&v=PWZUCfDsFdU Sony Face In]<br />
**Augmented Reality, see [https://www.youtube.com/watch?feature=player_embedded&v=vDNzTasuYEw IKEA AR]<br />
**Camera Panorama, <br />
**Fisheye camera, <br />
**Comic Effect,<br />
**Long term, might need Android Camera HAL 3 to control camera<br />
***Smile Snapshot<br />
***Gesture Snapshot<br />
***HDR<br />
***Video Stabilization<br />
**Bar code scanner<br />
*Photo and video editing<br />
**Video Editor, see [https://www.youtube.com/watch?feature=player_embedded&v=NJ6nYgxcuUk WeVideo on Android]<br />
**A faster way for video editing tools.<br />
**Lots of existing image effects can be used for photo and video editing.<br />
**https://www.facebook.com/thanks<br />
*Object Recognition in Images (not only Firefox OS, but also the browser):<br />
**Shopping Assistant, see [https://www.youtube.com/watch?feature=player_embedded&v=B7cvlWll85Q Amazon Firefly]<br />
**Face Detection/Tracking,<br />
**Face Recognition, <br />
**Text Recognition, <br />
**Text Selection in Image, <br />
***See http://projectnaptha.com/<br />
**Text Inpainting,<br />
**Image Segmentation,<br />
**Text translation on image, see [https://www.youtube.com/watch?feature=player_embedded&v=9UalhhWBPH0 Waygo]<br />
*Duo Camera:<br />
**Natural Interaction (Gesture, Body Motion Tracking)<br />
**Interactive Foreground Extraction<br />
and so on....<br />
<br />
==Some cool real-world applications we can refer to==<br />
*Word Lens: <br />
**https://play.google.com/store/apps/details?id=com.questvisual.wordlens.demo<br />
**https://itunes.apple.com/tw/app/word-lens/id383463868?mt=8<br />
*Waygo<br />
**http://www.waygoapp.com/<br />
*PhotoMath<br />
**https://photomath.net/<br />
*Cartoon Camera<br />
**https://play.google.com/store/apps/details?id=com.fingersoft.cartooncamera<br />
*Photo Studio<br />
**http://photo-studio.en.uptodown.com/android<br />
*Magisto<br />
**https://play.google.com/store/apps/details?id=com.magisto<br />
*Adobe PhotoShop Express<br />
**http://www.photoshop.com/products/photoshopexpress<br />
*Amazon(firefly app)<br />
**https://play.google.com/store/apps/details?id=com.amazon.mShop.android<br />
<br />
<!--<br />
=Task Dependency=<br />
Below is the dependency graph for text recognition work.<br />
<br><br />
[[File:Project FoxEyeTextRecgDependency.png]]<br />
<br><br />
Below is the dependency graph for camera effects work.<br />
<br><br />
[[File:Project FoxEyeCameraDependency.png]]<br />
<br><br />
Below is the dependency graph for video editor work.<br />
<br><br />
[[File:Project FoxEyeVideoEditorDependency.png]]<br />
<br><br />
--><br />
<br />
<!--<br />
=Comparison=<br />
==Canvas2DContext==<br />
Currently, you can do video effects with Canvas2DContext. See the demo made by [4]. The source code looks like the following.<br />
<source lang="javascript"><br />
function frameConverter(video,canvas) {<br />
<br />
// Set up our frame converter<br />
this.video = video;<br />
this.viewport = canvas.getContext("2d");<br />
this.width = canvas.width;<br />
this.height = canvas.height;<br />
// Create the frame-buffer canvas<br />
this.framebuffer = document.createElement("canvas");<br />
this.framebuffer.width = this.width;<br />
this.framebuffer.height = this.height;<br />
this.ctx = this.framebuffer.getContext("2d");<br />
// Default video effect is blur<br />
this.effect = JSManipulate.blur;<br />
// This variable used to pass ourself to event call-backs<br />
var self = this;<br />
// Start rendering when the video is playing<br />
this.video.addEventListener("play", function() {<br />
self.render();<br />
}, false);<br />
<br />
// Change the image effect to be applied <br />
this.setEffect = function(effect){<br />
if(effect in JSManipulate){<br />
this.effect = JSManipulate[effect];<br />
}<br />
}<br />
<br />
// Rendering call-back<br />
this.render = function() {<br />
if (this.video.paused || this.video.ended) {<br />
return;<br />
}<br />
this.renderFrame();<br />
var self = this;<br />
// Render every 10 ms<br />
setTimeout(function () {<br />
self.render();<br />
}, 10);<br />
};<br />
<br />
// Compute and display the next frame <br />
this.renderFrame = function() {<br />
// Acquire a video frame from the video element<br />
this.ctx.drawImage(this.video, 0, 0, this.video.videoWidth,<br />
this.video.videoHeight,0,0,this.width, this.height);<br />
var data = this.ctx.getImageData(0, 0, this.width, this.height);<br />
// Apply image effect<br />
this.effect.filter(data,this.effect.defaultValues);<br />
// Render to viewport<br />
this.viewport.putImageData(data, 0, 0);<br />
return;<br />
};<br />
};<br />
<br />
// Initialization code<br />
video = document.getElementById("video");<br />
canvas = document.getElementById("canvas");<br />
fc = new frameConverter(video,canvas);<br />
...<br />
// Change the image effect applied to the video<br />
fc.setEffect('edge detection');<br />
<br />
</source><br />
Basically, the idea is to use |drawImage| to acquire a frame from the video and draw it to the canvas, then call |getImageData| to get the data and process the image. After that, put the computed data back into the canvas and display it.<br><br />
<br />
Compared to this approach, the proposed WebAPI has the following advantages:<br />
* No polling mechanism.<br />
** We use a callback function to process every frame.<br />
<br />
==node-opencv==<br />
https://github.com/peterbraden/node-opencv<br />
"OpenCV bindings for Node.js. OpenCV is the defacto computer vision library - by interfacing with it natively in node, we get powerful real time vision in js."<br />
The sample code looks like below:<br />
*You can use opencv to read in image files. Supported formats are in the OpenCV docs, but jpgs etc are supported.<br />
<source lang="javascript"><br />
cv.readImage(filename, function(err, mat){<br />
mat.convertGrayscale()<br />
mat.canny(5, 300)<br />
mat.houghLinesP()<br />
})<br />
</source><br />
*If however, you have a series of images, and you wish to stream them into a stream of Matrices, you can use an ImageStream. Thus:<br />
<source lang="javascript"><br />
var s = new cv.ImageStream()<br />
s.on('data', function(matrix){<br />
matrix.detectObject(haar_cascade_xml, opts, function(err, matches){})<br />
})<br />
ardrone.createPngStream().pipe(s);<br />
</source><br />
<br />
==opencvjs==<br />
https://github.com/blittle/opencvjs<br><br />
It is a project to compile OpenCV to asm.js. It might be a dead project now.<br />
==Project Naptha==<br />
"Project Naptha automatically applies state-of-the-art computer vision algorithms on every image you see while browsing the web. The result is a seamless and intuitive experience, where you can highlight as well as copy and paste and even edit and translate the text formerly trapped within an image." Quoted from http://projectnaptha.com/ .<br />
===How it works?===<br />
Excerpt from Project Naptha:<br />
The primary feature of Project Naptha is actually text detection rather than optical character recognition. The author wrote a text detection algorithm based on the Stroke Width Transform, invented by Microsoft Research in 2008, which is capable of identifying regions of text in a language-agnostic manner, and runs it in a WebWorker. Once a user begins to select some text, it scrambles to run character recognition algorithms in order to determine what exactly is being selected. The default OCR engine is a built-in pure-JavaScript port of the open source Ocrad OCR engine. There is the option of sending the selected region over to a cloud-based text recognition service powered by Tesseract, Google's (formerly HP's) award-winning open-source OCR engine, which supports dozens of languages and uses an advanced language model.<br />
--><br />
<!--<br />
=Open Source Library and Licenses =<br />
*OpenCV: OpenCV is an open-source BSD-licensed library that includes several hundreds of computer vision algorithms.<br />
*Tesseract-OCR: Apache License v2.0. Tesseract is probably the most accurate open source OCR engine available.<br />
**Leptonica: Leptonica is a pedagogically-oriented open source site containing software that is broadly useful for image processing and image analysis applications.<br />
<br />
<br>"The Leptonica copyright is less restrictive still. We use the BSD 2-clause license, which is similar to the Apache license but much briefer. Kirk McKusick playfully called the BSD license a copycenter, as differentiated from the usual copyright and the GPL copyleft: "Take it down to the copy center and make as many copies as you want." The BSD restrictions can be approximately summarized as: (1) Don't pretend that you wrote this, and (2) Don't sue us if it doesn't work.<br> For Leptonica, as with Apache and BSD, modifications of the source code that are used in commercial products can be made without any obligation to make them available in open source. (This does happen voluntarily -- it's likely that the majority of issues noted by people working on commercial products are reported back, often with patches.) The Leptonica license only requires that any use of the source, whether in original or modified form, must include the Leptonica copyright notice, and that modified versions must be clearly marked as such. "<br />
<br />
<br />
=What I have done=<br />
*Initialize, plan and implement this project.<br />
*Write a prototype of WebIDL for WebDIP.<br />
**MediaStream as source node and destination node for WebDIP.<br />
**For the HTMLImageElement part as a source node, there is a temporary solution.<br />
**Have a face detection node. It can be used with MediaStream and HTMLImageElement on both the browser and the B2G Flame.<br />
**Have a text detection/recognition node. It can be used with MediaStream and HTMLImageElement on the browser.<br />
--><br />
<br />
<!--<br />
=To Do List=<br />
*Extend MediaStreamTrack API? =>CTai<br />
**Need to integrate with Canvas2DContext and WebGL.<br />
*OfflineMediaContext study. =>Kaku(our new hire!!)<br />
*An API for image processing and object detection. =>TBD<br />
**We need such API for B2G privilege applications(or opencv-asm.js for general APPs).<br />
*How to compile OpenCV to asm.js =>Kaku, CJay<br />
**Try to figure out how to pass the ImageBitmap from the VideoWorker to OpenCV-asm.js<br />
*Compare native OpenCV/Tesseract with asm.js version. =>TBD<br />
--><br />
<br />
<!--<br />
=Fixme List(Known Issues)=<br />
*OpenCV can't be built with STLPort; it only supports GNUSTL.<br />
**B2G can't be built with GNUSTL.<br />
*Text Detection and Recognition can't run on B2G.<br />
**Some OpenCV APIs take STL objects as arguments. Mismatched STL implementations will cause runtime errors.<br />
*Tesseract-OCR build<br />
**We use a pre-installed Tesseract-OCR now. Maybe we should support building Tesseract-OCR from source.<br />
*Improve the precision rate of text recognition.<br />
**The actual precision rate should be higher than in my rough prototype. Need to improve it.<br />
*Separate OCR initialization.<br />
**Prevent redundant initialization.<br />
*Haven't done OpenCL integration in Gecko.<br />
**OpenCV has a lot of OpenCL integration. We should take advantage of it.<br />
*Canvas2DContext, WebGL can't run on worker.<br />
**Need bug 801176 and bug 709490 landed.<br> <br />
*Need ImageBitmap for VideoWorkerEvent.<br />
**Need bug 1044102 landed.<br><br />
--><br />
<br />
=Current Status=<br />
*Meta bug: https://bugzilla.mozilla.org/show_bug.cgi?id=1100203<br />
*MediaStream with worker: editor draft completed, refining the prototype for review process. See https://bugzilla.mozilla.org/show_bug.cgi?id=1044102<br />
*ImageBitmap: In review process. See https://bugzilla.mozilla.org/show_bug.cgi?id=801176<br />
*ImageBitmap extension: editor draft completed, refining the prototype for review process. See https://bugzilla.mozilla.org/show_bug.cgi?id=1141979<br><br />
*OpenCV.js: Working on it. See http://people.mozilla.org/~cku/opencv/<br />
*OfflineMediaContext: Not yet started.<br />
*WebImage: Not yet started.<br />
*Run WebGL on worker: in review process. See https://bugzilla.mozilla.org/show_bug.cgi?id=709490<br><br />
*CanvasRenderingContext2D in Worker: waiting for someone to take this bug. See https://bugzilla.mozilla.org/show_bug.cgi?id=801176<br><br />
=Next Phase(2015 H2)=<br />
*Try to land and standardize MediaStream with worker, ImageBitmap, and the ImageBitmap extension. See [3] for how standardization is processed in Mozilla.<br />
*Design an automated way to export the JavaScript API for OpenCV.js. Try to upstream it to the OpenCV code base.<br />
*Start to work on OfflineMediaContext.<br />
*Support product requirements, for example, an Instagram-like app, wide-angle panorama, or Fox photo.<br />
*Do some exploratory experiments on the WebImage concept.<br />
*Initialize a sub-project called [https://wiki.mozilla.org/Project_GlovePuppetry Project GlovePuppetry]<br />
<br />
=Beyond 2015=<br />
*Proof of Concept for WebImage.<br />
*A crazy idea: Kernel.js or SPIR.js for JS developers to customize WebImage?<br />
*Gesture control API with a depth camera? => WebNI (Web Natural Interaction)?<br />
*[https://wiki.mozilla.org/Project_Cangjie Project Cangjie]<br />
<br />
=Conclusion=<br />
Step by step, we can make things different. FoxEye is a pioneer project trying to push the Web boundary into a new area: image processing and computer vision. The key factor in ensuring this project's success is you, every Web contributor. Your support, your promotion, your feedback, and your comments are the nutrition of this project. Please share this project with every Web developer. Let's bring more and more amazing things to the Web. Let's enrich and improve the whole Web platform's competitiveness.<br />
<br />
=References=<br />
*[1]:"WebAudio Spec", http://www.w3.org/TR/webaudio/<br />
*[2]:"Media Capture Stream with Worker", https://w3c.github.io/mediacapture-worker/<br />
*[3]:"Mozilla Standards", https://wiki.mozilla.org/Standards<br />
<br />
=Acknowledgements=<br />
The idea of adopting WebAudio as the reference design for this project came from a conversation with John Lin. Thanks to Robert O'Callahan for his great feedback and comments. Thanks to John Lin and Chia-jung Hung for their useful suggestions and ideas. Also, big thanks to my team members who helped me debug the code.<br />
<br />
=About Authors=<br />
==CTai==<br />
My name is [https://tw.linkedin.com/pub/chia-hung-tai/36/8b6/712 Chia-hung Tai]. I am a senior software engineer in the Mozilla Taipei office. I work on Firefox OS multimedia. Before this job, I had some experience in OpenCL, NLP (Natural Language Processing), Data Mining, and Machine Learning. My IRC nickname is ctai. You can find me in the #media, #mozilla-taiwan, and #b2g channels. You can also reach me via email (ctai at mozilla dot com).<br />
<br />
==Kaku==<br />
[https://tw.linkedin.com/in/kakukogou Tzuhuo Kuo] is an engineer in the Mozilla Taipei office.<br />
==CJ Ku==<br />
[https://www.linkedin.com/pub/cj-ku/62/55b/a1b CJ Ku] is responsible for the OpenCV.js part.<br />
<br />
</div>Ctaihttps://wiki.mozilla.org/index.php?title=TPEMedia/2016-01-27&diff=1114420TPEMedia/2016-01-272016-01-27T02:57:20Z<p>Ctai: /* Chiahung Tai */</p>
<hr />
<div>==Summary==<br />
Status changed by this week<br />
<bugzilla><br />
{<br />
"status": ["NEW", "ASSIGNED", "UNCONFIRMED", "REOPENED"],<br />
"changed_after": "2016-01-20",<br />
"changed_before": "2016-01-27",<br />
"assigned_to": ["jolin@mozilla.com", "ayang@mozilla.com", "jwwang@mozilla.com", "bechen@mozilla.com", "ctai@mozilla.com", "alwu@mozilla.com", "bwu@mozilla.com", "tkuo@mozilla.com", "mchiang@mozilla.com", "hahuang@mozilla.com"],<br />
"include_fields": "id, summary, status, resolution, assigned_to"<br />
}<br />
</bugzilla><br />
Bugs which are fixed by this week<br />
<bugzilla><br />
{<br />
"changed_after": "2016-01-20",<br />
"changed_before": "2016-01-27",<br />
"assigned_to": ["jolin@mozilla.com", "ayang@mozilla.com", "jwwang@mozilla.com", "bechen@mozilla.com", "ctai@mozilla.com", "alwu@mozilla.com", "bwu@mozilla.com", "tkuo@mozilla.com", "mchiang@mozilla.com", "hahuang@mozilla.com"],<br />
"include_fields": "id, summary, status, resolution, assigned_to",<br />
"changed_field": "status",<br />
"changed_field_to": "resolved"<br />
}<br />
</bugzilla><br />
==Status==<br />
<br />
===Alfredo Yang===<br />
<bugzilla><br />
{<br />
"status": ["NEW", "ASSIGNED", "UNCONFIRMED", "REOPENED"],<br />
"changed_after": "2016-01-20",<br />
"changed_before": "2016-01-27",<br />
"assigned_to": ["ayang@mozilla.com"],<br />
"include_fields": "id, summary, status, resolution, assigned_to"<br />
}<br />
</bugzilla><br />
<bugzilla><br />
{<br />
"changed_after": "2016-01-20",<br />
"changed_before": "2016-01-27",<br />
"assigned_to": ["ayang@mozilla.com"],<br />
"include_fields": "id, summary, status, resolution, assigned_to",<br />
"changed_field": "status",<br />
"changed_field_to": "resolved"<br />
}<br />
</bugzilla><br />
===John Lin===<br />
<bugzilla><br />
{<br />
"status": ["NEW", "ASSIGNED", "UNCONFIRMED"],<br />
"changed_after": "2016-01-20",<br />
"changed_before": "2016-01-27",<br />
"assigned_to": ["jolin@mozilla.com"],<br />
"include_fields": "id, summary, status, resolution, assigned_to"<br />
}<br />
</bugzilla><br />
Bugs which are fixed by this week<br />
<bugzilla><br />
{<br />
"changed_after": "2016-01-20",<br />
"changed_before": "2016-01-27",<br />
"assigned_to": ["jolin@mozilla.com"],<br />
"include_fields": "id, summary, status, resolution, assigned_to",<br />
"changed_field": "status",<br />
"changed_field_to": "resolved"<br />
}<br />
</bugzilla><br />
===JW Wang===<br />
<bugzilla><br />
{<br />
"status": ["NEW", "ASSIGNED", "UNCONFIRMED", "REOPENED"],<br />
"changed_after": "2016-01-20",<br />
"changed_before": "2016-01-27",<br />
"assigned_to": ["jwwang@mozilla.com"],<br />
"include_fields": "id, summary, status, resolution, assigned_to"<br />
}<br />
</bugzilla><br />
Bugs which are fixed by this week<br />
<bugzilla><br />
{<br />
"changed_after": "2016-01-20",<br />
"changed_before": "2016-01-27",<br />
"assigned_to": ["jwwang@mozilla.com"],<br />
"include_fields": "id, summary, status, resolution, assigned_to",<br />
"changed_field": "status",<br />
"changed_field_to": "resolved"<br />
}<br />
</bugzilla><br />
<br />
===Benjamin Chen===<br />
<bugzilla><br />
{<br />
"status": ["NEW", "ASSIGNED", "UNCONFIRMED", "REOPENED"],<br />
"changed_after": "2016-01-20",<br />
"changed_before": "2016-01-27",<br />
"assigned_to": ["bechen@mozilla.com"],<br />
"include_fields": "id, summary, status, resolution, assigned_to"<br />
}<br />
</bugzilla><br />
Bugs which are fixed by this week<br />
<bugzilla><br />
{<br />
"changed_after": "2016-01-20",<br />
"changed_before": "2016-01-27",<br />
"assigned_to": ["bechen@mozilla.com"],<br />
"include_fields": "id, summary, status, resolution, assigned_to",<br />
"changed_field": "status",<br />
"changed_field_to": "resolved"<br />
}<br />
</bugzilla><br />
===Chiahung Tai===<br />
*{{Bug|1201363}} - Stop buffering video in the MediaStreamGraph<br />
**r?<br />
**In review process.<br />
<br />
<bugzilla><br />
{<br />
"status": ["NEW", "ASSIGNED", "UNCONFIRMED", "REOPENED"],<br />
"changed_after": "2016-01-20",<br />
"changed_before": "2016-01-27",<br />
"assigned_to": ["ctai@mozilla.com"],<br />
"include_fields": "id, summary, status, resolution, assigned_to"<br />
}<br />
</bugzilla><br />
Bugs which are fixed by this week<br />
<bugzilla><br />
{<br />
"changed_after": "2016-01-20",<br />
"changed_before": "2016-01-27",<br />
"assigned_to": ["ctai@mozilla.com"],<br />
"include_fields": "id, summary, status, resolution, assigned_to",<br />
"changed_field": "status",<br />
"changed_field_to": "resolved"<br />
}<br />
</bugzilla><br />
<br />
===Alastor Wu===<br />
<bugzilla><br />
{<br />
"status": ["NEW", "ASSIGNED", "UNCONFIRMED", "REOPENED"],<br />
"changed_after": "2016-01-20",<br />
"changed_before": "2016-01-27",<br />
"assigned_to": ["alwu@mozilla.com"],<br />
"include_fields": "id, summary, status, resolution, assigned_to"<br />
}<br />
</bugzilla><br />
Bugs which are fixed by this week<br />
<bugzilla><br />
{<br />
"changed_after": "2016-01-20",<br />
"changed_before": "2016-01-27",<br />
"assigned_to": ["alwu@mozilla.com"],<br />
"include_fields": "id, summary, status, resolution, assigned_to",<br />
"changed_field": "status",<br />
"changed_field_to": "resolved"<br />
}<br />
</bugzilla><br />
<br />
*{{Bug|1235749}} - Add "tags = audiochannel" for all audio-channel related mochitests<br />
** land<br />
<br />
*{{Bug|1238906}} - Implement audible data checking in the MDSM's audio queue<br />
** land<br />
<br />
*{{Bug|1238472}} - [emulator-x86-kk][mochitest] failures on test_browserElement_inproc_NoAudioTrack.html<br />
** land<br />
<br />
*{{Bug|1240429}} - Implement silent-data checking in MediaElement when the input is the media stream<br />
** r? (in discussion)<br />
<br />
*{{Bug|1242874}} - Enable pause/play control ability for the AudioChannel API <br />
** r?<br />
<br />
===Blake Wu===<br />
*'''DRM discussion for DRM integration on FxOS'''<br />
<br />
<bugzilla><br />
{<br />
"status": ["NEW", "ASSIGNED", "UNCONFIRMED", "REOPENED"],<br />
"changed_after": "2016-01-20",<br />
"changed_before": "2016-01-27",<br />
"assigned_to": ["bwu@mozilla.com"],<br />
"include_fields": "id, summary, status, resolution, assigned_to"<br />
}<br />
</bugzilla><br />
Bugs fixed this week<br />
<bugzilla><br />
{<br />
"changed_after": "2016-01-20",<br />
"changed_before": "2016-01-27",<br />
"assigned_to": ["bwu@mozilla.com"],<br />
"include_fields": "id, summary, status, resolution, assigned_to",<br />
"changed_field": "status",<br />
"changed_field_to": "resolved"<br />
}<br />
</bugzilla><br />
<br />
===Kaku Kuo===<br />
<bugzilla><br />
{<br />
"status": ["NEW", "ASSIGNED", "UNCONFIRMED", "REOPENED"],<br />
"changed_after": "2016-01-20",<br />
"changed_before": "2016-01-27",<br />
"assigned_to": ["tkuo@mozilla.com"],<br />
"include_fields": "id, summary, status, resolution, assigned_to"<br />
}<br />
</bugzilla><br />
Bugs fixed this week<br />
<bugzilla><br />
{<br />
"changed_after": "2016-01-20",<br />
"changed_before": "2016-01-27",<br />
"assigned_to": ["tkuo@mozilla.com"],<br />
"include_fields": "id, summary, status, resolution, assigned_to",<br />
"changed_field": "status",<br />
"changed_field_to": "resolved"<br />
}<br />
</bugzilla><br />
===Munro Chiang===<br />
<bugzilla><br />
{<br />
"status": ["NEW", "ASSIGNED", "UNCONFIRMED", "REOPENED"],<br />
"changed_after": "2016-01-20",<br />
"changed_before": "2016-01-27",<br />
"assigned_to": ["mchiang@mozilla.com"],<br />
"include_fields": "id, summary, status, resolution, assigned_to"<br />
}<br />
</bugzilla><br />
Bugs fixed this week<br />
<bugzilla><br />
{<br />
"changed_after": "2016-01-20",<br />
"changed_before": "2016-01-27",<br />
"assigned_to": ["mchiang@mozilla.com"],<br />
"include_fields": "id, summary, status, resolution, assigned_to",<br />
"changed_field": "status",<br />
"changed_field_to": "resolved"<br />
}<br />
</bugzilla></div>Ctaihttps://wiki.mozilla.org/index.php?title=TPEMedia/2015-12-30&diff=1110885TPEMedia/2015-12-302015-12-30T03:10:08Z<p>Ctai: /* Chiahung Tai */</p>
<hr />
<div>==Summary==<br />
Bugs whose status changed this week<br />
<bugzilla><br />
{<br />
"status": ["NEW", "ASSIGNED", "UNCONFIRMED", "REOPENED"],<br />
"changed_after": "2015-12-23",<br />
"changed_before": "2015-12-30",<br />
"assigned_to": ["jolin@mozilla.com", "ayang@mozilla.com", "jwwang@mozilla.com", "bechen@mozilla.com", "ctai@mozilla.com", "alwu@mozilla.com", "bwu@mozilla.com", "tkuo@mozilla.com", "mchiang@mozilla.com", "hahuang@mozilla.com"],<br />
"include_fields": "id, summary, status, resolution, assigned_to"<br />
}<br />
</bugzilla><br />
Bugs fixed this week<br />
<bugzilla><br />
{<br />
"changed_after": "2015-12-23",<br />
"changed_before": "2015-12-30",<br />
"assigned_to": ["jolin@mozilla.com", "ayang@mozilla.com", "jwwang@mozilla.com", "bechen@mozilla.com", "ctai@mozilla.com", "alwu@mozilla.com", "bwu@mozilla.com", "tkuo@mozilla.com", "mchiang@mozilla.com", "hahuang@mozilla.com"],<br />
"include_fields": "id, summary, status, resolution, assigned_to",<br />
"changed_field": "status",<br />
"changed_field_to": "resolved"<br />
}<br />
</bugzilla><br />
==Status==<br />
<br />
===Alfredo Yang===<br />
<bugzilla><br />
{<br />
"status": ["NEW", "ASSIGNED", "UNCONFIRMED", "REOPENED"],<br />
"changed_after": "2015-12-23",<br />
"changed_before": "2015-12-30",<br />
"assigned_to": ["ayang@mozilla.com"],<br />
"include_fields": "id, summary, status, resolution, assigned_to"<br />
}<br />
</bugzilla><br />
<bugzilla><br />
{<br />
"changed_after": "2015-12-23",<br />
"changed_before": "2015-12-30",<br />
"assigned_to": ["ayang@mozilla.com"],<br />
"include_fields": "id, summary, status, resolution, assigned_to",<br />
"changed_field": "status",<br />
"changed_field_to": "resolved"<br />
}<br />
</bugzilla><br />
===John Lin===<br />
<bugzilla><br />
{<br />
"status": ["NEW", "ASSIGNED", "UNCONFIRMED"],<br />
"changed_after": "2015-12-23",<br />
"changed_before": "2015-12-30",<br />
"assigned_to": ["jolin@mozilla.com"],<br />
"include_fields": "id, summary, status, resolution, assigned_to"<br />
}<br />
</bugzilla><br />
Bugs fixed this week<br />
<bugzilla><br />
{<br />
"changed_after": "2015-12-23",<br />
"changed_before": "2015-12-30",<br />
"assigned_to": ["jolin@mozilla.com"],<br />
"include_fields": "id, summary, status, resolution, assigned_to",<br />
"changed_field": "status",<br />
"changed_field_to": "resolved"<br />
}<br />
</bugzilla><br />
===JW Wang===<br />
<bugzilla><br />
{<br />
"status": ["NEW", "ASSIGNED", "UNCONFIRMED", "REOPENED"],<br />
"changed_after": "2015-12-23",<br />
"changed_before": "2015-12-30",<br />
"assigned_to": ["jwwang@mozilla.com"],<br />
"include_fields": "id, summary, status, resolution, assigned_to"<br />
}<br />
</bugzilla><br />
Bugs fixed this week<br />
<bugzilla><br />
{<br />
"changed_after": "2015-12-23",<br />
"changed_before": "2015-12-30",<br />
"assigned_to": ["jwwang@mozilla.com"],<br />
"include_fields": "id, summary, status, resolution, assigned_to",<br />
"changed_field": "status",<br />
"changed_field_to": "resolved"<br />
}<br />
</bugzilla><br />
<br />
===Benjamin Chen===<br />
<bugzilla><br />
{<br />
"status": ["NEW", "ASSIGNED", "UNCONFIRMED", "REOPENED"],<br />
"changed_after": "2015-12-23",<br />
"changed_before": "2015-12-30",<br />
"assigned_to": ["bechen@mozilla.com"],<br />
"include_fields": "id, summary, status, resolution, assigned_to"<br />
}<br />
</bugzilla><br />
Bugs fixed this week<br />
<bugzilla><br />
{<br />
"changed_after": "2015-12-23",<br />
"changed_before": "2015-12-30",<br />
"assigned_to": ["bechen@mozilla.com"],<br />
"include_fields": "id, summary, status, resolution, assigned_to",<br />
"changed_field": "status",<br />
"changed_field_to": "resolved"<br />
}<br />
</bugzilla><br />
===Chiahung Tai===<br />
*{{Bug|1201363}} - Stop buffering video in the MediaStreamGraph<br />
**WIP<br />
**Change MediaPipeline(WebRTC) case to MediaStreamVideoSink.<br />
**Craft the WIP changesets to review code quality. <br />
<br />
<bugzilla><br />
{<br />
"status": ["NEW", "ASSIGNED", "UNCONFIRMED", "REOPENED"],<br />
"changed_after": "2015-12-23",<br />
"changed_before": "2015-12-30",<br />
"assigned_to": ["ctai@mozilla.com"],<br />
"include_fields": "id, summary, status, resolution, assigned_to"<br />
}<br />
</bugzilla><br />
Bugs fixed this week<br />
<bugzilla><br />
{<br />
"changed_after": "2015-12-23",<br />
"changed_before": "2015-12-30",<br />
"assigned_to": ["ctai@mozilla.com"],<br />
"include_fields": "id, summary, status, resolution, assigned_to",<br />
"changed_field": "status",<br />
"changed_field_to": "resolved"<br />
}<br />
</bugzilla><br />
<br />
===Alastor Wu===<br />
<bugzilla><br />
{<br />
"status": ["NEW", "ASSIGNED", "UNCONFIRMED", "REOPENED"],<br />
"changed_after": "2015-12-23",<br />
"changed_before": "2015-12-30",<br />
"assigned_to": ["alwu@mozilla.com"],<br />
"include_fields": "id, summary, status, resolution, assigned_to"<br />
}<br />
</bugzilla><br />
Bugs fixed this week<br />
<bugzilla><br />
{<br />
"changed_after": "2015-12-23",<br />
"changed_before": "2015-12-30",<br />
"assigned_to": ["alwu@mozilla.com"],<br />
"include_fields": "id, summary, status, resolution, assigned_to",<br />
"changed_field": "status",<br />
"changed_field_to": "resolved"<br />
}<br />
</bugzilla><br />
<br />
*{{Bug|1223261}} - [Testcase] Audio should be playback correctly when open the pref "dom.audiochannel.mutedByDefault"<br />
** land<br />
<br />
*{{Bug|1228564}} - Check the audio capturing when we register the new AudioChannelAgent<br />
** land<br />
<br />
*{{Bug|1234735}} - Remove redundant spaces in nsGlobalWindow<br />
** land<br />
<br />
*{{Bug|1227051}} - [Testcase] No audio track video shouldn't register the AudioChannelAgent<br />
** r+<br />
<br />
*{{Bug|1204793}} - [Testcase] Unregister AudioChannelAgent when its volume changes to ZERO or be muted<br />
** r?<br />
<br />
*{{Bug|1225425}} - [Testcase] Do not unregister the AudioChannelAgent during seeking<br />
** r?<br />
<br />
*{{Bug|1235535}} - [Testcase] Split the audio channel muted-by-default test from "test_browserElement_inproc/oop_AudioChannel.html."<br />
** WIP<br />
<br />
*{{Bug|1223297}} - [Testcase] App should have the ability to playback different kinds of audio channel <br />
** backout<br />
<br />
===Blake Wu===<br />
*'''DRM discussion for DRM integration on FxOS'''<br />
<br />
<bugzilla><br />
{<br />
"status": ["NEW", "ASSIGNED", "UNCONFIRMED", "REOPENED"],<br />
"changed_after": "2015-12-23",<br />
"changed_before": "2015-12-30",<br />
"assigned_to": ["bwu@mozilla.com"],<br />
"include_fields": "id, summary, status, resolution, assigned_to"<br />
}<br />
</bugzilla><br />
Bugs fixed this week<br />
<bugzilla><br />
{<br />
"changed_after": "2015-12-23",<br />
"changed_before": "2015-12-30",<br />
"assigned_to": ["bwu@mozilla.com"],<br />
"include_fields": "id, summary, status, resolution, assigned_to",<br />
"changed_field": "status",<br />
"changed_field_to": "resolved"<br />
}<br />
</bugzilla><br />
<br />
===Kaku Kuo===<br />
<bugzilla><br />
{<br />
"status": ["NEW", "ASSIGNED", "UNCONFIRMED", "REOPENED"],<br />
"changed_after": "2015-12-23",<br />
"changed_before": "2015-12-30",<br />
"assigned_to": ["tkuo@mozilla.com"],<br />
"include_fields": "id, summary, status, resolution, assigned_to"<br />
}<br />
</bugzilla><br />
Bugs fixed this week<br />
<bugzilla><br />
{<br />
"changed_after": "2015-12-23",<br />
"changed_before": "2015-12-30",<br />
"assigned_to": ["tkuo@mozilla.com"],<br />
"include_fields": "id, summary, status, resolution, assigned_to",<br />
"changed_field": "status",<br />
"changed_field_to": "resolved"<br />
}<br />
</bugzilla><br />
===Munro Chiang===<br />
<bugzilla><br />
{<br />
"status": ["NEW", "ASSIGNED", "UNCONFIRMED", "REOPENED"],<br />
"changed_after": "2015-12-23",<br />
"changed_before": "2015-12-30",<br />
"assigned_to": ["mchiang@mozilla.com"],<br />
"include_fields": "id, summary, status, resolution, assigned_to"<br />
}<br />
</bugzilla><br />
Bugs fixed this week<br />
<bugzilla><br />
{<br />
"changed_after": "2015-12-23",<br />
"changed_before": "2015-12-30",<br />
"assigned_to": ["mchiang@mozilla.com"],<br />
"include_fields": "id, summary, status, resolution, assigned_to",<br />
"changed_field": "status",<br />
"changed_field_to": "resolved"<br />
}<br />
</bugzilla></div>Ctaihttps://wiki.mozilla.org/index.php?title=TPEMedia/2015-12-23&diff=1110440TPEMedia/2015-12-232015-12-23T01:56:00Z<p>Ctai: /* Chiahung Tai */</p>
<hr />
<div>==Summary==<br />
Bugs whose status changed this week<br />
<bugzilla><br />
{<br />
"status": ["NEW", "ASSIGNED", "UNCONFIRMED"],<br />
"changed_after": "2015-12-09",<br />
"changed_before": "2015-12-16",<br />
"assigned_to": ["jolin@mozilla.com", "ayang@mozilla.com", "jwwang@mozilla.com", "bechen@mozilla.com", "ctai@mozilla.com", "alwu@mozilla.com", "bwu@mozilla.com", "tkuo@mozilla.com", "mchiang@mozilla.com", "hahuang@mozilla.com"],<br />
"include_fields": "id, summary, status, resolution, assigned_to"<br />
}<br />
</bugzilla><br />
Bugs fixed this week<br />
<bugzilla><br />
{<br />
"changed_after": "2015-12-09",<br />
"changed_before": "2015-12-16",<br />
"assigned_to": ["jolin@mozilla.com", "ayang@mozilla.com", "jwwang@mozilla.com", "bechen@mozilla.com", "ctai@mozilla.com", "alwu@mozilla.com", "bwu@mozilla.com", "tkuo@mozilla.com", "mchiang@mozilla.com", "hahuang@mozilla.com"],<br />
"include_fields": "id, summary, status, resolution, assigned_to",<br />
"changed_field": "status",<br />
"changed_field_to": "resolved"<br />
}<br />
</bugzilla><br />
==Status==<br />
<br />
===Alfredo Yang===<br />
<bugzilla><br />
{<br />
"status": ["NEW", "ASSIGNED", "UNCONFIRMED"],<br />
"changed_after": "2015-12-09",<br />
"changed_before": "2015-12-16",<br />
"assigned_to": ["ayang@mozilla.com"],<br />
"include_fields": "id, summary, status, resolution, assigned_to"<br />
}<br />
</bugzilla><br />
<bugzilla><br />
{<br />
"changed_after": "2015-12-09",<br />
"changed_before": "2015-12-16",<br />
"assigned_to": ["ayang@mozilla.com"],<br />
"include_fields": "id, summary, status, resolution, assigned_to",<br />
"changed_field": "status",<br />
"changed_field_to": "resolved"<br />
}<br />
</bugzilla><br />
===John Lin===<br />
<bugzilla><br />
{<br />
"status": ["NEW", "ASSIGNED", "UNCONFIRMED"],<br />
"changed_after": "2015-12-09",<br />
"changed_before": "2015-12-16",<br />
"assigned_to": ["jolin@mozilla.com"],<br />
"include_fields": "id, summary, status, resolution, assigned_to"<br />
}<br />
</bugzilla><br />
Bugs fixed this week<br />
<bugzilla><br />
{<br />
"changed_after": "2015-12-09",<br />
"changed_before": "2015-12-16",<br />
"assigned_to": ["jolin@mozilla.com"],<br />
"include_fields": "id, summary, status, resolution, assigned_to",<br />
"changed_field": "status",<br />
"changed_field_to": "resolved"<br />
}<br />
</bugzilla><br />
===JW Wang===<br />
<bugzilla><br />
{<br />
"status": ["NEW", "ASSIGNED", "UNCONFIRMED"],<br />
"changed_after": "2015-12-09",<br />
"changed_before": "2015-12-16",<br />
"assigned_to": ["jwwang@mozilla.com"],<br />
"include_fields": "id, summary, status, resolution, assigned_to"<br />
}<br />
</bugzilla><br />
Bugs fixed this week<br />
<bugzilla><br />
{<br />
"changed_after": "2015-12-09",<br />
"changed_before": "2015-12-16",<br />
"assigned_to": ["jwwang@mozilla.com"],<br />
"include_fields": "id, summary, status, resolution, assigned_to",<br />
"changed_field": "status",<br />
"changed_field_to": "resolved"<br />
}<br />
</bugzilla><br />
<br />
===Benjamin Chen===<br />
<bugzilla><br />
{<br />
"status": ["NEW", "ASSIGNED", "UNCONFIRMED"],<br />
"changed_after": "2015-12-09",<br />
"changed_before": "2015-12-16",<br />
"assigned_to": ["bechen@mozilla.com"],<br />
"include_fields": "id, summary, status, resolution, assigned_to"<br />
}<br />
</bugzilla><br />
Bugs fixed this week<br />
<bugzilla><br />
{<br />
"changed_after": "2015-12-09",<br />
"changed_before": "2015-12-16",<br />
"assigned_to": ["bechen@mozilla.com"],<br />
"include_fields": "id, summary, status, resolution, assigned_to",<br />
"changed_field": "status",<br />
"changed_field_to": "resolved"<br />
}<br />
</bugzilla><br />
===Chiahung Tai===<br />
*{{Bug|1201363}} - Stop buffering video in the MediaStreamGraph<br />
**WIP<br />
**Change MediaRecorder and ImageCapture case to MediaStreamVideoSink.<br />
<br />
<bugzilla><br />
{<br />
"status": ["NEW", "ASSIGNED", "UNCONFIRMED"],<br />
"changed_after": "2015-12-09",<br />
"changed_before": "2015-12-16",<br />
"assigned_to": ["ctai@mozilla.com"],<br />
"include_fields": "id, summary, status, resolution, assigned_to"<br />
}<br />
</bugzilla><br />
Bugs fixed this week<br />
<bugzilla><br />
{<br />
"changed_after": "2015-12-09",<br />
"changed_before": "2015-12-16",<br />
"assigned_to": ["ctai@mozilla.com"],<br />
"include_fields": "id, summary, status, resolution, assigned_to",<br />
"changed_field": "status",<br />
"changed_field_to": "resolved"<br />
}<br />
</bugzilla><br />
<br />
===Alastor Wu===<br />
<bugzilla><br />
{<br />
"status": ["NEW", "ASSIGNED", "UNCONFIRMED"],<br />
"changed_after": "2015-12-09",<br />
"changed_before": "2015-12-16",<br />
"assigned_to": ["alwu@mozilla.com"],<br />
"include_fields": "id, summary, status, resolution, assigned_to"<br />
}<br />
</bugzilla><br />
Bugs fixed this week<br />
<bugzilla><br />
{<br />
"changed_after": "2015-12-09",<br />
"changed_before": "2015-12-16",<br />
"assigned_to": ["alwu@mozilla.com"],<br />
"include_fields": "id, summary, status, resolution, assigned_to",<br />
"changed_field": "status",<br />
"changed_field_to": "resolved"<br />
}<br />
</bugzilla><br />
<br />
===Blake Wu===<br />
*'''DRM discussion for DRM integration on FxOS'''<br />
<br />
<bugzilla><br />
{<br />
"status": ["NEW", "ASSIGNED", "UNCONFIRMED"],<br />
"changed_after": "2015-12-09",<br />
"changed_before": "2015-12-16",<br />
"assigned_to": ["bwu@mozilla.com"],<br />
"include_fields": "id, summary, status, resolution, assigned_to"<br />
}<br />
</bugzilla><br />
Bugs fixed this week<br />
<bugzilla><br />
{<br />
"changed_after": "2015-12-09",<br />
"changed_before": "2015-12-16",<br />
"assigned_to": ["bwu@mozilla.com"],<br />
"include_fields": "id, summary, status, resolution, assigned_to",<br />
"changed_field": "status",<br />
"changed_field_to": "resolved"<br />
}<br />
</bugzilla><br />
<br />
===Kaku Kuo===<br />
<bugzilla><br />
{<br />
"status": ["NEW", "ASSIGNED", "UNCONFIRMED"],<br />
"changed_after": "2015-12-09",<br />
"changed_before": "2015-12-16",<br />
"assigned_to": ["tkuo@mozilla.com"],<br />
"include_fields": "id, summary, status, resolution, assigned_to"<br />
}<br />
</bugzilla><br />
Bugs fixed this week<br />
<bugzilla><br />
{<br />
"changed_after": "2015-12-09",<br />
"changed_before": "2015-12-16",<br />
"assigned_to": ["tkuo@mozilla.com"],<br />
"include_fields": "id, summary, status, resolution, assigned_to",<br />
"changed_field": "status",<br />
"changed_field_to": "resolved"<br />
}<br />
</bugzilla><br />
===Munro Chiang===<br />
<bugzilla><br />
{<br />
"status": ["NEW", "ASSIGNED", "UNCONFIRMED"],<br />
"changed_after": "2015-12-09",<br />
"changed_before": "2015-12-16",<br />
"assigned_to": ["mchiang@mozilla.com"],<br />
"include_fields": "id, summary, status, resolution, assigned_to"<br />
}<br />
</bugzilla><br />
Bugs which are fixed by this week<br />
<bugzilla><br />
{<br />
"changed_after": "2015-12-09",<br />
"changed_before": "2015-12-16",<br />
"assigned_to": ["mchiang@mozilla.com"],<br />
"include_fields": "id, summary, status, resolution, assigned_to",<br />
"changed_field": "status",<br />
"changed_field_to": "resolved"<br />
}<br />
</bugzilla></div>Ctaihttps://wiki.mozilla.org/index.php?title=User_talk:Dead_project&diff=1109626User talk:Dead project2015-12-17T08:14:11Z<p>Ctai: /* Introduction */</p>
<hr />
<div>=Abstract=<br />
The goal of this project is bringing the power of computer vision and image processing to the Web. By extending the spec of Media Capture and Streams, the web developers can write video processing related applications in better way. The primary idea is to incorporate Worker-based JavaScript video processing with MediaStreamTrack. The user's video processing script can do the real image processing and analysis works frame by frame. We also propose an extension of ImageBitmap to allow better optimization possibility in both implementation and Web developer side. To support video editor case, we would like to introduce OfflineMediaContext in next phase. We also want to explore the concept of WebImage, a hardware accelerated-able Web API for performance improvement chance. By accomplishing these API step by step, we believe we can improve the Web platform competitiveness a lot on image processing and computer vision area.<br />
<br />
=Introduction=<br />
To get a quick understand what is project FoxEye. Please see below file:<br><br />
'''The latest one:'''<br />
*2015-Dec Orlando FoxEye session:[https://docs.google.com/presentation/d/14tI15Bvphew764XpOToAeUc5cgXT_kRTbD061ZCL5cY/edit?usp=sharing Orlando FoxEye]<br />
*2015-June FoxEye Briefing: [https://docs.google.com/presentation/d/1Ra5bIeMfSEQi5kd_bGH5Vx-pa9ahpAUTFDxTeBA5L0o/edit?usp=sharing Briefing]<br />
*2015-June Presentation files in Whistler Work Week:<br />
**Project FoxEye Status Update: [https://docs.google.com/presentation/d/1vnR5JOWpEgKN3KZGV2SLxscqSnuBOF2tT_dTolLRYI4/edit?usp=sharing FoxEye]<br />
**FoxEye Cross Firefox OS:[https://docs.google.com/presentation/d/1ck32VoikWkkFXkFfNK91S22nGRp9fSEj6Oz6846May4/edit?usp=sharing Use case]<br />
*'''Latest demo in Youtube: [https://www.youtube.com/watch?v=prybkXsTGXY FoxEye 2015 H1 demo]'''<br />
<br />
'''Outdated'''<br />
*Presentation file in Portland Work Week.[[File:Project FoxEye Portland Work Week.pdf]]<br><br />
*Presentation file in P2PWeb WorkShop.[[File:Project FoxEye 2015-Feb.pdf]]<br><br />
*Youtube: https://www.youtube.com/watch?v=TgQWEWiGaO8<br><br />
<br />
The needs for image processing and computer vision is increasing in recent years. The introduction of video element and media stream in HTML5 is very important, allowing for basic video playback and WebCam ability. But it is not powerful enough to handle complex video processing and camera application. Especially there are tons of mobile camera, photo and video editor applications show their creativity by using OpenCV etc in Android. It is a goal of this project to include the capabilities found in modern video production and camera applications as well as some of the processing, recognition and stitching tasks. <br />
<br><br />
This API is inspired by the WebAudio API[1]. Unlike WebAudio API, we try to reach the goal by modifying existing Media Capture and Streams API. The idea is adding some functions to associate the Woker-based script with MediaStreamTrack. Then the script code of Worker runs image processing and/or analysis frame by frame. Since we move the most of processing work to Worker, the main thread will not be blocked.<br />
<br />
[[File:FoxEye - Overview.png|800px]]<br />
<br><br />
Basically, the spirit of this project is four parts. The first part is extend the MediaStreamTrack to associate a Worker. This part provide a way to do image processing job frame by frame. The second part is ImageBitmap extension. This part extended the ImageBitmap interface to allow JavaScript developer to read underlying data out and set an external data into an ImageBitmap in a set of supported color formats. The third part is OfflineMediaContext. It is for offline stream to render as fast as possible. The last part is WebImage, it is a hardware accelerated-able API on computer vision area. The Web developers can use it to combine high performance vision processing.<br />
<br><br />
Thanks to the amazing asm.js and Emscripten work, we also provide an asm.js version of OpenCV called OpenCV.js. Web developers can leverage the power of OpenCV in a simpler way. <br />
<br><br />
<br />
=Design Principle=<br />
*Follow [https://extensiblewebmanifesto.org/ The Extensible Web Manifesto]<br />
The original design of this project was a WebAudio-like design called WebVideo, but it was deprecated because we want to follow The Extensible Web Manifesto; we believe this kind of design is best for the Web. The examples below argue against the WebVideo design.<br><br />
The computer vision area is more thriving than ever since deep learning was introduced, and we cannot enumerate video nodes for every kind of future creativity. Suppose we enumerated everything we can imagine today; there would still be things we cannot imagine. Maybe we would miss an area such as image sentiment analysis, which tries to classify the emotion of a human face in an image; I take it as an example because I only learned about it recently. Should we add that as a new video node? Take another extreme example: we would probably have no argument about face detection or face recognition nodes, but those cover only human faces. What if someone needs to detect and recognize animal faces? It sounds like a ridiculous idea, but it is a real business: a product called "Bistro: A Smart Feeder Recognizes Your Cat's Face" was on Indiegogo. So should we extend the original face detection/recognition node, add a new video node for cats, or ask developers to implement it in a VideoWorker? If we ask them to implement it in a VideoWorker, then we should reconsider whether the existing video nodes make sense at all. Should we move them all into the VideoWorker and implement them in JS? If yes, WebVideo degenerates to only a source node, a video worker node and a destination node. Compared to this degenerated WebVideo, the MediaStream-with-worker design is more elegant. <br />
<br />
*Performance and power consumption do matter<br />
This project was initiated by the FxOS team, which means performance and power consumption on mobile devices have been considered from day one.<br />
<br />
=Concept=<br />
==MediaStreamTrack with Worker: ==<br />
The new design is a simple and minimal change to the current API. By extending MediaStreamTrack and adding a Worker-related API, we let MediaStream support video processing functionality through script code running in a Worker. Below is the draft WebIDL. Please see [2] for the draft specification.<br />
<source lang="webidl"><br />
[Constructor(DOMString scriptURL)]<br />
interface VideoMonitor : EventTarget {<br />
  attribute EventHandler onvideomonitor;<br />
};<br />
<br />
// A matching constructor is assumed here for symmetry with VideoMonitor.<br />
[Constructor(DOMString scriptURL)]<br />
interface VideoProcessor : EventTarget {<br />
  attribute EventHandler onvideoprocess;<br />
};<br />
<br />
partial interface MediaStreamTrack {<br />
  void addVideoMonitor(VideoMonitor monitor);<br />
  void removeVideoMonitor(VideoMonitor monitor);<br />
  MediaStreamTrack addVideoProcessor(VideoProcessor processor);<br />
  void removeVideoProcessor();<br />
};<br />
<br />
[Exposed=(Window, Worker),<br />
 Constructor(DOMString type, optional VideoMonitorEventInit videoMonitorEventInitDict)]<br />
interface VideoMonitorEvent : Event {<br />
  readonly attribute DOMString trackId;<br />
  readonly attribute double playbackTime;<br />
  readonly attribute ImageBitmap? inputImageBitmap;<br />
};<br />
<br />
[Exposed=(Window, Worker),<br />
 Constructor(DOMString type, optional VideoProcessorEventInit videoProcessorEventInitDict)]<br />
interface VideoProcessEvent : VideoMonitorEvent {<br />
  attribute Promise<ImageBitmap> outputImageBitmap;<br />
};<br />
<br />
</source><br />
<br><br />
Main thread:<br><br />
[[File:NewProjectFoxEye1.png|1024px]]<br><br />
<br />
Worker Thread<br><br />
[[File:Worker - FLOW.png|1024px]]<br><br />
===Example Code ===<br />
Please check the section [http://chiahungtai.github.io/mediacapture-worker/index.html#Examples examples in MediaStreamTrack with worker].<br />
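Below is a minimal usage sketch, assuming the draft IDL above (a VideoProcessor constructed with a worker script URL and attached to a camera track); the exact names and plumbing may differ in the final specification.<br />
<source lang="javascript"><br />
// Main thread: attach a worker-backed processor to a camera track.<br />
// This is a sketch against the draft IDL above; names are not final.<br />
navigator.getUserMedia({video: true, audio: false}, function(stream) {<br />
  var track = stream.getVideoTracks()[0];<br />
  var processor = new VideoProcessor('processor-worker.js');<br />
  // addVideoProcessor() returns a new track carrying the processed frames.<br />
  var processedTrack = track.addVideoProcessor(processor);<br />
  var video = document.getElementById('videoelem');<br />
  video.mozSrcObject = new MediaStream([processedTrack]);<br />
  video.play();<br />
}, null);<br />
<br />
// processor-worker.js: invoked once per frame, off the main thread.<br />
onvideoprocess = function(event) {<br />
  // event.inputImageBitmap holds the current frame. A real processor would<br />
  // read its pixels, apply a filter, and resolve outputImageBitmap with the<br />
  // result; here we simply pass the frame through unchanged.<br />
  event.outputImageBitmap = Promise.resolve(event.inputImageBitmap);<br />
};<br />
</source><br />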
<br />
==ImageBitmap extensions==<br />
Please see [2] for more information.<br />
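To give a flavor of the extension, the sketch below reads a frame's pixels out of an ImageBitmap into an ArrayBuffer in an explicit color format; the method and format names (mapDataInto, 'RGBA32') follow the editor draft but should be treated as assumptions here, and bitmap is an assumed, existing ImageBitmap.<br />
<source lang="javascript"><br />
// Hypothetical sketch of the proposed ImageBitmap extensions.<br />
// mapDataInto() and the 'RGBA32' format name are taken from the editor<br />
// draft and may change; they are assumptions, not shipped API.<br />
var size = bitmap.width * bitmap.height * 4;   // 4 bytes per RGBA pixel<br />
var buffer = new ArrayBuffer(size);<br />
bitmap.mapDataInto('RGBA32', buffer, 0).then(function(layout) {<br />
  var pixels = new Uint8Array(buffer);<br />
  // pixels now holds the raw frame data; layout describes stride/offsets.<br />
});<br />
</source><br />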
<br />
==WebImage:==<br />
Why do we need WebImage? Because performance matters if we want to promote computer vision on the Web. Most image processing tasks can be accelerated by WebGL, but that is not the case for the computer vision area. So we need a computer vision Web API that can be hardware accelerated, to help Web developers deliver fast and portable Web pages and applications. That is the motivation for WebImage.<br />
<br><br />
This is a new area that needs exploration, so we might start by referring to existing related APIs. At the end of 2014, Khronos released a portable, power-efficient vision processing API called OpenVX. This specification might be a good starting point for WebImage.<br />
<br><br />
The following diagram shows the basic concept of OpenVX. It is a graph-node architecture: the role of OpenVX for computer vision is like the role of OpenGL for graphics. The developer just constructs and executes the graph to process incoming images.<br />
<br><br />
[[File:OpenVX-NodeGFX.PNG|600px]]<br><br />
<!--[[File:Project FoxEyeWebImage1.png|800px]]<br>--><br />
OpenVX can be implemented with OpenCL, OpenGL ES compute shaders, or C/C++ with SIMD. That means we can support OpenVX on a wide range of Web platforms, from PC to mobile, once an OpenVX engine is provided.<br />
<br />
[[File:OpenVX.PNG|600px]]<br><br />
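To make the graph-node idea concrete, a WebImage program might look roughly like the sketch below; WebImage does not exist yet, so every name here is purely illustrative.<br />
<source lang="javascript"><br />
// Purely hypothetical WebImage sketch, mirroring OpenVX's graph-node model:<br />
// build a graph once, then execute it per frame. No such API exists yet.<br />
var graph = new WebImageGraph();<br />
var src   = graph.createSourceNode(inputImageBitmap);<br />
var gray  = graph.createColorConvertNode(src, 'grayscale');<br />
var edges = graph.createCannyEdgeNode(gray, {threshold: 40});<br />
var sink  = graph.createSinkNode(edges);<br />
graph.process().then(function() {<br />
  // The implementation is free to run the whole graph on OpenCL, GPU<br />
  // compute shaders or SIMD code, since only the sink result is observable.<br />
  showResult(sink.outputImageBitmap);<br />
});<br />
</source><br />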
<br />
==OfflineMediaContext:==<br />
We introduce OfflineMediaContext and modify MediaStream here to enable offline (as fast as possible) MediaStream processing. When developers want to perform offline MediaStream processing, they need to form a context which holds and keeps the relationship of all MediaStreams that are going to be processed together; a usage sketch follows the notes below.<br />
<br />
<source lang="c++"><br />
<br />
// typedef unsigned long long DOMTimeStamp;<br />
interface OfflineMediaContext {<br />
void start(DOMTimeStamp durationToStop);<br />
attribute EventHandler onComplete;<br />
};<br />
// Add an optional argument into the constructor.<br />
[Constructor (optional OfflineMediaContext context),<br />
Constructor (MediaStream stream, optional OfflineMediaContext context),<br />
Constructor (MediaStreamTrackSequence tracks, optional OfflineMediaContext context)]<br />
interface MediaStream : EventTarget {<br />
// No modification.<br />
...<br />
}<br />
<br />
</source><br />
*OfflineMediaContext is the holding place for all MediaStreams which are going to be processed together at a non-realtime rate.<br />
*OfflineMediaContext is also the object that triggers the non-realtime processing.<br />
*OfflineMediaContext should be instantiated first; MediaStreams which are going to be processed together in the same context are then instantiated by passing the pre-instantiated OfflineMediaContext object into the MediaStream constructor. (See the modified MediaStream constructor above.)<br />
*The native implementation of OfflineMediaContext holds a non-realtime MediaStreamGraph, just as the OfflineAudioContext does in the WebAudio specification.<br />
*The constructors are modified by adding an optional parameter, an OfflineMediaContext. In this way, developers are able to associate a MediaStream with an OfflineMediaContext.<br />
*If the optional OfflineMediaContext is given, the native implementation of the MediaStream should be hooked to the non-realtime MSG held by the given OfflineMediaContext instead of the global realtime MSG.<br />
*If the optional OfflineMediaContext is given, we need to check whether the newly created MediaStream can be processed at offline rate or not. If not, the constructor should throw an error and return NULL. (Constructors are always allowed to throw.)<br />
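Putting the pieces together, a usage sketch under the draft IDL above might look as follows; sourceTracks is an assumed, pre-existing track sequence, and the exact event and method shapes are still open.<br />
<source lang="javascript"><br />
// Usage sketch for the draft OfflineMediaContext above. Construction order<br />
// matters: the context must exist before the streams it will drive.<br />
// 'sourceTracks' is assumed to be an existing MediaStreamTrackSequence.<br />
var offlineCtx = new OfflineMediaContext();<br />
// Bind the stream to the offline (non-realtime) graph via the new<br />
// optional constructor argument.<br />
var stream = new MediaStream(sourceTracks, offlineCtx);<br />
// ... attach processors to the stream's tracks here ...<br />
offlineCtx.oncomplete = function() {<br />
  // All frames up to the requested duration have been processed.<br />
};<br />
offlineCtx.start(30000); // render 30 seconds' worth as fast as possible<br />
</source><br />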
<br />
==OpenCV.js==<br />
*OpenCV + Emscripten = OpenCV.js (a calling sketch follows below)<br />
*https://github.com/CJKu/opencv <br />
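The binding layer is still being designed, but the sketch below shows the general Emscripten calling pattern such a binding builds on; the exported name cv_grayscale and the variables width, height and rgbaPixels are assumptions for exposition, not the actual OpenCV.js API.<br />
<source lang="javascript"><br />
// Generic Emscripten calling pattern; 'cv_grayscale' is a made-up export.<br />
var grayscale = Module.cwrap('cv_grayscale', null,<br />
                             ['number', 'number', 'number']);<br />
var buf = Module._malloc(width * height * 4);   // RGBA bytes in the asm.js heap<br />
Module.HEAPU8.set(rgbaPixels, buf);             // copy pixels in<br />
grayscale(buf, width, height);                  // run the compiled routine<br />
var result = Module.HEAPU8.slice(buf, buf + width * height * 4); // copy out<br />
Module._free(buf);<br />
</source><br />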
<!--<br />
==Deprecated Design ==<br />
*Modular Routing<br />
Modular routing allows arbitrary connections between different DIPNode(TBD) objects. Each node can have inputs and/or outputs. A source node has no inputs and a single output. A destination node has one input and no outputs. Other nodes such as filters can be placed between source and destination nodes.<br />
<br />
Here is a example for face detection work on ImageElement:<br><br />
PS. Right now, I haven't finished the work of ScriptNode. So the draw rectangle part is skipped in sample code.<br />
<br>[[File:Project FoxEye1.png|720px]]<br><br />
<big>Example 1:</big><br />
<source lang="javascript"><br />
var context = new DIPContext();<br />
var imageElem = document.getElementById('imgelemsrc');<br />
var source = context.createImageElementSource(imageElem);<br />
var facedetect = context.createFaceDetection();<br />
source.connect(facedetect);<br />
var dest = context.createMediaStreamDestination();<br />
facedetect.connect(dest);<br />
var elem = document.getElementById('videoelem');<br />
elem.mozSrcObject = dest.stream;<br />
elem.play();<br />
<br />
</source><br />
<br />
<br />
Another example to show that some nodes might support callback function to pass more information rather than image.<br />
<br>[[File:Project FoxEye2.png|720px]]<br><br />
<big>Example 2:</big><br />
<source lang="javascript"><br />
var context = new DIPContext();<br />
var imageElem = document.getElementById('imgelemsrc');<br />
var source = context.createImageElementSource(imageElem);<br />
var textdetect = context.createTextDetection();<br />
source.connect(textdetect);<br />
var dest = context.createMediaStreamDestination();<br />
textdetect.connect(dest);<br />
textdetect.ontextrecognized = function (e){<br />
var text = e.recognizedText;<br />
var go2google = document.getElementById('go2Google');<br />
go2google.href = "https://www.google.com.tw/search?q=" + text<br />
var go2IMDB = document.getElementById('go2IMDB');<br />
go2IMDB.href = "http://www.imdb.com/find?ref_=nv_sr_fn&q=" + text<br />
var go2Amazon = document.getElementById('go2Amazon');<br />
go2Amazon.href = "http://www.amazon.com/s/ref=nb_sb_noss?url=search-alias%3Daps&field-keywords=" + text<br />
var go2TranslateEngTC = document.getElementById('go2TranslateEngTC');<br />
go2TranslateEngTC.href = "https://translate.google.com.tw/#en/zh-TW/" + text<br />
}<br />
var elem = document.getElementById('videoelem');<br />
elem.mozSrcObject = dest.stream;<br />
elem.play();<br />
</source><br />
An ideally example to combine ScriptNode with Canvas2DContext.<br><br />
This is an example trying to do on fly camera translation like "Word Lens" and "Waygo".<br><br />
Haven't finish the implementation for this example.<br><br />
<br>[[File:Project FoxEye3.png|720px]]<br><br />
<big>Example 3:</big><br />
<source lang="javascript"><br />
var context = new DIPContext();<br />
var myMediaStream;<br />
navigator.getUserMedia({video:true, audio:false}, function(localMediaStream){<br />
myMediaStream = localMediaStream;<br />
var source = context.createMediaStreamSource(myMediaStream);<br />
var textRecognition = context.createTextRecognition();<br />
source.connect(textRecognition);<br />
var textInpaint = context.createTextInpaint();<br />
textRecognition.connect(textInpaint);<br />
var scriptTranslate = context.createScriptProcessor();<br />
textRecognition.ontextrecognized = function (e){<br />
var text = e.recognizedText;<br />
// Custom parameter -<br />
scriptTranslate.addParameter( "text", text );<br />
}<br />
scriptTranslate.onimageprocess= function (e) {<br />
var text = e.parameters.text;<br />
// Translate to other language....<br />
var newText = Translate("Eng", "TC");<br />
var input = e.inputImage;<br />
var canvas = document.getElementsByTagName('canvas')[0];<br />
var context2D = canvas.getContext('2d');<br />
context2D.drawImage(input, 0, 0);<br />
context2D.strokeText(newText, 0, 0);<br />
// get an empty slate to put the data into<br />
var output = context.createImageData(canvas.width, canvas.height);<br />
e.outputImage = output;<br />
}<br />
textInpaint.connect(scriptTranslate);<br />
var dest = context.createMediaStreamDestination();<br />
scriptTranslate.connect(dest);<br />
var video = document.getElementById('videoelem');<br />
video.mozSrcObject = dest.stream;<br />
}, null);<br />
<br />
</source><br />
--><br />
<br />
=Demo pages=<br />
==OpenCV.js==<br />
This demo should be able to run on every existing Firefox browser on Ubuntu and Mac. Please try below website.<br><br />
http://people.mozilla.org/~cku/opencv/<br />
<br><br />
[[File:OpenCVJS-1.png|1080px]]<br />
<br><br />
[[File:OpenCVJS-2.png|1080px]]<br />
<br><br />
<br />
==MST with Worker and ImageBitmap==<br />
You need a customized build for the demos below. Please clone this repo: https://github.com/kakukogou/gecko-dev/tree/foxeye-master<br><br />
You also need a web camera for the demos. Then build your own Firefox browser and go to the website below to see the demos.<br />
<br><br />
Source code of the demo:<br><br />
https://github.com/kakukogou/foxeye-demo<br />
<br><br />
Demo website:<br><br />
http://people.mozilla.org/~tkuo/foxeye-demo/<br />
===Monitor===<br />
Monitor is designed to just dispatch events to the Web Worker without modifying the track. The left video is from getUserMedia. The right one uses addWorkerMonitor to dispatch the input frames from the left one to a worker. The worker detects the face and passes the face position together with the input frame back to the main thread. The script in the main thread then uses both pieces of information to draw the input frame via CanvasRenderingContext2D. <br />
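A minimal sketch of that flow, assuming the prototype's addWorkerMonitor(worker) shape (the draft IDL names it addVideoMonitor) and a hypothetical detectFace helper; track, ctx2d and hatImage are assumed to be set up already:<br />
<source lang="javascript"><br />
// Main thread: monitor the camera track and overlay a hat on a canvas.<br />
// 'face-worker.js' and its message format are hypothetical.<br />
var worker = new Worker('face-worker.js');<br />
track.addWorkerMonitor(worker);<br />
worker.onmessage = function(msg) {<br />
  // Draw the frame first, then the hat above the detected face.<br />
  ctx2d.drawImage(msg.data.frame, 0, 0);<br />
  ctx2d.drawImage(hatImage, msg.data.face.x, msg.data.face.y - hatImage.height);<br />
};<br />
<br />
// face-worker.js: receives one event per frame, reports the face position.<br />
onvideomonitor = function(event) {<br />
  var face = detectFace(event.inputImageBitmap); // e.g. via OpenCV.js<br />
  postMessage({frame: event.inputImageBitmap, face: face});<br />
};<br />
</source><br />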
*Case 1: Face detection<br />
This demo shows how to use |addWorkerMonitor| and CanvasRenderingContext2D to overlay a hat on someone's head.<br />
<br><br />
[[File:Monitor face.png|1080px]]<br />
<br><br />
*Case 2: QRCode<br />
[[File:Monitor qrcode.png|1080px]]<br />
<br><br />
<br />
===Processor===<br />
This demo shows how to use |addWorkerProcessor| to create a new MediaStreamTrack and show it in another HTMLVideoElement. There are five kinds of image filters.<br />
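As a concrete example, the invert filter's worker script might look like the sketch below; it combines the draft onvideoprocess event with the hypothetical mapDataInto() pixel access sketched in the ImageBitmap-extensions section, so the pixel-reading call is an assumption rather than shipped API.<br />
<source lang="javascript"><br />
// Hypothetical invert filter for the processor worker. The mapDataInto()<br />
// pixel access follows the ImageBitmap-extensions draft and is an assumption.<br />
onvideoprocess = function(event) {<br />
  var bitmap = event.inputImageBitmap;<br />
  var buffer = new ArrayBuffer(bitmap.width * bitmap.height * 4);<br />
  event.outputImageBitmap = bitmap.mapDataInto('RGBA32', buffer, 0)<br />
    .then(function() {<br />
      var px = new Uint8Array(buffer);<br />
      for (var i = 0; i < px.length; i += 4) {   // leave alpha alone<br />
        px[i]     = 255 - px[i];<br />
        px[i + 1] = 255 - px[i + 1];<br />
        px[i + 2] = 255 - px[i + 2];<br />
      }<br />
      return createImageBitmap(new ImageData(<br />
          new Uint8ClampedArray(buffer), bitmap.width, bitmap.height));<br />
    });<br />
};<br />
</source><br />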
<br> <br />
<br><br />
[[File:Processor copy.png|1080px]]<br />
<br><br />
[[File:Processor blur.png|1080px]]<br />
<br><br />
[[File:Processor erode.png|1080px]]<br />
<br><br />
[[File:Processor threshold.png|1080px]]<br />
<br><br />
[[File:Processor invert.png|1080px]]<br />
<br><br />
[[File:Processor gray.png|1080px]]<br />
<br><br />
<br />
<br />
<br />
<!--<br />
==Demo 1: Face tracker==<br />
===Browser:===<br />
*Input comes from HTML Image Element<br />
[[File:ProjectFoxEye BrowserFaceImage.png|720px]]<br />
*Input comes from MediaStream<br />
[[File:ProjectFoxEye BrowserFaceMS.png|720px]]<br />
<br />
===B2G on Flame:===<br />
*Input comes from HTML Image Element<br><br />
[[File:Project FoxEyeFlameFaceImage.jpg|480px]]<br><br />
*Input comes from MediaStream<br><br />
[[File:Project FoxEyeFlameFaceMS.jpg|480px]]<br><br />
<br />
==Demo 2: Text Recognition==<br />
*Browser:<br />
**Input comes from HTML Image Element<br />
[[File:ProjectFoxEye TextImage.png|720px]]<br />
**Input comes from MediaStream<br />
[[File:ProjectFoxEye TextMS.png|720px]]<br />
--><br />
<br />
=Unlimited Potentials=<br />
<!--<br />
According to "Firefox OS User Research Northern India Findings" [4], one of the key table-stake is camera related features. "Ways to provide photo & video editing tools" is what this WebAPI for. So if we can deliver some cool photo & video editing features, we can fulfill one of the needs of our target market.<br><br />
In [4], it mentioned that one of purchase motivators is educate my kids. The features like PhotoMath can satisfy the education part.<br><br />
In long term, if we can integrate text recognition with TTS(text to speech), we can help illiterate person to read words or phrase. That will be very useful features.<br><br />
Also offline text translation in camera might be a killer application too. Waygo and WordLens is two of such applications in Android and iOS.<br><br />
Text Selection in Image is also an interesting feature for browser. Project Naptha demos some potential functionality based on yext selection in Image.<br> <br />
--><br />
==FoxEye technology tree==<br />
This is the technology tree of FoxEye. The solid lines are dependencies; the dashed lines are for performance improvements. The green blocks are possible applications. The blue blocks are API drafts with prototypes. The red blocks are things we know how to do but have not yet started. The yellow blocks are only concepts that need more study. So, step by step, we can provide a photo manager, wide-angle panorama, HDR, and even gesture control.<br />
[[File:Multimedia Platform Team Technology Roadmap - New Page.png |1024px]]<br />
<br />
<br />
==Use Cases==<br />
*Digital Image Processing(DIP) for camera:<br />
**Face In, see [https://www.youtube.com/watch?feature=player_embedded&v=PWZUCfDsFdU Sony Face In]<br />
**Augmented Reality, see [https://www.youtube.com/watch?feature=player_embedded&v=vDNzTasuYEw IKEA AR]<br />
**Camera Panorama, <br />
**Fisheye camera, <br />
**Comic Effect,<br />
**Long term, might need Android Camera HAL 3 to control camera<br />
***Smile Snapshot<br />
***Gesture Snapshot<br />
***HDR<br />
***Video Stabilization<br />
**Bar code scanner<br />
*Photo and video editing<br />
**Video Editor, see [https://www.youtube.com/watch?feature=player_embedded&v=NJ6nYgxcuUk WeVideo on Android]<br />
**A faster way for video editing tools.<br />
**Lots of existing image effects can be used for photo and video editing.<br />
**https://www.facebook.com/thanks<br />
*Object Recognition in Image (not only FX OS, but also browser):<br />
**Shopping Assistant, see [https://www.youtube.com/watch?feature=player_embedded&v=B7cvlWll85Q Amazon Firefly]<br />
**Face Detection/Tracking,<br />
**Face Recognition, <br />
**Text Recognition, <br />
**Text Selection in Image, <br />
***See http://projectnaptha.com/<br />
**Text Inpainting,<br />
**Image Segmentation,<br />
**Text translation on image, see [https://www.youtube.com/watch?feature=player_embedded&v=9UalhhWBPH0 Waygo]<br />
*Duo Camera:<br />
**Nature Interaction(Gesture, Body Motion Tracking)<br />
**Interactive Foreground Extraction<br />
and so on....<br />
<br />
==Some cool applications we can refer to in the real world==<br />
*Word Lens: <br />
**https://play.google.com/store/apps/details?id=com.questvisual.wordlens.demo<br />
**https://itunes.apple.com/tw/app/word-lens/id383463868?mt=8<br />
*Waygo<br />
**http://www.waygoapp.com/<br />
*PhotoMath<br />
**https://photomath.net/<br />
*Cartoon Camera<br />
**https://play.google.com/store/apps/details?id=com.fingersoft.cartooncamera<br />
*Photo Studio<br />
**http://photo-studio.en.uptodown.com/android<br />
*Magisto<br />
**https://play.google.com/store/apps/details?id=com.magisto<br />
*Adobe PhotoShop Express<br />
**http://www.photoshop.com/products/photoshopexpress<br />
*Amazon(firefly app)<br />
**https://play.google.com/store/apps/details?id=com.amazon.mShop.android<br />
<br />
<!--<br />
=Task Dependency=<br />
Below is the dependency graph for text recognition work.<br />
<br><br />
[[File:Project FoxEyeTextRecgDependency.png]]<br />
<br><br />
Below is the dependency graph for camera efficts work.<br />
<br><br />
[[File:Project FoxEyeCameraDependency.png]]<br />
<br><br />
Below is the dependency graph for video editor work.<br />
<br><br />
[[File:Project FoxEyeVideoEditorDependency.png]]<br />
<br><br />
--><br />
<br />
<!--<br />
=Comparison=<br />
==Canvas2DContext==<br />
Currently, you can do video effect by Canvas2DContext. See the demo made by [4]. The source code looks like below.<br />
<source><br />
function frameConverter(video,canvas) {<br />
<br />
// Set up our frame converter<br />
this.video = video;<br />
this.viewport = canvas.getContext("2d");<br />
this.width = canvas.width;<br />
this.height = canvas.height;<br />
// Create the frame-buffer canvas<br />
this.framebuffer = document.createElement("canvas");<br />
this.framebuffer.width = this.width;<br />
this.framebuffer.height = this.height;<br />
this.ctx = this.framebuffer.getContext("2d");<br />
// Default video effect is blur<br />
this.effect = JSManipulate.blur;<br />
// This variable used to pass ourself to event call-backs<br />
var self = this;<br />
// Start rendering when the video is playing<br />
this.video.addEventListener("play", function() {<br />
self.render();<br />
}, false);<br />
<br />
// Change the image effect to be applied <br />
this.setEffect = function(effect){<br />
if(effect in JSManipulate){<br />
this.effect = JSManipulate[effect];<br />
}<br />
}<br />
<br />
// Rendering call-back<br />
this.render = function() {<br />
if (this.video.paused || this.video.ended) {<br />
return;<br />
}<br />
this.renderFrame();<br />
var self = this;<br />
// Render every 10 ms<br />
setTimeout(function () {<br />
self.render();<br />
}, 10);<br />
};<br />
<br />
// Compute and display the next frame <br />
this.renderFrame = function() {<br />
// Acquire a video frame from the video element<br />
this.ctx.drawImage(this.video, 0, 0, this.video.videoWidth,<br />
this.video.videoHeight,0,0,this.width, this.height);<br />
var data = this.ctx.getImageData(0, 0, this.width, this.height);<br />
// Apply image effect<br />
this.effect.filter(data,this.effect.defaultValues);<br />
// Render to viewport<br />
this.viewport.putImageData(data, 0, 0);<br />
return;<br />
};<br />
};<br />
<br />
// Initialization code<br />
video = document.getElementById("video");<br />
canvas = document.getElementById("canvas");<br />
fc = new frameConverter(video,canvas);<br />
...<br />
// Change the image effect applied to the video<br />
fc.setEffect('edge detection');<br />
<br />
</source><br />
Basically, the idea is use |drawImage| to acquire frame from video and draw it to canvas. Then call |getImageData| to get the data and process the image. After that, put the computed data back to the canvas and display it.<br><br />
<br />
Compare to this approach, the proposed WebAPI has below advantages:<br />
* Not polling mechanism.<br />
** We use callback function to process all frames.<br />
<br />
==node-opencv==<br />
https://github.com/peterbraden/node-opencv<br />
"OpenCV bindings for Node.js. OpenCV is the defacto computer vision library - by interfacing with it natively in node, we get powerful real time vision in js."<br />
The sample codes looks like below:<br />
*You can use opencv to read in image files. Supported formats are in the OpenCV docs, but jpgs etc are supported.<br />
<source lang="javascript"><br />
cv.readImage(filename, function(err, mat){<br />
mat.convertGrayscale()<br />
mat.canny(5, 300)<br />
mat.houghLinesP()<br />
})<br />
</source><br />
*If however, you have a series of images, and you wish to stream them into a stream of Matrices, you can use an ImageStream. Thus:<br />
<source lang="javascript"><br />
var s = new cv.ImageStream()<br />
s.on('data', function(matrix){<br />
matrix.detectObject(haar_cascade_xml, opts, function(err, matches){})<br />
})<br />
ardrone.createPngStream().pipe(s);<br />
</source><br />
<br />
==opencvjs==<br />
https://github.com/blittle/opencvjs<br><br />
It is a project to compile opencv to asm.js. Might be a dead project now.<br />
==Project Naptha==<br />
"Project Naptha automatically applies state-of-the-art computer vision algorithms on every image you see while browsing the web. The result is a seamless and intuitive experience, where you can highlight as well as copy and paste and even edit and translate the text formerly trapped within an image." Quoted from http://projectnaptha.com/ .<br />
===How it works?===<br />
Excerpt from Project Naptha:<br />
The primary feature of Project Naptha is actually the text detection, rather than optical character recognition. The author write a text detection algorithm called Stroke Width Transform, invented by Microsoft Research in 2008, which is capable of identifying regions of text in a language-agnostic manner in WebWorker. Once a user begins to select some text, it scrambles to run character recognition algorithms in order to determine what exactly is being selected. The default OCR engine is a built-in pure-javascript port of the open source Ocrad OCR engine. There’s the option of sending the selected region over to a cloud based text recognition service powered by Tesseract, Google’s (formerly HP’s) award-winning open-source OCR engine which supports dozens of languages, and uses an advanced language model.<br />
--><br />
<!--<br />
=Open Source Library and Licenses =<br />
*OpenCV: OpenCV is an open-source BSD-licensed library that includes several hundreds of computer vision algorithms.<br />
*Tesseract-OCR: Apache License v2.0. Tesseract is probably the most accurate open source OCR engine available.<br />
**Leptonica: Leptonica is a pedagogically-oriented open source site containing software that is broadly useful for image processing and image analysis applications.<br />
<br />
<br>"The Leptonica copyright is less restrictive still. We use the BSD 2-clause license, which is similar to the Apache license but much briefer. Kirk McKusick playfully called the BSD license a copycenter, as differentiated from the usual copyright and the GPL copyleft: "Take it down to the copy center and make as many copies as you want." The BSD restrictions can be approximately summarized as: (1) Don't pretend that you wrote this, and (2) Don't sue us if it doesn't work.<br> For Leptonica, as with Apache and BSD, modifications of the source code that are used in commercial products can be made without any obligation to make them available in open source. (This does happen voluntarily -- it's likely that the majority of issues noted by people working on commercial products are reported back, often with patches.) The Leptonica license only requires that any use of the source, whether in original or modified form, must include the Leptonica copyright notice, and that modified versions must be clearly marked as such. "<br />
<br />
<br />
=What I have done=<br />
*Initialize, plan and implement this project.<br />
*Write a prototype of WebIDL for WebDIP.<br />
**MediaStream as source node and destentaion node for WebDIP.<br />
**For HTMLImageElement part as source node, there is a temporal solution for it.<br />
**Have face detection node. Can be used in MediaStream and HTMLImageElement on both browser and B2G flame.<br />
**Have text detection/recognization node. Can be used in MediaStream and HTMLImageElement on browser.<br />
--><br />
<br />
<!--<br />
=To Do List=<br />
*Extend MediaStreamTrack API? =>CTai<br />
**Need to integrate with Canvas2DContext and WebGL.<br />
*OfflineMediaContext study. =>Kaku(our new hire!!)<br />
*An API for image processing and object detection. =>TBD<br />
**We need such API for B2G privilege applications(or opencv-asm.js for general APPs).<br />
*How to compile OpenCV to asm.js =>Kaku, CJay<br />
**Try to figure out how to pass the ImageBitmap from VideoWroker to OpenCV-asm.js<br />
*Compare native OpenCV/Tesseract with asm.js version. =>TBD<br />
--><br />
<br />
<!--<br />
=Fixme List(Known Issues)=<br />
*OpenCV can't build with STLPort, only support GNUSTL.<br />
**B2G can't build with GNUSTL.<br />
*Text Detection and Recognition can't run on B2G.<br />
**Some OpenCV API use STL as arguments. The unalignment STL will cause runtime error.<br />
*Tesseract-OCR Build<br />
**Use pre-installed Tesseract-OCR now. Maybe we should support source code build of Tesseract-OCR.<br />
*Improve precision rate of text recognition.<br />
**The actual precision rate should be higher than my roughly prototype. Need improve it.<br />
*Separate OCR initialized.<br />
**Prevent redundant initialization.<br />
*Haven't done OpenCL integration in Gecko.<br />
**OpenCV has a lot of OpenCL integration. We should take advantage from it.<br />
*Canvas2DContext, WebGL can't run on worker.<br />
**Need bug 801176 and bug 709490 landed.<br> <br />
*Need ImageBitmap for VideoWorkerEvent.<br />
**Need bug 1044102 landed.<br><br />
--><br />
<br />
=Current Status=<br />
*Meta bug: https://bugzilla.mozilla.org/show_bug.cgi?id=1100203<br />
*MediaStream with worker: editor draft completed; refining the prototype for the review process. See https://bugzilla.mozilla.org/show_bug.cgi?id=1044102<br />
*ImageBitmap: In review process. See https://bugzilla.mozilla.org/show_bug.cgi?id=801176<br />
*ImageBitmap extension: editor draft completed; refining the prototype for the review process. See https://bugzilla.mozilla.org/show_bug.cgi?id=1141979<br><br />
*OpenCV.js: Working on it. See http://people.mozilla.org/~cku/opencv/<br />
*OfflineMediaContext: Not yet started.<br />
*WebImage:Not yet started.<br />
*Run WebGL on worker: in review process. See https://bugzilla.mozilla.org/show_bug.cgi?id=709490<br><br />
*CanvasRenderingContext2D in Worker: waiting for someone to take this bug. See https://bugzilla.mozilla.org/show_bug.cgi?id=801176<br><br />
=Next Phase(2015 H2)=<br />
*Try to land and standardize MediaStream with worker, ImageBitmap and the ImageBitmap extension. See [3] for how the standardization process works in Mozilla.<br />
*Design an automated way to export JavaScript APIs for OpenCV.js. Try to upstream it to the OpenCV code base.<br />
*Start to work on OfflineMediaContext.<br />
*Support product requirements, for example, an Instagram-like app, wide-angle panorama or Fox photo.<br />
*Do some exploratory experiments on the WebImage concept.<br />
*Initialize a sub-project called [https://wiki.mozilla.org/Project_GlovePuppetry Project GlovePuppetry]<br />
<br />
=Beyond 2015=<br />
*Proof of Concept for WebImage.<br />
*A crazy idea of Kernel.js or SPIR.js for JS developers to customize WebImage?<br />
*Gestural control API with a depth camera? => WebNI (Web Natural Interaction)?<br />
*[https://wiki.mozilla.org/Project_Cangjie Project Cangjie]<br />
<br />
=Conclusion=<br />
Step by step, we can make things different. FoxEye is a pioneering project trying to push the Web boundary into a new area: image processing and computer vision. The key factor in ensuring this project's success is you, every Web contributor. Your support, your promotion, your feedback and your comments are the nutrition of this project. Please share this project with every Web developer. Let's bring more and more amazing things to the Web. Let's enrich and improve the competitiveness of the whole Web platform.<br />
<br />
=References=<br />
*[1]:"WebAudio Spec", http://www.w3.org/TR/webaudio/<br />
*[2]:"Media Capture Stream with Worker", https://w3c.github.io/mediacapture-worker/<br />
*[3]:"Mozilla Standards", https://wiki.mozilla.org/Standards<br />
<br />
=Acknowledgements=<br />
The whole idea of adopting WebAudio as the reference design for this project came from a conversation with John Lin. Thanks to Robert O'Callahan for his great feedback and comments. Thanks to John Lin and Chia-jung Hung for their useful suggestions and ideas. Also, big thanks to my team members who helped me debug the code.<br />
<br />
=About Authors=<br />
==CTai==<br />
My name is [https://tw.linkedin.com/pub/chia-hung-tai/36/8b6/712 Chia-hung Tai]. I am a senior software engineer in the Mozilla Taipei office. I work on Firefox OS multimedia. Before this job, I had some experience in OpenCL, NLP (Natural Language Processing), Data Mining and Machine Learning. My IRC nickname is ctai. You can find me in the #media, #mozilla-taiwan, and #b2g channels. You can also reach me via email (ctai at mozilla dot com).<br />
<br />
==Kaku==<br />
[https://tw.linkedin.com/in/kakukogou Tzuhuo Kuo] is an engineer in the Mozilla Taipei office.<br />
==CJ Ku==<br />
[https://www.linkedin.com/pub/cj-ku/62/55b/a1b CJ Ku] is responsible for the OpenCV.js part.</div>Ctai
<hr />
<div>=Abstract=<br />
The goal of this project is bringing the power of computer vision and image processing to the Web. By extending the spec of Media Capture and Streams, the web developers can write video processing related applications in better way. The primary idea is to incorporate Worker-based JavaScript video processing with MediaStreamTrack. The user's video processing script can do the real image processing and analysis works frame by frame. We also propose an extension of ImageBitmap to allow better optimization possibility in both implementation and Web developer side. To support video editor case, we would like to introduce OfflineMediaContext in next phase. We also want to explore the concept of WebImage, a hardware accelerated-able Web API for performance improvement chance. By accomplishing these API step by step, we believe we can improve the Web platform competitiveness a lot on image processing and computer vision area.<br />
<br />
=Introduction=<br />
To get a quick understand what is project FoxEye. Please see below file:<br><br />
'''The latest one:'''<br />
*Orlando FoxEye session:[https://docs.google.com/presentation/d/14tI15Bvphew764XpOToAeUc5cgXT_kRTbD061ZCL5cY/edit?usp=sharing Orlando FoxEye]<br />
<br />
*FoxEye Briefing: [https://docs.google.com/presentation/d/1Ra5bIeMfSEQi5kd_bGH5Vx-pa9ahpAUTFDxTeBA5L0o/edit?usp=sharing Briefing]<br />
*Presentation files in Whistler Work Week:<br />
**Project FoxEye Status Update: [https://docs.google.com/presentation/d/1vnR5JOWpEgKN3KZGV2SLxscqSnuBOF2tT_dTolLRYI4/edit?usp=sharing FoxEye]<br />
**FoxEye Cross Firefox OS:[https://docs.google.com/presentation/d/1ck32VoikWkkFXkFfNK91S22nGRp9fSEj6Oz6846May4/edit?usp=sharing Use case]<br />
*'''Latest demo in Youtube: [https://www.youtube.com/watch?v=prybkXsTGXY FoxEye 2015 H1 demo]'''<br />
<br />
'''Outdated'''<br />
*Presentation file in Portland Work Week.[[File:Project FoxEye Portland Work Week.pdf]]<br><br />
*Presentation file in P2PWeb WorkShop.[[File:Project FoxEye 2015-Feb.pdf]]<br><br />
*Youtube: https://www.youtube.com/watch?v=TgQWEWiGaO8<br><br />
<br />
The need for image processing and computer vision has been increasing in recent years. The introduction of the video element and media streams in HTML5 is very important, enabling basic video playback and WebCam abilities. But it is not powerful enough to handle complex video processing and camera applications, especially as tons of mobile camera, photo, and video editor applications on Android show their creativity by using OpenCV and similar libraries. It is a goal of this project to include the capabilities found in modern video production and camera applications, as well as some of the processing, recognition, and stitching tasks. <br />
<br><br />
This API is inspired by the WebAudio API [1]. Unlike the WebAudio API, we try to reach the goal by modifying the existing Media Capture and Streams API. The idea is to add functions that associate a Worker-based script with a MediaStreamTrack. The script code in the Worker then runs image processing and/or analysis frame by frame. Since we move most of the processing work to the Worker, the main thread will not be blocked.<br />
<br />
[[File:FoxEye - Overview.png|800px]]<br />
<br><br />
Basically, the spirit of this project has four parts. The first part extends MediaStreamTrack to associate it with a Worker, providing a way to do image processing jobs frame by frame. The second part is the ImageBitmap extension, which extends the ImageBitmap interface to let JavaScript developers read the underlying data out of, and set external data into, an ImageBitmap in a set of supported color formats. The third part is OfflineMediaContext, which lets an offline stream render as fast as possible. The last part is WebImage, a hardware-acceleratable API for the computer vision area, which Web developers can use to compose high-performance vision processing.<br />
<br><br />
Thanks to the amazing asm.js and Emscripten work, we also provide an asm.js version of OpenCV called OpenCV.js. Web developers can leverage the power of OpenCV in a simpler way. <br />
<br><br />
<br />
=Design Principle=<br />
*Follow [https://extensiblewebmanifesto.org/ The Extensible Web Manifesto]<br />
The original design of this project was a WebAudio-like design called WebVideo, but it was deprecated because we want to follow The Extensible Web Manifesto; we believe this kind of design is best for the Web. The following examples argue against the WebVideo design.<br><br />
The computer vision area is more thriving than ever since deep learning was introduced, and we cannot enumerate video nodes for every kind of future creativity. Suppose we enumerated everything we can imagine today; there would still be things we cannot imagine. For example, we might miss an area called image sentiment analysis, which tries to classify the emotion shown by a human face in an image; I take this as an example because I only learned of it recently. Should we add it as a new video node? Take another extreme example: we would probably have no argument about face detection or face recognition nodes, but those only cover human faces. What if someone needs to detect and recognize animal faces? It sounds like a ridiculous idea, but it is a real business: a product called "Bistro: A Smart Feeder Recognizes Your Cat's Face"[2] was on Indiegogo. So should we extend the original face detection/recognition node, add a new video node for cats, or ask developers to implement it in a VideoWorker? If we ask them to implement it in a VideoWorker, we would need to reconsider whether the existing video nodes make sense at all. Should we move them all into the VideoWorker and implement them in JS? If yes, WebVideo degenerates to only a source node, a video worker node, and a destination node. Compared to this degenerated WebVideo, the MediaStream-with-Worker design is more elegant. <br />
<br />
*Performance and power consumption do matter<br />
This project was initiated by the FxOS team, which means performance and power consumption on mobile devices have been considered from day one.<br />
<br />
=Concept=<br />
==MediaStreamTrack with Worker==<br />
The new design is a simple, minimal change to the current API. By extending MediaStreamTrack and adding Worker-related APIs, we let MediaStream support video processing functionality through script code in a Worker. Below is the draft WebIDL. Please see [2] for the draft specification.<br />
<source lang="webidl"><br />
[Constructor(DOMString scriptURL)]<br />
interface VideoMonitor : EventTarget {<br />
    attribute EventHandler onvideomonitor;<br />
};<br />
<br />
[Constructor(DOMString scriptURL)]<br />
interface VideoProcessor : EventTarget {<br />
    attribute EventHandler onvideoprocess;<br />
};<br />
<br />
partial interface MediaStreamTrack {<br />
    void addVideoMonitor(VideoMonitor monitor);<br />
    void removeVideoMonitor(VideoMonitor monitor);<br />
    MediaStreamTrack addVideoProcessor(VideoProcessor processor);<br />
    void removeVideoProcessor();<br />
};<br />
<br />
[Exposed=(Window, Worker),<br />
 Constructor(DOMString type, optional VideoMonitorEventInit videoMonitorEventInitDict)]<br />
interface VideoMonitorEvent : Event {<br />
    readonly attribute DOMString trackId;<br />
    readonly attribute double playbackTime;<br />
    readonly attribute ImageBitmap? inputImageBitmap;<br />
};<br />
<br />
[Exposed=(Window, Worker),<br />
 Constructor(DOMString type, optional VideoProcessorEventInit videoProcessorEventInitDict)]<br />
interface VideoProcessEvent : VideoMonitorEvent {<br />
    attribute Promise<ImageBitmap> outputImageBitmap;<br />
};<br />
<br />
</source><br />
<br><br />
Main thread:<br><br />
[[File:NewProjectFoxEye1.png|1024px]]<br><br />
<br />
Worker thread:<br><br />
[[File:Worker - FLOW.png|1024px]]<br><br />
===Example Code ===<br />
Please check the section [http://chiahungtai.github.io/mediacapture-worker/index.html#Examples examples in MediaStreamTrack with worker].<br />
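As a rough illustration, here is a minimal sketch based on the draft IDL above; the constructor and event names follow the draft (not a shipped API), and the way the worker receives the event is assumed from the flow diagrams above.<br />
<source lang="javascript"><br />
// Main thread: route a camera track through a processor worker.<br />
navigator.getUserMedia({video: true, audio: false}, function(stream) {<br />
  var track = stream.getVideoTracks()[0];<br />
  // VideoProcessor(scriptURL) and addVideoProcessor() are from the draft IDL above.<br />
  var processor = new VideoProcessor('filter-worker.js');<br />
  var processedTrack = track.addVideoProcessor(processor);<br />
  var video = document.getElementById('videoelem');<br />
  video.mozSrcObject = new MediaStream([processedTrack]);<br />
  video.play();<br />
}, function(err) { console.error(err); });<br />
<br />
// filter-worker.js: assumed to receive a VideoProcessEvent per frame<br />
// on the worker global scope, per the flow diagram above.<br />
onvideoprocess = function(event) {<br />
  // A real filter would transform event.inputImageBitmap here.<br />
  event.outputImageBitmap = Promise.resolve(event.inputImageBitmap);<br />
};<br />
</source><br />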
<br />
==ImageBitmap extensions==<br />
Please see [2] for more information.<br />
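To make the extension concrete, below is a hedged sketch of reading pixels out of an ImageBitmap inside a worker. The mappedDataLength() and mapDataInto() names follow the extension draft in [2]; exact signatures may differ, and 'RGBA32' is an assumed format name.<br />
<source lang="javascript"><br />
// Convert an ImageBitmap to grayscale by mapping its pixels into a buffer.<br />
function toGrayscale(bitmap) {<br />
  var format = 'RGBA32'; // assumed format enum value<br />
  var length = bitmap.mappedDataLength(format); // bytes needed, per the draft<br />
  var buffer = new ArrayBuffer(length);<br />
  return bitmap.mapDataInto(format, buffer, 0).then(function(layout) {<br />
    var pixels = new Uint8ClampedArray(buffer);<br />
    for (var i = 0; i < pixels.length; i += 4) {<br />
      var y = 0.299 * pixels[i] + 0.587 * pixels[i + 1] + 0.114 * pixels[i + 2];<br />
      pixels[i] = pixels[i + 1] = pixels[i + 2] = y;<br />
    }<br />
    // Re-wrap the processed pixels as a new ImageBitmap.<br />
    return createImageBitmap(new ImageData(pixels, bitmap.width, bitmap.height));<br />
  });<br />
}<br />
</source><br />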
<br />
==WebImage==<br />
Why do we need WebImage? Because performance matters for bringing computer vision to the Web. Most image processing tasks can be accelerated by WebGL, but that is not the case in the computer vision area. So we need a hardware-acceleratable computer vision Web API to help Web developers deliver fast and portable Web pages and applications. That is the motivation for WebImage.<br />
<br><br />
This is a new zone to explore, so we might start by referring to existing related APIs. At the end of 2014, Khronos released a portable, power-efficient vision processing API called OpenVX. This specification might be a good starting point for WebImage.<br />
<br><br />
The following diagram shows the basic concept of OpenVX. It is a graph-node architecture: the role of OpenVX in computer vision is like the role of OpenGL in graphics. The developer just constructs and executes a graph to process incoming images.<br />
<br><br />
[[File:OpenVX-NodeGFX.PNG|600px]]<br><br />
<!--[[File:Project FoxEyeWebImage1.png|800px]]<br>--><br />
OpenVX can be implemented with OpenCL, OpenGL ES with compute shaders, or C/C++ with SIMD. That means we can support OpenVX on a wide range of Web platforms, from PC to mobile, once an OpenVX engine is provided.<br />
<br />
[[File:OpenVX.PNG|600px]]<br><br />
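Since WebImage has not started yet, the following is purely illustrative: a guess at what an OpenVX-like graph-node API could look like on the Web. None of these names exist.<br />
<source lang="javascript"><br />
// Hypothetical WebImage sketch: build a small vision graph and run it.<br />
var graph = new WebImageGraph(); // hypothetical constructor<br />
var source = graph.createImageSource(inputBitmap); // e.g. from a camera frame<br />
var blur = graph.createNode('gaussian3x3'); // hypothetical kernel names<br />
var edges = graph.createNode('sobel');<br />
source.connect(blur);<br />
blur.connect(edges);<br />
var sink = graph.createImageSink();<br />
edges.connect(sink);<br />
// The engine could schedule this on OpenCL, OpenGL ES, or SIMD, as noted above.<br />
graph.process().then(function(outputBitmap) {<br />
  // consume the processed frame<br />
});<br />
</source><br />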
<br />
==OfflineMediaContext==<br />
We introduce OfflineMediaContext and modify MediaStream to enable offline (as fast as possible) MediaStream processing. When developers want to perform offline MediaStream processing, they need to form a context that holds and keeps the relationship of all the MediaStreams that are going to be processed together.<br />
<br />
<source lang="webidl"><br />
<br />
// typedef unsigned long long DOMTimeStamp;<br />
interface OfflineMediaContext {<br />
void start(DOMTimeStamp durationToStop);<br />
attribute EventHandler onComplete;<br />
};<br />
// Add an optional argument into the constructor.<br />
[Constructor (optional OfflineMediaContext context),<br />
Constructor (MediaStream stream, optional OfflineMediaContext context),<br />
Constructor (MediaStreamTrackSequence tracks, optional OfflineMediaContext context)]<br />
interface MediaStream : EventTarget {<br />
// No modification.<br />
...<br />
}<br />
<br />
</source><br />
*OfflineMediaContext is the holding place for all MediaStreams that are going to be processed together at a non-realtime rate.<br />
*OfflineMediaContext is also the object that triggers the non-realtime processing.<br />
*OfflineMediaContext should be instantiated first; MediaStreams that are going to be processed together in the same context can then be instantiated by passing the pre-instantiated OfflineMediaContext object into the MediaStream constructor (see the modified MediaStream constructor above).<br />
*The native implementation of OfflineMediaContext holds a non-realtime MediaStreamGraph, just like the OfflineAudioContext in the WebAudio specification.<br />
*The constructors are modified by adding an optional parameter, OfflineMediaContext. In this way, developers are able to associate a MediaStream with an OfflineMediaContext.<br />
*If the optional OfflineMediaContext is given, the native implementation of the MediaStream should be hooked to the non-realtime MSG held by the given OfflineMediaContext instead of the global realtime MSG.<br />
*If the optional OfflineMediaContext is given, we need to check whether the newly created MediaStream can be processed at an offline rate. If not, the constructor should throw an error and return NULL. (Constructors are always allowed to throw.)<br />
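Putting the pieces together, here is a minimal sketch of how a video editor might use the draft above. The names (OfflineMediaContext, onComplete, the extended MediaStream constructor) follow the draft; the millisecond interpretation of DOMTimeStamp and the pre-existing recordedStream are assumptions.<br />
<source lang="javascript"><br />
// Process a recorded stream faster than real time.<br />
var offlineCtx = new OfflineMediaContext();<br />
// Hook the stream into the offline context via the extended constructor.<br />
var offlineStream = new MediaStream(recordedStream, offlineCtx);<br />
var track = offlineStream.getVideoTracks()[0];<br />
track.addVideoProcessor(new VideoProcessor('effect-worker.js'));<br />
offlineCtx.onComplete = function() {<br />
  // All frames have been processed as fast as possible.<br />
};<br />
offlineCtx.start(30000); // process 30 seconds of media offline (ms assumed)<br />
</source><br />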
<br />
==OpenCV.js==<br />
*OpenCV + Emscripten = OpenCV.js<br />
*https://github.com/CJKu/opencv <br />
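For flavor, here is a hedged sketch of what calling into an Emscripten-compiled OpenCV build can look like. The exported function name and its in-place behavior are hypothetical; cwrap, _malloc, HEAPU8, and _free are standard Emscripten facilities. See the repository above for the real bindings.<br />
<source lang="javascript"><br />
// Wrap a hypothetical exported C function from the OpenCV.js build.<br />
var cvtToGray = Module.cwrap('cv_rgba_to_gray', null, ['number', 'number', 'number']);<br />
// Copy RGBA pixels into the Emscripten heap.<br />
var ptr = Module._malloc(pixels.length);<br />
Module.HEAPU8.set(pixels, ptr);<br />
cvtToGray(ptr, width, height); // hypothetical: converts in place<br />
// Copy the result out before freeing the heap block.<br />
var gray = new Uint8Array(Module.HEAPU8.subarray(ptr, ptr + width * height));<br />
Module._free(ptr);<br />
</source><br />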
<!--<br />
==Deprecated Design ==<br />
*Modular Routing<br />
Modular routing allows arbitrary connections between different DIPNode(TBD) objects. Each node can have inputs and/or outputs. A source node has no inputs and a single output. A destination node has one input and no outputs. Other nodes such as filters can be placed between source and destination nodes.<br />
<br />
Here is an example of face detection working on an ImageElement:<br><br />
PS. Right now, I haven't finished the work of ScriptNode. So the draw rectangle part is skipped in sample code.<br />
<br>[[File:Project FoxEye1.png|720px]]<br><br />
<big>Example 1:</big><br />
<source lang="javascript"><br />
var context = new DIPContext();<br />
var imageElem = document.getElementById('imgelemsrc');<br />
var source = context.createImageElementSource(imageElem);<br />
var facedetect = context.createFaceDetection();<br />
source.connect(facedetect);<br />
var dest = context.createMediaStreamDestination();<br />
facedetect.connect(dest);<br />
var elem = document.getElementById('videoelem');<br />
elem.mozSrcObject = dest.stream;<br />
elem.play();<br />
<br />
</source><br />
<br />
<br />
Another example shows that some nodes might support callback functions to pass back more information than just the image.<br />
<br>[[File:Project FoxEye2.png|720px]]<br><br />
<big>Example 2:</big><br />
<source lang="javascript"><br />
var context = new DIPContext();<br />
var imageElem = document.getElementById('imgelemsrc');<br />
var source = context.createImageElementSource(imageElem);<br />
var textdetect = context.createTextDetection();<br />
source.connect(textdetect);<br />
var dest = context.createMediaStreamDestination();<br />
textdetect.connect(dest);<br />
textdetect.ontextrecognized = function (e){<br />
var text = e.recognizedText;<br />
var go2google = document.getElementById('go2Google');<br />
go2google.href = "https://www.google.com.tw/search?q=" + text<br />
var go2IMDB = document.getElementById('go2IMDB');<br />
go2IMDB.href = "http://www.imdb.com/find?ref_=nv_sr_fn&q=" + text<br />
var go2Amazon = document.getElementById('go2Amazon');<br />
go2Amazon.href = "http://www.amazon.com/s/ref=nb_sb_noss?url=search-alias%3Daps&field-keywords=" + text<br />
var go2TranslateEngTC = document.getElementById('go2TranslateEngTC');<br />
go2TranslateEngTC.href = "https://translate.google.com.tw/#en/zh-TW/" + text<br />
}<br />
var elem = document.getElementById('videoelem');<br />
elem.mozSrcObject = dest.stream;<br />
elem.play();<br />
</source><br />
An idealized example combining ScriptNode with Canvas2DContext.<br><br />
This example tries to do on-the-fly camera translation like "Word Lens" and "Waygo".<br><br />
The implementation of this example is not finished.<br><br />
<br>[[File:Project FoxEye3.png|720px]]<br><br />
<big>Example 3:</big><br />
<source lang="javascript"><br />
var context = new DIPContext();<br />
var myMediaStream;<br />
navigator.getUserMedia({video:true, audio:false}, function(localMediaStream){<br />
myMediaStream = localMediaStream;<br />
var source = context.createMediaStreamSource(myMediaStream);<br />
var textRecoginition = context.createTextRecoginition();<br />
source.connect(textRecoginition);<br />
var textInpaint = context.createTextInpaint();<br />
textRecoginition.connect(textInpaint);<br />
var scriptTranslate = context.createScriptProcessor();<br />
textRecoginition.ontextrecognized = function (e){<br />
var text = e.recognizedText;<br />
// Custom parameter -<br />
scriptTranslate.addParameter( "text", text );<br />
}<br />
scriptTranslate.onimageprocess= function (e) {<br />
var text = e.parameters.text;<br />
// Translate to other language....<br />
var newText = Translate(text, "Eng", "TC"); // pass the recognized text<br />
var input = e.inputImage;<br />
var canvas = document.getElementsByTagName('canvas')[0];<br />
var context2D = canvas.getContext('2d');<br />
context2D.drawImage(input, 0, 0);<br />
context2D.strokeText(newText, 0, 0);<br />
// get an empty slate to put the data into<br />
var output = context.createImageData(canvas.width, canvas.height);<br />
e.outputImage = output;<br />
}<br />
textInpaint.connect(scriptTranslate);<br />
var dest = createMediaStreamDestination();<br />
scriptTranslate.connect(dest);<br />
var video = document.getElementById('videoelem');<br />
video.mozSrcObject = dest.stream;<br />
}, null);<br />
<br />
</source><br />
<br />
=Demo pages=<br />
==OpenCV.js==<br />
This demo should run on any existing Firefox browser on Ubuntu and Mac. Please try the website below.<br><br />
http://people.mozilla.org/~cku/opencv/<br />
<br><br />
[[File:OpenCVJS-1.png|1080px]]<br />
<br><br />
[[File:OpenCVJS-2.png|1080px]]<br />
<br><br />
<br />
==MST with Worker and ImageBitmap==<br />
You need a customized build for the demos below. Please clone this repo: https://github.com/kakukogou/gecko-dev/tree/foxeye-master<br><br />
You also need a web camera for the demos. Then build your own Firefox browser and go to the website below to see the demos.<br />
<br><br />
Source code of the demo:<br><br />
https://github.com/kakukogou/foxeye-demo<br />
<br><br />
Demo website:<br><br />
http://people.mozilla.org/~tkuo/foxeye-demo/<br />
===Monitor===<br />
Monitor is designed to simply send each frame event to the Web Worker, without modifying the stream. The left view comes from getUserMedia. The right view uses addWorkerMonitor to dispatch the input frames from the left one to a worker. The worker detects the face and passes the face position and the input frame back to the main thread. The script in the main thread then uses both pieces of information to draw the input frame via CanvasRenderingContext2D; a minimal sketch of the worker side follows. <br />
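The sketch below assumes the videomonitor event arrives on the worker global scope as in the flow diagram (the demo build names the entry point addWorkerMonitor, the draft IDL addVideoMonitor), and detectFaces() stands in for application code such as a face detection library.<br />
<source lang="javascript"><br />
// face-worker.js: analyze frames only; the stream itself is not modified.<br />
onvideomonitor = function(event) {<br />
  var faces = detectFaces(event.inputImageBitmap); // app-provided detector<br />
  // Report positions back for main-thread drawing via CanvasRenderingContext2D.<br />
  postMessage({ playbackTime: event.playbackTime, faces: faces });<br />
};<br />
</source><br />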
*Case 1: Face detection<br />
This demo shows how to use |addWorkerMonitor| and CanvasRenderingContext2D to overlay a hat on someone's head.<br />
<br><br />
[[File:Monitor face.png|1080px]]<br />
<br><br />
*Case 2: QRCode<br />
[[File:Monitor qrcode.png|1080px]]<br />
<br><br />
<br />
===Processor===<br />
This demo shows how to use |addWorkerProcessor| to create a new MediaStreamTrack and show it in another HTMLVideoElement. There are five kinds of image filters; a sketch of one such filter worker follows the screenshots.<br />
<br> <br />
<br><br />
[[File:Processor copy.png|1080px]]<br />
<br><br />
[[File:Processor blur.png|1080px]]<br />
<br><br />
[[File:Processor erode.png|1080px]]<br />
<br><br />
[[File:Processor threshold.png|1080px]]<br />
<br><br />
[[File:Processor invert.png|1080px]]<br />
<br><br />
[[File:Processor gray.png|1080px]]<br />
<br><br />
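As a sketch, an invert filter worker could look like the following, reusing the mapDataInto() pattern from the ImageBitmap extension section; the event delivery and the 'RGBA32' format name are assumptions carried over from there.<br />
<source lang="javascript"><br />
// invert-worker.js: produce an inverted frame for the output track.<br />
onvideoprocess = function(event) {<br />
  var bitmap = event.inputImageBitmap;<br />
  var format = 'RGBA32'; // assumed format enum value<br />
  var buffer = new ArrayBuffer(bitmap.mappedDataLength(format));<br />
  event.outputImageBitmap = bitmap.mapDataInto(format, buffer, 0).then(function() {<br />
    var pixels = new Uint8ClampedArray(buffer);<br />
    for (var i = 0; i < pixels.length; i += 4) {<br />
      pixels[i] = 255 - pixels[i]; // R<br />
      pixels[i + 1] = 255 - pixels[i + 1]; // G<br />
      pixels[i + 2] = 255 - pixels[i + 2]; // B (alpha left untouched)<br />
    }<br />
    return createImageBitmap(new ImageData(pixels, bitmap.width, bitmap.height));<br />
  });<br />
};<br />
</source><br />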
<br />
<br />
<br />
<!--<br />
==Demo 1: Face tracker==<br />
===Browser:===<br />
*Input comes from HTML Image Element<br />
[[File:ProjectFoxEye BrowserFaceImage.png|720px]]<br />
*Input comes from MediaStream<br />
[[File:ProjectFoxEye BrowserFaceMS.png|720px]]<br />
<br />
===B2G on Flame:===<br />
*Input comes from HTML Image Element<br><br />
[[File:Project FoxEyeFlameFaceImage.jpg|480px]]<br><br />
*Input comes from MediaStream<br><br />
[[File:Project FoxEyeFlameFaceMS.jpg|480px]]<br><br />
<br />
==Demo 2: Text Recognition==<br />
*Browser:<br />
**Input comes from HTML Image Element<br />
[[File:ProjectFoxEye TextImage.png|720px]]<br />
**Input comes from MediaStream<br />
[[File:ProjectFoxEye TextMS.png|720px]]<br />
--><br />
<br />
=Unlimited Potentials=<br />
<!--<br />
According to "Firefox OS User Research Northern India Findings" [4], one of the key table stakes is camera-related features. "Ways to provide photo & video editing tools" is what this WebAPI is for. So if we can deliver some cool photo and video editing features, we can fulfill one of the needs of our target market.<br><br />
In [4], it is mentioned that one purchase motivator is "educate my kids". Features like PhotoMath can satisfy the education part.<br><br />
In the long term, if we can integrate text recognition with TTS (text to speech), we can help illiterate people read words or phrases. That would be a very useful feature.<br><br />
Also, offline text translation in the camera might be a killer application too. Waygo and WordLens are two such applications on Android and iOS.<br><br />
Text Selection in Image is also an interesting feature for the browser. Project Naptha demos some potential functionality based on text selection in images.<br> <br />
--><br />
==FoxEye technology tree==<br />
This is the technology tree of FoxEye. Solid lines are dependencies; dashed lines indicate performance improvements. The green blocks are possible applications. The blue blocks are API drafts with prototypes. The red blocks are things we know how to do but have not yet started. The yellow blocks are only concepts and need more study. Step by step, we can provide a photo manager, wide-angle panorama, HDR, and even gesture control.<br />
[[File:Multimedia Platform Team Technology Roadmap - New Page.png |1024px]]<br />
<br />
<br />
==Use Cases==<br />
*Digital Image Processing(DIP) for camera:<br />
**Face In, see [https://www.youtube.com/watch?feature=player_embedded&v=PWZUCfDsFdU Sony Face In]<br />
**Augmented Reality, see [https://www.youtube.com/watch?feature=player_embedded&v=vDNzTasuYEw IKEA AR]<br />
**Camera Panorama, <br />
**Fisheye camera, <br />
**Comic Effect,<br />
**Long term, might need Android Camera HAL 3 to control camera<br />
***Smile Snapshot<br />
***Gesture Snapshot<br />
***HDR<br />
***Video Stabilization<br />
**Bar code scanner<br />
*Photo and video editing<br />
**Video Editor, see [https://www.youtube.com/watch?feature=player_embedded&v=NJ6nYgxcuUk WeVideo on Android]<br />
**A faster way for video editing tools.<br />
**Lots of existing image effects can be used for photo and video editing.<br />
**https://www.facebook.com/thanks<br />
*Object Recognition in Image (not only Firefox OS, but also the browser):<br />
**Shopping Assistant, see [https://www.youtube.com/watch?feature=player_embedded&v=B7cvlWll85Q Amazon Firefly]<br />
**Face Detection/Tracking,<br />
**Face Recognition, <br />
**Text Recognition, <br />
**Text Selection in Image, <br />
***See http://projectnaptha.com/<br />
**Text Inpainting,<br />
**Image Segmentation,<br />
**Text translation on image, see [https://www.youtube.com/watch?feature=player_embedded&v=9UalhhWBPH0 Waygo]<br />
*Duo Camera:<br />
**Natural Interaction (Gesture, Body Motion Tracking)<br />
**Interactive Foreground Extraction<br />
and so on....<br />
<br />
==Some cool real-world applications we can refer to==<br />
*Word Lens: <br />
**https://play.google.com/store/apps/details?id=com.questvisual.wordlens.demo<br />
**https://itunes.apple.com/tw/app/word-lens/id383463868?mt=8<br />
*Waygo<br />
**http://www.waygoapp.com/<br />
*PhotoMath<br />
**https://photomath.net/<br />
*Cartoon Camera<br />
**https://play.google.com/store/apps/details?id=com.fingersoft.cartooncamera<br />
*Photo Studio<br />
**http://photo-studio.en.uptodown.com/android<br />
*Magisto<br />
**https://play.google.com/store/apps/details?id=com.magisto<br />
*Adobe PhotoShop Express<br />
**http://www.photoshop.com/products/photoshopexpress<br />
*Amazon(firefly app)<br />
**https://play.google.com/store/apps/details?id=com.amazon.mShop.android<br />
<br />
<!--<br />
=Task Dependency=<br />
Below is the dependency graph for text recognition work.<br />
<br><br />
[[File:Project FoxEyeTextRecgDependency.png]]<br />
<br><br />
Below is the dependency graph for camera efficts work.<br />
<br><br />
[[File:Project FoxEyeCameraDependency.png]]<br />
<br><br />
Below is the dependency graph for video editor work.<br />
<br><br />
[[File:Project FoxEyeVideoEditorDependency.png]]<br />
<br><br />
--><br />
<br />
<!--<br />
=Comparison=<br />
==Canvas2DContext==<br />
Currently, you can do video effect by Canvas2DContext. See the demo made by [4]. The source code looks like below.<br />
<source><br />
function frameConverter(video,canvas) {<br />
<br />
// Set up our frame converter<br />
this.video = video;<br />
this.viewport = canvas.getContext("2d");<br />
this.width = canvas.width;<br />
this.height = canvas.height;<br />
// Create the frame-buffer canvas<br />
this.framebuffer = document.createElement("canvas");<br />
this.framebuffer.width = this.width;<br />
this.framebuffer.height = this.height;<br />
this.ctx = this.framebuffer.getContext("2d");<br />
// Default video effect is blur<br />
this.effect = JSManipulate.blur;<br />
// This variable used to pass ourself to event call-backs<br />
var self = this;<br />
// Start rendering when the video is playing<br />
this.video.addEventListener("play", function() {<br />
self.render();<br />
}, false);<br />
<br />
// Change the image effect to be applied <br />
this.setEffect = function(effect){<br />
if(effect in JSManipulate){<br />
this.effect = JSManipulate[effect];<br />
}<br />
}<br />
<br />
// Rendering call-back<br />
this.render = function() {<br />
if (this.video.paused || this.video.ended) {<br />
return;<br />
}<br />
this.renderFrame();<br />
var self = this;<br />
// Render every 10 ms<br />
setTimeout(function () {<br />
self.render();<br />
}, 10);<br />
};<br />
<br />
// Compute and display the next frame <br />
this.renderFrame = function() {<br />
// Acquire a video frame from the video element<br />
this.ctx.drawImage(this.video, 0, 0, this.video.videoWidth,<br />
this.video.videoHeight,0,0,this.width, this.height);<br />
var data = this.ctx.getImageData(0, 0, this.width, this.height);<br />
// Apply image effect<br />
this.effect.filter(data,this.effect.defaultValues);<br />
// Render to viewport<br />
this.viewport.putImageData(data, 0, 0);<br />
return;<br />
};<br />
};<br />
<br />
// Initialization code<br />
video = document.getElementById("video");<br />
canvas = document.getElementById("canvas");<br />
fc = new frameConverter(video,canvas);<br />
...<br />
// Change the image effect applied to the video<br />
fc.setEffect('edge detection');<br />
<br />
</source><br />
Basically, the idea is to use |drawImage| to acquire a frame from the video and draw it to the canvas, then call |getImageData| to get the data and process the image. After that, put the computed data back into the canvas and display it.<br><br />
<br />
Compared to this approach, the proposed WebAPI has the following advantages:<br />
* No polling mechanism.<br />
** We use a callback function to process all frames.<br />
<br />
==node-opencv==<br />
https://github.com/peterbraden/node-opencv<br />
"OpenCV bindings for Node.js. OpenCV is the defacto computer vision library - by interfacing with it natively in node, we get powerful real time vision in js."<br />
The sample code looks like below:<br />
*You can use opencv to read in image files. Supported formats are in the OpenCV docs, but jpgs etc are supported.<br />
<source lang="javascript"><br />
cv.readImage(filename, function(err, mat){<br />
mat.convertGrayscale()<br />
mat.canny(5, 300)<br />
mat.houghLinesP()<br />
})<br />
</source><br />
*If however, you have a series of images, and you wish to stream them into a stream of Matrices, you can use an ImageStream. Thus:<br />
<source lang="javascript"><br />
var s = new cv.ImageStream()<br />
s.on('data', function(matrix){<br />
matrix.detectObject(haar_cascade_xml, opts, function(err, matches){})<br />
})<br />
ardrone.createPngStream().pipe(s);<br />
</source><br />
<br />
==opencvjs==<br />
https://github.com/blittle/opencvjs<br><br />
It is a project to compile opencv to asm.js. Might be a dead project now.<br />
==Project Naptha==<br />
"Project Naptha automatically applies state-of-the-art computer vision algorithms on every image you see while browsing the web. The result is a seamless and intuitive experience, where you can highlight as well as copy and paste and even edit and translate the text formerly trapped within an image." Quoted from http://projectnaptha.com/ .<br />
===How it works?===<br />
Excerpt from Project Naptha:<br />
The primary feature of Project Naptha is actually text detection rather than optical character recognition. The author implemented a text detection algorithm called the Stroke Width Transform, invented by Microsoft Research in 2008, which can identify regions of text in a language-agnostic manner, running in a WebWorker. Once a user begins to select some text, it scrambles to run character recognition algorithms to determine what exactly is being selected. The default OCR engine is a built-in pure-JavaScript port of the open-source Ocrad OCR engine. There is also the option of sending the selected region to a cloud-based text recognition service powered by Tesseract, Google's (formerly HP's) award-winning open-source OCR engine, which supports dozens of languages and uses an advanced language model.<br />
--><br />
<!--<br />
=Open Source Library and Licenses =<br />
*OpenCV: OpenCV is an open-source BSD-licensed library that includes several hundreds of computer vision algorithms.<br />
*Tesseract-OCR: Apache License v2.0. Tesseract is probably the most accurate open source OCR engine available.<br />
**Leptonica: Leptonica is a pedagogically-oriented open source site containing software that is broadly useful for image processing and image analysis applications.<br />
<br />
<br>"The Leptonica copyright is less restrictive still. We use the BSD 2-clause license, which is similar to the Apache license but much briefer. Kirk McKusick playfully called the BSD license a copycenter, as differentiated from the usual copyright and the GPL copyleft: "Take it down to the copy center and make as many copies as you want." The BSD restrictions can be approximately summarized as: (1) Don't pretend that you wrote this, and (2) Don't sue us if it doesn't work.<br> For Leptonica, as with Apache and BSD, modifications of the source code that are used in commercial products can be made without any obligation to make them available in open source. (This does happen voluntarily -- it's likely that the majority of issues noted by people working on commercial products are reported back, often with patches.) The Leptonica license only requires that any use of the source, whether in original or modified form, must include the Leptonica copyright notice, and that modified versions must be clearly marked as such. "<br />
<br />
<br />
=What I have done=<br />
*Initialize, plan and implement this project.<br />
*Write a prototype of WebIDL for WebDIP.<br />
**MediaStream as source node and destination node for WebDIP.<br />
**For the HTMLImageElement part as a source node, there is a temporary solution.<br />
**Has a face detection node, usable with MediaStream and HTMLImageElement on both the browser and a B2G Flame.<br />
**Has a text detection/recognition node, usable with MediaStream and HTMLImageElement in the browser.<br />
--><br />
<br />
<!--<br />
=To Do List=<br />
*Extend MediaStreamTrack API? =>CTai<br />
**Need to integrate with Canvas2DContext and WebGL.<br />
*OfflineMediaContext study. =>Kaku(our new hire!!)<br />
*An API for image processing and object detection. =>TBD<br />
**We need such API for B2G privilege applications(or opencv-asm.js for general APPs).<br />
*How to compile OpenCV to asm.js =>Kaku, CJay<br />
**Try to figure out how to pass the ImageBitmap from the VideoWorker to OpenCV-asm.js<br />
*Compare native OpenCV/Tesseract with asm.js version. =>TBD<br />
--><br />
<br />
<!--<br />
=Fixme List(Known Issues)=<br />
*OpenCV can't build with STLPort, only support GNUSTL.<br />
**B2G can't build with GNUSTL.<br />
*Text Detection and Recognition can't run on B2G.<br />
**Some OpenCV APIs use STL types as arguments. The mismatched STL will cause runtime errors.<br />
*Tesseract-OCR Build<br />
**Use pre-installed Tesseract-OCR now. Maybe we should support source code build of Tesseract-OCR.<br />
*Improve the precision rate of text recognition.<br />
**The actual precision rate should be higher than in my rough prototype. It needs improvement.<br />
*Separate OCR initialization.<br />
**Prevent redundant initialization.<br />
*Haven't done OpenCL integration in Gecko.<br />
**OpenCV has a lot of OpenCL integration. We should take advantage from it.<br />
*Canvas2DContext, WebGL can't run on worker.<br />
**Need bug 801176 and bug 709490 landed.<br> <br />
*Need ImageBitmap for VideoWorkerEvent.<br />
**Need bug 1044102 landed.<br><br />
--><br />
<br />
=Current Status=<br />
*Meta bug: https://bugzilla.mozilla.org/show_bug.cgi?id=1100203<br />
*MediaStream with worker: editor draft completed, refining the prototype for review process. See https://bugzilla.mozilla.org/show_bug.cgi?id=1044102<br />
*ImageBitmap: In review process. See https://bugzilla.mozilla.org/show_bug.cgi?id=801176<br />
*ImageBitmap extension: editor draft completed, refining the prototype for review process. See https://bugzilla.mozilla.org/show_bug.cgi?id=1141979<br><br />
*OpenCV.js: Working on it. See http://people.mozilla.org/~cku/opencv/<br />
*OfflineMediaContext: Not yet started.<br />
*WebImage:Not yet started.<br />
*Run WebGL on worker: in review process. See https://bugzilla.mozilla.org/show_bug.cgi?id=709490<br><br />
*CanvasRenderingContext2D in Worker: waiting for someone to take this bug. See https://bugzilla.mozilla.org/show_bug.cgi?id=801176<br><br />
=Next Phase(2015 H2)=<br />
*Try to land and standardize MediaStream with worker, ImageBitmap, and the ImageBitmap extension. See [3] for how standardization is handled in Mozilla.<br />
*Design an automated way to export JavaScript APIs for OpenCV.js, and try to upstream it to the OpenCV code base.<br />
*Start to work on OfflineMediaContext.<br />
*Support product requirements, for example, an Instagram-like app, wide-angle panorama, or Fox photo.<br />
*Do some exploratory experiments on the WebImage concept.<br />
*Initialize a sub-project called [https://wiki.mozilla.org/Project_GlovePuppetry Project GlovePuppetry]<br />
<br />
=Beyond 2015=<br />
*Proof of Concept for WebImage.<br />
*A crazy idea: Kernel.js or SPIR.js for JS developers to customize WebImage?<br />
*A gesture control API with a depth camera? => WebNI (Web Natural Interaction)?<br />
*[https://wiki.mozilla.org/Project_Cangjie Project Cangjie]<br />
<br />
=Conclusion=<br />
Step by step, we can make things different. FoxEye is a pioneer project that tries to push the boundary of the Web into a new area: image processing and computer vision. The key factor in making this project succeed is you, every Web contributor. Your support, promotion, feedback, and comments are the nourishment of this project. Please share this project with every Web developer. Let's bring more and more amazing things to the Web, and enrich the whole Web platform's competitiveness.<br />
<br />
=References=<br />
*[1]:"WebAudio Spec", http://www.w3.org/TR/webaudio/<br />
*[2]:"Media Capture Stream with Worker", https://w3c.github.io/mediacapture-worker/<br />
*[3]:"Mozilla Standards", https://wiki.mozilla.org/Standards<br />
<br />
=Acknowledgements=<br />
The idea of adopting WebAudio as the reference design for this project came from a conversation with John Lin. Thanks to Robert O'Callahan for his great feedback and comments. Thanks to John Lin and Chia-jung Hung for their useful suggestions and ideas. Also, big thanks to my team members who helped me debug the code.<br />
<br />
=About Authors=<br />
==CTai==<br />
My name is [https://tw.linkedin.com/pub/chia-hung-tai/36/8b6/712 Chia-hung Tai]. I am a senior software engineer in the Mozilla Taipei office. I work on Firefox OS multimedia. Before this job, I had some experience in OpenCL, NLP (Natural Language Processing), data mining, and machine learning. My IRC nickname is ctai. You can find me in the #media, #mozilla-taiwan, and #b2g channels. You can also reach me via email (ctai at mozilla dot com).<br />
<br />
==Kaku==<br />
[https://tw.linkedin.com/in/kakukogou Tzuhuo Kuo] is an engineer in the Mozilla Taipei office.<br />
==CJ Ku==<br />
[https://www.linkedin.com/pub/cj-ku/62/55b/a1b CJ Ku] is responsible for OpenCV.js part.</div>Ctaihttps://wiki.mozilla.org/index.php?title=User_talk:Dead_project&diff=1108923User talk:Dead project2015-12-10T16:25:51Z<p>Ctai: /* Next Phase(2015 H2) */</p>
<hr />
<div>=Abstract=<br />
The goal of this project is bringing the power of computer vision and image processing to the Web. By extending the spec of Media Capture and Streams, the web developers can write video processing related applications in better way. The primary idea is to incorporate Worker-based JavaScript video processing with MediaStreamTrack. The user's video processing script can do the real image processing and analysis works frame by frame. We also propose an extension of ImageBitmap to allow better optimization possibility in both implementation and Web developer side. To support video editor case, we would like to introduce OfflineMediaContext in next phase. We also want to explore the concept of WebImage, a hardware accelerated-able Web API for performance improvement chance. By accomplishing these API step by step, we believe we can improve the Web platform competitiveness a lot on image processing and computer vision area.<br />
<br />
=Introduction=<br />
To get a quick understand what is project FoxEye. Please see below file:<br><br />
'''The latest one:'''<br />
*FoxEye Briefing: [https://docs.google.com/presentation/d/1Ra5bIeMfSEQi5kd_bGH5Vx-pa9ahpAUTFDxTeBA5L0o/edit?usp=sharing Briefing]<br />
*Presentation files in Whistler Work Week:<br />
**Project FoxEye Status Update: [https://docs.google.com/presentation/d/1vnR5JOWpEgKN3KZGV2SLxscqSnuBOF2tT_dTolLRYI4/edit?usp=sharing FoxEye]<br />
**FoxEye Cross Firefox OS:[https://docs.google.com/presentation/d/1ck32VoikWkkFXkFfNK91S22nGRp9fSEj6Oz6846May4/edit?usp=sharing Use case]<br />
*'''Latest demo in Youtube: [https://www.youtube.com/watch?v=prybkXsTGXY FoxEye 2015 H1 demo]'''<br />
<br />
'''Outdated'''<br />
*Presentation file in Portland Work Week.[[File:Project FoxEye Portland Work Week.pdf]]<br><br />
*Presentation file in P2PWeb WorkShop.[[File:Project FoxEye 2015-Feb.pdf]]<br><br />
*Youtube: https://www.youtube.com/watch?v=TgQWEWiGaO8<br><br />
<br />
The needs for image processing and computer vision is increasing in recent years. The introduction of video element and media stream in HTML5 is very important, allowing for basic video playback and WebCam ability. But it is not powerful enough to handle complex video processing and camera application. Especially there are tons of mobile camera, photo and video editor applications show their creativity by using OpenCV etc in Android. It is a goal of this project to include the capabilities found in modern video production and camera applications as well as some of the processing, recognition and stitching tasks. <br />
<br><br />
This API is inspired by the WebAudio API[1]. Unlike WebAudio API, we try to reach the goal by modifying existing Media Capture and Streams API. The idea is adding some functions to associate the Woker-based script with MediaStreamTrack. Then the script code of Worker runs image processing and/or analysis frame by frame. Since we move the most of processing work to Worker, the main thread will not be blocked.<br />
<br />
[[File:FoxEye - Overview.png|800px]]<br />
<br><br />
Basically, the spirit of this project is four parts. The first part is extend the MediaStreamTrack to associate a Worker. This part provide a way to do image processing job frame by frame. The second part is ImageBitmap extension. This part extended the ImageBitmap interface to allow JavaScript developer to read underlying data out and set an external data into an ImageBitmap in a set of supported color formats. The third part is OfflineMediaContext. It is for offline stream to render as fast as possible. The last part is WebImage, it is a hardware accelerated-able API on computer vision area. The Web developers can use it to combine high performance vision processing.<br />
<br><br />
Thanks for the amazing asm.js and emscripten work, we also provide an asm.js version of OpenCV called OpenCV.js. The web developers can leverage the power of OpenCV in a simpler way. <br />
<br><br />
<br />
=Design Principle=<br />
*Follow [https://extensiblewebmanifesto.org/ The Extensible Web Manifesto]<br />
The original design of this project is WebAudio like design called WebVideo. But is is deprecated due to we want to follow The Extensible Web Manifesto. We believe this kind of design is best for the Web. Take below examples to against WebVideo design.<bt><br />
Computer vision area is more thriving than ever after deep learning introduced. We can't enumerate video nodes for all kinds of creativeness for the future. For example, suppose we enumerate all we can image in current time. But still something we can't image. Maybe we lost an area called image sentiment analysis. I take this as an example because I just known it recently. Image sentiment analysis is trying to classify the emotion from human face in image. So should we add this as a new video node? Take another extreme example, we think we would not have argument on face detection or face recognition. But what we talk now is only for human face. What if we someone think we need to detect and recognize animal face? Sounds like a ridiculous idea right? But it is a real business. A product called "Bistro: A Smart Feeder Recognizes Your Cat's Face"[2] was on indiegogo. So should we extend the original face detection/recognition node or add a new video node for cat or ask them to implement it in VideoWorker? If we ask them to implement in VideoWorker, then we might need to re-consider are those existing video node make sense? Should we move them all into VideoWorker and implement them in JS? If yse, the WebVideo will degenerate to only source node, video worker node and detestation node. Compare to degenerated WebVideo, the MediaStream with worker design is more elegant. <br />
<br />
*Performance and power consumption do matter<br />
This project is initialized by the FxOS team which means the performance and power consumption for mobile devices are considered from day one.<br />
<br />
=Concept=<br />
==MediaStreamTrack with Worker: ==<br />
The new design is a simple and minimal change for current API. By extending MediaStreamTrack and adding Worker related API, we can let MediaStream be able to support video processing functionality through the script code in Worker. Below is the draft WebIDL codes. Please see [2] for the draft specification.<br />
<source lang="webidl"><br />
[Constructor(DOMString scriptURL)]<br />
<br />
interface VidoeMonitor : EventTarget {<br />
attribute EventHandler onvideomonitor;<br />
};<br />
<br />
interface VideoProcessor : EventTarget {<br />
attribute EventHandler onvideoprocess;<br />
};<br />
partial interface MediaStreamTrack {<br />
void addVideoMonitor(VidoeMonitor monitor);<br />
void removeVideoMonitor(VidoeMonitor monitor);<br />
MediaStreamTrack addVideoProcessor(VidoeProcessor processor);<br />
void removeVideoProcessor();<br />
};<br />
<br />
[Exposed=(Window, Worker),<br />
Constructor(DOMString type, optional VideoMonitorEventInit videoMonitorEventInitDict)]<br />
interface VideoMonitorEvent : Event {<br />
readonly attribute DOMString trackId;<br />
readonly attribute double playbackTime;<br />
readonly attribute ImageBitmap? inputImageBitmap;<br />
};<br />
<br />
[Exposed=(Window, Worker),<br />
Constructor(DOMString type, optional VideoProcessorEventInit videoProcessorEventInitDict)]<br />
interface VideoProcessEvent : VideoMonitorEvent {<br />
attribute promise<ImageBitmap> outputImageBitmap;<br />
};<br />
<br />
</source><br />
<br><br />
Main thread:<br><br />
[[File:NewProjectFoxEye1.png|1024px]]<br><br />
<br />
Worker Thread<br><br />
[[File:Worker - FLOW.png|1024px]]<br><br />
===Example Code ===<br />
Please check the section [http://chiahungtai.github.io/mediacapture-worker/index.html#Examples examples in MediaStreamTrack with worker].<br />
<br />
==ImageBitmap extensions==<br />
Please see [2] for more information.<br />
<br />
==WebImage:==<br />
Why do we need WebImage? Because performance is matter to prompt computer vision to the Web. We need it for performance reason. Most image processing task can be accelerated by WebGL. But it is not the case for computer vision area. So we need a hardware accelerated-able computer vision related web API to help Web developer deliver a fast and portable Web page and application. That is the motivation of WebImage.<br />
<br><br />
This is a new zone needed to explore. So we might start from refer to existing related API. In end of 2014, KHRONOS released a portable, power efficient vision processing API, called OpenVX. This specification might be a good start point for WebImage.<br />
<br><br />
The following diagram is a brief concept of OpenVX. It is a graph-node architecture. The role of OpenVX for computer vision is like the role of OpenGL for graph. The developer just construct and execute the graph to process incoming image.<br />
<br><br />
[[File:OpenVX-NodeGFX.PNG|600px]]<br><br />
<!--[[File:Project FoxEyeWebImage1.png|800px]]<br>--><br />
OpenVX can be implemented by OpenCL, OpenGL|ES with compute shader and C/C++ with SIMD. That means we can support OpenVX in wide range Web platform from PC to mobile once a OpenVX engine is provided.<br />
<br />
[[File:OpenVX.PNG|600px]]<br><br />
<br />
==OfflineMediaContext:==<br />
We introduce the “OfflineMediaContext” and modify the “MediaStream” here to enable the offline (as soon as possible) MediaStream processing. When developers are going to perform the offline MediaStream processing, they need to form a context which will hold and keep the relationship of all MediaStreams that are going to be processed together.<br />
<br />
<source lang="c++"><br />
<br />
// typedef unsigned long long DOMTimeStamp;<br />
interface OfflineMediaContext {<br />
void start(DOMTimeStamp durationToStop);<br />
attribute EventHandler onComplete;<br />
};<br />
// Add an optional argument into the constructor.<br />
[Constructor (optional OfflineMediaContext context),<br />
Constructor (MediaStream stream, optional OfflineMediaContext context),<br />
Constructor (MediaStreamTrackSequence tracks, optional OfflineMediaContext context)]<br />
interface MediaStream : EventTarget {<br />
// No modification.<br />
...<br />
}<br />
<br />
</source><br />
*OfflineMediaContext is the holdplace of all MediaStreams which are going to be processed together in a non-realtime rate.<br />
*OfflineMediaContext is also the object who can trigger the non-realtime processing.<br />
*OfflineMediaContext should be instantiated first and then MediaStreams, which are going to be processed together in the same context, could be instantiated by passing the pre-instantiated OfflineMediaContext object into the constructor of MediaStreams. (See the modified MediaStream constructor below)<br />
*The native implementation of OfflineMediaContext holds a non-realtime MediaStreamGraph, which is just the same as the OfflineAudioContext in WebAudio specification.<br />
*The constructors are modified by adding an optional parameter, OfflineMediaContext. By this way, developers are able to associate a MediaStream to an OfflineMediaContext.<br />
*If the optional OfflineMediaContext is given, the native implementation of the MediaStream should be hooked to the non-realtime MSG hold by the given OfflineMediaContext in stead of the global realtime MSG.<br />
*If the optional OfflineMediaContext is given, we need to check whether the new created MediaStream is able to be processed in offline rate or not. If not, the constructor should throw an error and return a NULL. (Constructors are always allowed to throw.)<br />
<br />
==OpenCV.js==<br />
*OpenCV + Emscripten = OpenCV.js<br />
*https://github.com/CJKu/opencv <br />
<!--<br />
==Deprecated Design ==<br />
*Modular Routing<br />
Modular routing allows arbitrary connections between different DIPNode(TBD) objects. Each node can have inputs and/or outputs. A source node has no inputs and a single output. A destination node has one input and no outputs. Other nodes such as filters can be placed between source and destination nodes.<br />
<br />
Here is a example for face detection work on ImageElement:<br><br />
PS. Right now, I haven't finished the work of ScriptNode. So the draw rectangle part is skipped in sample code.<br />
<br>[[File:Project FoxEye1.png|720px]]<br><br />
<big>Example 1:</big><br />
<source lang="javascript"><br />
var context = new DIPContext();<br />
var imageElem = document.getElementById('imgelemsrc');<br />
var source = context.createImageElementSource(imageElem);<br />
var facedetect = context.createFaceDetection();<br />
source.connect(facedetect);<br />
var dest = context.createMediaStreamDestination();<br />
facedetect.connect(dest);<br />
var elem = document.getElementById('videoelem');<br />
elem.mozSrcObject = dest.stream;<br />
elem.play();<br />
<br />
</source><br />
<br />
<br />
Another example to show that some nodes might support callback function to pass more information rather than image.<br />
<br>[[File:Project FoxEye2.png|720px]]<br><br />
<big>Example 2:</big><br />
<source lang="javascript"><br />
var context = new DIPContext();<br />
var imageElem = document.getElementById('imgelemsrc');<br />
var source = context.createImageElementSource(imageElem);<br />
var textdetect = context.createTextDetection();<br />
source.connect(textdetect);<br />
var dest = context.createMediaStreamDestination();<br />
textdetect.connect(dest);<br />
textdetect.ontextrecognized = function (e){<br />
var text = e.recognizedText;<br />
var go2google = document.getElementById('go2Google');<br />
go2google.href = "https://www.google.com.tw/search?q=" + text<br />
var go2IMDB = document.getElementById('go2IMDB');<br />
go2IMDB.href = "http://www.imdb.com/find?ref_=nv_sr_fn&q=" + text<br />
var go2Amazon = document.getElementById('go2Amazon');<br />
go2Amazon.href = "http://www.amazon.com/s/ref=nb_sb_noss?url=search-alias%3Daps&field-keywords=" + text<br />
var go2TranslateEngTC = document.getElementById('go2TranslateEngTC');<br />
go2TranslateEngTC.href = "https://translate.google.com.tw/#en/zh-TW/" + text<br />
}<br />
var elem = document.getElementById('videoelem');<br />
elem.mozSrcObject = dest.stream;<br />
elem.play();<br />
</source><br />
An ideally example to combine ScriptNode with Canvas2DContext.<br><br />
This is an example trying to do on fly camera translation like "Word Lens" and "Waygo".<br><br />
Haven't finish the implementation for this example.<br><br />
<br>[[File:Project FoxEye3.png|720px]]<br><br />
<big>Example 3:</big><br />
<source lang="javascript"><br />
var context = new DIPContext();<br />
var myMediaStream;<br />
navigator.getUserMedia({video:true, audio:false}, function(localMediaStream){<br />
myMediaStream = localMediaStream;<br />
var source = context.createMediaStreamSource(myMediaStream);<br />
var textRecoginition = context.createTextRecoginition();<br />
source.connect(textRecoginition);<br />
var textInpaint = context.createTextInpaint();<br />
textRecoginition.connect(textInpaint);<br />
var scriptTranslate = context.createScriptProcessor();<br />
textdetect.ontextrecognized = function (e){<br />
var text = e.recognizedText;<br />
// Custom parameter -<br />
scriptTranslate.addParameter( "text", text );<br />
}<br />
scriptTranslate.onimageprocess= function (e) {<br />
var text = e.parameters.text;<br />
// Translate to other language....<br />
var newText = Translate("Eng", "TC");<br />
var input = e.inputImage;<br />
var canvas = document.getElementsByTagName('canvas')[0];<br />
var context2D = canvas.getContext('2d');<br />
context2D.drawImage(input, 0, 0);<br />
context2D.strokeText(newText, 0, 0);<br />
// get an empty slate to put the data into<br />
var output = context.createImageData(canvas.width, canvas.height);<br />
e.outputImage = output;<br />
}<br />
textInpaint.connect(scriptTranslate);<br />
var dest = createMediaStreamDestination();<br />
scriptTranslate.connect(dest);<br />
var video = document.getElementById(‘videoelem’);<br />
video.mozSrcObject = dest.stream;<br />
}, null);<br />
<br />
</source><br />
<br />
=Demo pages=<br />
==OpenCV.js==<br />
This demo should be able to run on every existing Firefox browser on Ubuntu and Mac. Please try below website.<br><br />
http://people.mozilla.org/~cku/opencv/<br />
<br><br />
[[File:OpenCVJS-1.png|1080px]]<br />
<br><br />
[[File:OpenCVJS-2.png|1080px]]<br />
<br><br />
<br />
==MST with Worker and ImageBitmap==<br />
You need a customized build for below demos. Please clone this repo: https://github.com/kakukogou/gecko-dev/tree/foxeye-master<br><br />
You also need a web camera for the demos. Then build your own Firefox browser and go to below website to see the demo.<br />
<br><br />
Source code of the demo:<br><br />
https://github.com/kakukogou/foxeye-demo<br />
<br><br />
Demo website:<br><br />
http://people.mozilla.org/~tkuo/foxeye-demo/<br />
===Monitor===<br />
Monitor is design for just send the event to the Web Worker and no modification. The left one is from getUserMedia. The right one is using addWorkerMonitor to dispatch the input frame from the left one to a worker. The worker will detect the face and pass the face position and the input frame to main thread. Then the script in main thread use both information to draw the input frame via CanvasRenderingContext2D. <br />
*Case 1: Face detection<br />
This demo shows how to use |addWorkerMonitor| and CanvasRenderingContext2D to overlay a hat on someone's head.<br />
<br><br />
[[File:Monitor face.png|1080px]]<br />
<br><br />
*Case 2: QRCode<br />
[[File:Monitor qrcode.png|1080px]]<br />
<br><br />
<br />
===Processor===<br />
This is a demo to show how to use |addWorkerProcessor| to create a new MediaStreamTrack and show it in another HTMLVideoElement. There are 5 kind of image filter.<br />
<br> <br />
<br><br />
[[File:Processor copy.png|1080px]]<br />
<br><br />
[[File:Processor blur.png|1080px]]<br />
<br><br />
[[File:Processor erode.png|1080px]]<br />
<br><br />
[[File:Processor threshold.png|1080px]]<br />
<br><br />
[[File:Processor invert.png|1080px]]<br />
<br><br />
[[File:Processor gray.png|1080px]]<br />
<br><br />
<br />
<br />
<br />
<!--<br />
==Demo 1: Face tracker==<br />
===Browser:===<br />
*Input comes from HTML Image Element<br />
[[File:ProjectFoxEye BrowserFaceImage.png|720px]]<br />
*Input comes from MediaStream<br />
[[File:ProjectFoxEye BrowserFaceMS.png|720px]]<br />
<br />
===B2G on Flame:===<br />
*Input comes from HTML Image Element<br><br />
[[File:Project FoxEyeFlameFaceImage.jpg|480px]]<br><br />
*Input comes from MediaStream<br><br />
[[File:Project FoxEyeFlameFaceMS.jpg|480px]]<br><br />
<br />
==Demo 2: Text Recognition==<br />
*Browser:<br />
**Input comes from HTML Image Element<br />
[[File:ProjectFoxEye TextImage.png|720px]]<br />
**Input comes from MediaStream<br />
[[File:ProjectFoxEye TextMS.png|720px]]<br />
--><br />
<br />
=Unlimited Potentials=<br />
<!--<br />
According to "Firefox OS User Research Northern India Findings" [4], one of the key table-stake is camera related features. "Ways to provide photo & video editing tools" is what this WebAPI for. So if we can deliver some cool photo & video editing features, we can fulfill one of the needs of our target market.<br><br />
In [4], it mentioned that one of purchase motivators is educate my kids. The features like PhotoMath can satisfy the education part.<br><br />
In long term, if we can integrate text recognition with TTS(text to speech), we can help illiterate person to read words or phrase. That will be very useful features.<br><br />
Also offline text translation in camera might be a killer application too. Waygo and WordLens is two of such applications in Android and iOS.<br><br />
Text Selection in Image is also an interesting feature for browser. Project Naptha demos some potential functionality based on yext selection in Image.<br> <br />
--><br />
==FoxEye technology tree==<br />
This is the technology tree of FoxEye. The solid line is the dependency. The dash line is for performance improvement. The green blocks are possible applications. The blue blocks are API draft with prototype. The red block is we know how to do but not yet starting. The yellow blocks are only the concept. Need more study. So step by step, we can provide photo manager, wide angle panorama, HDR, and even gesture control.<br />
[[File:Multimedia Platform Team Technology Roadmap - New Page.png |1024px]]<br />
<br />
<br />
==Use Cases==<br />
*Digital Image Processing(DIP) for camera:<br />
**Face In, see [https://www.youtube.com/watch?feature=player_embedded&v=PWZUCfDsFdU Sony Face In]<br />
**Augmented Reality, see [https://www.youtube.com/watch?feature=player_embedded&v=vDNzTasuYEw IKEA AR]<br />
**Camera Panorama, <br />
**Fisheye camera, <br />
**Comic Effect,<br />
**Long term, might need Android Camera HAL 3 to control camera<br />
***Smile Snapshot<br />
***Gesture Snapshot<br />
***HDR<br />
***Video Stabilization<br />
**Bar code scanner<br />
*Photo and video editing<br />
**Video Editor, see [https://www.youtube.com/watch?feature=player_embedded&v=NJ6nYgxcuUk WeVideo on Android]<br />
**A faster way for video editing tools.<br />
**Lots of existing image effects can be used for photo and video editing.<br />
**https://www.facebook.com/thanks<br />
*Object Recognition in Image(Not only FX OS, but also broswer):<br />
**Shopping Assistant, see [https://www.youtube.com/watch?feature=player_embedded&v=B7cvlWll85Q Amazon Firefly]<br />
**Face Detection/Tracking,<br />
**Face Recognition, <br />
**Text Recognition, <br />
**Text Selection in Image, <br />
***See http://projectnaptha.com/<br />
**Text Inpainting,<br />
**Image Segmentation,<br />
**Text translation on image, see [https://www.youtube.com/watch?feature=player_embedded&v=9UalhhWBPH0 Waygo]<br />
*Duo Camera:<br />
**Nature Interaction(Gesture, Body Motion Tracking)<br />
**Interactive Foreground Extraction<br />
and so on....<br />
<br />
==Some cool applications we can refer in real worlds==<br />
*Word Lens: <br />
**https://play.google.com/store/apps/details?id=com.questvisual.wordlens.demo<br />
**https://itunes.apple.com/tw/app/word-lens/id383463868?mt=8<br />
*Waygo<br />
**http://www.waygoapp.com/<br />
*PhotoMath<br />
**https://photomath.net/<br />
*Cartoon Camera<br />
**https://play.google.com/store/apps/details?id=com.fingersoft.cartooncamera<br />
*Photo Studio<br />
**http://photo-studio.en.uptodown.com/android<br />
*Magisto<br />
**https://play.google.com/store/apps/details?id=com.magisto<br />
*Adobe PhotoShop Express<br />
**http://www.photoshop.com/products/photoshopexpress<br />
*Amazon(firefly app)<br />
**https://play.google.com/store/apps/details?id=com.amazon.mShop.android<br />
<br />
<!--<br />
=Task Dependency=<br />
Below is the dependency graph for text recognition work.<br />
<br><br />
[[File:Project FoxEyeTextRecgDependency.png]]<br />
<br><br />
Below is the dependency graph for camera effects work.<br />
<br><br />
[[File:Project FoxEyeCameraDependency.png]]<br />
<br><br />
Below is the dependency graph for video editor work.<br />
<br><br />
[[File:Project FoxEyeVideoEditorDependency.png]]<br />
<br><br />
--><br />
<br />
<!--<br />
=Comparison=<br />
==Canvas2DContext==<br />
Currently, you can do video effects with Canvas2DContext. See the demo made by [4]. The source code looks like the following.<br />
<source lang="javascript"><br />
function frameConverter(video, canvas) {<br />
  // Set up our frame converter<br />
  this.video = video;<br />
  this.viewport = canvas.getContext("2d");<br />
  this.width = canvas.width;<br />
  this.height = canvas.height;<br />
  // Create the frame-buffer canvas<br />
  this.framebuffer = document.createElement("canvas");<br />
  this.framebuffer.width = this.width;<br />
  this.framebuffer.height = this.height;<br />
  this.ctx = this.framebuffer.getContext("2d");<br />
  // Default video effect is blur<br />
  this.effect = JSManipulate.blur;<br />
  // Used to pass ourselves to event callbacks<br />
  var self = this;<br />
  // Start rendering when the video is playing<br />
  this.video.addEventListener("play", function() {<br />
    self.render();<br />
  }, false);<br />
<br />
  // Change the image effect to be applied<br />
  this.setEffect = function(effect) {<br />
    if (effect in JSManipulate) {<br />
      this.effect = JSManipulate[effect];<br />
    }<br />
  };<br />
<br />
  // Rendering callback<br />
  this.render = function() {<br />
    if (this.video.paused || this.video.ended) {<br />
      return;<br />
    }<br />
    this.renderFrame();<br />
    var self = this;<br />
    // Schedule the next frame in 10 ms<br />
    setTimeout(function() {<br />
      self.render();<br />
    }, 10);<br />
  };<br />
<br />
  // Compute and display the next frame<br />
  this.renderFrame = function() {<br />
    // Acquire a video frame from the video element<br />
    this.ctx.drawImage(this.video, 0, 0, this.video.videoWidth,<br />
                       this.video.videoHeight, 0, 0, this.width, this.height);<br />
    var data = this.ctx.getImageData(0, 0, this.width, this.height);<br />
    // Apply the image effect<br />
    this.effect.filter(data, this.effect.defaultValues);<br />
    // Render to the viewport<br />
    this.viewport.putImageData(data, 0, 0);<br />
  };<br />
}<br />
<br />
// Initialization code<br />
video = document.getElementById("video");<br />
canvas = document.getElementById("canvas");<br />
fc = new frameConverter(video,canvas);<br />
...<br />
// Change the image effect applied to the video<br />
fc.setEffect('edge detection');<br />
<br />
</source><br />
Basically, the idea is to use |drawImage| to acquire a frame from the video and draw it to the canvas, then call |getImageData| to get the pixel data and process the image. After that, put the computed data back into the canvas and display it.<br><br />
<br />
Compared to this approach, the proposed WebAPI has the following advantages:<br />
* No polling mechanism.<br />
** A callback is invoked for every frame, so all frames are processed instead of whatever a timer happens to sample.<br />
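For contrast, here is a minimal sketch of the same pipeline written against the proposed worker-based API (assuming the draft |VideoProcessor| interface and |VideoProcessEvent| from [2]; names and semantics may still change as the spec evolves):<br />
<source lang="javascript"><br />
// main.js -- a sketch against the draft API in [2], not a shipping interface.<br />
navigator.mediaDevices.getUserMedia({video: true, audio: false})<br />
  .then(function(stream) {<br />
    var track = stream.getVideoTracks()[0];<br />
    // The worker receives one event per frame, off the main thread.<br />
    var processor = new VideoProcessor("effect-worker.js");<br />
    var processedTrack = track.addVideoProcessor(processor);<br />
    document.getElementById("videoelem").mozSrcObject =<br />
      new MediaStream([processedTrack]);<br />
  });<br />
<br />
// effect-worker.js -- runs inside the Worker (handler name per the draft IDL).<br />
onvideoprocess = function(event) {<br />
  // event.inputImageBitmap holds the current frame; apply the effect here<br />
  // and hand the result back through the outputImageBitmap promise.<br />
  event.outputImageBitmap = Promise.resolve(event.inputImageBitmap);<br />
};<br />
</source><br />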
<br />
==node-opencv==<br />
https://github.com/peterbraden/node-opencv<br />
"OpenCV bindings for Node.js. OpenCV is the defacto computer vision library - by interfacing with it natively in node, we get powerful real time vision in js."<br />
The sample code looks like below:<br />
*You can use opencv to read in image files. Supported formats are in the OpenCV docs, but jpgs etc are supported.<br />
<source lang="javascript"><br />
cv.readImage(filename, function(err, mat){<br />
mat.convertGrayscale()<br />
mat.canny(5, 300)<br />
mat.houghLinesP()<br />
})<br />
</source><br />
*If however, you have a series of images, and you wish to stream them into a stream of Matrices, you can use an ImageStream. Thus:<br />
<source lang="javascript"><br />
var s = new cv.ImageStream()<br />
s.on('data', function(matrix){<br />
matrix.detectObject(haar_cascade_xml, opts, function(err, matches){})<br />
})<br />
ardrone.createPngStream().pipe(s);<br />
</source><br />
<br />
==opencvjs==<br />
https://github.com/blittle/opencvjs<br><br />
It is a project that compiles OpenCV to asm.js. It might be a dead project now.<br />
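The calling convention for any Emscripten-compiled library is roughly the same: copy pixel data into the asm.js heap, call an exported C function, and copy the result back. Below is a minimal, hedged sketch of that pattern (|Module.cwrap|, |Module._malloc|, |Module._free| and |Module.HEAPU8| are standard Emscripten glue; the exported function name "threshold_u8" is hypothetical, not an actual opencvjs symbol):<br />
<source lang="javascript"><br />
// Hypothetical use of an Emscripten-compiled image routine.<br />
// "threshold_u8" is an assumed exported C function, not a real opencvjs symbol.<br />
var threshold = Module.cwrap("threshold_u8", null,<br />
                             ["number", "number", "number", "number"]);<br />
<br />
function applyThreshold(imageData, level) {<br />
  var n = imageData.data.length;<br />
  var ptr = Module._malloc(n);                // allocate in the asm.js heap<br />
  Module.HEAPU8.set(imageData.data, ptr);     // copy pixels in<br />
  threshold(ptr, imageData.width, imageData.height, level);<br />
  imageData.data.set(Module.HEAPU8.subarray(ptr, ptr + n)); // copy result out<br />
  Module._free(ptr);<br />
  return imageData;<br />
}<br />
</source><br />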
==Project Naptha==<br />
"Project Naptha automatically applies state-of-the-art computer vision algorithms on every image you see while browsing the web. The result is a seamless and intuitive experience, where you can highlight as well as copy and paste and even edit and translate the text formerly trapped within an image." Quoted from http://projectnaptha.com/ .<br />
===How it works?===<br />
Excerpt from Project Naptha:<br />
The primary feature of Project Naptha is actually the text detection, rather than optical character recognition. The author wrote a text detection algorithm based on the Stroke Width Transform, invented by Microsoft Research in 2008, which is capable of identifying regions of text in a language-agnostic manner, running in a WebWorker. Once a user begins to select some text, it scrambles to run character recognition algorithms in order to determine what exactly is being selected. The default OCR engine is a built-in pure-JavaScript port of the open source Ocrad OCR engine. There is the option of sending the selected region over to a cloud-based text recognition service powered by Tesseract, Google's (formerly HP's) award-winning open-source OCR engine, which supports dozens of languages and uses an advanced language model.<br />
--><br />
<!--<br />
=Open Source Library and Licenses =<br />
*OpenCV: OpenCV is an open-source BSD-licensed library that includes several hundred computer vision algorithms.<br />
*Tesseract-OCR: Apache License v2.0. Tesseract is probably the most accurate open source OCR engine available.<br />
**Leptonica: Leptonica is a pedagogically-oriented open source site containing software that is broadly useful for image processing and image analysis applications.<br />
<br />
<br>"The Leptonica copyright is less restrictive still. We use the BSD 2-clause license, which is similar to the Apache license but much briefer. Kirk McKusick playfully called the BSD license a copycenter, as differentiated from the usual copyright and the GPL copyleft: "Take it down to the copy center and make as many copies as you want." The BSD restrictions can be approximately summarized as: (1) Don't pretend that you wrote this, and (2) Don't sue us if it doesn't work.<br> For Leptonica, as with Apache and BSD, modifications of the source code that are used in commercial products can be made without any obligation to make them available in open source. (This does happen voluntarily -- it's likely that the majority of issues noted by people working on commercial products are reported back, often with patches.) The Leptonica license only requires that any use of the source, whether in original or modified form, must include the Leptonica copyright notice, and that modified versions must be clearly marked as such. "<br />
<br />
<br />
=What I have done=<br />
*Initialize, plan and implement this project.<br />
*Write a prototype of WebIDL for WebDIP.<br />
**MediaStream as source node and destination node for WebDIP.<br />
**For the HTMLImageElement as a source node, there is a temporary solution.<br />
**Face detection node: can be used with MediaStream and HTMLImageElement on both the browser and B2G Flame.<br />
**Text detection/recognition node: can be used with MediaStream and HTMLImageElement on the browser.<br />
--><br />
<br />
<!--<br />
=To Do List=<br />
*Extend MediaStreamTrack API? =>CTai<br />
**Need to integrate with Canvas2DContext and WebGL.<br />
*OfflineMediaContext study. =>Kaku(our new hire!!)<br />
*An API for image processing and object detection. =>TBD<br />
**We need such an API for B2G privileged applications (or opencv-asm.js for general apps).<br />
*How to compile OpenCV to asm.js =>Kaku, CJay<br />
**Try to figure out how to pass the ImageBitmap from VideoWorker to OpenCV-asm.js<br />
*Compare native OpenCV/Tesseract with asm.js version. =>TBD<br />
--><br />
<br />
<!--<br />
=Fixme List(Known Issues)=<br />
*OpenCV can't build with STLPort; it only supports GNUSTL.<br />
**B2G can't build with GNUSTL.<br />
*Text Detection and Recognition can't run on B2G.<br />
**Some OpenCV APIs take STL types as arguments. The mismatched STL implementations cause runtime errors.<br />
*Tesseract-OCR Build<br />
**We use a pre-installed Tesseract-OCR now. Maybe we should support building Tesseract-OCR from source.<br />
*Improve precision rate of text recognition.<br />
**The achievable precision rate should be higher than my rough prototype delivers. Need to improve it.<br />
*Separate OCR initialization.<br />
**Prevent redundant initialization.<br />
*Haven't done OpenCL integration in Gecko.<br />
**OpenCV has a lot of OpenCL integration. We should take advantage of it.<br />
*Canvas2DContext, WebGL can't run on worker.<br />
**Need bug 801176 and bug 709490 landed.<br> <br />
*Need ImageBitmap for VideoWorkerEvent.<br />
**Need bug 1044102 landed.<br><br />
--><br />
<br />
=Current Status=<br />
*Meta bug: https://bugzilla.mozilla.org/show_bug.cgi?id=1100203<br />
*MediaStream with worker: editor draft completed, refining the prototype for review process. See https://bugzilla.mozilla.org/show_bug.cgi?id=1044102<br />
*ImageBitmap: In review process. See https://bugzilla.mozilla.org/show_bug.cgi?id=801176<br />
*ImageBitmap extension: editor draft completed, refining the prototype for review process. See https://bugzilla.mozilla.org/show_bug.cgi?id=1141979<br><br />
*OpenCV.js: Working on it. See http://people.mozilla.org/~cku/opencv/<br />
*OfflineMediaContext: Not yet started.<br />
*WebImage: Not yet started.<br />
*Run WebGL on worker: in review process. See https://bugzilla.mozilla.org/show_bug.cgi?id=709490<br><br />
*CanvasRenderingContext2D in Worker: waiting for someone to take this bug. See https://bugzilla.mozilla.org/show_bug.cgi?id=801176<br><br />
=Next Phase(2015 H2)=<br />
*Try to land and standardize MediaStream with worker, ImageBitmap and the ImageBitmap extension. See [3] for how the standardization process works at Mozilla.<br />
*Design an automated way to export the JavaScript API for OpenCV.js. Try to upstream it to the OpenCV code base.<br />
*Start to work on OfflineMediaContext.<br />
*Support product requirement, for example, Instagram-like app, wide angle panorama or Fox photo.<br />
*Do some exploratory experiments on the WebImage concept.<br />
*Initialize a sub-project called [https://wiki.mozilla.org/Project_GlovePuppetry Project GlovePuppetry]<br />
<br />
=Beyond 2015=<br />
*Proof of Concept for WebImage.<br />
*A crazy idea of Kernel.js or SPIR.js for JS developers to customize WebImage?<br />
*Gestural control API with depth camera? => WebNI (Web Natural Interaction)?<br />
*[https://wiki.mozilla.org/Project_Cangjie Project Cangjie]<br />
<br />
=Conclusion=<br />
Step by step, we can make things different. FoxEye is a pioneer project trying to push the Web boundary into a new area: image processing and computer vision. The key factor in ensuring this project's success is you, every Web contributor. Your support, your promotion, your feedback, and your comments are the nourishment of this project. Please share this project with every Web developer. Let's bring more and more amazing things to the Web. Let's enrich and improve the whole Web platform's competitiveness.<br />
<br />
=References=<br />
*[1]:"WebAudio Spec", http://www.w3.org/TR/webaudio/<br />
*[2]:"Media Capture Stream with Worker", https://w3c.github.io/mediacapture-worker/<br />
*[3]:"Mozilla Standards", https://wiki.mozilla.org/Standards<br />
<br />
=Acknowledgements=<br />
The whole idea of adopting WebAudio as the reference design for this project came from a conversation with John Lin. Thanks to Robert O'Callahan for his great feedback and comments. Thanks to John Lin and Chia-jung Hung for their useful suggestions and ideas. Also, big thanks to my team members who helped me debug the code.<br />
<br />
=About Authors=<br />
==CTai==<br />
My name is [https://tw.linkedin.com/pub/chia-hung-tai/36/8b6/712 Chia-hung Tai]. I am a senior software engineer in the Mozilla Taipei office. I work on Firefox OS multimedia. Before this job, I had some experience in OpenCL, NLP (Natural Language Processing), Data Mining, and Machine Learning. My IRC nickname is ctai. You can find me in the #media, #mozilla-taiwan, and #b2g channels. You can also reach me via email (ctai at mozilla dot com).<br />
<br />
==Kaku==<br />
[https://tw.linkedin.com/in/kakukogou Tzuhuo Kuo] is an engineer in the Mozilla Taipei office.<br />
==CJ Ku==<br />
[https://www.linkedin.com/pub/cj-ku/62/55b/a1b CJ Ku] is responsible for the OpenCV.js part.</div>
<hr />
<div>=Abstract=<br />
The goal of this project is bringing the power of computer vision and image processing to the Web. By extending the spec of Media Capture and Streams, the web developers can write video processing related applications in better way. The primary idea is to incorporate Worker-based JavaScript video processing with MediaStreamTrack. The user's video processing script can do the real image processing and analysis works frame by frame. We also propose an extension of ImageBitmap to allow better optimization possibility in both implementation and Web developer side. To support video editor case, we would like to introduce OfflineMediaContext in next phase. We also want to explore the concept of WebImage, a hardware accelerated-able Web API for performance improvement chance. By accomplishing these API step by step, we believe we can improve the Web platform competitiveness a lot on image processing and computer vision area.<br />
<br />
=Introduction=<br />
To get a quick understand what is project FoxEye. Please see below file:<br><br />
'''The latest one:'''<br />
*FoxEye Briefing: [https://docs.google.com/presentation/d/1Ra5bIeMfSEQi5kd_bGH5Vx-pa9ahpAUTFDxTeBA5L0o/edit?usp=sharing Briefing]<br />
*Presentation files in Whistler Work Week:<br />
**Project FoxEye Status Update: [https://docs.google.com/presentation/d/1vnR5JOWpEgKN3KZGV2SLxscqSnuBOF2tT_dTolLRYI4/edit?usp=sharing FoxEye]<br />
**FoxEye Cross Firefox OS:[https://docs.google.com/presentation/d/1ck32VoikWkkFXkFfNK91S22nGRp9fSEj6Oz6846May4/edit?usp=sharing Use case]<br />
*'''Latest demo in Youtube: [https://www.youtube.com/watch?v=prybkXsTGXY FoxEye 2015 H1 demo]'''<br />
<br />
'''Outdated'''<br />
*Presentation file in Portland Work Week.[[File:Project FoxEye Portland Work Week.pdf]]<br><br />
*Presentation file in P2PWeb WorkShop.[[File:Project FoxEye 2015-Feb.pdf]]<br><br />
*Youtube: https://www.youtube.com/watch?v=TgQWEWiGaO8<br><br />
<br />
The needs for image processing and computer vision is increasing in recent years. The introduction of video element and media stream in HTML5 is very important, allowing for basic video playback and WebCam ability. But it is not powerful enough to handle complex video processing and camera application. Especially there are tons of mobile camera, photo and video editor applications show their creativity by using OpenCV etc in Android. It is a goal of this project to include the capabilities found in modern video production and camera applications as well as some of the processing, recognition and stitching tasks. <br />
<br><br />
This API is inspired by the WebAudio API[1]. Unlike WebAudio API, we try to reach the goal by modifying existing Media Capture and Streams API. The idea is adding some functions to associate the Woker-based script with MediaStreamTrack. Then the script code of Worker runs image processing and/or analysis frame by frame. Since we move the most of processing work to Worker, the main thread will not be blocked.<br />
<br />
[[File:FoxEye - Overview.png|800px]]<br />
<br><br />
Basically, the spirit of this project is four parts. The first part is extend the MediaStreamTrack to associate a Worker. This part provide a way to do image processing job frame by frame. The second part is ImageBitmap extension. This part extended the ImageBitmap interface to allow JavaScript developer to read underlying data out and set an external data into an ImageBitmap in a set of supported color formats. The third part is OfflineMediaContext. It is for offline stream to render as fast as possible. The last part is WebImage, it is a hardware accelerated-able API on computer vision area. The Web developers can use it to combine high performance vision processing.<br />
<br><br />
Thanks for the amazing asm.js and emscripten work, we also provide an asm.js version of OpenCV called OpenCV.js. The web developers can leverage the power of OpenCV in a simpler way. <br />
<br><br />
<br />
=Design Principle=<br />
*Follow [https://extensiblewebmanifesto.org/ The Extensible Web Manifesto]<br />
The original design of this project is WebAudio like design called WebVideo. But is is deprecated due to we want to follow The Extensible Web Manifesto. We believe this kind of design is best for the Web. Take below examples to against WebVideo design.<bt><br />
Computer vision area is more thriving than ever after deep learning introduced. We can't enumerate video nodes for all kinds of creativeness for the future. For example, suppose we enumerate all we can image in current time. But still something we can't image. Maybe we lost an area called image sentiment analysis. I take this as an example because I just known it recently. Image sentiment analysis is trying to classify the emotion from human face in image. So should we add this as a new video node? Take another extreme example, we think we would not have argument on face detection or face recognition. But what we talk now is only for human face. What if we someone think we need to detect and recognize animal face? Sounds like a ridiculous idea right? But it is a real business. A product called "Bistro: A Smart Feeder Recognizes Your Cat's Face"[2] was on indiegogo. So should we extend the original face detection/recognition node or add a new video node for cat or ask them to implement it in VideoWorker? If we ask them to implement in VideoWorker, then we might need to re-consider are those existing video node make sense? Should we move them all into VideoWorker and implement them in JS? If yse, the WebVideo will degenerate to only source node, video worker node and detestation node. Compare to degenerated WebVideo, the MediaStream with worker design is more elegant. <br />
<br />
*Performance and power consumption do matter<br />
This project is initialized by the FxOS team which means the performance and power consumption for mobile devices are considered from day one.<br />
<br />
=Concept=<br />
==MediaStreamTrack with Worker: ==<br />
The new design is a simple and minimal change for current API. By extending MediaStreamTrack and adding Worker related API, we can let MediaStream be able to support video processing functionality through the script code in Worker. Below is the draft WebIDL codes. Please see [2] for the draft specification.<br />
<source lang="webidl"><br />
[Constructor(DOMString scriptURL)]<br />
<br />
interface VidoeMonitor : EventTarget {<br />
attribute EventHandler onvideomonitor;<br />
};<br />
<br />
interface VideoProcessor : EventTarget {<br />
attribute EventHandler onvideoprocess;<br />
};<br />
partial interface MediaStreamTrack {<br />
void addVideoMonitor(VidoeMonitor monitor);<br />
void removeVideoMonitor(VidoeMonitor monitor);<br />
MediaStreamTrack addVideoProcessor(VidoeProcessor processor);<br />
void removeVideoProcessor();<br />
};<br />
<br />
[Exposed=(Window, Worker),<br />
Constructor(DOMString type, optional VideoMonitorEventInit videoMonitorEventInitDict)]<br />
interface VideoMonitorEvent : Event {<br />
readonly attribute DOMString trackId;<br />
readonly attribute double playbackTime;<br />
readonly attribute ImageBitmap? inputImageBitmap;<br />
};<br />
<br />
[Exposed=(Window, Worker),<br />
Constructor(DOMString type, optional VideoProcessorEventInit videoProcessorEventInitDict)]<br />
interface VideoProcessEvent : VideoMonitorEvent {<br />
attribute promise<ImageBitmap> outputImageBitmap;<br />
};<br />
<br />
</source><br />
<br><br />
Main thread:<br><br />
[[File:NewProjectFoxEye1.png|1024px]]<br><br />
<br />
Worker Thread<br><br />
[[File:Worker - FLOW.png|1024px]]<br><br />
===Example Code ===<br />
Please check the section [http://chiahungtai.github.io/mediacapture-worker/index.html#Examples examples in MediaStreamTrack with worker].<br />
<br />
==ImageBitmap extensions==<br />
Please see [2] for more information.<br />
<br />
==WebImage:==<br />
Why do we need WebImage? Because performance is matter to prompt computer vision to the Web. We need it for performance reason. Most image processing task can be accelerated by WebGL. But it is not the case for computer vision area. So we need a hardware accelerated-able computer vision related web API to help Web developer deliver a fast and portable Web page and application. That is the motivation of WebImage.<br />
<br><br />
This is a new zone needed to explore. So we might start from refer to existing related API. In end of 2014, KHRONOS released a portable, power efficient vision processing API, called OpenVX. This specification might be a good start point for WebImage.<br />
<br><br />
The following diagram is a brief concept of OpenVX. It is a graph-node architecture. The role of OpenVX for computer vision is like the role of OpenGL for graph. The developer just construct and execute the graph to process incoming image.<br />
<br><br />
[[File:OpenVX-NodeGFX.PNG|600px]]<br><br />
<!--[[File:Project FoxEyeWebImage1.png|800px]]<br>--><br />
OpenVX can be implemented by OpenCL, OpenGL|ES with compute shader and C/C++ with SIMD. That means we can support OpenVX in wide range Web platform from PC to mobile once a OpenVX engine is provided.<br />
<br />
[[File:OpenVX.PNG|600px]]<br><br />
<br />
==OfflineMediaContext:==<br />
We introduce the “OfflineMediaContext” and modify the “MediaStream” here to enable the offline (as soon as possible) MediaStream processing. When developers are going to perform the offline MediaStream processing, they need to form a context which will hold and keep the relationship of all MediaStreams that are going to be processed together.<br />
<br />
<source lang="c++"><br />
<br />
// typedef unsigned long long DOMTimeStamp;<br />
interface OfflineMediaContext {<br />
void start(DOMTimeStamp durationToStop);<br />
attribute EventHandler onComplete;<br />
};<br />
// Add an optional argument into the constructor.<br />
[Constructor (optional OfflineMediaContext context),<br />
Constructor (MediaStream stream, optional OfflineMediaContext context),<br />
Constructor (MediaStreamTrackSequence tracks, optional OfflineMediaContext context)]<br />
interface MediaStream : EventTarget {<br />
// No modification.<br />
...<br />
}<br />
<br />
</source><br />
*OfflineMediaContext is the holdplace of all MediaStreams which are going to be processed together in a non-realtime rate.<br />
*OfflineMediaContext is also the object who can trigger the non-realtime processing.<br />
*OfflineMediaContext should be instantiated first and then MediaStreams, which are going to be processed together in the same context, could be instantiated by passing the pre-instantiated OfflineMediaContext object into the constructor of MediaStreams. (See the modified MediaStream constructor below)<br />
*The native implementation of OfflineMediaContext holds a non-realtime MediaStreamGraph, which is just the same as the OfflineAudioContext in WebAudio specification.<br />
*The constructors are modified by adding an optional parameter, OfflineMediaContext. By this way, developers are able to associate a MediaStream to an OfflineMediaContext.<br />
*If the optional OfflineMediaContext is given, the native implementation of the MediaStream should be hooked to the non-realtime MSG hold by the given OfflineMediaContext in stead of the global realtime MSG.<br />
*If the optional OfflineMediaContext is given, we need to check whether the new created MediaStream is able to be processed in offline rate or not. If not, the constructor should throw an error and return a NULL. (Constructors are always allowed to throw.)<br />
<br />
==OpenCV.js==<br />
*OpenCV + Emscripten = OpenCV.js<br />
*https://github.com/CJKu/opencv <br />
<!--<br />
==Deprecated Design ==<br />
*Modular Routing<br />
Modular routing allows arbitrary connections between different DIPNode(TBD) objects. Each node can have inputs and/or outputs. A source node has no inputs and a single output. A destination node has one input and no outputs. Other nodes such as filters can be placed between source and destination nodes.<br />
<br />
Here is a example for face detection work on ImageElement:<br><br />
PS. Right now, I haven't finished the work of ScriptNode. So the draw rectangle part is skipped in sample code.<br />
<br>[[File:Project FoxEye1.png|720px]]<br><br />
<big>Example 1:</big><br />
<source lang="javascript"><br />
var context = new DIPContext();<br />
var imageElem = document.getElementById('imgelemsrc');<br />
var source = context.createImageElementSource(imageElem);<br />
var facedetect = context.createFaceDetection();<br />
source.connect(facedetect);<br />
var dest = context.createMediaStreamDestination();<br />
facedetect.connect(dest);<br />
var elem = document.getElementById('videoelem');<br />
elem.mozSrcObject = dest.stream;<br />
elem.play();<br />
<br />
</source><br />
<br />
<br />
Another example to show that some nodes might support callback function to pass more information rather than image.<br />
<br>[[File:Project FoxEye2.png|720px]]<br><br />
<big>Example 2:</big><br />
<source lang="javascript"><br />
var context = new DIPContext();<br />
var imageElem = document.getElementById('imgelemsrc');<br />
var source = context.createImageElementSource(imageElem);<br />
var textdetect = context.createTextDetection();<br />
source.connect(textdetect);<br />
var dest = context.createMediaStreamDestination();<br />
textdetect.connect(dest);<br />
textdetect.ontextrecognized = function (e){<br />
var text = e.recognizedText;<br />
var go2google = document.getElementById('go2Google');<br />
go2google.href = "https://www.google.com.tw/search?q=" + text<br />
var go2IMDB = document.getElementById('go2IMDB');<br />
go2IMDB.href = "http://www.imdb.com/find?ref_=nv_sr_fn&q=" + text<br />
var go2Amazon = document.getElementById('go2Amazon');<br />
go2Amazon.href = "http://www.amazon.com/s/ref=nb_sb_noss?url=search-alias%3Daps&field-keywords=" + text<br />
var go2TranslateEngTC = document.getElementById('go2TranslateEngTC');<br />
go2TranslateEngTC.href = "https://translate.google.com.tw/#en/zh-TW/" + text<br />
}<br />
var elem = document.getElementById('videoelem');<br />
elem.mozSrcObject = dest.stream;<br />
elem.play();<br />
</source><br />
An ideally example to combine ScriptNode with Canvas2DContext.<br><br />
This is an example trying to do on fly camera translation like "Word Lens" and "Waygo".<br><br />
Haven't finish the implementation for this example.<br><br />
<br>[[File:Project FoxEye3.png|720px]]<br><br />
<big>Example 3:</big><br />
<source lang="javascript"><br />
var context = new DIPContext();<br />
var myMediaStream;<br />
navigator.getUserMedia({video:true, audio:false}, function(localMediaStream){<br />
myMediaStream = localMediaStream;<br />
var source = context.createMediaStreamSource(myMediaStream);<br />
var textRecoginition = context.createTextRecoginition();<br />
source.connect(textRecoginition);<br />
var textInpaint = context.createTextInpaint();<br />
textRecoginition.connect(textInpaint);<br />
var scriptTranslate = context.createScriptProcessor();<br />
textdetect.ontextrecognized = function (e){<br />
var text = e.recognizedText;<br />
// Custom parameter -<br />
scriptTranslate.addParameter( "text", text );<br />
}<br />
scriptTranslate.onimageprocess= function (e) {<br />
var text = e.parameters.text;<br />
// Translate to other language....<br />
var newText = Translate("Eng", "TC");<br />
var input = e.inputImage;<br />
var canvas = document.getElementsByTagName('canvas')[0];<br />
var context2D = canvas.getContext('2d');<br />
context2D.drawImage(input, 0, 0);<br />
context2D.strokeText(newText, 0, 0);<br />
// get an empty slate to put the data into<br />
var output = context.createImageData(canvas.width, canvas.height);<br />
e.outputImage = output;<br />
}<br />
textInpaint.connect(scriptTranslate);<br />
var dest = createMediaStreamDestination();<br />
scriptTranslate.connect(dest);<br />
var video = document.getElementById(‘videoelem’);<br />
video.mozSrcObject = dest.stream;<br />
}, null);<br />
<br />
</source><br />
<br />
=Demo pages=<br />
==OpenCV.js==<br />
This demo should be able to run on every existing Firefox browser on Ubuntu and Mac. Please try below website.<br><br />
http://people.mozilla.org/~cku/opencv/<br />
<br><br />
[[File:OpenCVJS-1.png|1080px]]<br />
<br><br />
[[File:OpenCVJS-2.png|1080px]]<br />
<br><br />
<br />
==MST with Worker and ImageBitmap==<br />
You need a customized build for below demos. Please clone this repo: https://github.com/kakukogou/gecko-dev/tree/foxeye-master<br><br />
You also need a web camera for the demos. Then build your own Firefox browser and go to below website to see the demo.<br />
<br><br />
Source code of the demo:<br><br />
https://github.com/kakukogou/foxeye-demo<br />
<br><br />
Demo website:<br><br />
http://people.mozilla.org/~tkuo/foxeye-demo/<br />
===Monitor===<br />
Monitor is design for just send the event to the Web Worker and no modification. The left one is from getUserMedia. The right one is using addWorkerMonitor to dispatch the input frame from the left one to a worker. The worker will detect the face and pass the face position and the input frame to main thread. Then the script in main thread use both information to draw the input frame via CanvasRenderingContext2D. <br />
*Case 1: Face detection<br />
This demo shows how to use |addWorkerMonitor| and CanvasRenderingContext2D to overlay a hat on someone's head.<br />
<br><br />
[[File:Monitor face.png|1080px]]<br />
<br><br />
*Case 2: QRCode<br />
[[File:Monitor qrcode.png|1080px]]<br />
<br><br />
<br />
===Processor===<br />
This is a demo to show how to use |addWorkerProcessor| to create a new MediaStreamTrack and show it in another HTMLVideoElement. There are 5 kind of image filter.<br />
<br> <br />
<br><br />
[[File:Processor copy.png|1080px]]<br />
<br><br />
[[File:Processor blur.png|1080px]]<br />
<br><br />
[[File:Processor erode.png|1080px]]<br />
<br><br />
[[File:Processor threshold.png|1080px]]<br />
<br><br />
[[File:Processor invert.png|1080px]]<br />
<br><br />
[[File:Processor gray.png|1080px]]<br />
<br><br />
<br />
<br />
<br />
<!--<br />
==Demo 1: Face tracker==<br />
===Browser:===<br />
*Input comes from HTML Image Element<br />
[[File:ProjectFoxEye BrowserFaceImage.png|720px]]<br />
*Input comes from MediaStream<br />
[[File:ProjectFoxEye BrowserFaceMS.png|720px]]<br />
<br />
===B2G on Flame:===<br />
*Input comes from HTML Image Element<br><br />
[[File:Project FoxEyeFlameFaceImage.jpg|480px]]<br><br />
*Input comes from MediaStream<br><br />
[[File:Project FoxEyeFlameFaceMS.jpg|480px]]<br><br />
<br />
==Demo 2: Text Recognition==<br />
*Browser:<br />
**Input comes from HTML Image Element<br />
[[File:ProjectFoxEye TextImage.png|720px]]<br />
**Input comes from MediaStream<br />
[[File:ProjectFoxEye TextMS.png|720px]]<br />
--><br />
<br />
=Unlimited Potentials=<br />
<!--<br />
According to "Firefox OS User Research Northern India Findings" [4], one of the key table-stake is camera related features. "Ways to provide photo & video editing tools" is what this WebAPI for. So if we can deliver some cool photo & video editing features, we can fulfill one of the needs of our target market.<br><br />
In [4], it mentioned that one of purchase motivators is educate my kids. The features like PhotoMath can satisfy the education part.<br><br />
In long term, if we can integrate text recognition with TTS(text to speech), we can help illiterate person to read words or phrase. That will be very useful features.<br><br />
Also offline text translation in camera might be a killer application too. Waygo and WordLens is two of such applications in Android and iOS.<br><br />
Text Selection in Image is also an interesting feature for browser. Project Naptha demos some potential functionality based on yext selection in Image.<br> <br />
--><br />
==FoxEye technology tree==<br />
This is the technology tree of FoxEye. The solid line is the dependency. The dash line is for performance improvement. The green blocks are possible applications. The blue blocks are API draft with prototype. The red block is we know how to do but not yet starting. The yellow blocks are only the concept. Need more study. So step by step, we can provide photo manager, wide angle panorama, HDR, and even gesture control.<br />
[[File:Multimedia Platform Team Technology Roadmap - New Page.png |1024px]]<br />
<br />
<br />
==Use Cases==<br />
*Digital Image Processing(DIP) for camera:<br />
**Face In, see [https://www.youtube.com/watch?feature=player_embedded&v=PWZUCfDsFdU Sony Face In]<br />
**Augmented Reality, see [https://www.youtube.com/watch?feature=player_embedded&v=vDNzTasuYEw IKEA AR]<br />
**Camera Panorama, <br />
**Fisheye camera, <br />
**Comic Effect,<br />
**Long term, might need Android Camera HAL 3 to control camera<br />
***Smile Snapshot<br />
***Gesture Snapshot<br />
***HDR<br />
***Video Stabilization<br />
**Bar code scanner<br />
*Photo and video editing<br />
**Video Editor, see [https://www.youtube.com/watch?feature=player_embedded&v=NJ6nYgxcuUk WeVideo on Android]<br />
**A faster way for video editing tools.<br />
**Lots of existing image effects can be used for photo and video editing.<br />
**https://www.facebook.com/thanks<br />
*Object Recognition in Image(Not only FX OS, but also broswer):<br />
**Shopping Assistant, see [https://www.youtube.com/watch?feature=player_embedded&v=B7cvlWll85Q Amazon Firefly]<br />
**Face Detection/Tracking,<br />
**Face Recognition, <br />
**Text Recognition, <br />
**Text Selection in Image, <br />
***See http://projectnaptha.com/<br />
**Text Inpainting,<br />
**Image Segmentation,<br />
**Text translation on image, see [https://www.youtube.com/watch?feature=player_embedded&v=9UalhhWBPH0 Waygo]<br />
*Duo Camera:<br />
**Nature Interaction(Gesture, Body Motion Tracking)<br />
**Interactive Foreground Extraction<br />
and so on....<br />
<br />
==Some cool applications we can refer in real worlds==<br />
*Word Lens: <br />
**https://play.google.com/store/apps/details?id=com.questvisual.wordlens.demo<br />
**https://itunes.apple.com/tw/app/word-lens/id383463868?mt=8<br />
*Waygo<br />
**http://www.waygoapp.com/<br />
*PhotoMath<br />
**https://photomath.net/<br />
*Cartoon Camera<br />
**https://play.google.com/store/apps/details?id=com.fingersoft.cartooncamera<br />
*Photo Studio<br />
**http://photo-studio.en.uptodown.com/android<br />
*Magisto<br />
**https://play.google.com/store/apps/details?id=com.magisto<br />
*Adobe PhotoShop Express<br />
**http://www.photoshop.com/products/photoshopexpress<br />
*Amazon(firefly app)<br />
**https://play.google.com/store/apps/details?id=com.amazon.mShop.android<br />
<br />
<!--<br />
=Task Dependency=<br />
Below is the dependency graph for text recognition work.<br />
<br><br />
[[File:Project FoxEyeTextRecgDependency.png]]<br />
<br><br />
Below is the dependency graph for camera efficts work.<br />
<br><br />
[[File:Project FoxEyeCameraDependency.png]]<br />
<br><br />
Below is the dependency graph for video editor work.<br />
<br><br />
[[File:Project FoxEyeVideoEditorDependency.png]]<br />
<br><br />
--><br />
<br />
<!--<br />
=Comparison=<br />
==Canvas2DContext==<br />
Currently, you can do video effect by Canvas2DContext. See the demo made by [4]. The source code looks like below.<br />
<source><br />
function frameConverter(video,canvas) {<br />
<br />
// Set up our frame converter<br />
this.video = video;<br />
this.viewport = canvas.getContext("2d");<br />
this.width = canvas.width;<br />
this.height = canvas.height;<br />
// Create the frame-buffer canvas<br />
this.framebuffer = document.createElement("canvas");<br />
this.framebuffer.width = this.width;<br />
this.framebuffer.height = this.height;<br />
this.ctx = this.framebuffer.getContext("2d");<br />
// Default video effect is blur<br />
this.effect = JSManipulate.blur;<br />
// This variable used to pass ourself to event call-backs<br />
var self = this;<br />
// Start rendering when the video is playing<br />
this.video.addEventListener("play", function() {<br />
self.render();<br />
}, false);<br />
<br />
// Change the image effect to be applied <br />
this.setEffect = function(effect){<br />
if(effect in JSManipulate){<br />
this.effect = JSManipulate[effect];<br />
}<br />
}<br />
<br />
// Rendering call-back<br />
this.render = function() {<br />
if (this.video.paused || this.video.ended) {<br />
return;<br />
}<br />
this.renderFrame();<br />
var self = this;<br />
// Render every 10 ms<br />
setTimeout(function () {<br />
self.render();<br />
}, 10);<br />
};<br />
<br />
// Compute and display the next frame <br />
this.renderFrame = function() {<br />
// Acquire a video frame from the video element<br />
this.ctx.drawImage(this.video, 0, 0, this.video.videoWidth,<br />
this.video.videoHeight,0,0,this.width, this.height);<br />
var data = this.ctx.getImageData(0, 0, this.width, this.height);<br />
// Apply image effect<br />
this.effect.filter(data,this.effect.defaultValues);<br />
// Render to viewport<br />
this.viewport.putImageData(data, 0, 0);<br />
return;<br />
};<br />
};<br />
<br />
// Initialization code<br />
video = document.getElementById("video");<br />
canvas = document.getElementById("canvas");<br />
fc = new frameConverter(video,canvas);<br />
...<br />
// Change the image effect applied to the video<br />
fc.setEffect('edge detection');<br />
<br />
</source><br />
Basically, the idea is use |drawImage| to acquire frame from video and draw it to canvas. Then call |getImageData| to get the data and process the image. After that, put the computed data back to the canvas and display it.<br><br />
<br />
Compare to this approach, the proposed WebAPI has below advantages:<br />
* Not polling mechanism.<br />
** We use callback function to process all frames.<br />
<br />
==node-opencv==<br />
https://github.com/peterbraden/node-opencv<br />
"OpenCV bindings for Node.js. OpenCV is the defacto computer vision library - by interfacing with it natively in node, we get powerful real time vision in js."<br />
The sample codes looks like below:<br />
*You can use opencv to read in image files. Supported formats are in the OpenCV docs, but jpgs etc are supported.<br />
<source lang="javascript"><br />
cv.readImage(filename, function(err, mat){<br />
mat.convertGrayscale()<br />
mat.canny(5, 300)<br />
mat.houghLinesP()<br />
})<br />
</source><br />
*If however, you have a series of images, and you wish to stream them into a stream of Matrices, you can use an ImageStream. Thus:<br />
<source lang="javascript"><br />
var s = new cv.ImageStream()<br />
s.on('data', function(matrix){<br />
matrix.detectObject(haar_cascade_xml, opts, function(err, matches){})<br />
})<br />
ardrone.createPngStream().pipe(s);<br />
</source><br />
<br />
==opencvjs==<br />
https://github.com/blittle/opencvjs<br><br />
It is a project to compile opencv to asm.js. Might be a dead project now.<br />
==Project Naptha==<br />
"Project Naptha automatically applies state-of-the-art computer vision algorithms on every image you see while browsing the web. The result is a seamless and intuitive experience, where you can highlight as well as copy and paste and even edit and translate the text formerly trapped within an image." Quoted from http://projectnaptha.com/ .<br />
===How it works?===<br />
Excerpt from Project Naptha:<br />
The primary feature of Project Naptha is actually the text detection, rather than optical character recognition. The author write a text detection algorithm called Stroke Width Transform, invented by Microsoft Research in 2008, which is capable of identifying regions of text in a language-agnostic manner in WebWorker. Once a user begins to select some text, it scrambles to run character recognition algorithms in order to determine what exactly is being selected. The default OCR engine is a built-in pure-javascript port of the open source Ocrad OCR engine. There’s the option of sending the selected region over to a cloud based text recognition service powered by Tesseract, Google’s (formerly HP’s) award-winning open-source OCR engine which supports dozens of languages, and uses an advanced language model.<br />
--><br />
<!--<br />
=Open Source Library and Licenses =<br />
*OpenCV: OpenCV is an open-source BSD-licensed library that includes several hundreds of computer vision algorithms.<br />
*Tesseract-OCR: Apache License v2.0. Tesseract is probably the most accurate open source OCR engine available.<br />
**Leptonica: Leptonica is a pedagogically-oriented open source site containing software that is broadly useful for image processing and image analysis applications.<br />
<br />
<br>"The Leptonica copyright is less restrictive still. We use the BSD 2-clause license, which is similar to the Apache license but much briefer. Kirk McKusick playfully called the BSD license a copycenter, as differentiated from the usual copyright and the GPL copyleft: "Take it down to the copy center and make as many copies as you want." The BSD restrictions can be approximately summarized as: (1) Don't pretend that you wrote this, and (2) Don't sue us if it doesn't work.<br> For Leptonica, as with Apache and BSD, modifications of the source code that are used in commercial products can be made without any obligation to make them available in open source. (This does happen voluntarily -- it's likely that the majority of issues noted by people working on commercial products are reported back, often with patches.) The Leptonica license only requires that any use of the source, whether in original or modified form, must include the Leptonica copyright notice, and that modified versions must be clearly marked as such. "<br />
<br />
<br />
=What I have done=<br />
*Initialize, plan and implement this project.<br />
*Write a prototype of WebIDL for WebDIP.<br />
**MediaStream as source node and destentaion node for WebDIP.<br />
**For HTMLImageElement part as source node, there is a temporal solution for it.<br />
**Have face detection node. Can be used in MediaStream and HTMLImageElement on both browser and B2G flame.<br />
**Have text detection/recognization node. Can be used in MediaStream and HTMLImageElement on browser.<br />
--><br />
<br />
<!--<br />
=To Do List=<br />
*Extend MediaStreamTrack API? =>CTai<br />
**Need to integrate with Canvas2DContext and WebGL.<br />
*OfflineMediaContext study. =>Kaku(our new hire!!)<br />
*An API for image processing and object detection. =>TBD<br />
**We need such API for B2G privilege applications(or opencv-asm.js for general APPs).<br />
*How to compile OpenCV to asm.js =>Kaku, CJay<br />
**Try to figure out how to pass the ImageBitmap from VideoWroker to OpenCV-asm.js<br />
*Compare native OpenCV/Tesseract with asm.js version. =>TBD<br />
--><br />
<br />
<!--<br />
=Fixme List(Known Issues)=<br />
*OpenCV can't build with STLPort, only support GNUSTL.<br />
**B2G can't build with GNUSTL.<br />
*Text Detection and Recognition can't run on B2G.<br />
**Some OpenCV API use STL as arguments. The unalignment STL will cause runtime error.<br />
*Tesseract-OCR Build<br />
**Use pre-installed Tesseract-OCR now. Maybe we should support source code build of Tesseract-OCR.<br />
*Improve precision rate of text recognition.<br />
**The actual precision rate should be higher than my roughly prototype. Need improve it.<br />
*Separate OCR initialized.<br />
**Prevent redundant initialization.<br />
*Haven't done OpenCL integration in Gecko.<br />
**OpenCV has a lot of OpenCL integration. We should take advantage from it.<br />
*Canvas2DContext, WebGL can't run on worker.<br />
**Need bug 801176 and bug 709490 landed.<br> <br />
*Need ImageBitmap for VideoWorkerEvent.<br />
**Need bug 1044102 landed.<br><br />
--><br />
<br />
=Current Status=<br />
*Meta bug: https://bugzilla.mozilla.org/show_bug.cgi?id=1100203<br />
*MediaStream with worker: editor draft completed, refining the prototype for review process. See https://bugzilla.mozilla.org/show_bug.cgi?id=1044102<br />
*ImageBitmap: In review process. See https://bugzilla.mozilla.org/show_bug.cgi?id=801176<br />
*ImageBitmap extension: editor draft completed, refining the prototype for review process. See https://bugzilla.mozilla.org/show_bug.cgi?id=1141979<br><br />
*OpenCV.js: Working on it. See http://people.mozilla.org/~cku/opencv/<br />
*OfflineMediaContext: Not yet started.<br />
*WebImage:Not yet started.<br />
*Run WebGL on worker: in review process. See https://bugzilla.mozilla.org/show_bug.cgi?id=709490<br><br />
*CanvasRenderingContext2D in Worker: waiting for someone take this bug.See https://bugzilla.mozilla.org/show_bug.cgi?id=801176<br><br />
=Next Phase(2015 H2)=<br />
*Try to land and standardize MediaStream with worker, ImageBitmap and ImageBitmap extension. see[4] for how to process standardization in Mozilla.<br />
*Design a automation way to export JavaScript API for OpenCV.js. Try to upstream it to OpenCV code base.<br />
*Start to work on OfflineMediaContext.<br />
*Support product requirement, for example, Instagram-like app, wide angle panorama or Fox photo.<br />
*Do some explanatory experiment on WebImage concept.<br />
*Initialize a sub-project called [https://wiki.mozilla.org/Project_GlovePuppetry Project GlovePuppetry]<br />
<br />
=Beyond 2015=<br />
*Proof of Concept for WebImage.<br />
*A crazy idea of Kernel.js or SPIR.js for JS developers to customize WebImage ?<br />
*Gestural control API with depth camera? => WebNI(Web Nature Interaction)?<br />
*[https://wiki.mozilla.org/Project_Cangjie Project Cangjie]<br />
<br />
=Conclusion=<br />
Step by step, we can make thing different. FoxEye is a pioneer project trying to push the Web boundary to a new area, image processing and computer vision. The key factor to ensuring this project success is you, every Web contributor. Your supports, your promotions, your feedback, your comments are the nutrition of this project. Please share this project to every Web developers. Let's bring more and more amazing things to the Web. Let's rich and improve the whole Web platform competitiveness.<br />
<br />
=References=<br />
*[1]:"WebAudio Spec", http://www.w3.org/TR/webaudio/<br />
*[2]:"Media Capture Stream with Worker", https://w3c.github.io/mediacapture-worker/<br />
*[3]:"Mozilla Standards", https://wiki.mozilla.org/Standards<br />
<br />
=Acknowledgements=<br />
This whole idea of adopting WebAudio as the reference design for this project was from a conversation between John Lin. Thanks for Robert O'Callahan's great feedback and comments. Thanks for John Lin and Chia-jung Hung's useful suggestions and ideas. Also, big thanks to my team members who help me to debug the code.<br />
<br />
=About Authors=<br />
==CTai==<br />
My name is [https://tw.linkedin.com/pub/chia-hung-tai/36/8b6/712 Chia-hung Tai]. I am a senior software engineer in Mozilla Taipei office. I work on Firefox OS multimedia stuffs. Before this jobs, I have some experience in OpenCL, NLP(Nature Language Processing), Data Mining and Machine Learning. My IRC nickname is ctai. You can find me in #media, #mozilla-taiwan, and #b2g channels. Also you can reach me via email(ctai at mozilla dot com).<br />
<br />
==Kaku==<br />
[https://tw.linkedin.com/in/kakukogou Tzuhuo Kuo] is an engineer in Mozilla Taipel office.<br />
==CJ Ku==<br />
[https://www.linkedin.com/pub/cj-ku/62/55b/a1b CJ Ku] is responsible for OpenCV.js part.</div>Ctaihttps://wiki.mozilla.org/index.php?title=User_talk:Dead_project&diff=1108921User talk:Dead project2015-12-10T16:24:52Z<p>Ctai: /* References */</p>
<hr />
<div>=Abstract=<br />
The goal of this project is bringing the power of computer vision and image processing to the Web. By extending the spec of Media Capture and Streams, the web developers can write video processing related applications in better way. The primary idea is to incorporate Worker-based JavaScript video processing with MediaStreamTrack. The user's video processing script can do the real image processing and analysis works frame by frame. We also propose an extension of ImageBitmap to allow better optimization possibility in both implementation and Web developer side. To support video editor case, we would like to introduce OfflineMediaContext in next phase. We also want to explore the concept of WebImage, a hardware accelerated-able Web API for performance improvement chance. By accomplishing these API step by step, we believe we can improve the Web platform competitiveness a lot on image processing and computer vision area.<br />
<br />
=Introduction=<br />
To get a quick understand what is project FoxEye. Please see below file:<br><br />
'''The latest one:'''<br />
*FoxEye Briefing: [https://docs.google.com/presentation/d/1Ra5bIeMfSEQi5kd_bGH5Vx-pa9ahpAUTFDxTeBA5L0o/edit?usp=sharing Briefing]<br />
*Presentation files in Whistler Work Week:<br />
**Project FoxEye Status Update: [https://docs.google.com/presentation/d/1vnR5JOWpEgKN3KZGV2SLxscqSnuBOF2tT_dTolLRYI4/edit?usp=sharing FoxEye]<br />
**FoxEye Cross Firefox OS:[https://docs.google.com/presentation/d/1ck32VoikWkkFXkFfNK91S22nGRp9fSEj6Oz6846May4/edit?usp=sharing Use case]<br />
*'''Latest demo in Youtube: [https://www.youtube.com/watch?v=prybkXsTGXY FoxEye 2015 H1 demo]'''<br />
<br />
'''Outdated'''<br />
*Presentation file in Portland Work Week.[[File:Project FoxEye Portland Work Week.pdf]]<br><br />
*Presentation file in P2PWeb WorkShop.[[File:Project FoxEye 2015-Feb.pdf]]<br><br />
*Youtube: https://www.youtube.com/watch?v=TgQWEWiGaO8<br><br />
<br />
The needs for image processing and computer vision is increasing in recent years. The introduction of video element and media stream in HTML5 is very important, allowing for basic video playback and WebCam ability. But it is not powerful enough to handle complex video processing and camera application. Especially there are tons of mobile camera, photo and video editor applications show their creativity by using OpenCV etc in Android. It is a goal of this project to include the capabilities found in modern video production and camera applications as well as some of the processing, recognition and stitching tasks. <br />
<br><br />
This API is inspired by the WebAudio API[1]. Unlike WebAudio API, we try to reach the goal by modifying existing Media Capture and Streams API. The idea is adding some functions to associate the Woker-based script with MediaStreamTrack. Then the script code of Worker runs image processing and/or analysis frame by frame. Since we move the most of processing work to Worker, the main thread will not be blocked.<br />
<br />
[[File:FoxEye - Overview.png|800px]]<br />
<br><br />
Basically, the spirit of this project is four parts. The first part is extend the MediaStreamTrack to associate a Worker. This part provide a way to do image processing job frame by frame. The second part is ImageBitmap extension. This part extended the ImageBitmap interface to allow JavaScript developer to read underlying data out and set an external data into an ImageBitmap in a set of supported color formats. The third part is OfflineMediaContext. It is for offline stream to render as fast as possible. The last part is WebImage, it is a hardware accelerated-able API on computer vision area. The Web developers can use it to combine high performance vision processing.<br />
<br><br />
Thanks for the amazing asm.js and emscripten work, we also provide an asm.js version of OpenCV called OpenCV.js. The web developers can leverage the power of OpenCV in a simpler way. <br />
<br><br />
<br />
=Design Principle=<br />
*Follow [https://extensiblewebmanifesto.org/ The Extensible Web Manifesto]<br />
The original design of this project is WebAudio like design called WebVideo. But is is deprecated due to we want to follow The Extensible Web Manifesto. We believe this kind of design is best for the Web. Take below examples to against WebVideo design.<bt><br />
Computer vision area is more thriving than ever after deep learning introduced. We can't enumerate video nodes for all kinds of creativeness for the future. For example, suppose we enumerate all we can image in current time. But still something we can't image. Maybe we lost an area called image sentiment analysis. I take this as an example because I just known it recently. Image sentiment analysis is trying to classify the emotion from human face in image. So should we add this as a new video node? Take another extreme example, we think we would not have argument on face detection or face recognition. But what we talk now is only for human face. What if we someone think we need to detect and recognize animal face? Sounds like a ridiculous idea right? But it is a real business. A product called "Bistro: A Smart Feeder Recognizes Your Cat's Face"[2] was on indiegogo. So should we extend the original face detection/recognition node or add a new video node for cat or ask them to implement it in VideoWorker? If we ask them to implement in VideoWorker, then we might need to re-consider are those existing video node make sense? Should we move them all into VideoWorker and implement them in JS? If yse, the WebVideo will degenerate to only source node, video worker node and detestation node. Compare to degenerated WebVideo, the MediaStream with worker design is more elegant. <br />
<br />
*Performance and power consumption do matter<br />
This project is initialized by the FxOS team which means the performance and power consumption for mobile devices are considered from day one.<br />
<br />
=Concept=<br />
==MediaStreamTrack with Worker: ==<br />
The new design is a simple and minimal change for current API. By extending MediaStreamTrack and adding Worker related API, we can let MediaStream be able to support video processing functionality through the script code in Worker. Below is the draft WebIDL codes. Please see [2] for the draft specification.<br />
<source lang="webidl"><br />
[Constructor(DOMString scriptURL)]<br />
<br />
interface VidoeMonitor : EventTarget {<br />
attribute EventHandler onvideomonitor;<br />
};<br />
<br />
interface VideoProcessor : EventTarget {<br />
attribute EventHandler onvideoprocess;<br />
};<br />
partial interface MediaStreamTrack {<br />
void addVideoMonitor(VidoeMonitor monitor);<br />
void removeVideoMonitor(VidoeMonitor monitor);<br />
MediaStreamTrack addVideoProcessor(VidoeProcessor processor);<br />
void removeVideoProcessor();<br />
};<br />
<br />
[Exposed=(Window, Worker),<br />
Constructor(DOMString type, optional VideoMonitorEventInit videoMonitorEventInitDict)]<br />
interface VideoMonitorEvent : Event {<br />
readonly attribute DOMString trackId;<br />
readonly attribute double playbackTime;<br />
readonly attribute ImageBitmap? inputImageBitmap;<br />
};<br />
<br />
[Exposed=(Window, Worker),<br />
Constructor(DOMString type, optional VideoProcessorEventInit videoProcessorEventInitDict)]<br />
interface VideoProcessEvent : VideoMonitorEvent {<br />
attribute promise<ImageBitmap> outputImageBitmap;<br />
};<br />
<br />
</source><br />
<br><br />
Main thread:<br><br />
[[File:NewProjectFoxEye1.png|1024px]]<br><br />
<br />
Worker Thread<br><br />
[[File:Worker - FLOW.png|1024px]]<br><br />
===Example Code ===<br />
Please check the section [http://chiahungtai.github.io/mediacapture-worker/index.html#Examples examples in MediaStreamTrack with worker].<br />
<br />
==ImageBitmap extensions==<br />
Please see [3] for more information.<br />
<br />
==WebImage:==<br />
Why do we need WebImage? Because performance is matter to prompt computer vision to the Web. We need it for performance reason. Most image processing task can be accelerated by WebGL. But it is not the case for computer vision area. So we need a hardware accelerated-able computer vision related web API to help Web developer deliver a fast and portable Web page and application. That is the motivation of WebImage.<br />
<br><br />
This is a new area that needs exploration, so we might start by referring to existing related APIs. At the end of 2014, KHRONOS released a portable, power-efficient vision processing API called OpenVX. That specification might be a good starting point for WebImage.<br />
<br><br />
The following diagram shows the basic concept of OpenVX. It is a graph-node architecture: the role of OpenVX for computer vision is like the role of OpenGL for graphics. The developer just constructs and executes a graph to process incoming images.<br />
<br><br />
[[File:OpenVX-NodeGFX.PNG|600px]]<br><br />
<!--[[File:Project FoxEyeWebImage1.png|800px]]<br>--><br />
OpenVX can be implemented on top of OpenCL, OpenGL ES with compute shaders, or C/C++ with SIMD. That means we can support OpenVX on a wide range of Web platforms, from PC to mobile, once an OpenVX engine is provided.<br />
<br />
[[File:OpenVX.PNG|600px]]<br><br />
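Since WebImage is still only a concept, the snippet below is purely illustrative: every interface and kernel name in it is invented here to show what an OpenVX-style graph-node API could look like from JavaScript, nothing more.<br />
<source lang="javascript"><br />
// Hypothetical WebImage sketch: build an OpenVX-like graph (source -> nodes -> sink)<br />
// once, then let the engine execute it on OpenCL / GPU / SIMD as it sees fit.<br />
var ctx = new WebImageContext();                       // invented name<br />
var graph = ctx.createGraph();<br />
var src = graph.createImageSource(inputBitmap);        // e.g. an ImageBitmap of a frame<br />
var gray = graph.createNode('rgb_to_gray');            // kernels modeled on OpenVX vision functions<br />
var edges = graph.createNode('canny_edge', { low: 50, high: 150 });<br />
var sink = graph.createImageSink();<br />
graph.connect(src, gray);<br />
graph.connect(gray, edges);<br />
graph.connect(edges, sink);<br />
graph.process().then(function(outputBitmap) {<br />
  drawToCanvas(outputBitmap); // drawToCanvas is user code<br />
});<br />
</source><br />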
<br />
==OfflineMediaContext:==<br />
We introduce OfflineMediaContext and modify MediaStream here to enable offline (as fast as possible) MediaStream processing. When developers want to perform offline MediaStream processing, they need to form a context which holds and keeps the relationship of all the MediaStreams that are going to be processed together.<br />
<br />
<source lang="c++"><br />
<br />
// typedef unsigned long long DOMTimeStamp;<br />
interface OfflineMediaContext {<br />
void start(DOMTimeStamp durationToStop);<br />
attribute EventHandler onComplete;<br />
};<br />
// Add an optional argument into the constructor.<br />
[Constructor (optional OfflineMediaContext context),<br />
Constructor (MediaStream stream, optional OfflineMediaContext context),<br />
Constructor (MediaStreamTrackSequence tracks, optional OfflineMediaContext context)]<br />
interface MediaStream : EventTarget {<br />
// No modification.<br />
...<br />
}<br />
<br />
</source><br />
*OfflineMediaContext is the holding place of all the MediaStreams which are going to be processed together at a non-realtime rate.<br />
*OfflineMediaContext is also the object that triggers the non-realtime processing.<br />
*OfflineMediaContext should be instantiated first; MediaStreams that are going to be processed together in the same context can then be instantiated by passing the pre-instantiated OfflineMediaContext object into the MediaStream constructor. (See the modified MediaStream constructor above.)<br />
*The native implementation of OfflineMediaContext holds a non-realtime MediaStreamGraph, just like the OfflineAudioContext in the WebAudio specification.<br />
*The constructors are modified by adding an optional parameter, OfflineMediaContext. This way, developers are able to associate a MediaStream with an OfflineMediaContext.<br />
*If the optional OfflineMediaContext is given, the native implementation of the MediaStream should be hooked to the non-realtime MSG held by the given OfflineMediaContext instead of the global realtime MSG.<br />
*If the optional OfflineMediaContext is given, we need to check whether the newly created MediaStream can be processed at an offline rate or not. If not, the constructor should throw an error and return NULL. (Constructors are always allowed to throw.) A usage sketch follows this list.<br />
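Below is a usage sketch based on the draft IDL above. Assumptions are flagged in the comments: that durationToStop is in milliseconds and that completion is reported through onComplete; the proposal has not been prototyped yet.<br />
<source lang="javascript"><br />
// Sketch: process a recorded stream as fast as possible (offline), not in real time.<br />
var offlineCtx = new OfflineMediaContext();<br />
// Passing the context hooks the new MediaStream to the non-realtime MediaStreamGraph.<br />
var offlineStream = new MediaStream(recordedStream, offlineCtx); // recordedStream: an existing MediaStream<br />
// ... attach video processors to offlineStream's tracks here ...<br />
offlineCtx.onComplete = function() {<br />
  console.log('Offline rendering finished.');<br />
};<br />
offlineCtx.start(10000); // assumed: render 10 seconds of media, then stop<br />
</source><br />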
<br />
==OpenCV.js==<br />
*OpenCV + Emscripten = OpenCV.js<br />
*https://github.com/CJKu/opencv <br />
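The exported API of this build is not documented here, so the snippet below only shows the usual Emscripten calling pattern (Module.cwrap, Module._malloc and the HEAPU8 view are standard Emscripten facilities); the function name cv_threshold and its signature are invented for the example.<br />
<source lang="javascript"><br />
// Assume the build exports: void cv_threshold(uint8_t* rgba, int w, int h, int thresh)<br />
// and that rgbaPixels/width/height come from, e.g., a canvas ImageData.<br />
var cvThreshold = Module.cwrap('cv_threshold', null, ['number', 'number', 'number', 'number']);<br />
var bytes = width * height * 4;<br />
var buf = Module._malloc(bytes);               // allocate inside the asm.js heap<br />
Module.HEAPU8.set(rgbaPixels, buf);            // copy RGBA pixels in<br />
cvThreshold(buf, width, height, 128);          // run the (hypothetical) OpenCV routine in place<br />
var result = Module.HEAPU8.slice(buf, buf + bytes);<br />
Module._free(buf);<br />
</source><br />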
<!--<br />
==Deprecated Design ==<br />
*Modular Routing<br />
Modular routing allows arbitrary connections between different DIPNode(TBD) objects. Each node can have inputs and/or outputs. A source node has no inputs and a single output. A destination node has one input and no outputs. Other nodes such as filters can be placed between source and destination nodes.<br />
<br />
Here is a example for face detection work on ImageElement:<br><br />
PS. Right now, I haven't finished the work of ScriptNode. So the draw rectangle part is skipped in sample code.<br />
<br>[[File:Project FoxEye1.png|720px]]<br><br />
<big>Example 1:</big><br />
<source lang="javascript"><br />
var context = new DIPContext();<br />
var imageElem = document.getElementById('imgelemsrc');<br />
var source = context.createImageElementSource(imageElem);<br />
var facedetect = context.createFaceDetection();<br />
source.connect(facedetect);<br />
var dest = context.createMediaStreamDestination();<br />
facedetect.connect(dest);<br />
var elem = document.getElementById('videoelem');<br />
elem.mozSrcObject = dest.stream;<br />
elem.play();<br />
<br />
</source><br />
<br />
<br />
Another example to show that some nodes might support callback function to pass more information rather than image.<br />
<br>[[File:Project FoxEye2.png|720px]]<br><br />
<big>Example 2:</big><br />
<source lang="javascript"><br />
var context = new DIPContext();<br />
var imageElem = document.getElementById('imgelemsrc');<br />
var source = context.createImageElementSource(imageElem);<br />
var textdetect = context.createTextDetection();<br />
source.connect(textdetect);<br />
var dest = context.createMediaStreamDestination();<br />
textdetect.connect(dest);<br />
textdetect.ontextrecognized = function (e){<br />
var text = e.recognizedText;<br />
var go2google = document.getElementById('go2Google');<br />
go2google.href = "https://www.google.com.tw/search?q=" + text<br />
var go2IMDB = document.getElementById('go2IMDB');<br />
go2IMDB.href = "http://www.imdb.com/find?ref_=nv_sr_fn&q=" + text<br />
var go2Amazon = document.getElementById('go2Amazon');<br />
go2Amazon.href = "http://www.amazon.com/s/ref=nb_sb_noss?url=search-alias%3Daps&field-keywords=" + text<br />
var go2TranslateEngTC = document.getElementById('go2TranslateEngTC');<br />
go2TranslateEngTC.href = "https://translate.google.com.tw/#en/zh-TW/" + text<br />
}<br />
var elem = document.getElementById('videoelem');<br />
elem.mozSrcObject = dest.stream;<br />
elem.play();<br />
</source><br />
An ideally example to combine ScriptNode with Canvas2DContext.<br><br />
This is an example trying to do on fly camera translation like "Word Lens" and "Waygo".<br><br />
Haven't finish the implementation for this example.<br><br />
<br>[[File:Project FoxEye3.png|720px]]<br><br />
<big>Example 3:</big><br />
<source lang="javascript"><br />
var context = new DIPContext();<br />
var myMediaStream;<br />
navigator.getUserMedia({video:true, audio:false}, function(localMediaStream){<br />
myMediaStream = localMediaStream;<br />
var source = context.createMediaStreamSource(myMediaStream);<br />
var textRecoginition = context.createTextRecoginition();<br />
source.connect(textRecoginition);<br />
var textInpaint = context.createTextInpaint();<br />
textRecoginition.connect(textInpaint);<br />
var scriptTranslate = context.createScriptProcessor();<br />
textdetect.ontextrecognized = function (e){<br />
var text = e.recognizedText;<br />
// Custom parameter -<br />
scriptTranslate.addParameter( "text", text );<br />
}<br />
scriptTranslate.onimageprocess= function (e) {<br />
var text = e.parameters.text;<br />
// Translate to other language....<br />
var newText = Translate("Eng", "TC");<br />
var input = e.inputImage;<br />
var canvas = document.getElementsByTagName('canvas')[0];<br />
var context2D = canvas.getContext('2d');<br />
context2D.drawImage(input, 0, 0);<br />
context2D.strokeText(newText, 0, 0);<br />
// get an empty slate to put the data into<br />
var output = context.createImageData(canvas.width, canvas.height);<br />
e.outputImage = output;<br />
}<br />
textInpaint.connect(scriptTranslate);<br />
var dest = createMediaStreamDestination();<br />
scriptTranslate.connect(dest);<br />
var video = document.getElementById(‘videoelem’);<br />
video.mozSrcObject = dest.stream;<br />
}, null);<br />
<br />
</source><br />
--><br />
=Demo pages=<br />
==OpenCV.js==<br />
This demo should run on any existing Firefox browser on Ubuntu and Mac. Please try the website below.<br><br />
http://people.mozilla.org/~cku/opencv/<br />
<br><br />
[[File:OpenCVJS-1.png|1080px]]<br />
<br><br />
[[File:OpenCVJS-2.png|1080px]]<br />
<br><br />
<br />
==MST with Worker and ImageBitmap==<br />
You need a customized build for the demos below. Please clone this repo: https://github.com/kakukogou/gecko-dev/tree/foxeye-master<br><br />
You also need a web camera for the demos. Then build your own Firefox browser and visit the website below to see them.<br />
<br><br />
Source code of the demo:<br><br />
https://github.com/kakukogou/foxeye-demo<br />
<br><br />
Demo website:<br><br />
http://people.mozilla.org/~tkuo/foxeye-demo/<br />
===Monitor===<br />
The monitor path just dispatches events to the Web Worker without modifying the stream. The left video comes from getUserMedia. The right one uses addWorkerMonitor to dispatch each input frame from the left one to a worker. The worker detects the face and passes the face position together with the input frame back to the main thread. The script in the main thread then uses both pieces of information to draw the frame via CanvasRenderingContext2D, as sketched below. <br />
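The sketch below is illustrative only: it follows the demo's naming (addWorkerMonitor predates the addVideoMonitor name in the IDL above), it assumes the monitor object exposes Worker-style messaging, and the message format is invented here.<br />
<source lang="javascript"><br />
// Worker (sketch): detect a face in each monitored frame and report it.<br />
onvideomonitor = function(event) {<br />
  var faceRect = detectFace(event.inputImageBitmap); // detectFace is user code, e.g. via OpenCV.js<br />
  // ImageBitmap is transferable, so hand the frame over without copying.<br />
  postMessage({ frame: event.inputImageBitmap, face: faceRect }, [event.inputImageBitmap]);<br />
};<br />
<br />
// Main thread (sketch): draw the frame, then overlay the hat at the reported position.<br />
monitor.onmessage = function(msg) {<br />
  ctx2d.drawImage(msg.data.frame, 0, 0);<br />
  if (msg.data.face) {<br />
    ctx2d.drawImage(hatImage, msg.data.face.x, msg.data.face.y - hatImage.height);<br />
  }<br />
};<br />
</source><br />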
*Case 1: Face detection<br />
This demo shows how to use |addWorkerMonitor| and CanvasRenderingContext2D to overlay a hat on someone's head.<br />
<br><br />
[[File:Monitor face.png|1080px]]<br />
<br><br />
*Case 2: QRCode<br />
[[File:Monitor qrcode.png|1080px]]<br />
<br><br />
<br />
===Processor===<br />
This demo shows how to use |addWorkerProcessor| to create a new MediaStreamTrack and show it in another HTMLVideoElement. There are six kinds of image filters: copy, blur, erode, threshold, invert and grayscale.<br />
<br> <br />
<br><br />
[[File:Processor copy.png|1080px]]<br />
<br><br />
[[File:Processor blur.png|1080px]]<br />
<br><br />
[[File:Processor erode.png|1080px]]<br />
<br><br />
[[File:Processor threshold.png|1080px]]<br />
<br><br />
[[File:Processor invert.png|1080px]]<br />
<br><br />
[[File:Processor gray.png|1080px]]<br />
<br><br />
<br />
<br />
<br />
<!--<br />
==Demo 1: Face tracker==<br />
===Browser:===<br />
*Input comes from HTML Image Element<br />
[[File:ProjectFoxEye BrowserFaceImage.png|720px]]<br />
*Input comes from MediaStream<br />
[[File:ProjectFoxEye BrowserFaceMS.png|720px]]<br />
<br />
===B2G on Flame:===<br />
*Input comes from HTML Image Element<br><br />
[[File:Project FoxEyeFlameFaceImage.jpg|480px]]<br><br />
*Input comes from MediaStream<br><br />
[[File:Project FoxEyeFlameFaceMS.jpg|480px]]<br><br />
<br />
==Demo 2: Text Recognition==<br />
*Browser:<br />
**Input comes from HTML Image Element<br />
[[File:ProjectFoxEye TextImage.png|720px]]<br />
**Input comes from MediaStream<br />
[[File:ProjectFoxEye TextMS.png|720px]]<br />
--><br />
<br />
=Unlimited Potentials=<br />
<!--<br />
According to "Firefox OS User Research Northern India Findings" [4], one of the key table-stake is camera related features. "Ways to provide photo & video editing tools" is what this WebAPI for. So if we can deliver some cool photo & video editing features, we can fulfill one of the needs of our target market.<br><br />
In [4], it mentioned that one of purchase motivators is educate my kids. The features like PhotoMath can satisfy the education part.<br><br />
In long term, if we can integrate text recognition with TTS(text to speech), we can help illiterate person to read words or phrase. That will be very useful features.<br><br />
Also offline text translation in camera might be a killer application too. Waygo and WordLens is two of such applications in Android and iOS.<br><br />
Text Selection in Image is also an interesting feature for browser. Project Naptha demos some potential functionality based on yext selection in Image.<br> <br />
--><br />
==FoxEye technology tree==<br />
This is the technology tree of FoxEye. Solid lines are dependencies; dashed lines are for performance improvements. The green blocks are possible applications. The blue blocks are API drafts with prototypes. The red blocks are things we know how to do but have not yet started. The yellow blocks are only concepts and need more study. So, step by step, we can provide a photo manager, wide-angle panorama, HDR, and even gesture control.<br />
[[File:Multimedia Platform Team Technology Roadmap - New Page.png |1024px]]<br />
<br />
<br />
==Use Cases==<br />
*Digital Image Processing(DIP) for camera:<br />
**Face In, see [https://www.youtube.com/watch?feature=player_embedded&v=PWZUCfDsFdU Sony Face In]<br />
**Augmented Reality, see [https://www.youtube.com/watch?feature=player_embedded&v=vDNzTasuYEw IKEA AR]<br />
**Camera Panorama, <br />
**Fisheye camera, <br />
**Comic Effect,<br />
**Long term, might need Android Camera HAL 3 to control camera<br />
***Smile Snapshot<br />
***Gesture Snapshot<br />
***HDR<br />
***Video Stabilization<br />
**Bar code scanner<br />
*Photo and video editing<br />
**Video Editor, see [https://www.youtube.com/watch?feature=player_embedded&v=NJ6nYgxcuUk WeVideo on Android]<br />
**A faster way for video editing tools.<br />
**Lots of existing image effects can be used for photo and video editing.<br />
**https://www.facebook.com/thanks<br />
*Object Recognition in Image (not only Firefox OS, but also the browser):<br />
**Shopping Assistant, see [https://www.youtube.com/watch?feature=player_embedded&v=B7cvlWll85Q Amazon Firefly]<br />
**Face Detection/Tracking,<br />
**Face Recognition, <br />
**Text Recognition, <br />
**Text Selection in Image, <br />
***See http://projectnaptha.com/<br />
**Text Inpainting,<br />
**Image Segmentation,<br />
**Text translation on image, see [https://www.youtube.com/watch?feature=player_embedded&v=9UalhhWBPH0 Waygo]<br />
*Duo Camera:<br />
**Nature Interaction (Gesture, Body Motion Tracking)<br />
**Interactive Foreground Extraction<br />
and so on....<br />
<br />
==Some cool real-world applications for reference==<br />
*Word Lens: <br />
**https://play.google.com/store/apps/details?id=com.questvisual.wordlens.demo<br />
**https://itunes.apple.com/tw/app/word-lens/id383463868?mt=8<br />
*Waygo<br />
**http://www.waygoapp.com/<br />
*PhotoMath<br />
**https://photomath.net/<br />
*Cartoon Camera<br />
**https://play.google.com/store/apps/details?id=com.fingersoft.cartooncamera<br />
*Photo Studio<br />
**http://photo-studio.en.uptodown.com/android<br />
*Magisto<br />
**https://play.google.com/store/apps/details?id=com.magisto<br />
*Adobe PhotoShop Express<br />
**http://www.photoshop.com/products/photoshopexpress<br />
*Amazon(firefly app)<br />
**https://play.google.com/store/apps/details?id=com.amazon.mShop.android<br />
<br />
<!--<br />
=Task Dependency=<br />
Below is the dependency graph for text recognition work.<br />
<br><br />
[[File:Project FoxEyeTextRecgDependency.png]]<br />
<br><br />
Below is the dependency graph for camera efficts work.<br />
<br><br />
[[File:Project FoxEyeCameraDependency.png]]<br />
<br><br />
Below is the dependency graph for video editor work.<br />
<br><br />
[[File:Project FoxEyeVideoEditorDependency.png]]<br />
<br><br />
--><br />
<br />
<!--<br />
=Comparison=<br />
==Canvas2DContext==<br />
Currently, you can do video effect by Canvas2DContext. See the demo made by [4]. The source code looks like below.<br />
<source><br />
function frameConverter(video,canvas) {<br />
<br />
// Set up our frame converter<br />
this.video = video;<br />
this.viewport = canvas.getContext("2d");<br />
this.width = canvas.width;<br />
this.height = canvas.height;<br />
// Create the frame-buffer canvas<br />
this.framebuffer = document.createElement("canvas");<br />
this.framebuffer.width = this.width;<br />
this.framebuffer.height = this.height;<br />
this.ctx = this.framebuffer.getContext("2d");<br />
// Default video effect is blur<br />
this.effect = JSManipulate.blur;<br />
// This variable used to pass ourself to event call-backs<br />
var self = this;<br />
// Start rendering when the video is playing<br />
this.video.addEventListener("play", function() {<br />
self.render();<br />
}, false);<br />
<br />
// Change the image effect to be applied <br />
this.setEffect = function(effect){<br />
if(effect in JSManipulate){<br />
this.effect = JSManipulate[effect];<br />
}<br />
}<br />
<br />
// Rendering call-back<br />
this.render = function() {<br />
if (this.video.paused || this.video.ended) {<br />
return;<br />
}<br />
this.renderFrame();<br />
var self = this;<br />
// Render every 10 ms<br />
setTimeout(function () {<br />
self.render();<br />
}, 10);<br />
};<br />
<br />
// Compute and display the next frame <br />
this.renderFrame = function() {<br />
// Acquire a video frame from the video element<br />
this.ctx.drawImage(this.video, 0, 0, this.video.videoWidth,<br />
this.video.videoHeight,0,0,this.width, this.height);<br />
var data = this.ctx.getImageData(0, 0, this.width, this.height);<br />
// Apply image effect<br />
this.effect.filter(data,this.effect.defaultValues);<br />
// Render to viewport<br />
this.viewport.putImageData(data, 0, 0);<br />
return;<br />
};<br />
};<br />
<br />
// Initialization code<br />
video = document.getElementById("video");<br />
canvas = document.getElementById("canvas");<br />
fc = new frameConverter(video,canvas);<br />
...<br />
// Change the image effect applied to the video<br />
fc.setEffect('edge detection');<br />
<br />
</source><br />
Basically, the idea is use |drawImage| to acquire frame from video and draw it to canvas. Then call |getImageData| to get the data and process the image. After that, put the computed data back to the canvas and display it.<br><br />
<br />
Compare to this approach, the proposed WebAPI has below advantages:<br />
* Not polling mechanism.<br />
** We use callback function to process all frames.<br />
<br />
==node-opencv==<br />
https://github.com/peterbraden/node-opencv<br />
"OpenCV bindings for Node.js. OpenCV is the defacto computer vision library - by interfacing with it natively in node, we get powerful real time vision in js."<br />
The sample codes looks like below:<br />
*You can use opencv to read in image files. Supported formats are in the OpenCV docs, but jpgs etc are supported.<br />
<source lang="javascript"><br />
cv.readImage(filename, function(err, mat){<br />
mat.convertGrayscale()<br />
mat.canny(5, 300)<br />
mat.houghLinesP()<br />
})<br />
</source><br />
*If however, you have a series of images, and you wish to stream them into a stream of Matrices, you can use an ImageStream. Thus:<br />
<source lang="javascript"><br />
var s = new cv.ImageStream()<br />
s.on('data', function(matrix){<br />
matrix.detectObject(haar_cascade_xml, opts, function(err, matches){})<br />
})<br />
ardrone.createPngStream().pipe(s);<br />
</source><br />
<br />
==opencvjs==<br />
https://github.com/blittle/opencvjs<br><br />
It is a project to compile opencv to asm.js. Might be a dead project now.<br />
==Project Naptha==<br />
"Project Naptha automatically applies state-of-the-art computer vision algorithms on every image you see while browsing the web. The result is a seamless and intuitive experience, where you can highlight as well as copy and paste and even edit and translate the text formerly trapped within an image." Quoted from http://projectnaptha.com/ .<br />
===How it works?===<br />
Excerpt from Project Naptha:<br />
The primary feature of Project Naptha is actually the text detection, rather than optical character recognition. The author write a text detection algorithm called Stroke Width Transform, invented by Microsoft Research in 2008, which is capable of identifying regions of text in a language-agnostic manner in WebWorker. Once a user begins to select some text, it scrambles to run character recognition algorithms in order to determine what exactly is being selected. The default OCR engine is a built-in pure-javascript port of the open source Ocrad OCR engine. There’s the option of sending the selected region over to a cloud based text recognition service powered by Tesseract, Google’s (formerly HP’s) award-winning open-source OCR engine which supports dozens of languages, and uses an advanced language model.<br />
--><br />
<!--<br />
=Open Source Library and Licenses =<br />
*OpenCV: OpenCV is an open-source BSD-licensed library that includes several hundreds of computer vision algorithms.<br />
*Tesseract-OCR: Apache License v2.0. Tesseract is probably the most accurate open source OCR engine available.<br />
**Leptonica: Leptonica is a pedagogically-oriented open source site containing software that is broadly useful for image processing and image analysis applications.<br />
<br />
<br>"The Leptonica copyright is less restrictive still. We use the BSD 2-clause license, which is similar to the Apache license but much briefer. Kirk McKusick playfully called the BSD license a copycenter, as differentiated from the usual copyright and the GPL copyleft: "Take it down to the copy center and make as many copies as you want." The BSD restrictions can be approximately summarized as: (1) Don't pretend that you wrote this, and (2) Don't sue us if it doesn't work.<br> For Leptonica, as with Apache and BSD, modifications of the source code that are used in commercial products can be made without any obligation to make them available in open source. (This does happen voluntarily -- it's likely that the majority of issues noted by people working on commercial products are reported back, often with patches.) The Leptonica license only requires that any use of the source, whether in original or modified form, must include the Leptonica copyright notice, and that modified versions must be clearly marked as such. "<br />
<br />
<br />
=What I have done=<br />
*Initialize, plan and implement this project.<br />
*Write a prototype of WebIDL for WebDIP.<br />
**MediaStream as source node and destentaion node for WebDIP.<br />
**For HTMLImageElement part as source node, there is a temporal solution for it.<br />
**Have face detection node. Can be used in MediaStream and HTMLImageElement on both browser and B2G flame.<br />
**Have text detection/recognization node. Can be used in MediaStream and HTMLImageElement on browser.<br />
--><br />
<br />
<!--<br />
=To Do List=<br />
*Extend MediaStreamTrack API? =>CTai<br />
**Need to integrate with Canvas2DContext and WebGL.<br />
*OfflineMediaContext study. =>Kaku(our new hire!!)<br />
*An API for image processing and object detection. =>TBD<br />
**We need such API for B2G privilege applications(or opencv-asm.js for general APPs).<br />
*How to compile OpenCV to asm.js =>Kaku, CJay<br />
**Try to figure out how to pass the ImageBitmap from VideoWroker to OpenCV-asm.js<br />
*Compare native OpenCV/Tesseract with asm.js version. =>TBD<br />
--><br />
<br />
<!--<br />
=Fixme List(Known Issues)=<br />
*OpenCV can't build with STLPort, only support GNUSTL.<br />
**B2G can't build with GNUSTL.<br />
*Text Detection and Recognition can't run on B2G.<br />
**Some OpenCV API use STL as arguments. The unalignment STL will cause runtime error.<br />
*Tesseract-OCR Build<br />
**Use pre-installed Tesseract-OCR now. Maybe we should support source code build of Tesseract-OCR.<br />
*Improve precision rate of text recognition.<br />
**The actual precision rate should be higher than my roughly prototype. Need improve it.<br />
*Separate OCR initialized.<br />
**Prevent redundant initialization.<br />
*Haven't done OpenCL integration in Gecko.<br />
**OpenCV has a lot of OpenCL integration. We should take advantage from it.<br />
*Canvas2DContext, WebGL can't run on worker.<br />
**Need bug 801176 and bug 709490 landed.<br> <br />
*Need ImageBitmap for VideoWorkerEvent.<br />
**Need bug 1044102 landed.<br><br />
--><br />
<br />
=Current Status=<br />
*Meta bug: https://bugzilla.mozilla.org/show_bug.cgi?id=1100203<br />
*MediaStream with worker: editor draft completed, refining the prototype for review process. See https://bugzilla.mozilla.org/show_bug.cgi?id=1044102<br />
*ImageBitmap: In review process. See https://bugzilla.mozilla.org/show_bug.cgi?id=801176<br />
*ImageBitmap extension: editor draft completed, refining the prototype for review process. See https://bugzilla.mozilla.org/show_bug.cgi?id=1141979<br><br />
*OpenCV.js: Working on it. See http://people.mozilla.org/~cku/opencv/<br />
*OfflineMediaContext: Not yet started.<br />
*WebImage: Not yet started.<br />
*Run WebGL on worker: in review process. See https://bugzilla.mozilla.org/show_bug.cgi?id=709490<br><br />
*CanvasRenderingContext2D in Worker: waiting for someone to take this bug. See https://bugzilla.mozilla.org/show_bug.cgi?id=801176<br><br />
=Next Phase (2015 H2)=<br />
*Try to land and standardize MediaStream with worker, ImageBitmap and the ImageBitmap extension. See [4] for how standardization is processed in Mozilla.<br />
*Design an automated way to export the JavaScript API for OpenCV.js. Try to upstream it to the OpenCV code base.<br />
*Start to work on OfflineMediaContext.<br />
*Support product requirements, for example, an Instagram-like app, wide-angle panorama or Fox photo.<br />
*Do some exploratory experiments on the WebImage concept.<br />
*Initialize a sub-project called [https://wiki.mozilla.org/Project_GlovePuppetry Project GlovePuppetry]<br />
<br />
=Beyond 2015=<br />
*Proof of Concept for WebImage.<br />
*A crazy idea of Kernel.js or SPIR.js for JS developers to customize WebImage ?<br />
*Gestural control API with depth camera? => WebNI (Web Nature Interaction)?<br />
*[https://wiki.mozilla.org/Project_Cangjie Project Cangjie]<br />
<br />
=Conclusion=<br />
Step by step, we can make things different. FoxEye is a pioneering project trying to push the Web boundary into a new area: image processing and computer vision. The key factor in ensuring this project's success is you, every Web contributor. Your support, your promotion, your feedback and your comments are the nutrition of this project. Please share this project with every Web developer. Let's bring more and more amazing things to the Web. Let's enrich and improve the competitiveness of the whole Web platform.<br />
<br />
=References=<br />
*[1]:"WebAudio Spec", http://www.w3.org/TR/webaudio/<br />
*[2]:"Media Capture Stream with Worker", https://w3c.github.io/mediacapture-worker/<br />
*[3]:"Mozilla Standards", https://wiki.mozilla.org/Standards<br />
<br />
=Acknowledgements=<br />
The whole idea of adopting WebAudio as the reference design for this project came from a conversation with John Lin. Thanks to Robert O'Callahan for his great feedback and comments. Thanks to John Lin and Chia-jung Hung for their useful suggestions and ideas. Also, big thanks to my team members who helped me debug the code.<br />
<br />
=About Authors=<br />
==CTai==<br />
My name is [https://tw.linkedin.com/pub/chia-hung-tai/36/8b6/712 Chia-hung Tai]. I am a senior software engineer in the Mozilla Taipei office. I work on Firefox OS multimedia. Before this job, I had experience in OpenCL, NLP (Natural Language Processing), data mining and machine learning. My IRC nickname is ctai. You can find me in the #media, #mozilla-taiwan, and #b2g channels. You can also reach me via email (ctai at mozilla dot com).<br />
<br />
==Kaku==<br />
[https://tw.linkedin.com/in/kakukogou Tzuhuo Kuo] is an engineer in the Mozilla Taipei office.<br />
==CJ Ku==<br />
[https://www.linkedin.com/pub/cj-ku/62/55b/a1b CJ Ku] is responsible for the OpenCV.js part.</div>
<hr />
<div>=Abstract=<br />
The goal of this project is bringing the power of computer vision and image processing to the Web. By extending the spec of Media Capture and Streams, the web developers can write video processing related applications in better way. The primary idea is to incorporate Worker-based JavaScript video processing with MediaStreamTrack. The user's video processing script can do the real image processing and analysis works frame by frame. We also propose an extension of ImageBitmap to allow better optimization possibility in both implementation and Web developer side. To support video editor case, we would like to introduce OfflineMediaContext in next phase. We also want to explore the concept of WebImage, a hardware accelerated-able Web API for performance improvement chance. By accomplishing these API step by step, we believe we can improve the Web platform competitiveness a lot on image processing and computer vision area.<br />
<br />
=Introduction=<br />
To get a quick understand what is project FoxEye. Please see below file:<br><br />
'''The latest one:'''<br />
*FoxEye Briefing: [https://docs.google.com/presentation/d/1Ra5bIeMfSEQi5kd_bGH5Vx-pa9ahpAUTFDxTeBA5L0o/edit?usp=sharing Briefing]<br />
*Presentation files in Whistler Work Week:<br />
**Project FoxEye Status Update: [https://docs.google.com/presentation/d/1vnR5JOWpEgKN3KZGV2SLxscqSnuBOF2tT_dTolLRYI4/edit?usp=sharing FoxEye]<br />
**FoxEye Cross Firefox OS:[https://docs.google.com/presentation/d/1ck32VoikWkkFXkFfNK91S22nGRp9fSEj6Oz6846May4/edit?usp=sharing Use case]<br />
*'''Latest demo in Youtube: [https://www.youtube.com/watch?v=prybkXsTGXY FoxEye 2015 H1 demo]'''<br />
<br />
'''Outdated'''<br />
*Presentation file in Portland Work Week.[[File:Project FoxEye Portland Work Week.pdf]]<br><br />
*Presentation file in P2PWeb WorkShop.[[File:Project FoxEye 2015-Feb.pdf]]<br><br />
*Youtube: https://www.youtube.com/watch?v=TgQWEWiGaO8<br><br />
<br />
The needs for image processing and computer vision is increasing in recent years. The introduction of video element and media stream in HTML5 is very important, allowing for basic video playback and WebCam ability. But it is not powerful enough to handle complex video processing and camera application. Especially there are tons of mobile camera, photo and video editor applications show their creativity by using OpenCV etc in Android. It is a goal of this project to include the capabilities found in modern video production and camera applications as well as some of the processing, recognition and stitching tasks. <br />
<br><br />
This API is inspired by the WebAudio API[1]. Unlike WebAudio API, we try to reach the goal by modifying existing Media Capture and Streams API. The idea is adding some functions to associate the Woker-based script with MediaStreamTrack. Then the script code of Worker runs image processing and/or analysis frame by frame. Since we move the most of processing work to Worker, the main thread will not be blocked.<br />
<br />
[[File:FoxEye - Overview.png|800px]]<br />
<br><br />
Basically, the spirit of this project is four parts. The first part is extend the MediaStreamTrack to associate a Worker. This part provide a way to do image processing job frame by frame. The second part is ImageBitmap extension. This part extended the ImageBitmap interface to allow JavaScript developer to read underlying data out and set an external data into an ImageBitmap in a set of supported color formats. The third part is OfflineMediaContext. It is for offline stream to render as fast as possible. The last part is WebImage, it is a hardware accelerated-able API on computer vision area. The Web developers can use it to combine high performance vision processing.<br />
<br><br />
Thanks for the amazing asm.js and emscripten work, we also provide an asm.js version of OpenCV called OpenCV.js. The web developers can leverage the power of OpenCV in a simpler way. <br />
<br><br />
<br />
=Design Principle=<br />
*Follow [https://extensiblewebmanifesto.org/ The Extensible Web Manifesto]<br />
The original design of this project is WebAudio like design called WebVideo. But is is deprecated due to we want to follow The Extensible Web Manifesto. We believe this kind of design is best for the Web. Take below examples to against WebVideo design.<bt><br />
Computer vision area is more thriving than ever after deep learning introduced. We can't enumerate video nodes for all kinds of creativeness for the future. For example, suppose we enumerate all we can image in current time. But still something we can't image. Maybe we lost an area called image sentiment analysis. I take this as an example because I just known it recently. Image sentiment analysis is trying to classify the emotion from human face in image. So should we add this as a new video node? Take another extreme example, we think we would not have argument on face detection or face recognition. But what we talk now is only for human face. What if we someone think we need to detect and recognize animal face? Sounds like a ridiculous idea right? But it is a real business. A product called "Bistro: A Smart Feeder Recognizes Your Cat's Face"[2] was on indiegogo. So should we extend the original face detection/recognition node or add a new video node for cat or ask them to implement it in VideoWorker? If we ask them to implement in VideoWorker, then we might need to re-consider are those existing video node make sense? Should we move them all into VideoWorker and implement them in JS? If yse, the WebVideo will degenerate to only source node, video worker node and detestation node. Compare to degenerated WebVideo, the MediaStream with worker design is more elegant. <br />
<br />
*Performance and power consumption do matter<br />
This project is initialized by the FxOS team which means the performance and power consumption for mobile devices are considered from day one.<br />
<br />
=Concept=<br />
==MediaStreamTrack with Worker: ==<br />
The new design is a simple and minimal change for current API. By extending MediaStreamTrack and adding Worker related API, we can let MediaStream be able to support video processing functionality through the script code in Worker. Below is the draft WebIDL codes. Please see [2] for the draft specification.<br />
<source lang="webidl"><br />
[Constructor(DOMString scriptURL)]<br />
<br />
interface VidoeMonitor : EventTarget {<br />
attribute EventHandler onvideomonitor;<br />
};<br />
<br />
interface VideoProcessor : EventTarget {<br />
attribute EventHandler onvideoprocess;<br />
};<br />
partial interface MediaStreamTrack {<br />
void addVideoMonitor(VidoeMonitor monitor);<br />
void removeVideoMonitor(VidoeMonitor monitor);<br />
MediaStreamTrack addVideoProcessor(VidoeProcessor processor);<br />
void removeVideoProcessor();<br />
};<br />
<br />
[Exposed=(Window, Worker),<br />
Constructor(DOMString type, optional VideoMonitorEventInit videoMonitorEventInitDict)]<br />
interface VideoMonitorEvent : Event {<br />
readonly attribute DOMString trackId;<br />
readonly attribute double playbackTime;<br />
readonly attribute ImageBitmap? inputImageBitmap;<br />
};<br />
<br />
[Exposed=(Window, Worker),<br />
Constructor(DOMString type, optional VideoProcessorEventInit videoProcessorEventInitDict)]<br />
interface VideoProcessEvent : VideoMonitorEvent {<br />
attribute promise<ImageBitmap> outputImageBitmap;<br />
};<br />
<br />
</source><br />
<br><br />
Main thread:<br><br />
[[File:NewProjectFoxEye1.png|1024px]]<br><br />
<br />
Worker Thread<br><br />
[[File:Worker - FLOW.png|1024px]]<br><br />
===Example Code ===<br />
Please check the section [http://chiahungtai.github.io/mediacapture-worker/index.html#Examples examples in MediaStreamTrack with worker].<br />
<br />
==ImageBitmap extensions==<br />
Please see [3] for more information.<br />
<br />
==WebImage:==<br />
Why do we need WebImage? Because performance is matter to prompt computer vision to the Web. We need it for performance reason. Most image processing task can be accelerated by WebGL. But it is not the case for computer vision area. So we need a hardware accelerated-able computer vision related web API to help Web developer deliver a fast and portable Web page and application. That is the motivation of WebImage.<br />
<br><br />
This is a new zone needed to explore. So we might start from refer to existing related API. In end of 2014, KHRONOS released a portable, power efficient vision processing API, called OpenVX. This specification might be a good start point for WebImage.<br />
<br><br />
The following diagram is a brief concept of OpenVX. It is a graph-node architecture. The role of OpenVX for computer vision is like the role of OpenGL for graph. The developer just construct and execute the graph to process incoming image.<br />
<br><br />
[[File:OpenVX-NodeGFX.PNG|600px]]<br><br />
<!--[[File:Project FoxEyeWebImage1.png|800px]]<br>--><br />
OpenVX can be implemented by OpenCL, OpenGL|ES with compute shader and C/C++ with SIMD. That means we can support OpenVX in wide range Web platform from PC to mobile once a OpenVX engine is provided.<br />
<br />
[[File:OpenVX.PNG|600px]]<br><br />
<br />
==OfflineMediaContext:==<br />
We introduce the “OfflineMediaContext” and modify the “MediaStream” here to enable the offline (as soon as possible) MediaStream processing. When developers are going to perform the offline MediaStream processing, they need to form a context which will hold and keep the relationship of all MediaStreams that are going to be processed together.<br />
<br />
<source lang="c++"><br />
<br />
// typedef unsigned long long DOMTimeStamp;<br />
interface OfflineMediaContext {<br />
void start(DOMTimeStamp durationToStop);<br />
attribute EventHandler onComplete;<br />
};<br />
// Add an optional argument into the constructor.<br />
[Constructor (optional OfflineMediaContext context),<br />
Constructor (MediaStream stream, optional OfflineMediaContext context),<br />
Constructor (MediaStreamTrackSequence tracks, optional OfflineMediaContext context)]<br />
interface MediaStream : EventTarget {<br />
// No modification.<br />
...<br />
}<br />
<br />
</source><br />
*OfflineMediaContext is the holdplace of all MediaStreams which are going to be processed together in a non-realtime rate.<br />
*OfflineMediaContext is also the object who can trigger the non-realtime processing.<br />
*OfflineMediaContext should be instantiated first and then MediaStreams, which are going to be processed together in the same context, could be instantiated by passing the pre-instantiated OfflineMediaContext object into the constructor of MediaStreams. (See the modified MediaStream constructor below)<br />
*The native implementation of OfflineMediaContext holds a non-realtime MediaStreamGraph, which is just the same as the OfflineAudioContext in WebAudio specification.<br />
*The constructors are modified by adding an optional parameter, OfflineMediaContext. By this way, developers are able to associate a MediaStream to an OfflineMediaContext.<br />
*If the optional OfflineMediaContext is given, the native implementation of the MediaStream should be hooked to the non-realtime MSG hold by the given OfflineMediaContext in stead of the global realtime MSG.<br />
*If the optional OfflineMediaContext is given, we need to check whether the new created MediaStream is able to be processed in offline rate or not. If not, the constructor should throw an error and return a NULL. (Constructors are always allowed to throw.)<br />
<br />
==OpenCV.js==<br />
*OpenCV + Emscripten = OpenCV.js<br />
*https://github.com/CJKu/opencv <br />
<!--<br />
==Deprecated Design ==<br />
*Modular Routing<br />
Modular routing allows arbitrary connections between different DIPNode(TBD) objects. Each node can have inputs and/or outputs. A source node has no inputs and a single output. A destination node has one input and no outputs. Other nodes such as filters can be placed between source and destination nodes.<br />
<br />
Here is a example for face detection work on ImageElement:<br><br />
PS. Right now, I haven't finished the work of ScriptNode. So the draw rectangle part is skipped in sample code.<br />
<br>[[File:Project FoxEye1.png|720px]]<br><br />
<big>Example 1:</big><br />
<source lang="javascript"><br />
var context = new DIPContext();<br />
var imageElem = document.getElementById('imgelemsrc');<br />
var source = context.createImageElementSource(imageElem);<br />
var facedetect = context.createFaceDetection();<br />
source.connect(facedetect);<br />
var dest = context.createMediaStreamDestination();<br />
facedetect.connect(dest);<br />
var elem = document.getElementById('videoelem');<br />
elem.mozSrcObject = dest.stream;<br />
elem.play();<br />
<br />
</source><br />
<br />
<br />
Another example to show that some nodes might support callback function to pass more information rather than image.<br />
<br>[[File:Project FoxEye2.png|720px]]<br><br />
<big>Example 2:</big><br />
<source lang="javascript"><br />
var context = new DIPContext();<br />
var imageElem = document.getElementById('imgelemsrc');<br />
var source = context.createImageElementSource(imageElem);<br />
var textdetect = context.createTextDetection();<br />
source.connect(textdetect);<br />
var dest = context.createMediaStreamDestination();<br />
textdetect.connect(dest);<br />
textdetect.ontextrecognized = function (e){<br />
var text = e.recognizedText;<br />
var go2google = document.getElementById('go2Google');<br />
go2google.href = "https://www.google.com.tw/search?q=" + text<br />
var go2IMDB = document.getElementById('go2IMDB');<br />
go2IMDB.href = "http://www.imdb.com/find?ref_=nv_sr_fn&q=" + text<br />
var go2Amazon = document.getElementById('go2Amazon');<br />
go2Amazon.href = "http://www.amazon.com/s/ref=nb_sb_noss?url=search-alias%3Daps&field-keywords=" + text<br />
var go2TranslateEngTC = document.getElementById('go2TranslateEngTC');<br />
go2TranslateEngTC.href = "https://translate.google.com.tw/#en/zh-TW/" + text<br />
}<br />
var elem = document.getElementById('videoelem');<br />
elem.mozSrcObject = dest.stream;<br />
elem.play();<br />
</source><br />
An ideally example to combine ScriptNode with Canvas2DContext.<br><br />
This is an example trying to do on fly camera translation like "Word Lens" and "Waygo".<br><br />
Haven't finish the implementation for this example.<br><br />
<br>[[File:Project FoxEye3.png|720px]]<br><br />
<big>Example 3:</big><br />
<source lang="javascript"><br />
var context = new DIPContext();<br />
var myMediaStream;<br />
navigator.getUserMedia({video:true, audio:false}, function(localMediaStream){<br />
myMediaStream = localMediaStream;<br />
var source = context.createMediaStreamSource(myMediaStream);<br />
var textRecoginition = context.createTextRecoginition();<br />
source.connect(textRecoginition);<br />
var textInpaint = context.createTextInpaint();<br />
textRecoginition.connect(textInpaint);<br />
var scriptTranslate = context.createScriptProcessor();<br />
textdetect.ontextrecognized = function (e){<br />
var text = e.recognizedText;<br />
// Custom parameter -<br />
scriptTranslate.addParameter( "text", text );<br />
}<br />
scriptTranslate.onimageprocess= function (e) {<br />
var text = e.parameters.text;<br />
// Translate to other language....<br />
var newText = Translate("Eng", "TC");<br />
var input = e.inputImage;<br />
var canvas = document.getElementsByTagName('canvas')[0];<br />
var context2D = canvas.getContext('2d');<br />
context2D.drawImage(input, 0, 0);<br />
context2D.strokeText(newText, 0, 0);<br />
// get an empty slate to put the data into<br />
var output = context.createImageData(canvas.width, canvas.height);<br />
e.outputImage = output;<br />
}<br />
textInpaint.connect(scriptTranslate);<br />
var dest = createMediaStreamDestination();<br />
scriptTranslate.connect(dest);<br />
var video = document.getElementById(‘videoelem’);<br />
video.mozSrcObject = dest.stream;<br />
}, null);<br />
<br />
</source><br />
<br />
=Demo pages=<br />
==OpenCV.js==<br />
This demo should be able to run on every existing Firefox browser on Ubuntu and Mac. Please try below website.<br><br />
http://people.mozilla.org/~cku/opencv/<br />
<br><br />
[[File:OpenCVJS-1.png|1080px]]<br />
<br><br />
[[File:OpenCVJS-2.png|1080px]]<br />
<br><br />
<br />
==MST with Worker and ImageBitmap==<br />
You need a customized build for below demos. Please clone this repo: https://github.com/kakukogou/gecko-dev/tree/foxeye-master<br><br />
You also need a web camera for the demos. Then build your own Firefox browser and go to below website to see the demo.<br />
<br><br />
Source code of the demo:<br><br />
https://github.com/kakukogou/foxeye-demo<br />
<br><br />
Demo website:<br><br />
http://people.mozilla.org/~tkuo/foxeye-demo/<br />
===Monitor===<br />
Monitor is design for just send the event to the Web Worker and no modification. The left one is from getUserMedia. The right one is using addWorkerMonitor to dispatch the input frame from the left one to a worker. The worker will detect the face and pass the face position and the input frame to main thread. Then the script in main thread use both information to draw the input frame via CanvasRenderingContext2D. <br />
*Case 1: Face detection<br />
This demo shows how to use |addWorkerMonitor| and CanvasRenderingContext2D to overlay a hat on someone's head.<br />
<br><br />
[[File:Monitor face.png|1080px]]<br />
<br><br />
*Case 2: QRCode<br />
[[File:Monitor qrcode.png|1080px]]<br />
<br><br />
<br />
===Processor===<br />
This is a demo to show how to use |addWorkerProcessor| to create a new MediaStreamTrack and show it in another HTMLVideoElement. There are 5 kind of image filter.<br />
<br> <br />
<br><br />
[[File:Processor copy.png|1080px]]<br />
<br><br />
[[File:Processor blur.png|1080px]]<br />
<br><br />
[[File:Processor erode.png|1080px]]<br />
<br><br />
[[File:Processor threshold.png|1080px]]<br />
<br><br />
[[File:Processor invert.png|1080px]]<br />
<br><br />
[[File:Processor gray.png|1080px]]<br />
<br><br />
<br />
<br />
<br />
<!--<br />
==Demo 1: Face tracker==<br />
===Browser:===<br />
*Input comes from HTML Image Element<br />
[[File:ProjectFoxEye BrowserFaceImage.png|720px]]<br />
*Input comes from MediaStream<br />
[[File:ProjectFoxEye BrowserFaceMS.png|720px]]<br />
<br />
===B2G on Flame:===<br />
*Input comes from HTML Image Element<br><br />
[[File:Project FoxEyeFlameFaceImage.jpg|480px]]<br><br />
*Input comes from MediaStream<br><br />
[[File:Project FoxEyeFlameFaceMS.jpg|480px]]<br><br />
<br />
==Demo 2: Text Recognition==<br />
*Browser:<br />
**Input comes from HTML Image Element<br />
[[File:ProjectFoxEye TextImage.png|720px]]<br />
**Input comes from MediaStream<br />
[[File:ProjectFoxEye TextMS.png|720px]]<br />
--><br />
<br />
=Unlimited Potentials=<br />
<!--<br />
According to "Firefox OS User Research Northern India Findings" [4], one of the key table-stake is camera related features. "Ways to provide photo & video editing tools" is what this WebAPI for. So if we can deliver some cool photo & video editing features, we can fulfill one of the needs of our target market.<br><br />
In [4], it mentioned that one of purchase motivators is educate my kids. The features like PhotoMath can satisfy the education part.<br><br />
In long term, if we can integrate text recognition with TTS(text to speech), we can help illiterate person to read words or phrase. That will be very useful features.<br><br />
Also offline text translation in camera might be a killer application too. Waygo and WordLens is two of such applications in Android and iOS.<br><br />
Text Selection in Image is also an interesting feature for browser. Project Naptha demos some potential functionality based on yext selection in Image.<br> <br />
--><br />
==FoxEye technology tree==<br />
This is the technology tree of FoxEye. The solid line is the dependency. The dash line is for performance improvement. The green blocks are possible applications. The blue blocks are API draft with prototype. The red block is we know how to do but not yet starting. The yellow blocks are only the concept. Need more study. So step by step, we can provide photo manager, wide angle panorama, HDR, and even gesture control.<br />
[[File:Multimedia Platform Team Technology Roadmap - New Page.png |1024px]]<br />
<br />
<br />
==Use Cases==<br />
*Digital Image Processing(DIP) for camera:<br />
**Face In, see [https://www.youtube.com/watch?feature=player_embedded&v=PWZUCfDsFdU Sony Face In]<br />
**Augmented Reality, see [https://www.youtube.com/watch?feature=player_embedded&v=vDNzTasuYEw IKEA AR]<br />
**Camera Panorama, <br />
**Fisheye camera, <br />
**Comic Effect,<br />
**Long term, might need Android Camera HAL 3 to control camera<br />
***Smile Snapshot<br />
***Gesture Snapshot<br />
***HDR<br />
***Video Stabilization<br />
**Bar code scanner<br />
*Photo and video editing<br />
**Video Editor, see [https://www.youtube.com/watch?feature=player_embedded&v=NJ6nYgxcuUk WeVideo on Android]<br />
**A faster way for video editing tools.<br />
**Lots of existing image effects can be used for photo and video editing.<br />
**https://www.facebook.com/thanks<br />
*Object Recognition in Image(Not only FX OS, but also broswer):<br />
**Shopping Assistant, see [https://www.youtube.com/watch?feature=player_embedded&v=B7cvlWll85Q Amazon Firefly]<br />
**Face Detection/Tracking,<br />
**Face Recognition, <br />
**Text Recognition, <br />
**Text Selection in Image, <br />
***See http://projectnaptha.com/<br />
**Text Inpainting,<br />
**Image Segmentation,<br />
**Text translation on image, see [https://www.youtube.com/watch?feature=player_embedded&v=9UalhhWBPH0 Waygo]<br />
*Duo Camera:<br />
**Nature Interaction(Gesture, Body Motion Tracking)<br />
**Interactive Foreground Extraction<br />
and so on....<br />
<br />
==Some cool applications we can refer in real worlds==<br />
*Word Lens: <br />
**https://play.google.com/store/apps/details?id=com.questvisual.wordlens.demo<br />
**https://itunes.apple.com/tw/app/word-lens/id383463868?mt=8<br />
*Waygo<br />
**http://www.waygoapp.com/<br />
*PhotoMath<br />
**https://photomath.net/<br />
*Cartoon Camera<br />
**https://play.google.com/store/apps/details?id=com.fingersoft.cartooncamera<br />
*Photo Studio<br />
**http://photo-studio.en.uptodown.com/android<br />
*Magisto<br />
**https://play.google.com/store/apps/details?id=com.magisto<br />
*Adobe PhotoShop Express<br />
**http://www.photoshop.com/products/photoshopexpress<br />
*Amazon(firefly app)<br />
**https://play.google.com/store/apps/details?id=com.amazon.mShop.android<br />
<br />
<!--<br />
=Task Dependency=<br />
Below is the dependency graph for text recognition work.<br />
<br><br />
[[File:Project FoxEyeTextRecgDependency.png]]<br />
<br><br />
Below is the dependency graph for camera efficts work.<br />
<br><br />
[[File:Project FoxEyeCameraDependency.png]]<br />
<br><br />
Below is the dependency graph for video editor work.<br />
<br><br />
[[File:Project FoxEyeVideoEditorDependency.png]]<br />
<br><br />
--><br />
<br />
<!--<br />
=Comparison=<br />
==Canvas2DContext==<br />
Currently, you can do video effect by Canvas2DContext. See the demo made by [4]. The source code looks like below.<br />
<source><br />
function frameConverter(video,canvas) {<br />
<br />
// Set up our frame converter<br />
this.video = video;<br />
this.viewport = canvas.getContext("2d");<br />
this.width = canvas.width;<br />
this.height = canvas.height;<br />
// Create the frame-buffer canvas<br />
this.framebuffer = document.createElement("canvas");<br />
this.framebuffer.width = this.width;<br />
this.framebuffer.height = this.height;<br />
this.ctx = this.framebuffer.getContext("2d");<br />
// Default video effect is blur<br />
this.effect = JSManipulate.blur;<br />
// This variable used to pass ourself to event call-backs<br />
var self = this;<br />
// Start rendering when the video is playing<br />
this.video.addEventListener("play", function() {<br />
self.render();<br />
}, false);<br />
<br />
// Change the image effect to be applied <br />
this.setEffect = function(effect){<br />
if(effect in JSManipulate){<br />
this.effect = JSManipulate[effect];<br />
}<br />
}<br />
<br />
// Rendering call-back<br />
this.render = function() {<br />
if (this.video.paused || this.video.ended) {<br />
return;<br />
}<br />
this.renderFrame();<br />
var self = this;<br />
// Render every 10 ms<br />
setTimeout(function () {<br />
self.render();<br />
}, 10);<br />
};<br />
<br />
// Compute and display the next frame <br />
this.renderFrame = function() {<br />
// Acquire a video frame from the video element<br />
this.ctx.drawImage(this.video, 0, 0, this.video.videoWidth,<br />
this.video.videoHeight,0,0,this.width, this.height);<br />
var data = this.ctx.getImageData(0, 0, this.width, this.height);<br />
// Apply image effect<br />
this.effect.filter(data,this.effect.defaultValues);<br />
// Render to viewport<br />
this.viewport.putImageData(data, 0, 0);<br />
return;<br />
};<br />
};<br />
<br />
// Initialization code<br />
video = document.getElementById("video");<br />
canvas = document.getElementById("canvas");<br />
fc = new frameConverter(video,canvas);<br />
...<br />
// Change the image effect applied to the video<br />
fc.setEffect('edge detection');<br />
<br />
</source><br />
Basically, the idea is use |drawImage| to acquire frame from video and draw it to canvas. Then call |getImageData| to get the data and process the image. After that, put the computed data back to the canvas and display it.<br><br />
<br />
Compare to this approach, the proposed WebAPI has below advantages:<br />
* Not polling mechanism.<br />
** We use callback function to process all frames.<br />
<br />
==node-opencv==<br />
https://github.com/peterbraden/node-opencv<br />
"OpenCV bindings for Node.js. OpenCV is the defacto computer vision library - by interfacing with it natively in node, we get powerful real time vision in js."<br />
The sample codes looks like below:<br />
*You can use opencv to read in image files. Supported formats are in the OpenCV docs, but jpgs etc are supported.<br />
<source lang="javascript"><br />
cv.readImage(filename, function(err, mat){<br />
mat.convertGrayscale()<br />
mat.canny(5, 300)<br />
mat.houghLinesP()<br />
})<br />
</source><br />
*If however, you have a series of images, and you wish to stream them into a stream of Matrices, you can use an ImageStream. Thus:<br />
<source lang="javascript"><br />
var s = new cv.ImageStream()<br />
s.on('data', function(matrix){<br />
matrix.detectObject(haar_cascade_xml, opts, function(err, matches){})<br />
})<br />
ardrone.createPngStream().pipe(s);<br />
</source><br />
<br />
==opencvjs==<br />
https://github.com/blittle/opencvjs<br><br />
It is a project to compile opencv to asm.js. Might be a dead project now.<br />
==Project Naptha==<br />
"Project Naptha automatically applies state-of-the-art computer vision algorithms on every image you see while browsing the web. The result is a seamless and intuitive experience, where you can highlight as well as copy and paste and even edit and translate the text formerly trapped within an image." Quoted from http://projectnaptha.com/ .<br />
===How it works?===<br />
Excerpt from Project Naptha:<br />
The primary feature of Project Naptha is actually the text detection, rather than optical character recognition. The author write a text detection algorithm called Stroke Width Transform, invented by Microsoft Research in 2008, which is capable of identifying regions of text in a language-agnostic manner in WebWorker. Once a user begins to select some text, it scrambles to run character recognition algorithms in order to determine what exactly is being selected. The default OCR engine is a built-in pure-javascript port of the open source Ocrad OCR engine. There’s the option of sending the selected region over to a cloud based text recognition service powered by Tesseract, Google’s (formerly HP’s) award-winning open-source OCR engine which supports dozens of languages, and uses an advanced language model.<br />
--><br />
<!--<br />
=Open Source Library and Licenses =<br />
*OpenCV: OpenCV is an open-source BSD-licensed library that includes several hundreds of computer vision algorithms.<br />
*Tesseract-OCR: Apache License v2.0. Tesseract is probably the most accurate open source OCR engine available.<br />
**Leptonica: Leptonica is a pedagogically-oriented open source site containing software that is broadly useful for image processing and image analysis applications.<br />
<br />
<br>"The Leptonica copyright is less restrictive still. We use the BSD 2-clause license, which is similar to the Apache license but much briefer. Kirk McKusick playfully called the BSD license a copycenter, as differentiated from the usual copyright and the GPL copyleft: "Take it down to the copy center and make as many copies as you want." The BSD restrictions can be approximately summarized as: (1) Don't pretend that you wrote this, and (2) Don't sue us if it doesn't work.<br> For Leptonica, as with Apache and BSD, modifications of the source code that are used in commercial products can be made without any obligation to make them available in open source. (This does happen voluntarily -- it's likely that the majority of issues noted by people working on commercial products are reported back, often with patches.) The Leptonica license only requires that any use of the source, whether in original or modified form, must include the Leptonica copyright notice, and that modified versions must be clearly marked as such. "<br />
<br />
<br />
=What I have done=<br />
*Initialize, plan and implement this project.<br />
*Write a prototype of WebIDL for WebDIP.<br />
**MediaStream as source node and destination node for WebDIP.<br />
**For the HTMLImageElement part as a source node, there is a temporary solution.<br />
**Have a face detection node. It can be used with MediaStream and HTMLImageElement on both the browser and a B2G Flame.<br />
**Have a text detection/recognition node. It can be used with MediaStream and HTMLImageElement on the browser.<br />
--><br />
<br />
<!--<br />
=To Do List=<br />
*Extend MediaStreamTrack API? =>CTai<br />
**Need to integrate with Canvas2DContext and WebGL.<br />
*OfflineMediaContext study. =>Kaku(our new hire!!)<br />
*An API for image processing and object detection. =>TBD<br />
**We need such an API for B2G privileged applications (or opencv-asm.js for general apps).<br />
*How to compile OpenCV to asm.js =>Kaku, CJay<br />
**Try to figure out how to pass the ImageBitmap from the VideoWorker to OpenCV-asm.js.<br />
*Compare native OpenCV/Tesseract with asm.js version. =>TBD<br />
--><br />
<br />
<!--<br />
=Fixme List(Known Issues)=<br />
*OpenCV can't build with STLPort, only support GNUSTL.<br />
**B2G can't build with GNUSTL.<br />
*Text Detection and Recognition can't run on B2G.<br />
**Some OpenCV APIs take STL types as arguments. The mismatched STL will cause runtime errors.<br />
*Tesseract-OCR Build<br />
**We use a pre-installed Tesseract-OCR now. Maybe we should support building Tesseract-OCR from source.<br />
*Improve precision rate of text recognition.<br />
**The achievable precision rate should be higher than in my rough prototype. Need to improve it.<br />
*Separate OCR initialization.<br />
**Prevent redundant initialization.<br />
*Haven't done OpenCL integration in Gecko.<br />
**OpenCV has a lot of OpenCL integration. We should take advantage of it.<br />
*Canvas2DContext, WebGL can't run on worker.<br />
**Need bug 801176 and bug 709490 landed.<br> <br />
*Need ImageBitmap for VideoWorkerEvent.<br />
**Need bug 1044102 landed.<br><br />
--><br />
<br />
=Current Status=<br />
*Meta bug: https://bugzilla.mozilla.org/show_bug.cgi?id=1100203<br />
*MediaStream with worker: editor draft completed, refining the prototype for review process. See https://bugzilla.mozilla.org/show_bug.cgi?id=1044102<br />
*ImageBitmap: In review process. See https://bugzilla.mozilla.org/show_bug.cgi?id=801176<br />
*ImageBitmap extension: editor draft completed, refining the prototype for review process. See https://bugzilla.mozilla.org/show_bug.cgi?id=1141979<br><br />
*OpenCV.js: Working on it. See http://people.mozilla.org/~cku/opencv/<br />
*OfflineMediaContext: Not yet started.<br />
*WebImage: Not yet started.<br />
*Run WebGL on worker: in review process. See https://bugzilla.mozilla.org/show_bug.cgi?id=709490<br><br />
*CanvasRenderingContext2D in Worker: waiting for someone to take this bug. See https://bugzilla.mozilla.org/show_bug.cgi?id=801176<br><br />
=Next Phase(2015 H2)=<br />
*Try to land and standardize MediaStream with worker, ImageBitmap and the ImageBitmap extension. See [4] for how standardization proceeds in Mozilla.<br />
*Design an automated way to export JavaScript APIs for OpenCV.js. Try to upstream it to the OpenCV code base.<br />
*Start to work on OfflineMediaContext.<br />
*Support product requirements, for example, an Instagram-like app, wide-angle panorama, or Fox photo.<br />
*Do some exploratory experiments on the WebImage concept.<br />
*Initialize a sub-project called [https://wiki.mozilla.org/Project_GlovePuppetry Project GlovePuppetry]<br />
<br />
=Beyond 2015=<br />
*Proof of Concept for WebImage.<br />
*A crazy idea: Kernel.js or SPIR.js for JS developers to customize WebImage?<br />
*Gestural control API with depth camera? => WebNI (Web Natural Interaction)?<br />
*[https://wiki.mozilla.org/Project_Cangjie Project Cangjie]<br />
<br />
=Conclusion=<br />
Step by step, we can make things different. FoxEye is a pioneer project trying to push the Web boundary into a new area: image processing and computer vision. The key factor in ensuring this project's success is you, every Web contributor. Your support, your promotion, your feedback, and your comments are the nourishment of this project. Please share this project with every Web developer. Let's bring more and more amazing things to the Web, and enrich and improve the whole Web platform's competitiveness.<br />
<br />
=References=<br />
*[1]:"WebAudio Spec", http://www.w3.org/TR/webaudio/<br />
*[2]:"Media Capture Stream with Worker", http://chiahungtai.github.io/mediacapture-worker/<br />
*[3]:"ImageBitmap Extensions", http://kakukogou.github.io/spec-imagebitmap-extension/<br />
*[4]:"Mozilla Standards", https://wiki.mozilla.org/Standards<br />
<br />
=Acknowledgements=<br />
The whole idea of adopting WebAudio as the reference design for this project came from a conversation with John Lin. Thanks to Robert O'Callahan for his great feedback and comments. Thanks to John Lin and Chia-jung Hung for their useful suggestions and ideas. Also, big thanks to my team members who helped me debug the code.<br />
<br />
=About Authors=<br />
==CTai==<br />
My name is [https://tw.linkedin.com/pub/chia-hung-tai/36/8b6/712 Chia-hung Tai]. I am a senior software engineer in the Mozilla Taipei office. I work on Firefox OS multimedia. Before this job, I had some experience in OpenCL, NLP (Natural Language Processing), Data Mining, and Machine Learning. My IRC nickname is ctai. You can find me in the #media, #mozilla-taiwan, and #b2g channels. You can also reach me via email (ctai at mozilla dot com).<br />
<br />
==Kaku==<br />
[https://tw.linkedin.com/in/kakukogou Tzuhuo Kuo] is an engineer in the Mozilla Taipei office.<br />
==CJ Ku==<br />
[https://www.linkedin.com/pub/cj-ku/62/55b/a1b CJ Ku] is responsible for the OpenCV.js part.</div>Ctaihttps://wiki.mozilla.org/index.php?title=User_talk:Dead_project&diff=1108909User talk:Dead project2015-12-10T10:37:31Z<p>Ctai: /* MediaStreamTrack with Worker: */</p>
<hr />
<div>=Abstract=<br />
The goal of this project is to bring the power of computer vision and image processing to the Web. By extending the Media Capture and Streams spec, web developers can write video-processing applications in a better way. The primary idea is to incorporate Worker-based JavaScript video processing with MediaStreamTrack. The user's video processing script can do the real image processing and analysis work frame by frame. We also propose an extension of ImageBitmap to allow better optimization opportunities on both the implementation side and the Web developer side. To support the video editor case, we would like to introduce OfflineMediaContext in the next phase. We also want to explore the concept of WebImage, a hardware-acceleratable Web API, as a chance for performance improvement. By accomplishing these APIs step by step, we believe we can greatly improve the Web platform's competitiveness in the image processing and computer vision area.<br />
<br />
=Introduction=<br />
To get a quick understanding of what project FoxEye is, please see the files below:<br><br />
'''The latest one:'''<br />
*FoxEye Briefing: [https://docs.google.com/presentation/d/1Ra5bIeMfSEQi5kd_bGH5Vx-pa9ahpAUTFDxTeBA5L0o/edit?usp=sharing Briefing]<br />
*Presentation files in Whistler Work Week:<br />
**Project FoxEye Status Update: [https://docs.google.com/presentation/d/1vnR5JOWpEgKN3KZGV2SLxscqSnuBOF2tT_dTolLRYI4/edit?usp=sharing FoxEye]<br />
**FoxEye Cross Firefox OS:[https://docs.google.com/presentation/d/1ck32VoikWkkFXkFfNK91S22nGRp9fSEj6Oz6846May4/edit?usp=sharing Use case]<br />
*'''Latest demo in Youtube: [https://www.youtube.com/watch?v=prybkXsTGXY FoxEye 2015 H1 demo]'''<br />
<br />
'''Outdated'''<br />
*Presentation file in Portland Work Week.[[File:Project FoxEye Portland Work Week.pdf]]<br><br />
*Presentation file in P2PWeb WorkShop.[[File:Project FoxEye 2015-Feb.pdf]]<br><br />
*Youtube: https://www.youtube.com/watch?v=TgQWEWiGaO8<br><br />
<br />
The need for image processing and computer vision has been increasing in recent years. The introduction of the video element and media streams in HTML5 is very important, allowing basic video playback and WebCam capability. But it is not powerful enough to handle complex video processing and camera applications. In particular, tons of mobile camera, photo, and video editor applications on Android show their creativity by using OpenCV and the like. It is a goal of this project to include the capabilities found in modern video production and camera applications, as well as some of the processing, recognition and stitching tasks. <br />
<br><br />
This API is inspired by the WebAudio API[1]. Unlike the WebAudio API, we try to reach the goal by modifying the existing Media Capture and Streams API. The idea is to add some functions to associate a Worker-based script with a MediaStreamTrack. The script code in the Worker then runs image processing and/or analysis frame by frame. Since we move most of the processing work to the Worker, the main thread will not be blocked.<br />
<br />
[[File:FoxEye - Overview.png|800px]]<br />
<br><br />
Basically, the spirit of this project has four parts. The first part extends MediaStreamTrack to associate a Worker with it. This part provides a way to do image processing jobs frame by frame. The second part is the ImageBitmap extension. This part extends the ImageBitmap interface to allow JavaScript developers to read the underlying data out of, and set external data into, an ImageBitmap in a set of supported color formats. The third part is OfflineMediaContext. It lets an offline stream render as fast as possible. The last part is WebImage, a hardware-acceleratable API for the computer vision area. Web developers can use it to compose high-performance vision processing.<br />
<br><br />
Thanks to the amazing asm.js and Emscripten work, we also provide an asm.js version of OpenCV called OpenCV.js. Web developers can leverage the power of OpenCV in a simpler way. <br />
<br><br />
<br />
=Design Principle=<br />
*Follow [https://extensiblewebmanifesto.org/ The Extensible Web Manifesto]<br />
The original design of this project was a WebAudio-like design called WebVideo. But it is deprecated because we want to follow The Extensible Web Manifesto. We believe this kind of design is best for the Web. Take the examples below as arguments against the WebVideo design.<br><br />
The computer vision area is more thriving than ever since deep learning was introduced. We can't enumerate video nodes for all kinds of future creativity. For example, suppose we enumerate everything we can imagine at the current time; there will still be things we can't imagine. Maybe we missed an area called image sentiment analysis. I take this as an example because I learned about it only recently. Image sentiment analysis tries to classify the emotion of the human face in an image. So should we add this as a new video node? Take another extreme example: we probably would not argue about face detection or face recognition. But what we are talking about is only the human face. What if someone thinks we need to detect and recognize animal faces? Sounds like a ridiculous idea, right? But it is a real business. A product called "Bistro: A Smart Feeder Recognizes Your Cat's Face"[2] was on Indiegogo. So should we extend the original face detection/recognition node, add a new video node for cats, or ask developers to implement it in a VideoWorker? If we ask them to implement it in a VideoWorker, then we might need to reconsider whether the existing video nodes make sense. Should we move them all into the VideoWorker and implement them in JS? If yes, WebVideo degenerates to only a source node, a video worker node and a destination node. Compared to the degenerated WebVideo, the MediaStream-with-worker design is more elegant. <br />
<br />
*Performance and power consumption do matter<br />
This project was initiated by the FxOS team, which means performance and power consumption for mobile devices have been considered from day one.<br />
<br />
=Concept=<br />
==MediaStreamTrack with Worker: ==<br />
The new design is a simple and minimal change to the current API. By extending MediaStreamTrack and adding Worker-related APIs, we can let MediaStream support video processing functionality through script code in a Worker. Below is the draft WebIDL. Please see [2] for the draft specification.<br />
<source lang="webidl"><br />
[Constructor(DOMString scriptURL)]<br />
interface VideoMonitor : EventTarget {<br />
    attribute EventHandler onvideomonitor;<br />
};<br />
<br />
interface VideoProcessor : EventTarget {<br />
    attribute EventHandler onvideoprocess;<br />
};<br />
<br />
partial interface MediaStreamTrack {<br />
    void addVideoMonitor(VideoMonitor monitor);<br />
    void removeVideoMonitor(VideoMonitor monitor);<br />
    MediaStreamTrack addVideoProcessor(VideoProcessor processor);<br />
    void removeVideoProcessor();<br />
};<br />
<br />
[Exposed=(Window, Worker),<br />
 Constructor(DOMString type, optional VideoMonitorEventInit videoMonitorEventInitDict)]<br />
interface VideoMonitorEvent : Event {<br />
    readonly attribute DOMString trackId;<br />
    readonly attribute double playbackTime;<br />
    readonly attribute ImageBitmap? inputImageBitmap;<br />
};<br />
<br />
[Exposed=(Window, Worker),<br />
 Constructor(DOMString type, optional VideoProcessorEventInit videoProcessorEventInitDict)]<br />
interface VideoProcessEvent : VideoMonitorEvent {<br />
    readonly attribute Promise<ImageBitmap> outputImageBitmap;<br />
};<br />
<br />
</source><br />
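For illustration, here is a minimal, non-normative sketch of how a page might wire a processor to a camera track with the draft API above. The method names follow the WebIDL; the worker script URL is a made-up placeholder, and we assume a VideoProcessor is constructed from a script URL the same way a VideoMonitor is (see [2] for the normative examples).<br />
<source lang="javascript"><br />
// Hypothetical usage sketch of the draft API above; 'my-processor.js' is a<br />
// placeholder, and the VideoProcessor(scriptURL) constructor is an assumption.<br />
navigator.getUserMedia({video: true, audio: false}, function(localMediaStream) {<br />
  var track = localMediaStream.getVideoTracks()[0];<br />
  // The processor's script runs in a Worker and receives one VideoProcessEvent per frame.<br />
  var processor = new VideoProcessor('my-processor.js');<br />
  // addVideoProcessor() returns a new MediaStreamTrack carrying the processed frames.<br />
  var processedTrack = track.addVideoProcessor(processor);<br />
  var elem = document.getElementById('videoelem');<br />
  elem.mozSrcObject = new MediaStream([processedTrack]);<br />
  elem.play();<br />
}, null);<br />
</source><br />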
<br><br />
Main thread:<br><br />
[[File:NewProjectFoxEye1.png|1024px]]<br><br />
<br />
Worker thread:<br><br />
[[File:Worker - FLOW.png|1024px]]<br><br />
===Example Code ===<br />
Please check the section [http://chiahungtai.github.io/mediacapture-worker/index.html#Examples examples in MediaStreamTrack with worker].<br />
<br />
==ImageBitmap extensions==<br />
Please see [3] for more information.<br />
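As a rough illustration of what the extension enables, a worker could read a frame's pixels out of an ImageBitmap into an ArrayBuffer and hand them to asm.js code. The method names below are assumptions based on the draft in [3], not a settled API.<br />
<source lang="javascript"><br />
// Illustrative only: method names are assumptions based on the draft in [3].<br />
function readPixels(bitmap) {<br />
  // Ask how many bytes the given color format needs, then map the pixels out.<br />
  var length = bitmap.mappedDataLength('RGBA32');<br />
  var buffer = new ArrayBuffer(length);<br />
  return bitmap.mapDataInto('RGBA32', buffer, 0, length).then(function(layout) {<br />
    return buffer; // now holds the frame's pixels in the requested format<br />
  });<br />
}<br />
</source><br />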
<br />
==WebImage:==<br />
Why do we need WebImage? Because performance matters if we want to promote computer vision on the Web. Most image processing tasks can be accelerated by WebGL, but that is not the case for the computer vision area. So we need a hardware-acceleratable, computer-vision-related Web API to help Web developers deliver fast and portable Web pages and applications. That is the motivation for WebImage.<br />
<br><br />
This is a new zone that needs exploration, so we might start by referring to existing related APIs. At the end of 2014, Khronos released a portable, power-efficient vision processing API called OpenVX. This specification might be a good starting point for WebImage.<br />
<br><br />
The following diagram shows the basic concept of OpenVX. It has a graph-node architecture: the role of OpenVX for computer vision is like the role of OpenGL for graphics. The developer just constructs and executes a graph to process incoming images.<br />
<br><br />
[[File:OpenVX-NodeGFX.PNG|600px]]<br><br />
<!--[[File:Project FoxEyeWebImage1.png|800px]]<br>--><br />
OpenVX can be implemented with OpenCL, OpenGL ES with compute shaders, or C/C++ with SIMD. That means we can support OpenVX on a wide range of Web platforms, from PC to mobile, once an OpenVX engine is provided.<br />
<br />
[[File:OpenVX.PNG|600px]]<br><br />
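To make the graph-node idea concrete, here is a purely hypothetical sketch of what building and running such a graph could look like from JavaScript. None of these names exist in any spec or implementation yet, and videoTrack is an assumed MediaStreamTrack.<br />
<source lang="javascript"><br />
// Purely hypothetical WebImage sketch mirroring the OpenVX graph-node model.<br />
var graph = new WebImageGraph();<br />
var input  = graph.createImageSource(videoTrack);       // source node<br />
var gray   = graph.createNode('color_convert', input);  // processing nodes<br />
var edges  = graph.createNode('canny_edge', gray);<br />
var output = graph.createImageDestination(edges);       // destination node<br />
graph.verify();   // validate the graph once...<br />
graph.process();  // ...then execute it for each incoming image<br />
</source><br />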
<br />
==OfflineMediaContext:==<br />
We introduce the “OfflineMediaContext” and modify the “MediaStream” here to enable offline (as fast as possible) MediaStream processing. When developers are going to perform offline MediaStream processing, they need to form a context that will hold and keep the relationships of all the MediaStreams that are going to be processed together.<br />
<br />
<source lang="c++"><br />
<br />
// typedef unsigned long long DOMTimeStamp;<br />
interface OfflineMediaContext {<br />
void start(DOMTimeStamp durationToStop);<br />
attribute EventHandler onComplete;<br />
};<br />
// Add an optional argument into the constructor.<br />
[Constructor (optional OfflineMediaContext context),<br />
Constructor (MediaStream stream, optional OfflineMediaContext context),<br />
Constructor (MediaStreamTrackSequence tracks, optional OfflineMediaContext context)]<br />
interface MediaStream : EventTarget {<br />
// No modification.<br />
...<br />
}<br />
<br />
</source><br />
*OfflineMediaContext is the holding place of all the MediaStreams that are going to be processed together at a non-realtime rate.<br />
*OfflineMediaContext is also the object that can trigger the non-realtime processing.<br />
*OfflineMediaContext should be instantiated first; MediaStreams that are going to be processed together in the same context can then be instantiated by passing the pre-instantiated OfflineMediaContext object into the MediaStream constructor. (See the modified MediaStream constructors above.)<br />
*The native implementation of OfflineMediaContext holds a non-realtime MediaStreamGraph, just like the OfflineAudioContext in the WebAudio specification.<br />
*The constructors are modified by adding an optional parameter, OfflineMediaContext. In this way, developers are able to associate a MediaStream with an OfflineMediaContext.<br />
*If the optional OfflineMediaContext is given, the native implementation of the MediaStream should be hooked up to the non-realtime MSG held by the given OfflineMediaContext instead of the global realtime MSG.<br />
*If the optional OfflineMediaContext is given, we need to check whether the newly created MediaStream can be processed at an offline rate or not. If not, the constructor should throw an error and return NULL. (Constructors are always allowed to throw.)<br />
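A minimal sketch of how this could be driven from script, assuming the draft IDL above; the recordedTracks variable and the 30-second duration are placeholders, not part of any spec.<br />
<source lang="javascript"><br />
// Hypothetical sketch based on the draft IDL above.<br />
var offlineCtx = new OfflineMediaContext();<br />
// Associate the stream with the offline context at construction time.<br />
var stream = new MediaStream(recordedTracks, offlineCtx);<br />
offlineCtx.onComplete = function() {<br />
  // Every frame has been pushed through the graph as fast as possible.<br />
  console.log('offline processing finished');<br />
};<br />
offlineCtx.start(30000); // process 30 seconds' worth of media, then stop<br />
</source><br />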
<br />
==OpenCV.js==<br />
*OpenCV + Emscripten = OpenCV.js<br />
*https://github.com/CJKu/opencv <br />
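As a sketch of what calling into an Emscripten-compiled OpenCV build looks like in general, a page copies pixels into the compiled module's heap, calls a wrapped export, and reads the result back. The Module.cwrap/_malloc/HEAPU8 calls are standard Emscripten; the exported function name cv_canny and the width/height/pixels variables are made-up examples, not actual OpenCV.js exports.<br />
<source lang="javascript"><br />
// Sketch of the usual Emscripten calling pattern; 'cv_canny' is hypothetical.<br />
var canny = Module.cwrap('cv_canny', null,<br />
                         ['number', 'number', 'number', 'number', 'number']);<br />
// Copy RGBA pixels into the asm.js heap, run the filter in place, read back.<br />
var buf = Module._malloc(width * height * 4);<br />
Module.HEAPU8.set(pixels, buf);<br />
canny(buf, width, height, 5, 300);<br />
var result = Module.HEAPU8.subarray(buf, buf + width * height * 4);<br />
Module._free(buf);<br />
</source><br />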
<!--<br />
==Deprecated Design ==<br />
*Modular Routing<br />
Modular routing allows arbitrary connections between different DIPNode(TBD) objects. Each node can have inputs and/or outputs. A source node has no inputs and a single output. A destination node has one input and no outputs. Other nodes such as filters can be placed between source and destination nodes.<br />
<br />
Here is an example of face detection working on an ImageElement:<br><br />
PS. Right now, I haven't finished the work of ScriptNode. So the draw rectangle part is skipped in sample code.<br />
<br>[[File:Project FoxEye1.png|720px]]<br><br />
<big>Example 1:</big><br />
<source lang="javascript"><br />
var context = new DIPContext();<br />
var imageElem = document.getElementById('imgelemsrc');<br />
var source = context.createImageElementSource(imageElem);<br />
var facedetect = context.createFaceDetection();<br />
source.connect(facedetect);<br />
var dest = context.createMediaStreamDestination();<br />
facedetect.connect(dest);<br />
var elem = document.getElementById('videoelem');<br />
elem.mozSrcObject = dest.stream;<br />
elem.play();<br />
<br />
</source><br />
<br />
<br />
Another example shows that some nodes might support callback functions to pass more information than just images.<br />
<br>[[File:Project FoxEye2.png|720px]]<br><br />
<big>Example 2:</big><br />
<source lang="javascript"><br />
var context = new DIPContext();<br />
var imageElem = document.getElementById('imgelemsrc');<br />
var source = context.createImageElementSource(imageElem);<br />
var textdetect = context.createTextDetection();<br />
source.connect(textdetect);<br />
var dest = context.createMediaStreamDestination();<br />
textdetect.connect(dest);<br />
textdetect.ontextrecognized = function (e){<br />
var text = e.recognizedText;<br />
var go2google = document.getElementById('go2Google');<br />
go2google.href = "https://www.google.com.tw/search?q=" + text<br />
var go2IMDB = document.getElementById('go2IMDB');<br />
go2IMDB.href = "http://www.imdb.com/find?ref_=nv_sr_fn&q=" + text<br />
var go2Amazon = document.getElementById('go2Amazon');<br />
go2Amazon.href = "http://www.amazon.com/s/ref=nb_sb_noss?url=search-alias%3Daps&field-keywords=" + text<br />
var go2TranslateEngTC = document.getElementById('go2TranslateEngTC');<br />
go2TranslateEngTC.href = "https://translate.google.com.tw/#en/zh-TW/" + text<br />
}<br />
var elem = document.getElementById('videoelem');<br />
elem.mozSrcObject = dest.stream;<br />
elem.play();<br />
</source><br />
An idealized example combining ScriptNode with Canvas2DContext.<br><br />
This example tries to do on-the-fly camera translation like "Word Lens" and "Waygo".<br><br />
The implementation for this example is not finished yet.<br><br />
<br>[[File:Project FoxEye3.png|720px]]<br><br />
<big>Example 3:</big><br />
<source lang="javascript"><br />
var context = new DIPContext();<br />
var myMediaStream;<br />
navigator.getUserMedia({video:true, audio:false}, function(localMediaStream){<br />
myMediaStream = localMediaStream;<br />
var source = context.createMediaStreamSource(myMediaStream);<br />
var textRecoginition = context.createTextRecoginition();<br />
source.connect(textRecoginition);<br />
var textInpaint = context.createTextInpaint();<br />
textRecoginition.connect(textInpaint);<br />
var scriptTranslate = context.createScriptProcessor();<br />
textRecoginition.ontextrecognized = function (e){<br />
var text = e.recognizedText;<br />
// Custom parameter -<br />
scriptTranslate.addParameter( "text", text );<br />
}<br />
scriptTranslate.onimageprocess= function (e) {<br />
var text = e.parameters.text;<br />
// Translate to other language....<br />
var newText = Translate("Eng", "TC");<br />
var input = e.inputImage;<br />
var canvas = document.getElementsByTagName('canvas')[0];<br />
var context2D = canvas.getContext('2d');<br />
context2D.drawImage(input, 0, 0);<br />
context2D.strokeText(newText, 0, 0);<br />
// get an empty slate to put the data into<br />
var output = context.createImageData(canvas.width, canvas.height);<br />
e.outputImage = output;<br />
}<br />
textInpaint.connect(scriptTranslate);<br />
var dest = context.createMediaStreamDestination();<br />
scriptTranslate.connect(dest);<br />
var video = document.getElementById('videoelem');<br />
video.mozSrcObject = dest.stream;<br />
}, null);<br />
<br />
</source><br />
<br />
<!--<br />
=The API=<br />
Still under construction....<br />
==New design==<br />
===VideoContext===<br />
<source><br />
[Constructor]<br />
interface VideoContext : EventTarget {<br />
readonly attribute VideoDestinationNode destination;<br />
// DIPNode creation<br />
MediaStreamVideoDestinationNode createMediaStreamDestination();<br />
MediaStreamVideoSourceNode createMediaStreamSource(MediaStream mediaStream);<br />
VideoWorkerNode createVideoWorker(DOMString scriptURL);<br />
};<br />
</source><br />
===VideoNode===<br />
<source><br />
interface VideoNode: EventTarget {<br />
void connect(VideoNode destination);<br />
void disconnect();<br />
readonly attribute VideoContext context;<br />
};<br />
</source><br />
<br />
===VideoWorkerNode===<br />
Still thinking about the types of inputImage/outputImage.<br />
<source><br />
interface VideoProcessEvent : Event {<br />
readonly attribute ImageData inputImage;<br />
readonly attribute ImageData outputImage;<br />
readonly attribute object parameters; <br />
};<br />
<br />
interface VideoWorkerNode: VideoNode {<br />
attribute EventHandler onimageprocess;<br />
};<br />
</source><br />
<br />
==Deprecated design==<br />
===DIPContext===<br />
<source><br />
[Constructor]<br />
interface DIPContext : EventTarget {<br />
readonly attribute DIPDestinationNode destination;<br />
// DIPNode creation<br />
MediaStreamDIPDestinationNode createMediaStreamDestination();<br />
ImageElementDIPSourceNode createImageElementSource(HTMLImageElement imageElement);<br />
MediaStreamDIPSourceNode createMediaStreamSource(MediaStream mediaStream);<br />
FaceDetectionNode createFaceDetection();<br />
TextDetectionNode createTextDetection();<br />
};<br />
</source><br />
===DIPNode===<br />
<source><br />
interface DIPNode : EventTarget {<br />
void connect(DIPNode destination, optional unsigned long output = 0, optional unsigned long input = 0);<br />
void disconnect(optional unsigned long output = 0);<br />
readonly attribute DIPContext context;<br />
readonly attribute unsigned long numberOfInputs;<br />
readonly attribute unsigned long numberOfOutputs;<br />
};<br />
</source><br />
<br />
===TextDetectionNode===<br />
<source><br />
interface RecognizedTextEvent : Event {<br />
readonly attribute DOMString recognizedText;<br />
};<br />
<br />
interface TextDetectionNode : DIPNode {<br />
attribute EventHandler ontextrecognized;<br />
};<br />
</source><br />
--><br />
<br />
=Demo pages=<br />
==OpenCV.js==<br />
This demo should run on any existing Firefox browser on Ubuntu and Mac. Please try the website below.<br><br />
http://people.mozilla.org/~cku/opencv/<br />
<br><br />
[[File:OpenCVJS-1.png|1080px]]<br />
<br><br />
[[File:OpenCVJS-2.png|1080px]]<br />
<br><br />
<br />
==MST with Worker and ImageBitmap==<br />
You need a customized build for the demos below. Please clone this repo: https://github.com/kakukogou/gecko-dev/tree/foxeye-master<br><br />
You also need a web camera. Then build your own Firefox browser and go to the website below to see the demos.<br />
<br><br />
Source code of the demo:<br><br />
https://github.com/kakukogou/foxeye-demo<br />
<br><br />
Demo website:<br><br />
http://people.mozilla.org/~tkuo/foxeye-demo/<br />
===Monitor===<br />
A monitor only dispatches frame events to the Web Worker; it does not modify the track. The left video is from getUserMedia. The right one uses |addWorkerMonitor| to dispatch each input frame from the left one to a worker. The worker detects the face and passes the face position and the input frame back to the main thread. The script in the main thread then uses both pieces of information to draw the input frame via CanvasRenderingContext2D.<br />
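A minimal sketch of this flow, assuming the prototype's |addWorkerMonitor| API; the worker script name and the message format posted back by the worker (frame plus face rectangle) are invented here for illustration.<br />
<source lang="javascript"><br />
// main.js - sketch only; addWorkerMonitor is the prototype API described above.<br />
var worker = new Worker('face-detect-worker.js'); // hypothetical worker script<br />
worker.onmessage = function (e) {<br />
  // Assumed message format: { frame: ImageBitmap, face: {x, y, width, height} }.<br />
  var ctx = document.getElementById('overlay-canvas').getContext('2d');<br />
  ctx.drawImage(e.data.frame, 0, 0);<br />
  ctx.strokeRect(e.data.face.x, e.data.face.y, e.data.face.width, e.data.face.height);<br />
};<br />
navigator.getUserMedia({video: true, audio: false}, function (stream) {<br />
  document.getElementById('left-video').mozSrcObject = stream;<br />
  // Dispatch every frame of the video track to the worker; the track itself is untouched.<br />
  stream.getVideoTracks()[0].addWorkerMonitor(worker);<br />
}, null);<br />
</source><br />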
*Case 1: Face detection<br />
This demo shows how to use |addWorkerMonitor| and CanvasRenderingContext2D to overlay a hat on someone's head.<br />
<br><br />
[[File:Monitor face.png|1080px]]<br />
<br><br />
*Case 2: QRCode<br />
[[File:Monitor qrcode.png|1080px]]<br />
<br><br />
<br />
===Processor===<br />
This demo shows how to use |addWorkerProcessor| to create a new MediaStreamTrack and show it in another HTMLVideoElement. The filters shown below are copy, blur, erode, threshold, invert, and grayscale.<br />
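A minimal sketch of the processor flow, again assuming the prototype's |addWorkerProcessor| API; the worker script name is hypothetical.<br />
<source lang="javascript"><br />
// Sketch only: addWorkerProcessor is the prototype API described above.<br />
var worker = new Worker('filter-worker.js'); // hypothetical worker script<br />
navigator.getUserMedia({video: true, audio: false}, function (stream) {<br />
  var srcTrack = stream.getVideoTracks()[0];<br />
  // addWorkerProcessor returns a new MediaStreamTrack whose frames come from the worker.<br />
  var processedTrack = srcTrack.addWorkerProcessor(worker);<br />
  document.getElementById('right-video').mozSrcObject = new MediaStream([processedTrack]);<br />
}, null);<br />
</source><br />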
<br> <br />
<br><br />
[[File:Processor copy.png|1080px]]<br />
<br><br />
[[File:Processor blur.png|1080px]]<br />
<br><br />
[[File:Processor erode.png|1080px]]<br />
<br><br />
[[File:Processor threshold.png|1080px]]<br />
<br><br />
[[File:Processor invert.png|1080px]]<br />
<br><br />
[[File:Processor gray.png|1080px]]<br />
<br><br />
<br />
<br />
<br />
<!--<br />
==Demo 1: Face tracker==<br />
===Browser:===<br />
*Input comes from HTML Image Element<br />
[[File:ProjectFoxEye BrowserFaceImage.png|720px]]<br />
*Input comes from MediaStream<br />
[[File:ProjectFoxEye BrowserFaceMS.png|720px]]<br />
<br />
===B2G on Flame:===<br />
*Input comes from HTML Image Element<br><br />
[[File:Project FoxEyeFlameFaceImage.jpg|480px]]<br><br />
*Input comes from MediaStream<br><br />
[[File:Project FoxEyeFlameFaceMS.jpg|480px]]<br><br />
<br />
==Demo 2: Text Recognition==<br />
*Browser:<br />
**Input comes from HTML Image Element<br />
[[File:ProjectFoxEye TextImage.png|720px]]<br />
**Input comes from MediaStream<br />
[[File:ProjectFoxEye TextMS.png|720px]]<br />
--><br />
<br />
=Unlimited Potentials=<br />
<!--<br />
According to "Firefox OS User Research Northern India Findings" [4], one of the key table-stake is camera related features. "Ways to provide photo & video editing tools" is what this WebAPI for. So if we can deliver some cool photo & video editing features, we can fulfill one of the needs of our target market.<br><br />
In [4], it mentioned that one of purchase motivators is educate my kids. The features like PhotoMath can satisfy the education part.<br><br />
In long term, if we can integrate text recognition with TTS(text to speech), we can help illiterate person to read words or phrase. That will be very useful features.<br><br />
Also offline text translation in camera might be a killer application too. Waygo and WordLens is two of such applications in Android and iOS.<br><br />
Text Selection in Image is also an interesting feature for browser. Project Naptha demos some potential functionality based on yext selection in Image.<br> <br />
--><br />
==FoxEye technology tree==<br />
This is the technology tree of FoxEye. Solid lines are dependencies; dashed lines are performance improvements. The green blocks are possible applications. The blue blocks are API drafts with prototypes. The red blocks are things we know how to do but have not yet started. The yellow blocks are concepts only and need more study. So, step by step, we can provide a photo manager, wide-angle panorama, HDR, and even gesture control.<br />
[[File:Multimedia Platform Team Technology Roadmap - New Page.png |1024px]]<br />
<br />
<br />
==Use Cases==<br />
*Digital Image Processing (DIP) for camera:<br />
**Face In, see [https://www.youtube.com/watch?feature=player_embedded&v=PWZUCfDsFdU Sony Face In]<br />
**Augmented Reality, see [https://www.youtube.com/watch?feature=player_embedded&v=vDNzTasuYEw IKEA AR]<br />
**Camera Panorama, <br />
**Fisheye camera, <br />
**Comic Effect,<br />
**Long term, might need Android Camera HAL 3 to control camera<br />
***Smile Snapshot<br />
***Gesture Snapshot<br />
***HDR<br />
***Video Stabilization<br />
**Bar code scanner<br />
*Photo and video editing<br />
**Video Editor, see [https://www.youtube.com/watch?feature=player_embedded&v=NJ6nYgxcuUk WeVideo on Android]<br />
**A faster way for video editing tools.<br />
**Lots of existing image effects can be used for photo and video editing.<br />
**https://www.facebook.com/thanks<br />
*Object Recognition in Image (not only Firefox OS, but also the browser):<br />
**Shopping Assistant, see [https://www.youtube.com/watch?feature=player_embedded&v=B7cvlWll85Q Amazon Firefly]<br />
**Face Detection/Tracking,<br />
**Face Recognition, <br />
**Text Recognition, <br />
**Text Selection in Image, <br />
***See http://projectnaptha.com/<br />
**Text Inpainting,<br />
**Image Segmentation,<br />
**Text translation on image, see [https://www.youtube.com/watch?feature=player_embedded&v=9UalhhWBPH0 Waygo]<br />
*Duo Camera:<br />
**Natural Interaction (Gesture, Body Motion Tracking)<br />
**Interactive Foreground Extraction<br />
and so on....<br />
<br />
==Some cool real-world applications we can refer to==<br />
*Word Lens: <br />
**https://play.google.com/store/apps/details?id=com.questvisual.wordlens.demo<br />
**https://itunes.apple.com/tw/app/word-lens/id383463868?mt=8<br />
*Waygo<br />
**http://www.waygoapp.com/<br />
*PhotoMath<br />
**https://photomath.net/<br />
*Cartoon Camera<br />
**https://play.google.com/store/apps/details?id=com.fingersoft.cartooncamera<br />
*Photo Studio<br />
**http://photo-studio.en.uptodown.com/android<br />
*Magisto<br />
**https://play.google.com/store/apps/details?id=com.magisto<br />
*Adobe PhotoShop Express<br />
**http://www.photoshop.com/products/photoshopexpress<br />
*Amazon(firefly app)<br />
**https://play.google.com/store/apps/details?id=com.amazon.mShop.android<br />
<br />
<!--<br />
=Task Dependency=<br />
Below is the dependency graph for text recognition work.<br />
<br><br />
[[File:Project FoxEyeTextRecgDependency.png]]<br />
<br><br />
Below is the dependency graph for camera efficts work.<br />
<br><br />
[[File:Project FoxEyeCameraDependency.png]]<br />
<br><br />
Below is the dependency graph for video editor work.<br />
<br><br />
[[File:Project FoxEyeVideoEditorDependency.png]]<br />
<br><br />
--><br />
<br />
<!--<br />
=Comparison=<br />
==Canvas2DContext==<br />
Currently, you can do video effect by Canvas2DContext. See the demo made by [4]. The source code looks like below.<br />
<source><br />
function frameConverter(video,canvas) {<br />
<br />
// Set up our frame converter<br />
this.video = video;<br />
this.viewport = canvas.getContext("2d");<br />
this.width = canvas.width;<br />
this.height = canvas.height;<br />
// Create the frame-buffer canvas<br />
this.framebuffer = document.createElement("canvas");<br />
this.framebuffer.width = this.width;<br />
this.framebuffer.height = this.height;<br />
this.ctx = this.framebuffer.getContext("2d");<br />
// Default video effect is blur<br />
this.effect = JSManipulate.blur;<br />
// This variable used to pass ourself to event call-backs<br />
var self = this;<br />
// Start rendering when the video is playing<br />
this.video.addEventListener("play", function() {<br />
self.render();<br />
}, false);<br />
<br />
// Change the image effect to be applied <br />
this.setEffect = function(effect){<br />
if(effect in JSManipulate){<br />
this.effect = JSManipulate[effect];<br />
}<br />
}<br />
<br />
// Rendering call-back<br />
this.render = function() {<br />
if (this.video.paused || this.video.ended) {<br />
return;<br />
}<br />
this.renderFrame();<br />
var self = this;<br />
// Render every 10 ms<br />
setTimeout(function () {<br />
self.render();<br />
}, 10);<br />
};<br />
<br />
// Compute and display the next frame <br />
this.renderFrame = function() {<br />
// Acquire a video frame from the video element<br />
this.ctx.drawImage(this.video, 0, 0, this.video.videoWidth,<br />
this.video.videoHeight,0,0,this.width, this.height);<br />
var data = this.ctx.getImageData(0, 0, this.width, this.height);<br />
// Apply image effect<br />
this.effect.filter(data,this.effect.defaultValues);<br />
// Render to viewport<br />
this.viewport.putImageData(data, 0, 0);<br />
return;<br />
};<br />
};<br />
<br />
// Initialization code<br />
video = document.getElementById("video");<br />
canvas = document.getElementById("canvas");<br />
fc = new frameConverter(video,canvas);<br />
...<br />
// Change the image effect applied to the video<br />
fc.setEffect('edge detection');<br />
<br />
</source><br />
Basically, the idea is use |drawImage| to acquire frame from video and draw it to canvas. Then call |getImageData| to get the data and process the image. After that, put the computed data back to the canvas and display it.<br><br />
<br />
Compare to this approach, the proposed WebAPI has below advantages:<br />
* Not polling mechanism.<br />
** We use callback function to process all frames.<br />
<br />
==node-opencv==<br />
https://github.com/peterbraden/node-opencv<br />
"OpenCV bindings for Node.js. OpenCV is the defacto computer vision library - by interfacing with it natively in node, we get powerful real time vision in js."<br />
The sample codes looks like below:<br />
*You can use opencv to read in image files. Supported formats are in the OpenCV docs, but jpgs etc are supported.<br />
<source lang="javascript"><br />
cv.readImage(filename, function(err, mat){<br />
mat.convertGrayscale()<br />
mat.canny(5, 300)<br />
mat.houghLinesP()<br />
})<br />
</source><br />
*If however, you have a series of images, and you wish to stream them into a stream of Matrices, you can use an ImageStream. Thus:<br />
<source lang="javascript"><br />
var s = new cv.ImageStream()<br />
s.on('data', function(matrix){<br />
matrix.detectObject(haar_cascade_xml, opts, function(err, matches){})<br />
})<br />
ardrone.createPngStream().pipe(s);<br />
</source><br />
<br />
==opencvjs==<br />
https://github.com/blittle/opencvjs<br><br />
It is a project to compile opencv to asm.js. Might be a dead project now.<br />
==Project Naptha==<br />
"Project Naptha automatically applies state-of-the-art computer vision algorithms on every image you see while browsing the web. The result is a seamless and intuitive experience, where you can highlight as well as copy and paste and even edit and translate the text formerly trapped within an image." Quoted from http://projectnaptha.com/ .<br />
===How it works?===<br />
Excerpt from Project Naptha:<br />
The primary feature of Project Naptha is actually the text detection, rather than optical character recognition. The author implemented a text detection algorithm called the Stroke Width Transform, invented by Microsoft Research in 2008, which is capable of identifying regions of text in a language-agnostic manner, running in a WebWorker. Once a user begins to select some text, it scrambles to run character recognition algorithms in order to determine what exactly is being selected. The default OCR engine is a built-in pure-JavaScript port of the open-source Ocrad OCR engine. There's also the option of sending the selected region over to a cloud-based text recognition service powered by Tesseract, Google's (formerly HP's) award-winning open-source OCR engine, which supports dozens of languages and uses an advanced language model.<br />
--><br />
<!--<br />
=Open Source Library and Licenses =<br />
*OpenCV: OpenCV is an open-source BSD-licensed library that includes several hundreds of computer vision algorithms.<br />
*Tesseract-OCR: Apache License v2.0. Tesseract is probably the most accurate open source OCR engine available.<br />
**Leptonica: Leptonica is a pedagogically-oriented open source site containing software that is broadly useful for image processing and image analysis applications.<br />
<br />
<br>"The Leptonica copyright is less restrictive still. We use the BSD 2-clause license, which is similar to the Apache license but much briefer. Kirk McKusick playfully called the BSD license a copycenter, as differentiated from the usual copyright and the GPL copyleft: "Take it down to the copy center and make as many copies as you want." The BSD restrictions can be approximately summarized as: (1) Don't pretend that you wrote this, and (2) Don't sue us if it doesn't work.<br> For Leptonica, as with Apache and BSD, modifications of the source code that are used in commercial products can be made without any obligation to make them available in open source. (This does happen voluntarily -- it's likely that the majority of issues noted by people working on commercial products are reported back, often with patches.) The Leptonica license only requires that any use of the source, whether in original or modified form, must include the Leptonica copyright notice, and that modified versions must be clearly marked as such. "<br />
<br />
<br />
=What I have done=<br />
*Initialize, plan and implement this project.<br />
*Write a prototype of WebIDL for WebDIP.<br />
**MediaStream as source node and destentaion node for WebDIP.<br />
**For HTMLImageElement part as source node, there is a temporal solution for it.<br />
**Have face detection node. Can be used in MediaStream and HTMLImageElement on both browser and B2G flame.<br />
**Have text detection/recognization node. Can be used in MediaStream and HTMLImageElement on browser.<br />
--><br />
<br />
<!--<br />
=To Do List=<br />
*Extend MediaStreamTrack API? =>CTai<br />
**Need to integrate with Canvas2DContext and WebGL.<br />
*OfflineMediaContext study. =>Kaku(our new hire!!)<br />
*An API for image processing and object detection. =>TBD<br />
**We need such API for B2G privilege applications(or opencv-asm.js for general APPs).<br />
*How to compile OpenCV to asm.js =>Kaku, CJay<br />
**Try to figure out how to pass the ImageBitmap from VideoWroker to OpenCV-asm.js<br />
*Compare native OpenCV/Tesseract with asm.js version. =>TBD<br />
--><br />
<br />
<!--<br />
=Fixme List(Known Issues)=<br />
*OpenCV can't build with STLPort, only support GNUSTL.<br />
**B2G can't build with GNUSTL.<br />
*Text Detection and Recognition can't run on B2G.<br />
**Some OpenCV API use STL as arguments. The unalignment STL will cause runtime error.<br />
*Tesseract-OCR Build<br />
**Use pre-installed Tesseract-OCR now. Maybe we should support source code build of Tesseract-OCR.<br />
*Improve precision rate of text recognition.<br />
**The actual precision rate should be higher than my roughly prototype. Need improve it.<br />
*Separate OCR initialized.<br />
**Prevent redundant initialization.<br />
*Haven't done OpenCL integration in Gecko.<br />
**OpenCV has a lot of OpenCL integration. We should take advantage from it.<br />
*Canvas2DContext, WebGL can't run on worker.<br />
**Need bug 801176 and bug 709490 landed.<br> <br />
*Need ImageBitmap for VideoWorkerEvent.<br />
**Need bug 1044102 landed.<br><br />
--><br />
<br />
=Current Status=<br />
*Meta bug: https://bugzilla.mozilla.org/show_bug.cgi?id=1100203<br />
*MediaStream with worker: editor draft completed; refining the prototype for the review process. See https://bugzilla.mozilla.org/show_bug.cgi?id=1044102<br />
*ImageBitmap: in the review process. See https://bugzilla.mozilla.org/show_bug.cgi?id=801176<br />
*ImageBitmap extension: editor draft completed; refining the prototype for the review process. See https://bugzilla.mozilla.org/show_bug.cgi?id=1141979<br><br />
*OpenCV.js: in progress. See http://people.mozilla.org/~cku/opencv/<br />
*OfflineMediaContext: not yet started.<br />
*WebImage: not yet started.<br />
*Run WebGL on worker: in the review process. See https://bugzilla.mozilla.org/show_bug.cgi?id=709490<br><br />
*CanvasRenderingContext2D in Worker: waiting for someone to take this bug. See https://bugzilla.mozilla.org/show_bug.cgi?id=801176<br><br />
=Next Phase (2015 H2)=<br />
*Try to land and standardize MediaStream with worker, ImageBitmap, and the ImageBitmap extensions. See [4] for how the standardization process works in Mozilla.<br />
*Design an automated way to export the JavaScript API for OpenCV.js, and try to upstream it to the OpenCV code base.<br />
*Start to work on OfflineMediaContext.<br />
*Support product requirements, for example an Instagram-like app, wide-angle panorama, or Fox photo.<br />
*Do some exploratory experiments on the WebImage concept.<br />
*Initialize a sub-project called [https://wiki.mozilla.org/Project_GlovePuppetry Project GlovePuppetry]<br />
<br />
=Beyond 2015=<br />
*Proof of Concept for WebImage.<br />
*A crazy idea of Kernel.js or SPIR.js for JS developers to customize WebImage?<br />
*Gestural control API with a depth camera? => WebNI (Web Natural Interaction)?<br />
*[https://wiki.mozilla.org/Project_Cangjie Project Cangjie]<br />
<br />
=Conclusion=<br />
Step by step, we can make things different. FoxEye is a pioneering project trying to push the Web's boundary into a new area: image processing and computer vision. The key factor in ensuring this project's success is you, every Web contributor. Your support, your promotion, your feedback, and your comments are the nutrition of this project. Please share this project with every Web developer. Let's bring more and more amazing things to the Web. Let's enrich and improve the competitiveness of the whole Web platform.<br />
<br />
=References=<br />
*[1]:"WebAudio Spec", http://www.w3.org/TR/webaudio/<br />
*[2]:"Media Capture Stream with Worker", http://chiahungtai.github.io/mediacapture-worker/<br />
*[3]:"ImageBitmap Extensions", http://kakukogou.github.io/spec-imagebitmap-extension/<br />
*[4]:"Mozilla Standards", https://wiki.mozilla.org/Standards<br />
<br />
=Acknowledgements=<br />
The idea of adopting WebAudio as the reference design for this project came from a conversation with John Lin. Thanks to Robert O'Callahan for his great feedback and comments. Thanks to John Lin and Chia-jung Hung for their useful suggestions and ideas. Also, big thanks to my team members who helped me debug the code.<br />
<br />
=About Authors=<br />
==CTai==<br />
My name is [https://tw.linkedin.com/pub/chia-hung-tai/36/8b6/712 Chia-hung Tai]. I am a senior software engineer in the Mozilla Taipei office. I work on Firefox OS multimedia. Before this job, I had some experience in OpenCL, NLP (Natural Language Processing), data mining, and machine learning. My IRC nickname is ctai. You can find me in the #media, #mozilla-taiwan, and #b2g channels. You can also reach me via email (ctai at mozilla dot com).<br />
<br />
==Kaku==<br />
[https://tw.linkedin.com/in/kakukogou Tzuhuo Kuo] is an engineer in the Mozilla Taipei office.<br />
==CJ Ku==<br />
[https://www.linkedin.com/pub/cj-ku/62/55b/a1b CJ Ku] is responsible for the OpenCV.js part.</div>
<hr />
<div>=Abstract=<br />
The goal of this project is bringing the power of computer vision and image processing to the Web. By extending the spec of Media Capture and Streams, the web developers can write video processing related applications in better way. The primary idea is to incorporate Worker-based JavaScript video processing with MediaStreamTrack. The user's video processing script can do the real image processing and analysis works frame by frame. We also propose an extension of ImageBitmap to allow better optimization possibility in both implementation and Web developer side. To support video editor case, we would like to introduce OfflineMediaContext in next phase. We also want to explore the concept of WebImage, a hardware accelerated-able Web API for performance improvement chance. By accomplishing these API step by step, we believe we can improve the Web platform competitiveness a lot on image processing and computer vision area.<br />
<br />
=Introduction=<br />
To get a quick understand what is project FoxEye. Please see below file:<br><br />
'''The latest one:'''<br />
*FoxEye Briefing: [https://docs.google.com/presentation/d/1Ra5bIeMfSEQi5kd_bGH5Vx-pa9ahpAUTFDxTeBA5L0o/edit?usp=sharing Briefing]<br />
*Presentation files in Whistler Work Week:<br />
**Project FoxEye Status Update: [https://docs.google.com/presentation/d/1vnR5JOWpEgKN3KZGV2SLxscqSnuBOF2tT_dTolLRYI4/edit?usp=sharing FoxEye]<br />
**FoxEye Cross Firefox OS:[https://docs.google.com/presentation/d/1ck32VoikWkkFXkFfNK91S22nGRp9fSEj6Oz6846May4/edit?usp=sharing Use case]<br />
*'''Latest demo in Youtube: [https://www.youtube.com/watch?v=prybkXsTGXY FoxEye 2015 H1 demo]'''<br />
<br />
'''Outdated'''<br />
*Presentation file in Portland Work Week.[[File:Project FoxEye Portland Work Week.pdf]]<br><br />
*Presentation file in P2PWeb WorkShop.[[File:Project FoxEye 2015-Feb.pdf]]<br><br />
*Youtube: https://www.youtube.com/watch?v=TgQWEWiGaO8<br><br />
<br />
The needs for image processing and computer vision is increasing in recent years. The introduction of video element and media stream in HTML5 is very important, allowing for basic video playback and WebCam ability. But it is not powerful enough to handle complex video processing and camera application. Especially there are tons of mobile camera, photo and video editor applications show their creativity by using OpenCV etc in Android. It is a goal of this project to include the capabilities found in modern video production and camera applications as well as some of the processing, recognition and stitching tasks. <br />
<br><br />
This API is inspired by the WebAudio API[1]. Unlike WebAudio API, we try to reach the goal by modifying existing Media Capture and Streams API. The idea is adding some functions to associate the Woker-based script with MediaStreamTrack. Then the script code of Worker runs image processing and/or analysis frame by frame. Since we move the most of processing work to Worker, the main thread will not be blocked.<br />
<br />
[[File:FoxEye - Overview.png|800px]]<br />
<br><br />
Basically, the spirit of this project is four parts. The first part is extend the MediaStreamTrack to associate a Worker. This part provide a way to do image processing job frame by frame. The second part is ImageBitmap extension. This part extended the ImageBitmap interface to allow JavaScript developer to read underlying data out and set an external data into an ImageBitmap in a set of supported color formats. The third part is OfflineMediaContext. It is for offline stream to render as fast as possible. The last part is WebImage, it is a hardware accelerated-able API on computer vision area. The Web developers can use it to combine high performance vision processing.<br />
<br><br />
Thanks for the amazing asm.js and emscripten work, we also provide an asm.js version of OpenCV called OpenCV.js. The web developers can leverage the power of OpenCV in a simpler way. <br />
<br><br />
<br />
=Design Principle=<br />
*Follow [https://extensiblewebmanifesto.org/ The Extensible Web Manifesto]<br />
The original design of this project is WebAudio like design called WebVideo. But is is deprecated due to we want to follow The Extensible Web Manifesto. We believe this kind of design is best for the Web. Take below examples to against WebVideo design.<bt><br />
Computer vision area is more thriving than ever after deep learning introduced. We can't enumerate video nodes for all kinds of creativeness for the future. For example, suppose we enumerate all we can image in current time. But still something we can't image. Maybe we lost an area called image sentiment analysis. I take this as an example because I just known it recently. Image sentiment analysis is trying to classify the emotion from human face in image. So should we add this as a new video node? Take another extreme example, we think we would not have argument on face detection or face recognition. But what we talk now is only for human face. What if we someone think we need to detect and recognize animal face? Sounds like a ridiculous idea right? But it is a real business. A product called "Bistro: A Smart Feeder Recognizes Your Cat's Face"[2] was on indiegogo. So should we extend the original face detection/recognition node or add a new video node for cat or ask them to implement it in VideoWorker? If we ask them to implement in VideoWorker, then we might need to re-consider are those existing video node make sense? Should we move them all into VideoWorker and implement them in JS? If yse, the WebVideo will degenerate to only source node, video worker node and detestation node. Compare to degenerated WebVideo, the MediaStream with worker design is more elegant. <br />
<br />
*Performance and power consumption do matter<br />
This project is initialized by the FxOS team which means the performance and power consumption for mobile devices are considered from day one.<br />
<br />
=Concept=<br />
==MediaStreamTrack with Worker: ==<br />
The new design is a simple and minimal change for current API. By extending MediaStreamTrack and adding Worker related API, we can let MediaStream be able to support video processing functionality through the script code in Worker. Below is the draft WebIDL codes. Please see [2] for the draft specification.<br />
<source lang="webidl"><br />
[Constructor(DOMString scriptURL)]<br />
<br />
interface VidoeMonitor : EventTarget {<br />
attribute EventHandler onvideomonitor;<br />
};<br />
<br />
interface VideoProcessor : EventTarget {<br />
attribute EventHandler onvideoprocess;<br />
};<br />
partial interface MediaStreamTrack {<br />
void addVideoMonitor(VidoeMonitor monitor);<br />
void removeVideoMonitor(VidoeMonitor monitor);<br />
MediaStreamTrack addVideoProcessor(VidoeProcessor processor);<br />
void removeVideoProcessor();<br />
};<br />
<br />
[Exposed=(Window, Worker),<br />
Constructor(DOMString type, optional VideoMonitorEventInit videoMonitorEventInitDict)]<br />
interface VideoMonitorEvent : Event {<br />
readonly attribute DOMString trackId;<br />
readonly attribute double playbackTime;<br />
readonly attribute ImageBitmap? inputImageBitmap;<br />
};<br />
<br />
[Exposed=(Window, Worker),<br />
Constructor(DOMString type, optional VideoProcessorEventInit videoProcessorEventInitDict)]<br />
interface VideoProcessEvent : VideoMonitorEvent {<br />
readonly attribute ImageBitmap? outputImageBitmap;<br />
};<br />
<br />
</source><br />
<br><br />
Main thread:<br><br />
[[File:NewProjectFoxEye1.png|1024px]]<br><br />
<br />
Worker Thread<br><br />
[[File:Worker - FLOW.png|1024px]]<br><br />
===Example Code ===<br />
Please check the section [http://chiahungtai.github.io/mediacapture-worker/index.html#Examples examples in MediaStreamTrack with worker].<br />
<br />
==ImageBitmap extensions==<br />
Please see [3] for more information.<br />
<br />
==WebImage:==<br />
Why do we need WebImage? Because performance is matter to prompt computer vision to the Web. We need it for performance reason. Most image processing task can be accelerated by WebGL. But it is not the case for computer vision area. So we need a hardware accelerated-able computer vision related web API to help Web developer deliver a fast and portable Web page and application. That is the motivation of WebImage.<br />
<br><br />
This is a new zone needed to explore. So we might start from refer to existing related API. In end of 2014, KHRONOS released a portable, power efficient vision processing API, called OpenVX. This specification might be a good start point for WebImage.<br />
<br><br />
The following diagram is a brief concept of OpenVX. It is a graph-node architecture. The role of OpenVX for computer vision is like the role of OpenGL for graph. The developer just construct and execute the graph to process incoming image.<br />
<br><br />
[[File:OpenVX-NodeGFX.PNG|600px]]<br><br />
<!--[[File:Project FoxEyeWebImage1.png|800px]]<br>--><br />
OpenVX can be implemented by OpenCL, OpenGL|ES with compute shader and C/C++ with SIMD. That means we can support OpenVX in wide range Web platform from PC to mobile once a OpenVX engine is provided.<br />
<br />
[[File:OpenVX.PNG|600px]]<br><br />
<br />
==OfflineMediaContext:==<br />
We introduce the “OfflineMediaContext” and modify the “MediaStream” here to enable the offline (as soon as possible) MediaStream processing. When developers are going to perform the offline MediaStream processing, they need to form a context which will hold and keep the relationship of all MediaStreams that are going to be processed together.<br />
<br />
<source lang="c++"><br />
<br />
// typedef unsigned long long DOMTimeStamp;<br />
interface OfflineMediaContext {<br />
void start(DOMTimeStamp durationToStop);<br />
attribute EventHandler onComplete;<br />
};<br />
// Add an optional argument into the constructor.<br />
[Constructor (optional OfflineMediaContext context),<br />
Constructor (MediaStream stream, optional OfflineMediaContext context),<br />
Constructor (MediaStreamTrackSequence tracks, optional OfflineMediaContext context)]<br />
interface MediaStream : EventTarget {<br />
// No modification.<br />
...<br />
}<br />
<br />
</source><br />
*OfflineMediaContext is the holdplace of all MediaStreams which are going to be processed together in a non-realtime rate.<br />
*OfflineMediaContext is also the object who can trigger the non-realtime processing.<br />
*OfflineMediaContext should be instantiated first and then MediaStreams, which are going to be processed together in the same context, could be instantiated by passing the pre-instantiated OfflineMediaContext object into the constructor of MediaStreams. (See the modified MediaStream constructor below)<br />
*The native implementation of OfflineMediaContext holds a non-realtime MediaStreamGraph, which is just the same as the OfflineAudioContext in WebAudio specification.<br />
*The constructors are modified by adding an optional parameter, OfflineMediaContext. By this way, developers are able to associate a MediaStream to an OfflineMediaContext.<br />
*If the optional OfflineMediaContext is given, the native implementation of the MediaStream should be hooked to the non-realtime MSG hold by the given OfflineMediaContext in stead of the global realtime MSG.<br />
*If the optional OfflineMediaContext is given, we need to check whether the new created MediaStream is able to be processed in offline rate or not. If not, the constructor should throw an error and return a NULL. (Constructors are always allowed to throw.)<br />
<br />
==OpenCV.js==<br />
*OpenCV + Emscripten = OpenCV.js<br />
*https://github.com/CJKu/opencv <br />
<!--<br />
==Deprecated Design ==<br />
*Modular Routing<br />
Modular routing allows arbitrary connections between different DIPNode(TBD) objects. Each node can have inputs and/or outputs. A source node has no inputs and a single output. A destination node has one input and no outputs. Other nodes such as filters can be placed between source and destination nodes.<br />
<br />
Here is a example for face detection work on ImageElement:<br><br />
PS. Right now, I haven't finished the work of ScriptNode. So the draw rectangle part is skipped in sample code.<br />
<br>[[File:Project FoxEye1.png|720px]]<br><br />
<big>Example 1:</big><br />
<source lang="javascript"><br />
var context = new DIPContext();<br />
var imageElem = document.getElementById('imgelemsrc');<br />
var source = context.createImageElementSource(imageElem);<br />
var facedetect = context.createFaceDetection();<br />
source.connect(facedetect);<br />
var dest = context.createMediaStreamDestination();<br />
facedetect.connect(dest);<br />
var elem = document.getElementById('videoelem');<br />
elem.mozSrcObject = dest.stream;<br />
elem.play();<br />
<br />
</source><br />
<br />
<br />
Another example to show that some nodes might support callback function to pass more information rather than image.<br />
<br>[[File:Project FoxEye2.png|720px]]<br><br />
<big>Example 2:</big><br />
<source lang="javascript"><br />
var context = new DIPContext();<br />
var imageElem = document.getElementById('imgelemsrc');<br />
var source = context.createImageElementSource(imageElem);<br />
var textdetect = context.createTextDetection();<br />
source.connect(textdetect);<br />
var dest = context.createMediaStreamDestination();<br />
textdetect.connect(dest);<br />
textdetect.ontextrecognized = function (e){<br />
var text = e.recognizedText;<br />
var go2google = document.getElementById('go2Google');<br />
go2google.href = "https://www.google.com.tw/search?q=" + text<br />
var go2IMDB = document.getElementById('go2IMDB');<br />
go2IMDB.href = "http://www.imdb.com/find?ref_=nv_sr_fn&q=" + text<br />
var go2Amazon = document.getElementById('go2Amazon');<br />
go2Amazon.href = "http://www.amazon.com/s/ref=nb_sb_noss?url=search-alias%3Daps&field-keywords=" + text<br />
var go2TranslateEngTC = document.getElementById('go2TranslateEngTC');<br />
go2TranslateEngTC.href = "https://translate.google.com.tw/#en/zh-TW/" + text<br />
}<br />
var elem = document.getElementById('videoelem');<br />
elem.mozSrcObject = dest.stream;<br />
elem.play();<br />
</source><br />
An ideally example to combine ScriptNode with Canvas2DContext.<br><br />
This is an example trying to do on fly camera translation like "Word Lens" and "Waygo".<br><br />
Haven't finish the implementation for this example.<br><br />
<br>[[File:Project FoxEye3.png|720px]]<br><br />
<big>Example 3:</big><br />
<source lang="javascript"><br />
var context = new DIPContext();<br />
var myMediaStream;<br />
navigator.getUserMedia({video:true, audio:false}, function(localMediaStream){<br />
myMediaStream = localMediaStream;<br />
var source = context.createMediaStreamSource(myMediaStream);<br />
var textRecoginition = context.createTextRecoginition();<br />
source.connect(textRecoginition);<br />
var textInpaint = context.createTextInpaint();<br />
textRecoginition.connect(textInpaint);<br />
var scriptTranslate = context.createScriptProcessor();<br />
textdetect.ontextrecognized = function (e){<br />
var text = e.recognizedText;<br />
// Custom parameter -<br />
scriptTranslate.addParameter( "text", text );<br />
}<br />
scriptTranslate.onimageprocess= function (e) {<br />
var text = e.parameters.text;<br />
// Translate to other language....<br />
var newText = Translate("Eng", "TC");<br />
var input = e.inputImage;<br />
var canvas = document.getElementsByTagName('canvas')[0];<br />
var context2D = canvas.getContext('2d');<br />
context2D.drawImage(input, 0, 0);<br />
context2D.strokeText(newText, 0, 0);<br />
// get an empty slate to put the data into<br />
var output = context.createImageData(canvas.width, canvas.height);<br />
e.outputImage = output;<br />
}<br />
textInpaint.connect(scriptTranslate);<br />
var dest = createMediaStreamDestination();<br />
scriptTranslate.connect(dest);<br />
var video = document.getElementById(‘videoelem’);<br />
video.mozSrcObject = dest.stream;<br />
}, null);<br />
<br />
</source><br />
<br />
<!--<br />
=The API=<br />
Still under construction....<br />
==New design==<br />
===VideoContext===<br />
<source><br />
[Constructor]<br />
interface VideoContext : EventTarget {<br />
readonly attribute VideoDestinationNode destination;<br />
// DIPNode creation<br />
MediaStreamVideoDestinationNode createMediaStreamDestination();<br />
MediaStreamVideoSourceNode createMediaStreamSource(MediaStream mediaStream);<br />
VideoWorkerNode createVideoWorker(DOMString scriptURL);<br />
};<br />
</source><br />
===VideoNode===<br />
<source><br />
interface VideoNode: EventTarget {<br />
void connect(VideoNode destination);<br />
void disconnect();<br />
readonly attribute VideoContext context;<br />
};<br />
</source><br />
<br />
===VideoWorkerNode===<br />
Still thinking the type of inputImage/outputImage.<br />
<source><br />
interface VideoProcessEvent : Event {<br />
readonly attribute ImageData inputImage;<br />
readonly attribute ImageData outputImage;<br />
readonly attribute object parameters; <br />
};<br />
<br />
interface VideoWorkerNode: VideoNode {<br />
attribute EventHandler onimageprocess;<br />
};<br />
</source><br />
<br />
==Deprecated design==<br />
===DIPContext===<br />
<source><br />
[Constructor]<br />
interface DIPContext : EventTarget {<br />
readonly attribute DIPDestinationNode destination;<br />
// DIPNode creation<br />
MediaStreamDIPDestinationNode createMediaStreamDestination();<br />
ImageElementDIPSourceNode createImageElementSource(HTMLImageElement imageElement);<br />
MediaStreamDIPSourceNode createMediaStreamSource(MediaStream mediaStream);<br />
FaceDetectionNode createFaceDetection();<br />
TextDetectionNode createTextDetection();<br />
};<br />
</source><br />
===DIPNode===<br />
<source><br />
interface DIPNode : EventTarget {<br />
void connect(DIPNode destination, optional unsigned long output = 0, optional unsigned long input = 0);<br />
void disconnect(optional unsigned long output = 0);<br />
readonly attribute DIPContext context;<br />
readonly attribute unsigned long numberOfInputs;<br />
readonly attribute unsigned long numberOfOutputs;<br />
};<br />
</source><br />
<br />
===TextDetectionNode===<br />
<source><br />
interface RecognizedTextEvent : Event {<br />
readonly attribute DOMString recognizedText;<br />
};<br />
<br />
interface TextDetectionNode : DIPNode {<br />
attribute EventHandler ontextrecognized;<br />
};<br />
</source><br />
--><br />
<br />
=Demo pages=<br />
==OpenCV.js==<br />
This demo should be able to run on every existing Firefox browser on Ubuntu and Mac. Please try below website.<br><br />
http://people.mozilla.org/~cku/opencv/<br />
<br><br />
[[File:OpenCVJS-1.png|1080px]]<br />
<br><br />
[[File:OpenCVJS-2.png|1080px]]<br />
<br><br />
<br />
==MST with Worker and ImageBitmap==<br />
You need a customized build for below demos. Please clone this repo: https://github.com/kakukogou/gecko-dev/tree/foxeye-master<br><br />
You also need a web camera for the demos. Then build your own Firefox browser and go to below website to see the demo.<br />
<br><br />
Source code of the demo:<br><br />
https://github.com/kakukogou/foxeye-demo<br />
<br><br />
Demo website:<br><br />
http://people.mozilla.org/~tkuo/foxeye-demo/<br />
===Monitor===<br />
Monitor is design for just send the event to the Web Worker and no modification. The left one is from getUserMedia. The right one is using addWorkerMonitor to dispatch the input frame from the left one to a worker. The worker will detect the face and pass the face position and the input frame to main thread. Then the script in main thread use both information to draw the input frame via CanvasRenderingContext2D. <br />
*Case 1: Face detection<br />
This demo shows how to use |addWorkerMonitor| and CanvasRenderingContext2D to overlay a hat on someone's head.<br />
<br><br />
[[File:Monitor face.png|1080px]]<br />
<br><br />
*Case 2: QRCode<br />
[[File:Monitor qrcode.png|1080px]]<br />
<br><br />
<br />
===Processor===<br />
This is a demo to show how to use |addWorkerProcessor| to create a new MediaStreamTrack and show it in another HTMLVideoElement. There are 5 kind of image filter.<br />
<br> <br />
<br><br />
[[File:Processor copy.png|1080px]]<br />
<br><br />
[[File:Processor blur.png|1080px]]<br />
<br><br />
[[File:Processor erode.png|1080px]]<br />
<br><br />
[[File:Processor threshold.png|1080px]]<br />
<br><br />
[[File:Processor invert.png|1080px]]<br />
<br><br />
[[File:Processor gray.png|1080px]]<br />
<br><br />
<br />
<br />
<br />
<!--<br />
==Demo 1: Face tracker==<br />
===Browser:===<br />
*Input comes from HTML Image Element<br />
[[File:ProjectFoxEye BrowserFaceImage.png|720px]]<br />
*Input comes from MediaStream<br />
[[File:ProjectFoxEye BrowserFaceMS.png|720px]]<br />
<br />
===B2G on Flame:===<br />
*Input comes from HTML Image Element<br><br />
[[File:Project FoxEyeFlameFaceImage.jpg|480px]]<br><br />
*Input comes from MediaStream<br><br />
[[File:Project FoxEyeFlameFaceMS.jpg|480px]]<br><br />
<br />
==Demo 2: Text Recognition==<br />
*Browser:<br />
**Input comes from HTML Image Element<br />
[[File:ProjectFoxEye TextImage.png|720px]]<br />
**Input comes from MediaStream<br />
[[File:ProjectFoxEye TextMS.png|720px]]<br />
--><br />
<br />
=Unlimited Potentials=<br />
<!--<br />
According to "Firefox OS User Research Northern India Findings" [4], one of the key table-stake is camera related features. "Ways to provide photo & video editing tools" is what this WebAPI for. So if we can deliver some cool photo & video editing features, we can fulfill one of the needs of our target market.<br><br />
In [4], it mentioned that one of purchase motivators is educate my kids. The features like PhotoMath can satisfy the education part.<br><br />
In long term, if we can integrate text recognition with TTS(text to speech), we can help illiterate person to read words or phrase. That will be very useful features.<br><br />
Also offline text translation in camera might be a killer application too. Waygo and WordLens is two of such applications in Android and iOS.<br><br />
Text Selection in Image is also an interesting feature for browser. Project Naptha demos some potential functionality based on yext selection in Image.<br> <br />
--><br />
==FoxEye technology tree==<br />
This is the technology tree of FoxEye. The solid line is the dependency. The dash line is for performance improvement. The green blocks are possible applications. The blue blocks are API draft with prototype. The red block is we know how to do but not yet starting. The yellow blocks are only the concept. Need more study. So step by step, we can provide photo manager, wide angle panorama, HDR, and even gesture control.<br />
[[File:Multimedia Platform Team Technology Roadmap - New Page.png |1024px]]<br />
<br />
<br />
==Use Cases==<br />
*Digital Image Processing(DIP) for camera:<br />
**Face In, see [https://www.youtube.com/watch?feature=player_embedded&v=PWZUCfDsFdU Sony Face In]<br />
**Augmented Reality, see [https://www.youtube.com/watch?feature=player_embedded&v=vDNzTasuYEw IKEA AR]<br />
**Camera Panorama, <br />
**Fisheye camera, <br />
**Comic Effect,<br />
**Long term, might need Android Camera HAL 3 to control camera<br />
***Smile Snapshot<br />
***Gesture Snapshot<br />
***HDR<br />
***Video Stabilization<br />
**Bar code scanner<br />
*Photo and video editing<br />
**Video Editor, see [https://www.youtube.com/watch?feature=player_embedded&v=NJ6nYgxcuUk WeVideo on Android]<br />
**A faster way for video editing tools.<br />
**Lots of existing image effects can be used for photo and video editing.<br />
**https://www.facebook.com/thanks<br />
*Object Recognition in Image(Not only FX OS, but also broswer):<br />
**Shopping Assistant, see [https://www.youtube.com/watch?feature=player_embedded&v=B7cvlWll85Q Amazon Firefly]<br />
**Face Detection/Tracking,<br />
**Face Recognition, <br />
**Text Recognition, <br />
**Text Selection in Image, <br />
***See http://projectnaptha.com/<br />
**Text Inpainting,<br />
**Image Segmentation,<br />
**Text translation on image, see [https://www.youtube.com/watch?feature=player_embedded&v=9UalhhWBPH0 Waygo]<br />
*Duo Camera:<br />
**Nature Interaction(Gesture, Body Motion Tracking)<br />
**Interactive Foreground Extraction<br />
and so on....<br />
<br />
==Some cool applications we can refer in real worlds==<br />
*Word Lens: <br />
**https://play.google.com/store/apps/details?id=com.questvisual.wordlens.demo<br />
**https://itunes.apple.com/tw/app/word-lens/id383463868?mt=8<br />
*Waygo<br />
**http://www.waygoapp.com/<br />
*PhotoMath<br />
**https://photomath.net/<br />
*Cartoon Camera<br />
**https://play.google.com/store/apps/details?id=com.fingersoft.cartooncamera<br />
*Photo Studio<br />
**http://photo-studio.en.uptodown.com/android<br />
*Magisto<br />
**https://play.google.com/store/apps/details?id=com.magisto<br />
*Adobe PhotoShop Express<br />
**http://www.photoshop.com/products/photoshopexpress<br />
*Amazon(firefly app)<br />
**https://play.google.com/store/apps/details?id=com.amazon.mShop.android<br />
<br />
<!--<br />
=Task Dependency=<br />
Below is the dependency graph for text recognition work.<br />
<br><br />
[[File:Project FoxEyeTextRecgDependency.png]]<br />
<br><br />
Below is the dependency graph for camera efficts work.<br />
<br><br />
[[File:Project FoxEyeCameraDependency.png]]<br />
<br><br />
Below is the dependency graph for video editor work.<br />
<br><br />
[[File:Project FoxEyeVideoEditorDependency.png]]<br />
<br><br />
--><br />
<br />
<!--<br />
=Comparison=<br />
==Canvas2DContext==<br />
Currently, you can do video effect by Canvas2DContext. See the demo made by [4]. The source code looks like below.<br />
<source><br />
function frameConverter(video,canvas) {<br />
<br />
// Set up our frame converter<br />
this.video = video;<br />
this.viewport = canvas.getContext("2d");<br />
this.width = canvas.width;<br />
this.height = canvas.height;<br />
// Create the frame-buffer canvas<br />
this.framebuffer = document.createElement("canvas");<br />
this.framebuffer.width = this.width;<br />
this.framebuffer.height = this.height;<br />
this.ctx = this.framebuffer.getContext("2d");<br />
// Default video effect is blur<br />
this.effect = JSManipulate.blur;<br />
// This variable used to pass ourself to event call-backs<br />
var self = this;<br />
// Start rendering when the video is playing<br />
this.video.addEventListener("play", function() {<br />
self.render();<br />
}, false);<br />
<br />
// Change the image effect to be applied <br />
this.setEffect = function(effect){<br />
if(effect in JSManipulate){<br />
this.effect = JSManipulate[effect];<br />
}<br />
}<br />
<br />
// Rendering call-back<br />
this.render = function() {<br />
if (this.video.paused || this.video.ended) {<br />
return;<br />
}<br />
this.renderFrame();<br />
var self = this;<br />
// Render every 10 ms<br />
setTimeout(function () {<br />
self.render();<br />
}, 10);<br />
};<br />
<br />
// Compute and display the next frame <br />
this.renderFrame = function() {<br />
// Acquire a video frame from the video element<br />
this.ctx.drawImage(this.video, 0, 0, this.video.videoWidth,<br />
this.video.videoHeight,0,0,this.width, this.height);<br />
var data = this.ctx.getImageData(0, 0, this.width, this.height);<br />
// Apply image effect<br />
this.effect.filter(data,this.effect.defaultValues);<br />
// Render to viewport<br />
this.viewport.putImageData(data, 0, 0);<br />
return;<br />
};<br />
};<br />
<br />
// Initialization code<br />
video = document.getElementById("video");<br />
canvas = document.getElementById("canvas");<br />
fc = new frameConverter(video,canvas);<br />
...<br />
// Change the image effect applied to the video<br />
fc.setEffect('edge detection');<br />
<br />
</source><br />
Basically, the idea is use |drawImage| to acquire frame from video and draw it to canvas. Then call |getImageData| to get the data and process the image. After that, put the computed data back to the canvas and display it.<br><br />
<br />
Compare to this approach, the proposed WebAPI has below advantages:<br />
* Not polling mechanism.<br />
** We use callback function to process all frames.<br />
<br />
==node-opencv==<br />
https://github.com/peterbraden/node-opencv<br />
"OpenCV bindings for Node.js. OpenCV is the defacto computer vision library - by interfacing with it natively in node, we get powerful real time vision in js."<br />
The sample codes looks like below:<br />
*You can use opencv to read in image files. Supported formats are in the OpenCV docs, but jpgs etc are supported.<br />
<source lang="javascript"><br />
cv.readImage(filename, function(err, mat){<br />
mat.convertGrayscale()<br />
mat.canny(5, 300)<br />
mat.houghLinesP()<br />
})<br />
</source><br />
*If however, you have a series of images, and you wish to stream them into a stream of Matrices, you can use an ImageStream. Thus:<br />
<source lang="javascript"><br />
var s = new cv.ImageStream()<br />
s.on('data', function(matrix){<br />
matrix.detectObject(haar_cascade_xml, opts, function(err, matches){})<br />
})<br />
ardrone.createPngStream().pipe(s);<br />
</source><br />
<br />
==opencvjs==<br />
https://github.com/blittle/opencvjs<br><br />
It is a project to compile opencv to asm.js. Might be a dead project now.<br />
==Project Naptha==<br />
"Project Naptha automatically applies state-of-the-art computer vision algorithms on every image you see while browsing the web. The result is a seamless and intuitive experience, where you can highlight as well as copy and paste and even edit and translate the text formerly trapped within an image." Quoted from http://projectnaptha.com/ .<br />
===How it works?===<br />
Excerpt from Project Naptha:<br />
The primary feature of Project Naptha is actually the text detection, rather than optical character recognition. The author write a text detection algorithm called Stroke Width Transform, invented by Microsoft Research in 2008, which is capable of identifying regions of text in a language-agnostic manner in WebWorker. Once a user begins to select some text, it scrambles to run character recognition algorithms in order to determine what exactly is being selected. The default OCR engine is a built-in pure-javascript port of the open source Ocrad OCR engine. There’s the option of sending the selected region over to a cloud based text recognition service powered by Tesseract, Google’s (formerly HP’s) award-winning open-source OCR engine which supports dozens of languages, and uses an advanced language model.<br />
--><br />
<!--<br />
=Open Source Library and Licenses =<br />
*OpenCV: OpenCV is an open-source BSD-licensed library that includes several hundreds of computer vision algorithms.<br />
*Tesseract-OCR: Apache License v2.0. Tesseract is probably the most accurate open source OCR engine available.<br />
**Leptonica: Leptonica is a pedagogically-oriented open source site containing software that is broadly useful for image processing and image analysis applications.<br />
<br />
<br>"The Leptonica copyright is less restrictive still. We use the BSD 2-clause license, which is similar to the Apache license but much briefer. Kirk McKusick playfully called the BSD license a copycenter, as differentiated from the usual copyright and the GPL copyleft: "Take it down to the copy center and make as many copies as you want." The BSD restrictions can be approximately summarized as: (1) Don't pretend that you wrote this, and (2) Don't sue us if it doesn't work.<br> For Leptonica, as with Apache and BSD, modifications of the source code that are used in commercial products can be made without any obligation to make them available in open source. (This does happen voluntarily -- it's likely that the majority of issues noted by people working on commercial products are reported back, often with patches.) The Leptonica license only requires that any use of the source, whether in original or modified form, must include the Leptonica copyright notice, and that modified versions must be clearly marked as such. "<br />
<br />
<br />
=What I have done=<br />
*Initialized, planned, and implemented this project.<br />
*Wrote a prototype WebIDL for WebDIP.<br />
**MediaStream as source node and destination node for WebDIP.<br />
**For HTMLImageElement as a source node, there is a temporary solution.<br />
**Added a face detection node, usable with MediaStream and HTMLImageElement on both the browser and a B2G Flame.<br />
**Added a text detection/recognition node, usable with MediaStream and HTMLImageElement in the browser.<br />
--><br />
<br />
<!--<br />
=To Do List=<br />
*Extend MediaStreamTrack API? =>CTai<br />
**Need to integrate with Canvas2DContext and WebGL.<br />
*OfflineMediaContext study. =>Kaku(our new hire!!)<br />
*An API for image processing and object detection. =>TBD<br />
**We need such an API for B2G privileged applications (or opencv-asm.js for general apps).<br />
*How to compile OpenCV to asm.js =>Kaku, CJay<br />
**Try to figure out how to pass the ImageBitmap from VideoWorker to OpenCV-asm.js.<br />
*Compare native OpenCV/Tesseract with the asm.js versions. =>TBD<br />
--><br />
<br />
<!--<br />
=Fixme List(Known Issues)=<br />
*OpenCV can't build with STLport; it only supports GNU STL.<br />
**B2G can't build with GNU STL.<br />
*Text Detection and Recognition can't run on B2G.<br />
**Some OpenCV APIs take STL types as arguments; the mismatched STL causes runtime errors.<br />
*Tesseract-OCR build.<br />
**Uses a pre-installed Tesseract-OCR for now. Maybe we should support building Tesseract-OCR from source.<br />
*Improve the precision rate of text recognition.<br />
**The achievable precision rate should be higher than my rough prototype's. It needs improvement.<br />
*Separate OCR initialization.<br />
**Prevent redundant initialization.<br />
*Haven't done OpenCL integration in Gecko.<br />
**OpenCV has a lot of OpenCL integration. We should take advantage of it.<br />
*Canvas2DContext and WebGL can't run on a worker.<br />
**Need bug 801176 and bug 709490 landed.<br> <br />
*Need ImageBitmap for VideoWorkerEvent.<br />
**Need bug 1044102 landed.<br><br />
--><br />
<br />
=Current Status=<br />
*Meta bug: https://bugzilla.mozilla.org/show_bug.cgi?id=1100203<br />
*MediaStream with worker: editor's draft completed; refining the prototype for the review process. See https://bugzilla.mozilla.org/show_bug.cgi?id=1044102<br />
*ImageBitmap: in the review process. See https://bugzilla.mozilla.org/show_bug.cgi?id=801176<br />
*ImageBitmap extension: editor's draft completed; refining the prototype for the review process. See https://bugzilla.mozilla.org/show_bug.cgi?id=1141979<br><br />
*OpenCV.js: working on it. See http://people.mozilla.org/~cku/opencv/<br />
*OfflineMediaContext: not yet started.<br />
*WebImage: not yet started.<br />
*Run WebGL on a worker: in the review process. See https://bugzilla.mozilla.org/show_bug.cgi?id=709490<br><br />
*CanvasRenderingContext2D in a Worker: waiting for someone to take this bug. See https://bugzilla.mozilla.org/show_bug.cgi?id=801176<br><br />
=Next Phase (2015 H2)=<br />
*Try to land and standardize MediaStream with worker, ImageBitmap, and the ImageBitmap extensions. See [4] for how standardization works at Mozilla.<br />
*Design an automated way to export JavaScript APIs for OpenCV.js (see the sketch after this list), and try to upstream it to the OpenCV code base.<br />
*Start to work on OfflineMediaContext.<br />
*Support product requirements, for example an Instagram-like app, wide-angle panorama, or Fox photo.<br />
*Do some exploratory experiments on the WebImage concept.<br />
*Initialize a sub-project called [https://wiki.mozilla.org/Project_GlovePuppetry Project GlovePuppetry]<br />
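<br />
To make the OpenCV.js export item above concrete, here is a hypothetical sketch of what an automatically generated binding could look like from the JavaScript side. Every name in it (cv.imread, cv.Mat, cv.cvtColor, cv.Canny, cv.imshow, delete) is an assumption about the eventual generated surface, mirroring the native OpenCV API, rather than the current state of the build.<br />
<source lang="javascript"><br />
// Hypothetical auto-generated OpenCV.js surface: read a frame from a canvas,<br />
// run an edge detector, and draw the result back. All names are assumptions.<br />
var src = cv.imread('canvasInput');          // RGBA pixels from a canvas element<br />
var edges = new cv.Mat();<br />
cv.cvtColor(src, edges, cv.COLOR_RGBA2GRAY); // down to single-channel grayscale<br />
cv.Canny(edges, edges, 50, 150);             // Canny low/high thresholds<br />
cv.imshow('canvasOutput', edges);            // draw the result into a canvas element<br />
src.delete(); edges.delete();                // asm.js heap memory is freed manually<br />
</source><br />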
<br />
=Beyond 2015=<br />
*Proof of Concept for WebImage.<br />
*A crazy idea of Kernel.js or SPIR.js for JS developers to customize WebImage?<br />
*Gestural control API with depth camera? => WebNI (Web Natural Interaction)?<br />
*[https://wiki.mozilla.org/Project_Cangjie Project Cangjie]<br />
<br />
=Conclusion=<br />
Step by step, we can make things different. FoxEye is a pioneering project that tries to push the Web's boundary into a new area: image processing and computer vision. The key factor in ensuring this project's success is you, every Web contributor. Your support, your promotion, your feedback, and your comments are the nutrition of this project. Please share this project with every Web developer. Let's bring more and more amazing things to the Web, and enrich the whole Web platform's competitiveness.<br />
<br />
=References=<br />
*[1]:"WebAudio Spec", http://www.w3.org/TR/webaudio/<br />
*[2]:"Media Capture Stream with Worker", http://chiahungtai.github.io/mediacapture-worker/<br />
*[3]:"ImageBitmap Extensions", http://kakukogou.github.io/spec-imagebitmap-extension/<br />
*[4]:"Mozilla Standards", https://wiki.mozilla.org/Standards<br />
<br />
=Acknowledgements=<br />
The whole idea of adopting WebAudio as the reference design for this project came from a conversation with John Lin. Thanks to Robert O'Callahan for his great feedback and comments. Thanks to John Lin and Chia-jung Hung for their useful suggestions and ideas. Also, big thanks to my team members who helped me debug the code.<br />
<br />
=About Authors=<br />
==CTai==<br />
My name is [https://tw.linkedin.com/pub/chia-hung-tai/36/8b6/712 Chia-hung Tai]. I am a senior software engineer in the Mozilla Taipei office, working on Firefox OS multimedia. Before this job, I had some experience in OpenCL, NLP (Natural Language Processing), data mining, and machine learning. My IRC nickname is ctai. You can find me in the #media, #mozilla-taiwan, and #b2g channels. You can also reach me via email (ctai at mozilla dot com).<br />
<br />
==Kaku==<br />
[https://tw.linkedin.com/in/kakukogou Tzuhuo Kuo] is an engineer in the Mozilla Taipei office.<br />
==CJ Ku==<br />
[https://www.linkedin.com/pub/cj-ku/62/55b/a1b CJ Ku] is responsible for the OpenCV.js part.</div>
<hr />
<div>=Abstract=<br />
The goal of this project is bringing the power of computer vision and image processing to the Web. By extending the spec of Media Capture and Streams, the web developers can write video processing related applications in better way. The primary idea is to incorporate Worker-based JavaScript video processing with MediaStreamTrack. The user's video processing script can do the real image processing and analysis works frame by frame. We also propose an extension of ImageBitmap to allow better optimization possibility in both implementation and Web developer side. To support video editor case, we would like to introduce OfflineMediaContext in next phase. We also want to explore the concept of WebImage, a hardware accelerated-able Web API for performance improvement chance. By accomplishing these API step by step, we believe we can improve the Web platform competitiveness a lot on image processing and computer vision area.<br />
<br />
=Introduction=<br />
To get a quick understand what is project FoxEye. Please see below file:<br><br />
'''The latest one:'''<br />
*FoxEye Briefing: [https://docs.google.com/presentation/d/1Ra5bIeMfSEQi5kd_bGH5Vx-pa9ahpAUTFDxTeBA5L0o/edit?usp=sharing Briefing]<br />
*Presentation files in Whistler Work Week:<br />
**Project FoxEye Status Update: [https://docs.google.com/presentation/d/1vnR5JOWpEgKN3KZGV2SLxscqSnuBOF2tT_dTolLRYI4/edit?usp=sharing FoxEye]<br />
**FoxEye Cross Firefox OS:[https://docs.google.com/presentation/d/1ck32VoikWkkFXkFfNK91S22nGRp9fSEj6Oz6846May4/edit?usp=sharing Use case]<br />
*'''Latest demo in Youtube: [https://www.youtube.com/watch?v=prybkXsTGXY FoxEye 2015 H1 demo]'''<br />
<br />
'''Outdated'''<br />
*Presentation file in Portland Work Week.[[File:Project FoxEye Portland Work Week.pdf]]<br><br />
*Presentation file in P2PWeb WorkShop.[[File:Project FoxEye 2015-Feb.pdf]]<br><br />
*Youtube: https://www.youtube.com/watch?v=TgQWEWiGaO8<br><br />
<br />
The needs for image processing and computer vision is increasing in recent years. The introduction of video element and media stream in HTML5 is very important, allowing for basic video playback and WebCam ability. But it is not powerful enough to handle complex video processing and camera application. Especially there are tons of mobile camera, photo and video editor applications show their creativity by using OpenCV etc in Android. It is a goal of this project to include the capabilities found in modern video production and camera applications as well as some of the processing, recognition and stitching tasks. <br />
<br><br />
This API is inspired by the WebAudio API[1]. Unlike WebAudio API, we try to reach the goal by modifying existing Media Capture and Streams API. The idea is adding some functions to associate the Woker-based script with MediaStreamTrack. Then the script code of Worker runs image processing and/or analysis frame by frame. Since we move the most of processing work to Worker, the main thread will not be blocked.<br />
<br />
[[File:FoxEye - Overview.png|800px]]<br />
<br><br />
Basically, the spirit of this project is four parts. The first part is extend the MediaStreamTrack to associate a Worker. This part provide a way to do image processing job frame by frame. The second part is ImageBitmap extension. This part extended the ImageBitmap interface to allow JavaScript developer to read underlying data out and set an external data into an ImageBitmap in a set of supported color formats. The third part is OfflineMediaContext. It is for offline stream to render as fast as possible. The last part is WebImage, it is a hardware accelerated-able API on computer vision area. The Web developers can use it to combine high performance vision processing.<br />
<br><br />
Thanks for the amazing asm.js and emscripten work, we also provide an asm.js version of OpenCV called OpenCV.js. The web developers can leverage the power of OpenCV in a simpler way. <br />
<br><br />
<br />
=Design Principle=<br />
*Follow [https://extensiblewebmanifesto.org/ The Extensible Web Manifesto]<br />
The original design of this project is WebAudio like design called WebVideo. But is is deprecated due to we want to follow The Extensible Web Manifesto. We believe this kind of design is best for the Web. Take below examples to against WebVideo design.<bt><br />
Computer vision area is more thriving than ever after deep learning introduced. We can't enumerate video nodes for all kinds of creativeness for the future. For example, suppose we enumerate all we can image in current time. But still something we can't image. Maybe we lost an area called image sentiment analysis. I take this as an example because I just known it recently. Image sentiment analysis is trying to classify the emotion from human face in image. So should we add this as a new video node? Take another extreme example, we think we would not have argument on face detection or face recognition. But what we talk now is only for human face. What if we someone think we need to detect and recognize animal face? Sounds like a ridiculous idea right? But it is a real business. A product called "Bistro: A Smart Feeder Recognizes Your Cat's Face"[2] was on indiegogo. So should we extend the original face detection/recognition node or add a new video node for cat or ask them to implement it in VideoWorker? If we ask them to implement in VideoWorker, then we might need to re-consider are those existing video node make sense? Should we move them all into VideoWorker and implement them in JS? If yse, the WebVideo will degenerate to only source node, video worker node and detestation node. Compare to degenerated WebVideo, the MediaStream with worker design is more elegant. <br />
<br />
*Performance and power consumption do matter<br />
This project is initialized by the FxOS team which means the performance and power consumption for mobile devices are considered from day one.<br />
<br />
=Concept=<br />
==MediaStreamTrack with Worker: ==<br />
The new design is a simple and minimal change for current API. By extending MediaStreamTrack and adding Worker related API, we can let MediaStream be able to support video processing functionality through the script code in Worker. Below is the draft WebIDL codes. Please see [2] for the draft specification.<br />
<source lang="webidl"><br />
[Constructor(DOMString scriptURL)]<br />
<br />
interface VidoeMonitor : EventTarget {<br />
attribute EventHandler onvideomonitor;<br />
};<br />
<br />
interface VideoProcessor : EventTarget {<br />
attribute EventHandler onvideoprocess;<br />
};<br />
partial interface MediaStreamTrack {<br />
void addVideoMonitor(VidoeMonitor monitor);<br />
void removeVideoMonitor(VidoeMonitor monitor);<br />
MediaStreamTrack addVideoProcessor(VidoeProcessor processor);<br />
void removeVideoProcessor();<br />
};<br />
<br />
[Exposed=(Window, Worker),<br />
Constructor(DOMString type, optional VideoMonitorEventInit videoMonitorEventInitDict)]<br />
interface VideoMonitorEvent : Event {<br />
readonly attribute DOMString trackId;<br />
readonly attribute double playbackTime;<br />
readonly attribute ImageBitmap? inputImageBitmap;<br />
};<br />
<br />
[Exposed=(Window, Worker),<br />
Constructor(DOMString type, optional VideoProcessorEventInit videoProcessorEventInitDict)]<br />
interface VideoProcessEvent : VideoMonitorEvent {<br />
readonly attribute ImageBitmap? outputImageBitmap;<br />
};<br />
<br />
</source><br />
<br><br />
Main thread:<br><br />
[[File:NewProjectFoxEye1.png|1024px]]<br><br />
<br />
Worker Thread<br><br />
[[File:Worker - FLOW.png]]<br><br />
===Example Code ===<br />
Please check the section [http://chiahungtai.github.io/mediacapture-worker/index.html#Examples examples in MediaStreamTrack with worker].<br />
<br />
==ImageBitmap extensions==<br />
Please see [3] for more information.<br />
<br />
==WebImage:==<br />
Why do we need WebImage? Because performance is matter to prompt computer vision to the Web. We need it for performance reason. Most image processing task can be accelerated by WebGL. But it is not the case for computer vision area. So we need a hardware accelerated-able computer vision related web API to help Web developer deliver a fast and portable Web page and application. That is the motivation of WebImage.<br />
<br><br />
This is a new zone needed to explore. So we might start from refer to existing related API. In end of 2014, KHRONOS released a portable, power efficient vision processing API, called OpenVX. This specification might be a good start point for WebImage.<br />
<br><br />
The following diagram is a brief concept of OpenVX. It is a graph-node architecture. The role of OpenVX for computer vision is like the role of OpenGL for graph. The developer just construct and execute the graph to process incoming image.<br />
<br><br />
[[File:OpenVX-NodeGFX.PNG|600px]]<br><br />
<!--[[File:Project FoxEyeWebImage1.png|800px]]<br>--><br />
OpenVX can be implemented by OpenCL, OpenGL|ES with compute shader and C/C++ with SIMD. That means we can support OpenVX in wide range Web platform from PC to mobile once a OpenVX engine is provided.<br />
<br />
[[File:OpenVX.PNG|600px]]<br><br />
<br />
==OfflineMediaContext:==<br />
We introduce the “OfflineMediaContext” and modify the “MediaStream” here to enable the offline (as soon as possible) MediaStream processing. When developers are going to perform the offline MediaStream processing, they need to form a context which will hold and keep the relationship of all MediaStreams that are going to be processed together.<br />
<br />
<source lang="c++"><br />
<br />
// typedef unsigned long long DOMTimeStamp;<br />
interface OfflineMediaContext {<br />
void start(DOMTimeStamp durationToStop);<br />
attribute EventHandler onComplete;<br />
};<br />
// Add an optional argument into the constructor.<br />
[Constructor (optional OfflineMediaContext context),<br />
Constructor (MediaStream stream, optional OfflineMediaContext context),<br />
Constructor (MediaStreamTrackSequence tracks, optional OfflineMediaContext context)]<br />
interface MediaStream : EventTarget {<br />
// No modification.<br />
...<br />
}<br />
<br />
</source><br />
*OfflineMediaContext is the holdplace of all MediaStreams which are going to be processed together in a non-realtime rate.<br />
*OfflineMediaContext is also the object who can trigger the non-realtime processing.<br />
*OfflineMediaContext should be instantiated first and then MediaStreams, which are going to be processed together in the same context, could be instantiated by passing the pre-instantiated OfflineMediaContext object into the constructor of MediaStreams. (See the modified MediaStream constructor below)<br />
*The native implementation of OfflineMediaContext holds a non-realtime MediaStreamGraph, which is just the same as the OfflineAudioContext in WebAudio specification.<br />
*The constructors are modified by adding an optional parameter, OfflineMediaContext. By this way, developers are able to associate a MediaStream to an OfflineMediaContext.<br />
*If the optional OfflineMediaContext is given, the native implementation of the MediaStream should be hooked to the non-realtime MSG hold by the given OfflineMediaContext in stead of the global realtime MSG.<br />
*If the optional OfflineMediaContext is given, we need to check whether the new created MediaStream is able to be processed in offline rate or not. If not, the constructor should throw an error and return a NULL. (Constructors are always allowed to throw.)<br />
<br />
==OpenCV.js==<br />
*OpenCV + Emscripten = OpenCV.js<br />
*https://github.com/CJKu/opencv <br />
<!--<br />
==Deprecated Design ==<br />
*Modular Routing<br />
Modular routing allows arbitrary connections between different DIPNode(TBD) objects. Each node can have inputs and/or outputs. A source node has no inputs and a single output. A destination node has one input and no outputs. Other nodes such as filters can be placed between source and destination nodes.<br />
<br />
Here is a example for face detection work on ImageElement:<br><br />
PS. Right now, I haven't finished the work of ScriptNode. So the draw rectangle part is skipped in sample code.<br />
<br>[[File:Project FoxEye1.png|720px]]<br><br />
<big>Example 1:</big><br />
<source lang="javascript"><br />
var context = new DIPContext();<br />
var imageElem = document.getElementById('imgelemsrc');<br />
var source = context.createImageElementSource(imageElem);<br />
var facedetect = context.createFaceDetection();<br />
source.connect(facedetect);<br />
var dest = context.createMediaStreamDestination();<br />
facedetect.connect(dest);<br />
var elem = document.getElementById('videoelem');<br />
elem.mozSrcObject = dest.stream;<br />
elem.play();<br />
<br />
</source><br />
<br />
<br />
Another example to show that some nodes might support callback function to pass more information rather than image.<br />
<br>[[File:Project FoxEye2.png|720px]]<br><br />
<big>Example 2:</big><br />
<source lang="javascript"><br />
var context = new DIPContext();<br />
var imageElem = document.getElementById('imgelemsrc');<br />
var source = context.createImageElementSource(imageElem);<br />
var textdetect = context.createTextDetection();<br />
source.connect(textdetect);<br />
var dest = context.createMediaStreamDestination();<br />
textdetect.connect(dest);<br />
textdetect.ontextrecognized = function (e){<br />
var text = e.recognizedText;<br />
var go2google = document.getElementById('go2Google');<br />
go2google.href = "https://www.google.com.tw/search?q=" + text<br />
var go2IMDB = document.getElementById('go2IMDB');<br />
go2IMDB.href = "http://www.imdb.com/find?ref_=nv_sr_fn&q=" + text<br />
var go2Amazon = document.getElementById('go2Amazon');<br />
go2Amazon.href = "http://www.amazon.com/s/ref=nb_sb_noss?url=search-alias%3Daps&field-keywords=" + text<br />
var go2TranslateEngTC = document.getElementById('go2TranslateEngTC');<br />
go2TranslateEngTC.href = "https://translate.google.com.tw/#en/zh-TW/" + text<br />
}<br />
var elem = document.getElementById('videoelem');<br />
elem.mozSrcObject = dest.stream;<br />
elem.play();<br />
</source><br />
An ideally example to combine ScriptNode with Canvas2DContext.<br><br />
This is an example trying to do on fly camera translation like "Word Lens" and "Waygo".<br><br />
Haven't finish the implementation for this example.<br><br />
<br>[[File:Project FoxEye3.png|720px]]<br><br />
<big>Example 3:</big><br />
<source lang="javascript"><br />
var context = new DIPContext();<br />
var myMediaStream;<br />
navigator.getUserMedia({video:true, audio:false}, function(localMediaStream){<br />
myMediaStream = localMediaStream;<br />
var source = context.createMediaStreamSource(myMediaStream);<br />
var textRecoginition = context.createTextRecoginition();<br />
source.connect(textRecoginition);<br />
var textInpaint = context.createTextInpaint();<br />
textRecoginition.connect(textInpaint);<br />
var scriptTranslate = context.createScriptProcessor();<br />
textdetect.ontextrecognized = function (e){<br />
var text = e.recognizedText;<br />
// Custom parameter -<br />
scriptTranslate.addParameter( "text", text );<br />
}<br />
scriptTranslate.onimageprocess= function (e) {<br />
var text = e.parameters.text;<br />
// Translate to other language....<br />
var newText = Translate("Eng", "TC");<br />
var input = e.inputImage;<br />
var canvas = document.getElementsByTagName('canvas')[0];<br />
var context2D = canvas.getContext('2d');<br />
context2D.drawImage(input, 0, 0);<br />
context2D.strokeText(newText, 0, 0);<br />
// get an empty slate to put the data into<br />
var output = context.createImageData(canvas.width, canvas.height);<br />
e.outputImage = output;<br />
}<br />
textInpaint.connect(scriptTranslate);<br />
var dest = createMediaStreamDestination();<br />
scriptTranslate.connect(dest);<br />
var video = document.getElementById(‘videoelem’);<br />
video.mozSrcObject = dest.stream;<br />
}, null);<br />
<br />
</source><br />
<br />
<!--<br />
=The API=<br />
Still under construction....<br />
==New design==<br />
===VideoContext===<br />
<source><br />
[Constructor]<br />
interface VideoContext : EventTarget {<br />
readonly attribute VideoDestinationNode destination;<br />
// DIPNode creation<br />
MediaStreamVideoDestinationNode createMediaStreamDestination();<br />
MediaStreamVideoSourceNode createMediaStreamSource(MediaStream mediaStream);<br />
VideoWorkerNode createVideoWorker(DOMString scriptURL);<br />
};<br />
</source><br />
===VideoNode===<br />
<source><br />
interface VideoNode: EventTarget {<br />
void connect(VideoNode destination);<br />
void disconnect();<br />
readonly attribute VideoContext context;<br />
};<br />
</source><br />
<br />
===VideoWorkerNode===<br />
Still thinking the type of inputImage/outputImage.<br />
<source><br />
interface VideoProcessEvent : Event {<br />
readonly attribute ImageData inputImage;<br />
readonly attribute ImageData outputImage;<br />
readonly attribute object parameters; <br />
};<br />
<br />
interface VideoWorkerNode: VideoNode {<br />
attribute EventHandler onimageprocess;<br />
};<br />
</source><br />
<br />
==Deprecated design==<br />
===DIPContext===<br />
<source><br />
[Constructor]<br />
interface DIPContext : EventTarget {<br />
readonly attribute DIPDestinationNode destination;<br />
// DIPNode creation<br />
MediaStreamDIPDestinationNode createMediaStreamDestination();<br />
ImageElementDIPSourceNode createImageElementSource(HTMLImageElement imageElement);<br />
MediaStreamDIPSourceNode createMediaStreamSource(MediaStream mediaStream);<br />
FaceDetectionNode createFaceDetection();<br />
TextDetectionNode createTextDetection();<br />
};<br />
</source><br />
===DIPNode===<br />
<source><br />
interface DIPNode : EventTarget {<br />
void connect(DIPNode destination, optional unsigned long output = 0, optional unsigned long input = 0);<br />
void disconnect(optional unsigned long output = 0);<br />
readonly attribute DIPContext context;<br />
readonly attribute unsigned long numberOfInputs;<br />
readonly attribute unsigned long numberOfOutputs;<br />
};<br />
</source><br />
<br />
===TextDetectionNode===<br />
<source><br />
interface RecognizedTextEvent : Event {<br />
readonly attribute DOMString recognizedText;<br />
};<br />
<br />
interface TextDetectionNode : DIPNode {<br />
attribute EventHandler ontextrecognized;<br />
};<br />
</source><br />
--><br />
<br />
=Demo pages=<br />
==OpenCV.js==<br />
This demo should be able to run on every existing Firefox browser on Ubuntu and Mac. Please try below website.<br><br />
http://people.mozilla.org/~cku/opencv/<br />
<br><br />
[[File:OpenCVJS-1.png|1080px]]<br />
<br><br />
[[File:OpenCVJS-2.png|1080px]]<br />
<br><br />
<br />
==MST with Worker and ImageBitmap==<br />
You need a customized build for below demos. Please clone this repo: https://github.com/kakukogou/gecko-dev/tree/foxeye-master<br><br />
You also need a web camera for the demos. Then build your own Firefox browser and go to below website to see the demo.<br />
<br><br />
Source code of the demo:<br><br />
https://github.com/kakukogou/foxeye-demo<br />
<br><br />
Demo website:<br><br />
http://people.mozilla.org/~tkuo/foxeye-demo/<br />
===Monitor===<br />
Monitor is design for just send the event to the Web Worker and no modification. The left one is from getUserMedia. The right one is using addWorkerMonitor to dispatch the input frame from the left one to a worker. The worker will detect the face and pass the face position and the input frame to main thread. Then the script in main thread use both information to draw the input frame via CanvasRenderingContext2D. <br />
*Case 1: Face detection<br />
This demo shows how to use |addWorkerMonitor| and CanvasRenderingContext2D to overlay a hat on someone's head.<br />
<br><br />
[[File:Monitor face.png|1080px]]<br />
<br><br />
*Case 2: QRCode<br />
[[File:Monitor qrcode.png|1080px]]<br />
<br><br />
<br />
===Processor===<br />
This is a demo to show how to use |addWorkerProcessor| to create a new MediaStreamTrack and show it in another HTMLVideoElement. There are 5 kind of image filter.<br />
<br> <br />
<br><br />
[[File:Processor copy.png|1080px]]<br />
<br><br />
[[File:Processor blur.png|1080px]]<br />
<br><br />
[[File:Processor erode.png|1080px]]<br />
<br><br />
[[File:Processor threshold.png|1080px]]<br />
<br><br />
[[File:Processor invert.png|1080px]]<br />
<br><br />
[[File:Processor gray.png|1080px]]<br />
<br><br />
<br />
<br />
<br />
<!--<br />
==Demo 1: Face tracker==<br />
===Browser:===<br />
*Input comes from HTML Image Element<br />
[[File:ProjectFoxEye BrowserFaceImage.png|720px]]<br />
*Input comes from MediaStream<br />
[[File:ProjectFoxEye BrowserFaceMS.png|720px]]<br />
<br />
===B2G on Flame:===<br />
*Input comes from HTML Image Element<br><br />
[[File:Project FoxEyeFlameFaceImage.jpg|480px]]<br><br />
*Input comes from MediaStream<br><br />
[[File:Project FoxEyeFlameFaceMS.jpg|480px]]<br><br />
<br />
==Demo 2: Text Recognition==<br />
*Browser:<br />
**Input comes from HTML Image Element<br />
[[File:ProjectFoxEye TextImage.png|720px]]<br />
**Input comes from MediaStream<br />
[[File:ProjectFoxEye TextMS.png|720px]]<br />
--><br />
<br />
=Unlimited Potentials=<br />
<!--<br />
According to "Firefox OS User Research Northern India Findings" [4], one of the key table-stake is camera related features. "Ways to provide photo & video editing tools" is what this WebAPI for. So if we can deliver some cool photo & video editing features, we can fulfill one of the needs of our target market.<br><br />
In [4], it mentioned that one of purchase motivators is educate my kids. The features like PhotoMath can satisfy the education part.<br><br />
In long term, if we can integrate text recognition with TTS(text to speech), we can help illiterate person to read words or phrase. That will be very useful features.<br><br />
Also offline text translation in camera might be a killer application too. Waygo and WordLens is two of such applications in Android and iOS.<br><br />
Text Selection in Image is also an interesting feature for browser. Project Naptha demos some potential functionality based on yext selection in Image.<br> <br />
--><br />
==FoxEye technology tree==<br />
This is the technology tree of FoxEye. The solid line is the dependency. The dash line is for performance improvement. The green blocks are possible applications. The blue blocks are API draft with prototype. The red block is we know how to do but not yet starting. The yellow blocks are only the concept. Need more study. So step by step, we can provide photo manager, wide angle panorama, HDR, and even gesture control.<br />
[[File:Multimedia Platform Team Technology Roadmap - New Page.png |1024px]]<br />
<br />
<br />
==Use Cases==<br />
*Digital Image Processing(DIP) for camera:<br />
**Face In, see [https://www.youtube.com/watch?feature=player_embedded&v=PWZUCfDsFdU Sony Face In]<br />
**Augmented Reality, see [https://www.youtube.com/watch?feature=player_embedded&v=vDNzTasuYEw IKEA AR]<br />
**Camera Panorama, <br />
**Fisheye camera, <br />
**Comic Effect,<br />
**Long term, might need Android Camera HAL 3 to control camera<br />
***Smile Snapshot<br />
***Gesture Snapshot<br />
***HDR<br />
***Video Stabilization<br />
**Bar code scanner<br />
*Photo and video editing<br />
**Video Editor, see [https://www.youtube.com/watch?feature=player_embedded&v=NJ6nYgxcuUk WeVideo on Android]<br />
**A faster way for video editing tools.<br />
**Lots of existing image effects can be used for photo and video editing.<br />
**https://www.facebook.com/thanks<br />
*Object Recognition in Image(Not only FX OS, but also broswer):<br />
**Shopping Assistant, see [https://www.youtube.com/watch?feature=player_embedded&v=B7cvlWll85Q Amazon Firefly]<br />
**Face Detection/Tracking,<br />
**Face Recognition, <br />
**Text Recognition, <br />
**Text Selection in Image, <br />
***See http://projectnaptha.com/<br />
**Text Inpainting,<br />
**Image Segmentation,<br />
**Text translation on image, see [https://www.youtube.com/watch?feature=player_embedded&v=9UalhhWBPH0 Waygo]<br />
*Duo Camera:<br />
**Nature Interaction(Gesture, Body Motion Tracking)<br />
**Interactive Foreground Extraction<br />
and so on....<br />
<br />
==Some cool applications we can refer in real worlds==<br />
*Word Lens: <br />
**https://play.google.com/store/apps/details?id=com.questvisual.wordlens.demo<br />
**https://itunes.apple.com/tw/app/word-lens/id383463868?mt=8<br />
*Waygo<br />
**http://www.waygoapp.com/<br />
*PhotoMath<br />
**https://photomath.net/<br />
*Cartoon Camera<br />
**https://play.google.com/store/apps/details?id=com.fingersoft.cartooncamera<br />
*Photo Studio<br />
**http://photo-studio.en.uptodown.com/android<br />
*Magisto<br />
**https://play.google.com/store/apps/details?id=com.magisto<br />
*Adobe PhotoShop Express<br />
**http://www.photoshop.com/products/photoshopexpress<br />
*Amazon(firefly app)<br />
**https://play.google.com/store/apps/details?id=com.amazon.mShop.android<br />
<br />
<!--<br />
=Task Dependency=<br />
Below is the dependency graph for text recognition work.<br />
<br><br />
[[File:Project FoxEyeTextRecgDependency.png]]<br />
<br><br />
Below is the dependency graph for camera efficts work.<br />
<br><br />
[[File:Project FoxEyeCameraDependency.png]]<br />
<br><br />
Below is the dependency graph for video editor work.<br />
<br><br />
[[File:Project FoxEyeVideoEditorDependency.png]]<br />
<br><br />
--><br />
<br />
<!--<br />
=Comparison=<br />
==Canvas2DContext==<br />
Currently, you can do video effect by Canvas2DContext. See the demo made by [4]. The source code looks like below.<br />
<source><br />
function frameConverter(video,canvas) {<br />
<br />
// Set up our frame converter<br />
this.video = video;<br />
this.viewport = canvas.getContext("2d");<br />
this.width = canvas.width;<br />
this.height = canvas.height;<br />
// Create the frame-buffer canvas<br />
this.framebuffer = document.createElement("canvas");<br />
this.framebuffer.width = this.width;<br />
this.framebuffer.height = this.height;<br />
this.ctx = this.framebuffer.getContext("2d");<br />
// Default video effect is blur<br />
this.effect = JSManipulate.blur;<br />
// This variable used to pass ourself to event call-backs<br />
var self = this;<br />
// Start rendering when the video is playing<br />
this.video.addEventListener("play", function() {<br />
self.render();<br />
}, false);<br />
<br />
// Change the image effect to be applied <br />
this.setEffect = function(effect){<br />
if(effect in JSManipulate){<br />
this.effect = JSManipulate[effect];<br />
}<br />
}<br />
<br />
// Rendering call-back<br />
this.render = function() {<br />
if (this.video.paused || this.video.ended) {<br />
return;<br />
}<br />
this.renderFrame();<br />
var self = this;<br />
// Render every 10 ms<br />
setTimeout(function () {<br />
self.render();<br />
}, 10);<br />
};<br />
<br />
// Compute and display the next frame <br />
this.renderFrame = function() {<br />
// Acquire a video frame from the video element<br />
this.ctx.drawImage(this.video, 0, 0, this.video.videoWidth,<br />
this.video.videoHeight,0,0,this.width, this.height);<br />
var data = this.ctx.getImageData(0, 0, this.width, this.height);<br />
// Apply image effect<br />
this.effect.filter(data,this.effect.defaultValues);<br />
// Render to viewport<br />
this.viewport.putImageData(data, 0, 0);<br />
return;<br />
};<br />
};<br />
<br />
// Initialization code<br />
video = document.getElementById("video");<br />
canvas = document.getElementById("canvas");<br />
fc = new frameConverter(video,canvas);<br />
...<br />
// Change the image effect applied to the video<br />
fc.setEffect('edge detection');<br />
<br />
</source><br />
Basically, the idea is use |drawImage| to acquire frame from video and draw it to canvas. Then call |getImageData| to get the data and process the image. After that, put the computed data back to the canvas and display it.<br><br />
<br />
Compare to this approach, the proposed WebAPI has below advantages:<br />
* Not polling mechanism.<br />
** We use callback function to process all frames.<br />
<br />
==node-opencv==<br />
https://github.com/peterbraden/node-opencv<br />
"OpenCV bindings for Node.js. OpenCV is the defacto computer vision library - by interfacing with it natively in node, we get powerful real time vision in js."<br />
The sample codes looks like below:<br />
*You can use opencv to read in image files. Supported formats are in the OpenCV docs, but jpgs etc are supported.<br />
<source lang="javascript"><br />
cv.readImage(filename, function(err, mat){<br />
mat.convertGrayscale()<br />
mat.canny(5, 300)<br />
mat.houghLinesP()<br />
})<br />
</source><br />
*If however, you have a series of images, and you wish to stream them into a stream of Matrices, you can use an ImageStream. Thus:<br />
<source lang="javascript"><br />
var s = new cv.ImageStream()<br />
s.on('data', function(matrix){<br />
matrix.detectObject(haar_cascade_xml, opts, function(err, matches){})<br />
})<br />
ardrone.createPngStream().pipe(s);<br />
</source><br />
<br />
==opencvjs==<br />
https://github.com/blittle/opencvjs<br><br />
It is a project to compile opencv to asm.js. Might be a dead project now.<br />
==Project Naptha==<br />
"Project Naptha automatically applies state-of-the-art computer vision algorithms on every image you see while browsing the web. The result is a seamless and intuitive experience, where you can highlight as well as copy and paste and even edit and translate the text formerly trapped within an image." Quoted from http://projectnaptha.com/ .<br />
===How it works?===<br />
Excerpt from Project Naptha:<br />
The primary feature of Project Naptha is actually the text detection, rather than optical character recognition. The author write a text detection algorithm called Stroke Width Transform, invented by Microsoft Research in 2008, which is capable of identifying regions of text in a language-agnostic manner in WebWorker. Once a user begins to select some text, it scrambles to run character recognition algorithms in order to determine what exactly is being selected. The default OCR engine is a built-in pure-javascript port of the open source Ocrad OCR engine. There’s the option of sending the selected region over to a cloud based text recognition service powered by Tesseract, Google’s (formerly HP’s) award-winning open-source OCR engine which supports dozens of languages, and uses an advanced language model.<br />
--><br />
<!--<br />
=Open Source Library and Licenses =<br />
*OpenCV: OpenCV is an open-source BSD-licensed library that includes several hundreds of computer vision algorithms.<br />
*Tesseract-OCR: Apache License v2.0. Tesseract is probably the most accurate open source OCR engine available.<br />
**Leptonica: Leptonica is a pedagogically-oriented open source site containing software that is broadly useful for image processing and image analysis applications.<br />
<br />
<br>"The Leptonica copyright is less restrictive still. We use the BSD 2-clause license, which is similar to the Apache license but much briefer. Kirk McKusick playfully called the BSD license a copycenter, as differentiated from the usual copyright and the GPL copyleft: "Take it down to the copy center and make as many copies as you want." The BSD restrictions can be approximately summarized as: (1) Don't pretend that you wrote this, and (2) Don't sue us if it doesn't work.<br> For Leptonica, as with Apache and BSD, modifications of the source code that are used in commercial products can be made without any obligation to make them available in open source. (This does happen voluntarily -- it's likely that the majority of issues noted by people working on commercial products are reported back, often with patches.) The Leptonica license only requires that any use of the source, whether in original or modified form, must include the Leptonica copyright notice, and that modified versions must be clearly marked as such. "<br />
<br />
<br />
=What I have done=<br />
*Initialize, plan and implement this project.<br />
*Write a prototype of WebIDL for WebDIP.<br />
**MediaStream as source node and destentaion node for WebDIP.<br />
**For HTMLImageElement part as source node, there is a temporal solution for it.<br />
**Have face detection node. Can be used in MediaStream and HTMLImageElement on both browser and B2G flame.<br />
**Have text detection/recognization node. Can be used in MediaStream and HTMLImageElement on browser.<br />
--><br />
<br />
<!--<br />
=To Do List=<br />
*Extend MediaStreamTrack API? =>CTai<br />
**Need to integrate with Canvas2DContext and WebGL.<br />
*OfflineMediaContext study. =>Kaku(our new hire!!)<br />
*An API for image processing and object detection. =>TBD<br />
**We need such API for B2G privilege applications(or opencv-asm.js for general APPs).<br />
*How to compile OpenCV to asm.js =>Kaku, CJay<br />
**Try to figure out how to pass the ImageBitmap from VideoWroker to OpenCV-asm.js<br />
*Compare native OpenCV/Tesseract with asm.js version. =>TBD<br />
--><br />
<br />
<!--<br />
=Fixme List(Known Issues)=<br />
*OpenCV can't build with STLPort, only support GNUSTL.<br />
**B2G can't build with GNUSTL.<br />
*Text Detection and Recognition can't run on B2G.<br />
**Some OpenCV API use STL as arguments. The unalignment STL will cause runtime error.<br />
*Tesseract-OCR Build<br />
**Use pre-installed Tesseract-OCR now. Maybe we should support source code build of Tesseract-OCR.<br />
*Improve precision rate of text recognition.<br />
**The actual precision rate should be higher than my roughly prototype. Need improve it.<br />
*Separate OCR initialized.<br />
**Prevent redundant initialization.<br />
*Haven't done OpenCL integration in Gecko.<br />
**OpenCV has a lot of OpenCL integration. We should take advantage from it.<br />
*Canvas2DContext, WebGL can't run on worker.<br />
**Need bug 801176 and bug 709490 landed.<br> <br />
*Need ImageBitmap for VideoWorkerEvent.<br />
**Need bug 1044102 landed.<br><br />
--><br />
<br />
=Current Status=<br />
*Meta bug: https://bugzilla.mozilla.org/show_bug.cgi?id=1100203<br />
*MediaStream with worker: editor draft completed, refining the prototype for review process. See https://bugzilla.mozilla.org/show_bug.cgi?id=1044102<br />
*ImageBitmap: In review process. See https://bugzilla.mozilla.org/show_bug.cgi?id=801176<br />
*ImageBitmap extension: editor draft completed, refining the prototype for review process. See https://bugzilla.mozilla.org/show_bug.cgi?id=1141979<br><br />
*OpenCV.js: Working on it. See http://people.mozilla.org/~cku/opencv/<br />
*OfflineMediaContext: Not yet started.<br />
*WebImage:Not yet started.<br />
*Run WebGL on worker: in review process. See https://bugzilla.mozilla.org/show_bug.cgi?id=709490<br><br />
*CanvasRenderingContext2D in Worker: waiting for someone take this bug.See https://bugzilla.mozilla.org/show_bug.cgi?id=801176<br><br />
=Next Phase(2015 H2)=<br />
*Try to land and standardize MediaStream with worker, ImageBitmap and ImageBitmap extension. see[4] for how to process standardization in Mozilla.<br />
*Design a automation way to export JavaScript API for OpenCV.js. Try to upstream it to OpenCV code base.<br />
*Start to work on OfflineMediaContext.<br />
*Support product requirement, for example, Instagram-like app, wide angle panorama or Fox photo.<br />
*Do some explanatory experiment on WebImage concept.<br />
*Initialize a sub-project called [https://wiki.mozilla.org/Project_GlovePuppetry Project GlovePuppetry]<br />
<br />
=Beyond 2015=<br />
*Proof of Concept for WebImage.<br />
*A crazy idea of Kernel.js or SPIR.js for JS developers to customize WebImage ?<br />
*Gestural control API with depth camera? => WebNI(Web Nature Interaction)?<br />
*[https://wiki.mozilla.org/Project_Cangjie Project Cangjie]<br />
<br />
=Conclusion=<br />
Step by step, we can make thing different. FoxEye is a pioneer project trying to push the Web boundary to a new area, image processing and computer vision. The key factor to ensuring this project success is you, every Web contributor. Your supports, your promotions, your feedback, your comments are the nutrition of this project. Please share this project to every Web developers. Let's bring more and more amazing things to the Web. Let's rich and improve the whole Web platform competitiveness.<br />
<br />
=References=<br />
*[1]:"WebAudio Spec", http://www.w3.org/TR/webaudio/<br />
*[2]:"Media Capture Stream with Worker", http://chiahungtai.github.io/mediacapture-worker/<br />
*[3]:"ImageBitmap Extensions", http://kakukogou.github.io/spec-imagebitmap-extension/<br />
*[4]:"Mozilla Standards", https://wiki.mozilla.org/Standards<br />
<br />
=Acknowledgements=<br />
This whole idea of adopting WebAudio as the reference design for this project was from a conversation between John Lin. Thanks for Robert O'Callahan's great feedback and comments. Thanks for John Lin and Chia-jung Hung's useful suggestions and ideas. Also, big thanks to my team members who help me to debug the code.<br />
<br />
=About Authors=<br />
==CTai==<br />
My name is [https://tw.linkedin.com/pub/chia-hung-tai/36/8b6/712 Chia-hung Tai]. I am a senior software engineer in Mozilla Taipei office. I work on Firefox OS multimedia stuffs. Before this jobs, I have some experience in OpenCL, NLP(Nature Language Processing), Data Mining and Machine Learning. My IRC nickname is ctai. You can find me in #media, #mozilla-taiwan, and #b2g channels. Also you can reach me via email(ctai at mozilla dot com).<br />
<br />
==Kaku==<br />
[https://tw.linkedin.com/in/kakukogou Tzuhuo Kuo] is an engineer in Mozilla Taipel office.<br />
==CJ Ku==<br />
[https://www.linkedin.com/pub/cj-ku/62/55b/a1b CJ Ku] is responsible for OpenCV.js part.</div>Ctaihttps://wiki.mozilla.org/index.php?title=File:Worker_-_FLOW.png&diff=1108876File:Worker - FLOW.png2015-12-09T23:19:37Z<p>Ctai: </p>
<hr />
<div></div>Ctaihttps://wiki.mozilla.org/index.php?title=User_talk:Dead_project&diff=1108875User talk:Dead project2015-12-09T23:19:17Z<p>Ctai: /* MediaStreamTrack with Worker: */</p>
<hr />
<div>=Abstract=<br />
The goal of this project is bringing the power of computer vision and image processing to the Web. By extending the spec of Media Capture and Streams, the web developers can write video processing related applications in better way. The primary idea is to incorporate Worker-based JavaScript video processing with MediaStreamTrack. The user's video processing script can do the real image processing and analysis works frame by frame. We also propose an extension of ImageBitmap to allow better optimization possibility in both implementation and Web developer side. To support video editor case, we would like to introduce OfflineMediaContext in next phase. We also want to explore the concept of WebImage, a hardware accelerated-able Web API for performance improvement chance. By accomplishing these API step by step, we believe we can improve the Web platform competitiveness a lot on image processing and computer vision area.<br />
<br />
=Introduction=<br />
To get a quick understand what is project FoxEye. Please see below file:<br><br />
'''The latest one:'''<br />
*FoxEye Briefing: [https://docs.google.com/presentation/d/1Ra5bIeMfSEQi5kd_bGH5Vx-pa9ahpAUTFDxTeBA5L0o/edit?usp=sharing Briefing]<br />
*Presentation files in Whistler Work Week:<br />
**Project FoxEye Status Update: [https://docs.google.com/presentation/d/1vnR5JOWpEgKN3KZGV2SLxscqSnuBOF2tT_dTolLRYI4/edit?usp=sharing FoxEye]<br />
**FoxEye Cross Firefox OS:[https://docs.google.com/presentation/d/1ck32VoikWkkFXkFfNK91S22nGRp9fSEj6Oz6846May4/edit?usp=sharing Use case]<br />
*'''Latest demo in Youtube: [https://www.youtube.com/watch?v=prybkXsTGXY FoxEye 2015 H1 demo]'''<br />
<br />
'''Outdated'''<br />
*Presentation file in Portland Work Week.[[File:Project FoxEye Portland Work Week.pdf]]<br><br />
*Presentation file in P2PWeb WorkShop.[[File:Project FoxEye 2015-Feb.pdf]]<br><br />
*Youtube: https://www.youtube.com/watch?v=TgQWEWiGaO8<br><br />
<br />
The needs for image processing and computer vision is increasing in recent years. The introduction of video element and media stream in HTML5 is very important, allowing for basic video playback and WebCam ability. But it is not powerful enough to handle complex video processing and camera application. Especially there are tons of mobile camera, photo and video editor applications show their creativity by using OpenCV etc in Android. It is a goal of this project to include the capabilities found in modern video production and camera applications as well as some of the processing, recognition and stitching tasks. <br />
<br><br />
This API is inspired by the WebAudio API[1]. Unlike WebAudio API, we try to reach the goal by modifying existing Media Capture and Streams API. The idea is adding some functions to associate the Woker-based script with MediaStreamTrack. Then the script code of Worker runs image processing and/or analysis frame by frame. Since we move the most of processing work to Worker, the main thread will not be blocked.<br />
<br />
[[File:FoxEye - Overview.png|800px]]<br />
<br><br />
Basically, the spirit of this project is four parts. The first part is extend the MediaStreamTrack to associate a Worker. This part provide a way to do image processing job frame by frame. The second part is ImageBitmap extension. This part extended the ImageBitmap interface to allow JavaScript developer to read underlying data out and set an external data into an ImageBitmap in a set of supported color formats. The third part is OfflineMediaContext. It is for offline stream to render as fast as possible. The last part is WebImage, it is a hardware accelerated-able API on computer vision area. The Web developers can use it to combine high performance vision processing.<br />
<br><br />
Thanks for the amazing asm.js and emscripten work, we also provide an asm.js version of OpenCV called OpenCV.js. The web developers can leverage the power of OpenCV in a simpler way. <br />
<br><br />
<br />
=Design Principle=<br />
*Follow [https://extensiblewebmanifesto.org/ The Extensible Web Manifesto]<br />
The original design of this project is WebAudio like design called WebVideo. But is is deprecated due to we want to follow The Extensible Web Manifesto. We believe this kind of design is best for the Web. Take below examples to against WebVideo design.<bt><br />
Computer vision area is more thriving than ever after deep learning introduced. We can't enumerate video nodes for all kinds of creativeness for the future. For example, suppose we enumerate all we can image in current time. But still something we can't image. Maybe we lost an area called image sentiment analysis. I take this as an example because I just known it recently. Image sentiment analysis is trying to classify the emotion from human face in image. So should we add this as a new video node? Take another extreme example, we think we would not have argument on face detection or face recognition. But what we talk now is only for human face. What if we someone think we need to detect and recognize animal face? Sounds like a ridiculous idea right? But it is a real business. A product called "Bistro: A Smart Feeder Recognizes Your Cat's Face"[2] was on indiegogo. So should we extend the original face detection/recognition node or add a new video node for cat or ask them to implement it in VideoWorker? If we ask them to implement in VideoWorker, then we might need to re-consider are those existing video node make sense? Should we move them all into VideoWorker and implement them in JS? If yse, the WebVideo will degenerate to only source node, video worker node and detestation node. Compare to degenerated WebVideo, the MediaStream with worker design is more elegant. <br />
<br />
*Performance and power consumption do matter<br />
This project is initialized by the FxOS team which means the performance and power consumption for mobile devices are considered from day one.<br />
<br />
=Concept=<br />
==MediaStreamTrack with Worker: ==<br />
The new design is a simple and minimal change for current API. By extending MediaStreamTrack and adding Worker related API, we can let MediaStream be able to support video processing functionality through the script code in Worker. Below is the draft WebIDL codes. Please see [2] for the draft specification.<br />
<source lang="webidl"><br />
[Constructor(DOMString scriptURL)]<br />
<br />
interface VidoeMonitor : EventTarget {<br />
attribute EventHandler onvideomonitor;<br />
};<br />
<br />
interface VideoProcessor : EventTarget {<br />
attribute EventHandler onvideoprocess;<br />
};<br />
partial interface MediaStreamTrack {<br />
void addVideoMonitor(VidoeMonitor monitor);<br />
void removeVideoMonitor(VidoeMonitor monitor);<br />
MediaStreamTrack addVideoProcessor(VidoeProcessor processor);<br />
void removeVideoProcessor();<br />
};<br />
<br />
[Exposed=(Window, Worker),<br />
Constructor(DOMString type, optional VideoMonitorEventInit videoMonitorEventInitDict)]<br />
interface VideoMonitorEvent : Event {<br />
readonly attribute DOMString trackId;<br />
readonly attribute double playbackTime;<br />
readonly attribute ImageBitmap? inputImageBitmap;<br />
};<br />
<br />
[Exposed=(Window, Worker),<br />
Constructor(DOMString type, optional VideoProcessorEventInit videoProcessorEventInitDict)]<br />
interface VideoProcessEvent : VideoMonitorEvent {<br />
readonly attribute ImageBitmap? outputImageBitmap;<br />
};<br />
<br />
</source><br />
<br><br />
Main thread:<br><br />
[[File:NewProjectFoxEye1.png|1024px]]<br><br />
<br />
Worker Thread<br><br />
[[]]<br><br />
===Example Code ===<br />
Please check the section [http://chiahungtai.github.io/mediacapture-worker/index.html#Examples examples in MediaStreamTrack with worker].<br />
<br />
==ImageBitmap extensions==<br />
Please see [3] for more information.<br />
<br />
==WebImage:==<br />
Why do we need WebImage? Because performance is matter to prompt computer vision to the Web. We need it for performance reason. Most image processing task can be accelerated by WebGL. But it is not the case for computer vision area. So we need a hardware accelerated-able computer vision related web API to help Web developer deliver a fast and portable Web page and application. That is the motivation of WebImage.<br />
<br><br />
This is a new zone needed to explore. So we might start from refer to existing related API. In end of 2014, KHRONOS released a portable, power efficient vision processing API, called OpenVX. This specification might be a good start point for WebImage.<br />
<br><br />
The following diagram is a brief concept of OpenVX. It is a graph-node architecture. The role of OpenVX for computer vision is like the role of OpenGL for graph. The developer just construct and execute the graph to process incoming image.<br />
<br><br />
[[File:OpenVX-NodeGFX.PNG|600px]]<br><br />
<!--[[File:Project FoxEyeWebImage1.png|800px]]<br>--><br />
OpenVX can be implemented by OpenCL, OpenGL|ES with compute shader and C/C++ with SIMD. That means we can support OpenVX in wide range Web platform from PC to mobile once a OpenVX engine is provided.<br />
<br />
[[File:OpenVX.PNG|600px]]<br><br />
<br />
==OfflineMediaContext:==<br />
We introduce the “OfflineMediaContext” and modify the “MediaStream” here to enable the offline (as soon as possible) MediaStream processing. When developers are going to perform the offline MediaStream processing, they need to form a context which will hold and keep the relationship of all MediaStreams that are going to be processed together.<br />
<br />
<source lang="c++"><br />
<br />
// typedef unsigned long long DOMTimeStamp;<br />
interface OfflineMediaContext {<br />
void start(DOMTimeStamp durationToStop);<br />
attribute EventHandler onComplete;<br />
};<br />
// Add an optional argument into the constructor.<br />
[Constructor (optional OfflineMediaContext context),<br />
Constructor (MediaStream stream, optional OfflineMediaContext context),<br />
Constructor (MediaStreamTrackSequence tracks, optional OfflineMediaContext context)]<br />
interface MediaStream : EventTarget {<br />
// No modification.<br />
...<br />
}<br />
<br />
</source><br />
*OfflineMediaContext is the holdplace of all MediaStreams which are going to be processed together in a non-realtime rate.<br />
*OfflineMediaContext is also the object who can trigger the non-realtime processing.<br />
*OfflineMediaContext should be instantiated first and then MediaStreams, which are going to be processed together in the same context, could be instantiated by passing the pre-instantiated OfflineMediaContext object into the constructor of MediaStreams. (See the modified MediaStream constructor below)<br />
*The native implementation of OfflineMediaContext holds a non-realtime MediaStreamGraph, which is just the same as the OfflineAudioContext in WebAudio specification.<br />
*The constructors are modified by adding an optional parameter, OfflineMediaContext. By this way, developers are able to associate a MediaStream to an OfflineMediaContext.<br />
*If the optional OfflineMediaContext is given, the native implementation of the MediaStream should be hooked to the non-realtime MSG hold by the given OfflineMediaContext in stead of the global realtime MSG.<br />
*If the optional OfflineMediaContext is given, we need to check whether the new created MediaStream is able to be processed in offline rate or not. If not, the constructor should throw an error and return a NULL. (Constructors are always allowed to throw.)<br />
<br />
==OpenCV.js==<br />
*OpenCV + Emscripten = OpenCV.js<br />
*https://github.com/CJKu/opencv <br />
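As a rough sketch of the kind of code this enables (the cv.* names below are illustrative assumptions; the actual bindings exported by the repository above may differ):<br />
<source lang="javascript"><br />
// Illustrative sketch of calling Emscripten-compiled OpenCV from JS.<br />
// The exact exported API of OpenCV.js may differ from these names.<br />
var src = cv.imread('inputCanvas');       // read pixels from a canvas<br />
var gray = new cv.Mat();<br />
cv.cvtColor(src, gray, cv.COLOR_RGBA2GRAY);<br />
var edges = new cv.Mat();<br />
cv.Canny(gray, edges, 50, 150);<br />
cv.imshow('outputCanvas', edges);         // draw the result to a canvas<br />
<br />
// Emscripten-managed memory must be released manually.<br />
src.delete(); gray.delete(); edges.delete();<br />
</source><br />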
<!--<br />
==Deprecated Design ==<br />
*Modular Routing<br />
Modular routing allows arbitrary connections between different DIPNode(TBD) objects. Each node can have inputs and/or outputs. A source node has no inputs and a single output. A destination node has one input and no outputs. Other nodes such as filters can be placed between source and destination nodes.<br />
<br />
Here is a example for face detection work on ImageElement:<br><br />
PS. Right now, I haven't finished the work of ScriptNode. So the draw rectangle part is skipped in sample code.<br />
<br>[[File:Project FoxEye1.png|720px]]<br><br />
<big>Example 1:</big><br />
<source lang="javascript"><br />
var context = new DIPContext();<br />
var imageElem = document.getElementById('imgelemsrc');<br />
var source = context.createImageElementSource(imageElem);<br />
var facedetect = context.createFaceDetection();<br />
source.connect(facedetect);<br />
var dest = context.createMediaStreamDestination();<br />
facedetect.connect(dest);<br />
var elem = document.getElementById('videoelem');<br />
elem.mozSrcObject = dest.stream;<br />
elem.play();<br />
<br />
</source><br />
<br />
<br />
Another example to show that some nodes might support callback function to pass more information rather than image.<br />
<br>[[File:Project FoxEye2.png|720px]]<br><br />
<big>Example 2:</big><br />
<source lang="javascript"><br />
var context = new DIPContext();<br />
var imageElem = document.getElementById('imgelemsrc');<br />
var source = context.createImageElementSource(imageElem);<br />
var textdetect = context.createTextDetection();<br />
source.connect(textdetect);<br />
var dest = context.createMediaStreamDestination();<br />
textdetect.connect(dest);<br />
textdetect.ontextrecognized = function (e){<br />
var text = e.recognizedText;<br />
var go2google = document.getElementById('go2Google');<br />
go2google.href = "https://www.google.com.tw/search?q=" + text<br />
var go2IMDB = document.getElementById('go2IMDB');<br />
go2IMDB.href = "http://www.imdb.com/find?ref_=nv_sr_fn&q=" + text<br />
var go2Amazon = document.getElementById('go2Amazon');<br />
go2Amazon.href = "http://www.amazon.com/s/ref=nb_sb_noss?url=search-alias%3Daps&field-keywords=" + text<br />
var go2TranslateEngTC = document.getElementById('go2TranslateEngTC');<br />
go2TranslateEngTC.href = "https://translate.google.com.tw/#en/zh-TW/" + text<br />
}<br />
var elem = document.getElementById('videoelem');<br />
elem.mozSrcObject = dest.stream;<br />
elem.play();<br />
</source><br />
An ideally example to combine ScriptNode with Canvas2DContext.<br><br />
This is an example trying to do on fly camera translation like "Word Lens" and "Waygo".<br><br />
Haven't finish the implementation for this example.<br><br />
<br>[[File:Project FoxEye3.png|720px]]<br><br />
<big>Example 3:</big><br />
<source lang="javascript"><br />
var context = new DIPContext();<br />
var myMediaStream;<br />
navigator.getUserMedia({video:true, audio:false}, function(localMediaStream){<br />
myMediaStream = localMediaStream;<br />
var source = context.createMediaStreamSource(myMediaStream);<br />
var textRecoginition = context.createTextRecoginition();<br />
source.connect(textRecoginition);<br />
var textInpaint = context.createTextInpaint();<br />
textRecoginition.connect(textInpaint);<br />
var scriptTranslate = context.createScriptProcessor();<br />
textdetect.ontextrecognized = function (e){<br />
var text = e.recognizedText;<br />
// Custom parameter -<br />
scriptTranslate.addParameter( "text", text );<br />
}<br />
scriptTranslate.onimageprocess= function (e) {<br />
var text = e.parameters.text;<br />
// Translate to other language....<br />
var newText = Translate("Eng", "TC");<br />
var input = e.inputImage;<br />
var canvas = document.getElementsByTagName('canvas')[0];<br />
var context2D = canvas.getContext('2d');<br />
context2D.drawImage(input, 0, 0);<br />
context2D.strokeText(newText, 0, 0);<br />
// get an empty slate to put the data into<br />
var output = context.createImageData(canvas.width, canvas.height);<br />
e.outputImage = output;<br />
}<br />
textInpaint.connect(scriptTranslate);<br />
var dest = createMediaStreamDestination();<br />
scriptTranslate.connect(dest);<br />
var video = document.getElementById(‘videoelem’);<br />
video.mozSrcObject = dest.stream;<br />
}, null);<br />
<br />
</source><br />
<br />
<!--<br />
=The API=<br />
Still under construction....<br />
==New design==<br />
===VideoContext===<br />
<source><br />
[Constructor]<br />
interface VideoContext : EventTarget {<br />
readonly attribute VideoDestinationNode destination;<br />
// DIPNode creation<br />
MediaStreamVideoDestinationNode createMediaStreamDestination();<br />
MediaStreamVideoSourceNode createMediaStreamSource(MediaStream mediaStream);<br />
VideoWorkerNode createVideoWorker(DOMString scriptURL);<br />
};<br />
</source><br />
===VideoNode===<br />
<source><br />
interface VideoNode: EventTarget {<br />
void connect(VideoNode destination);<br />
void disconnect();<br />
readonly attribute VideoContext context;<br />
};<br />
</source><br />
<br />
===VideoWorkerNode===<br />
Still thinking the type of inputImage/outputImage.<br />
<source><br />
interface VideoProcessEvent : Event {<br />
readonly attribute ImageData inputImage;<br />
readonly attribute ImageData outputImage;<br />
readonly attribute object parameters; <br />
};<br />
<br />
interface VideoWorkerNode: VideoNode {<br />
attribute EventHandler onimageprocess;<br />
};<br />
</source><br />
<br />
==Deprecated design==<br />
===DIPContext===<br />
<source><br />
[Constructor]<br />
interface DIPContext : EventTarget {<br />
readonly attribute DIPDestinationNode destination;<br />
// DIPNode creation<br />
MediaStreamDIPDestinationNode createMediaStreamDestination();<br />
ImageElementDIPSourceNode createImageElementSource(HTMLImageElement imageElement);<br />
MediaStreamDIPSourceNode createMediaStreamSource(MediaStream mediaStream);<br />
FaceDetectionNode createFaceDetection();<br />
TextDetectionNode createTextDetection();<br />
};<br />
</source><br />
===DIPNode===<br />
<source><br />
interface DIPNode : EventTarget {<br />
void connect(DIPNode destination, optional unsigned long output = 0, optional unsigned long input = 0);<br />
void disconnect(optional unsigned long output = 0);<br />
readonly attribute DIPContext context;<br />
readonly attribute unsigned long numberOfInputs;<br />
readonly attribute unsigned long numberOfOutputs;<br />
};<br />
</source><br />
<br />
===TextDetectionNode===<br />
<source><br />
interface RecognizedTextEvent : Event {<br />
readonly attribute DOMString recognizedText;<br />
};<br />
<br />
interface TextDetectionNode : DIPNode {<br />
attribute EventHandler ontextrecognized;<br />
};<br />
</source><br />
--><br />
<br />
=Demo pages=<br />
==OpenCV.js==<br />
This demo should run on any existing Firefox browser on Ubuntu and Mac. Please try the website below.<br><br />
http://people.mozilla.org/~cku/opencv/<br />
<br><br />
[[File:OpenCVJS-1.png|1080px]]<br />
<br><br />
[[File:OpenCVJS-2.png|1080px]]<br />
<br><br />
<br />
==MST with Worker and ImageBitmap==<br />
You need a customized build for the demos below. Please clone this repo: https://github.com/kakukogou/gecko-dev/tree/foxeye-master<br><br />
You also need a web camera for the demos. Then build your own Firefox browser and go to the website below to see the demos.<br />
<br><br />
Source code of the demo:<br><br />
https://github.com/kakukogou/foxeye-demo<br />
<br><br />
Demo website:<br><br />
http://people.mozilla.org/~tkuo/foxeye-demo/<br />
===Monitor===<br />
The monitor is designed to just send events to the Web Worker, without modifying the stream. The left video comes from getUserMedia. The right one uses addWorkerMonitor to dispatch each input frame from the left one to a worker. The worker detects the face and passes the face position and the input frame back to the main thread. The script in the main thread then uses both pieces of information to draw the input frame via CanvasRenderingContext2D. <br />
*Case 1: Face detection<br />
This demo shows how to use |addWorkerMonitor| and CanvasRenderingContext2D to overlay a hat on someone's head.<br />
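Below is a minimal sketch of the monitor flow, assuming this demo's |addWorkerMonitor| naming. The exact signature, the message format, and the detectFace() call inside the worker are illustrative assumptions, not the demo's actual source.<br />
<source lang="javascript"><br />
// Main thread: route camera frames to a worker, draw results on a canvas.<br />
// |addWorkerMonitor| follows this demo's naming; its exact signature is<br />
// an assumption based on the MediaStreamTrack-with-worker draft.<br />
var worker = new Worker('face-monitor-worker.js');<br />
navigator.getUserMedia({video: true, audio: false}, function (stream) {<br />
  stream.getVideoTracks()[0].addWorkerMonitor(worker);<br />
}, null);<br />
<br />
var ctx = document.getElementById('overlay').getContext('2d');<br />
worker.onmessage = function (e) {<br />
  // The worker posts back the frame plus the detected face position.<br />
  ctx.drawImage(e.data.frame, 0, 0);<br />
  ctx.strokeRect(e.data.face.x, e.data.face.y, e.data.face.w, e.data.face.h);<br />
};<br />
<br />
// face-monitor-worker.js: receive each frame, detect, post back.<br />
onvideomonitor = function (event) {<br />
  var face = detectFace(event.inputImageBitmap); // placeholder detector<br />
  postMessage({frame: event.inputImageBitmap, face: face});<br />
};<br />
</source><br />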
<br><br />
[[File:Monitor face.png|1080px]]<br />
<br><br />
*Case 2: QRCode<br />
[[File:Monitor qrcode.png|1080px]]<br />
<br><br />
<br />
===Processor===<br />
This demo shows how to use |addWorkerProcessor| to create a new MediaStreamTrack and show it in another HTMLVideoElement. There are five kinds of image filters, plus a plain copy.<br />
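A sketch of the processor flow follows, again assuming this demo's |addWorkerProcessor| naming; the pixel accessors in the worker are placeholders for whatever the ImageBitmap extensions end up exposing.<br />
<source lang="javascript"><br />
// Main thread: create a processed track and show it in a second video.<br />
var worker = new Worker('invert-worker.js');<br />
navigator.getUserMedia({video: true, audio: false}, function (stream) {<br />
  var processedTrack = stream.getVideoTracks()[0].addWorkerProcessor(worker);<br />
  var processedStream = new MediaStream([processedTrack]);<br />
  document.getElementById('processedVideo').mozSrcObject = processedStream;<br />
}, null);<br />
<br />
// invert-worker.js: fill outputImageBitmap with the inverted frame.<br />
onvideoprocess = function (event) {<br />
  var pixels = readPixels(event.inputImageBitmap);  // placeholder accessor<br />
  for (var i = 0; i < pixels.length; i += 4) {<br />
    pixels[i]     = 255 - pixels[i];      // R<br />
    pixels[i + 1] = 255 - pixels[i + 1];  // G<br />
    pixels[i + 2] = 255 - pixels[i + 2];  // B<br />
  }<br />
  writePixels(event.outputImageBitmap, pixels);     // placeholder accessor<br />
};<br />
</source><br />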
<br> <br />
<br><br />
[[File:Processor copy.png|1080px]]<br />
<br><br />
[[File:Processor blur.png|1080px]]<br />
<br><br />
[[File:Processor erode.png|1080px]]<br />
<br><br />
[[File:Processor threshold.png|1080px]]<br />
<br><br />
[[File:Processor invert.png|1080px]]<br />
<br><br />
[[File:Processor gray.png|1080px]]<br />
<br><br />
<br />
<br />
<br />
<!--<br />
==Demo 1: Face tracker==<br />
===Browser:===<br />
*Input comes from HTML Image Element<br />
[[File:ProjectFoxEye BrowserFaceImage.png|720px]]<br />
*Input comes from MediaStream<br />
[[File:ProjectFoxEye BrowserFaceMS.png|720px]]<br />
<br />
===B2G on Flame:===<br />
*Input comes from HTML Image Element<br><br />
[[File:Project FoxEyeFlameFaceImage.jpg|480px]]<br><br />
*Input comes from MediaStream<br><br />
[[File:Project FoxEyeFlameFaceMS.jpg|480px]]<br><br />
<br />
==Demo 2: Text Recognition==<br />
*Browser:<br />
**Input comes from HTML Image Element<br />
[[File:ProjectFoxEye TextImage.png|720px]]<br />
**Input comes from MediaStream<br />
[[File:ProjectFoxEye TextMS.png|720px]]<br />
--><br />
<br />
=Unlimited Potentials=<br />
<!--<br />
According to "Firefox OS User Research Northern India Findings" [4], one of the key table-stake is camera related features. "Ways to provide photo & video editing tools" is what this WebAPI for. So if we can deliver some cool photo & video editing features, we can fulfill one of the needs of our target market.<br><br />
In [4], it mentioned that one of purchase motivators is educate my kids. The features like PhotoMath can satisfy the education part.<br><br />
In long term, if we can integrate text recognition with TTS(text to speech), we can help illiterate person to read words or phrase. That will be very useful features.<br><br />
Also offline text translation in camera might be a killer application too. Waygo and WordLens is two of such applications in Android and iOS.<br><br />
Text Selection in Image is also an interesting feature for browser. Project Naptha demos some potential functionality based on yext selection in Image.<br> <br />
--><br />
==FoxEye technology tree==<br />
This is the technology tree of FoxEye. Solid lines are dependencies. Dashed lines are for performance improvements. The green blocks are possible applications. The blue blocks are API drafts with prototypes. The red blocks are things we know how to do but have not yet started. The yellow blocks are only concepts and need more study. So, step by step, we can provide photo manager, wide-angle panorama, HDR, and even gesture control.<br />
[[File:Multimedia Platform Team Technology Roadmap - New Page.png |1024px]]<br />
<br />
<br />
==Use Cases==<br />
*Digital Image Processing (DIP) for cameras:<br />
**Face In, see [https://www.youtube.com/watch?feature=player_embedded&v=PWZUCfDsFdU Sony Face In]<br />
**Augmented Reality, see [https://www.youtube.com/watch?feature=player_embedded&v=vDNzTasuYEw IKEA AR]<br />
**Camera Panorama, <br />
**Fisheye camera, <br />
**Comic Effect,<br />
**Long term, might need Android Camera HAL 3 to control camera<br />
***Smile Snapshot<br />
***Gesture Snapshot<br />
***HDR<br />
***Video Stabilization<br />
**Bar code scanner<br />
*Photo and video editing<br />
**Video Editor, see [https://www.youtube.com/watch?feature=player_embedded&v=NJ6nYgxcuUk WeVideo on Android]<br />
**A faster way for video editing tools.<br />
**Lots of existing image effects can be used for photo and video editing.<br />
**https://www.facebook.com/thanks<br />
*Object Recognition in Images (not only Firefox OS, but also browsers):<br />
**Shopping Assistant, see [https://www.youtube.com/watch?feature=player_embedded&v=B7cvlWll85Q Amazon Firefly]<br />
**Face Detection/Tracking,<br />
**Face Recognition, <br />
**Text Recognition, <br />
**Text Selection in Image, <br />
***See http://projectnaptha.com/<br />
**Text Inpainting,<br />
**Image Segmentation,<br />
**Text translation on image, see [https://www.youtube.com/watch?feature=player_embedded&v=9UalhhWBPH0 Waygo]<br />
*Dual Camera:<br />
**Natural Interaction (Gesture, Body Motion Tracking)<br />
**Interactive Foreground Extraction<br />
and so on....<br />
<br />
==Some cool real-world applications we can refer to==<br />
*Word Lens: <br />
**https://play.google.com/store/apps/details?id=com.questvisual.wordlens.demo<br />
**https://itunes.apple.com/tw/app/word-lens/id383463868?mt=8<br />
*Waygo<br />
**http://www.waygoapp.com/<br />
*PhotoMath<br />
**https://photomath.net/<br />
*Cartoon Camera<br />
**https://play.google.com/store/apps/details?id=com.fingersoft.cartooncamera<br />
*Photo Studio<br />
**http://photo-studio.en.uptodown.com/android<br />
*Magisto<br />
**https://play.google.com/store/apps/details?id=com.magisto<br />
*Adobe PhotoShop Express<br />
**http://www.photoshop.com/products/photoshopexpress<br />
*Amazon(firefly app)<br />
**https://play.google.com/store/apps/details?id=com.amazon.mShop.android<br />
<br />
<!--<br />
=Task Dependency=<br />
Below is the dependency graph for text recognition work.<br />
<br><br />
[[File:Project FoxEyeTextRecgDependency.png]]<br />
<br><br />
Below is the dependency graph for camera efficts work.<br />
<br><br />
[[File:Project FoxEyeCameraDependency.png]]<br />
<br><br />
Below is the dependency graph for video editor work.<br />
<br><br />
[[File:Project FoxEyeVideoEditorDependency.png]]<br />
<br><br />
--><br />
<br />
<!--<br />
=Comparison=<br />
==Canvas2DContext==<br />
Currently, you can do video effect by Canvas2DContext. See the demo made by [4]. The source code looks like below.<br />
<source><br />
function frameConverter(video,canvas) {<br />
<br />
// Set up our frame converter<br />
this.video = video;<br />
this.viewport = canvas.getContext("2d");<br />
this.width = canvas.width;<br />
this.height = canvas.height;<br />
// Create the frame-buffer canvas<br />
this.framebuffer = document.createElement("canvas");<br />
this.framebuffer.width = this.width;<br />
this.framebuffer.height = this.height;<br />
this.ctx = this.framebuffer.getContext("2d");<br />
// Default video effect is blur<br />
this.effect = JSManipulate.blur;<br />
// This variable used to pass ourself to event call-backs<br />
var self = this;<br />
// Start rendering when the video is playing<br />
this.video.addEventListener("play", function() {<br />
self.render();<br />
}, false);<br />
<br />
// Change the image effect to be applied <br />
this.setEffect = function(effect){<br />
if(effect in JSManipulate){<br />
this.effect = JSManipulate[effect];<br />
}<br />
}<br />
<br />
// Rendering call-back<br />
this.render = function() {<br />
if (this.video.paused || this.video.ended) {<br />
return;<br />
}<br />
this.renderFrame();<br />
var self = this;<br />
// Render every 10 ms<br />
setTimeout(function () {<br />
self.render();<br />
}, 10);<br />
};<br />
<br />
// Compute and display the next frame <br />
this.renderFrame = function() {<br />
// Acquire a video frame from the video element<br />
this.ctx.drawImage(this.video, 0, 0, this.video.videoWidth,<br />
this.video.videoHeight,0,0,this.width, this.height);<br />
var data = this.ctx.getImageData(0, 0, this.width, this.height);<br />
// Apply image effect<br />
this.effect.filter(data,this.effect.defaultValues);<br />
// Render to viewport<br />
this.viewport.putImageData(data, 0, 0);<br />
return;<br />
};<br />
};<br />
<br />
// Initialization code<br />
video = document.getElementById("video");<br />
canvas = document.getElementById("canvas");<br />
fc = new frameConverter(video,canvas);<br />
...<br />
// Change the image effect applied to the video<br />
fc.setEffect('edge detection');<br />
<br />
</source><br />
Basically, the idea is use |drawImage| to acquire frame from video and draw it to canvas. Then call |getImageData| to get the data and process the image. After that, put the computed data back to the canvas and display it.<br><br />
<br />
Compare to this approach, the proposed WebAPI has below advantages:<br />
* Not polling mechanism.<br />
** We use callback function to process all frames.<br />
<br />
==node-opencv==<br />
https://github.com/peterbraden/node-opencv<br />
"OpenCV bindings for Node.js. OpenCV is the defacto computer vision library - by interfacing with it natively in node, we get powerful real time vision in js."<br />
The sample codes looks like below:<br />
*You can use opencv to read in image files. Supported formats are in the OpenCV docs, but jpgs etc are supported.<br />
<source lang="javascript"><br />
cv.readImage(filename, function(err, mat){<br />
mat.convertGrayscale()<br />
mat.canny(5, 300)<br />
mat.houghLinesP()<br />
})<br />
</source><br />
*If however, you have a series of images, and you wish to stream them into a stream of Matrices, you can use an ImageStream. Thus:<br />
<source lang="javascript"><br />
var s = new cv.ImageStream()<br />
s.on('data', function(matrix){<br />
matrix.detectObject(haar_cascade_xml, opts, function(err, matches){})<br />
})<br />
ardrone.createPngStream().pipe(s);<br />
</source><br />
<br />
==opencvjs==<br />
https://github.com/blittle/opencvjs<br><br />
It is a project to compile opencv to asm.js. Might be a dead project now.<br />
==Project Naptha==<br />
"Project Naptha automatically applies state-of-the-art computer vision algorithms on every image you see while browsing the web. The result is a seamless and intuitive experience, where you can highlight as well as copy and paste and even edit and translate the text formerly trapped within an image." Quoted from http://projectnaptha.com/ .<br />
===How it works?===<br />
Excerpt from Project Naptha:<br />
The primary feature of Project Naptha is actually the text detection, rather than optical character recognition. The author write a text detection algorithm called Stroke Width Transform, invented by Microsoft Research in 2008, which is capable of identifying regions of text in a language-agnostic manner in WebWorker. Once a user begins to select some text, it scrambles to run character recognition algorithms in order to determine what exactly is being selected. The default OCR engine is a built-in pure-javascript port of the open source Ocrad OCR engine. There’s the option of sending the selected region over to a cloud based text recognition service powered by Tesseract, Google’s (formerly HP’s) award-winning open-source OCR engine which supports dozens of languages, and uses an advanced language model.<br />
--><br />
<!--<br />
=Open Source Library and Licenses =<br />
*OpenCV: OpenCV is an open-source BSD-licensed library that includes several hundreds of computer vision algorithms.<br />
*Tesseract-OCR: Apache License v2.0. Tesseract is probably the most accurate open source OCR engine available.<br />
**Leptonica: Leptonica is a pedagogically-oriented open source site containing software that is broadly useful for image processing and image analysis applications.<br />
<br />
<br>"The Leptonica copyright is less restrictive still. We use the BSD 2-clause license, which is similar to the Apache license but much briefer. Kirk McKusick playfully called the BSD license a copycenter, as differentiated from the usual copyright and the GPL copyleft: "Take it down to the copy center and make as many copies as you want." The BSD restrictions can be approximately summarized as: (1) Don't pretend that you wrote this, and (2) Don't sue us if it doesn't work.<br> For Leptonica, as with Apache and BSD, modifications of the source code that are used in commercial products can be made without any obligation to make them available in open source. (This does happen voluntarily -- it's likely that the majority of issues noted by people working on commercial products are reported back, often with patches.) The Leptonica license only requires that any use of the source, whether in original or modified form, must include the Leptonica copyright notice, and that modified versions must be clearly marked as such. "<br />
<br />
<br />
=What I have done=<br />
*Initialize, plan and implement this project.<br />
*Write a prototype of WebIDL for WebDIP.<br />
**MediaStream as source node and destentaion node for WebDIP.<br />
**For HTMLImageElement part as source node, there is a temporal solution for it.<br />
**Have face detection node. Can be used in MediaStream and HTMLImageElement on both browser and B2G flame.<br />
**Have text detection/recognization node. Can be used in MediaStream and HTMLImageElement on browser.<br />
--><br />
<br />
<!--<br />
=To Do List=<br />
*Extend MediaStreamTrack API? =>CTai<br />
**Need to integrate with Canvas2DContext and WebGL.<br />
*OfflineMediaContext study. =>Kaku(our new hire!!)<br />
*An API for image processing and object detection. =>TBD<br />
**We need such API for B2G privilege applications(or opencv-asm.js for general APPs).<br />
*How to compile OpenCV to asm.js =>Kaku, CJay<br />
**Try to figure out how to pass the ImageBitmap from VideoWroker to OpenCV-asm.js<br />
*Compare native OpenCV/Tesseract with asm.js version. =>TBD<br />
--><br />
<br />
<!--<br />
=Fixme List(Known Issues)=<br />
*OpenCV can't build with STLPort, only support GNUSTL.<br />
**B2G can't build with GNUSTL.<br />
*Text Detection and Recognition can't run on B2G.<br />
**Some OpenCV API use STL as arguments. The unalignment STL will cause runtime error.<br />
*Tesseract-OCR Build<br />
**Use pre-installed Tesseract-OCR now. Maybe we should support source code build of Tesseract-OCR.<br />
*Improve precision rate of text recognition.<br />
**The actual precision rate should be higher than my roughly prototype. Need improve it.<br />
*Separate OCR initialized.<br />
**Prevent redundant initialization.<br />
*Haven't done OpenCL integration in Gecko.<br />
**OpenCV has a lot of OpenCL integration. We should take advantage from it.<br />
*Canvas2DContext, WebGL can't run on worker.<br />
**Need bug 801176 and bug 709490 landed.<br> <br />
*Need ImageBitmap for VideoWorkerEvent.<br />
**Need bug 1044102 landed.<br><br />
--><br />
<br />
=Current Status=<br />
*Meta bug: https://bugzilla.mozilla.org/show_bug.cgi?id=1100203<br />
*MediaStream with worker: editor draft completed; refining the prototype for the review process. See https://bugzilla.mozilla.org/show_bug.cgi?id=1044102<br />
*ImageBitmap: in review process. See https://bugzilla.mozilla.org/show_bug.cgi?id=801176<br />
*ImageBitmap extension: editor draft completed; refining the prototype for the review process. See https://bugzilla.mozilla.org/show_bug.cgi?id=1141979<br><br />
*OpenCV.js: working on it. See http://people.mozilla.org/~cku/opencv/<br />
*OfflineMediaContext: not yet started.<br />
*WebImage: not yet started.<br />
*Run WebGL on worker: in review process. See https://bugzilla.mozilla.org/show_bug.cgi?id=709490<br><br />
*CanvasRenderingContext2D in Worker: waiting for someone to take this bug. See https://bugzilla.mozilla.org/show_bug.cgi?id=801176<br><br />
=Next Phase(2015 H2)=<br />
*Try to land and standardize MediaStream with worker, ImageBitmap, and the ImageBitmap extension. See [4] for how the standardization process works in Mozilla.<br />
*Design an automated way to export the JavaScript API for OpenCV.js. Try to upstream it to the OpenCV code base.<br />
*Start to work on OfflineMediaContext.<br />
*Support product requirements, for example an Instagram-like app, wide-angle panorama, or Fox photo.<br />
*Do some exploratory experiments on the WebImage concept.<br />
*Initialize a sub-project called [https://wiki.mozilla.org/Project_GlovePuppetry Project GlovePuppetry]<br />
<br />
=Beyond 2015=<br />
*Proof of Concept for WebImage.<br />
*A crazy idea of Kernel.js or SPIR.js for JS developers to customize WebImage?<br />
*Gestural control API with a depth camera? => WebNI (Web Natural Interaction)?<br />
*[https://wiki.mozilla.org/Project_Cangjie Project Cangjie]<br />
<br />
=Conclusion=<br />
Step by step, we can make things different. FoxEye is a pioneering project trying to push the Web boundary into a new area: image processing and computer vision. The key factor in ensuring this project's success is you, every Web contributor. Your support, your promotion, your feedback, and your comments are the nutrition of this project. Please share this project with every Web developer. Let's bring more and more amazing things to the Web, and enrich and improve the competitiveness of the whole Web platform.<br />
<br />
=References=<br />
*[1]:"WebAudio Spec", http://www.w3.org/TR/webaudio/<br />
*[2]:"Media Capture Stream with Worker", http://chiahungtai.github.io/mediacapture-worker/<br />
*[3]:"ImageBitmap Extensions", http://kakukogou.github.io/spec-imagebitmap-extension/<br />
*[4]:"Mozilla Standards", https://wiki.mozilla.org/Standards<br />
<br />
=Acknowledgements=<br />
The idea of adopting WebAudio as the reference design for this project came from a conversation with John Lin. Thanks to Robert O'Callahan for his great feedback and comments. Thanks to John Lin and Chia-jung Hung for their useful suggestions and ideas. Also, big thanks to my team members who helped me debug the code.<br />
<br />
=About Authors=<br />
==CTai==<br />
My name is [https://tw.linkedin.com/pub/chia-hung-tai/36/8b6/712 Chia-hung Tai]. I am a senior software engineer in the Mozilla Taipei office. I work on Firefox OS multimedia. Before this job, I had some experience in OpenCL, NLP (Natural Language Processing), Data Mining, and Machine Learning. My IRC nickname is ctai. You can find me in the #media, #mozilla-taiwan, and #b2g channels. You can also reach me via email (ctai at mozilla dot com).<br />
<br />
==Kaku==<br />
[https://tw.linkedin.com/in/kakukogou Tzuhuo Kuo] is an engineer in the Mozilla Taipei office.<br />
==CJ Ku==<br />
[https://www.linkedin.com/pub/cj-ku/62/55b/a1b CJ Ku] is responsible for the OpenCV.js part.</div>
<hr />
<div></div>Ctaihttps://wiki.mozilla.org/index.php?title=File:NewProjectFoxEye1.png&diff=1108873File:NewProjectFoxEye1.png2015-12-09T23:16:26Z<p>Ctai: Ctai uploaded a new version of &quot;File:NewProjectFoxEye1.png&quot;</p>
<hr />
<div></div>Ctaihttps://wiki.mozilla.org/index.php?title=File:NewProjectFoxEye1.png&diff=1108872File:NewProjectFoxEye1.png2015-12-09T23:16:13Z<p>Ctai: Ctai uploaded a new version of &quot;File:NewProjectFoxEye1.png&quot;</p>
<hr />
<div></div>Ctaihttps://wiki.mozilla.org/index.php?title=User_talk:Dead_project&diff=1108871User talk:Dead project2015-12-09T23:03:11Z<p>Ctai: /* Introduction */</p>
<hr />
<div>=Abstract=<br />
The goal of this project is bringing the power of computer vision and image processing to the Web. By extending the spec of Media Capture and Streams, the web developers can write video processing related applications in better way. The primary idea is to incorporate Worker-based JavaScript video processing with MediaStreamTrack. The user's video processing script can do the real image processing and analysis works frame by frame. We also propose an extension of ImageBitmap to allow better optimization possibility in both implementation and Web developer side. To support video editor case, we would like to introduce OfflineMediaContext in next phase. We also want to explore the concept of WebImage, a hardware accelerated-able Web API for performance improvement chance. By accomplishing these API step by step, we believe we can improve the Web platform competitiveness a lot on image processing and computer vision area.<br />
<br />
=Introduction=<br />
To get a quick understand what is project FoxEye. Please see below file:<br><br />
'''The latest one:'''<br />
*FoxEye Briefing: [https://docs.google.com/presentation/d/1Ra5bIeMfSEQi5kd_bGH5Vx-pa9ahpAUTFDxTeBA5L0o/edit?usp=sharing Briefing]<br />
*Presentation files in Whistler Work Week:<br />
**Project FoxEye Status Update: [https://docs.google.com/presentation/d/1vnR5JOWpEgKN3KZGV2SLxscqSnuBOF2tT_dTolLRYI4/edit?usp=sharing FoxEye]<br />
**FoxEye Cross Firefox OS:[https://docs.google.com/presentation/d/1ck32VoikWkkFXkFfNK91S22nGRp9fSEj6Oz6846May4/edit?usp=sharing Use case]<br />
*'''Latest demo in Youtube: [https://www.youtube.com/watch?v=prybkXsTGXY FoxEye 2015 H1 demo]'''<br />
<br />
'''Outdated'''<br />
*Presentation file in Portland Work Week.[[File:Project FoxEye Portland Work Week.pdf]]<br><br />
*Presentation file in P2PWeb WorkShop.[[File:Project FoxEye 2015-Feb.pdf]]<br><br />
*Youtube: https://www.youtube.com/watch?v=TgQWEWiGaO8<br><br />
<br />
The needs for image processing and computer vision is increasing in recent years. The introduction of video element and media stream in HTML5 is very important, allowing for basic video playback and WebCam ability. But it is not powerful enough to handle complex video processing and camera application. Especially there are tons of mobile camera, photo and video editor applications show their creativity by using OpenCV etc in Android. It is a goal of this project to include the capabilities found in modern video production and camera applications as well as some of the processing, recognition and stitching tasks. <br />
<br><br />
This API is inspired by the WebAudio API[1]. Unlike WebAudio API, we try to reach the goal by modifying existing Media Capture and Streams API. The idea is adding some functions to associate the Woker-based script with MediaStreamTrack. Then the script code of Worker runs image processing and/or analysis frame by frame. Since we move the most of processing work to Worker, the main thread will not be blocked.<br />
<br />
[[File:FoxEye - Overview.png|800px]]<br />
<br><br />
Basically, the spirit of this project is four parts. The first part is extend the MediaStreamTrack to associate a Worker. This part provide a way to do image processing job frame by frame. The second part is ImageBitmap extension. This part extended the ImageBitmap interface to allow JavaScript developer to read underlying data out and set an external data into an ImageBitmap in a set of supported color formats. The third part is OfflineMediaContext. It is for offline stream to render as fast as possible. The last part is WebImage, it is a hardware accelerated-able API on computer vision area. The Web developers can use it to combine high performance vision processing.<br />
<br><br />
Thanks for the amazing asm.js and emscripten work, we also provide an asm.js version of OpenCV called OpenCV.js. The web developers can leverage the power of OpenCV in a simpler way. <br />
<br><br />
<br />
=Design Principle=<br />
*Follow [https://extensiblewebmanifesto.org/ The Extensible Web Manifesto]<br />
The original design of this project is WebAudio like design called WebVideo. But is is deprecated due to we want to follow The Extensible Web Manifesto. We believe this kind of design is best for the Web. Take below examples to against WebVideo design.<bt><br />
Computer vision area is more thriving than ever after deep learning introduced. We can't enumerate video nodes for all kinds of creativeness for the future. For example, suppose we enumerate all we can image in current time. But still something we can't image. Maybe we lost an area called image sentiment analysis. I take this as an example because I just known it recently. Image sentiment analysis is trying to classify the emotion from human face in image. So should we add this as a new video node? Take another extreme example, we think we would not have argument on face detection or face recognition. But what we talk now is only for human face. What if we someone think we need to detect and recognize animal face? Sounds like a ridiculous idea right? But it is a real business. A product called "Bistro: A Smart Feeder Recognizes Your Cat's Face"[2] was on indiegogo. So should we extend the original face detection/recognition node or add a new video node for cat or ask them to implement it in VideoWorker? If we ask them to implement in VideoWorker, then we might need to re-consider are those existing video node make sense? Should we move them all into VideoWorker and implement them in JS? If yse, the WebVideo will degenerate to only source node, video worker node and detestation node. Compare to degenerated WebVideo, the MediaStream with worker design is more elegant. <br />
<br />
*Performance and power consumption do matter<br />
This project is initialized by the FxOS team which means the performance and power consumption for mobile devices are considered from day one.<br />
<br />
=Concept=<br />
==MediaStreamTrack with Worker: ==<br />
The new design is a simple and minimal change for current API. By extending MediaStreamTrack and adding Worker related API, we can let MediaStream be able to support video processing functionality through the script code in Worker. Below is the draft WebIDL codes. Please see [2] for the draft specification.<br />
<source lang="webidl"><br />
[Constructor(DOMString scriptURL)]<br />
<br />
interface VidoeMonitor : EventTarget {<br />
attribute EventHandler onvideomonitor;<br />
};<br />
<br />
interface VideoProcessor : EventTarget {<br />
attribute EventHandler onvideoprocess;<br />
};<br />
partial interface MediaStreamTrack {<br />
void addVideoMonitor(VidoeMonitor monitor);<br />
void removeVideoMonitor(VidoeMonitor monitor);<br />
MediaStreamTrack addVideoProcessor(VidoeProcessor processor);<br />
void removeVideoProcessor();<br />
};<br />
<br />
[Exposed=(Window, Worker),<br />
Constructor(DOMString type, optional VideoMonitorEventInit videoMonitorEventInitDict)]<br />
interface VideoMonitorEvent : Event {<br />
readonly attribute DOMString trackId;<br />
readonly attribute double playbackTime;<br />
readonly attribute ImageBitmap? inputImageBitmap;<br />
};<br />
<br />
[Exposed=(Window, Worker),<br />
Constructor(DOMString type, optional VideoProcessorEventInit videoProcessorEventInitDict)]<br />
interface VideoProcessEvent : VideoMonitorEvent {<br />
readonly attribute ImageBitmap? outputImageBitmap;<br />
};<br />
<br />
</source><br />
<br><br />
[[File:NewProjectFoxEye1.png|1024px]]<br><br />
===Example Code ===<br />
Please check the section [http://chiahungtai.github.io/mediacapture-worker/index.html#Examples examples in MediaStreamTrack with worker].<br />
<br />
==ImageBitmap extensions==<br />
Please see [3] for more information.<br />
<br />
==WebImage:==<br />
Why do we need WebImage? Because performance is matter to prompt computer vision to the Web. We need it for performance reason. Most image processing task can be accelerated by WebGL. But it is not the case for computer vision area. So we need a hardware accelerated-able computer vision related web API to help Web developer deliver a fast and portable Web page and application. That is the motivation of WebImage.<br />
<br><br />
This is a new zone needed to explore. So we might start from refer to existing related API. In end of 2014, KHRONOS released a portable, power efficient vision processing API, called OpenVX. This specification might be a good start point for WebImage.<br />
<br><br />
The following diagram is a brief concept of OpenVX. It is a graph-node architecture. The role of OpenVX for computer vision is like the role of OpenGL for graph. The developer just construct and execute the graph to process incoming image.<br />
<br><br />
[[File:OpenVX-NodeGFX.PNG|600px]]<br><br />
<!--[[File:Project FoxEyeWebImage1.png|800px]]<br>--><br />
OpenVX can be implemented by OpenCL, OpenGL|ES with compute shader and C/C++ with SIMD. That means we can support OpenVX in wide range Web platform from PC to mobile once a OpenVX engine is provided.<br />
<br />
[[File:OpenVX.PNG|600px]]<br><br />
<br />
==OfflineMediaContext:==<br />
We introduce the “OfflineMediaContext” and modify the “MediaStream” here to enable the offline (as soon as possible) MediaStream processing. When developers are going to perform the offline MediaStream processing, they need to form a context which will hold and keep the relationship of all MediaStreams that are going to be processed together.<br />
<br />
<source lang="c++"><br />
<br />
// typedef unsigned long long DOMTimeStamp;<br />
interface OfflineMediaContext {<br />
void start(DOMTimeStamp durationToStop);<br />
attribute EventHandler onComplete;<br />
};<br />
// Add an optional argument into the constructor.<br />
[Constructor (optional OfflineMediaContext context),<br />
Constructor (MediaStream stream, optional OfflineMediaContext context),<br />
Constructor (MediaStreamTrackSequence tracks, optional OfflineMediaContext context)]<br />
interface MediaStream : EventTarget {<br />
// No modification.<br />
...<br />
}<br />
<br />
</source><br />
*OfflineMediaContext is the holdplace of all MediaStreams which are going to be processed together in a non-realtime rate.<br />
*OfflineMediaContext is also the object who can trigger the non-realtime processing.<br />
*OfflineMediaContext should be instantiated first and then MediaStreams, which are going to be processed together in the same context, could be instantiated by passing the pre-instantiated OfflineMediaContext object into the constructor of MediaStreams. (See the modified MediaStream constructor below)<br />
*The native implementation of OfflineMediaContext holds a non-realtime MediaStreamGraph, which is just the same as the OfflineAudioContext in WebAudio specification.<br />
*The constructors are modified by adding an optional parameter, OfflineMediaContext. By this way, developers are able to associate a MediaStream to an OfflineMediaContext.<br />
*If the optional OfflineMediaContext is given, the native implementation of the MediaStream should be hooked to the non-realtime MSG hold by the given OfflineMediaContext in stead of the global realtime MSG.<br />
*If the optional OfflineMediaContext is given, we need to check whether the new created MediaStream is able to be processed in offline rate or not. If not, the constructor should throw an error and return a NULL. (Constructors are always allowed to throw.)<br />
<br />
==OpenCV.js==<br />
*OpenCV + Emscripten = OpenCV.js<br />
*https://github.com/CJKu/opencv <br />
<!--<br />
==Deprecated Design ==<br />
*Modular Routing<br />
Modular routing allows arbitrary connections between different DIPNode(TBD) objects. Each node can have inputs and/or outputs. A source node has no inputs and a single output. A destination node has one input and no outputs. Other nodes such as filters can be placed between source and destination nodes.<br />
<br />
Here is a example for face detection work on ImageElement:<br><br />
PS. Right now, I haven't finished the work of ScriptNode. So the draw rectangle part is skipped in sample code.<br />
<br>[[File:Project FoxEye1.png|720px]]<br><br />
<big>Example 1:</big><br />
<source lang="javascript"><br />
var context = new DIPContext();<br />
var imageElem = document.getElementById('imgelemsrc');<br />
var source = context.createImageElementSource(imageElem);<br />
var facedetect = context.createFaceDetection();<br />
source.connect(facedetect);<br />
var dest = context.createMediaStreamDestination();<br />
facedetect.connect(dest);<br />
var elem = document.getElementById('videoelem');<br />
elem.mozSrcObject = dest.stream;<br />
elem.play();<br />
<br />
</source><br />
<br />
<br />
Another example to show that some nodes might support callback function to pass more information rather than image.<br />
<br>[[File:Project FoxEye2.png|720px]]<br><br />
<big>Example 2:</big><br />
<source lang="javascript"><br />
var context = new DIPContext();<br />
var imageElem = document.getElementById('imgelemsrc');<br />
var source = context.createImageElementSource(imageElem);<br />
var textdetect = context.createTextDetection();<br />
source.connect(textdetect);<br />
var dest = context.createMediaStreamDestination();<br />
textdetect.connect(dest);<br />
textdetect.ontextrecognized = function (e){<br />
var text = e.recognizedText;<br />
var go2google = document.getElementById('go2Google');<br />
go2google.href = "https://www.google.com.tw/search?q=" + text<br />
var go2IMDB = document.getElementById('go2IMDB');<br />
go2IMDB.href = "http://www.imdb.com/find?ref_=nv_sr_fn&q=" + text<br />
var go2Amazon = document.getElementById('go2Amazon');<br />
go2Amazon.href = "http://www.amazon.com/s/ref=nb_sb_noss?url=search-alias%3Daps&field-keywords=" + text<br />
var go2TranslateEngTC = document.getElementById('go2TranslateEngTC');<br />
go2TranslateEngTC.href = "https://translate.google.com.tw/#en/zh-TW/" + text<br />
}<br />
var elem = document.getElementById('videoelem');<br />
elem.mozSrcObject = dest.stream;<br />
elem.play();<br />
</source><br />
An ideally example to combine ScriptNode with Canvas2DContext.<br><br />
This is an example trying to do on fly camera translation like "Word Lens" and "Waygo".<br><br />
Haven't finish the implementation for this example.<br><br />
<br>[[File:Project FoxEye3.png|720px]]<br><br />
<big>Example 3:</big><br />
<source lang="javascript"><br />
var context = new DIPContext();<br />
var myMediaStream;<br />
navigator.getUserMedia({video:true, audio:false}, function(localMediaStream){<br />
myMediaStream = localMediaStream;<br />
var source = context.createMediaStreamSource(myMediaStream);<br />
var textRecoginition = context.createTextRecoginition();<br />
source.connect(textRecoginition);<br />
var textInpaint = context.createTextInpaint();<br />
textRecoginition.connect(textInpaint);<br />
var scriptTranslate = context.createScriptProcessor();<br />
textdetect.ontextrecognized = function (e){<br />
var text = e.recognizedText;<br />
// Custom parameter -<br />
scriptTranslate.addParameter( "text", text );<br />
}<br />
scriptTranslate.onimageprocess= function (e) {<br />
var text = e.parameters.text;<br />
// Translate to other language....<br />
var newText = Translate("Eng", "TC");<br />
var input = e.inputImage;<br />
var canvas = document.getElementsByTagName('canvas')[0];<br />
var context2D = canvas.getContext('2d');<br />
context2D.drawImage(input, 0, 0);<br />
context2D.strokeText(newText, 0, 0);<br />
// get an empty slate to put the data into<br />
var output = context.createImageData(canvas.width, canvas.height);<br />
e.outputImage = output;<br />
}<br />
textInpaint.connect(scriptTranslate);<br />
var dest = createMediaStreamDestination();<br />
scriptTranslate.connect(dest);<br />
var video = document.getElementById(‘videoelem’);<br />
video.mozSrcObject = dest.stream;<br />
}, null);<br />
<br />
</source><br />
<br />
<!--<br />
=The API=<br />
Still under construction....<br />
==New design==<br />
===VideoContext===<br />
<source><br />
[Constructor]<br />
interface VideoContext : EventTarget {<br />
readonly attribute VideoDestinationNode destination;<br />
// DIPNode creation<br />
MediaStreamVideoDestinationNode createMediaStreamDestination();<br />
MediaStreamVideoSourceNode createMediaStreamSource(MediaStream mediaStream);<br />
VideoWorkerNode createVideoWorker(DOMString scriptURL);<br />
};<br />
</source><br />
===VideoNode===<br />
<source><br />
interface VideoNode: EventTarget {<br />
void connect(VideoNode destination);<br />
void disconnect();<br />
readonly attribute VideoContext context;<br />
};<br />
</source><br />
<br />
===VideoWorkerNode===<br />
Still thinking the type of inputImage/outputImage.<br />
<source><br />
interface VideoProcessEvent : Event {<br />
readonly attribute ImageData inputImage;<br />
readonly attribute ImageData outputImage;<br />
readonly attribute object parameters; <br />
};<br />
<br />
interface VideoWorkerNode: VideoNode {<br />
attribute EventHandler onimageprocess;<br />
};<br />
</source><br />
<br />
==Deprecated design==<br />
===DIPContext===<br />
<source><br />
[Constructor]<br />
interface DIPContext : EventTarget {<br />
readonly attribute DIPDestinationNode destination;<br />
// DIPNode creation<br />
MediaStreamDIPDestinationNode createMediaStreamDestination();<br />
ImageElementDIPSourceNode createImageElementSource(HTMLImageElement imageElement);<br />
MediaStreamDIPSourceNode createMediaStreamSource(MediaStream mediaStream);<br />
FaceDetectionNode createFaceDetection();<br />
TextDetectionNode createTextDetection();<br />
};<br />
</source><br />
===DIPNode===<br />
<source><br />
interface DIPNode : EventTarget {<br />
void connect(DIPNode destination, optional unsigned long output = 0, optional unsigned long input = 0);<br />
void disconnect(optional unsigned long output = 0);<br />
readonly attribute DIPContext context;<br />
readonly attribute unsigned long numberOfInputs;<br />
readonly attribute unsigned long numberOfOutputs;<br />
};<br />
</source><br />
<br />
===TextDetectionNode===<br />
<source><br />
interface RecognizedTextEvent : Event {<br />
readonly attribute DOMString recognizedText;<br />
};<br />
<br />
interface TextDetectionNode : DIPNode {<br />
attribute EventHandler ontextrecognized;<br />
};<br />
</source><br />
--><br />
<br />
=Demo pages=<br />
==OpenCV.js==<br />
This demo should be able to run on every existing Firefox browser on Ubuntu and Mac. Please try below website.<br><br />
http://people.mozilla.org/~cku/opencv/<br />
<br><br />
[[File:OpenCVJS-1.png|1080px]]<br />
<br><br />
[[File:OpenCVJS-2.png|1080px]]<br />
<br><br />
<br />
==MST with Worker and ImageBitmap==<br />
You need a customized build for below demos. Please clone this repo: https://github.com/kakukogou/gecko-dev/tree/foxeye-master<br><br />
You also need a web camera for the demos. Then build your own Firefox browser and go to below website to see the demo.<br />
<br><br />
Source code of the demo:<br><br />
https://github.com/kakukogou/foxeye-demo<br />
<br><br />
Demo website:<br><br />
http://people.mozilla.org/~tkuo/foxeye-demo/<br />
===Monitor===<br />
Monitor is design for just send the event to the Web Worker and no modification. The left one is from getUserMedia. The right one is using addWorkerMonitor to dispatch the input frame from the left one to a worker. The worker will detect the face and pass the face position and the input frame to main thread. Then the script in main thread use both information to draw the input frame via CanvasRenderingContext2D. <br />
*Case 1: Face detection<br />
This demo shows how to use |addWorkerMonitor| and CanvasRenderingContext2D to overlay a hat on someone's head.<br />
<br><br />
[[File:Monitor face.png|1080px]]<br />
<br><br />
*Case 2: QRCode<br />
[[File:Monitor qrcode.png|1080px]]<br />
<br><br />
<br />
===Processor===<br />
This is a demo to show how to use |addWorkerProcessor| to create a new MediaStreamTrack and show it in another HTMLVideoElement. There are 5 kind of image filter.<br />
<br> <br />
<br><br />
[[File:Processor copy.png|1080px]]<br />
<br><br />
[[File:Processor blur.png|1080px]]<br />
<br><br />
[[File:Processor erode.png|1080px]]<br />
<br><br />
[[File:Processor threshold.png|1080px]]<br />
<br><br />
[[File:Processor invert.png|1080px]]<br />
<br><br />
[[File:Processor gray.png|1080px]]<br />
<br><br />
<br />
<br />
<br />
<!--<br />
==Demo 1: Face tracker==<br />
===Browser:===<br />
*Input comes from HTML Image Element<br />
[[File:ProjectFoxEye BrowserFaceImage.png|720px]]<br />
*Input comes from MediaStream<br />
[[File:ProjectFoxEye BrowserFaceMS.png|720px]]<br />
<br />
===B2G on Flame:===<br />
*Input comes from HTML Image Element<br><br />
[[File:Project FoxEyeFlameFaceImage.jpg|480px]]<br><br />
*Input comes from MediaStream<br><br />
[[File:Project FoxEyeFlameFaceMS.jpg|480px]]<br><br />
<br />
==Demo 2: Text Recognition==<br />
*Browser:<br />
**Input comes from HTML Image Element<br />
[[File:ProjectFoxEye TextImage.png|720px]]<br />
**Input comes from MediaStream<br />
[[File:ProjectFoxEye TextMS.png|720px]]<br />
--><br />
<br />
=Unlimited Potentials=<br />
<!--<br />
According to "Firefox OS User Research Northern India Findings" [4], one of the key table-stake is camera related features. "Ways to provide photo & video editing tools" is what this WebAPI for. So if we can deliver some cool photo & video editing features, we can fulfill one of the needs of our target market.<br><br />
In [4], it mentioned that one of purchase motivators is educate my kids. The features like PhotoMath can satisfy the education part.<br><br />
In long term, if we can integrate text recognition with TTS(text to speech), we can help illiterate person to read words or phrase. That will be very useful features.<br><br />
Also offline text translation in camera might be a killer application too. Waygo and WordLens is two of such applications in Android and iOS.<br><br />
Text Selection in Image is also an interesting feature for browser. Project Naptha demos some potential functionality based on yext selection in Image.<br> <br />
--><br />
==FoxEye technology tree==<br />
This is the technology tree of FoxEye. The solid line is the dependency. The dash line is for performance improvement. The green blocks are possible applications. The blue blocks are API draft with prototype. The red block is we know how to do but not yet starting. The yellow blocks are only the concept. Need more study. So step by step, we can provide photo manager, wide angle panorama, HDR, and even gesture control.<br />
[[File:Multimedia Platform Team Technology Roadmap - New Page.png |1024px]]<br />
<br />
<br />
==Use Cases==<br />
*Digital Image Processing(DIP) for camera:<br />
**Face In, see [https://www.youtube.com/watch?feature=player_embedded&v=PWZUCfDsFdU Sony Face In]<br />
**Augmented Reality, see [https://www.youtube.com/watch?feature=player_embedded&v=vDNzTasuYEw IKEA AR]<br />
**Camera Panorama, <br />
**Fisheye camera, <br />
**Comic Effect,<br />
**Long term, might need Android Camera HAL 3 to control camera<br />
***Smile Snapshot<br />
***Gesture Snapshot<br />
***HDR<br />
***Video Stabilization<br />
**Bar code scanner<br />
*Photo and video editing<br />
**Video Editor, see [https://www.youtube.com/watch?feature=player_embedded&v=NJ6nYgxcuUk WeVideo on Android]<br />
**A faster way for video editing tools.<br />
**Lots of existing image effects can be used for photo and video editing.<br />
**https://www.facebook.com/thanks<br />
*Object Recognition in Image(Not only FX OS, but also broswer):<br />
**Shopping Assistant, see [https://www.youtube.com/watch?feature=player_embedded&v=B7cvlWll85Q Amazon Firefly]<br />
**Face Detection/Tracking,<br />
**Face Recognition, <br />
**Text Recognition, <br />
**Text Selection in Image, <br />
***See http://projectnaptha.com/<br />
**Text Inpainting,<br />
**Image Segmentation,<br />
**Text translation on image, see [https://www.youtube.com/watch?feature=player_embedded&v=9UalhhWBPH0 Waygo]<br />
*Duo Camera:<br />
**Nature Interaction(Gesture, Body Motion Tracking)<br />
**Interactive Foreground Extraction<br />
and so on....<br />
<br />
==Some cool applications we can refer in real worlds==<br />
*Word Lens: <br />
**https://play.google.com/store/apps/details?id=com.questvisual.wordlens.demo<br />
**https://itunes.apple.com/tw/app/word-lens/id383463868?mt=8<br />
*Waygo<br />
**http://www.waygoapp.com/<br />
*PhotoMath<br />
**https://photomath.net/<br />
*Cartoon Camera<br />
**https://play.google.com/store/apps/details?id=com.fingersoft.cartooncamera<br />
*Photo Studio<br />
**http://photo-studio.en.uptodown.com/android<br />
*Magisto<br />
**https://play.google.com/store/apps/details?id=com.magisto<br />
*Adobe PhotoShop Express<br />
**http://www.photoshop.com/products/photoshopexpress<br />
*Amazon(firefly app)<br />
**https://play.google.com/store/apps/details?id=com.amazon.mShop.android<br />
<br />
<!--<br />
=Task Dependency=<br />
Below is the dependency graph for text recognition work.<br />
<br><br />
[[File:Project FoxEyeTextRecgDependency.png]]<br />
<br><br />
Below is the dependency graph for camera efficts work.<br />
<br><br />
[[File:Project FoxEyeCameraDependency.png]]<br />
<br><br />
Below is the dependency graph for video editor work.<br />
<br><br />
[[File:Project FoxEyeVideoEditorDependency.png]]<br />
<br><br />
--><br />
<br />
<!--<br />
=Comparison=<br />
==Canvas2DContext==<br />
Currently, you can do video effect by Canvas2DContext. See the demo made by [4]. The source code looks like below.<br />
<source><br />
function frameConverter(video,canvas) {<br />
<br />
// Set up our frame converter<br />
this.video = video;<br />
this.viewport = canvas.getContext("2d");<br />
this.width = canvas.width;<br />
this.height = canvas.height;<br />
// Create the frame-buffer canvas<br />
this.framebuffer = document.createElement("canvas");<br />
this.framebuffer.width = this.width;<br />
this.framebuffer.height = this.height;<br />
this.ctx = this.framebuffer.getContext("2d");<br />
// Default video effect is blur<br />
this.effect = JSManipulate.blur;<br />
// This variable used to pass ourself to event call-backs<br />
var self = this;<br />
// Start rendering when the video is playing<br />
this.video.addEventListener("play", function() {<br />
self.render();<br />
}, false);<br />
<br />
// Change the image effect to be applied <br />
this.setEffect = function(effect){<br />
if(effect in JSManipulate){<br />
this.effect = JSManipulate[effect];<br />
}<br />
}<br />
<br />
// Rendering call-back<br />
this.render = function() {<br />
if (this.video.paused || this.video.ended) {<br />
return;<br />
}<br />
this.renderFrame();<br />
var self = this;<br />
// Render every 10 ms<br />
setTimeout(function () {<br />
self.render();<br />
}, 10);<br />
};<br />
<br />
// Compute and display the next frame <br />
this.renderFrame = function() {<br />
// Acquire a video frame from the video element<br />
this.ctx.drawImage(this.video, 0, 0, this.video.videoWidth,<br />
this.video.videoHeight,0,0,this.width, this.height);<br />
var data = this.ctx.getImageData(0, 0, this.width, this.height);<br />
// Apply image effect<br />
this.effect.filter(data,this.effect.defaultValues);<br />
// Render to viewport<br />
this.viewport.putImageData(data, 0, 0);<br />
return;<br />
};<br />
};<br />
<br />
// Initialization code<br />
video = document.getElementById("video");<br />
canvas = document.getElementById("canvas");<br />
fc = new frameConverter(video,canvas);<br />
...<br />
// Change the image effect applied to the video<br />
fc.setEffect('edge detection');<br />
<br />
</source><br />
Basically, the idea is use |drawImage| to acquire frame from video and draw it to canvas. Then call |getImageData| to get the data and process the image. After that, put the computed data back to the canvas and display it.<br><br />
<br />
Compare to this approach, the proposed WebAPI has below advantages:<br />
* Not polling mechanism.<br />
** We use callback function to process all frames.<br />
<br />
==node-opencv==<br />
https://github.com/peterbraden/node-opencv<br />
"OpenCV bindings for Node.js. OpenCV is the defacto computer vision library - by interfacing with it natively in node, we get powerful real time vision in js."<br />
The sample codes looks like below:<br />
*You can use opencv to read in image files. Supported formats are in the OpenCV docs, but jpgs etc are supported.<br />
<source lang="javascript"><br />
cv.readImage(filename, function(err, mat){<br />
mat.convertGrayscale()<br />
mat.canny(5, 300)<br />
mat.houghLinesP()<br />
})<br />
</source><br />
*If however, you have a series of images, and you wish to stream them into a stream of Matrices, you can use an ImageStream. Thus:<br />
<source lang="javascript"><br />
var s = new cv.ImageStream()<br />
s.on('data', function(matrix){<br />
matrix.detectObject(haar_cascade_xml, opts, function(err, matches){})<br />
})<br />
ardrone.createPngStream().pipe(s);<br />
</source><br />
<br />
==opencvjs==<br />
https://github.com/blittle/opencvjs<br><br />
It is a project to compile opencv to asm.js. Might be a dead project now.<br />
==Project Naptha==<br />
"Project Naptha automatically applies state-of-the-art computer vision algorithms on every image you see while browsing the web. The result is a seamless and intuitive experience, where you can highlight as well as copy and paste and even edit and translate the text formerly trapped within an image." Quoted from http://projectnaptha.com/ .<br />
===How it works?===<br />
Excerpt from Project Naptha:<br />
The primary feature of Project Naptha is actually the text detection, rather than the optical character recognition. The author implemented a text detection algorithm called the Stroke Width Transform, invented by Microsoft Research in 2008, which can identify regions of text in a language-agnostic manner; it runs in a Web Worker. Once a user begins to select some text, it scrambles to run character recognition algorithms in order to determine what exactly is being selected. The default OCR engine is a built-in pure-JavaScript port of the open-source Ocrad OCR engine. There is also the option of sending the selected region to a cloud-based text recognition service powered by Tesseract, Google's (formerly HP's) award-winning open-source OCR engine, which supports dozens of languages and uses an advanced language model.<br />
--><br />
<!--<br />
=Open Source Library and Licenses =<br />
*OpenCV: OpenCV is an open-source BSD-licensed library that includes several hundreds of computer vision algorithms.<br />
*Tesseract-OCR: Apache License v2.0. Tesseract is probably the most accurate open source OCR engine available.<br />
**Leptonica: Leptonica is a pedagogically-oriented open source site containing software that is broadly useful for image processing and image analysis applications.<br />
<br />
<br>"The Leptonica copyright is less restrictive still. We use the BSD 2-clause license, which is similar to the Apache license but much briefer. Kirk McKusick playfully called the BSD license a copycenter, as differentiated from the usual copyright and the GPL copyleft: "Take it down to the copy center and make as many copies as you want." The BSD restrictions can be approximately summarized as: (1) Don't pretend that you wrote this, and (2) Don't sue us if it doesn't work.<br> For Leptonica, as with Apache and BSD, modifications of the source code that are used in commercial products can be made without any obligation to make them available in open source. (This does happen voluntarily -- it's likely that the majority of issues noted by people working on commercial products are reported back, often with patches.) The Leptonica license only requires that any use of the source, whether in original or modified form, must include the Leptonica copyright notice, and that modified versions must be clearly marked as such. "<br />
<br />
<br />
=What I have done=<br />
*Initialized, planned, and implemented this project.<br />
*Wrote a prototype WebIDL for WebDIP.<br />
**MediaStream as source node and destination node for WebDIP.<br />
**For HTMLImageElement as a source node, there is a temporary solution.<br />
**Added a face detection node. It can be used with MediaStream and HTMLImageElement on both the browser and the B2G Flame.<br />
**Added a text detection/recognition node. It can be used with MediaStream and HTMLImageElement on the browser.<br />
--><br />
<br />
<!--<br />
=To Do List=<br />
*Extend MediaStreamTrack API? =>CTai<br />
**Need to integrate with Canvas2DContext and WebGL.<br />
*OfflineMediaContext study. =>Kaku(our new hire!!)<br />
*An API for image processing and object detection. =>TBD<br />
**We need such an API for B2G privileged applications (or opencv-asm.js for general apps).<br />
*How to compile OpenCV to asm.js =>Kaku, CJay<br />
**Try to figure out how to pass the ImageBitmap from VideoWorker to OpenCV-asm.js<br />
*Compare native OpenCV/Tesseract with asm.js version. =>TBD<br />
--><br />
<br />
<!--<br />
=Fixme List(Known Issues)=<br />
*OpenCV can't be built with STLport; it only supports GNUSTL.<br />
**B2G can't be built with GNUSTL.<br />
*Text detection and recognition can't run on B2G.<br />
**Some OpenCV APIs take STL types as arguments; the mismatched STL causes runtime errors.<br />
*Tesseract-OCR build<br />
**We use a pre-installed Tesseract-OCR for now. Maybe we should support building Tesseract-OCR from source.<br />
*Improve the precision rate of text recognition.<br />
**The achievable precision rate should be higher than my rough prototype's. It needs improvement.<br />
*Separate the OCR initialization.<br />
**Prevent redundant initialization.<br />
*OpenCL integration in Gecko isn't done yet.<br />
**OpenCV has a lot of OpenCL integration. We should take advantage of it.<br />
*Canvas2DContext, WebGL can't run on worker.<br />
**Need bug 801176 and bug 709490 landed.<br> <br />
*Need ImageBitmap for VideoWorkerEvent.<br />
**Need bug 1044102 landed.<br><br />
--><br />
<br />
=Current Status=<br />
*Meta bug: https://bugzilla.mozilla.org/show_bug.cgi?id=1100203<br />
*MediaStream with worker: editor's draft completed; refining the prototype for the review process. See https://bugzilla.mozilla.org/show_bug.cgi?id=1044102<br />
*ImageBitmap: in the review process. See https://bugzilla.mozilla.org/show_bug.cgi?id=801176<br />
*ImageBitmap extension: editor's draft completed; refining the prototype for the review process. See https://bugzilla.mozilla.org/show_bug.cgi?id=1141979<br><br />
*OpenCV.js: in progress. See http://people.mozilla.org/~cku/opencv/<br />
*OfflineMediaContext: not yet started.<br />
*WebImage: not yet started.<br />
*Run WebGL on a worker: in the review process. See https://bugzilla.mozilla.org/show_bug.cgi?id=709490<br><br />
*CanvasRenderingContext2D in a Worker: waiting for someone to take this bug. See https://bugzilla.mozilla.org/show_bug.cgi?id=801176<br><br />
=Next Phase (2015 H2)=<br />
*Try to land and standardize MediaStream with worker, ImageBitmap, and the ImageBitmap extension. See [4] for how standardization is handled at Mozilla.<br />
*Design an automated way to export the JavaScript API for OpenCV.js, and try to upstream it to the OpenCV code base.<br />
*Start to work on OfflineMediaContext.<br />
*Support product requirements, for example an Instagram-like app, wide-angle panorama, or Fox photo.<br />
*Do some exploratory experiments on the WebImage concept.<br />
*Initialize a sub-project called [https://wiki.mozilla.org/Project_GlovePuppetry Project GlovePuppetry]<br />
<br />
=Beyond 2015=<br />
*Proof of Concept for WebImage.<br />
*A crazy idea: Kernel.js or SPIR.js for JS developers to customize WebImage?<br />
*A gestural control API with a depth camera? => WebNI (Web Natural Interaction)?<br />
*[https://wiki.mozilla.org/Project_Cangjie Project Cangjie]<br />
<br />
=Conclusion=<br />
Step by step, we can make things different. FoxEye is a pioneer project trying to push the Web's boundary into a new area: image processing and computer vision. The key factor in ensuring this project's success is you, every Web contributor. Your support, your promotion, your feedback, and your comments are the nourishment of this project. Please share this project with every Web developer. Let's bring more and more amazing things to the Web, and enrich and improve the competitiveness of the whole Web platform.<br />
<br />
=References=<br />
*[1]:"WebAudio Spec", http://www.w3.org/TR/webaudio/<br />
*[2]:"Media Capture Stream with Worker", http://chiahungtai.github.io/mediacapture-worker/<br />
*[3]:"ImageBitmap Extensions", http://kakukogou.github.io/spec-imagebitmap-extension/<br />
*[4]:"Mozilla Standards", https://wiki.mozilla.org/Standards<br />
<br />
=Acknowledgements=<br />
The idea of adopting WebAudio as the reference design for this project came from a conversation with John Lin. Thanks to Robert O'Callahan for his great feedback and comments. Thanks to John Lin and Chia-jung Hung for their useful suggestions and ideas. Also, big thanks to my team members who helped me debug the code.<br />
<br />
=About Authors=<br />
==CTai==<br />
My name is [https://tw.linkedin.com/pub/chia-hung-tai/36/8b6/712 Chia-hung Tai]. I am a senior software engineer in the Mozilla Taipei office. I work on Firefox OS multimedia. Before this job, I gained some experience in OpenCL, NLP (Natural Language Processing), data mining, and machine learning. My IRC nickname is ctai. You can find me in the #media, #mozilla-taiwan, and #b2g channels. You can also reach me via email (ctai at mozilla dot com).<br />
<br />
==Kaku==<br />
[https://tw.linkedin.com/in/kakukogou Tzuhuo Kuo] is an engineer in the Mozilla Taipei office.<br />
==CJ Ku==<br />
[https://www.linkedin.com/pub/cj-ku/62/55b/a1b CJ Ku] is responsible for the OpenCV.js part.</div>
<hr />
<div></div>Ctaihttps://wiki.mozilla.org/index.php?title=User_talk:Dead_project&diff=1108869User talk:Dead project2015-12-09T22:56:50Z<p>Ctai: /* MediaStreamTrack with Worker: */</p>
<hr />
<div>=Abstract=<br />
The goal of this project is bringing the power of computer vision and image processing to the Web. By extending the spec of Media Capture and Streams, the web developers can write video processing related applications in better way. The primary idea is to incorporate Worker-based JavaScript video processing with MediaStreamTrack. The user's video processing script can do the real image processing and analysis works frame by frame. We also propose an extension of ImageBitmap to allow better optimization possibility in both implementation and Web developer side. To support video editor case, we would like to introduce OfflineMediaContext in next phase. We also want to explore the concept of WebImage, a hardware accelerated-able Web API for performance improvement chance. By accomplishing these API step by step, we believe we can improve the Web platform competitiveness a lot on image processing and computer vision area.<br />
<br />
=Introduction=<br />
To get a quick understand what is project FoxEye. Please see below file:<br><br />
'''The latest one:'''<br />
*FoxEye Briefing: [https://docs.google.com/presentation/d/1Ra5bIeMfSEQi5kd_bGH5Vx-pa9ahpAUTFDxTeBA5L0o/edit?usp=sharing Briefing]<br />
*Presentation files in Whistler Work Week:<br />
**Project FoxEye Status Update: [https://docs.google.com/presentation/d/1vnR5JOWpEgKN3KZGV2SLxscqSnuBOF2tT_dTolLRYI4/edit?usp=sharing FoxEye]<br />
**FoxEye Cross Firefox OS:[https://docs.google.com/presentation/d/1ck32VoikWkkFXkFfNK91S22nGRp9fSEj6Oz6846May4/edit?usp=sharing Use case]<br />
*Latest demo in Youtube: [https://www.youtube.com/watch?v=prybkXsTGXY FoxEye 2015 H1 demo]<br />
'''Outdated'''<br />
*Presentation file in Portland Work Week.[[File:Project FoxEye Portland Work Week.pdf]]<br><br />
*Presentation file in P2PWeb WorkShop.[[File:Project FoxEye 2015-Feb.pdf]]<br><br />
*Youtube: https://www.youtube.com/watch?v=TgQWEWiGaO8<br><br />
<br />
The needs for image processing and computer vision is increasing in recent years. The introduction of video element and media stream in HTML5 is very important, allowing for basic video playback and WebCam ability. But it is not powerful enough to handle complex video processing and camera application. Especially there are tons of mobile camera, photo and video editor applications show their creativity by using OpenCV etc in Android. It is a goal of this project to include the capabilities found in modern video production and camera applications as well as some of the processing, recognition and stitching tasks. <br />
<br><br />
This API is inspired by the WebAudio API[1]. Unlike WebAudio API, we try to reach the goal by modifying existing Media Capture and Streams API. The idea is adding some functions to associate the Woker-based script with MediaStreamTrack. Then the script code of Worker runs image processing and/or analysis frame by frame. Since we move the most of processing work to Worker, the main thread will not be blocked.<br />
<br />
[[File:FoxEye - Overview.png|800px]]<br />
<br><br />
Basically, the spirit of this project is four parts. The first part is extend the MediaStreamTrack to associate a Worker. This part provide a way to do image processing job frame by frame. The second part is ImageBitmap extension. This part extended the ImageBitmap interface to allow JavaScript developer to read underlying data out and set an external data into an ImageBitmap in a set of supported color formats. The third part is OfflineMediaContext. It is for offline stream to render as fast as possible. The last part is WebImage, it is a hardware accelerated-able API on computer vision area. The Web developers can use it to combine high performance vision processing.<br />
<br><br />
Thanks for the amazing asm.js and emscripten work, we also provide an asm.js version of OpenCV called OpenCV.js. The web developers can leverage the power of OpenCV in a simpler way. <br />
<br><br />
<br />
=Design Principle=<br />
*Follow [https://extensiblewebmanifesto.org/ The Extensible Web Manifesto]<br />
The original design of this project is WebAudio like design called WebVideo. But is is deprecated due to we want to follow The Extensible Web Manifesto. We believe this kind of design is best for the Web. Take below examples to against WebVideo design.<bt><br />
Computer vision area is more thriving than ever after deep learning introduced. We can't enumerate video nodes for all kinds of creativeness for the future. For example, suppose we enumerate all we can image in current time. But still something we can't image. Maybe we lost an area called image sentiment analysis. I take this as an example because I just known it recently. Image sentiment analysis is trying to classify the emotion from human face in image. So should we add this as a new video node? Take another extreme example, we think we would not have argument on face detection or face recognition. But what we talk now is only for human face. What if we someone think we need to detect and recognize animal face? Sounds like a ridiculous idea right? But it is a real business. A product called "Bistro: A Smart Feeder Recognizes Your Cat's Face"[2] was on indiegogo. So should we extend the original face detection/recognition node or add a new video node for cat or ask them to implement it in VideoWorker? If we ask them to implement in VideoWorker, then we might need to re-consider are those existing video node make sense? Should we move them all into VideoWorker and implement them in JS? If yse, the WebVideo will degenerate to only source node, video worker node and detestation node. Compare to degenerated WebVideo, the MediaStream with worker design is more elegant. <br />
<br />
*Performance and power consumption do matter<br />
This project is initialized by the FxOS team which means the performance and power consumption for mobile devices are considered from day one.<br />
<br />
=Concept=<br />
==MediaStreamTrack with Worker: ==<br />
The new design is a simple and minimal change for current API. By extending MediaStreamTrack and adding Worker related API, we can let MediaStream be able to support video processing functionality through the script code in Worker. Below is the draft WebIDL codes. Please see [2] for the draft specification.<br />
<source lang="webidl"><br />
[Constructor(DOMString scriptURL)]<br />
<br />
interface VidoeMonitor : EventTarget {<br />
attribute EventHandler onvideomonitor;<br />
};<br />
<br />
interface VideoProcessor : EventTarget {<br />
attribute EventHandler onvideoprocess;<br />
};<br />
partial interface MediaStreamTrack {<br />
void addVideoMonitor(VidoeMonitor monitor);<br />
void removeVideoMonitor(VidoeMonitor monitor);<br />
MediaStreamTrack addVideoProcessor(VidoeProcessor processor);<br />
void removeVideoProcessor();<br />
};<br />
<br />
[Exposed=(Window, Worker),<br />
Constructor(DOMString type, optional VideoMonitorEventInit videoMonitorEventInitDict)]<br />
interface VideoMonitorEvent : Event {<br />
readonly attribute DOMString trackId;<br />
readonly attribute double playbackTime;<br />
readonly attribute ImageBitmap? inputImageBitmap;<br />
};<br />
<br />
[Exposed=(Window, Worker),<br />
Constructor(DOMString type, optional VideoProcessorEventInit videoProcessorEventInitDict)]<br />
interface VideoProcessEvent : VideoMonitorEvent {<br />
readonly attribute ImageBitmap? outputImageBitmap;<br />
};<br />
<br />
</source><br />
<br><br />
[[File:NewProjectFoxEye1.png|1024px]]<br><br />
===Example Code ===<br />
Please check the section [http://chiahungtai.github.io/mediacapture-worker/index.html#Examples examples in MediaStreamTrack with worker].<br />
<br />
==ImageBitmap extensions==<br />
Please see [3] for more information.<br />
<br />
==WebImage:==<br />
Why do we need WebImage? Because performance is matter to prompt computer vision to the Web. We need it for performance reason. Most image processing task can be accelerated by WebGL. But it is not the case for computer vision area. So we need a hardware accelerated-able computer vision related web API to help Web developer deliver a fast and portable Web page and application. That is the motivation of WebImage.<br />
<br><br />
This is a new zone needed to explore. So we might start from refer to existing related API. In end of 2014, KHRONOS released a portable, power efficient vision processing API, called OpenVX. This specification might be a good start point for WebImage.<br />
<br><br />
The following diagram is a brief concept of OpenVX. It is a graph-node architecture. The role of OpenVX for computer vision is like the role of OpenGL for graph. The developer just construct and execute the graph to process incoming image.<br />
<br><br />
[[File:OpenVX-NodeGFX.PNG|600px]]<br><br />
<!--[[File:Project FoxEyeWebImage1.png|800px]]<br>--><br />
OpenVX can be implemented by OpenCL, OpenGL|ES with compute shader and C/C++ with SIMD. That means we can support OpenVX in wide range Web platform from PC to mobile once a OpenVX engine is provided.<br />
<br />
[[File:OpenVX.PNG|600px]]<br><br />
<br />
==OfflineMediaContext:==<br />
We introduce the “OfflineMediaContext” and modify the “MediaStream” here to enable the offline (as soon as possible) MediaStream processing. When developers are going to perform the offline MediaStream processing, they need to form a context which will hold and keep the relationship of all MediaStreams that are going to be processed together.<br />
<br />
<source lang="c++"><br />
<br />
// typedef unsigned long long DOMTimeStamp;<br />
interface OfflineMediaContext {<br />
void start(DOMTimeStamp durationToStop);<br />
attribute EventHandler onComplete;<br />
};<br />
// Add an optional argument into the constructor.<br />
[Constructor (optional OfflineMediaContext context),<br />
Constructor (MediaStream stream, optional OfflineMediaContext context),<br />
Constructor (MediaStreamTrackSequence tracks, optional OfflineMediaContext context)]<br />
interface MediaStream : EventTarget {<br />
// No modification.<br />
...<br />
}<br />
<br />
</source><br />
*OfflineMediaContext is the holdplace of all MediaStreams which are going to be processed together in a non-realtime rate.<br />
*OfflineMediaContext is also the object who can trigger the non-realtime processing.<br />
*OfflineMediaContext should be instantiated first and then MediaStreams, which are going to be processed together in the same context, could be instantiated by passing the pre-instantiated OfflineMediaContext object into the constructor of MediaStreams. (See the modified MediaStream constructor below)<br />
*The native implementation of OfflineMediaContext holds a non-realtime MediaStreamGraph, which is just the same as the OfflineAudioContext in WebAudio specification.<br />
*The constructors are modified by adding an optional parameter, OfflineMediaContext. By this way, developers are able to associate a MediaStream to an OfflineMediaContext.<br />
*If the optional OfflineMediaContext is given, the native implementation of the MediaStream should be hooked to the non-realtime MSG hold by the given OfflineMediaContext in stead of the global realtime MSG.<br />
*If the optional OfflineMediaContext is given, we need to check whether the new created MediaStream is able to be processed in offline rate or not. If not, the constructor should throw an error and return a NULL. (Constructors are always allowed to throw.)<br />
<br />
==OpenCV.js==<br />
*OpenCV + Emscripten = OpenCV.js<br />
*https://github.com/CJKu/opencv <br />
<!--<br />
==Deprecated Design ==<br />
*Modular Routing<br />
Modular routing allows arbitrary connections between different DIPNode(TBD) objects. Each node can have inputs and/or outputs. A source node has no inputs and a single output. A destination node has one input and no outputs. Other nodes such as filters can be placed between source and destination nodes.<br />
<br />
Here is a example for face detection work on ImageElement:<br><br />
PS. Right now, I haven't finished the work of ScriptNode. So the draw rectangle part is skipped in sample code.<br />
<br>[[File:Project FoxEye1.png|720px]]<br><br />
<big>Example 1:</big><br />
<source lang="javascript"><br />
var context = new DIPContext();<br />
var imageElem = document.getElementById('imgelemsrc');<br />
var source = context.createImageElementSource(imageElem);<br />
var facedetect = context.createFaceDetection();<br />
source.connect(facedetect);<br />
var dest = context.createMediaStreamDestination();<br />
facedetect.connect(dest);<br />
var elem = document.getElementById('videoelem');<br />
elem.mozSrcObject = dest.stream;<br />
elem.play();<br />
<br />
</source><br />
<br />
<br />
Another example to show that some nodes might support callback function to pass more information rather than image.<br />
<br>[[File:Project FoxEye2.png|720px]]<br><br />
<big>Example 2:</big><br />
<source lang="javascript"><br />
var context = new DIPContext();<br />
var imageElem = document.getElementById('imgelemsrc');<br />
var source = context.createImageElementSource(imageElem);<br />
var textdetect = context.createTextDetection();<br />
source.connect(textdetect);<br />
var dest = context.createMediaStreamDestination();<br />
textdetect.connect(dest);<br />
textdetect.ontextrecognized = function (e){<br />
var text = e.recognizedText;<br />
var go2google = document.getElementById('go2Google');<br />
go2google.href = "https://www.google.com.tw/search?q=" + text<br />
var go2IMDB = document.getElementById('go2IMDB');<br />
go2IMDB.href = "http://www.imdb.com/find?ref_=nv_sr_fn&q=" + text<br />
var go2Amazon = document.getElementById('go2Amazon');<br />
go2Amazon.href = "http://www.amazon.com/s/ref=nb_sb_noss?url=search-alias%3Daps&field-keywords=" + text<br />
var go2TranslateEngTC = document.getElementById('go2TranslateEngTC');<br />
go2TranslateEngTC.href = "https://translate.google.com.tw/#en/zh-TW/" + text<br />
}<br />
var elem = document.getElementById('videoelem');<br />
elem.mozSrcObject = dest.stream;<br />
elem.play();<br />
</source><br />
An ideally example to combine ScriptNode with Canvas2DContext.<br><br />
This is an example trying to do on fly camera translation like "Word Lens" and "Waygo".<br><br />
Haven't finish the implementation for this example.<br><br />
<br>[[File:Project FoxEye3.png|720px]]<br><br />
<big>Example 3:</big><br />
<source lang="javascript"><br />
var context = new DIPContext();<br />
var myMediaStream;<br />
navigator.getUserMedia({video:true, audio:false}, function(localMediaStream){<br />
myMediaStream = localMediaStream;<br />
var source = context.createMediaStreamSource(myMediaStream);<br />
var textRecoginition = context.createTextRecoginition();<br />
source.connect(textRecoginition);<br />
var textInpaint = context.createTextInpaint();<br />
textRecoginition.connect(textInpaint);<br />
var scriptTranslate = context.createScriptProcessor();<br />
textdetect.ontextrecognized = function (e){<br />
var text = e.recognizedText;<br />
// Custom parameter -<br />
scriptTranslate.addParameter( "text", text );<br />
}<br />
scriptTranslate.onimageprocess= function (e) {<br />
var text = e.parameters.text;<br />
// Translate to other language....<br />
var newText = Translate("Eng", "TC");<br />
var input = e.inputImage;<br />
var canvas = document.getElementsByTagName('canvas')[0];<br />
var context2D = canvas.getContext('2d');<br />
context2D.drawImage(input, 0, 0);<br />
context2D.strokeText(newText, 0, 0);<br />
// get an empty slate to put the data into<br />
var output = context.createImageData(canvas.width, canvas.height);<br />
e.outputImage = output;<br />
}<br />
textInpaint.connect(scriptTranslate);<br />
var dest = createMediaStreamDestination();<br />
scriptTranslate.connect(dest);<br />
var video = document.getElementById(‘videoelem’);<br />
video.mozSrcObject = dest.stream;<br />
}, null);<br />
<br />
</source><br />
<br />
<!--<br />
=The API=<br />
Still under construction....<br />
==New design==<br />
===VideoContext===<br />
<source><br />
[Constructor]<br />
interface VideoContext : EventTarget {<br />
readonly attribute VideoDestinationNode destination;<br />
// DIPNode creation<br />
MediaStreamVideoDestinationNode createMediaStreamDestination();<br />
MediaStreamVideoSourceNode createMediaStreamSource(MediaStream mediaStream);<br />
VideoWorkerNode createVideoWorker(DOMString scriptURL);<br />
};<br />
</source><br />
===VideoNode===<br />
<source><br />
interface VideoNode: EventTarget {<br />
void connect(VideoNode destination);<br />
void disconnect();<br />
readonly attribute VideoContext context;<br />
};<br />
</source><br />
<br />
===VideoWorkerNode===<br />
Still thinking the type of inputImage/outputImage.<br />
<source><br />
interface VideoProcessEvent : Event {<br />
readonly attribute ImageData inputImage;<br />
readonly attribute ImageData outputImage;<br />
readonly attribute object parameters; <br />
};<br />
<br />
interface VideoWorkerNode: VideoNode {<br />
attribute EventHandler onimageprocess;<br />
};<br />
</source><br />
<br />
==Deprecated design==<br />
===DIPContext===<br />
<source><br />
[Constructor]<br />
interface DIPContext : EventTarget {<br />
readonly attribute DIPDestinationNode destination;<br />
// DIPNode creation<br />
MediaStreamDIPDestinationNode createMediaStreamDestination();<br />
ImageElementDIPSourceNode createImageElementSource(HTMLImageElement imageElement);<br />
MediaStreamDIPSourceNode createMediaStreamSource(MediaStream mediaStream);<br />
FaceDetectionNode createFaceDetection();<br />
TextDetectionNode createTextDetection();<br />
};<br />
</source><br />
===DIPNode===<br />
<source><br />
interface DIPNode : EventTarget {<br />
void connect(DIPNode destination, optional unsigned long output = 0, optional unsigned long input = 0);<br />
void disconnect(optional unsigned long output = 0);<br />
readonly attribute DIPContext context;<br />
readonly attribute unsigned long numberOfInputs;<br />
readonly attribute unsigned long numberOfOutputs;<br />
};<br />
</source><br />
<br />
===TextDetectionNode===<br />
<source><br />
interface RecognizedTextEvent : Event {<br />
readonly attribute DOMString recognizedText;<br />
};<br />
<br />
interface TextDetectionNode : DIPNode {<br />
attribute EventHandler ontextrecognized;<br />
};<br />
</source><br />
--><br />
<br />
=Demo pages=<br />
==OpenCV.js==<br />
This demo should be able to run on every existing Firefox browser on Ubuntu and Mac. Please try below website.<br><br />
http://people.mozilla.org/~cku/opencv/<br />
<br><br />
[[File:OpenCVJS-1.png|1080px]]<br />
<br><br />
[[File:OpenCVJS-2.png|1080px]]<br />
<br><br />
<br />
==MST with Worker and ImageBitmap==<br />
You need a customized build for below demos. Please clone this repo: https://github.com/kakukogou/gecko-dev/tree/foxeye-master<br><br />
You also need a web camera for the demos. Then build your own Firefox browser and go to below website to see the demo.<br />
<br><br />
Source code of the demo:<br><br />
https://github.com/kakukogou/foxeye-demo<br />
<br><br />
Demo website:<br><br />
http://people.mozilla.org/~tkuo/foxeye-demo/<br />
===Monitor===<br />
Monitor is design for just send the event to the Web Worker and no modification. The left one is from getUserMedia. The right one is using addWorkerMonitor to dispatch the input frame from the left one to a worker. The worker will detect the face and pass the face position and the input frame to main thread. Then the script in main thread use both information to draw the input frame via CanvasRenderingContext2D. <br />
*Case 1: Face detection<br />
This demo shows how to use |addWorkerMonitor| and CanvasRenderingContext2D to overlay a hat on someone's head.<br />
<br><br />
[[File:Monitor face.png|1080px]]<br />
<br><br />
*Case 2: QRCode<br />
[[File:Monitor qrcode.png|1080px]]<br />
<br><br />
<br />
===Processor===<br />
This is a demo to show how to use |addWorkerProcessor| to create a new MediaStreamTrack and show it in another HTMLVideoElement. There are 5 kind of image filter.<br />
<br> <br />
<br><br />
[[File:Processor copy.png|1080px]]<br />
<br><br />
[[File:Processor blur.png|1080px]]<br />
<br><br />
[[File:Processor erode.png|1080px]]<br />
<br><br />
[[File:Processor threshold.png|1080px]]<br />
<br><br />
[[File:Processor invert.png|1080px]]<br />
<br><br />
[[File:Processor gray.png|1080px]]<br />
<br><br />
<br />
<br />
<br />
<!--<br />
==Demo 1: Face tracker==<br />
===Browser:===<br />
*Input comes from HTML Image Element<br />
[[File:ProjectFoxEye BrowserFaceImage.png|720px]]<br />
*Input comes from MediaStream<br />
[[File:ProjectFoxEye BrowserFaceMS.png|720px]]<br />
<br />
===B2G on Flame:===<br />
*Input comes from HTML Image Element<br><br />
[[File:Project FoxEyeFlameFaceImage.jpg|480px]]<br><br />
*Input comes from MediaStream<br><br />
[[File:Project FoxEyeFlameFaceMS.jpg|480px]]<br><br />
<br />
==Demo 2: Text Recognition==<br />
*Browser:<br />
**Input comes from HTML Image Element<br />
[[File:ProjectFoxEye TextImage.png|720px]]<br />
**Input comes from MediaStream<br />
[[File:ProjectFoxEye TextMS.png|720px]]<br />
--><br />
<br />
=Unlimited Potentials=<br />
<!--<br />
According to "Firefox OS User Research Northern India Findings" [4], one of the key table-stake is camera related features. "Ways to provide photo & video editing tools" is what this WebAPI for. So if we can deliver some cool photo & video editing features, we can fulfill one of the needs of our target market.<br><br />
In [4], it mentioned that one of purchase motivators is educate my kids. The features like PhotoMath can satisfy the education part.<br><br />
In long term, if we can integrate text recognition with TTS(text to speech), we can help illiterate person to read words or phrase. That will be very useful features.<br><br />
Also offline text translation in camera might be a killer application too. Waygo and WordLens is two of such applications in Android and iOS.<br><br />
Text Selection in Image is also an interesting feature for browser. Project Naptha demos some potential functionality based on yext selection in Image.<br> <br />
--><br />
==FoxEye technology tree==<br />
This is the technology tree of FoxEye. The solid line is the dependency. The dash line is for performance improvement. The green blocks are possible applications. The blue blocks are API draft with prototype. The red block is we know how to do but not yet starting. The yellow blocks are only the concept. Need more study. So step by step, we can provide photo manager, wide angle panorama, HDR, and even gesture control.<br />
[[File:Multimedia Platform Team Technology Roadmap - New Page.png |1024px]]<br />
<br />
<br />
==Use Cases==<br />
*Digital Image Processing(DIP) for camera:<br />
**Face In, see [https://www.youtube.com/watch?feature=player_embedded&v=PWZUCfDsFdU Sony Face In]<br />
**Augmented Reality, see [https://www.youtube.com/watch?feature=player_embedded&v=vDNzTasuYEw IKEA AR]<br />
**Camera Panorama, <br />
**Fisheye camera, <br />
**Comic Effect,<br />
**Long term, might need Android Camera HAL 3 to control camera<br />
***Smile Snapshot<br />
***Gesture Snapshot<br />
***HDR<br />
***Video Stabilization<br />
**Bar code scanner<br />
*Photo and video editing<br />
**Video Editor, see [https://www.youtube.com/watch?feature=player_embedded&v=NJ6nYgxcuUk WeVideo on Android]<br />
**A faster way for video editing tools.<br />
**Lots of existing image effects can be used for photo and video editing.<br />
**https://www.facebook.com/thanks<br />
*Object Recognition in Image(Not only FX OS, but also broswer):<br />
**Shopping Assistant, see [https://www.youtube.com/watch?feature=player_embedded&v=B7cvlWll85Q Amazon Firefly]<br />
**Face Detection/Tracking,<br />
**Face Recognition, <br />
**Text Recognition, <br />
**Text Selection in Image, <br />
***See http://projectnaptha.com/<br />
**Text Inpainting,<br />
**Image Segmentation,<br />
**Text translation on image, see [https://www.youtube.com/watch?feature=player_embedded&v=9UalhhWBPH0 Waygo]<br />
*Duo Camera:<br />
**Nature Interaction(Gesture, Body Motion Tracking)<br />
**Interactive Foreground Extraction<br />
and so on....<br />
<br />
==Some cool applications we can refer in real worlds==<br />
*Word Lens: <br />
**https://play.google.com/store/apps/details?id=com.questvisual.wordlens.demo<br />
**https://itunes.apple.com/tw/app/word-lens/id383463868?mt=8<br />
*Waygo<br />
**http://www.waygoapp.com/<br />
*PhotoMath<br />
**https://photomath.net/<br />
*Cartoon Camera<br />
**https://play.google.com/store/apps/details?id=com.fingersoft.cartooncamera<br />
*Photo Studio<br />
**http://photo-studio.en.uptodown.com/android<br />
*Magisto<br />
**https://play.google.com/store/apps/details?id=com.magisto<br />
*Adobe PhotoShop Express<br />
**http://www.photoshop.com/products/photoshopexpress<br />
*Amazon(firefly app)<br />
**https://play.google.com/store/apps/details?id=com.amazon.mShop.android<br />
<br />
<!--<br />
=Task Dependency=<br />
Below is the dependency graph for text recognition work.<br />
<br><br />
[[File:Project FoxEyeTextRecgDependency.png]]<br />
<br><br />
Below is the dependency graph for camera efficts work.<br />
<br><br />
[[File:Project FoxEyeCameraDependency.png]]<br />
<br><br />
Below is the dependency graph for video editor work.<br />
<br><br />
[[File:Project FoxEyeVideoEditorDependency.png]]<br />
<br><br />
--><br />
<br />
<!--<br />
=Comparison=<br />
==Canvas2DContext==<br />
Currently, you can do video effect by Canvas2DContext. See the demo made by [4]. The source code looks like below.<br />
<source><br />
function frameConverter(video,canvas) {<br />
<br />
// Set up our frame converter<br />
this.video = video;<br />
this.viewport = canvas.getContext("2d");<br />
this.width = canvas.width;<br />
this.height = canvas.height;<br />
// Create the frame-buffer canvas<br />
this.framebuffer = document.createElement("canvas");<br />
this.framebuffer.width = this.width;<br />
this.framebuffer.height = this.height;<br />
this.ctx = this.framebuffer.getContext("2d");<br />
// Default video effect is blur<br />
this.effect = JSManipulate.blur;<br />
// This variable used to pass ourself to event call-backs<br />
var self = this;<br />
// Start rendering when the video is playing<br />
this.video.addEventListener("play", function() {<br />
self.render();<br />
}, false);<br />
<br />
// Change the image effect to be applied <br />
this.setEffect = function(effect){<br />
if(effect in JSManipulate){<br />
this.effect = JSManipulate[effect];<br />
}<br />
}<br />
<br />
// Rendering call-back<br />
this.render = function() {<br />
if (this.video.paused || this.video.ended) {<br />
return;<br />
}<br />
this.renderFrame();<br />
var self = this;<br />
// Render every 10 ms<br />
setTimeout(function () {<br />
self.render();<br />
}, 10);<br />
};<br />
<br />
// Compute and display the next frame <br />
this.renderFrame = function() {<br />
// Acquire a video frame from the video element<br />
this.ctx.drawImage(this.video, 0, 0, this.video.videoWidth,<br />
this.video.videoHeight,0,0,this.width, this.height);<br />
var data = this.ctx.getImageData(0, 0, this.width, this.height);<br />
// Apply image effect<br />
this.effect.filter(data,this.effect.defaultValues);<br />
// Render to viewport<br />
this.viewport.putImageData(data, 0, 0);<br />
return;<br />
};<br />
};<br />
<br />
// Initialization code<br />
video = document.getElementById("video");<br />
canvas = document.getElementById("canvas");<br />
fc = new frameConverter(video,canvas);<br />
...<br />
// Change the image effect applied to the video<br />
fc.setEffect('edge detection');<br />
<br />
</source><br />
Basically, the idea is use |drawImage| to acquire frame from video and draw it to canvas. Then call |getImageData| to get the data and process the image. After that, put the computed data back to the canvas and display it.<br><br />
<br />
Compare to this approach, the proposed WebAPI has below advantages:<br />
* Not polling mechanism.<br />
** We use callback function to process all frames.<br />
<br />
==node-opencv==<br />
https://github.com/peterbraden/node-opencv<br />
"OpenCV bindings for Node.js. OpenCV is the defacto computer vision library - by interfacing with it natively in node, we get powerful real time vision in js."<br />
The sample codes looks like below:<br />
*You can use opencv to read in image files. Supported formats are in the OpenCV docs, but jpgs etc are supported.<br />
<source lang="javascript"><br />
cv.readImage(filename, function(err, mat){<br />
mat.convertGrayscale()<br />
mat.canny(5, 300)<br />
mat.houghLinesP()<br />
})<br />
</source><br />
*If however, you have a series of images, and you wish to stream them into a stream of Matrices, you can use an ImageStream. Thus:<br />
<source lang="javascript"><br />
var s = new cv.ImageStream()<br />
s.on('data', function(matrix){<br />
matrix.detectObject(haar_cascade_xml, opts, function(err, matches){})<br />
})<br />
ardrone.createPngStream().pipe(s);<br />
</source><br />
<br />
==opencvjs==<br />
https://github.com/blittle/opencvjs<br><br />
It is a project to compile opencv to asm.js. Might be a dead project now.<br />
==Project Naptha==<br />
"Project Naptha automatically applies state-of-the-art computer vision algorithms on every image you see while browsing the web. The result is a seamless and intuitive experience, where you can highlight as well as copy and paste and even edit and translate the text formerly trapped within an image." Quoted from http://projectnaptha.com/ .<br />
===How it works?===<br />
Excerpt from Project Naptha:<br />
The primary feature of Project Naptha is actually the text detection, rather than optical character recognition. The author write a text detection algorithm called Stroke Width Transform, invented by Microsoft Research in 2008, which is capable of identifying regions of text in a language-agnostic manner in WebWorker. Once a user begins to select some text, it scrambles to run character recognition algorithms in order to determine what exactly is being selected. The default OCR engine is a built-in pure-javascript port of the open source Ocrad OCR engine. There’s the option of sending the selected region over to a cloud based text recognition service powered by Tesseract, Google’s (formerly HP’s) award-winning open-source OCR engine which supports dozens of languages, and uses an advanced language model.<br />
--><br />
<!--<br />
=Open Source Library and Licenses =<br />
*OpenCV: OpenCV is an open-source BSD-licensed library that includes several hundreds of computer vision algorithms.<br />
*Tesseract-OCR: Apache License v2.0. Tesseract is probably the most accurate open source OCR engine available.<br />
**Leptonica: Leptonica is a pedagogically-oriented open source site containing software that is broadly useful for image processing and image analysis applications.<br />
<br />
<br>"The Leptonica copyright is less restrictive still. We use the BSD 2-clause license, which is similar to the Apache license but much briefer. Kirk McKusick playfully called the BSD license a copycenter, as differentiated from the usual copyright and the GPL copyleft: "Take it down to the copy center and make as many copies as you want." The BSD restrictions can be approximately summarized as: (1) Don't pretend that you wrote this, and (2) Don't sue us if it doesn't work.<br> For Leptonica, as with Apache and BSD, modifications of the source code that are used in commercial products can be made without any obligation to make them available in open source. (This does happen voluntarily -- it's likely that the majority of issues noted by people working on commercial products are reported back, often with patches.) The Leptonica license only requires that any use of the source, whether in original or modified form, must include the Leptonica copyright notice, and that modified versions must be clearly marked as such. "<br />
<br />
<br />
=What I have done=<br />
*Initialize, plan and implement this project.<br />
*Write a prototype of WebIDL for WebDIP.<br />
**MediaStream as source node and destentaion node for WebDIP.<br />
**For HTMLImageElement part as source node, there is a temporal solution for it.<br />
**Have face detection node. Can be used in MediaStream and HTMLImageElement on both browser and B2G flame.<br />
**Have text detection/recognization node. Can be used in MediaStream and HTMLImageElement on browser.<br />
--><br />
<br />
<!--<br />
=To Do List=<br />
*Extend MediaStreamTrack API? =>CTai<br />
**Need to integrate with Canvas2DContext and WebGL.<br />
*OfflineMediaContext study. =>Kaku(our new hire!!)<br />
*An API for image processing and object detection. =>TBD<br />
**We need such API for B2G privilege applications(or opencv-asm.js for general APPs).<br />
*How to compile OpenCV to asm.js =>Kaku, CJay<br />
**Try to figure out how to pass the ImageBitmap from VideoWroker to OpenCV-asm.js<br />
*Compare native OpenCV/Tesseract with asm.js version. =>TBD<br />
--><br />
<br />
<!--<br />
=Fixme List(Known Issues)=<br />
*OpenCV can't build with STLPort, only support GNUSTL.<br />
**B2G can't build with GNUSTL.<br />
*Text Detection and Recognition can't run on B2G.<br />
**Some OpenCV API use STL as arguments. The unalignment STL will cause runtime error.<br />
*Tesseract-OCR Build<br />
**Use pre-installed Tesseract-OCR now. Maybe we should support source code build of Tesseract-OCR.<br />
*Improve precision rate of text recognition.<br />
**The actual precision rate should be higher than my roughly prototype. Need improve it.<br />
*Separate OCR initialized.<br />
**Prevent redundant initialization.<br />
*Haven't done OpenCL integration in Gecko.<br />
**OpenCV has a lot of OpenCL integration. We should take advantage from it.<br />
*Canvas2DContext, WebGL can't run on worker.<br />
**Need bug 801176 and bug 709490 landed.<br> <br />
*Need ImageBitmap for VideoWorkerEvent.<br />
**Need bug 1044102 landed.<br><br />
--><br />
<br />
=Current Status=<br />
*Meta bug: https://bugzilla.mozilla.org/show_bug.cgi?id=1100203<br />
*MediaStream with worker: editor draft completed, refining the prototype for review process. See https://bugzilla.mozilla.org/show_bug.cgi?id=1044102<br />
*ImageBitmap: In review process. See https://bugzilla.mozilla.org/show_bug.cgi?id=801176<br />
*ImageBitmap extension: editor draft completed, refining the prototype for review process. See https://bugzilla.mozilla.org/show_bug.cgi?id=1141979<br><br />
*OpenCV.js: Working on it. See http://people.mozilla.org/~cku/opencv/<br />
*OfflineMediaContext: Not yet started.<br />
*WebImage:Not yet started.<br />
*Run WebGL on worker: in review process. See https://bugzilla.mozilla.org/show_bug.cgi?id=709490<br><br />
*CanvasRenderingContext2D in Worker: waiting for someone take this bug.See https://bugzilla.mozilla.org/show_bug.cgi?id=801176<br><br />
=Next Phase(2015 H2)=<br />
*Try to land and standardize MediaStream with worker, ImageBitmap and ImageBitmap extension. see[4] for how to process standardization in Mozilla.<br />
*Design a automation way to export JavaScript API for OpenCV.js. Try to upstream it to OpenCV code base.<br />
*Start to work on OfflineMediaContext.<br />
*Support product requirement, for example, Instagram-like app, wide angle panorama or Fox photo.<br />
*Do some explanatory experiment on WebImage concept.<br />
*Initialize a sub-project called [https://wiki.mozilla.org/Project_GlovePuppetry Project GlovePuppetry]<br />
<br />
=Beyond 2015=<br />
*Proof of Concept for WebImage.<br />
*A crazy idea of Kernel.js or SPIR.js for JS developers to customize WebImage ?<br />
*Gestural control API with depth camera? => WebNI(Web Nature Interaction)?<br />
*[https://wiki.mozilla.org/Project_Cangjie Project Cangjie]<br />
<br />
=Conclusion=<br />
Step by step, we can make thing different. FoxEye is a pioneer project trying to push the Web boundary to a new area, image processing and computer vision. The key factor to ensuring this project success is you, every Web contributor. Your supports, your promotions, your feedback, your comments are the nutrition of this project. Please share this project to every Web developers. Let's bring more and more amazing things to the Web. Let's rich and improve the whole Web platform competitiveness.<br />
<br />
=References=<br />
*[1]:"WebAudio Spec", http://www.w3.org/TR/webaudio/<br />
*[2]:"Media Capture Stream with Worker", http://chiahungtai.github.io/mediacapture-worker/<br />
*[3]:"ImageBitmap Extensions", http://kakukogou.github.io/spec-imagebitmap-extension/<br />
*[4]:"Mozilla Standards", https://wiki.mozilla.org/Standards<br />
<br />
=Acknowledgements=<br />
This whole idea of adopting WebAudio as the reference design for this project was from a conversation between John Lin. Thanks for Robert O'Callahan's great feedback and comments. Thanks for John Lin and Chia-jung Hung's useful suggestions and ideas. Also, big thanks to my team members who help me to debug the code.<br />
<br />
=About Authors=<br />
==CTai==<br />
My name is [https://tw.linkedin.com/pub/chia-hung-tai/36/8b6/712 Chia-hung Tai]. I am a senior software engineer in Mozilla Taipei office. I work on Firefox OS multimedia stuffs. Before this jobs, I have some experience in OpenCL, NLP(Nature Language Processing), Data Mining and Machine Learning. My IRC nickname is ctai. You can find me in #media, #mozilla-taiwan, and #b2g channels. Also you can reach me via email(ctai at mozilla dot com).<br />
<br />
==Kaku==<br />
[https://tw.linkedin.com/in/kakukogou Tzuhuo Kuo] is an engineer in Mozilla Taipel office.<br />
==CJ Ku==<br />
[https://www.linkedin.com/pub/cj-ku/62/55b/a1b CJ Ku] is responsible for OpenCV.js part.</div>Ctaihttps://wiki.mozilla.org/index.php?title=User_talk:Dead_project&diff=1108868User talk:Dead project2015-12-09T22:56:11Z<p>Ctai: /* MediaStreamTrack with Worker: */</p>
<hr />
<div>=Abstract=<br />
The goal of this project is bringing the power of computer vision and image processing to the Web. By extending the spec of Media Capture and Streams, the web developers can write video processing related applications in better way. The primary idea is to incorporate Worker-based JavaScript video processing with MediaStreamTrack. The user's video processing script can do the real image processing and analysis works frame by frame. We also propose an extension of ImageBitmap to allow better optimization possibility in both implementation and Web developer side. To support video editor case, we would like to introduce OfflineMediaContext in next phase. We also want to explore the concept of WebImage, a hardware accelerated-able Web API for performance improvement chance. By accomplishing these API step by step, we believe we can improve the Web platform competitiveness a lot on image processing and computer vision area.<br />
<br />
=Introduction=<br />
To get a quick understand what is project FoxEye. Please see below file:<br><br />
'''The latest one:'''<br />
*FoxEye Briefing: [https://docs.google.com/presentation/d/1Ra5bIeMfSEQi5kd_bGH5Vx-pa9ahpAUTFDxTeBA5L0o/edit?usp=sharing Briefing]<br />
*Presentation files in Whistler Work Week:<br />
**Project FoxEye Status Update: [https://docs.google.com/presentation/d/1vnR5JOWpEgKN3KZGV2SLxscqSnuBOF2tT_dTolLRYI4/edit?usp=sharing FoxEye]<br />
**FoxEye Cross Firefox OS:[https://docs.google.com/presentation/d/1ck32VoikWkkFXkFfNK91S22nGRp9fSEj6Oz6846May4/edit?usp=sharing Use case]<br />
*Latest demo in Youtube: [https://www.youtube.com/watch?v=prybkXsTGXY FoxEye 2015 H1 demo]<br />
'''Outdated'''<br />
*Presentation file in Portland Work Week.[[File:Project FoxEye Portland Work Week.pdf]]<br><br />
*Presentation file in P2PWeb WorkShop.[[File:Project FoxEye 2015-Feb.pdf]]<br><br />
*Youtube: https://www.youtube.com/watch?v=TgQWEWiGaO8<br><br />
<br />
The needs for image processing and computer vision is increasing in recent years. The introduction of video element and media stream in HTML5 is very important, allowing for basic video playback and WebCam ability. But it is not powerful enough to handle complex video processing and camera application. Especially there are tons of mobile camera, photo and video editor applications show their creativity by using OpenCV etc in Android. It is a goal of this project to include the capabilities found in modern video production and camera applications as well as some of the processing, recognition and stitching tasks. <br />
<br><br />
This API is inspired by the WebAudio API[1]. Unlike WebAudio API, we try to reach the goal by modifying existing Media Capture and Streams API. The idea is adding some functions to associate the Woker-based script with MediaStreamTrack. Then the script code of Worker runs image processing and/or analysis frame by frame. Since we move the most of processing work to Worker, the main thread will not be blocked.<br />
<br />
[[File:FoxEye - Overview.png|800px]]<br />
<br><br />
Basically, the spirit of this project is four parts. The first part is extend the MediaStreamTrack to associate a Worker. This part provide a way to do image processing job frame by frame. The second part is ImageBitmap extension. This part extended the ImageBitmap interface to allow JavaScript developer to read underlying data out and set an external data into an ImageBitmap in a set of supported color formats. The third part is OfflineMediaContext. It is for offline stream to render as fast as possible. The last part is WebImage, it is a hardware accelerated-able API on computer vision area. The Web developers can use it to combine high performance vision processing.<br />
<br><br />
Thanks for the amazing asm.js and emscripten work, we also provide an asm.js version of OpenCV called OpenCV.js. The web developers can leverage the power of OpenCV in a simpler way. <br />
<br><br />
<br />
=Design Principle=<br />
*Follow [https://extensiblewebmanifesto.org/ The Extensible Web Manifesto]<br />
The original design of this project is WebAudio like design called WebVideo. But is is deprecated due to we want to follow The Extensible Web Manifesto. We believe this kind of design is best for the Web. Take below examples to against WebVideo design.<bt><br />
Computer vision area is more thriving than ever after deep learning introduced. We can't enumerate video nodes for all kinds of creativeness for the future. For example, suppose we enumerate all we can image in current time. But still something we can't image. Maybe we lost an area called image sentiment analysis. I take this as an example because I just known it recently. Image sentiment analysis is trying to classify the emotion from human face in image. So should we add this as a new video node? Take another extreme example, we think we would not have argument on face detection or face recognition. But what we talk now is only for human face. What if we someone think we need to detect and recognize animal face? Sounds like a ridiculous idea right? But it is a real business. A product called "Bistro: A Smart Feeder Recognizes Your Cat's Face"[2] was on indiegogo. So should we extend the original face detection/recognition node or add a new video node for cat or ask them to implement it in VideoWorker? If we ask them to implement in VideoWorker, then we might need to re-consider are those existing video node make sense? Should we move them all into VideoWorker and implement them in JS? If yse, the WebVideo will degenerate to only source node, video worker node and detestation node. Compare to degenerated WebVideo, the MediaStream with worker design is more elegant. <br />
<br />
*Performance and power consumption do matter<br />
This project is initialized by the FxOS team which means the performance and power consumption for mobile devices are considered from day one.<br />
<br />
=Concept=<br />
==MediaStreamTrack with Worker: ==<br />
The new design is a simple and minimal change for current API. By extending MediaStreamTrack and adding Worker related API, we can let MediaStream be able to support video processing functionality through the script code in Worker. Below is the draft WebIDL codes. Please see [2] for the draft specification.<br />
<source lang="webidl"><br />
[Constructor(DOMString scriptURL)]<br />
<br />
interface VidoeMonitor : EventTarget {<br />
attribute EventHandler onvideomonitor;<br />
};<br />
<br />
interface VideoProcessor : EventTarget {<br />
attribute EventHandler onvideoprocess;<br />
};<br />
partial interface MediaStreamTrack {<br />
void addVideoMonitor(VidoeMonitor monitor);<br />
void removeVideoMonitor(VidoeMonitor monitor);<br />
MediaStreamTrack addVideoProcessor(VidoeProcessor processor);<br />
void removeVideoProcessor();<br />
};<br />
<br />
</source><br />
<br><br />
[[File:NewProjectFoxEye1.png|1024px]]<br><br />
===Example Code ===<br />
Please check the section [http://chiahungtai.github.io/mediacapture-worker/index.html#Examples examples in MediaStreamTrack with worker].<br />
<br />
==ImageBitmap extensions==<br />
Please see [3] for more information.<br />
<br />
==WebImage:==<br />
Why do we need WebImage? Because performance is matter to prompt computer vision to the Web. We need it for performance reason. Most image processing task can be accelerated by WebGL. But it is not the case for computer vision area. So we need a hardware accelerated-able computer vision related web API to help Web developer deliver a fast and portable Web page and application. That is the motivation of WebImage.<br />
<br><br />
This is a new zone needed to explore. So we might start from refer to existing related API. In end of 2014, KHRONOS released a portable, power efficient vision processing API, called OpenVX. This specification might be a good start point for WebImage.<br />
<br><br />
The following diagram is a brief concept of OpenVX. It is a graph-node architecture. The role of OpenVX for computer vision is like the role of OpenGL for graph. The developer just construct and execute the graph to process incoming image.<br />
<br><br />
[[File:OpenVX-NodeGFX.PNG|600px]]<br><br />
<!--[[File:Project FoxEyeWebImage1.png|800px]]<br>--><br />
OpenVX can be implemented by OpenCL, OpenGL|ES with compute shader and C/C++ with SIMD. That means we can support OpenVX in wide range Web platform from PC to mobile once a OpenVX engine is provided.<br />
<br />
[[File:OpenVX.PNG|600px]]<br><br />
<br />
==OfflineMediaContext:==<br />
We introduce the “OfflineMediaContext” and modify the “MediaStream” here to enable the offline (as soon as possible) MediaStream processing. When developers are going to perform the offline MediaStream processing, they need to form a context which will hold and keep the relationship of all MediaStreams that are going to be processed together.<br />
<br />
<source lang="c++"><br />
<br />
// typedef unsigned long long DOMTimeStamp;<br />
interface OfflineMediaContext {<br />
void start(DOMTimeStamp durationToStop);<br />
attribute EventHandler onComplete;<br />
};<br />
// Add an optional argument into the constructor.<br />
[Constructor (optional OfflineMediaContext context),<br />
Constructor (MediaStream stream, optional OfflineMediaContext context),<br />
Constructor (MediaStreamTrackSequence tracks, optional OfflineMediaContext context)]<br />
interface MediaStream : EventTarget {<br />
// No modification.<br />
...<br />
}<br />
<br />
</source><br />
*OfflineMediaContext is the holding place for all MediaStreams that are going to be processed together at a non-realtime rate.<br />
*OfflineMediaContext is also the object that triggers the non-realtime processing.<br />
*OfflineMediaContext should be instantiated first; the MediaStreams that are going to be processed together in the same context can then be instantiated by passing the pre-instantiated OfflineMediaContext object into the MediaStream constructor. (See the modified MediaStream constructor above.)<br />
*The native implementation of OfflineMediaContext holds a non-realtime MediaStreamGraph (MSG), just as OfflineAudioContext does in the WebAudio specification.<br />
*The constructors are modified by adding an optional parameter, OfflineMediaContext. In this way, developers can associate a MediaStream with an OfflineMediaContext.<br />
*If the optional OfflineMediaContext is given, the native implementation of the MediaStream should be hooked up to the non-realtime MSG held by the given OfflineMediaContext instead of the global realtime MSG.<br />
*If the optional OfflineMediaContext is given, we also need to check whether the newly created MediaStream can be processed at an offline rate. If not, the constructor should throw an error. (Constructors are always allowed to throw.) A usage sketch follows this list.<br />
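A minimal usage sketch of the rules above, assuming the draft IDL (existingStream and the processing step are placeholders):<br />
<source lang="javascript"><br />
// Sketch based on the draft IDL above.<br />
var offlineCtx = new OfflineMediaContext();<br />
// The extra constructor argument hooks this stream to the offline,<br />
// non-realtime MediaStreamGraph instead of the global realtime one.<br />
var stream = new MediaStream(existingStream, offlineCtx); // placeholder input<br />
// ... attach processing to 'stream' here ...<br />
offlineCtx.onComplete = function () {<br />
  console.log('offline processing finished');<br />
};<br />
offlineCtx.start(30000); // render 30 seconds of media as fast as possible<br />
</source><br />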
<br />
==OpenCV.js==<br />
*OpenCV + Emscripten = OpenCV.js<br />
*https://github.com/CJKu/opencv <br />
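As a sketch of how an Emscripten build like this is typically driven from JavaScript: cv_grayscale below is an invented exported wrapper, not a real OpenCV.js symbol, while Module.cwrap, Module._malloc, HEAPU8, and Module._free are standard Emscripten plumbing. A canvas 2D context ctx2d and dimensions w, h are assumed.<br />
<source lang="javascript"><br />
// Sketch: calling into an Emscripten-compiled OpenCV build.<br />
// 'cv_grayscale' is a hypothetical exported C wrapper, not a real symbol.<br />
var grayscale = Module.cwrap('cv_grayscale', null,<br />
                             ['number', 'number', 'number']);<br />
var imageData = ctx2d.getImageData(0, 0, w, h);     // RGBA pixels<br />
var ptr = Module._malloc(imageData.data.length);    // copy into the heap<br />
Module.HEAPU8.set(imageData.data, ptr);<br />
grayscale(ptr, w, h);                               // process in place<br />
imageData.data.set(Module.HEAPU8.subarray(ptr, ptr + imageData.data.length));<br />
Module._free(ptr);<br />
ctx2d.putImageData(imageData, 0, 0);<br />
</source><br />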
<!--<br />
==Deprecated Design ==<br />
*Modular Routing<br />
Modular routing allows arbitrary connections between different DIPNode(TBD) objects. Each node can have inputs and/or outputs. A source node has no inputs and a single output. A destination node has one input and no outputs. Other nodes such as filters can be placed between source and destination nodes.<br />
<br />
Here is an example of face detection on an ImageElement:<br><br />
PS. Right now, I haven't finished the ScriptNode work, so the rectangle-drawing part is skipped in the sample code.<br />
<br>[[File:Project FoxEye1.png|720px]]<br><br />
<big>Example 1:</big><br />
<source lang="javascript"><br />
var context = new DIPContext();<br />
var imageElem = document.getElementById('imgelemsrc');<br />
var source = context.createImageElementSource(imageElem);<br />
var facedetect = context.createFaceDetection();<br />
source.connect(facedetect);<br />
var dest = context.createMediaStreamDestination();<br />
facedetect.connect(dest);<br />
var elem = document.getElementById('videoelem');<br />
elem.mozSrcObject = dest.stream;<br />
elem.play();<br />
<br />
</source><br />
<br />
<br />
Another example shows that some nodes might support callback functions to pass more information than just the image.<br />
<br>[[File:Project FoxEye2.png|720px]]<br><br />
<big>Example 2:</big><br />
<source lang="javascript"><br />
var context = new DIPContext();<br />
var imageElem = document.getElementById('imgelemsrc');<br />
var source = context.createImageElementSource(imageElem);<br />
var textdetect = context.createTextDetection();<br />
source.connect(textdetect);<br />
var dest = context.createMediaStreamDestination();<br />
textdetect.connect(dest);<br />
textdetect.ontextrecognized = function (e){<br />
var text = e.recognizedText;<br />
var go2google = document.getElementById('go2Google');<br />
go2google.href = "https://www.google.com.tw/search?q=" + text<br />
var go2IMDB = document.getElementById('go2IMDB');<br />
go2IMDB.href = "http://www.imdb.com/find?ref_=nv_sr_fn&q=" + text<br />
var go2Amazon = document.getElementById('go2Amazon');<br />
go2Amazon.href = "http://www.amazon.com/s/ref=nb_sb_noss?url=search-alias%3Daps&field-keywords=" + text<br />
var go2TranslateEngTC = document.getElementById('go2TranslateEngTC');<br />
go2TranslateEngTC.href = "https://translate.google.com.tw/#en/zh-TW/" + text<br />
}<br />
var elem = document.getElementById('videoelem');<br />
elem.mozSrcObject = dest.stream;<br />
elem.play();<br />
</source><br />
An idealized example combining ScriptNode with Canvas2DContext.<br><br />
This example tries to do on-the-fly camera translation like "Word Lens" and "Waygo".<br><br />
The implementation for this example is not finished.<br><br />
<br>[[File:Project FoxEye3.png|720px]]<br><br />
<big>Example 3:</big><br />
<source lang="javascript"><br />
var context = new DIPContext();<br />
var myMediaStream;<br />
navigator.getUserMedia({video:true, audio:false}, function(localMediaStream){<br />
myMediaStream = localMediaStream;<br />
var source = context.createMediaStreamSource(myMediaStream);<br />
var textRecognition = context.createTextRecognition();<br />
source.connect(textRecognition);<br />
var textInpaint = context.createTextInpaint();<br />
textRecognition.connect(textInpaint);<br />
var scriptTranslate = context.createScriptProcessor();<br />
textRecognition.ontextrecognized = function (e){<br />
var text = e.recognizedText;<br />
// Custom parameter -<br />
scriptTranslate.addParameter( "text", text );<br />
}<br />
scriptTranslate.onimageprocess= function (e) {<br />
var text = e.parameters.text;<br />
// Translate to other language....<br />
var newText = Translate(text, "Eng", "TC");<br />
var input = e.inputImage;<br />
var canvas = document.getElementsByTagName('canvas')[0];<br />
var context2D = canvas.getContext('2d');<br />
context2D.drawImage(input, 0, 0);<br />
context2D.strokeText(newText, 0, 0);<br />
// get an empty slate to put the data into<br />
var output = context2D.createImageData(canvas.width, canvas.height);<br />
e.outputImage = output;<br />
}<br />
textInpaint.connect(scriptTranslate);<br />
var dest = context.createMediaStreamDestination();<br />
scriptTranslate.connect(dest);<br />
var video = document.getElementById('videoelem');<br />
video.mozSrcObject = dest.stream;<br />
}, null);<br />
<br />
</source><br />
<br />
<!--<br />
=The API=<br />
Still under construction....<br />
==New design==<br />
===VideoContext===<br />
<source><br />
[Constructor]<br />
interface VideoContext : EventTarget {<br />
readonly attribute VideoDestinationNode destination;<br />
// DIPNode creation<br />
MediaStreamVideoDestinationNode createMediaStreamDestination();<br />
MediaStreamVideoSourceNode createMediaStreamSource(MediaStream mediaStream);<br />
VideoWorkerNode createVideoWorker(DOMString scriptURL);<br />
};<br />
</source><br />
===VideoNode===<br />
<source><br />
interface VideoNode: EventTarget {<br />
void connect(VideoNode destination);<br />
void disconnect();<br />
readonly attribute VideoContext context;<br />
};<br />
</source><br />
<br />
===VideoWorkerNode===<br />
Still thinking the type of inputImage/outputImage.<br />
<source><br />
interface VideoProcessEvent : Event {<br />
readonly attribute ImageData inputImage;<br />
readonly attribute ImageData outputImage;<br />
readonly attribute object parameters; <br />
};<br />
<br />
interface VideoWorkerNode: VideoNode {<br />
attribute EventHandler onimageprocess;<br />
};<br />
</source><br />
<br />
==Deprecated design==<br />
===DIPContext===<br />
<source><br />
[Constructor]<br />
interface DIPContext : EventTarget {<br />
readonly attribute DIPDestinationNode destination;<br />
// DIPNode creation<br />
MediaStreamDIPDestinationNode createMediaStreamDestination();<br />
ImageElementDIPSourceNode createImageElementSource(HTMLImageElement imageElement);<br />
MediaStreamDIPSourceNode createMediaStreamSource(MediaStream mediaStream);<br />
FaceDetectionNode createFaceDetection();<br />
TextDetectionNode createTextDetection();<br />
};<br />
</source><br />
===DIPNode===<br />
<source><br />
interface DIPNode : EventTarget {<br />
void connect(DIPNode destination, optional unsigned long output = 0, optional unsigned long input = 0);<br />
void disconnect(optional unsigned long output = 0);<br />
readonly attribute DIPContext context;<br />
readonly attribute unsigned long numberOfInputs;<br />
readonly attribute unsigned long numberOfOutputs;<br />
};<br />
</source><br />
<br />
===TextDetectionNode===<br />
<source><br />
interface RecognizedTextEvent : Event {<br />
readonly attribute DOMString recognizedText;<br />
};<br />
<br />
interface TextDetectionNode : DIPNode {<br />
attribute EventHandler ontextrecognized;<br />
};<br />
</source><br />
--><br />
<br />
=Demo pages=<br />
==OpenCV.js==<br />
This demo should run on any existing Firefox browser on Ubuntu and Mac. Please try the website below.<br><br />
http://people.mozilla.org/~cku/opencv/<br />
<br><br />
[[File:OpenCVJS-1.png|1080px]]<br />
<br><br />
[[File:OpenCVJS-2.png|1080px]]<br />
<br><br />
<br />
==MST with Worker and ImageBitmap==<br />
You need a customized build of Firefox for the demos below. Please clone this repo: https://github.com/kakukogou/gecko-dev/tree/foxeye-master<br><br />
You also need a web camera for the demos. Then build your own Firefox browser and go to the website below to see the demos.<br />
<br><br />
Source code of the demo:<br><br />
https://github.com/kakukogou/foxeye-demo<br />
<br><br />
Demo website:<br><br />
http://people.mozilla.org/~tkuo/foxeye-demo/<br />
===Monitor===<br />
A monitor is designed to just send each frame to the Web Worker, with no modification of the track. The left video comes from getUserMedia. The right one uses addWorkerMonitor to dispatch the input frames from the left one to a worker. The worker detects the face and passes the face position together with the input frame back to the main thread. The script in the main thread then uses both pieces of information to draw the frame via CanvasRenderingContext2D.<br />
*Case 1: Face detection<br />
This demo shows how to use |addWorkerMonitor| and CanvasRenderingContext2D to overlay a hat on someone's head.<br />
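A sketch of the main-thread side of this flow; the worker script name, the {face, frame} message shape, and the track/canvas/hatImage variables are assumptions based on the description above.<br />
<source lang="javascript"><br />
// Sketch of the main-thread side of the monitor demo.<br />
// The worker script and the {face, frame} message shape are assumptions.<br />
var worker = new Worker('face-detect-worker.js');<br />
track.addWorkerMonitor(worker); // monitor: frames dispatched, track unchanged<br />
worker.onmessage = function (e) {<br />
  var ctx2d = canvas.getContext('2d');<br />
  ctx2d.drawImage(e.data.frame, 0, 0);                    // frame (ImageBitmap)<br />
  var face = e.data.face;<br />
  ctx2d.drawImage(hatImage, face.x, face.y - hatImage.height); // hat overlay<br />
};<br />
</source><br />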
<br><br />
[[File:Monitor face.png|1080px]]<br />
<br><br />
*Case 2: QRCode<br />
[[File:Monitor qrcode.png|1080px]]<br />
<br><br />
<br />
===Processor===<br />
This demo shows how to use |addWorkerProcessor| to create a new MediaStreamTrack and show it in another HTMLVideoElement. There are several image filters: copy, blur, erode, threshold, invert, and grayscale.<br />
<br> <br />
<br><br />
[[File:Processor copy.png|1080px]]<br />
<br><br />
[[File:Processor blur.png|1080px]]<br />
<br><br />
[[File:Processor erode.png|1080px]]<br />
<br><br />
[[File:Processor threshold.png|1080px]]<br />
<br><br />
[[File:Processor invert.png|1080px]]<br />
<br><br />
[[File:Processor gray.png|1080px]]<br />
<br><br />
<br />
<br />
<br />
<!--<br />
==Demo 1: Face tracker==<br />
===Browser:===<br />
*Input comes from HTML Image Element<br />
[[File:ProjectFoxEye BrowserFaceImage.png|720px]]<br />
*Input comes from MediaStream<br />
[[File:ProjectFoxEye BrowserFaceMS.png|720px]]<br />
<br />
===B2G on Flame:===<br />
*Input comes from HTML Image Element<br><br />
[[File:Project FoxEyeFlameFaceImage.jpg|480px]]<br><br />
*Input comes from MediaStream<br><br />
[[File:Project FoxEyeFlameFaceMS.jpg|480px]]<br><br />
<br />
==Demo 2: Text Recognition==<br />
*Browser:<br />
**Input comes from HTML Image Element<br />
[[File:ProjectFoxEye TextImage.png|720px]]<br />
**Input comes from MediaStream<br />
[[File:ProjectFoxEye TextMS.png|720px]]<br />
--><br />
<br />
=Unlimited Potentials=<br />
<!--<br />
According to "Firefox OS User Research Northern India Findings" [4], one of the key table stakes is camera-related features. "Ways to provide photo & video editing tools" is what this WebAPI is for. So if we can deliver some cool photo and video editing features, we can fulfill one of the needs of our target market.<br><br />
In [4], it is mentioned that one of the purchase motivators is educating kids. Features like PhotoMath can satisfy the education part.<br><br />
In the long term, if we can integrate text recognition with TTS (text to speech), we can help illiterate people read words or phrases. That would be a very useful feature.<br><br />
Offline text translation in the camera might be a killer application too. Waygo and Word Lens are two such applications on Android and iOS.<br><br />
Text selection in images is also an interesting feature for the browser. Project Naptha demos some potential functionality based on text selection in images.<br> <br />
--><br />
==FoxEye technology tree==<br />
This is the technology tree of FoxEye. The solid lines are dependencies; the dashed lines are for performance improvements. The green blocks are possible applications. The blue blocks are API drafts with prototypes. The red blocks are things we know how to do but have not yet started. The yellow blocks are only concepts and need more study. Step by step, we can provide a photo manager, wide-angle panorama, HDR, and even gesture control.<br />
[[File:Multimedia Platform Team Technology Roadmap - New Page.png |1024px]]<br />
<br />
<br />
==Use Cases==<br />
*Digital Image Processing(DIP) for camera:<br />
**Face In, see [https://www.youtube.com/watch?feature=player_embedded&v=PWZUCfDsFdU Sony Face In]<br />
**Augmented Reality, see [https://www.youtube.com/watch?feature=player_embedded&v=vDNzTasuYEw IKEA AR]<br />
**Camera Panorama, <br />
**Fisheye camera, <br />
**Comic Effect,<br />
**Long term, might need Android Camera HAL 3 to control camera<br />
***Smile Snapshot<br />
***Gesture Snapshot<br />
***HDR<br />
***Video Stabilization<br />
**Bar code scanner<br />
*Photo and video editing<br />
**Video Editor, see [https://www.youtube.com/watch?feature=player_embedded&v=NJ6nYgxcuUk WeVideo on Android]<br />
**A faster way for video editing tools.<br />
**Lots of existing image effects can be used for photo and video editing.<br />
**https://www.facebook.com/thanks<br />
*Object Recognition in Images (not only Firefox OS, but also browsers):<br />
**Shopping Assistant, see [https://www.youtube.com/watch?feature=player_embedded&v=B7cvlWll85Q Amazon Firefly]<br />
**Face Detection/Tracking,<br />
**Face Recognition, <br />
**Text Recognition, <br />
**Text Selection in Image, <br />
***See http://projectnaptha.com/<br />
**Text Inpainting,<br />
**Image Segmentation,<br />
**Text translation on image, see [https://www.youtube.com/watch?feature=player_embedded&v=9UalhhWBPH0 Waygo]<br />
*Duo Camera:<br />
**Nature Interaction(Gesture, Body Motion Tracking)<br />
**Interactive Foreground Extraction<br />
and so on....<br />
<br />
==Some cool applications we can refer in real worlds==<br />
*Word Lens: <br />
**https://play.google.com/store/apps/details?id=com.questvisual.wordlens.demo<br />
**https://itunes.apple.com/tw/app/word-lens/id383463868?mt=8<br />
*Waygo<br />
**http://www.waygoapp.com/<br />
*PhotoMath<br />
**https://photomath.net/<br />
*Cartoon Camera<br />
**https://play.google.com/store/apps/details?id=com.fingersoft.cartooncamera<br />
*Photo Studio<br />
**http://photo-studio.en.uptodown.com/android<br />
*Magisto<br />
**https://play.google.com/store/apps/details?id=com.magisto<br />
*Adobe PhotoShop Express<br />
**http://www.photoshop.com/products/photoshopexpress<br />
*Amazon(firefly app)<br />
**https://play.google.com/store/apps/details?id=com.amazon.mShop.android<br />
<br />
<!--<br />
=Task Dependency=<br />
Below is the dependency graph for text recognition work.<br />
<br><br />
[[File:Project FoxEyeTextRecgDependency.png]]<br />
<br><br />
Below is the dependency graph for camera efficts work.<br />
<br><br />
[[File:Project FoxEyeCameraDependency.png]]<br />
<br><br />
Below is the dependency graph for video editor work.<br />
<br><br />
[[File:Project FoxEyeVideoEditorDependency.png]]<br />
<br><br />
--><br />
<br />
<!--<br />
=Comparison=<br />
==Canvas2DContext==<br />
Currently, you can do video effect by Canvas2DContext. See the demo made by [4]. The source code looks like below.<br />
<source><br />
function frameConverter(video,canvas) {<br />
<br />
// Set up our frame converter<br />
this.video = video;<br />
this.viewport = canvas.getContext("2d");<br />
this.width = canvas.width;<br />
this.height = canvas.height;<br />
// Create the frame-buffer canvas<br />
this.framebuffer = document.createElement("canvas");<br />
this.framebuffer.width = this.width;<br />
this.framebuffer.height = this.height;<br />
this.ctx = this.framebuffer.getContext("2d");<br />
// Default video effect is blur<br />
this.effect = JSManipulate.blur;<br />
// This variable used to pass ourself to event call-backs<br />
var self = this;<br />
// Start rendering when the video is playing<br />
this.video.addEventListener("play", function() {<br />
self.render();<br />
}, false);<br />
<br />
// Change the image effect to be applied <br />
this.setEffect = function(effect){<br />
if(effect in JSManipulate){<br />
this.effect = JSManipulate[effect];<br />
}<br />
}<br />
<br />
// Rendering call-back<br />
this.render = function() {<br />
if (this.video.paused || this.video.ended) {<br />
return;<br />
}<br />
this.renderFrame();<br />
var self = this;<br />
// Render every 10 ms<br />
setTimeout(function () {<br />
self.render();<br />
}, 10);<br />
};<br />
<br />
// Compute and display the next frame <br />
this.renderFrame = function() {<br />
// Acquire a video frame from the video element<br />
this.ctx.drawImage(this.video, 0, 0, this.video.videoWidth,<br />
this.video.videoHeight,0,0,this.width, this.height);<br />
var data = this.ctx.getImageData(0, 0, this.width, this.height);<br />
// Apply image effect<br />
this.effect.filter(data,this.effect.defaultValues);<br />
// Render to viewport<br />
this.viewport.putImageData(data, 0, 0);<br />
return;<br />
};<br />
};<br />
<br />
// Initialization code<br />
video = document.getElementById("video");<br />
canvas = document.getElementById("canvas");<br />
fc = new frameConverter(video,canvas);<br />
...<br />
// Change the image effect applied to the video<br />
fc.setEffect('edge detection');<br />
<br />
</source><br />
Basically, the idea is to use |drawImage| to acquire a frame from the video and draw it to the canvas, then call |getImageData| to get the data and process the image. After that, put the computed data back into the canvas and display it.<br><br />
<br />
Compared to this approach, the proposed WebAPI has the advantages below:<br />
* No polling mechanism.<br />
** We use a callback function to process all frames.<br />
<br />
==node-opencv==<br />
https://github.com/peterbraden/node-opencv<br />
"OpenCV bindings for Node.js. OpenCV is the defacto computer vision library - by interfacing with it natively in node, we get powerful real time vision in js."<br />
The sample codes looks like below:<br />
*You can use opencv to read in image files. Supported formats are in the OpenCV docs, but jpgs etc are supported.<br />
<source lang="javascript"><br />
cv.readImage(filename, function(err, mat){<br />
mat.convertGrayscale()<br />
mat.canny(5, 300)<br />
mat.houghLinesP()<br />
})<br />
</source><br />
*If however, you have a series of images, and you wish to stream them into a stream of Matrices, you can use an ImageStream. Thus:<br />
<source lang="javascript"><br />
var s = new cv.ImageStream()<br />
s.on('data', function(matrix){<br />
matrix.detectObject(haar_cascade_xml, opts, function(err, matches){})<br />
})<br />
ardrone.createPngStream().pipe(s);<br />
</source><br />
<br />
==opencvjs==<br />
https://github.com/blittle/opencvjs<br><br />
It is a project to compile OpenCV to asm.js. It might be a dead project now.<br />
==Project Naptha==<br />
"Project Naptha automatically applies state-of-the-art computer vision algorithms on every image you see while browsing the web. The result is a seamless and intuitive experience, where you can highlight as well as copy and paste and even edit and translate the text formerly trapped within an image." Quoted from http://projectnaptha.com/ .<br />
===How it works?===<br />
Excerpt from Project Naptha:<br />
The primary feature of Project Naptha is actually the text detection, rather than optical character recognition. The author wrote a text detection algorithm called Stroke Width Transform (invented by Microsoft Research in 2008), which is capable of identifying regions of text in a language-agnostic manner, and ran it in a Web Worker. Once a user begins to select some text, it scrambles to run character recognition algorithms in order to determine what exactly is being selected. The default OCR engine is a built-in pure-JavaScript port of the open source Ocrad OCR engine. There is also the option of sending the selected region over to a cloud-based text recognition service powered by Tesseract, Google's (formerly HP's) award-winning open-source OCR engine, which supports dozens of languages and uses an advanced language model.<br />
--><br />
<!--<br />
=Open Source Library and Licenses =<br />
*OpenCV: OpenCV is an open-source BSD-licensed library that includes several hundreds of computer vision algorithms.<br />
*Tesseract-OCR: Apache License v2.0. Tesseract is probably the most accurate open source OCR engine available.<br />
**Leptonica: Leptonica is a pedagogically-oriented open source site containing software that is broadly useful for image processing and image analysis applications.<br />
<br />
<br>"The Leptonica copyright is less restrictive still. We use the BSD 2-clause license, which is similar to the Apache license but much briefer. Kirk McKusick playfully called the BSD license a copycenter, as differentiated from the usual copyright and the GPL copyleft: "Take it down to the copy center and make as many copies as you want." The BSD restrictions can be approximately summarized as: (1) Don't pretend that you wrote this, and (2) Don't sue us if it doesn't work.<br> For Leptonica, as with Apache and BSD, modifications of the source code that are used in commercial products can be made without any obligation to make them available in open source. (This does happen voluntarily -- it's likely that the majority of issues noted by people working on commercial products are reported back, often with patches.) The Leptonica license only requires that any use of the source, whether in original or modified form, must include the Leptonica copyright notice, and that modified versions must be clearly marked as such. "<br />
<br />
<br />
=What I have done=<br />
*Initialize, plan and implement this project.<br />
*Write a prototype of WebIDL for WebDIP.<br />
**MediaStream as source node and destination node for WebDIP.<br />
**For the HTMLImageElement part as a source node, there is a temporary solution.<br />
**Have a face detection node. It can be used with MediaStream and HTMLImageElement on both the browser and B2G Flame.<br />
**Have a text detection/recognition node. It can be used with MediaStream and HTMLImageElement on the browser.<br />
--><br />
<br />
<!--<br />
=To Do List=<br />
*Extend MediaStreamTrack API? =>CTai<br />
**Need to integrate with Canvas2DContext and WebGL.<br />
*OfflineMediaContext study. =>Kaku(our new hire!!)<br />
*An API for image processing and object detection. =>TBD<br />
**We need such API for B2G privilege applications(or opencv-asm.js for general APPs).<br />
*How to compile OpenCV to asm.js =>Kaku, CJay<br />
**Try to figure out how to pass the ImageBitmap from VideoWorker to OpenCV-asm.js<br />
*Compare native OpenCV/Tesseract with asm.js version. =>TBD<br />
--><br />
<br />
<!--<br />
=Fixme List(Known Issues)=<br />
*OpenCV can't be built with STLport; it only supports GNU STL.<br />
**B2G can't be built with GNU STL.<br />
*Text detection and recognition can't run on B2G.<br />
**Some OpenCV APIs use STL as arguments. The mismatched STL will cause runtime errors.<br />
*Tesseract-OCR build<br />
**Uses pre-installed Tesseract-OCR now. Maybe we should support building Tesseract-OCR from source.<br />
*Improve the precision rate of text recognition.<br />
**The actual precision rate should be higher than in my rough prototype. Need to improve it.<br />
*Separate OCR initialization.<br />
**Prevent redundant initialization.<br />
*Haven't done OpenCL integration in Gecko.<br />
**OpenCV has a lot of OpenCL integration. We should take advantage of it.<br />
*Canvas2DContext, WebGL can't run on worker.<br />
**Need bug 801176 and bug 709490 landed.<br> <br />
*Need ImageBitmap for VideoWorkerEvent.<br />
**Need bug 1044102 landed.<br><br />
--><br />
<br />
=Current Status=<br />
*Meta bug: https://bugzilla.mozilla.org/show_bug.cgi?id=1100203<br />
*MediaStream with worker: editor's draft completed; refining the prototype for the review process. See https://bugzilla.mozilla.org/show_bug.cgi?id=1044102<br />
*ImageBitmap: in the review process. See https://bugzilla.mozilla.org/show_bug.cgi?id=801176<br />
*ImageBitmap extensions: editor's draft completed; refining the prototype for the review process. See https://bugzilla.mozilla.org/show_bug.cgi?id=1141979<br><br />
*OpenCV.js: working on it. See http://people.mozilla.org/~cku/opencv/<br />
*OfflineMediaContext: not yet started.<br />
*WebImage: not yet started.<br />
*Run WebGL on worker: in the review process. See https://bugzilla.mozilla.org/show_bug.cgi?id=709490<br><br />
*CanvasRenderingContext2D in Worker: waiting for someone to take this bug. See https://bugzilla.mozilla.org/show_bug.cgi?id=801176<br><br />
=Next Phase (2015 H2)=<br />
*Try to land and standardize MediaStream with worker, ImageBitmap, and the ImageBitmap extensions. See [4] for how standardization is handled at Mozilla.<br />
*Design an automated way to export JavaScript APIs for OpenCV.js. Try to upstream it to the OpenCV code base.<br />
*Start to work on OfflineMediaContext.<br />
*Support product requirements, for example an Instagram-like app, wide-angle panorama, or Fox photo.<br />
*Do some exploratory experiments on the WebImage concept.<br />
*Initialize a sub-project called [https://wiki.mozilla.org/Project_GlovePuppetry Project GlovePuppetry]<br />
<br />
=Beyond 2015=<br />
*Proof of Concept for WebImage.<br />
*A crazy idea: Kernel.js or SPIR.js for JS developers to customize WebImage?<br />
*Gestural control API with depth camera? => WebNI(Web Nature Interaction)?<br />
*[https://wiki.mozilla.org/Project_Cangjie Project Cangjie]<br />
<br />
=Conclusion=<br />
Step by step, we can make things different. FoxEye is a pioneering project trying to push the Web's boundary into a new area: image processing and computer vision. The key factor in ensuring this project's success is you, every Web contributor. Your support, your promotion, your feedback, and your comments are the nutrition of this project. Please share this project with every Web developer. Let's bring more and more amazing things to the Web, and enrich and improve the competitiveness of the whole Web platform.<br />
<br />
=References=<br />
*[1]:"WebAudio Spec", http://www.w3.org/TR/webaudio/<br />
*[2]:"Media Capture Stream with Worker", http://chiahungtai.github.io/mediacapture-worker/<br />
*[3]:"ImageBitmap Extensions", http://kakukogou.github.io/spec-imagebitmap-extension/<br />
*[4]:"Mozilla Standards", https://wiki.mozilla.org/Standards<br />
<br />
=Acknowledgements=<br />
The idea of adopting WebAudio as the reference design for this project came from a conversation with John Lin. Thanks to Robert O'Callahan for his great feedback and comments. Thanks to John Lin and Chia-jung Hung for their useful suggestions and ideas. Also, big thanks to my team members who helped me debug the code.<br />
<br />
=About Authors=<br />
==CTai==<br />
My name is [https://tw.linkedin.com/pub/chia-hung-tai/36/8b6/712 Chia-hung Tai]. I am a senior software engineer in the Mozilla Taipei office, working on Firefox OS multimedia. Before this job, I had some experience in OpenCL, NLP (Natural Language Processing), data mining, and machine learning. My IRC nickname is ctai. You can find me in the #media, #mozilla-taiwan, and #b2g channels. You can also reach me via email (ctai at mozilla dot com).<br />
<br />
==Kaku==<br />
[https://tw.linkedin.com/in/kakukogou Tzuhuo Kuo] is an engineer in the Mozilla Taipei office.<br />
==CJ Ku==<br />
[https://www.linkedin.com/pub/cj-ku/62/55b/a1b CJ Ku] is responsible for OpenCV.js part.</div>Ctaihttps://wiki.mozilla.org/index.php?title=TPEMedia/2015-11-25&diff=1107140TPEMedia/2015-11-252015-11-25T03:46:58Z<p>Ctai: /* Chiahung Tai */</p>
<hr />
<div>==Summary==<br />
Status changed by this week<br />
<bugzilla><br />
{<br />
"status": ["NEW", "ASSIGNED", "UNCONFIRMED"],<br />
"changed_after": "2015-11-18",<br />
"changed_before": "2015-11-25",<br />
"assigned_to": ["jolin@mozilla.com", "ayang@mozilla.com", "jwwang@mozilla.com", "bechen@mozilla.com", "ctai@mozilla.com", "alwu@mozilla.com", "bwu@mozilla.com", "tkuo@mozilla.com", "mchiang@mozilla.com", "hahuang@mozilla.com"],<br />
"include_fields": "id, summary, status, resolution, assigned_to"<br />
}<br />
</bugzilla><br />
Bugs which are fixed by this week<br />
<bugzilla><br />
{<br />
"changed_after": "2015-11-18",<br />
"changed_before": "2015-11-25",<br />
"assigned_to": ["jolin@mozilla.com", "ayang@mozilla.com", "jwwang@mozilla.com", "bechen@mozilla.com", "ctai@mozilla.com", "alwu@mozilla.com", "bwu@mozilla.com", "tkuo@mozilla.com", "mchiang@mozilla.com", "hahuang@mozilla.com"],<br />
"include_fields": "id, summary, status, resolution, assigned_to",<br />
"changed_field": "status",<br />
"changed_field_to": "resolved"<br />
}<br />
</bugzilla><br />
==Status==<br />
<br />
===Alfredo Yang===<br />
<bugzilla><br />
{<br />
"status": ["NEW", "ASSIGNED", "UNCONFIRMED"],<br />
"changed_after": "2015-11-18",<br />
"changed_before": "2015-11-25",<br />
"assigned_to": ["ayang@mozilla.com"],<br />
"include_fields": "id, summary, status, resolution, assigned_to"<br />
}<br />
</bugzilla><br />
<bugzilla><br />
{<br />
"changed_after": "2015-11-18",<br />
"changed_before": "2015-11-25",<br />
"assigned_to": ["ayang@mozilla.com"],<br />
"include_fields": "id, summary, status, resolution, assigned_to",<br />
"changed_field": "status",<br />
"changed_field_to": "resolved"<br />
}<br />
</bugzilla><br />
===John Lin===<br />
<bugzilla><br />
{<br />
"status": ["NEW", "ASSIGNED", "UNCONFIRMED"],<br />
"changed_after": "2015-11-18",<br />
"changed_before": "2015-11-25",<br />
"assigned_to": ["jolin@mozilla.com"],<br />
"include_fields": "id, summary, status, resolution, assigned_to"<br />
}<br />
</bugzilla><br />
Bugs which are fixed by this week<br />
<bugzilla><br />
{<br />
"changed_after": "2015-11-18",<br />
"changed_before": "2015-11-25",<br />
"assigned_to": ["jolin@mozilla.com"],<br />
"include_fields": "id, summary, status, resolution, assigned_to",<br />
"changed_field": "status",<br />
"changed_field_to": "resolved"<br />
}<br />
</bugzilla><br />
===JW Wang===<br />
<bugzilla><br />
{<br />
"status": ["NEW", "ASSIGNED", "UNCONFIRMED"],<br />
"changed_after": "2015-11-18",<br />
"changed_before": "2015-11-25",<br />
"assigned_to": ["jwwang@mozilla.com"],<br />
"include_fields": "id, summary, status, resolution, assigned_to"<br />
}<br />
</bugzilla><br />
Bugs which are fixed by this week<br />
<bugzilla><br />
{<br />
"changed_after": "2015-11-18",<br />
"changed_before": "2015-11-25",<br />
"assigned_to": ["jwwang@mozilla.com"],<br />
"include_fields": "id, summary, status, resolution, assigned_to",<br />
"changed_field": "status",<br />
"changed_field_to": "resolved"<br />
}<br />
</bugzilla><br />
<br />
===Benjamin Chen===<br />
<bugzilla><br />
{<br />
"status": ["NEW", "ASSIGNED", "UNCONFIRMED"],<br />
"changed_after": "2015-11-18",<br />
"changed_before": "2015-11-25",<br />
"assigned_to": ["bechen@mozilla.com"],<br />
"include_fields": "id, summary, status, resolution, assigned_to"<br />
}<br />
</bugzilla><br />
Bugs which are fixed by this week<br />
<bugzilla><br />
{<br />
"changed_after": "2015-11-18",<br />
"changed_before": "2015-11-25",<br />
"assigned_to": ["bechen@mozilla.com"],<br />
"include_fields": "id, summary, status, resolution, assigned_to",<br />
"changed_field": "status",<br />
"changed_field_to": "resolved"<br />
}<br />
</bugzilla><br />
===Chiahung Tai===<br />
*{{Bug|1201363}} - Stop buffering video in the MediaStreamGraph<br />
**WIP<br />
**Tracing how MSG deals with Pause/Play in the video file case.<br />
<br />
<bugzilla><br />
{<br />
"status": ["NEW", "ASSIGNED", "UNCONFIRMED"],<br />
"changed_after": "2015-11-18",<br />
"changed_before": "2015-11-25",<br />
"assigned_to": ["ctai@mozilla.com"],<br />
"include_fields": "id, summary, status, resolution, assigned_to"<br />
}<br />
</bugzilla><br />
Bugs which are fixed by this week<br />
<bugzilla><br />
{<br />
"changed_after": "2015-11-18",<br />
"changed_before": "2015-11-25",<br />
"assigned_to": ["ctai@mozilla.com"],<br />
"include_fields": "id, summary, status, resolution, assigned_to",<br />
"changed_field": "status",<br />
"changed_field_to": "resolved"<br />
}<br />
</bugzilla><br />
<br />
===Alastor Wu===<br />
<bugzilla><br />
{<br />
"status": ["NEW", "ASSIGNED", "UNCONFIRMED"],<br />
"changed_after": "2015-11-18",<br />
"changed_before": "2015-11-25",<br />
"assigned_to": ["alwu@mozilla.com"],<br />
"include_fields": "id, summary, status, resolution, assigned_to"<br />
}<br />
</bugzilla><br />
Bugs which are fixed by this week<br />
<bugzilla><br />
{<br />
"changed_after": "2015-11-18",<br />
"changed_before": "2015-11-25",<br />
"assigned_to": ["alwu@mozilla.com"],<br />
"include_fields": "id, summary, status, resolution, assigned_to",<br />
"changed_field": "status",<br />
"changed_field_to": "resolved"<br />
}<br />
</bugzilla><br />
<br />
*{{Bug|1224475}} - Unlimited Buffering when seeking or re-playing video on b2g devices.<br />
** r?<br />
<br />
*{{Bug|1223261}} - [B2G] Add the test to simulate the "muted by default" audio playback<br />
** WIP, cause another test-case timeout <br />
<br />
*{{Bug|1223297}} - [B2G] Add the test to make sure that multiple channels can be playback under the same window<br />
** WIP, cause another test-case timeout<br />
<br />
*{{Bug|1204793}} - [Testcase] Unregister AudioChannelAgent when its volume changes to ZERO or be muted<br />
** WIP<br />
<br />
*{{Bug|1214148}} - AudioChannel API design doesn't fit into nested mozbrowser iframe case.<br />
** discuss with baku<br />
<br />
===Blake Wu===<br />
*'''DRM discussion for DRM integration on FxOS'''<br />
<br />
<bugzilla><br />
{<br />
"status": ["NEW", "ASSIGNED", "UNCONFIRMED"],<br />
"changed_after": "2015-11-18",<br />
"changed_before": "2015-11-25",<br />
"assigned_to": ["bwu@mozilla.com"],<br />
"include_fields": "id, summary, status, resolution, assigned_to"<br />
}<br />
</bugzilla><br />
Bugs which are fixed by this week<br />
<bugzilla><br />
{<br />
"changed_after": "2015-11-18",<br />
"changed_before": "2015-11-25",<br />
"assigned_to": ["bwu@mozilla.com"],<br />
"include_fields": "id, summary, status, resolution, assigned_to",<br />
"changed_field": "status",<br />
"changed_field_to": "resolved"<br />
}<br />
</bugzilla><br />
<br />
===Kaku Kuo===<br />
<bugzilla><br />
{<br />
"status": ["NEW", "ASSIGNED", "UNCONFIRMED"],<br />
"changed_after": "2015-11-18",<br />
"changed_before": "2015-11-25",<br />
"assigned_to": ["tkuo@mozilla.com"],<br />
"include_fields": "id, summary, status, resolution, assigned_to"<br />
}<br />
</bugzilla><br />
Bugs which are fixed by this week<br />
<bugzilla><br />
{<br />
"changed_after": "2015-11-18",<br />
"changed_before": "2015-11-25",<br />
"assigned_to": ["tkuo@mozilla.com"],<br />
"include_fields": "id, summary, status, resolution, assigned_to",<br />
"changed_field": "status",<br />
"changed_field_to": "resolved"<br />
}<br />
</bugzilla><br />
===Munro Chiang===<br />
<bugzilla><br />
{<br />
"status": ["NEW", "ASSIGNED", "UNCONFIRMED"],<br />
"changed_after": "2015-11-18",<br />
"changed_before": "2015-11-25",<br />
"assigned_to": ["mchiang@mozilla.com"],<br />
"include_fields": "id, summary, status, resolution, assigned_to"<br />
}<br />
</bugzilla><br />
Bugs which are fixed by this week<br />
<bugzilla><br />
{<br />
"changed_after": "2015-11-18",<br />
"changed_before": "2015-11-25",<br />
"assigned_to": ["mchiang@mozilla.com"],<br />
"include_fields": "id, summary, status, resolution, assigned_to",<br />
"changed_field": "status",<br />
"changed_field_to": "resolved"<br />
}<br />
</bugzilla></div>Ctaihttps://wiki.mozilla.org/index.php?title=TPEMedia/2015-11-18&diff=1106148TPEMedia/2015-11-182015-11-18T02:57:31Z<p>Ctai: /* Chiahung Tai */</p>
<hr />
<div>==Summary==<br />
Status changed by this week<br />
<bugzilla><br />
{<br />
"status": ["NEW", "ASSIGNED", "UNCONFIRMED"],<br />
"changed_after": "2015-11-11",<br />
"changed_before": "2015-11-18",<br />
"assigned_to": ["jolin@mozilla.com", "ayang@mozilla.com", "jwwang@mozilla.com", "bechen@mozilla.com", "ctai@mozilla.com", "alwu@mozilla.com", "bwu@mozilla.com", "tkuo@mozilla.com", "mchiang@mozilla.com", "hahuang@mozilla.com"],<br />
"include_fields": "id, summary, status, resolution, assigned_to"<br />
}<br />
</bugzilla><br />
Bugs which are fixed by this week<br />
<bugzilla><br />
{<br />
"changed_after": "2015-11-11",<br />
"changed_before": "2015-11-18",<br />
"assigned_to": ["jolin@mozilla.com", "ayang@mozilla.com", "jwwang@mozilla.com", "bechen@mozilla.com", "ctai@mozilla.com", "alwu@mozilla.com", "bwu@mozilla.com", "tkuo@mozilla.com", "mchiang@mozilla.com", "hahuang@mozilla.com"],<br />
"include_fields": "id, summary, status, resolution, assigned_to",<br />
"changed_field": "status",<br />
"changed_field_to": "resolved"<br />
}<br />
</bugzilla><br />
==Status==<br />
<br />
===Alfredo Yang===<br />
<br />
===John Lin===<br />
<br />
===JW Wang===<br />
<br />
===Benjamin Chen===<br />
<br />
===Chiahung Tai===<br />
*{{Bug|1201363}} - Stop buffering video in the MediaStreamGraph<br />
**WIP<br />
*mediacapture-worker spec discussion<br />
*File Bug 1223696<br />
<br />
===Alastor Wu===<br />
*{{Bug|1218593}} - Dialer touch tones sounds intermittently stop playing audio in Dialer.<br />
** land<br />
<br />
*{{Bug|1222902}} - Create log system for the AudioChannel<br />
** land<br />
<br />
*{{Bug|1223261}} - [B2G] Add the test to simulate the "muted by default" audio playback<br />
** r+ <br />
<br />
*{{Bug|1223297}} - [B2G] Add the test to make sure that multiple channels can be playback under the same window<br />
** WIP<br />
<br />
*{{Bug|1207546}} - Integrate WebRTC with audio channels<br />
** debug<br />
<br />
===Blake Wu===<br />
<br />
===Kaku Kuo===<br />
<br />
===Munro Chiang===</div>Ctaihttps://wiki.mozilla.org/index.php?title=TPEMedia/2015-11-11&diff=1105141TPEMedia/2015-11-112015-11-11T03:33:01Z<p>Ctai: /* Chiahung Tai */</p>
<hr />
<div>==Summary==<br />
Status changed by this week<br />
<bugzilla><br />
{<br />
"status": ["NEW", "ASSIGNED", "UNCONFIRMED"],<br />
"changed_after": "2015-11-04",<br />
"changed_before": "2015-11-11",<br />
"assigned_to": ["jolin@mozilla.com", "ayang@mozilla.com", "jwwang@mozilla.com", "bechen@mozilla.com", "ctai@mozilla.com", "alwu@mozilla.com", "bwu@mozilla.com", "tkuo@mozilla.com", "mchiang@mozilla.com", "hahuang@mozilla.com"],<br />
"include_fields": "id, summary, status, resolution, assigned_to"<br />
}<br />
</bugzilla><br />
Bugs which are fixed by this week<br />
<bugzilla><br />
{<br />
"changed_after": "2015-11-04",<br />
"changed_before": "2015-11-11",<br />
"assigned_to": ["jolin@mozilla.com", "ayang@mozilla.com", "jwwang@mozilla.com", "bechen@mozilla.com", "ctai@mozilla.com", "alwu@mozilla.com", "bwu@mozilla.com", "tkuo@mozilla.com", "mchiang@mozilla.com", "hahuang@mozilla.com"],<br />
"include_fields": "id, summary, status, resolution, assigned_to",<br />
"changed_field": "status",<br />
"changed_field_to": "resolved"<br />
}<br />
</bugzilla><br />
==Status==<br />
<br />
===Alfredo Yang===<br />
<br />
===John Lin===<br />
<br />
===JW Wang===<br />
<br />
===Benjamin Chen===<br />
<br />
===Chiahung Tai===<br />
*{{Bug|1201363}} - Stop buffering video in the MediaStreamGraph<br />
**WIP<br />
*compensation leave for TPAC, 0.5 day.<br />
<br />
===Alastor Wu===<br />
*{{Bug|1206581}} - Implement notifyChannel() on AudioChannel API <br />
** land<br />
<br />
*{{Bug|1220320}} - [Accessibility] Screen Reader prompt no longer plays, automatically in the FTU or after toggling volume up and down several times.<br />
** land<br />
<br />
*{{Bug|1222564}} - Volume level does not persist after reboot<br />
** review code<br />
<br />
*{{Bug|1218593}} - Dialer touch tones sounds intermittently stop playing audio in Dialer.<br />
** r+<br />
<br />
*{{Bug|1222902}} - Create log system for the AudioChannel<br />
** r+<br />
<br />
*{{Bug|1223261}} - [B2G] Add the test to simulate the "muted by default" audio playback<br />
** r+ <br />
<br />
*{{Bug|1223297}} - [B2G] Add the test to make sure that multiple channels can be playback under the same window<br />
** WIP<br />
<br />
===Blake Wu===<br />
<br />
===Kaku Kuo===<br />
<br />
===Munro Chiang===<br />
<br />
===Adam Chou===</div>Ctaihttps://wiki.mozilla.org/index.php?title=TPEMedia/2015-11-11&diff=1105140TPEMedia/2015-11-112015-11-11T03:22:29Z<p>Ctai: /* Chiahung Tai */</p>
<hr />
<div>==Summary==<br />
Status changed by this week<br />
<bugzilla><br />
{<br />
"status": ["NEW", "ASSIGNED", "UNCONFIRMED"],<br />
"changed_after": "2015-11-04",<br />
"changed_before": "2015-11-11",<br />
"assigned_to": ["jolin@mozilla.com", "ayang@mozilla.com", "jwwang@mozilla.com", "bechen@mozilla.com", "ctai@mozilla.com", "alwu@mozilla.com", "bwu@mozilla.com", "tkuo@mozilla.com", "mchiang@mozilla.com", "hahuang@mozilla.com"],<br />
"include_fields": "id, summary, status, resolution, assigned_to"<br />
}<br />
</bugzilla><br />
Bugs which are fixed by this week<br />
<bugzilla><br />
{<br />
"changed_after": "2015-11-04",<br />
"changed_before": "2015-11-11",<br />
"assigned_to": ["jolin@mozilla.com", "ayang@mozilla.com", "jwwang@mozilla.com", "bechen@mozilla.com", "ctai@mozilla.com", "alwu@mozilla.com", "bwu@mozilla.com", "tkuo@mozilla.com", "mchiang@mozilla.com", "hahuang@mozilla.com"],<br />
"include_fields": "id, summary, status, resolution, assigned_to",<br />
"changed_field": "status",<br />
"changed_field_to": "resolved"<br />
}<br />
</bugzilla><br />
==Status==<br />
<br />
===Alfredo Yang===<br />
<br />
===John Lin===<br />
<br />
===JW Wang===<br />
<br />
===Benjamin Chen===<br />
<br />
===Chiahung Tai===<br />
*{{Bug|1201363}} - Stop buffering video in the MediaStreamGraph<br />
**WIP<br />
<br />
===Alastor Wu===<br />
*{{Bug|1206581}} - Implement notifyChannel() on AudioChannel API <br />
** land<br />
<br />
*{{Bug|1220320}} - [Accessibility] Screen Reader prompt no longer plays, automatically in the FTU or after toggling volume up and down several times.<br />
** land<br />
<br />
*{{Bug|1222564}} - Volume level does not persist after reboot<br />
** review code<br />
<br />
*{{Bug|1218593}} - Dialer touch tones sounds intermittently stop playing audio in Dialer.<br />
** r+<br />
<br />
*{{Bug|1222902}} - Create log system for the AudioChannel<br />
** r+<br />
<br />
*{{Bug|1223261}} - [B2G] Add the test to simulate the "muted by default" audio playback<br />
** r+ <br />
<br />
*{{Bug|1223297}} - [B2G] Add the test to make sure that multiple channels can be playback under the same window<br />
** WIP<br />
<br />
===Blake Wu===<br />
<br />
===Kaku Kuo===<br />
<br />
===Munro Chiang===<br />
<br />
===Adam Chou===</div>Ctaihttps://wiki.mozilla.org/index.php?title=TPEMedia/2015-11-04&diff=1104065TPEMedia/2015-11-042015-11-04T02:48:09Z<p>Ctai: /* Chiahung Tai */</p>
<hr />
<div>==Summary==<br />
Status changed by this week<br />
<bugzilla><br />
{<br />
"status": ["NEW", "ASSIGNED", "UNCONFIRMED"],<br />
"changed_after": "2015-10-28",<br />
"changed_before": "2015-11-24",<br />
"assigned_to": ["jolin@mozilla.com", "ayang@mozilla.com", "jwwang@mozilla.com", "bechen@mozilla.com", "ctai@mozilla.com", "alwu@mozilla.com", "bwu@mozilla.com", "tkuo@mozilla.com", "mchiang@mozilla.com", "hahuang@mozilla.com"],<br />
"include_fields": "id, summary, status, resolution, assigned_to"<br />
}<br />
</bugzilla><br />
Bugs which are fixed by this week<br />
<bugzilla><br />
{<br />
"changed_after": "2015-10-28",<br />
"changed_before": "2015-11-4",<br />
"assigned_to": ["jolin@mozilla.com", "ayang@mozilla.com", "jwwang@mozilla.com", "bechen@mozilla.com", "ctai@mozilla.com", "alwu@mozilla.com", "bwu@mozilla.com", "tkuo@mozilla.com", "mchiang@mozilla.com", "hahuang@mozilla.com"],<br />
"include_fields": "id, summary, status, resolution, assigned_to",<br />
"changed_field": "status",<br />
"changed_field_to": "resolved"<br />
}<br />
</bugzilla><br />
==Status==<br />
<br />
===Alfredo Yang===<br />
<br />
===John Lin===<br />
<br />
===JW Wang===<br />
<br />
===Benjamin Chen===<br />
<br />
===Chiahung Tai===<br />
*{{Bug|1201363}} - Stop buffering video in the MediaStreamGraph<br />
**WIP<br />
*TPAC<br />
**3 sessions: breakout session, Ad-hoc meeting, WebRTC meeting.<br />
<br />
===Alastor Wu===<br />
*{{Bug|1206581}} - Implement notifyChannel() on AudioChannel API <br />
** r+<br />
<br />
*{{Bug|1214148}} - AudioChannel API design doesn't fit into nested mozbrowser iframe case. <br />
** review code & debug<br />
<br />
*{{Bug|1213666}} - No sound in videos from France 24<br />
** debug, duplicate to bug 1214148.<br />
<br />
*{{Bug|1218593}} - Dialer touch tones sounds intermittently stop playing audio in Dialer.<br />
** debug<br />
<br />
===Blake Wu===<br />
<br />
===Kaku Kuo===<br />
<br />
===Munro Chiang===<br />
<br />
===Adam Chou===</div>Ctaihttps://wiki.mozilla.org/index.php?title=Standards&diff=1103359Standards2015-10-30T02:12:21Z<p>Ctai: /* WebRTC Working Group */</p>
<hr />
<div>Welcome to Mozilla's standards participation page.<br />
<br />
Many at Mozilla participate in the development of open web standards, in a variety of different standards bodies. This is a directory of standards organizations (and sub-orgs like working groups) listing who at Mozilla is working with each. For a technology summary see the [[standards/technologies|technologies]] page.<br />
<br />
To encourage better web standards coordination and cross-pollination, the sections below are organized alphabetically by standards body, then alphabetically by working group (if any), then the list of Mozilla folks participating in that working group, optionally listing which particular specifications (or sections thereof) that they edit/author/contribute to.<br />
<br />
If you actively directly communicate/participate with a standards body (working group email list, IRC, wiki, and/or f2f meetings), please add yourself (and the specific standards body / working group if any).<br />
<br />
If you work in multiple working groups or with multiple standards organizations, list yourself in each, linking to your wiki User page.<br />
<br />
Thanks!<br />
<br />
— [[User:Tantek|Tantek]]<br />
<br />
= Web Standards Coordination =<br />
== general participation ==<br />
If you'd like to participate in some of these groups, or at least watch, learn, and get up to speed, you can almost always do so by lurking on the public IRC channels and mailing lists that the groups use. Many (most?) standards mailing lists can be overwhelming in quantity and depth, so start with IRC, as it's often lighter-weight and easier to watch for quick bits of info/knowledge.<br />
<br />
* Follow the instructions on the [[IRC|IRC wiki page]] to:<br />
** Set yourself up with a nickname and connection to <code>irc.mozilla.org</code>. <br />
* Add a connection to <code>irc.freenode.net</code> (also with '''[x] SSL''') where many standards discussions take place.<br />
* Add another connection to <code>irc.w3.org</code> but specifically port 6665 (unprotected, no nickname registration).<br />
* See each standards section below for which IRC channel(s) tend(s) to be used by folks working in each group.<br />
<br />
== Orgless specs ==<br />
* [[APNG_Specification]]<br />
** fork: [https://gist.github.com/SoniEx2/c679e771d506210378a5 MPNGPNG - Mutli-PNG PNG spec]<br />
<br />
== Ecma International ==<br />
* <span class="h-card">[[User:Brendan|Brendan Eich]]</span><br />
* dherman<br />
* <span class="h-card"><span class="p-name">Allen Wirfs-Brock</span> (<span class="p-role role">Project Editor</span>)</span><br />
* Andreas Gal<br />
<br />
Specifications: ECMAScript 5, 5.1, 6, Harmony, etc.<br />
<br />
== IETF ==<br />
http://ietf.org/<br />
* ... no lead currently ...<br />
* ISOC Advisory Council Members:<br />
** Adam Roach (:abr)<br />
** Tim Terriberry (:derf)<br />
<br />
<br />
=== Calsify (iCalendar) ===<br />
* Most calendar related standards, list at: http://www.ietf.org/mail-archive/web/calsify/current/maillist.html<br />
<br />
* Philipp Kewisch<br />
<br />
Specifications [http://tools.ietf.org/html/rfc5545 rfc5545] [http://tools.ietf.org/html/rfc5546 rfc5546] [http://www.ietf.org/id/draft-kewisch-et-al-icalendar-in-json-01.txt icalendar-in-json] and related.<br />
<br />
=== dnsop ===<br />
* lshapiro (Larissa Shapiro)<br />
<br />
=== dhcp ===<br />
* lshapiro (Larissa Shapiro)<br />
<br />
=== HyBi (WebSockets) ===<br />
* <span class="h-card">Pat McManus</span><br />
* <span class="h-card">Chris Blizzard</span> (emeritus as of 2012-03-16)<br />
<br />
=== NETVC (Internet Video Codec) ===<br />
Proposed charter at http://trac.tools.ietf.org/bof/trac/wiki#NETVC<br />
<br />
Mailing list at https://www.ietf.org/mailman/listinfo/video-codec<br />
<br />
* Adam Roach (:abr) - Chairing BoF<br />
* Jack Moffitt (:jack)<br />
* Tim Terriberry (:derf)<br />
* Jean-Marc Valin (:jmspeex)<br />
* Nathan Egge<br />
<br />
=== Opus ===<br />
* <span class="h-card"><span class="p-name">Jean-Marc Valin</span> (:<span class="p-nickname">jmspeex</span>)</span><br />
* <span class="h-card"><span class="p-name">Tim Terriberry</span> (:<span class="p-nickname">derf</span>)</span><br />
* <span class="h-card"><span class="p-name">Ralph Giles</span> (:<span class="p-nickname nickname">rillian</span>)</span><br />
<br />
=== RTCWEB / MMUSIC ===<br />
* <span class="h-card">Randell Jesup</span><br />
* <span class="h-card">Tim Terriberry</span><br />
* <span class="h-card">Ralph Giles</span><br />
* <span class="h-card">Adam Roach (:abr)</span><br />
* <span class="h-card">Eric Rescorla (<span class="p-nickname">EKR</span>)</span><br />
* <span class="h-card">Maire Reavy </span><br />
<br />
=== STIR ===<br />
* Eric Rescorla<br />
<br />
=== TLS (SSL) ===<br />
* <span class="h-card">[[User:Briansmith|Brian Smith]]</span><br />
* Eric Rescorla<br />
<br />
=== VCARDDAV ===<br />
vcarddav group/list at: http://www.ietf.org/mail-archive/web/vcarddav/current/maillist.html<br />
* <span class="vcard">[[User:Tantek|Tantek Çelik]]</span><br />
* Philipp Kewisch<br />
Specifications: [[vCard4]] [http://www.ietf.org/id/draft-kewisch-vcard-in-json-00.txt vcard-in-json]<br />
<br />
== Khronos ==<br />
[http://www.khronos.org/webgl/ WebGL]<br />
* Jeff Gilbert (:jgilbert)<br />
<br />
== microformats ==<br />
http://microformats.org/ and [http://microformats.org/wiki microformats wiki]<br />
* irc://irc.freenode.net/microformats<br />
* email lists: http://microformats.org/discuss<br />
Community participants:<br />
* <span class="h-card"><span class="p-name">[[User:Tantek|Tantek Çelik]]</span> (<span class="p-role">founder</span>, <span class="p-role">admin</span>)</span><br />
* <span class="h-card">Michael Kaply</span><br />
* ...<br />
<br />
Specifications: <br />
* [[hCard]] - implemented in Firefox DOM<br />
* [[hCalendar]] - implemented in Firefox DOM<br />
* ... and many others.<br />
<br />
== OWF ==<br />
http://openwebfoundation.org/<br />
* <span class="h-card"><span class="p-name">[[User:Tantek|Tantek Çelik]]</span> (<span class="role">elected board member</span>)</span><br />
<br />
Specifications: <br />
* [http://openwebfoundation.org/legal/agreement/ Open Web Foundation Agreement] (OWFa)<br />
<br />
== W3C ==<br />
The [http://w3.org/ W3C] (World Wide Web Consortium) has Working Groups (WGs), Incubator Groups (IGs), Interest Groups (IGs), and Community Groups (WGs). See below for details and please add any/all of such groups here in alphabetical order by group name.<br />
* [[Standards/Participating in a W3C Working Group|Participating in a W3C Working Group]]<br />
* [[Standards/W3C Charter Development and Review|W3C Charter Development and Review]]<br />
* [https://www.w3.org/2000/09/dbwg/participants?org=35507&order=group Member-confidential (unfortunately) list of groups Mozilla participates in]<br />
<br />
=== Advisory Board ===<br />
Elected member to the [http://www.w3.org/wiki/AB W3C Advisory Board].<br />
* <span class="h-card">[[User:Tantek|Tantek Çelik]]</span><br />
<br />
=== Advisory Committee representative ===<br />
* <span class="fn h-card">[[User:Dbaron|David Baron]]</span><br />
See [https://www.w3.org/Member/ACList Advisory Committee Representative Directory] for who else is an AC Rep from which companies.<br />
<br />
=== Audio Incubator Group ===<br />
http://www.w3.org/2005/Incubator/audio/<br />
* <span class="h-card">Alistair MacDonald</span><br />
<br />
=== Audio Working Group ===<br />
* <span class="h-card">Matthew Gregan</span><br />
<br />
=== Browser Testing and Tools Working Group ===<br />
* <span class="h-card">David Burns</span><br />
* Clint Talbert (IRC: ctalbert)<br />
<br />
Specifications:<br />
* APIs (application programming interfaces) for use in automated testing of Web applications<br />
* APIs for use in troubleshooting and debugging of Web applications<br />
<br />
=== Core Mobile Web Platform Community Group ===<br />
http://www.w3.org/community/coremob/<br />
* <span class="h-card">[[User:Brendan|Brendan Eich]]</span><br />
* <span class="h-card">Jonas Sicking</span><br />
* <span class="h-card">Ragavan Srinivasan</span><br />
* <span class="h-card">Jet Villegas</span><br />
<br />
=== CSS (Cascading Style Sheets) Working Group ===<br />
http://w3.org/Style/CSS/<br />
* irc://irc.w3.org:6665/css<br />
* email: http://lists.w3.org/Archives/Public/www-style/<br />
Working group members related to Mozilla (also on w3c-css-wg)<br />
* <span class="h-card">[[User:Dbaron|David Baron]]</span><br />
* <span class="h-card">[[User:Tantek|Tantek Çelik]]</span><br />
* <span class="h-card">John Daggett</span><br />
* <span class="h-card">[[User:Fantasai|fantasai]]</span><br />
* <span class="h-card">Aryeh Gregor</span><br />
* <span class="h-card">Masayuki Nakano</span><br />
* <span class="h-card">[[User:Jetvillegas|Jet Villegas]]</span><br />
<br />
Additional www-style list participants related to Mozilla (anyone is welcome to join)<br />
* <span class="h-card">Robert O'Callahan</span><br />
* <span class="h-card">Henri Sivonen</span><br />
* <span class="h-card">Boris Zbarsky</span><br />
* <span class="h-card">Daniel Holbert</span><br />
* ...<br />
<br />
Specifications: [[CSS21]], [[CSS3]]<br />
<br />
See also: [[CSS]] on this wiki.<br />
<br />
=== Federated Social Web Community Group ===<br />
* http://www.w3.org/community/fedsocweb/<br />
Participants:<br />
* <span class="h-card">David Ascher</span><br />
* <span class="h-card">[[User:Tantek|Tantek Çelik]]</span><br />
<br />
was previously: Federated Social Web Incubator Group<br />
<br />
=== Games Community Group ===<br />
http://www.w3.org/community/games/<br />
* <span class="h-card">Rob Hawkes</span><br />
* <span class="h-card">Alan Kligman</span><br />
* <span class="h-card">Dan Mosedale</span><br />
* <span class="h-card">Bobby Richter</span><br />
<br />
=== Geolocation Working Group ===<br />
Geolocation Working Group (GEO) http://www.w3.org/2008/geolocation/<br />
* <span class="h-card">Doug Turner</span><br />
<br />
=== HTML Working Group ===<br />
HTML (HyperText Markup Language) Working Group (WG), sometimes listed as "HTML5 WG"<br />
http://www.w3.org/html/wg/<br />
* <span class="h-card">[[User:Tantek|Tantek Çelik]]</span><br />
* <span class="h-card">[[User:Mounir.lamouri|Mounir Lamouri]]</span><br />
* <span class="h-card">Jonas Sicking</span><br />
* <span class="h-card">Henri Sivonen</span><br />
* <span class="h-card">[[User:Jetvillegas|Jet Villegas]]</span><br />
* ...<br />
<br />
Specifications: [[HTML5]]<br />
<br />
=== HTML Speech Incubator Group ===<br />
* <span class="h-card">David Bolter</span><br />
* <span class="h-card">Olli Pettay</span><br />
<br />
=== Indie UI Events ===<br />
http://www.w3.org/2011/11/indie-ui-charter<br />
* <span class="h-card">David Bolter</span> (monitoring)<br />
<br />
=== Internationalization Working Group ===<br />
http://w3.org/International/<br />
* <span class="h-card">[[User:Fantasai|fantasai]]</span><br />
<br />
=== Media Fragments Working Group ===<br />
* <span class="h-card">Chris Double</span><br />
<br />
=== Near Field Communications Working Group ===<br />
W3C [http://www.w3.org/2012/nfc/ Near Field Communications (NFC) Working Group]<br />
* No one from Mozilla is currently participating.<br />
<br />
Want to participate? Please contact <span class="h-card">[[User:Dbaron|David Baron]]</span> and <span class="h-card">[[User:Tantek|Tantek]]</span>.<br />
<br />
=== Pointer Events Working Group ===<br />
* http://www.w3.org/2012/pointerevents/<br />
Participants:<br />
* <span class="h-card">Olli Pettay</span><br />
* <span class="h-card">Matt Brubeck</span><br />
<br />
=== Protocols and Formats Working Group ===<br />
(Web Accessibility) Protocols and Formats Working Group (PF WG)<br />
* <span class="h-card">David Bolter</span><br />
<br />
=== PubSubHubbub Community Group ===<br />
* http://www.w3.org/community/pubsub/<br />
Participants:<br />
* <span class="h-card">[[User:Tantek|Tantek Çelik]]</span><br />
<br />
=== Social Web Working Group ===<br />
SocialWG - http://www.w3.org/Social/WG<br />
* <span class="h-card">[[User:Tantek|Tantek]]</span> - co-chair<br />
<br />
=== SVG Working Group ===<br />
SVG (Scalable Vector Graphics) Working Group<br />
http://w3.org/SVG/<br />
* <span class="h-card">Cameron McCormack</span> (co-chair)<br />
* <span class="h-card">Brian Birtles</span><br />
* <span class="h-card">Jonathan Watt</span><br />
<br />
Specifications: SVG 1.1, SVG 2.0<br />
<br />
=== System Applications Working Group ===<br />
[http://www.w3.org/2012/sysapps/ SysApps] (System Applications) Working Group [https://www.w3.org/2000/09/dbwg/details?group=58119&public=1&order=org#_MozillaFoundation participants]:<br />
* <span class="h-card">[[User:Brendan|Brendan Eich]]</span><br />
* <span class="h-card">[[User:Sicking|Jonas Sicking]]</span><br />
<br />
=== Tracking Protection Working Group ===<br />
http://www.w3.org/2011/tracking-protection/<br />
* <span class="h-card">Alex Fowler</span><br />
* <span class="h-card">Thomas Lowenthal</span><br />
* <span class="h-card">Sid Stamm</span><br />
<br />
=== Technical Architecture Group ===<br />
W3C [http://www.w3.org/2001/tag/ TAG]<br />
* <span class="h-card">[[User:Dbaron|David Baron]]</span><br />
<br />
=== Web Applications Working Group ===<br />
WebApps WG<br />
* <span class="h-card">Cameron McCormack</span><br />
* <span class="h-card">[[User:Anant|Anant Narayanan]]</span><br />
* <span class="h-card">Olli Pettay</span><br />
* <span class="h-card">Arun Ranganathan</span><br />
* <span class="h-card">[[User:Sicking|Jonas Sicking]]</span><br />
* <span class="h-card">[[User:Tantek|Tantek Çelik]]</span> (observer)<br />
* <span class="h-card">[[User:Mounir.lamouri|Mounir Lamouri]]</span><br />
<br />
Specifications: IndexedDB, Web IDL, XMLHttpRequest, DOM 3 Events, DOM 4, etc. See the group's [http://www.w3.org/2008/webapps/wiki/PubStatus PubStatus] wiki for a list of all specs.<br />
<br />
See also on this wiki:<br />
* [[DOM]]<br />
* [[WebAPI]]<br />
<br />
<br />
=== Web Applications Security Working Group ===<br />
<br />
* Eric Rescorla<br />
* Daniel Veditz<br />
* Francois Marier<br />
* Tanvi Vyas<br />
* Freddy Braun<br />
<br />
=== Web Cryptography Working Group ===<br />
[http://www.w3.org/2012/webcrypto/ Web Cryptography Working Group]<br />
* <span class="h-card">[[User:Ddahl|David Dahl]]</span><br />
* <span class="h-card">Arun Ranganathan</span><br />
* <span class="h-card"><span class="p-name">Eric Rescorla</span> (<span class="p-nickname">EKR</span>)</span><br />
<br />
=== Web Education Community Group ===<br />
http://www.w3.org/community/webed/<br />
<br />
* <span class="h-card">Schalk Neethling</span><br />
* <span class="h-card">Jérémie Patonnier</span><br />
* <span class="h-card">Janet Swisher</span><br />
<br />
=== Web Events Working Group / Touch Events Community Group ===<br />
* <span class="h-card">Matt Brubeck</span><br />
* <span class="h-card">Olli Pettay</span><br />
<br />
Specifications: Touch Events<br />
<br />
=== WebFonts Working Group ===<br />
* <span class="h-card">Jonathan Kew</span> (editor)<br />
* <span class="h-card">John Daggett</span><br />
<br />
=== Web Hypertext Application Technology Community Group ===<br />
* <span class="h-card">[[User:Dbaron|David Baron]]</span><br />
* <span class="h-card">[[User:Tantek|Tantek Çelik]]</span><br />
* <span class="h-card">Aryeh Gregor</span><br />
* <span class="h-card">Cameron McCormack</span><br />
See also the [http://www.w3.org/community/whatwg/participants complete list of participants].<br />
<br />
Specifications: HTML living standard as developed by the WHATWG.<br />
<br />
=== Web Media Text Tracks Community Group ===<br />
http://www.w3.org/community/texttracks/<br />
* <span class="h-card"><span class="p-name">Ralph Giles</span> (:<span class="p-nickname">rillian</span>)</span><br />
<br />
Specifications: something [http://www.whatwg.org/specs/web-apps/current-work/webvtt.html WebVTT]-ish, we hope.<br />
<br />
=== Web Payments Task Force ===<br />
<br />
[http://www.w3.org/wiki/Payments_Task_Force http://www.w3.org/wiki/Payments_Task_Force]<br />
<br />
* <span class="h-card">Kumar McMillan</span><br />
* <span class="h-card">Andreas Gal</span><br />
* [http://www.w3.org/wiki/2013_Web_Payment_Task_Force_Participants Full list]<br />
<br />
=== Web Performance Working Group ===<br />
* <span class="h-card">Cameron McCormack</span><br />
* <span class="h-card">Kyle Simpson</span><br />
<br />
Specifications: Timing control for script-based animations (requestAnimationFrame)<br />
<br />
=== WebRTC Working Group ===<br />
[[WebRTC]] (Web Real Time Communications) Working Group<br />
* <span class="h-card">Ralph Giles</span><br />
* <span class="h-card">Maire Reavy</span><br />
* <span class="h-card"><span class="p-name">Eric Rescorla</span> (<span class="p-nickname">EKR</span>)</span><br />
* <span class="h-card">Tim Terriberry</span><br />
* <span class="h-card">Adam Roach (:abr)</span><br />
* <span class="h-card">Randell Jesup (:jesup)</span><br />
<br />
Specifications: Media capture & [http://www.w3.org/2011/04/webrtc-charter.html streaming APIs]<br />
<br />
* <span class="h-card">Chia-hung Tai (:ctai)</span><br />
* <span class="h-card">Tzu-hao Kuo (:kaku)</span><br />
<br />
Specifications: Media Capture Stream with Worker Extensions [https://w3c.github.io/mediacapture-worker/ mediacapture-worker APIs]<br />
<br />
=== Web Security Working Group (forming) ===<br />
* <span class="h-card">Brandon Sterne</span><br />
* <span class="h-card">Dan Veditz</span><br />
<br />
Specifications: CSP, CORS (jointly with WebApps WG)<br />
<br />
== WHATWG ==<br />
Web Hypertext Application Technologies Working Group - http://whatwg.org<br />
* <span class="h-card">[[User:Dbaron|David Baron]]</span><br />
* <span class="h-card">[[User:Tantek|Tantek Çelik]]</span><br />
* <span class="h-card">[[User:Brendan|Brendan Eich]]</span><br />
* <span class="h-card">Mounir Lamouri</span><br />
* <span class="h-card">Jonas Sicking</span><br />
* <span class="h-card">Henri Sivonen</span><br />
* <span class="h-card">[[User:Annevk|Anne van Kesteren]]</span><br />
* ...<br />
<br />
Web Editing specification - http://dvcs.w3.org/hg/editing/raw-file/tip/editing.html<br />
* <span class="h-card">[[User:Ehsan|Ehsan Akhgari]]</span><br />
<br />
= other =<br />
<br />
== CA/Browser Forum ==<br />
The [http://cabforum.org/ CA/Browser Forum] produces standards in the area of best practice and validation for certificate authorities.<br />
* <span class="h-card">[[User:Gerv|Gervase Markham]]</span><br />
* <span class="h-card">Sid Stamm</span><br />
* <span class="h-card">Kathleen Wilson</span><br />
<br />
== CalConnect ==<br />
Mozilla is a member of [http://www.calconnect.org/ CalConnect], The Calendaring and Scheduling Consortium, which is not actually affiliated w/ IETF or W3C but in practice drives development and interoperability testing of IETF specs:<br />
* RFC 5545 iCalendar (obsoletes RFC 2445).<br />
* RFC 4791 CalDAV Access protocol<br />
See their [http://www.calconnect.org/CD1104_Calendaring_Standards.shtml Index to Calendaring and Scheduling Standards] for other specific standards that CalConnect is involved with.<br />
<br />
== OASIS ==<br />
<br />
* Mozilla point of contact: Gervase Markham<br />
* PKCS#11 working group: Brian Smith<br />
<br />
== XMPP ==<br />
Mozilla is not formally associated with the XSF but has representation indirectly. http://xmpp.org/<br />
* no direct involvement by any current Mozillian<br />
<br />
== C++ ==<br />
<br />
C++ is standardized by [http://www.open-std.org/jtc1/sc22/wg21/ ISO/IEC JTC1/SC22/WG21] (informally, the "C++ Standards Committee"). All proposals are publicly available [http://www.open-std.org/jtc1/sc22/wg21/docs/papers/ here].<br />
<br />
[https://mozillians.org/en-US/u/bballo/ Botond Ballo] is a member of Canada's delegation to the Committee, and has been attending meetings regularly since September 2013. If you have any feedback about any existing proposal, or would like to explore the idea of putting forth a new proposal, please post to dev-platform and cc Botond.<br />
<br />
= emeritus =<br />
== people ==<br />
Former Mozillians who worked on standards or still work on them:<br />
* <span class="h-card">Chris Blizzard</span> (til 2012-03-16)<br />
** [[#IETF]]<br />
** [[#rtcweb]]<br />
** [[#WebRTC_Working_Group]]<br />
<br />
* <span class="h-card"><span class="p-name">[[User:bear|Mike Taylor]]</span> (<span class="p-nickname">bear</span>) - <span class="p-role">elected board member</span></span><br />
** [[#XMPP]]<br />
<br />
== organizations and groups ==<br />
=== Federated Social Web Incubator Group ===<br />
2010-12-15 ... 2012-01-12 (transitioned to Federated Social Web Community Group)<br />
<br />
W3C Federated Social Web Incubator Group (FSW XG)<br />
http://www.w3.org/2005/Incubator/federatedsocialweb/ and [http://www.w3.org/2005/Incubator/federatedsocialweb/wiki/Main_Page FSW wiki]<br />
* <span class="h-card">David Ascher</span><br />
* <span class="h-card">[[User:Mixedpuppy|Shane Caraveo]]</span><br />
* <span class="h-card">[[User:Tantek|Tantek Çelik]]</span><br />
<br />
= subpages of {{FULLPAGENAME}}=<br />
{{Special:PrefixIndex/{{FULLPAGENAME}}/}}<br />
<br />
= related =<br />
See also:<br />
* [[Events]] - which include web standards-related events.<br />
* [[SEO/Standards]] - how to use standards to improve/optimize search results<br />
* [[Standards/license]] - what license Mozilla prefers for standards specifications</div>Ctaihttps://wiki.mozilla.org/index.php?title=File:FoxEye_-_Overview.png&diff=1102422File:FoxEye - Overview.png2015-10-26T03:44:35Z<p>Ctai: Ctai uploaded a new version of &quot;File:FoxEye - Overview.png&quot;</p>
<hr />
<div></div>Ctaihttps://wiki.mozilla.org/index.php?title=File:FoxEye_-_Overview.png&diff=1102421File:FoxEye - Overview.png2015-10-26T03:42:48Z<p>Ctai: Ctai uploaded a new version of &quot;File:FoxEye - Overview.png&quot;</p>
<hr />
<div></div>Ctaihttps://wiki.mozilla.org/index.php?title=TPEMedia/2015-10-14&diff=1100494TPEMedia/2015-10-142015-10-14T02:53:10Z<p>Ctai: /* Status */</p>
<hr />
<div>==Summary==<br />
Status changed by this week<br />
<bugzilla><br />
{<br />
"status": ["NEW", "ASSIGNED", "UNCONFIRMED"],<br />
"changed_after": "2015-10-07",<br />
"changed_before": "2015-10-14",<br />
"assigned_to": ["jolin@mozilla.com", "ayang@mozilla.com", "jwwang@mozilla.com", "bechen@mozilla.com", "ctai@mozilla.com", "alwu@mozilla.com", "bwu@mozilla.com", "tkuo@mozilla.com", "mchiang@mozilla.com", "hahuang@mozilla.com"],<br />
"include_fields": "id, summary, status, resolution, assigned_to"<br />
}<br />
</bugzilla><br />
Bugs which are fixed by this week<br />
<bugzilla><br />
{<br />
"changed_after": "2015-10-07",<br />
"changed_before": "2015-10-14",<br />
"assigned_to": ["jolin@mozilla.com", "ayang@mozilla.com", "jwwang@mozilla.com", "bechen@mozilla.com", "ctai@mozilla.com", "alwu@mozilla.com", "bwu@mozilla.com", "tkuo@mozilla.com", "mchiang@mozilla.com", "hahuang@mozilla.com"],<br />
"include_fields": "id, summary, status, resolution, assigned_to",<br />
"changed_field": "status",<br />
"changed_field_to": "resolved"<br />
}<br />
</bugzilla><br />
==Status==<br />
<br />
===Alfredo Yang===<br />
<br />
===John Lin===<br />
<br />
===JW Wang===<br />
*{{Bug|1212246}} - Remove task queue borrowing<br />
** land<br />
<br />
*{{Bug|1212220}} - It seems not safe to read sVideoQueueSendToCompositorSize off the main thread<br />
** land<br />
<br />
*{{Bug|1212701}} - Remove AbstractMediaDecoder::OnDecodeTaskQueue()<br />
** land<br />
<br />
*{{Bug|1213726}} - Remove AbstractMediaDecoder::HasInitializationData() <br />
** land<br />
<br />
*{{Bug|1212723}} - It seems racy to share mBufferedState among WebMReaders<br />
** land<br />
<br />
===Benjamin Chen===<br />
<br />
===Chiahung Tai===<br />
*{{Bug|1201363}} - Stop buffering video in the MediaStreamGraph<br />
** WIP<br />
* Review media-capture worker spec pull request from Intel editor.<br />
<br />
===Alastor Wu===<br />
<br />
===Blake Wu===<br />
<br />
===Kaku Kuo===<br />
<br />
===Munro Chiang===<br />
<br />
===Adam Chou===</div>Ctaihttps://wiki.mozilla.org/index.php?title=TPEMedia/2015-10-07&diff=1099472TPEMedia/2015-10-072015-10-07T03:19:39Z<p>Ctai: /* Chiahung Tai */</p>
<hr />
<div>==Summary==<br />
Status changed by this week<br />
<bugzilla><br />
{<br />
"status": ["NEW", "ASSIGNED", "UNCONFIRMED"],<br />
"changed_after": "2015-09-30",<br />
"changed_before": "2015-10-07",<br />
"assigned_to": ["jolin@mozilla.com", "ayang@mozilla.com", "jwwang@mozilla.com", "bechen@mozilla.com", "ctai@mozilla.com", "alwu@mozilla.com", "bwu@mozilla.com", "tkuo@mozilla.com", "mchiang@mozilla.com", "hahuang@mozilla.com"],<br />
"include_fields": "id, summary, status, resolution, assigned_to"<br />
}<br />
</bugzilla><br />
Bugs which are fixed by this week<br />
<bugzilla><br />
{<br />
"changed_after": "2015-09-30",<br />
"changed_before": "2015-10-07",<br />
"assigned_to": ["jolin@mozilla.com", "ayang@mozilla.com", "jwwang@mozilla.com", "bechen@mozilla.com", "ctai@mozilla.com", "alwu@mozilla.com", "bwu@mozilla.com", "tkuo@mozilla.com", "mchiang@mozilla.com", "hahuang@mozilla.com"],<br />
"include_fields": "id, summary, status, resolution, assigned_to",<br />
"changed_field": "status",<br />
"changed_field_to": "resolved"<br />
}<br />
</bugzilla><br />
==Status==<br />
<br />
===Alfredo Yang===<br />
<br />
===John Lin===<br />
<br />
===JW Wang===<br />
*{{Bug|1209888}} - Remove usage of decoder monitor from OggReader<br />
** land<br />
<br />
*{{Bug|1209890}} - Remove usage of decoder monitor from MediaSourceDecoder<br />
** land<br />
<br />
*{{Bug|1209864}} - Remove usage of decoder monitor from MediaOmxCommonDecoder<br />
** land<br />
<br />
*{{Bug|1211327}} - Remove unnecessary usage of decoder monitor from MediaDecoderReader and sub-classes<br />
** land<br />
<br />
*{{Bug|1194918}} - Create base class VideoSink and encapsulate MDSM::UpdateRenderVideoFrame related-logic into DecodedVideoDataSink<br />
** review code<br />
<br />
*{{Bug|1211364}} - Do MDSM::CheckFrameValidity() earlier when video frames arrive MDSM.<br />
** review code<br />
<br />
*{{Bug|1208934}} - Remove usage of decoder monitor from MDSM<br />
** land<br />
<br />
*{{Bug|1211766}} - Remove AbstractMediaDecoder::GetReentrantMonitor()<br />
** land<br />
<br />
===Benjamin Chen===<br />
<br />
===Chiahung Tai===<br />
*{{Bug|1201363}} - Stop buffering video in the MediaStreamGraph<br />
**Made an initial plan, commented on Bugzilla, and got positive feedback from roc.<br />
<br />
===Alastor Wu===<br />
<br />
===Blake Wu===<br />
<br />
===Kaku Kuo===<br />
<br />
===Munro Chiang===<br />
<br />
===Adam Chou===</div>Ctaihttps://wiki.mozilla.org/index.php?title=TPEMedia/2015-09-23&diff=1099094TPEMedia/2015-09-232015-10-05T03:30:20Z<p>Ctai: /* Chiahung Tai */</p>
<hr />
<div>==Summary==<br />
Status changed by this week<br />
<bugzilla><br />
{<br />
"status": ["NEW", "ASSIGNED", "UNCONFIRMED"],<br />
"changed_after": "2015-09-16",<br />
"changed_before": "2015-09-23",<br />
"assigned_to": ["jolin@mozilla.com", "ayang@mozilla.com", "jwwang@mozilla.com", "bechen@mozilla.com", "ctai@mozilla.com", "alwu@mozilla.com", "bwu@mozilla.com", "tkuo@mozilla.com", "mchiang@mozilla.com", "hahuang@mozilla.com"],<br />
"include_fields": "id, summary, status, resolution, assigned_to"<br />
}<br />
</bugzilla><br />
Bugs which are fixed by this week<br />
<bugzilla><br />
{<br />
"changed_after": "2015-09-16",<br />
"changed_before": "2015-09-23",<br />
"assigned_to": ["jolin@mozilla.com", "ayang@mozilla.com", "jwwang@mozilla.com", "bechen@mozilla.com", "ctai@mozilla.com", "alwu@mozilla.com", "bwu@mozilla.com", "tkuo@mozilla.com", "mchiang@mozilla.com", "hahuang@mozilla.com"],<br />
"include_fields": "id, summary, status, resolution, assigned_to",<br />
"changed_field": "status",<br />
"changed_field_to": "resolved"<br />
}<br />
</bugzilla><br />
==Status==<br />
<br />
===Alfredo Yang===<br />
<br />
===John Lin===<br />
<br />
===JW Wang===<br />
*{{Bug|1204430}} - Make MediaDecoder::IsMediaSeekable run on the main thread.<br />
** land<br />
<br />
*{{Bug|1206574}} - Remove AbstractMediaDecoder::IsShutdown()<br />
** land<br />
<br />
*{{Bug|1206576}} - Dispatch some MDSM functions to hide its internal thread model<br />
** land<br />
<br />
*{{Bug|1206578}} - Group public and private functions respectively for MDSM<br />
** land<br />
<br />
*{{Bug|1206607}} - Remove some dead code from MDSM<br />
** land<br />
<br />
*{{Bug|1207017}} - Some code clean up of MediaDecoder<br />
** land<br />
<br />
*{{Bug|1188643}} - [Aries][Flame][Ringtones] Previewing a custom ringtone sounds choppy<br />
** land<br />
<br />
*{{Bug|1207915}} - Apply the fix of bug 1052206 to DecodedStream<br />
** land<br />
<br />
===Benjamin Chen===<br />
*{{Bug|1206719}} - [B2G] Throttle the seek command from videocontrols.xml.<br />
**patch ready<br />
<br />
*Debugging Bug 1198664 with John.<br />
<br />
===Chiahung Tai===<br />
*PTO 3 days.<br />
*1 National Holiday.<br />
<br />
===Alastor Wu===<br />
*{{Bug|1167465}} - Exposing Allowed Audio Channels in System App's Window <br />
** Review code<br />
<br />
*{{Bug|1206174}} - Improve code readability of FMRadioService<br />
** Review code<br />
<br />
*{{Bug|1206212}} - Remove AUDIO_STREAM_FM after KK<br />
** Review code<br />
<br />
*{{Bug|1201007}} - [B2G] Enable mono audio setting option in Gaia <br />
** r+<br />
<br />
*{{Bug|1196358}} - [B2G] Volume setting is wrong after reboot the phone<br />
** r+<br />
<br />
*{{Bug|1175447}} - mono audio support<br />
** r?<br />
<br />
*{{Bug|1206581}} - Implement notifyChannel() on AudioChannel API <br />
** f+<br />
<br />
*{{Bug|1183033}} - [B2G] Keyboard doesn't have click sound<br />
** f?<br />
<br />
*{{Bug|1202967}} - [AriesKK]Screen will be freezed if you launch camera by pressing HW camera key in Recent View.<br />
** Debug<br />
<br />
*{{Bug|1204622}} - crash in strlen | __vfprintf<br />
** Debug<br />
<br />
* Discuss MediaSession API on FxOS<br />
<br />
===Blake Wu===<br />
*{{Bug|1166758}} - [Aries] Listening to music while the screen is locked results in intermittent beeps/buzzing noises <br />
**Debug<br />
<br />
===Kaku Kuo===<br />
<br />
===Munro Chiang===<br />
<br />
===Adam Chou===</div>Ctaihttps://wiki.mozilla.org/index.php?title=TPEMedia/2015-09-30&diff=1095606TPEMedia/2015-09-302015-09-16T14:27:28Z<p>Ctai: /* Chiahung Tai */</p>
<hr />
<div>==Summary==<br />
Status changed by this week<br />
<bugzilla><br />
{<br />
"status": ["NEW", "ASSIGNED", "UNCONFIRMED"],<br />
"changed_after": "2015-09-23",<br />
"changed_before": "2015-09-30",<br />
"assigned_to": ["jolin@mozilla.com", "ayang@mozilla.com", "jwwang@mozilla.com", "bechen@mozilla.com", "ctai@mozilla.com", "alwu@mozilla.com", "bwu@mozilla.com", "tkuo@mozilla.com", "mchiang@mozilla.com", "hahuang@mozilla.com"],<br />
"include_fields": "id, summary, status, resolution, assigned_to"<br />
}<br />
</bugzilla><br />
Bugs which are fixed by this week<br />
<bugzilla><br />
{<br />
"changed_after": "2015-09-23",<br />
"changed_before": "2015-09-30",<br />
"assigned_to": ["jolin@mozilla.com", "ayang@mozilla.com", "jwwang@mozilla.com", "bechen@mozilla.com", "ctai@mozilla.com", "alwu@mozilla.com", "bwu@mozilla.com", "tkuo@mozilla.com", "mchiang@mozilla.com", "hahuang@mozilla.com"],<br />
"include_fields": "id, summary, status, resolution, assigned_to",<br />
"changed_field": "status",<br />
"changed_field_to": "resolved"<br />
}<br />
</bugzilla><br />
==Status==<br />
<br />
===Alfredo Yang===<br />
<br />
===John Lin===<br />
<br />
===JW Wang===<br />
<br />
===Benjamin Chen===<br />
<br />
===Chiahung Tai===<br />
*PTO 0.5 day<br />
*Prepare session for platform team visit.<br />
<br />
===Alastor Wu===<br />
<br />
===Blake Wu===<br />
<br />
===Kaku Kuo===<br />
<br />
===Munro Chiang===<br />
<br />
===Adam Chou===</div>Ctaihttps://wiki.mozilla.org/index.php?title=TPEMedia/2015-09-09&diff=1094348TPEMedia/2015-09-092015-09-09T03:31:51Z<p>Ctai: /* Chiahung Tai */</p>
<hr />
<div>==Summary==<br />
Status changed by this week<br />
<bugzilla><br />
{<br />
"status": ["NEW", "ASSIGNED", "UNCONFIRMED"],<br />
"changed_after": "2015-09-02",<br />
"changed_before": "2015-09-09",<br />
"assigned_to": ["jolin@mozilla.com", "ayang@mozilla.com", "jwwang@mozilla.com", "bechen@mozilla.com", "ctai@mozilla.com", "alwu@mozilla.com", "bwu@mozilla.com", "tkuo@mozilla.com", "mchiang@mozilla.com", "hahuang@mozilla.com"],<br />
"include_fields": "id, summary, status, resolution, assigned_to"<br />
}<br />
</bugzilla><br />
Bugs which are fixed by this week<br />
<bugzilla><br />
{<br />
"changed_after": "2015-09-02",<br />
"changed_before": "2015-09-09",<br />
"assigned_to": ["jolin@mozilla.com", "ayang@mozilla.com", "jwwang@mozilla.com", "bechen@mozilla.com", "ctai@mozilla.com", "alwu@mozilla.com", "bwu@mozilla.com", "tkuo@mozilla.com", "mchiang@mozilla.com", "hahuang@mozilla.com"],<br />
"include_fields": "id, summary, status, resolution, assigned_to",<br />
"changed_field": "status",<br />
"changed_field_to": "resolved"<br />
}<br />
</bugzilla><br />
==Status==<br />
<br />
===Alfredo Yang===<br />
<br />
===John Lin===<br />
<br />
===JW Wang===<br />
*{{Bug|1195158}} - Remove MediaDecoder::QueueMetadata<br />
** land<br />
<br />
*{{Bug|1199121}} - Move clock switching code from MDSM into MediaSink<br />
** land<br />
<br />
*{{Bug|1199155}} - Create a subclass of MediaSink to wrap DecodedStream for audio/video rendering<br />
** land<br />
<br />
*{{Bug|1202533}} - Fix naming convention of MediaSink::PlaybackParams<br />
** land<br />
<br />
===Benjamin Chen===<br />
<br />
===Chiahung Tai===<br />
*{{Bug|1108950}} - [FoxEye] Associate MediaStreamTrack with WebWorker as WorkerMonitor. <br />
**WIP, suspend<br />
**Talked with roc on IRC; suspending this bug. Will finish Bug 1201363 first.<br />
*{{Bug|1201363}} - Stop buffering video in the MediaStreamGraph.<br />
**Start investigation.<br />
*Team Building 1 day.<br />
*PTO 0.5 day.<br />
<br />
===Alastor Wu===<br />
*{{Bug|1129882}} - [B2G] Using the new audio channel design to manage the telephony's sound<br />
** r+<br />
<br />
*{{Bug|1175447}} - mono audio support<br />
** r?<br />
<br />
*{{Bug|1201007}} - [B2G] Enable mono audio setting option in Gaia <br />
** r?<br />
<br />
*{{Bug|1200126}} - (Use TTS) Can not use Earpiece to speak SpeechSynthesisUtterance.<br />
*{{Bug|1198184}} - [FFOS_2.2][KeyPad] Key pad tone delays too much<br />
** Partner support<br />
<br />
*{{Bug|1198146}} - [GC][FFOS 2.2][Call]Incoming call will end all call when make one outgoing call.<br />
** Debug<br />
<br />
* Study MediaSession API.<br />
<br />
===Blake Wu===<br />
*Visit Panasonic and MTK in Osaka<br />
**2016 model porting<br />
**2017 model planning <br />
*Study B2G audio backend<br />
*{{Bug|1188643}} - [Aries][Flame][Ringtones] Previewing a custom ringtone sounds choppy<br />
**Debug<br />
*{{Bug|1166758}} - [Aries] Listening to music while the screen is locked results in intermittent beeps/buzzing noises <br />
**Debug<br />
<br />
===Kaku Kuo===<br />
<br />
===Munro Chiang===<br />
<br />
===Adam Chou===</div>Ctaihttps://wiki.mozilla.org/index.php?title=TPEMedia/2015-09-02&diff=1093299TPEMedia/2015-09-022015-09-02T02:01:35Z<p>Ctai: /* Chiahung Tai */</p>
<hr />
<div>==Summary==<br />
Status changed by this week<br />
<bugzilla><br />
{<br />
"status": ["NEW", "ASSIGNED", "UNCONFIRMED"],<br />
"changed_after": "2015-08-26",<br />
"changed_before": "2015-09-02",<br />
"assigned_to": ["jolin@mozilla.com", "ayang@mozilla.com", "jwwang@mozilla.com", "bechen@mozilla.com", "ctai@mozilla.com", "alwu@mozilla.com", "bwu@mozilla.com", "tkuo@mozilla.com", "mchiang@mozilla.com", "hahuang@mozilla.com"],<br />
"include_fields": "id, summary, status, resolution, assigned_to"<br />
}<br />
</bugzilla><br />
Bugs which are fixed by this week<br />
<bugzilla><br />
{<br />
"changed_after": "2015-08-26",<br />
"changed_before": "2015-09-02",<br />
"assigned_to": ["jolin@mozilla.com", "ayang@mozilla.com", "jwwang@mozilla.com", "bechen@mozilla.com", "ctai@mozilla.com", "alwu@mozilla.com", "bwu@mozilla.com", "tkuo@mozilla.com", "mchiang@mozilla.com", "hahuang@mozilla.com"],<br />
"include_fields": "id, summary, status, resolution, assigned_to",<br />
"changed_field": "status",<br />
"changed_field_to": "resolved"<br />
}<br />
</bugzilla><br />
==Status==<br />
<br />
===Alfredo Yang===<br />
<br />
===John Lin===<br />
<br />
===JW Wang===<br />
<br />
===Benjamin Chen===<br />
<br />
===Chiahung Tai===<br />
*{{Bug|1108950}} - [FoxEye] Associate MediaStreamTrack with WebWorker as WorkerMonitor. <br />
**WIP, r?<br />
**Add VideoMonitorEvent Constructor.<br />
<br />
===Alastor Wu===<br />
<br />
===Blake Wu===<br />
<br />
===Kaku Kuo===<br />
<br />
===Munro Chiang===<br />
<br />
===Adam Chou===</div>Ctaihttps://wiki.mozilla.org/index.php?title=TPEMedia/2015-08-26&diff=1091993TPEMedia/2015-08-262015-08-26T02:53:07Z<p>Ctai: /* Chiahung Tai */</p>
<hr />
<div>==Summary==<br />
Status changed by this week<br />
<bugzilla><br />
{<br />
"status": ["NEW", "ASSIGNED", "UNCONFIRMED"],<br />
"changed_after": "2015-08-19",<br />
"changed_before": "2015-08-26",<br />
"assigned_to": ["jolin@mozilla.com", "ayang@mozilla.com", "jwwang@mozilla.com", "bechen@mozilla.com", "ctai@mozilla.com", "alwu@mozilla.com", "bwu@mozilla.com", "tkuo@mozilla.com", "mchiang@mozilla.com", "hahuang@mozilla.com"],<br />
"include_fields": "id, summary, status, resolution, assigned_to"<br />
}<br />
</bugzilla><br />
Bugs which are fixed by this week<br />
<bugzilla><br />
{<br />
"changed_after": "2015-08-19",<br />
"changed_before": "2015-08-26",<br />
"assigned_to": ["jolin@mozilla.com", "ayang@mozilla.com", "jwwang@mozilla.com", "bechen@mozilla.com", "ctai@mozilla.com", "alwu@mozilla.com", "bwu@mozilla.com", "tkuo@mozilla.com", "mchiang@mozilla.com", "hahuang@mozilla.com"],<br />
"include_fields": "id, summary, status, resolution, assigned_to",<br />
"changed_field": "status",<br />
"changed_field_to": "resolved"<br />
}<br />
</bugzilla><br />
==Status==<br />
<br />
===Alfredo Yang===<br />
<br />
===John Lin===<br />
<br />
===JW Wang===<br />
<br />
===Benjamin Chen===<br />
<br />
===Chiahung Tai===<br />
*{{Bug|1108950}} - [FoxEye] Associate MediaStreamTrack with WebWorker as WorkerMonitor. <br />
**WIP, f?<br />
**Change API to return Promise.<br />
*Meeting with Product team.<br />
<br />
===Alastor Wu===<br />
*{{Bug|1129882}} - [B2G] Using the new audio channel design to manage the telephony's sound<br />
** Backout<br />
<br />
*{{Bug|1170117}} - [B2G] Separate volume control settings for multiple audio profiles<br />
*{{Bug|1179181}} - [B2G] Store separate volume setting into setting database<br />
** Rebase to v2.2r<br />
<br />
*{{Bug|1196358}} - [B2G] Volume setting is wrong after reboot the phone<br />
** WIP<br />
<br />
*{{Bug|1194442}} - Code clean up of AudioManager<br />
** Review code<br />
<br />
* PTO 1 day<br />
<br />
===Blake Wu===<br />
<br />
===Kaku Kuo===<br />
<br />
===Munro Chiang===<br />
<br />
===Adam Chou===</div>Ctaihttps://wiki.mozilla.org/index.php?title=TPEMedia/2015-08-19&diff=1091424TPEMedia/2015-08-192015-08-23T15:35:54Z<p>Ctai: /* Chiahung Tai */</p>
<hr />
<div>==Summary==<br />
Status changed by this week<br />
<bugzilla><br />
{<br />
"status": ["NEW", "ASSIGNED", "UNCONFIRMED"],<br />
"changed_after": "2015-08-12",<br />
"changed_before": "2015-08-19",<br />
"assigned_to": ["jolin@mozilla.com", "ayang@mozilla.com", "jwwang@mozilla.com", "bechen@mozilla.com", "ctai@mozilla.com", "alwu@mozilla.com", "bwu@mozilla.com", "tkuo@mozilla.com", "mchiang@mozilla.com", "hahuang@mozilla.com"],<br />
"include_fields": "id, summary, status, resolution, assigned_to"<br />
}<br />
</bugzilla><br />
Bugs which are fixed by this week<br />
<bugzilla><br />
{<br />
"changed_after": "2015-08-12",<br />
"changed_before": "2015-08-19",<br />
"assigned_to": ["jolin@mozilla.com", "ayang@mozilla.com", "jwwang@mozilla.com", "bechen@mozilla.com", "ctai@mozilla.com", "alwu@mozilla.com", "bwu@mozilla.com", "tkuo@mozilla.com", "mchiang@mozilla.com", "hahuang@mozilla.com"],<br />
"include_fields": "id, summary, status, resolution, assigned_to",<br />
"changed_field": "status",<br />
"changed_field_to": "resolved"<br />
}<br />
</bugzilla><br />
==Status==<br />
<br />
===Alfredo Yang===<br />
*{{Bug|1179667}} Use MediaPromise for Gonk PlatformDecodeModule Init()<br />
** landed<br />
*{{Bug|1195625}} Use correct TaskQueue in SharedDecoderManager and H264Converter promise.<br />
** landed<br />
*{{Bug|1193626}} Can't seek video on gonk<br />
** clarified - dup bug 1192748 <br />
*{{Bug|1193647}} Video keeps rendering the same image on gonk<br />
** clarified - dup bug 1192748 <br />
*{{Bug|1123246}} [Flame][Browser]The frame of Youtube video can't be displayed, it only have voice or you can't continue to play<br />
** checking<br />
<br />
===John Lin===<br />
<br />
===JW Wang===<br />
<br />
*{{Bug|1192708}} - Remove redundant call to CloseChannel in ChannelMediaResource::CacheClientSeek<br />
** land<br />
<br />
*{{Bug|1191696}} - Have DecodedStream::StartPlayback return a promse to indicate the end of stream playback<br />
** land<br />
<br />
*{{Bug|1194112}} - Enable Move when possible in exclusive mode for MediaEventSource to save copy and allow move-only types<br />
** land<br />
<br />
*{{Bug|1195158}} - Remove MediaDecoder::QueueMetadata<br />
** land<br />
<br />
*{{Bug|1187092}} - [Clock] Changing 'Sound' for an alarm will not preview the first track selected<br />
** review code<br />
<br />
*{{Bug|1195187}} - Move output stream manipulation code to its own classes from DecodedStream<br />
** land<br />
<br />
*{{Bug|1195185}} - Decouple the creation of mData from output stream addition for DecodedStream<br />
** land<br />
<br />
*{{Bug|1195601}} - Remove MediaDecoderStateMachine::mLogicallySeeking<br />
** WIP<br />
<br />
*{{Bug|1196112}} - Remove DecodedStream::mMonitor<br />
** WIP<br />
<br />
*{{Bug|1195632}} - Let DecodedStream have a worker thread<br />
** WIP<br />
<br />
===Benjamin Chen===<br />
*{{Bug|762774}} - Intermittent test_loop.html | Test timed out.<br />
** find clues...<br />
*{{Bug|1071375}} - Re-enable test_mediarecorder_xxx.html tests on B2G emulator<br />
** landing<br />
<br />
===Chiahung Tai===<br />
*{{Bug|1108950}} - [FoxEye] Associate MediaStreamTrack with WebWorker as WorkerMonitor. <br />
**WIP, f?<br />
**Modify code based on Kaku's comments.<br />
*Interview with 2 candidates.<br />
<br />
===Alastor Wu===<br />
*{{Bug|1179181}} - [B2G] Store separate volume setting into setting database<br />
** Land<br />
<br />
*{{Bug|1193245}} - Using Atomic in the suspend count of the media resource <br />
** Land<br />
<br />
*{{Bug|1187092}} - [Clock] Changing 'Sound' for an alarm will not preview the first track selected<br />
** Land <br />
<br />
*{{Bug|1191207}} - The state of an audio channel will just become inactive after go to home screen<br />
** r+<br />
<br />
*{{Bug|1183033}} - [B2G] Keyboard doesn't have click sound<br />
** r?<br />
<br />
*{{Bug|1192748}} - [B2G] Only send the moz-interrupt event when audio competing happens<br />
** r?<br />
<br />
*{{Bug|1188754}} - Music app doesn't resume after Ringer ends.<br />
** Debug<br />
<br />
===Blake Wu===<br />
<br />
===Kaku Kuo===<br />
* {{Bug|1141979}} - [FoxEye] Extend ImageBitmap with interfaces to access its underlying image data<br />
** Modify specification.<br />
* {{Bug|1190210}} - Heap-use-after-free in mozilla::dom::CropDataSourceSurface<br />
** r+<br />
** sec-approval+<br />
** land<br />
** should request "Aurora approval" later.<br />
*{{Bug|1108950}} - [FoxEye] Associate MediaStreamTrack with WebWorker as WorkerMonitor. <br />
** Gave feedback.<br />
<br />
===Munro Chiang===<br />
<br />
===Adam Chou===</div>Ctaihttps://wiki.mozilla.org/index.php?title=User_talk:Dead_project&diff=1091362User talk:Dead project2015-08-22T11:02:25Z<p>Ctai: /* Beyond 2015 */</p>
<hr />
<div>=Abstract=<br />
The goal of this project is bringing the power of computer vision and image processing to the Web. By extending the spec of Media Capture and Streams, the web developers can write video processing related applications in better way. The primary idea is to incorporate Worker-based JavaScript video processing with MediaStreamTrack. The user's video processing script can do the real image processing and analysis works frame by frame. We also propose an extension of ImageBitmap to allow better optimization possibility in both implementation and Web developer side. To support video editor case, we would like to introduce OfflineMediaContext in next phase. We also want to explore the concept of WebImage, a hardware accelerated-able Web API for performance improvement chance. By accomplishing these API step by step, we believe we can improve the Web platform competitiveness a lot on image processing and computer vision area.<br />
<br />
=Introduction=<br />
To get a quick understand what is project FoxEye. Please see below file:<br><br />
'''The latest one:'''<br />
*FoxEye Briefing: [https://docs.google.com/presentation/d/1Ra5bIeMfSEQi5kd_bGH5Vx-pa9ahpAUTFDxTeBA5L0o/edit?usp=sharing Briefing]<br />
*Presentation files in Whistler Work Week:<br />
**Project FoxEye Status Update: [https://docs.google.com/presentation/d/1vnR5JOWpEgKN3KZGV2SLxscqSnuBOF2tT_dTolLRYI4/edit?usp=sharing FoxEye]<br />
**FoxEye Cross Firefox OS:[https://docs.google.com/presentation/d/1ck32VoikWkkFXkFfNK91S22nGRp9fSEj6Oz6846May4/edit?usp=sharing Use case]<br />
*Latest demo in Youtube: [https://www.youtube.com/watch?v=prybkXsTGXY FoxEye 2015 H1 demo]<br />
'''Outdated'''<br />
*Presentation file in Portland Work Week.[[File:Project FoxEye Portland Work Week.pdf]]<br><br />
*Presentation file in P2PWeb WorkShop.[[File:Project FoxEye 2015-Feb.pdf]]<br><br />
*Youtube: https://www.youtube.com/watch?v=TgQWEWiGaO8<br><br />
<br />
The needs for image processing and computer vision is increasing in recent years. The introduction of video element and media stream in HTML5 is very important, allowing for basic video playback and WebCam ability. But it is not powerful enough to handle complex video processing and camera application. Especially there are tons of mobile camera, photo and video editor applications show their creativity by using OpenCV etc in Android. It is a goal of this project to include the capabilities found in modern video production and camera applications as well as some of the processing, recognition and stitching tasks. <br />
<br><br />
This API is inspired by the WebAudio API[1]. Unlike WebAudio API, we try to reach the goal by modifying existing Media Capture and Streams API. The idea is adding some functions to associate the Woker-based script with MediaStreamTrack. Then the script code of Worker runs image processing and/or analysis frame by frame. Since we move the most of processing work to Worker, the main thread will not be blocked.<br />
<br />
[[File:FoxEye - Overview.png|800px]]<br />
<br><br />
Basically, the spirit of this project is four parts. The first part is extend the MediaStreamTrack to associate a Worker. This part provide a way to do image processing job frame by frame. The second part is ImageBitmap extension. This part extended the ImageBitmap interface to allow JavaScript developer to read underlying data out and set an external data into an ImageBitmap in a set of supported color formats. The third part is OfflineMediaContext. It is for offline stream to render as fast as possible. The last part is WebImage, it is a hardware accelerated-able API on computer vision area. The Web developers can use it to combine high performance vision processing.<br />
<br><br />
Thanks to the amazing asm.js and Emscripten work, we also provide an asm.js version of OpenCV, called OpenCV.js. Web developers can leverage the power of OpenCV in a simpler way.<br />
<br><br />
<br />
=Design Principle=<br />
*Follow [https://extensiblewebmanifesto.org/ The Extensible Web Manifesto]<br />
The original design of this project was a WebAudio-like design called WebVideo, but it was deprecated because we want to follow The Extensible Web Manifesto. We believe this kind of design is best for the Web. The examples below argue against the WebVideo design.<br><br />
The computer vision area is more thriving than ever since deep learning was introduced, and we cannot enumerate video nodes for every kind of creativity the future will bring. For example, suppose we enumerated every node we can imagine today; there would still be things we cannot imagine. Perhaps we would miss an area called image sentiment analysis, which tries to classify the emotion of a human face in an image; I take this as an example because I only learned of it recently. Should we add it as a new video node? Take another extreme example: face detection and face recognition seem uncontroversial, but they only cover human faces. What if someone wants to detect and recognize animal faces? It sounds ridiculous, but it is a real business: a product called "Bistro: A Smart Feeder Recognizes Your Cat's Face"[2] ran on Indiegogo. So should we extend the original face detection/recognition node, add a new video node for cats, or ask developers to implement it in a VideoWorker? If the answer is VideoWorker, then we must reconsider whether the existing video nodes make sense at all. Should we move them all into VideoWorker and implement them in JS? If yes, WebVideo degenerates to only a source node, a video worker node, and a destination node. Compared to that degenerated WebVideo, the MediaStreamTrack-with-worker design is more elegant.<br />
<br />
*Performance and power consumption do matter<br />
This project was initiated by the FxOS team, which means performance and power consumption on mobile devices have been considered from day one.<br />
<br />
=Concept=<br />
==MediaStreamTrack with Worker: ==<br />
The new design is a simple, minimal change to the current API. By extending MediaStreamTrack with a Worker-related API, MediaStream can support video processing through script code running in a Worker. Below is the draft WebIDL. Please see [2] for the draft specification.<br />
<source lang="webidl"><br />
// Workers are constructed as usual: new Worker(scriptURL).<br />
<br />
partial interface MediaStreamTrack {<br />
void addWorkerMonitor (Worker worker);<br />
void removeWorkerMonitor (Worker worker);<br />
MediaStreamTrack addWorkerProcessor (Worker worker);<br />
void removeWorkerProcessor ();<br />
};<br />
<br />
partial interface WorkerGlobalScope {<br />
attribute EventHandler onvideoprocess;<br />
};<br />
<br />
interface VideoMonitorEvent : Event {<br />
readonly attribute DOMString trackId;<br />
readonly attribute double playbackTime;<br />
readonly attribute ImageBitmap inputImageBitmap;<br />
};<br />
<br />
interface VideoProcessorEvent : VideoMonitorEvent {<br />
// trackId, playbackTime and inputImageBitmap are inherited<br />
// from VideoMonitorEvent.<br />
attribute ImageBitmap? outputImageBitmap = null;<br />
};<br />
<br />
</source><br />
<br><br />
[[File:NewProjectFoxEye1.png|1024px]]<br><br />
===Example Code ===<br />
Please check the section [http://chiahungtai.github.io/mediacapture-worker/index.html#Examples examples in MediaStreamTrack with worker]. A condensed, non-normative sketch also follows below.<br />
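For reference, here is a minimal sketch of the processor path based on the draft IDL above. It is illustrative only: error handling is omitted, |invertFrame| is a hypothetical helper, and the prefixed APIs (navigator.getUserMedia, mozSrcObject) follow the Gecko builds of that time.<br />
<source lang="javascript"><br />
// main.js -- attach a worker processor to a camera track.<br />
var worker = new Worker('processor-worker.js');<br />
navigator.getUserMedia({video: true, audio: false}, function(stream) {<br />
  var track = stream.getVideoTracks()[0];<br />
  // addWorkerProcessor returns a new MediaStreamTrack whose frames<br />
  // are produced by the worker's onvideoprocess handler.<br />
  var processedTrack = track.addWorkerProcessor(worker);<br />
  var elem = document.getElementById('videoelem');<br />
  elem.mozSrcObject = new MediaStream([processedTrack]);<br />
  elem.play();<br />
}, function(err) { console.error(err); });<br />
<br />
// processor-worker.js -- runs once per frame, off the main thread.<br />
onvideoprocess = function(event) {<br />
  // event.inputImageBitmap is the current frame; assigning<br />
  // outputImageBitmap feeds the processed frame into the new track.<br />
  event.outputImageBitmap = invertFrame(event.inputImageBitmap);<br />
};<br />
</source><br />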
<br />
==ImageBitmap extensions==<br />
Please see [3] for more information.<br />
<br />
==WebImage:==<br />
Why do we need WebImage? Performance is what will bring computer vision to the Web. Most image processing tasks can be accelerated by WebGL, but that is not the case for computer vision. So we need a computer vision Web API that can be hardware accelerated, to help Web developers deliver fast, portable pages and applications. That is the motivation of WebImage.<br />
<br><br />
This is a new area that needs exploration, so we might start by referring to existing related APIs. At the end of 2014, Khronos released a portable, power-efficient vision processing API called OpenVX. That specification might be a good starting point for WebImage.<br />
<br><br />
The following diagram shows the basic concept of OpenVX. It is a graph-node architecture; the role of OpenVX for computer vision is like the role of OpenGL for graphics. The developer simply constructs and executes a graph to process incoming images.<br />
<br><br />
[[File:OpenVX-NodeGFX.PNG|600px]]<br><br />
<!--[[File:Project FoxEyeWebImage1.png|800px]]<br>--><br />
OpenVX can be implemented with OpenCL, OpenGL ES with compute shaders, or C/C++ with SIMD. That means we can support OpenVX on a wide range of Web platforms, from PC to mobile, once an OpenVX engine is provided (a hypothetical sketch follows the diagram below).<br />
<br />
[[File:OpenVX.PNG|600px]]<br><br />
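To make the graph-node idea concrete, below is a purely hypothetical sketch of what a WebImage graph could look like. None of these names (WebImageContext, createGaussianBlurNode and so on) exist yet; they simply mirror OpenVX's construct-then-execute flow (cf. vxGaussian3x3Node, vxSobel3x3Node).<br />
<source lang="javascript"><br />
// Hypothetical sketch only -- WebImage has no agreed API yet.<br />
var ctx = new WebImageContext();<br />
var source = ctx.createImageBitmapSource(inputImageBitmap);<br />
var blur = ctx.createGaussianBlurNode(3);   // cf. OpenVX vxGaussian3x3Node<br />
var gradient = ctx.createSobelNode();       // cf. OpenVX vxSobel3x3Node<br />
source.connect(blur);<br />
blur.connect(gradient);<br />
gradient.connect(ctx.destination);<br />
// The engine may run the graph on OpenCL, OpenGL ES or SIMD code.<br />
ctx.process();<br />
</source><br />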
<br />
==OfflineMediaContext:==<br />
We introduce “OfflineMediaContext” and modify “MediaStream” here to enable offline (i.e. as fast as possible, faster than realtime) MediaStream processing. When developers want to perform offline MediaStream processing, they need to form a context which holds and keeps the relationship of all MediaStreams that are going to be processed together (a usage sketch follows the list below).<br />
<br />
<source lang="c++"><br />
<br />
// typedef unsigned long long DOMTimeStamp;<br />
interface OfflineMediaContext {<br />
void start(DOMTimeStamp durationToStop);<br />
attribute EventHandler onComplete;<br />
};<br />
// Add an optional argument into the constructor.<br />
[Constructor (optional OfflineMediaContext context),<br />
Constructor (MediaStream stream, optional OfflineMediaContext context),<br />
Constructor (MediaStreamTrackSequence tracks, optional OfflineMediaContext context)]<br />
interface MediaStream : EventTarget {<br />
// No modification.<br />
...<br />
}<br />
<br />
</source><br />
*OfflineMediaContext is the holding place for all MediaStreams that are going to be processed together at a non-realtime rate.<br />
*OfflineMediaContext is also the object that triggers the non-realtime processing.<br />
*OfflineMediaContext should be instantiated first; MediaStreams that are going to be processed together in the same context are then instantiated by passing the pre-instantiated OfflineMediaContext object into the MediaStream constructor. (See the modified MediaStream constructor above.)<br />
*The native implementation of OfflineMediaContext holds a non-realtime MediaStreamGraph, just like the OfflineAudioContext in the WebAudio specification.<br />
*The constructors are modified by adding an optional OfflineMediaContext parameter. In this way, developers are able to associate a MediaStream with an OfflineMediaContext.<br />
*If the optional OfflineMediaContext is given, the native implementation of the MediaStream should be hooked to the non-realtime MSG held by the given OfflineMediaContext instead of the global realtime MSG.<br />
*If the optional OfflineMediaContext is given, we need to check whether the newly created MediaStream can be processed at offline rate. If not, the constructor should throw an error. (Constructors are always allowed to throw.)<br />
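Below is a minimal usage sketch based on the draft IDL above; |recordedStream| stands for any source that can run at offline rate, and worker processors would be attached to the tracks as in the MediaStreamTrack-with-Worker section.<br />
<source lang="javascript"><br />
// A minimal sketch, assuming the draft OfflineMediaContext above.<br />
var offlineCtx = new OfflineMediaContext();<br />
// Passing the context hooks the stream into the non-realtime graph;<br />
// the constructor throws if the source cannot run at offline rate.<br />
var offlineStream = new MediaStream(recordedStream, offlineCtx);<br />
// ... attach worker processors to offlineStream's tracks here ...<br />
offlineCtx.onComplete = function() {<br />
  console.log('offline processing finished');<br />
};<br />
offlineCtx.start(60 * 1000); // process up to 60 s, as fast as possible<br />
</source><br />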
<br />
==OpenCV.js==<br />
*OpenCV + Emscripten = OpenCV.js (a calling sketch follows below)<br />
*https://github.com/CJKu/opencv <br />
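As a rough illustration of how calling into such a build can look, here is a sketch using Emscripten's standard cwrap/_malloc/HEAPU8 machinery. The exported function name (threshold_u8), its signature, and the surrounding |width|, |height| and |imageData| variables are made up for illustration; the real exports depend on how the repository above is built.<br />
<source lang="javascript"><br />
// Hypothetical sketch: call one exported routine of an Emscripten<br />
// build of OpenCV. 'threshold_u8' is an invented export name.<br />
var thresholdU8 = Module.cwrap('threshold_u8', null,<br />
                               ['number', 'number', 'number', 'number']);<br />
var size = width * height * 4;            // RGBA pixels<br />
var ptr = Module._malloc(size);<br />
Module.HEAPU8.set(imageData.data, ptr);   // copy pixels into the heap<br />
thresholdU8(ptr, width, height, 128);     // run the compiled routine in place<br />
imageData.data.set(Module.HEAPU8.subarray(ptr, ptr + size));<br />
Module._free(ptr);<br />
</source><br />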
<!--<br />
==Deprecated Design ==<br />
*Modular Routing<br />
Modular routing allows arbitrary connections between different DIPNode(TBD) objects. Each node can have inputs and/or outputs. A source node has no inputs and a single output. A destination node has one input and no outputs. Other nodes such as filters can be placed between source and destination nodes.<br />
<br />
Here is a example for face detection work on ImageElement:<br><br />
PS. Right now, I haven't finished the work of ScriptNode. So the draw rectangle part is skipped in sample code.<br />
<br>[[File:Project FoxEye1.png|720px]]<br><br />
<big>Example 1:</big><br />
<source lang="javascript"><br />
var context = new DIPContext();<br />
var imageElem = document.getElementById('imgelemsrc');<br />
var source = context.createImageElementSource(imageElem);<br />
var facedetect = context.createFaceDetection();<br />
source.connect(facedetect);<br />
var dest = context.createMediaStreamDestination();<br />
facedetect.connect(dest);<br />
var elem = document.getElementById('videoelem');<br />
elem.mozSrcObject = dest.stream;<br />
elem.play();<br />
<br />
</source><br />
<br />
<br />
Another example to show that some nodes might support callback function to pass more information rather than image.<br />
<br>[[File:Project FoxEye2.png|720px]]<br><br />
<big>Example 2:</big><br />
<source lang="javascript"><br />
var context = new DIPContext();<br />
var imageElem = document.getElementById('imgelemsrc');<br />
var source = context.createImageElementSource(imageElem);<br />
var textdetect = context.createTextDetection();<br />
source.connect(textdetect);<br />
var dest = context.createMediaStreamDestination();<br />
textdetect.connect(dest);<br />
textdetect.ontextrecognized = function (e){<br />
var text = e.recognizedText;<br />
var go2google = document.getElementById('go2Google');<br />
go2google.href = "https://www.google.com.tw/search?q=" + text<br />
var go2IMDB = document.getElementById('go2IMDB');<br />
go2IMDB.href = "http://www.imdb.com/find?ref_=nv_sr_fn&q=" + text<br />
var go2Amazon = document.getElementById('go2Amazon');<br />
go2Amazon.href = "http://www.amazon.com/s/ref=nb_sb_noss?url=search-alias%3Daps&field-keywords=" + text<br />
var go2TranslateEngTC = document.getElementById('go2TranslateEngTC');<br />
go2TranslateEngTC.href = "https://translate.google.com.tw/#en/zh-TW/" + text<br />
}<br />
var elem = document.getElementById('videoelem');<br />
elem.mozSrcObject = dest.stream;<br />
elem.play();<br />
</source><br />
An ideally example to combine ScriptNode with Canvas2DContext.<br><br />
This is an example trying to do on fly camera translation like "Word Lens" and "Waygo".<br><br />
Haven't finish the implementation for this example.<br><br />
<br>[[File:Project FoxEye3.png|720px]]<br><br />
<big>Example 3:</big><br />
<source lang="javascript"><br />
var context = new DIPContext();<br />
var myMediaStream;<br />
navigator.getUserMedia({video:true, audio:false}, function(localMediaStream){<br />
myMediaStream = localMediaStream;<br />
var source = context.createMediaStreamSource(myMediaStream);<br />
var textRecoginition = context.createTextRecoginition();<br />
source.connect(textRecoginition);<br />
var textInpaint = context.createTextInpaint();<br />
textRecoginition.connect(textInpaint);<br />
var scriptTranslate = context.createScriptProcessor();<br />
textdetect.ontextrecognized = function (e){<br />
var text = e.recognizedText;<br />
// Custom parameter -<br />
scriptTranslate.addParameter( "text", text );<br />
}<br />
scriptTranslate.onimageprocess= function (e) {<br />
var text = e.parameters.text;<br />
// Translate to other language....<br />
var newText = Translate("Eng", "TC");<br />
var input = e.inputImage;<br />
var canvas = document.getElementsByTagName('canvas')[0];<br />
var context2D = canvas.getContext('2d');<br />
context2D.drawImage(input, 0, 0);<br />
context2D.strokeText(newText, 0, 0);<br />
// get an empty slate to put the data into<br />
var output = context.createImageData(canvas.width, canvas.height);<br />
e.outputImage = output;<br />
}<br />
textInpaint.connect(scriptTranslate);<br />
var dest = createMediaStreamDestination();<br />
scriptTranslate.connect(dest);<br />
var video = document.getElementById(‘videoelem’);<br />
video.mozSrcObject = dest.stream;<br />
}, null);<br />
<br />
</source><br />
<br />
<!--<br />
=The API=<br />
Still under construction....<br />
==New design==<br />
===VideoContext===<br />
<source><br />
[Constructor]<br />
interface VideoContext : EventTarget {<br />
readonly attribute VideoDestinationNode destination;<br />
// DIPNode creation<br />
MediaStreamVideoDestinationNode createMediaStreamDestination();<br />
MediaStreamVideoSourceNode createMediaStreamSource(MediaStream mediaStream);<br />
VideoWorkerNode createVideoWorker(DOMString scriptURL);<br />
};<br />
</source><br />
===VideoNode===<br />
<source><br />
interface VideoNode: EventTarget {<br />
void connect(VideoNode destination);<br />
void disconnect();<br />
readonly attribute VideoContext context;<br />
};<br />
</source><br />
<br />
===VideoWorkerNode===<br />
Still thinking the type of inputImage/outputImage.<br />
<source><br />
interface VideoProcessEvent : Event {<br />
readonly attribute ImageData inputImage;<br />
readonly attribute ImageData outputImage;<br />
readonly attribute object parameters; <br />
};<br />
<br />
interface VideoWorkerNode: VideoNode {<br />
attribute EventHandler onimageprocess;<br />
};<br />
</source><br />
<br />
==Deprecated design==<br />
===DIPContext===<br />
<source><br />
[Constructor]<br />
interface DIPContext : EventTarget {<br />
readonly attribute DIPDestinationNode destination;<br />
// DIPNode creation<br />
MediaStreamDIPDestinationNode createMediaStreamDestination();<br />
ImageElementDIPSourceNode createImageElementSource(HTMLImageElement imageElement);<br />
MediaStreamDIPSourceNode createMediaStreamSource(MediaStream mediaStream);<br />
FaceDetectionNode createFaceDetection();<br />
TextDetectionNode createTextDetection();<br />
};<br />
</source><br />
===DIPNode===<br />
<source><br />
interface DIPNode : EventTarget {<br />
void connect(DIPNode destination, optional unsigned long output = 0, optional unsigned long input = 0);<br />
void disconnect(optional unsigned long output = 0);<br />
readonly attribute DIPContext context;<br />
readonly attribute unsigned long numberOfInputs;<br />
readonly attribute unsigned long numberOfOutputs;<br />
};<br />
</source><br />
<br />
===TextDetectionNode===<br />
<source><br />
interface RecognizedTextEvent : Event {<br />
readonly attribute DOMString recognizedText;<br />
};<br />
<br />
interface TextDetectionNode : DIPNode {<br />
attribute EventHandler ontextrecognized;<br />
};<br />
</source><br />
--><br />
<br />
=Demo pages=<br />
==OpenCV.js==<br />
This demo should run on any existing Firefox browser on Ubuntu and Mac. Please try the website below.<br><br />
http://people.mozilla.org/~cku/opencv/<br />
<br><br />
[[File:OpenCVJS-1.png|1080px]]<br />
<br><br />
[[File:OpenCVJS-2.png|1080px]]<br />
<br><br />
<br />
==MST with Worker and ImageBitmap==<br />
You need a customized build for the demos below. Please clone this repo: https://github.com/kakukogou/gecko-dev/tree/foxeye-master<br><br />
You also need a web camera for the demos. Then build your own Firefox browser and go to the website below to see the demos.<br />
<br><br />
Source code of the demo:<br><br />
https://github.com/kakukogou/foxeye-demo<br />
<br><br />
Demo website:<br><br />
http://people.mozilla.org/~tkuo/foxeye-demo/<br />
===Monitor===<br />
Monitor is designed to simply send each frame event to the Web Worker, without modifying the track. The left video comes from getUserMedia. The right one uses addWorkerMonitor to dispatch the input frames from the left one to a worker. The worker detects the face and passes the face position and the input frame to the main thread; the script in the main thread then uses both pieces of information to draw the frame via CanvasRenderingContext2D (a condensed sketch follows the screenshots below).<br />
*Case 1: Face detection<br />
This demo shows how to use |addWorkerMonitor| and CanvasRenderingContext2D to overlay a hat on someone's head.<br />
<br><br />
[[File:Monitor face.png|1080px]]<br />
<br><br />
*Case 2: QRCode<br />
[[File:Monitor qrcode.png|1080px]]<br />
<br><br />
<br />
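A condensed sketch of this monitor path follows. |track| is a camera MediaStreamTrack as in the processor sketch earlier, |detectFace| and |drawHat| are hypothetical helpers, and whether an ImageBitmap can be posted back to the main thread depends on the prototype build.<br />
<source lang="javascript"><br />
// main.js -- observe a track without modifying it.<br />
var worker = new Worker('monitor-worker.js');<br />
track.addWorkerMonitor(worker);           // frames now flow to the worker<br />
var ctx2d = document.getElementById('canvas').getContext('2d');<br />
worker.onmessage = function(e) {<br />
  ctx2d.drawImage(e.data.frame, 0, 0);    // redraw the monitored frame<br />
  drawHat(ctx2d, e.data.faceRect);        // overlay using the detected rect<br />
};<br />
<br />
// monitor-worker.js -- fires once per frame with a VideoMonitorEvent.<br />
onvideoprocess = function(event) {<br />
  var rect = detectFace(event.inputImageBitmap); // hypothetical detector<br />
  postMessage({ frame: event.inputImageBitmap, faceRect: rect });<br />
};<br />
</source><br />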
===Processor===<br />
This demo shows how to use |addWorkerProcessor| to create a new MediaStreamTrack and show it in another HTMLVideoElement. There are five kinds of image filters.<br />
<br> <br />
<br><br />
[[File:Processor copy.png|1080px]]<br />
<br><br />
[[File:Processor blur.png|1080px]]<br />
<br><br />
[[File:Processor erode.png|1080px]]<br />
<br><br />
[[File:Processor threshold.png|1080px]]<br />
<br><br />
[[File:Processor invert.png|1080px]]<br />
<br><br />
[[File:Processor gray.png|1080px]]<br />
<br><br />
<br />
<br />
<br />
<!--<br />
==Demo 1: Face tracker==<br />
===Browser:===<br />
*Input comes from HTML Image Element<br />
[[File:ProjectFoxEye BrowserFaceImage.png|720px]]<br />
*Input comes from MediaStream<br />
[[File:ProjectFoxEye BrowserFaceMS.png|720px]]<br />
<br />
===B2G on Flame:===<br />
*Input comes from HTML Image Element<br><br />
[[File:Project FoxEyeFlameFaceImage.jpg|480px]]<br><br />
*Input comes from MediaStream<br><br />
[[File:Project FoxEyeFlameFaceMS.jpg|480px]]<br><br />
<br />
==Demo 2: Text Recognition==<br />
*Browser:<br />
**Input comes from HTML Image Element<br />
[[File:ProjectFoxEye TextImage.png|720px]]<br />
**Input comes from MediaStream<br />
[[File:ProjectFoxEye TextMS.png|720px]]<br />
--><br />
<br />
=Unlimited Potentials=<br />
<!--<br />
According to "Firefox OS User Research Northern India Findings" [4], one of the key table-stake is camera related features. "Ways to provide photo & video editing tools" is what this WebAPI for. So if we can deliver some cool photo & video editing features, we can fulfill one of the needs of our target market.<br><br />
In [4], it mentioned that one of purchase motivators is educate my kids. The features like PhotoMath can satisfy the education part.<br><br />
In long term, if we can integrate text recognition with TTS(text to speech), we can help illiterate person to read words or phrase. That will be very useful features.<br><br />
Also offline text translation in camera might be a killer application too. Waygo and WordLens is two of such applications in Android and iOS.<br><br />
Text Selection in Image is also an interesting feature for browser. Project Naptha demos some potential functionality based on yext selection in Image.<br> <br />
--><br />
==FoxEye technology tree==<br />
This is the technology tree of FoxEye. Solid lines are dependencies; dashed lines are for performance improvements. The green blocks are possible applications. The blue blocks are API drafts with prototypes. The red blocks are things we know how to do but have not yet started. The yellow blocks are concepts only and need more study. So, step by step, we can provide a photo manager, wide-angle panorama, HDR, and even gesture control.<br />
[[File:Multimedia Platform Team Technology Roadmap - New Page.png |1024px]]<br />
<br />
<br />
==Use Cases==<br />
*Digital Image Processing (DIP) for camera:<br />
**Face In, see [https://www.youtube.com/watch?feature=player_embedded&v=PWZUCfDsFdU Sony Face In]<br />
**Augmented Reality, see [https://www.youtube.com/watch?feature=player_embedded&v=vDNzTasuYEw IKEA AR]<br />
**Camera Panorama, <br />
**Fisheye camera, <br />
**Comic Effect,<br />
**Long term, might need Android Camera HAL 3 to control camera<br />
***Smile Snapshot<br />
***Gesture Snapshot<br />
***HDR<br />
***Video Stabilization<br />
**Bar code scanner<br />
*Photo and video editing<br />
**Video Editor, see [https://www.youtube.com/watch?feature=player_embedded&v=NJ6nYgxcuUk WeVideo on Android]<br />
**A faster way for video editing tools.<br />
**Lots of existing image effects can be used for photo and video editing.<br />
**https://www.facebook.com/thanks<br />
*Object Recognition in Image (not only FxOS, but also browser):<br />
**Shopping Assistant, see [https://www.youtube.com/watch?feature=player_embedded&v=B7cvlWll85Q Amazon Firefly]<br />
**Face Detection/Tracking,<br />
**Face Recognition, <br />
**Text Recognition, <br />
**Text Selection in Image, <br />
***See http://projectnaptha.com/<br />
**Text Inpainting,<br />
**Image Segmentation,<br />
**Text translation on image, see [https://www.youtube.com/watch?feature=player_embedded&v=9UalhhWBPH0 Waygo]<br />
*Duo Camera:<br />
**Nature Interaction(Gesture, Body Motion Tracking)<br />
**Interactive Foreground Extraction<br />
and so on....<br />
<br />
==Some cool applications we can refer to in the real world==<br />
*Word Lens: <br />
**https://play.google.com/store/apps/details?id=com.questvisual.wordlens.demo<br />
**https://itunes.apple.com/tw/app/word-lens/id383463868?mt=8<br />
*Waygo<br />
**http://www.waygoapp.com/<br />
*PhotoMath<br />
**https://photomath.net/<br />
*Cartoon Camera<br />
**https://play.google.com/store/apps/details?id=com.fingersoft.cartooncamera<br />
*Photo Studio<br />
**http://photo-studio.en.uptodown.com/android<br />
*Magisto<br />
**https://play.google.com/store/apps/details?id=com.magisto<br />
*Adobe PhotoShop Express<br />
**http://www.photoshop.com/products/photoshopexpress<br />
*Amazon(firefly app)<br />
**https://play.google.com/store/apps/details?id=com.amazon.mShop.android<br />
<br />
<!--<br />
=Task Dependency=<br />
Below is the dependency graph for text recognition work.<br />
<br><br />
[[File:Project FoxEyeTextRecgDependency.png]]<br />
<br><br />
Below is the dependency graph for camera efficts work.<br />
<br><br />
[[File:Project FoxEyeCameraDependency.png]]<br />
<br><br />
Below is the dependency graph for video editor work.<br />
<br><br />
[[File:Project FoxEyeVideoEditorDependency.png]]<br />
<br><br />
--><br />
<br />
<!--<br />
=Comparison=<br />
==Canvas2DContext==<br />
Currently, you can apply video effects with Canvas2DContext. See the demo made by [4]. The source code looks like the following.<br />
<source><br />
function frameConverter(video,canvas) {<br />
<br />
// Set up our frame converter<br />
this.video = video;<br />
this.viewport = canvas.getContext("2d");<br />
this.width = canvas.width;<br />
this.height = canvas.height;<br />
// Create the frame-buffer canvas<br />
this.framebuffer = document.createElement("canvas");<br />
this.framebuffer.width = this.width;<br />
this.framebuffer.height = this.height;<br />
this.ctx = this.framebuffer.getContext("2d");<br />
// Default video effect is blur<br />
this.effect = JSManipulate.blur;<br />
// This variable used to pass ourself to event call-backs<br />
var self = this;<br />
// Start rendering when the video is playing<br />
this.video.addEventListener("play", function() {<br />
self.render();<br />
}, false);<br />
<br />
// Change the image effect to be applied <br />
this.setEffect = function(effect){<br />
if(effect in JSManipulate){<br />
this.effect = JSManipulate[effect];<br />
}<br />
}<br />
<br />
// Rendering call-back<br />
this.render = function() {<br />
if (this.video.paused || this.video.ended) {<br />
return;<br />
}<br />
this.renderFrame();<br />
var self = this;<br />
// Render every 10 ms<br />
setTimeout(function () {<br />
self.render();<br />
}, 10);<br />
};<br />
<br />
// Compute and display the next frame <br />
this.renderFrame = function() {<br />
// Acquire a video frame from the video element<br />
this.ctx.drawImage(this.video, 0, 0, this.video.videoWidth,<br />
this.video.videoHeight,0,0,this.width, this.height);<br />
var data = this.ctx.getImageData(0, 0, this.width, this.height);<br />
// Apply image effect<br />
this.effect.filter(data,this.effect.defaultValues);<br />
// Render to viewport<br />
this.viewport.putImageData(data, 0, 0);<br />
return;<br />
};<br />
};<br />
<br />
// Initialization code<br />
video = document.getElementById("video");<br />
canvas = document.getElementById("canvas");<br />
fc = new frameConverter(video,canvas);<br />
...<br />
// Change the image effect applied to the video<br />
fc.setEffect('edge detection');<br />
<br />
</source><br />
Basically, the idea is to use |drawImage| to acquire a frame from the video element and draw it to a canvas, then call |getImageData| to get the pixel data and process the image. After that, put the computed data back into the canvas with |putImageData| to display it.<br><br />
<br />
Compared to this approach, the proposed WebAPI has the following advantages:<br />
* No polling mechanism.<br />
** We use a callback to process every frame, so no frame is missed or processed twice, and the heavy work stays off the main thread.<br />
<br />
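For contrast, below is a minimal sketch of the proposed worker-based flow. It follows the spirit of the editor draft in [2], but the names used here (addWorkerMonitor, onvideomonitor, inputImageBitmap) come from an in-flux draft and prototype, so treat them as illustrative assumptions rather than a final API.<br />
<source lang="javascript"><br />
// main.js -- illustrative sketch only; addWorkerMonitor follows the<br />
// editor draft in [2] and may change before standardization.<br />
var worker = new Worker("processor.js");<br />
navigator.mediaDevices.getUserMedia({ video: true }).then(function(stream) {<br />
  var track = stream.getVideoTracks()[0];<br />
  // Associate the worker with the track; the engine then pushes each<br />
  // decoded frame to the worker -- no drawImage/getImageData polling.<br />
  track.addWorkerMonitor(worker);<br />
});<br />
<br />
// processor.js -- runs in the worker, invoked once per frame.<br />
onvideomonitor = function(event) {<br />
  // event.inputImageBitmap (assumed name) holds the current frame.<br />
  // Real per-frame analysis would happen here, off the main thread.<br />
  postMessage({ width: event.inputImageBitmap.width,<br />
                height: event.inputImageBitmap.height });<br />
};<br />
</source><br />
Because the engine pushes frames, the 10 ms setTimeout loop above, and the skipped or duplicated frames it can cause, simply go away.<br />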
==node-opencv==<br />
https://github.com/peterbraden/node-opencv<br />
"OpenCV bindings for Node.js. OpenCV is the defacto computer vision library - by interfacing with it natively in node, we get powerful real time vision in js."<br />
The sample code looks like the following:<br />
*You can use opencv to read in image files. Supported formats are in the OpenCV docs, but jpgs etc are supported.<br />
<source lang="javascript"><br />
cv.readImage(filename, function(err, mat){<br />
mat.convertGrayscale()<br />
mat.canny(5, 300)<br />
mat.houghLinesP()<br />
})<br />
</source><br />
*If however, you have a series of images, and you wish to stream them into a stream of Matrices, you can use an ImageStream. Thus:<br />
<source lang="javascript"><br />
var s = new cv.ImageStream()<br />
s.on('data', function(matrix){<br />
matrix.detectObject(haar_cascade_xml, opts, function(err, matches){})<br />
})<br />
ardrone.createPngStream().pipe(s);<br />
</source><br />
<br />
==opencvjs==<br />
https://github.com/blittle/opencvjs<br><br />
It is a project to compile OpenCV to asm.js. It might be a dead project now.<br />
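For a rough sense of how such a build would be driven from JavaScript, here is a hypothetical sketch using standard Emscripten glue (Module.cwrap, Module._malloc, Module.HEAPU8). The exported function name cv_canny is purely an assumption; opencvjs's actual exports may differ.<br />
<source lang="javascript"><br />
// Hypothetical driver for an Emscripten/asm.js build of OpenCV.<br />
// Assumes the compiled Module has already been loaded on the page.<br />
var width = 320, height = 240;<br />
var rgbaPixels = new Uint8Array(width * height * 4); // frame to filter<br />
var canny = Module.cwrap("cv_canny", null, // "cv_canny" is an assumed export<br />
                         ["number", "number", "number", "number", "number"]);<br />
var ptr = Module._malloc(rgbaPixels.length);<br />
Module.HEAPU8.set(rgbaPixels, ptr);  // copy the pixels into the asm.js heap<br />
canny(ptr, width, height, 5, 300);   // run the filter in place<br />
var result = new Uint8Array(Module.HEAPU8.subarray(ptr, ptr + rgbaPixels.length));<br />
Module._free(ptr);                   // release the heap block<br />
</source><br />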
==Project Naptha==<br />
"Project Naptha automatically applies state-of-the-art computer vision algorithms on every image you see while browsing the web. The result is a seamless and intuitive experience, where you can highlight as well as copy and paste and even edit and translate the text formerly trapped within an image." Quoted from http://projectnaptha.com/ .<br />
===How it works?===<br />
Excerpt from Project Naptha:<br />
The primary feature of Project Naptha is actually the text detection, rather than optical character recognition. The author implemented a text detection algorithm called the Stroke Width Transform, invented by Microsoft Research in 2008, which is capable of identifying regions of text in a language-agnostic manner; it runs in a WebWorker. Once a user begins to select some text, it scrambles to run character recognition algorithms in order to determine what exactly is being selected. The default OCR engine is a built-in pure-JavaScript port of the open-source Ocrad OCR engine. There is also the option of sending the selected region to a cloud-based text recognition service powered by Tesseract, Google's (formerly HP's) award-winning open-source OCR engine, which supports dozens of languages and uses an advanced language model.<br />
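To make that architecture concrete, here is a minimal, generic sketch of offloading detection work to a WebWorker. This is not Project Naptha's actual code, and detectTextRegions is a hypothetical placeholder for an SWT-style detector.<br />
<source lang="javascript"><br />
// main.js -- generic worker-offloading pattern, not Naptha's code.<br />
var worker = new Worker("detect.js");<br />
var canvas = document.querySelector("canvas");<br />
var ctx = canvas.getContext("2d");<br />
var pixels = ctx.getImageData(0, 0, canvas.width, canvas.height);<br />
worker.postMessage(pixels); // pixel data is structured-cloned to the worker<br />
worker.onmessage = function(e) {<br />
  console.log("detected text regions:", e.data);<br />
};<br />
<br />
// detect.js -- runs off the main thread, so the page stays responsive.<br />
onmessage = function(e) {<br />
  var regions = detectTextRegions(e.data); // hypothetical SWT-style detector<br />
  postMessage(regions);<br />
};<br />
</source><br />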
--><br />
<!--<br />
=Open Source Library and Licenses =<br />
*OpenCV: OpenCV is an open-source BSD-licensed library that includes several hundred computer vision algorithms.<br />
*Tesseract-OCR: Apache License v2.0. Tesseract is probably the most accurate open source OCR engine available.<br />
**Leptonica: Leptonica is a pedagogically-oriented open source site containing software that is broadly useful for image processing and image analysis applications.<br />
<br />
<br>"The Leptonica copyright is less restrictive still. We use the BSD 2-clause license, which is similar to the Apache license but much briefer. Kirk McKusick playfully called the BSD license a copycenter, as differentiated from the usual copyright and the GPL copyleft: "Take it down to the copy center and make as many copies as you want." The BSD restrictions can be approximately summarized as: (1) Don't pretend that you wrote this, and (2) Don't sue us if it doesn't work.<br> For Leptonica, as with Apache and BSD, modifications of the source code that are used in commercial products can be made without any obligation to make them available in open source. (This does happen voluntarily -- it's likely that the majority of issues noted by people working on commercial products are reported back, often with patches.) The Leptonica license only requires that any use of the source, whether in original or modified form, must include the Leptonica copyright notice, and that modified versions must be clearly marked as such. "<br />
<br />
<br />
=What I have done=<br />
*Initialize, plan and implement this project.<br />
*Write a prototype of WebIDL for WebDIP.<br />
**MediaStream as source node and destination node for WebDIP.<br />
**For HTMLImageElement as a source node, there is a temporary solution.<br />
**Added a face detection node. It can be used with MediaStream and HTMLImageElement on both the browser and the B2G Flame.<br />
**Added a text detection/recognition node. It can be used with MediaStream and HTMLImageElement on the browser.<br />
--><br />
<br />
<!--<br />
=To Do List=<br />
*Extend MediaStreamTrack API? =>CTai<br />
**Need to integrate with Canvas2DContext and WebGL.<br />
*OfflineMediaContext study. =>Kaku(our new hire!!)<br />
*An API for image processing and object detection. =>TBD<br />
**We need such an API for B2G privileged applications (or opencv-asm.js for general apps).<br />
*How to compile OpenCV to asm.js =>Kaku, CJay<br />
**Try to figure out how to pass the ImageBitmap from VideoWorker to OpenCV-asm.js<br />
*Compare native OpenCV/Tesseract with asm.js version. =>TBD<br />
--><br />
<br />
<!--<br />
=Fixme List(Known Issues)=<br />
*OpenCV can't be built with STLport; it only supports GNU STL.<br />
**B2G can't be built with GNU STL.<br />
*Text Detection and Recognition can't run on B2G.<br />
**Some OpenCV APIs take STL types as arguments; the mismatched STL implementations cause runtime errors.<br />
*Tesseract-OCR build<br />
**We use a pre-installed Tesseract-OCR for now. Maybe we should support building Tesseract-OCR from source.<br />
*Improve the precision rate of text recognition.<br />
**The achievable precision rate should be higher than in my rough prototype. We need to improve it.<br />
*Separate OCR initialization.<br />
**Prevent redundant initialization.<br />
*Haven't done OpenCL integration in Gecko.<br />
**OpenCV has a lot of OpenCL integration. We should take advantage of it.<br />
*Canvas2DContext and WebGL can't run on a worker.<br />
**Need bug 801176 and bug 709490 landed.<br />
*Need ImageBitmap for VideoWorkerEvent.<br />
**Need bug 1044102 landed.<br />
--><br />
<br />
=Current Status=<br />
*Meta bug: https://bugzilla.mozilla.org/show_bug.cgi?id=1100203<br />
*MediaStream with worker: editor draft completed; refining the prototype for the review process. See https://bugzilla.mozilla.org/show_bug.cgi?id=1108950<br />
*ImageBitmap: In the review process. See https://bugzilla.mozilla.org/show_bug.cgi?id=1044102<br />
*ImageBitmap extension: editor draft completed; refining the prototype for the review process. See https://bugzilla.mozilla.org/show_bug.cgi?id=1141979<br><br />
*OpenCV.js: Working on it. See http://people.mozilla.org/~cku/opencv/<br />
*OfflineMediaContext: Not yet started.<br />
*WebImage: Not yet started.<br />
*Run WebGL on a worker: in the review process. See https://bugzilla.mozilla.org/show_bug.cgi?id=709490<br><br />
*CanvasRenderingContext2D in Worker: waiting for someone to take this bug. See https://bugzilla.mozilla.org/show_bug.cgi?id=801176<br><br />
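To show how these pieces fit together today, below is a minimal sketch that uses the createImageBitmap entry point inside a worker (the ImageBitmap bug above). Reading raw pixels back out would need the ImageBitmap extension [3], which is still an editor draft, so that part is only noted in a comment.<br />
<source lang="javascript"><br />
// worker.js -- minimal sketch built on the createImageBitmap API.<br />
onmessage = function(e) {<br />
  // Assume the main thread posts a Blob (e.g. a captured frame).<br />
  createImageBitmap(e.data).then(function(bitmap) {<br />
    // width and height are part of the base ImageBitmap interface.<br />
    postMessage({ width: bitmap.width, height: bitmap.height });<br />
    // Accessing the underlying pixel data would rely on the<br />
    // ImageBitmap extension draft [3], which has not landed yet.<br />
  });<br />
};<br />
</source><br />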
=Next Phase(2015 H2)=<br />
*Try to land and standardize MediaStream with worker, ImageBitmap, and the ImageBitmap extension. See [4] for how the standardization process works in Mozilla.<br />
*Design an automated way to export a JavaScript API for OpenCV.js, and try to upstream it to the OpenCV code base.<br />
*Start to work on OfflineMediaContext.<br />
*Support product requirements, for example an Instagram-like app, wide-angle panorama, or Fox photo.<br />
*Do some exploratory experiments on the WebImage concept.<br />
*Initialize a sub-project called [https://wiki.mozilla.org/Project_GlovePuppetry Project GlovePuppetry]<br />
<br />
=Beyond 2015=<br />
*Proof of Concept for WebImage.<br />
*A crazy idea: Kernel.js or SPIR.js for JS developers to customize WebImage?<br />
*A gesture-control API with a depth camera? => WebNI (Web Natural Interaction)?<br />
*[https://wiki.mozilla.org/Project_Cangjie Project Cangjie]<br />
<br />
=Conclusion=<br />
Step by step, we can make things different. FoxEye is a pioneering project trying to push the Web's boundary into a new area: image processing and computer vision. The key factor in ensuring this project's success is you, every Web contributor. Your support, your promotion, your feedback, and your comments are the nutrition of this project. Please share this project with every Web developer. Let's bring more and more amazing things to the Web, and enrich and improve the whole Web platform's competitiveness.<br />
<br />
=References=<br />
*[1]:"WebAudio Spec", http://www.w3.org/TR/webaudio/<br />
*[2]:"Media Capture Stream with Worker", http://chiahungtai.github.io/mediacapture-worker/<br />
*[3]:"ImageBitmap Extensions", http://kakukogou.github.io/spec-imagebitmap-extension/<br />
*[4]:"Mozilla Standards", https://wiki.mozilla.org/Standards<br />
<br />
=Acknowledgements=<br />
The idea of adopting WebAudio as the reference design for this project came from a conversation with John Lin. Thanks to Robert O'Callahan for his great feedback and comments. Thanks to John Lin and Chia-jung Hung for their useful suggestions and ideas. Also, big thanks to my team members who helped me debug the code.<br />
<br />
=About Authors=<br />
==CTai==<br />
My name is [https://tw.linkedin.com/pub/chia-hung-tai/36/8b6/712 Chia-hung Tai]. I am a senior software engineer in the Mozilla Taipei office. I work on Firefox OS multimedia. Before this job, I had some experience in OpenCL, NLP (Natural Language Processing), Data Mining, and Machine Learning. My IRC nickname is ctai. You can find me in the #media, #mozilla-taiwan, and #b2g channels. You can also reach me via email (ctai at mozilla dot com).<br />
<br />
==Kaku==<br />
[https://tw.linkedin.com/in/kakukogou Tzuhuo Kuo] is an engineer in the Mozilla Taipei office.<br />
==CJ Ku==<br />
[https://www.linkedin.com/pub/cj-ku/62/55b/a1b CJ Ku] is responsible for the OpenCV.js part.</div>Ctaihttps://wiki.mozilla.org/index.php?title=Project_Cangjie&diff=1091360Project Cangjie2015-08-22T09:50:14Z<p>Ctai: /* Introduction */</p>
<hr />
<div>=Introduction=<br />
This is a sub-project of [https://wiki.mozilla.org/Project_FoxEye Project FoxEye]. This is an experimental project.<br />
<br />
<br />
The name of this project is based on a legendary figure in ancient China called "Cangjie"[1].<br />
<br />
=Plan=<br />
*2015 Q3: Get familiar with depth cameras and gesture technology.<br />
*2015 Q4: Proof-of-concept phase; discuss with the UX team if necessary.<br />
=References=<br />
[1]: https://en.wikipedia.org/wiki/Cangjie Cangjie<br><br />
=About Authors=<br />
==CTai==<br />
My name is [https://tw.linkedin.com/pub/chia-hung-tai/36/8b6/712 Chia-hung Tai]. I am a senior software engineer in the Mozilla Taipei office. I work on Firefox OS multimedia. Before this job, I had some experience in OpenCL, NLP (Natural Language Processing), Data Mining, and Machine Learning. My IRC nickname is ctai. You can find me in the #media, #mozilla-taiwan, and #b2g channels. You can also reach me via email (ctai at mozilla dot com).</div>Ctaihttps://wiki.mozilla.org/index.php?title=Project_GlovePuppetry&diff=1090022Project GlovePuppetry2015-08-13T15:03:55Z<p>Ctai: /* References */</p>
<hr />
<div>=Introduction=<br />
This is a sub-project of [https://wiki.mozilla.org/Project_FoxEye Project FoxEye]. It is an experimental project trying to discover what kinds of new Web experiences current gesture-recognition capability can enable. The goal of this project is to learn the current technology boundary in the gesture area and, given those limitations, to explore whether it is possible to create an attractive user experience for Web browsing.<br />
The name of this project comes from a traditional opera called "Glove Puppetry"[1]. You can see how amazing "Glove Puppetry" is on YouTube[2].<br />
<br />
=Plan=<br />
*2015 Q3: Get familiar with depth cameras and gesture technology.<br />
*2015 Q4: Proof-of-concept phase; discuss with the UX team if necessary.<br />
=References=<br />
[1]: https://en.wikipedia.org/wiki/Glove_puppetry<br><br />
[2]: https://www.youtube.com/user/epilinet<br />
=About Authors=<br />
==CTai==<br />
My name is [https://tw.linkedin.com/pub/chia-hung-tai/36/8b6/712 Chia-hung Tai]. I am a senior software engineer in the Mozilla Taipei office. I work on Firefox OS multimedia. Before this job, I had some experience in OpenCL, NLP (Natural Language Processing), Data Mining, and Machine Learning. My IRC nickname is ctai. You can find me in the #media, #mozilla-taiwan, and #b2g channels. You can also reach me via email (ctai at mozilla dot com).</div>Ctaihttps://wiki.mozilla.org/index.php?title=TPEMedia/2015-08-12&diff=1089601TPEMedia/2015-08-122015-08-12T07:09:15Z<p>Ctai: /* Chiahung Tai */</p>
<hr />
<div>==Summary==<br />
Status changed by this week<br />
<bugzilla><br />
{<br />
"status": ["NEW", "ASSIGNED", "UNCONFIRMED"],<br />
"changed_after": "2015-08-05",<br />
"changed_before": "2015-08-12",<br />
"assigned_to": ["jolin@mozilla.com", "ayang@mozilla.com", "jwwang@mozilla.com", "bechen@mozilla.com", "ctai@mozilla.com", "alwu@mozilla.com", "bwu@mozilla.com", "tkuo@mozilla.com", "mchiang@mozilla.com", "hahuang@mozilla.com"],<br />
"include_fields": "id, summary, status, resolution, assigned_to"<br />
}<br />
</bugzilla><br />
Bugs which are fixed by this week<br />
<bugzilla><br />
{<br />
"changed_after": "2015-08-05",<br />
"changed_before": "2015-08-12",<br />
"assigned_to": ["jolin@mozilla.com", "ayang@mozilla.com", "jwwang@mozilla.com", "bechen@mozilla.com", "ctai@mozilla.com", "alwu@mozilla.com", "bwu@mozilla.com", "tkuo@mozilla.com", "mchiang@mozilla.com", "hahuang@mozilla.com"],<br />
"include_fields": "id, summary, status, resolution, assigned_to",<br />
"changed_field": "status",<br />
"changed_field_to": "resolved"<br />
}<br />
</bugzilla><br />
==Status==<br />
<br />
===Alfredo Yang===<br />
<br />
===John Lin===<br />
<br />
===JW Wang===<br />
<br />
===Benjamin Chen===<br />
<br />
===Chiahung Tai===<br />
*{{Bug|1108950}} - [FoxEye] Associate MediaStreamTrack with WebWorker as WorkerMonitor. <br />
**WIP, f?<br />
**Writing mochitest.<br />
<br />
===Alastor Wu===<br />
*{{Bug|1129882}} - [B2G] Using the new audio channel design to manage the telephony's sound<br />
** land<br />
<br />
*{{Bug|1186135}} - [Camera][Video] Recording a video, no chime indicator that recording has started or ended<br />
** land<br />
<br />
*{{Bug|1193245}} - Using Atomic in the suspend count of the media resource<br />
** r+<br />
<br />
*{{Bug|1186572}} - [FTU] Skipping through initial screen of tutorial will cause tutorial animated images to stop loading<br />
** r?<br />
<br />
*{{Bug|1187092}} - [Clock] Changing 'Sound' for an alarm will not preview the first track selected<br />
** r?<br />
<br />
*{{Bug|1191207}} - The state of an audio channel will just become inactive after go to homescreen<br />
** r-<br />
<br />
*{{Bug|1192748}} - [B2G] Only send the moz-interrupt-begin when the audio is really be interrupted<br />
** WIP<br />
<br />
===Blake Wu===<br />
<br />
===Kaku Kuo===<br />
<br />
===Munro Chiang===<br />
<br />
===Adam Chou===</div>Ctaihttps://wiki.mozilla.org/index.php?title=TPEMedia/2015-08-05&diff=1089142TPEMedia/2015-08-052015-08-10T11:48:17Z<p>Ctai: /* Chiahung Tai */</p>
<hr />
<div>==Summary==<br />
Status changed by this week<br />
<bugzilla><br />
{<br />
"status": ["NEW", "ASSIGNED", "UNCONFIRMED"],<br />
"changed_after": "2015-07-29",<br />
"changed_before": "2015-08-05",<br />
"assigned_to": ["jolin@mozilla.com", "ayang@mozilla.com", "jwwang@mozilla.com", "bechen@mozilla.com", "ctai@mozilla.com", "alwu@mozilla.com", "bwu@mozilla.com", "tkuo@mozilla.com", "mchiang@mozilla.com", "hahuang@mozilla.com"],<br />
"include_fields": "id, summary, status, resolution, assigned_to"<br />
}<br />
</bugzilla><br />
Bugs which are fixed by this week<br />
<bugzilla><br />
{<br />
"changed_after": "2015-07-29",<br />
"changed_before": "2015-08-05",<br />
"assigned_to": ["jolin@mozilla.com", "ayang@mozilla.com", "jwwang@mozilla.com", "bechen@mozilla.com", "ctai@mozilla.com", "alwu@mozilla.com", "bwu@mozilla.com", "tkuo@mozilla.com", "mchiang@mozilla.com", "hahuang@mozilla.com"],<br />
"include_fields": "id, summary, status, resolution, assigned_to",<br />
"changed_field": "status",<br />
"changed_field_to": "resolved"<br />
}<br />
</bugzilla><br />
==Status==<br />
<br />
===Alfredo Yang===<br />
<br />
===John Lin===<br />
<br />
===JW Wang===<br />
<br />
*{{Bug|1189197}} - Race in bufferqueue_callback (cubeb_opensl.c)<br />
** land<br />
<br />
*{{Bug|1189204}} - Twitch.tv HTML5 player does not automatically play live stream video<br />
** land<br />
<br />
*{{Bug|1189866}} - Intermittent 691096-1.html | application crashed [@ opensl_stream_destroy]<br />
** land<br />
<br />
*{{Bug|1187092}} - [Clock] Changing 'Sound' for an alarm will not preview the first track selected<br />
** review code<br />
<br />
*{{Bug|1188269}} - Create AudioShutdownManager to manage the lifetime of different AudioSinks.<br />
** review code<br />
<br />
*{{Bug|1146796}} - (GMP_for_external_rendering) GMP Hardware Rendering API<br />
** review code<br />
<br />
*{{Bug|1187214}} - Implement a thread-safe observer to publish events across threads<br />
** land<br />
<br />
*{{Bug|1188257}} - Employ MediaEventSource for MediaQueue to send notifications to the listeners<br />
** land<br />
<br />
*{{Bug|1191170}} - Move DecodedStreamData from the header to its source file<br />
** land<br />
<br />
*{{Bug|1189624}} - Have AudioSink listen to MediaQueue events to know whether to continue playback<br />
** land<br />
<br />
*{{Bug|1191171}} - Add SetVolume() to DecodedStream<br />
** WIP<br />
<br />
*{{Bug|1191173}} - Mirror MediaDecoder::mSameOriginMedia in MDSM<br />
** land<br />
<br />
===Benjamin Chen===<br />
*{{Bug|1188155}} - [Browser][Youtube]Only one window with a Youtube video will correctly load<br />
**Enable the dormant code path for MSE. patch r?<br />
*{{Bug|1048926}} - Fix and re-enable 789075-1.html and 795892-1.html on Android and B2G<br />
**landed, default preload action is preload_none on mobile platform.<br />
<br />
===Chiahung Tai===<br />
*{{Bug|1108950}} - [FoxEye] Associate MediaStreamTrack with WebWorker as WorkerMonitor. <br />
**WIP, f?<br />
**Implement worker feature and clean code for review.<br />
<br />
===Alastor Wu===<br />
*{{Bug|1184482}} - [B2G] Keep the audio channel competing even if there is no ringer sound in the vibration mode<br />
** Land<br />
<br />
*{{Bug|1186135}} - [Camera][Video] Recording a video, no chime indicator that recording has started or ended<br />
** r+<br />
<br />
*{{Bug|1187092}} - [Clock] Changing 'Sound' for an alarm will not preview the first track selected <br />
** r-<br />
<br />
*{{Bug|1129882}} - [B2G] Using the new audio channel design to manage the telephony's sound <br />
** Backout, debug<br />
<br />
===Blake Wu===<br />
<br />
===Kaku Kuo===<br />
* {{Bug|1044102}} - Implement ImageBitmap and createImageBitmap<br />
** land<br />
* {{Bug|1141979}} - [FoxEye] Extend ImageBitmap with interfaces to access its underlying image data<br />
** Modify specification.<br />
* {{Bug|1190210}} - Heap-use-after-free in mozilla::dom::CropDataSourceSurface<br />
** r?<br />
<br />
===Munro Chiang===<br />
*{{Bug|1190244}} - [aries] Enable slow motion recording on FFOX<br />
** Under review<br />
*{{Bug|1187364}} - [Gecko][Camera] Pause and Resume support during video recording<br />
<br />
===Hayden Huang===<br />
<br />
===Adam Chou===</div>Ctai