=Abstract=
The goal of this project is to bring the power of computer vision and image processing to the Web. By extending the Media Capture and Streams specification, web developers can write video-processing applications in a better way. The primary idea is to incorporate Worker-based JavaScript video processing with MediaStreamTrack: the user's video-processing script does the real image-processing and analysis work frame by frame. We also propose an extension of ImageBitmap to allow better optimization on both the implementation side and the Web-developer side. To support the video-editor case, we would like to introduce OfflineMediaContext in the next phase. We also want to explore the concept of WebImage, a hardware-acceleration-friendly Web API, as a chance for performance improvement. By accomplishing these APIs step by step, we believe we can greatly improve the competitiveness of the Web platform in the image processing and computer vision area.
=Introduction=
For a quick understanding of what project FoxEye is, please see the files below:<br>
'''The latest materials:'''
*Orlando FoxEye session: [https://docs.google.com/presentation/d/14tI15Bvphew764XpOToAeUc5cgXT_kRTbD061ZCL5cY/edit?usp=sharing Orlando FoxEye]
*FoxEye Briefing: [https://docs.google.com/presentation/d/1Ra5bIeMfSEQi5kd_bGH5Vx-pa9ahpAUTFDxTeBA5L0o/edit?usp=sharing Briefing]
*Presentation files from the Whistler Work Week:
**Project FoxEye Status Update: [https://docs.google.com/presentation/d/1vnR5JOWpEgKN3KZGV2SLxscqSnuBOF2tT_dTolLRYI4/edit?usp=sharing FoxEye]
**FoxEye Cross Firefox OS: [https://docs.google.com/presentation/d/1ck32VoikWkkFXkFfNK91S22nGRp9fSEj6Oz6846May4/edit?usp=sharing Use case]
*'''Latest demo on YouTube: [https://www.youtube.com/watch?v=prybkXsTGXY FoxEye 2015 H1 demo]'''
'''Outdated:'''
*Presentation file from the Portland Work Week: [[File:Project FoxEye Portland Work Week.pdf]]<br>
*Presentation file from the P2PWeb Workshop: [[File:Project FoxEye 2015-Feb.pdf]]<br>
*YouTube: https://www.youtube.com/watch?v=TgQWEWiGaO8<br>
The need for image processing and computer vision has been increasing in recent years. The introduction of the video element and media streams in HTML5 is very important, allowing basic video playback and WebCam capabilities, but it is not powerful enough to handle complex video processing and camera applications. In particular, there are tons of mobile camera, photo, and video-editor applications on Android that show their creativity by using OpenCV and similar libraries. It is a goal of this project to include the capabilities found in modern video production and camera applications, as well as some of the processing, recognition, and stitching tasks.
<br>
This API is inspired by the Web Audio API [1]. Unlike the Web Audio API, we try to reach the goal by modifying the existing Media Capture and Streams API. The idea is to add some functions to associate a Worker-based script with a MediaStreamTrack; the script code in the Worker then runs image processing and/or analysis frame by frame. Since we move most of the processing work to the Worker, the main thread will not be blocked.
[[File:FoxEye - Overview.png|800px]]
<br>
Basically, the spirit of this project has four parts. The first part extends MediaStreamTrack to associate it with a Worker, which provides a way to do image-processing jobs frame by frame. The second part is the ImageBitmap extension, which extends the ImageBitmap interface to allow JavaScript developers to read the underlying data out of, and set external data into, an ImageBitmap in a set of supported color formats. The third part is OfflineMediaContext, which lets an offline stream render as fast as possible. The last part is WebImage, a hardware-acceleration-friendly API for the computer vision area that Web developers can use to build high-performance vision processing.
<br>
Thanks to the amazing asm.js and Emscripten work, we also provide an asm.js version of OpenCV called OpenCV.js. Web developers can leverage the power of OpenCV in a simpler way.
<br>
=Design Principle=
*Follow [https://extensiblewebmanifesto.org/ The Extensible Web Manifesto]
The original design of this project was a WebAudio-like design called WebVideo. It was deprecated because we want to follow The Extensible Web Manifesto; we believe that kind of extensible design is best for the Web. Take the examples below as arguments against the WebVideo design.<br>
The computer vision area is more thriving than ever now that deep learning has been introduced, and we can't enumerate video nodes for every kind of creativity in the future. For example, suppose we enumerate everything we can imagine at the current time; there will still be things we can't imagine. Maybe we would miss an area called image sentiment analysis, which tries to classify the emotion of a human face in an image. I take this as an example because I only learned about it recently. Should we add it as a new video node? Take another extreme example: we probably would not argue about face detection or face recognition, but what we talk about now is only the human face. What if someone thinks we need to detect and recognize animal faces? It sounds like a ridiculous idea, but it is a real business: a product called "Bistro: A Smart Feeder Recognizes Your Cat's Face"[2] was on Indiegogo. So should we extend the original face detection/recognition node, add a new video node for cats, or ask developers to implement it in a VideoWorker? If we ask them to implement it in a VideoWorker, then we might need to reconsider whether the existing video nodes make sense at all. Should we move them all into the VideoWorker and implement them in JS? If yes, WebVideo degenerates to only a source node, a video worker node, and a destination node. Compared to that degenerated WebVideo, the MediaStream-with-worker design is more elegant.
*Performance and power consumption do matter
This project was initiated by the FxOS team, which means the performance and power consumption of mobile devices have been considered from day one.
=Concept=
==MediaStreamTrack with Worker==
The new design is a simple and minimal change to the current API. By extending MediaStreamTrack and adding Worker-related APIs, we can let MediaStream support video-processing functionality through script code running in a Worker. Below is the draft WebIDL code. Please see [2] for the draft specification.
<source lang="webidl">
[Constructor(DOMString scriptURL)]
interface VideoMonitor : EventTarget {
    attribute EventHandler onvideomonitor;
};

interface VideoProcessor : EventTarget {
    attribute EventHandler onvideoprocess;
};

partial interface MediaStreamTrack {
    void             addVideoMonitor(VideoMonitor monitor);
    void             removeVideoMonitor(VideoMonitor monitor);
    MediaStreamTrack addVideoProcessor(VideoProcessor processor);
    void             removeVideoProcessor();
};

[Exposed=(Window, Worker),
 Constructor(DOMString type, optional VideoMonitorEventInit videoMonitorEventInitDict)]
interface VideoMonitorEvent : Event {
    readonly attribute DOMString    trackId;
    readonly attribute double       playbackTime;
    readonly attribute ImageBitmap? inputImageBitmap;
};

[Exposed=(Window, Worker),
 Constructor(DOMString type, optional VideoProcessorEventInit videoProcessorEventInitDict)]
interface VideoProcessEvent : VideoMonitorEvent {
    attribute Promise<ImageBitmap> outputImageBitmap;
};
</source>
<br>
Main thread:<br>
[[File:NewProjectFoxEye1.png|1024px]]<br>
Worker thread:<br>
[[File:Worker - FLOW.png|1024px]]<br>
===Example Code===
Please check the [http://chiahungtai.github.io/mediacapture-worker/index.html#Examples examples section in MediaStreamTrack with worker].
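To make the flow concrete, here is a minimal sketch of the processor path under the draft IDL above. It assumes VideoProcessor is constructed from a worker script URL (the draft above only shows that constructor on VideoMonitor) and that the frame event is delivered to the worker's global scope; treat it as an illustration rather than the final API.
<source lang="javascript">
// main.js -- sketch only; names follow the draft IDL above and may change.
navigator.mediaDevices.getUserMedia({ video: true }).then(function (stream) {
  var track = stream.getVideoTracks()[0];
  // Assumption: VideoProcessor takes a worker script URL, like VideoMonitor does.
  var processor = new VideoProcessor('grayscale-worker.js');
  // addVideoProcessor() returns a new MediaStreamTrack carrying the processed frames.
  var processedTrack = track.addVideoProcessor(processor);
  document.getElementById('videoelem').srcObject = new MediaStream([processedTrack]);
});

// grayscale-worker.js -- runs once per frame; the event shape follows VideoProcessEvent.
onvideoprocess = function (event) {
  var frame = event.inputImageBitmap;   // the current input frame
  // ... process the frame here, e.g. with OpenCV.js or an OffscreenCanvas ...
  // Hand a frame back; outputImageBitmap is a Promise<ImageBitmap> in the draft.
  event.outputImageBitmap = Promise.resolve(frame);
};
</source>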
==ImageBitmap extensions==
Please see [2] for more information.
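As a rough illustration of what the extension enables (reading an ImageBitmap's underlying pixels into an ArrayBuffer inside a worker), a sketch might look like the following. The method names (findOptimalFormat, mappedDataLength, mapDataInto) are taken from the editor's draft in [2] as of this writing and may change.
<source lang="javascript">
// Sketch only: method names follow the ImageBitmap-extension editor's draft [2]
// and may differ in the final specification.
function readPixels(imageBitmap) {
  var format = imageBitmap.findOptimalFormat();       // e.g. "RGBA32"
  var length = imageBitmap.mappedDataLength(format);  // number of bytes needed
  var buffer = new ArrayBuffer(length);
  return imageBitmap.mapDataInto(format, buffer, 0).then(function (layout) {
    // layout describes channel order, stride, etc.; buffer now holds the raw pixels.
    return { buffer: buffer, layout: layout };
  });
}
</source>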
==WebImage==
Why do we need WebImage? Because performance matters if we want to bring computer vision to the Web. Most image-processing tasks can be accelerated by WebGL, but that is not the case for the computer vision area. So we need a hardware-acceleration-friendly computer-vision Web API to help Web developers deliver fast and portable Web pages and applications. That is the motivation for WebImage.
<br>
This is a new area that needs exploration, so we might start by referring to existing related APIs. At the end of 2014, Khronos released a portable, power-efficient vision-processing API called OpenVX. This specification might be a good starting point for WebImage.
<br>
The following diagram is a brief concept of OpenVX. It uses a graph-node architecture: the role of OpenVX in computer vision is like the role of OpenGL in graphics. The developer just constructs a graph and executes it to process incoming images.
<br>
[[File:OpenVX-NodeGFX.PNG|600px]]<br>
<!--[[File:Project FoxEyeWebImage1.png|800px]]<br>-->
OpenVX can be implemented with OpenCL, OpenGL ES with compute shaders, or C/C++ with SIMD. That means we can support OpenVX on a wide range of Web platforms, from PC to mobile, once an OpenVX engine is provided.
[[File:OpenVX.PNG|600px]]<br>
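Purely to illustrate the construct-then-execute graph model (not a proposed API), a WebImage-style program might look roughly like this; every name in the sketch is hypothetical.
<source lang="javascript">
// Hypothetical sketch: all names are invented to illustrate the OpenVX-like
// graph-node model (build a graph once, verify it, then run it per frame).
var context = new WebImageContext();
var graph   = context.createGraph();
var input   = graph.createImageNode(inputImageBitmap);            // source node
var gray    = graph.createColorConvertNode(input, 'rgb-to-gray'); // processing node
var edges   = graph.createCannyEdgeNode(gray, { low: 50, high: 150 });
graph.verify();   // validate the graph once, like vxVerifyGraph in OpenVX
graph.process().then(function (outputImageBitmap) {
  // The same graph can be re-executed cheaply for subsequent frames.
});
</source>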
==OfflineMediaContext==
We introduce OfflineMediaContext and modify MediaStream here to enable offline (i.e., as fast as possible) MediaStream processing. When developers want to perform offline MediaStream processing, they need to form a context that will hold and keep the relationship of all MediaStreams that are going to be processed together. A usage sketch follows the list of notes below.
<source lang="webidl">
// typedef unsigned long long DOMTimeStamp;
interface OfflineMediaContext {
    void start(DOMTimeStamp durationToStop);
    attribute EventHandler onComplete;
};

// Add an optional argument to the constructors.
[Constructor (optional OfflineMediaContext context),
 Constructor (MediaStream stream, optional OfflineMediaContext context),
 Constructor (MediaStreamTrackSequence tracks, optional OfflineMediaContext context)]
interface MediaStream : EventTarget {
    // No modification.
    ...
};
</source>
*OfflineMediaContext is the holding place of all MediaStreams that are going to be processed together at a non-realtime rate.
*OfflineMediaContext is also the object that can trigger the non-realtime processing.
*OfflineMediaContext should be instantiated first; then the MediaStreams that are going to be processed together in the same context can be instantiated by passing the pre-instantiated OfflineMediaContext object into the MediaStream constructor (see the modified MediaStream constructors above).
*The native implementation of OfflineMediaContext holds a non-realtime MediaStreamGraph, just like the OfflineAudioContext in the Web Audio specification.
*The constructors are modified by adding an optional parameter, OfflineMediaContext. This way, developers are able to associate a MediaStream with an OfflineMediaContext.
*If the optional OfflineMediaContext is given, the native implementation of the MediaStream should be hooked to the non-realtime MediaStreamGraph held by the given OfflineMediaContext instead of the global realtime one.
*If the optional OfflineMediaContext is given, we need to check whether the newly created MediaStream can be processed at an offline rate or not. If not, the constructor should throw an error and return null. (Constructors are always allowed to throw.)
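Here is a minimal usage sketch under the draft IDL above. The worker script name and the source stream are placeholders, and onComplete is assumed to behave as a plain event-handler attribute as declared above.
<source lang="javascript">
// Sketch only: OfflineMediaContext is not implemented yet; names follow the draft above.
var offlineContext = new OfflineMediaContext();
// Hook a stream to the offline (non-realtime) MediaStreamGraph via the new
// optional constructor argument. 'recordedStream' is a placeholder source.
var stream = new MediaStream(recordedStream, offlineContext);
var track = stream.getVideoTracks()[0];
// Attach a frame-by-frame processor; 'effect-worker.js' is a placeholder script.
track.addVideoProcessor(new VideoProcessor('effect-worker.js'));
offlineContext.onComplete = function () {
  // All tracks in this context have been processed.
};
// Process up to 30 seconds of media as fast as possible.
offlineContext.start(30000);
</source>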
==OpenCV.js==
*OpenCV + Emscripten = OpenCV.js
*https://github.com/CJKu/opencv
<!--
==Deprecated Design==
*Modular Routing
Modular routing allows arbitrary connections between different DIPNode (TBD) objects. Each node can have inputs and/or outputs. A source node has no inputs and a single output. A destination node has one input and no outputs. Other nodes, such as filters, can be placed between source and destination nodes.
Here is an example of face detection on an ImageElement:<br>
PS. Right now, I haven't finished the work on ScriptNode, so the draw-rectangle part is skipped in the sample code.
<br>[[File:Project FoxEye1.png|720px]]<br>
<big>Example 1:</big>
<source lang="javascript">
var context = new DIPContext();
var imageElem = document.getElementById('imgelemsrc');
var source = context.createImageElementSource(imageElem);
var facedetect = context.createFaceDetection();
source.connect(facedetect);
var dest = context.createMediaStreamDestination();
facedetect.connect(dest);
var elem = document.getElementById('videoelem');
elem.mozSrcObject = dest.stream;
elem.play();
</source>
Another example shows that some nodes might support callback functions to pass back more information than just the image.
<br>[[File:Project FoxEye2.png|720px]]<br>
<big>Example 2:</big>
<source lang="javascript">
var context = new DIPContext();
var imageElem = document.getElementById('imgelemsrc');
var source = context.createImageElementSource(imageElem);
var textdetect = context.createTextDetection();
source.connect(textdetect);
var dest = context.createMediaStreamDestination();
textdetect.connect(dest);
textdetect.ontextrecognized = function (e){
  var text = e.recognizedText;
  var go2google = document.getElementById('go2Google');
  go2google.href = "https://www.google.com.tw/search?q=" + text;
  var go2IMDB = document.getElementById('go2IMDB');
  go2IMDB.href = "http://www.imdb.com/find?ref_=nv_sr_fn&q=" + text;
  var go2Amazon = document.getElementById('go2Amazon');
  go2Amazon.href = "http://www.amazon.com/s/ref=nb_sb_noss?url=search-alias%3Daps&field-keywords=" + text;
  var go2TranslateEngTC = document.getElementById('go2TranslateEngTC');
  go2TranslateEngTC.href = "https://translate.google.com.tw/#en/zh-TW/" + text;
}
var elem = document.getElementById('videoelem');
elem.mozSrcObject = dest.stream;
elem.play();
</source>
An ideal example combining ScriptNode with Canvas2DContext.<br>
This example tries to do on-the-fly camera translation like "Word Lens" and "Waygo".<br>
The implementation for this example is not finished yet.<br>
<br>[[File:Project FoxEye3.png|720px]]<br>
<big>Example 3:</big>
<source lang="javascript">
var context = new DIPContext();
var myMediaStream;
navigator.getUserMedia({video:true, audio:false}, function(localMediaStream){
  myMediaStream = localMediaStream;
  var source = context.createMediaStreamSource(myMediaStream);
  var textRecoginition = context.createTextRecoginition();
  source.connect(textRecoginition);
  var textInpaint = context.createTextInpaint();
  textRecoginition.connect(textInpaint);
  var scriptTranslate = context.createScriptProcessor();
  textRecoginition.ontextrecognized = function (e){
    var text = e.recognizedText;
    // Custom parameter
    scriptTranslate.addParameter("text", text);
  }
  scriptTranslate.onimageprocess = function (e) {
    var text = e.parameters.text;
    // Translate to another language....
    var newText = Translate("Eng", "TC");
    var input = e.inputImage;
    var canvas = document.getElementsByTagName('canvas')[0];
    var context2D = canvas.getContext('2d');
    context2D.drawImage(input, 0, 0);
    context2D.strokeText(newText, 0, 0);
    // Get an empty slate to put the data into.
    var output = context.createImageData(canvas.width, canvas.height);
    e.outputImage = output;
  }
  textInpaint.connect(scriptTranslate);
  var dest = context.createMediaStreamDestination();
  scriptTranslate.connect(dest);
  var video = document.getElementById('videoelem');
  video.mozSrcObject = dest.stream;
}, null);
</source>
-->
=Demo pages=
==OpenCV.js==
This demo should be able to run on any existing Firefox browser on Ubuntu and Mac. Please try the website below.<br>
http://people.mozilla.org/~cku/opencv/
<br>
[[File:OpenCVJS-1.png|1080px]]
<br>
[[File:OpenCVJS-2.png|1080px]]
<br>
==MST with Worker and ImageBitmap==
You need a customized build for the demos below. Please clone this repo: https://github.com/kakukogou/gecko-dev/tree/foxeye-master<br>
You also need a web camera for the demos. Then build your own Firefox browser and go to the website below to see the demos.
<br>
Source code of the demo:<br>
https://github.com/kakukogou/foxeye-demo
<br>
Demo website:<br>
http://people.mozilla.org/~tkuo/foxeye-demo/
===Monitor===
The monitor is designed to just send the event to the Web Worker, without modifying the stream. The left video comes from getUserMedia. The right one uses addWorkerMonitor to dispatch the input frames from the left one to a worker. The worker detects the face and passes the face position and the input frame back to the main thread. The script in the main thread then uses both pieces of information to draw the input frame via CanvasRenderingContext2D.
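A rough sketch of that flow, assuming addWorkerMonitor takes a Worker directly (the demo's naming, not the draft IDL) and that the worker receives one event per frame; detectFace() is a hypothetical stand-in for the demo's face detector.
<source lang="javascript">
// Sketch of the monitor flow described above; detectFace() is hypothetical.

// --- main thread ---
// 'track' is a video MediaStreamTrack from getUserMedia, 'hatImage' a loaded <img>.
var monitorWorker = new Worker('face-monitor.js');
track.addWorkerMonitor(monitorWorker);
monitorWorker.onmessage = function (e) {
  var ctx = document.getElementById('canvas').getContext('2d');
  ctx.drawImage(e.data.frame, 0, 0);                                      // original frame
  ctx.drawImage(hatImage, e.data.face.x, e.data.face.y - hatImage.height); // overlay the hat
};

// --- face-monitor.js (worker) ---
onvideomonitor = function (event) {
  var frame = event.inputImageBitmap;
  var face = detectFace(frame);   // hypothetical helper returning {x, y, width, height}
  postMessage({ frame: frame, face: face }, [frame]);  // transfer the frame back
};
</source>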
*Case 1: Face detection
This demo shows how to use |addWorkerMonitor| and CanvasRenderingContext2D to overlay a hat on someone's head.
<br>
[[File:Monitor face.png|1080px]]
<br>
*Case 2: QR code
[[File:Monitor qrcode.png|1080px]]
<br>
===Processor===
This demo shows how to use |addWorkerProcessor| to create a new MediaStreamTrack and show it in another HTMLVideoElement. There are five kinds of image filters.
<br>
<br>
[[File:Processor copy.png|1080px]]
<br>
[[File:Processor blur.png|1080px]]
<br>
[[File:Processor erode.png|1080px]]
<br>
[[File:Processor threshold.png|1080px]]
<br>
[[File:Processor invert.png|1080px]]
<br>
[[File:Processor gray.png|1080px]]
<br>
<!--
==Demo 1: Face tracker==
===Browser:===
*Input comes from HTML Image Element
[[File:ProjectFoxEye BrowserFaceImage.png|720px]]
*Input comes from MediaStream
[[File:ProjectFoxEye BrowserFaceMS.png|720px]]
===B2G on Flame:===
*Input comes from HTML Image Element<br>
[[File:Project FoxEyeFlameFaceImage.jpg|480px]]<br>
*Input comes from MediaStream<br>
[[File:Project FoxEyeFlameFaceMS.jpg|480px]]<br>
==Demo 2: Text Recognition==
*Browser:
**Input comes from HTML Image Element
[[File:ProjectFoxEye TextImage.png|720px]]
**Input comes from MediaStream
[[File:ProjectFoxEye TextMS.png|720px]]
-->
=Unlimited Potentials=
<!--
According to "Firefox OS User Research Northern India Findings" [4], one of the key table stakes is camera-related features. "Ways to provide photo & video editing tools" is what this Web API is for. So if we can deliver some cool photo and video editing features, we can fulfill one of the needs of our target market.<br>
In [4], it is mentioned that one of the purchase motivators is "educate my kids". Features like PhotoMath can satisfy the education part.<br>
In the long term, if we can integrate text recognition with TTS (text to speech), we can help illiterate people read words and phrases. That would be a very useful feature.<br>
Offline text translation in the camera might be a killer application too; Waygo and Word Lens are two such applications on Android and iOS.<br>
Text selection in images is also an interesting feature for the browser. Project Naptha demos some potential functionality based on text selection in images.<br>
-->
==FoxEye technology tree==
This is the technology tree of FoxEye. Solid lines are dependencies; dashed lines are for performance improvement. The green blocks are possible applications. The blue blocks are API drafts with prototypes. The red blocks are things we know how to do but have not yet started. The yellow blocks are only concepts and need more study. So, step by step, we can provide a photo manager, wide-angle panorama, HDR, and even gesture control.
[[File:Multimedia Platform Team Technology Roadmap - New Page.png |1024px]]
==Use Cases==
*Digital Image Processing (DIP) for camera:
**Face In, see [https://www.youtube.com/watch?feature=player_embedded&v=PWZUCfDsFdU Sony Face In]
**Augmented Reality, see [https://www.youtube.com/watch?feature=player_embedded&v=vDNzTasuYEw IKEA AR]
**Camera Panorama
**Fisheye camera
**Comic Effect
**Long term, might need Android Camera HAL 3 to control the camera
***Smile Snapshot
***Gesture Snapshot
***HDR
***Video Stabilization
**Bar code scanner
*Photo and video editing
**Video Editor, see [https://www.youtube.com/watch?feature=player_embedded&v=NJ6nYgxcuUk WeVideo on Android]
**A faster way to build video editing tools.
**Lots of existing image effects can be used for photo and video editing.
**https://www.facebook.com/thanks
*Object Recognition in Image (not only Firefox OS, but also the browser):
**Shopping Assistant, see [https://www.youtube.com/watch?feature=player_embedded&v=B7cvlWll85Q Amazon Firefly]
**Face Detection/Tracking
**Face Recognition
**Text Recognition
**Text Selection in Image
***See http://projectnaptha.com/
**Text Inpainting
**Image Segmentation
**Text translation on image, see [https://www.youtube.com/watch?feature=player_embedded&v=9UalhhWBPH0 Waygo]
*Duo Camera:
**Natural Interaction (Gesture, Body Motion Tracking)
**Interactive Foreground Extraction
and so on....
==Some cool applications we can refer to in the real world==
*Word Lens:
**https://play.google.com/store/apps/details?id=com.questvisual.wordlens.demo
**https://itunes.apple.com/tw/app/word-lens/id383463868?mt=8
*Waygo
**http://www.waygoapp.com/
*PhotoMath
**https://photomath.net/
*Cartoon Camera
**https://play.google.com/store/apps/details?id=com.fingersoft.cartooncamera
*Photo Studio
**http://photo-studio.en.uptodown.com/android
*Magisto
**https://play.google.com/store/apps/details?id=com.magisto
*Adobe Photoshop Express
**http://www.photoshop.com/products/photoshopexpress
*Amazon (Firefly app)
**https://play.google.com/store/apps/details?id=com.amazon.mShop.android
<!--
=Task Dependency=
Below is the dependency graph for the text recognition work.
<br>
[[File:Project FoxEyeTextRecgDependency.png]]
<br>
Below is the dependency graph for the camera effects work.
<br>
[[File:Project FoxEyeCameraDependency.png]]
<br>
Below is the dependency graph for the video editor work.
<br>
[[File:Project FoxEyeVideoEditorDependency.png]]
<br>
-->
<!--
=Comparison=
==Canvas2DContext==
Currently, you can do video effects with Canvas2DContext. See the demo made by [4]. The source code looks like below.
<source lang="javascript">
function frameConverter(video, canvas) {
  // Set up our frame converter
  this.video = video;
  this.viewport = canvas.getContext("2d");
  this.width = canvas.width;
  this.height = canvas.height;
  // Create the frame-buffer canvas
  this.framebuffer = document.createElement("canvas");
  this.framebuffer.width = this.width;
  this.framebuffer.height = this.height;
  this.ctx = this.framebuffer.getContext("2d");
  // Default video effect is blur
  this.effect = JSManipulate.blur;
  // This variable is used to pass ourself to event call-backs
  var self = this;
  // Start rendering when the video is playing
  this.video.addEventListener("play", function() {
    self.render();
  }, false);
  // Change the image effect to be applied
  this.setEffect = function(effect) {
    if (effect in JSManipulate) {
      this.effect = JSManipulate[effect];
    }
  };
  // Rendering call-back
  this.render = function() {
    if (this.video.paused || this.video.ended) {
      return;
    }
    this.renderFrame();
    var self = this;
    // Render every 10 ms
    setTimeout(function () {
      self.render();
    }, 10);
  };
  // Compute and display the next frame
  this.renderFrame = function() {
    // Acquire a video frame from the video element
    this.ctx.drawImage(this.video, 0, 0, this.video.videoWidth,
                       this.video.videoHeight, 0, 0, this.width, this.height);
    var data = this.ctx.getImageData(0, 0, this.width, this.height);
    // Apply image effect
    this.effect.filter(data, this.effect.defaultValues);
    // Render to viewport
    this.viewport.putImageData(data, 0, 0);
    return;
  };
};

// Initialization code
video = document.getElementById("video");
canvas = document.getElementById("canvas");
fc = new frameConverter(video, canvas);
...
// Change the image effect applied to the video
fc.setEffect('edge detection');
</source>
Basically, the idea is to use |drawImage| to acquire a frame from the video and draw it to the canvas, then call |getImageData| to get the data and process the image. After that, put the computed data back into the canvas and display it.<br>
Compared to this approach, the proposed Web API has the advantages below:
* No polling mechanism.
** We use a callback function to process every frame.
==node-opencv==
https://github.com/peterbraden/node-opencv
"OpenCV bindings for Node.js. OpenCV is the defacto computer vision library - by interfacing with it natively in node, we get powerful real time vision in js."
The sample code looks like below:
*You can use opencv to read in image files. Supported formats are in the OpenCV docs, but jpgs etc. are supported.
<source lang="javascript">
cv.readImage(filename, function(err, mat){
  mat.convertGrayscale()
  mat.canny(5, 300)
  mat.houghLinesP()
})
</source>
*If, however, you have a series of images and you wish to stream them into a stream of Matrices, you can use an ImageStream. Thus:
<source lang="javascript">
var s = new cv.ImageStream()
s.on('data', function(matrix){
  matrix.detectObject(haar_cascade_xml, opts, function(err, matches){})
})
ardrone.createPngStream().pipe(s);
</source>
==opencvjs==
https://github.com/blittle/opencvjs<br>
It is a project to compile OpenCV to asm.js. It might be a dead project now.
==Project Naptha==
"Project Naptha automatically applies state-of-the-art computer vision algorithms on every image you see while browsing the web. The result is a seamless and intuitive experience, where you can highlight as well as copy and paste and even edit and translate the text formerly trapped within an image." Quoted from http://projectnaptha.com/ .
===How it works?===
Excerpt from Project Naptha:
The primary feature of Project Naptha is actually the text detection, rather than optical character recognition. The author wrote a text detection algorithm called the Stroke Width Transform, invented by Microsoft Research in 2008, which is capable of identifying regions of text in a language-agnostic manner, and runs it in a WebWorker. Once a user begins to select some text, it scrambles to run character recognition algorithms in order to determine what exactly is being selected. The default OCR engine is a built-in pure-JavaScript port of the open source Ocrad OCR engine. There is also the option of sending the selected region over to a cloud-based text recognition service powered by Tesseract, Google's (formerly HP's) award-winning open-source OCR engine, which supports dozens of languages and uses an advanced language model.
-->
<!--
=Open Source Library and Licenses=
*OpenCV: OpenCV is an open-source BSD-licensed library that includes several hundred computer vision algorithms.
*Tesseract-OCR: Apache License v2.0. Tesseract is probably the most accurate open source OCR engine available.
**Leptonica: Leptonica is a pedagogically-oriented open source site containing software that is broadly useful for image processing and image analysis applications.
<br>"The Leptonica copyright is less restrictive still. We use the BSD 2-clause license, which is similar to the Apache license but much briefer. Kirk McKusick playfully called the BSD license a copycenter, as differentiated from the usual copyright and the GPL copyleft: "Take it down to the copy center and make as many copies as you want." The BSD restrictions can be approximately summarized as: (1) Don't pretend that you wrote this, and (2) Don't sue us if it doesn't work.<br> For Leptonica, as with Apache and BSD, modifications of the source code that are used in commercial products can be made without any obligation to make them available in open source. (This does happen voluntarily -- it's likely that the majority of issues noted by people working on commercial products are reported back, often with patches.) The Leptonica license only requires that any use of the source, whether in original or modified form, must include the Leptonica copyright notice, and that modified versions must be clearly marked as such."
=What I have done=
*Initialize, plan, and implement this project.
*Write a prototype of WebIDL for WebDIP.
**MediaStream as source node and destination node for WebDIP.
**For the HTMLImageElement part as a source node, there is a temporary solution.
**A face detection node. It can be used with MediaStream and HTMLImageElement on both the browser and the B2G Flame.
**A text detection/recognition node. It can be used with MediaStream and HTMLImageElement on the browser.
-->
<!--
=To Do List=
*Extend MediaStreamTrack API? => CTai
**Need to integrate with Canvas2DContext and WebGL.
*OfflineMediaContext study. => Kaku (our new hire!!)
*An API for image processing and object detection. => TBD
**We need such an API for B2G privileged applications (or opencv-asm.js for general apps).
*How to compile OpenCV to asm.js => Kaku, CJay
**Try to figure out how to pass the ImageBitmap from the VideoWorker to OpenCV-asm.js.
*Compare native OpenCV/Tesseract with the asm.js version. => TBD
-->
<!--
=Fixme List (Known Issues)=
*OpenCV can't be built with STLport; it only supports GNU STL.
**B2G can't be built with GNU STL.
*Text detection and recognition can't run on B2G.
**Some OpenCV APIs take STL types as arguments. A mismatched STL will cause runtime errors.
*Tesseract-OCR build
**Uses the pre-installed Tesseract-OCR for now. Maybe we should support a source-code build of Tesseract-OCR.
*Improve the precision rate of text recognition.
**The actual precision rate should be higher than in my rough prototype. Need to improve it.
*Separate OCR initialization.
**Prevent redundant initialization.
*Haven't done OpenCL integration in Gecko.
**OpenCV has a lot of OpenCL integration. We should take advantage of it.
*Canvas2DContext and WebGL can't run in a worker.
**Need bug 801176 and bug 709490 to land.<br>
*Need ImageBitmap for VideoWorkerEvent.
**Need bug 1044102 to land.<br>
-->
=Current Status=
*Meta bug: https://bugzilla.mozilla.org/show_bug.cgi?id=1100203
*MediaStream with worker: editor's draft completed, refining the prototype for the review process. See https://bugzilla.mozilla.org/show_bug.cgi?id=1044102
*ImageBitmap: in the review process. See https://bugzilla.mozilla.org/show_bug.cgi?id=801176
*ImageBitmap extension: editor's draft completed, refining the prototype for the review process. See https://bugzilla.mozilla.org/show_bug.cgi?id=1141979<br>
*OpenCV.js: working on it. See http://people.mozilla.org/~cku/opencv/
*OfflineMediaContext: not yet started.
*WebImage: not yet started.
*Run WebGL in a worker: in the review process. See https://bugzilla.mozilla.org/show_bug.cgi?id=709490<br>
*CanvasRenderingContext2D in a worker: waiting for someone to take this bug. See https://bugzilla.mozilla.org/show_bug.cgi?id=801176<br>
=Next Phase (2015 H2)=
*Try to land and standardize MediaStream with worker, ImageBitmap, and the ImageBitmap extension. See [3] for how to run the standardization process in Mozilla.
*Design an automated way to export the JavaScript API for OpenCV.js. Try to upstream it to the OpenCV code base.
*Start to work on OfflineMediaContext.
*Support product requirements, for example, an Instagram-like app, wide-angle panorama, or Fox photo.
*Do some exploratory experiments on the WebImage concept.
*Initialize a sub-project called [https://wiki.mozilla.org/Project_GlovePuppetry Project GlovePuppetry]
=Beyond 2015=
*Proof of concept for WebImage.
*A crazy idea of Kernel.js or SPIR.js for JS developers to customize WebImage?
*Gestural control API with a depth camera? => WebNI (Web Natural Interaction)?
*[https://wiki.mozilla.org/Project_Cangjie Project Cangjie]
=Conclusion=
Step by step, we can make things different. FoxEye is a pioneering project trying to push the Web boundary into a new area: image processing and computer vision. The key factor in ensuring this project's success is you, every Web contributor. Your support, your promotion, your feedback, and your comments are the nutrition of this project. Please share this project with every Web developer. Let's bring more and more amazing things to the Web, and let's enrich and improve the competitiveness of the whole Web platform.
=References=
*[1]: "WebAudio Spec", http://www.w3.org/TR/webaudio/
*[2]: "Media Capture Stream with Worker", https://w3c.github.io/mediacapture-worker/
*[3]: "Mozilla Standards", https://wiki.mozilla.org/Standards
=Acknowledgements=
The whole idea of adopting WebAudio as the reference design for this project came from a conversation with John Lin. Thanks to Robert O'Callahan for his great feedback and comments. Thanks to John Lin and Chia-jung Hung for their useful suggestions and ideas. Also, big thanks to my team members who helped me debug the code.
=About Authors=
==CTai==
My name is [https://tw.linkedin.com/pub/chia-hung-tai/36/8b6/712 Chia-hung Tai]. I am a senior software engineer in the Mozilla Taipei office, working on Firefox OS multimedia. Before this job, I had some experience in OpenCL, NLP (natural language processing), data mining, and machine learning. My IRC nickname is ctai. You can find me in the #media, #mozilla-taiwan, and #b2g channels. You can also reach me via email (ctai at mozilla dot com).
==Kaku==
[https://tw.linkedin.com/in/kakukogou Tzuhuo Kuo] is an engineer in the Mozilla Taipei office.
==CJ Ku==
[https://www.linkedin.com/pub/cj-ku/62/55b/a1b CJ Ku] is responsible for the OpenCV.js part.