| Line 473: | Line 473: | ||
The primary feature of Project Naptha is text detection rather than optical character recognition. The author implemented a text detection algorithm called the Stroke Width Transform, invented by Microsoft Research in 2008, which identifies regions of text in a language-agnostic manner and runs inside a Web Worker. Once a user begins to select some text, Project Naptha runs character recognition algorithms to determine exactly what is being selected. The default OCR engine is a built-in, pure-JavaScript port of the open-source Ocrad OCR engine. There’s also the option of sending the selected region to a cloud-based text recognition service powered by Tesseract, Google’s (formerly HP’s) award-winning open-source OCR engine, which supports dozens of languages and uses an advanced language model.
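The detect-first, recognize-on-selection flow described above can be illustrated with a minimal sketch. It is not Project Naptha's actual code: the class name <code>LazyTextLayer</code> and the <code>detect</code> and <code>recognize</code> callables are hypothetical stand-ins for a text detector (such as a Stroke Width Transform implementation) and an OCR engine (such as Ocrad or Tesseract).

```python
from typing import Callable, Dict, List, Optional, Tuple

# (x, y, width, height) of a detected text region; this layout is an assumption.
Box = Tuple[int, int, int, int]


class LazyTextLayer:
    """Detect text regions eagerly, but run OCR only when a region is selected."""

    def __init__(
        self,
        detect: Callable[[object], List[Box]],    # hypothetical detector
        recognize: Callable[[object, Box], str],  # hypothetical OCR engine
    ) -> None:
        self._detect = detect
        self._recognize = recognize
        self._image: object = None
        self.regions: List[Box] = []
        self._cache: Dict[Box, str] = {}

    def load(self, image: object) -> None:
        """Run the (cheap) detection step as soon as an image is available."""
        self._image = image
        self.regions = self._detect(image)
        self._cache.clear()

    def select(self, x: int, y: int) -> Optional[str]:
        """Return the text under (x, y), running OCR at most once per region."""
        for box in self.regions:
            bx, by, bw, bh = box
            if bx <= x < bx + bw and by <= y < by + bh:
                if box not in self._cache:
                    self._cache[box] = self._recognize(self._image, box)
                return self._cache[box]
        return None  # the point is not inside any detected text region
```

The design point is that recognition, the expensive step, is deferred until the user actually selects something, and its result is cached per region, so repeated selections of the same region do not trigger OCR again. Swapping the local engine for a remote service would only mean passing a different <code>recognize</code> callable.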
<!-- | |||
=Open Source Library and Licenses=
*OpenCV: OpenCV is an open-source, BSD-licensed library that includes several hundred computer vision algorithms.
| Line 480: | Line 481: | ||
<br>"The Leptonica copyright is less restrictive still. We use the BSD 2-clause license, which is similar to the Apache license but much briefer. Kirk McKusick playfully called the BSD license a copycenter, as differentiated from the usual copyright and the GPL copyleft: "Take it down to the copy center and make as many copies as you want." The BSD restrictions can be approximately summarized as: (1) Don't pretend that you wrote this, and (2) Don't sue us if it doesn't work.<br> For Leptonica, as with Apache and BSD, modifications of the source code that are used in commercial products can be made without any obligation to make them available in open source. (This does happen voluntarily -- it's likely that the majority of issues noted by people working on commercial products are reported back, often with patches.) The Leptonica license only requires that any use of the source, whether in original or modified form, must include the Leptonica copyright notice, and that modified versions must be clearly marked as such."
=What I have done=
*Initiated, planned, and implemented this project.