PDF.js/Meeting2011-10-13
From MozillaWiki
- Adil
  Will be in SF next week
  Will give a talk about pdf.js at the Unicode conference in SF
  Background: got Mac fonts working, worked on Arabic font support
- Brendan
  Initial corpus report
  Looks like poppler is messing up positioning (ours is good, aligns with Preview)
  Using "perception diff" for comparing images with different rendering backends
  Will present results at show and tell
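The image comparison mentioned in Brendan's notes could be approximated as below. This is only a minimal sketch, assuming both backends produce RGBA pixel buffers of the same size (as a canvas `ImageData.data` does). It is a plain per-channel tolerance check, not the algorithm of the "perception diff" tool, and `diffRatio` and `tolerance` are hypothetical names introduced here for illustration.

```typescript
// Hypothetical helper, not part of pdf.js or of any existing diff tool.
// Both buffers hold RGBA data, 4 bytes per pixel.
function diffRatio(
  a: Uint8ClampedArray,
  b: Uint8ClampedArray,
  tolerance: number
): number {
  if (a.length !== b.length || a.length % 4 !== 0) {
    throw new Error("buffers must be RGBA data of equal size");
  }
  const pixelCount = a.length / 4;
  if (pixelCount === 0) {
    return 0;
  }
  let differing = 0;
  for (let i = 0; i < a.length; i += 4) {
    // A pixel counts as different if any channel is off by more than `tolerance`.
    for (let c = 0; c < 4; c++) {
      if (Math.abs(a[i + c] - b[i + c]) > tolerance) {
        differing++;
        break;
      }
    }
  }
  // Fraction of pixels that differ, between 0 and 1.
  return differing / pixelCount;
}
```

A small tolerance absorbs anti-aliasing noise between backends, while a positioning error like the one suspected in poppler would push the ratio up noticeably.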
- Julian
  Responded to Web Print API: https://bugzilla.mozilla.org/show_bug.cgi?id=691140
  Assuming Web Workers land, what to do next?
    Make sure Workers are enabled
    Optimize the code using multiple Workers
    Or do more work on the SVG backend
  @Artur Q: What's the motivation for SVG?
    A: Printing, text selection
    https://github.com/jviereck/pdf.js/tree/svg
    serve svg/index.html
  How to handle the major pull request?
    @Artur: Feels like there's complexity added, but no performance gain yet
      Would prefer trying something simple first (1x worker) if the main goal is to free up the UI
      Most desktop CPUs have only 2 cores; more than 1 worker might slow things down due to context switching and message-passing overhead
    @Brendan: Agrees that we need *some* worker patch soon, as some docs freeze the UI
    Consensus: Wait for Chris's opinion!
- Artur
  Ran over Julian's patch to give feedback about the worker stuff
  Looking at different PDFs to figure out how they break up words
    Basic research for text selection
    @Julian: try Crocodoc to see how they do the output
    Output the IR queue to inspect it
    -> did the research for Crocodoc: they don't break words, but join them into one span
    Inspector is great, but maybe really just go with the IR dump
  @Julian: keep it simple for now, iterate/make it awesome later on
  Concern: search!
    @Brendan: does search work for SVG? How about long documents?
    @Julian: SVG search might work, but we still have to be able to search not-yet-rendered pages
      Search should be based on PDF data, not on rendering output
    @adil: also think about:
      Text positioning might be anywhere -> no real order
      Arabic font rendering -> determine text?
    Use canvas rendering at first for text, before using HTML5 maybe
- Yury
  @Artur: search might not be that bad. We just need to get the text without spaces and line breaks; Preview and Reader do search by ignoring the spaces. The worker can scan the pages and collect only the text content.
  https://github.com/notmasteryet/pdf.js/tree/text-1 has a text selection prototype (HTML5 text span overlay). It selects and copies the text almost right; search might work on those as well, but only for loaded pages.
  Concern: we are receiving broken PDFs as issues; how do we handle those requests?
    @Julian: there is also https://bugzilla.mozilla.org/show_bug.cgi?id=680617
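The whitespace-ignoring search Yury describes might look roughly like the sketch below. It assumes the text content of a page has already been collected into a single string; `findIgnoringWhitespace` and `Match` are hypothetical names and not part of pdf.js or of the text-1 prototype. It strips spaces and line breaks from both the page text and the query, searches the stripped text, and maps hits back to offsets in the original text.

```typescript
// Offsets into the original (unstripped) text; `end` is exclusive.
interface Match {
  start: number;
  end: number;
}

// Hypothetical helper: finds `query` in `text` while ignoring all whitespace.
function findIgnoringWhitespace(text: string, query: string): Match[] {
  // Build the stripped text and remember where each kept character came from.
  let stripped = "";
  const origIndex: number[] = [];
  for (let i = 0; i < text.length; i++) {
    const ch = text.charAt(i);
    if (!/\s/.test(ch)) {
      stripped += ch;
      origIndex.push(i);
    }
  }

  const needle = query.replace(/\s+/g, "");
  const matches: Match[] = [];
  if (needle.length === 0) {
    return matches;
  }

  // Scan the stripped text; overlapping hits are reported too.
  let from = 0;
  for (;;) {
    const pos = stripped.indexOf(needle, from);
    if (pos === -1) {
      break;
    }
    matches.push({
      start: origIndex[pos],
      end: origIndex[pos + needle.length - 1] + 1,
    });
    from = pos + 1;
  }
  return matches;
}
```

Because this works on extracted text rather than on rendered output, it could in principle be run per page in a worker, including for pages that have not been rendered yet, which is the concern Julian raised.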