PDF.js/Meeting2011-10-13


- Adil will be in SF next week

   Will give a talk about pdf.js at Unicode conference in SF
   Background: got Mac fonts working, worked on Arabic font support

- Brendan

   Initial corpus report
   Looks like poppler is messing up positioning (ours is good, aligns with Preview)
   Using "perception diff" for comparing images with different rendering backends
   Will present results at show-and-tell

- Julian

   Responded to Web Print API: https://bugzilla.mozilla.org/show_bug.cgi?id=691140
   Assuming Web Workers land, what to do next?
       Make sure Workers are enabled
       Optimizing the code using multiple Workers
       Or doing more work on the SVG backend    
   @Artur 
       Q: What's the motivation for SVG?
       A: Printing, text selection
       https://github.com/jviereck/pdf.js/tree/svg
       serve svg/index.html        
   How to handle major pull request?
       @Artur:
           Feels like there's complexity added, not yet performance gain
           Would prefer trying something simple first (1x worker), if main goal is to free up UI
           Most desktop CPUs have only 2 cores; more than 1 worker might slow things down due to context switching and message-passing overhead
       @Brendan:
           Agrees that we need *some* worker patch soon, as some docs freeze the UI
       Consensus:
           Wait for Chris's opinion!

- Artur

   went over Julian's patch to give feedback on the worker changes
   looking at different PDFs to figure out how they break up words
   basic research for text selection
   @Julian:
       try Crocodoc on how they do the output
       output the IR queue to inspect it
   -> did the research on Crocodoc
       they don't break words, but join them into one span
   inspector is great, but maybe really just go with an IR dump
   @Julian
       keep it simple for now
       iterate/make it awesome later on
   
   concern: search!
   
   @Brendan: 
       does search work for svg?
       how about long documents
   @Julian:
       svg search might work, but we still have to be able to search pages that are not yet rendered
       search should be based on pdf data, not on rendering output
   @Adil:
       also think about: text positioning might be anywhere -> no real order
       Arabic font rendering -> how to determine the text?
   maybe use canvas rendering for text at first, before moving to HTML5
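
The Crocodoc observation above (word pieces are joined into one span rather than emitted separately) can be sketched roughly as follows. This is only an illustration, not pdf.js or Crocodoc code: the `Fragment` type, the `join_fragments` helper, and the `max_gap` threshold are all hypothetical, and it assumes the fragments sit on a single left-to-right line, which, as Adil points out, a PDF does not guarantee.

```python
from dataclasses import dataclass


@dataclass
class Fragment:
    """Hypothetical text fragment as it might come out of an IR dump."""
    text: str
    x: float      # left edge, in page units
    width: float  # advance width, in page units


def join_fragments(fragments, max_gap=1.0):
    """Merge fragments of one line into spans.

    A new span starts only when the horizontal gap to the previous
    fragment is larger than max_gap (an assumed, tunable threshold).
    """
    spans = []
    current = None
    # Assumption: one line, left-to-right; sort by position, not stream order.
    for frag in sorted(fragments, key=lambda f: f.x):
        if current is not None and frag.x - (current.x + current.width) <= max_gap:
            # Close enough to the previous fragment: extend the current span.
            current = Fragment(
                current.text + frag.text,
                current.x,
                frag.x + frag.width - current.x,
            )
        else:
            if current is not None:
                spans.append(current)
            current = frag
    if current is not None:
        spans.append(current)
    return spans
```

For example, fragments "Hel" and "lo" placed almost back to back would come out as one "Hello" span, while a fragment far to the right would stay separate. Right-to-left scripts such as Arabic would need a different ordering rule.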

- Yury

  @Artur, search might not be that bad. We just need to get the text without spaces and line breaks; Preview and Reader search by ignoring the spaces. The worker can scan the pages and collect only the text content;
  
  https://github.com/notmasteryet/pdf.js/tree/text-1 has a text selection prototype (html5 text span overlay); it selects and copies the text almost right. Search might work on those as well, but only for loaded pages;
  
  concern: we are receiving broken PDFs as issues; how do we handle those requests?
  
  @Julian there is also https://bugzilla.mozilla.org/show_bug.cgi?id=680617
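
The search idea Yury describes (match against page text with spaces and line breaks ignored) could look something like the sketch below. The function names are made up for illustration, and the per-page text is assumed to have been collected already, for example by a worker scanning the pages; nothing here is taken from the pdf.js code.

```python
def strip_whitespace(text):
    """Return text without whitespace, plus a map from each kept
    character's index back to its index in the original text."""
    chars = []
    index_map = []
    for i, ch in enumerate(text):
        if not ch.isspace():
            chars.append(ch)
            index_map.append(i)
    return "".join(chars), index_map


def find_ignoring_whitespace(page_text, query):
    """Return (start, end) offsets into page_text for every match of
    query, comparing both with all whitespace removed."""
    haystack, index_map = strip_whitespace(page_text)
    needle, _ = strip_whitespace(query)
    matches = []
    if not needle:
        return matches
    pos = haystack.find(needle)
    while pos != -1:
        start = index_map[pos]
        end = index_map[pos + len(needle) - 1] + 1
        matches.append((start, end))
        pos = haystack.find(needle, pos + 1)
    return matches
```

Because matching runs on extracted text rather than on rendered output, it would also cover pages that have not been rendered yet. The returned offsets point back into the original page text, so a hit can still be mapped to whatever spans or glyph positions were recorded for it.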