PDF.js: Difference between revisions

From MozillaWiki
Revision as of 06:06, 19 June 2011

PDF.js is an HTML5-based Portable Document Format renderer.

Project Manager: Pascal Finette

Developers: Andreas Gal (part-time), Chris Jones (part-time), Vivien Nicolas (part-time), Shaon Barman

Repository: https://github.com/andreasgal/pdf.js

IRC: #pdfjs on irc.mozilla.org

Milestone: Big-splash demo

This will probably be a pixel-perfect rendering of the TraceMonkey paper, with a nontrivial UI (i.e. eye candy).

Pixel-perfect rendering

  • Type1 fonts
  • Bitmaps and SMask blending
  • canvas.setDash()
  • even-odd fills
  • axial shading
  • TTF fonts (pass the sanitizer)

Non-trivial UI

  • zooming
  • pre-rendering pages
  • "continuous" scrolling

Milestone: PDF.js Firefox extension 1.0

PDF.js/Planning/1.0

Minimum Feature Set

Schedule

(TODO)

Backend

  • zooming
    • the general idea is that the UI will set a zoom factor, say 200%
    • we'll redraw the canvas, but with a scale transform to 2x, and a translation set to move the content we want to fill the screen to the top-left
  • draw subpage
  • SVG backend
  • linearization
    • byte range requests
  • hyperlinks (hash URLs, intra-doc links)
  • perf (use workers for some stuff?)
  • color spaces (big, pervasive)
  • build something like gecko's display list, for hit testing
    • click-on-link (easy)
    • text selection (hard)
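
The zooming idea above can be sketched as follows. This is only an illustration, assuming the UI passes a zoom factor (2 for 200%) and the page-space point that should land at the canvas's top-left corner. The helper names zoomTransform, applyZoom, and pageToCanvas are hypothetical and not part of pdf.js.

```javascript
'use strict';

// Hypothetical helper: compute the canvas transform for a zoom factor
// (2 for 200%) that moves the page point (focusX, focusY) to the
// top-left corner of the canvas.
function zoomTransform(zoom, focusX, focusY) {
  return {
    scale: zoom,
    translateX: -focusX * zoom,
    translateY: -focusY * zoom
  };
}

// Apply the transform to a 2D context before redrawing the page.
// setTransform(a, b, c, d, e, f) maps (x, y) to
// (a * x + c * y + e, b * x + d * y + f).
function applyZoom(ctx, zoom, focusX, focusY) {
  var t = zoomTransform(zoom, focusX, focusY);
  ctx.setTransform(t.scale, 0, 0, t.scale, t.translateX, t.translateY);
}

// Map a page-space point to canvas space under that transform.
function pageToCanvas(t, x, y) {
  return {
    x: x * t.scale + t.translateX,
    y: y * t.scale + t.translateY
  };
}
```

Setting the full matrix with setTransform, rather than calling scale() and translate() in sequence, avoids having to reason about the order in which the two operations compose.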

UI

  • animations (page flip, etc.)
  • hyperlinks
  • page transitions
  • dual-page display
  • page-transition animations
  • pan/zoom/next/prev gestures (Edit: Felipe Gomes and cjones discussed a better way to support these, but it will require new web APIs)

Platform

  • TextMetrics.maxHeight (to compute more accurate bounding boxes; can approximate without this, though)
  • implement text selection in SVG documents
  • (determine extent of SVG a11y implementation, if any)

Testing

  • reftest-style harness, compare hand-written PDF commands to hand-written canvas (?)
  • compare to poppler output, keep list of differences
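
One possible core for a reftest-style comparison is a pixel diff over two rendered buffers. The sketch below assumes both renderings are available as flat RGBA arrays of equal size (for example, the data of two ImageData objects). The function name and the tolerance parameter are assumptions, not an existing harness API.

```javascript
'use strict';

// Hypothetical reftest comparison: count the pixels that differ between
// two RGBA buffers of equal length. `tolerance` is the largest
// per-channel difference that still counts as a match (default 0).
function countDifferingPixels(actual, expected, tolerance) {
  if (actual.length !== expected.length) {
    throw new Error('buffers must have the same length');
  }

  var maxDelta = tolerance || 0;
  var differing = 0;

  for (var i = 0; i < actual.length; i += 4) {
    for (var c = 0; c < 4; c++) {
      if (Math.abs(actual[i + c] - expected[i + c]) > maxDelta) {
        differing++;
        break; // count each pixel at most once
      }
    }
  }

  return differing;
}
```

A test would pass when the count is zero, and a nonzero tolerance could absorb small antialiasing differences if those turn out to be a problem.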

Analysis

  • dump stream info
  • dump font info
  • dump raster image info

Big project: Color spaces

Approach: map input color values (fillcolor, strokecolor etc.) to output color space. Map input bitmaps to output space with SVG color-matrix filter/WebGL shader program/hand-written JS as available. Problem: will this work correctly for interpolated color values, like intermediate colors in a gradient, and other computed values like the result of composition operators? Does canvas need color-space support? Do we care enough? (What do other PDF renderers do?)
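
To make the interpolation concern concrete, here is a minimal sketch using the common naive DeviceCMYK-to-RGB formula with no color profile. That formula is an assumption chosen for illustration, not a decision about what pdf.js should do. Because the mapping is not linear, interpolating in CMYK and then converting can give a different result from converting the endpoints and interpolating in RGB.

```javascript
'use strict';

// Naive DeviceCMYK -> RGB mapping with no ICC profile (an assumption;
// a real renderer may need a calibrated conversion).
// Inputs are in [0, 1]; outputs are integers in [0, 255].
function cmykToRgb(c, m, y, k) {
  return [
    Math.round(255 * (1 - c) * (1 - k)),
    Math.round(255 * (1 - m) * (1 - k)),
    Math.round(255 * (1 - y) * (1 - k))
  ];
}

// Linear interpolation between two equal-length component arrays.
function lerp(a, b, t) {
  return a.map(function (value, i) {
    return value + (b[i] - value) * t;
  });
}

// Midpoint of a gradient from pure cyan to pure black, computed two ways.
var cyan = [1, 0, 0, 0];
var black = [0, 0, 0, 1];

// 1. Interpolate in CMYK, then convert the midpoint.
var mid = lerp(cyan, black, 0.5);
var viaCmyk = cmykToRgb(mid[0], mid[1], mid[2], mid[3]);

// 2. Convert the endpoints, then interpolate in RGB.
var viaRgb = lerp(cmykToRgb(1, 0, 0, 0), cmykToRgb(0, 0, 0, 1), 0.5);
```

The red channel of viaCmyk comes out nonzero while the red channel of viaRgb is zero, so a gradient handed to canvas as two converted RGB stops would not match one evaluated in the source space.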

Big project: Hyperlinks

  • Parse link data from PDF
  • Add UI to highlight/set cursor on link hover
  • Implement "go to point X in page Y" interface in backend
  • Figure out encoding scheme for absolute links, e.g. http://foo.com/bar.pdf#[encoded link]
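
As a starting point for the encoding question, the sketch below packs a "go to point X in page Y" target into the URL fragment as key=value pairs. The key names (page, x, y) and the function names are purely hypothetical; no scheme has been chosen.

```javascript
'use strict';

// Hypothetical encoding of a link target as a URL fragment,
// e.g. "page=3&x=72&y=540". The key names are assumptions.
function encodeLinkTarget(target) {
  return 'page=' + encodeURIComponent(target.page) +
         '&x=' + encodeURIComponent(target.x) +
         '&y=' + encodeURIComponent(target.y);
}

// Parse a fragment produced by encodeLinkTarget. Unknown keys and
// non-numeric values are ignored. Returns null if no page is given.
function decodeLinkTarget(fragment) {
  var target = {};
  var pairs = fragment.replace(/^#/, '').split('&');

  for (var i = 0; i < pairs.length; i++) {
    var kv = pairs[i].split('=');
    if (kv.length !== 2) {
      continue;
    }

    var key = kv[0];
    var value = Number(decodeURIComponent(kv[1]));
    if ((key === 'page' || key === 'x' || key === 'y') && !isNaN(value)) {
      target[key] = value;
    }
  }

  return ('page' in target) ? target : null;
}

// Build an absolute link to a point inside another PDF.
function absoluteLink(pdfUrl, target) {
  return pdfUrl + '#' + encodeLinkTarget(target);
}
```

For example, absoluteLink('http://foo.com/bar.pdf', { page: 3, x: 72, y: 540 }) yields http://foo.com/bar.pdf#page=3&x=72&y=540, and intra-document links could reuse the same fragment without the URL prefix.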

Big project: SVG backend

Most of SVG maps well to PDF (was influenced by?). There are existing PDF->SVG translators. Perf is the biggest concern. We want to build the SVG document in the background, without affecting main-thread interactivity. The way to do that is by building the document with a Web Worker thread. The problem is, Workers don't have access to any DOM APIs. We'll probably need to build the document as a string in the background, then send it over to the main thread for parsing.
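
The string-building approach could look roughly like this. It is a sketch under stated assumptions: the drawing operations are a made-up list of plain objects (type 'rect' or 'text'), and buildSvgString is a hypothetical name. Since it uses no DOM APIs, code like this could run inside a Worker.

```javascript
'use strict';

// Escape text for use in XML character data or attribute values.
function escapeXml(text) {
  return String(text)
    .replace(/&/g, '&amp;')
    .replace(/</g, '&lt;')
    .replace(/>/g, '&gt;')
    .replace(/"/g, '&quot;');
}

// Build an SVG document as a string from a list of hypothetical
// drawing operations. Only string concatenation is used.
function buildSvgString(width, height, ops) {
  var parts = ['<svg xmlns="http://www.w3.org/2000/svg" width="' + width +
               '" height="' + height + '">'];

  for (var i = 0; i < ops.length; i++) {
    var op = ops[i];
    if (op.type === 'rect') {
      parts.push('<rect x="' + op.x + '" y="' + op.y +
                 '" width="' + op.width + '" height="' + op.height +
                 '" fill="' + escapeXml(op.fill) + '"/>');
    } else if (op.type === 'text') {
      parts.push('<text x="' + op.x + '" y="' + op.y + '">' +
                 escapeXml(op.text) + '</text>');
    }
  }

  parts.push('</svg>');
  return parts.join('');
}

// In a Worker, the result would be sent with
//   self.postMessage(buildSvgString(612, 792, ops));
// and the main thread could parse it with
//   new DOMParser().parseFromString(svgString, 'image/svg+xml');
```

The cost of parsing the string on the main thread is still unknown and would need measuring, since that step cannot be moved to the background.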

Big project: Text selection

  • Option 1: In SVG backend
    • Draw to canvas first. On first selection, switch to SVG-rendered content.
    • Let Gecko do all text selection in SVG document
  • Option 2: In canvas backend
    • Build data structure representing text drawn to screen (e.g., display list/BSP/etc.). For best results, collapse adjacent and same-height/width "text runs".
    • Walk data structure and compute textruns at a particular point and/or within a bounding box
    • Add UI for "highlighted" text above PDF and saving selected text to clipboard
      • Corner cases: clipped text, occluded text, non-white backgrounds, non-black text
    • Maybe: render without display-list building first, then on first selection re-interpret PDF to build display list. Or pre-build display list in the background.
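
For Option 2, the simplest possible data structure is a flat list of text runs with bounding boxes. The sketch below assumes a hypothetical run record of the form {x, y, width, height, text} with (x, y) at the top-left corner. A display list or BSP tree would replace the linear scans if they prove too slow.

```javascript
'use strict';

// Collapse runs that share a top edge and height and touch horizontally.
// Assumes `runs` is in drawing order, left to right within a line.
function collapseRuns(runs) {
  var out = [];

  for (var i = 0; i < runs.length; i++) {
    var run = runs[i];
    var last = out[out.length - 1];

    if (last && last.y === run.y && last.height === run.height &&
        Math.abs(last.x + last.width - run.x) < 0.5) {
      last.width = run.x + run.width - last.x;
      last.text += run.text;
    } else {
      out.push({ x: run.x, y: run.y, width: run.width,
                 height: run.height, text: run.text });
    }
  }

  return out;
}

// Return the first run whose bounding box contains the point, or null.
function runAtPoint(runs, px, py) {
  for (var i = 0; i < runs.length; i++) {
    var run = runs[i];
    if (px >= run.x && px < run.x + run.width &&
        py >= run.y && py < run.y + run.height) {
      return run;
    }
  }
  return null;
}

// Return every run whose bounding box intersects the given box.
function runsInBox(runs, box) {
  return runs.filter(function (run) {
    return run.x < box.x + box.width && box.x < run.x + run.width &&
           run.y < box.y + box.height && box.y < run.y + run.height;
  });
}
```

This ignores the corner cases listed above (clipped and occluded text in particular), which would need extra information recorded per run.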

Big project: Accessibility

Kind of like text selection, except there's no web-visible accessibility API we could hook with canvas. So

  • Somehow detect that a11y is enabled, permanently switch to SVG backend
  • Let Gecko implement a11y interfaces

(Possibly) Big project: Vertical text

Somewhat pervasive mode switch in text-drawing code. Is it just a matter of transform hackery to put glyphs in the right place, or do we need canvas support? Canvas support might be a big project.

Utils

To uncompress a PDF

Coding Style

  • add a "use strict"; statement (exactly that!) to the top of your JS files
  • 2 spaces for indentation. (sbarman: it seems like it's 4 currently in pdf.js) (cjones: we're going to fix pdf.js after Type1 fonts merge)
  • Line breaks are free (I promise); don't hesitate to use them to separate logical blocks inside your functions.
  • Adding a toString method to an object, to print information about that particular object to the console, is helpful when debugging.
  • Be sure to declare a variable with 'var' before using it; you don't want to be hurt by random variables living on the global scope.
  • Files are named like_this.js.
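
A short file following the rules above might look like this. The PageBox object and the totalArea function are invented for illustration only. Under the naming rule the file would be called something like page_box.js.

```javascript
'use strict';

// Hypothetical object used only to illustrate the style rules.
function PageBox(x, y, width, height) {
  this.x = x;
  this.y = y;
  this.width = width;
  this.height = height;
}

// A toString method makes console output readable when debugging.
PageBox.prototype.toString = function () {
  return 'PageBox(' + this.x + ', ' + this.y + ', ' +
         this.width + ', ' + this.height + ')';
};

function totalArea(boxes) {
  // Every variable is declared with 'var' before use.
  var total = 0;
  var count = boxes.length;

  // A blank line separates setup from the loop.
  for (var i = 0; i < count; i++) {
    var box = boxes[i];
    total += box.width * box.height;
  }

  return total;
}
```

With "use strict" in place, assigning to an undeclared variable throws a ReferenceError instead of silently creating a global.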

Useful resources:

Also, some particular points (sentences stolen from https://developer.mozilla.org/en/JavaScript_style_guide):

  • Don't use object methods and properties more than you have to. It is often faster to store the result in a temporary variable.

If you have to do DOM manipulations (hopefully not!):

  • Don't call getAttribute to see if an attribute exists, call hasAttribute instead.
  • Prefer to loop through childNodes rather than using first/lastChild with next/previousSibling. But prefer hasChildNodes() to childNodes.length > 0. Similarly prefer document.getElementsByTagName(aTag).item(0) != null to document.getElementsByTagName(aTag).length > 0.

Review (aka pull-request) policy

NB: this isn't being enforced yet

  • New code has to pass all tests (FORTHCOMING)
  • New code can't regress performance on (TBD) as measured by (TBD). Unless the new code implements a new feature major enough to suffer a temporary perf regression. This is up to common sense.
  • Major new features should have architectural review from (TBD). Less major patches can be reviewed by (TBD).