PDF.js: Difference between revisions

Revision as of 06:06, 19 June 2011

PDF.js is an HTML5-based Portable Document Format renderer.

Project Manager: Pascal Finette

Developers: Andreas Gal (part-time), Chris Jones (part-time), Vivien Nicolas (part-time), Shaon Barman

Repository: https://github.com/andreasgal/pdf.js

IRC: #pdfjs on irc.mozilla.org

Milestone: Big-splash demo

Will probably be a pixel-perfect rendering of the TraceMonkey paper, with a non-trivial UI (i.e., eye candy).

Pixel-perfect rendering

~~Type1 fonts~~
~~Bitmaps and SMask blending~~
~~canvas.setDash()~~
~~even-odd fills~~
~~axial shading~~
TTF fonts (pass the sanitizer)

Non-trivial UI

~~zooming~~
~~pre-rendering pages~~
~~"continuous" scrolling~~

Milestone: PDF.js Firefox extension 1.0

PDF.js/Planning/1.0

Minimum Feature Set

Schedule

(TODO)

Backend

zooming
- the general idea is that the UI will set a zoom factor, say 200%
- we'll redraw the canvas, but with a scale transform to 2x, and a translation set to move the content we want to fill the screen to the top-left
draw subpage
SVG backend
linearization
- byte range requests
hyperlinks (hash URLs, intra-doc links)
perf (use workers for some stuff?)
color spaces (big, pervasive)
build something like gecko's display list, for hit testing
- click-on-link (easy)
- text selection (hard)
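
A minimal sketch of the zooming idea above, assuming an HTML canvas 2D context. The names `zoomTransform`, `pageToCanvas`, `redrawZoomed`, and the `drawPage` callback are hypothetical and not taken from pdf.js; only `save`, `restore`, `clearRect`, `scale`, and `translate` are real canvas calls.

```javascript
'use strict';

// Hypothetical helper: turn a zoom percentage and the page-space point that
// should end up at the canvas's top-left corner into transform parameters.
function zoomTransform(zoomPercent, originX, originY) {
  return {
    scale: zoomPercent / 100,
    translateX: -originX,
    translateY: -originY
  };
}

// Where a page-space point lands on the canvas under that transform.
function pageToCanvas(transform, x, y) {
  return {
    x: transform.scale * (x + transform.translateX),
    y: transform.scale * (y + transform.translateY)
  };
}

// Redraw the page through the transform. `drawPage` stands in for whatever
// routine paints the PDF content onto the 2D context.
function redrawZoomed(ctx, canvasWidth, canvasHeight, transform, drawPage) {
  ctx.save();
  ctx.clearRect(0, 0, canvasWidth, canvasHeight);
  ctx.scale(transform.scale, transform.scale);
  ctx.translate(transform.translateX, transform.translateY);
  drawPage(ctx);
  ctx.restore();
}
```

Scaling first and translating second means the translation is expressed in page units, so the same origin works at any zoom factor.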

UI

animations (page flip, etc.)
hyperlinks
page transitions
dual-page display
page-transition animations
~~pan/zoom/next/prev gestures~~ (Edit: Felipe Gomes and cjones discussed a better way to support these, but it will require new web APIs)

Platform

TextMetrics.maxHeight (to compute more accurate bounding boxes; can approximate without this, though)
implement text selection in SVG documents
(determine extent of SVG a11y implementation, if any)

Testing

reftest-style harness, compare hand-written PDF commands to hand-written canvas (?)
compare to poppler output, keep list of differences

Analysis

dump stream info
dump font info
dump raster image info

Big project: Color spaces

Approach: map input color values (fillcolor, strokecolor etc.) to output color space. Map input bitmaps to output space with SVG color-matrix filter/WebGL shader program/hand-written JS as available. Problem: will this work correctly for interpolated color values, like intermediate colors in a gradient, and other computed values like the result of composition operators? Does canvas need color-space support? Do we care enough? (What do other PDF renderers do?)
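
As an illustration of the "hand-written JS" fallback for bitmaps, here is a minimal sketch that applies a color matrix to RGBA pixel data. It assumes the conversion can be expressed as a 4x5 matrix in the same row-major layout as the SVG `feColorMatrix` `values` attribute; `applyColorMatrix` and `clampByte` are hypothetical names, and real color-space conversions may need more than a matrix.

```javascript
'use strict';

// Round and clamp a value to the 0..255 byte range.
function clampByte(value) {
  return Math.max(0, Math.min(255, Math.round(value)));
}

// Apply a 4x5 color matrix (20 numbers, row-major, last column is an offset)
// to RGBA pixel data in place. Channels are normalized to 0..1 first, which
// matches how feColorMatrix values are defined.
function applyColorMatrix(pixels, m) {
  for (var i = 0; i < pixels.length; i += 4) {
    var r = pixels[i] / 255;
    var g = pixels[i + 1] / 255;
    var b = pixels[i + 2] / 255;
    var a = pixels[i + 3] / 255;

    for (var row = 0; row < 4; row++) {
      var k = row * 5;
      var v = m[k] * r + m[k + 1] * g + m[k + 2] * b + m[k + 3] * a + m[k + 4];
      pixels[i + row] = clampByte(v * 255);
    }
  }
  return pixels;
}
```

For conversions that really are affine like this one, converting the endpoints of a gradient and then interpolating gives the same result as interpolating first and converting afterwards (ignoring clamping and rounding). Nonlinear conversions do not have that property, which is where the interpolation question above becomes a real concern.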

Big project: Hyperlinks

Parse link data from PDF
Add UI to highlight/set cursor on link hover
Implement "go to point X in page Y" interface in backend
Figure out encoding scheme for absolute links, e.g. http://foo.com/bar.pdf#[encoded link]
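
One possible encoding for absolute links, sketched under the assumption that a destination is a small set of numeric fields such as page, x, and y. The `key=value&key=value` fragment layout and the function names are hypothetical; nothing here is a decided scheme.

```javascript
'use strict';

// Serialize a destination object such as {page: 3, x: 72, y: 540}.
function encodeDestination(dest) {
  var parts = [];
  for (var key in dest) {
    if (Object.prototype.hasOwnProperty.call(dest, key)) {
      parts.push(encodeURIComponent(key) + '=' +
                 encodeURIComponent(String(dest[key])));
    }
  }
  return parts.join('&');
}

// Parse a URL fragment (with or without the leading '#') back into an
// object. This sketch assumes every value is numeric.
function decodeDestination(hash) {
  var dest = {};
  var text = hash.charAt(0) === '#' ? hash.substring(1) : hash;
  if (text === '') {
    return dest;
  }

  var parts = text.split('&');
  for (var i = 0; i < parts.length; i++) {
    var eq = parts[i].indexOf('=');
    if (eq < 0) {
      continue;
    }
    var key = decodeURIComponent(parts[i].substring(0, eq));
    dest[key] = Number(decodeURIComponent(parts[i].substring(eq + 1)));
  }
  return dest;
}

// Build an absolute link to a point inside a PDF.
function absoluteLink(pdfUrl, dest) {
  return pdfUrl + '#' + encodeDestination(dest);
}
```

With this layout, `absoluteLink('http://foo.com/bar.pdf', {page: 3, x: 72, y: 540})` yields `http://foo.com/bar.pdf#page=3&x=72&y=540`, and the decoded object can be handed to the "go to point X in page Y" interface.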

Big project: SVG backend

Most of SVG maps well to PDF (was influenced by?). There are existing PDF->SVG translators. Perf is the biggest concern. We want to build the SVG document in the background, without affecting main-thread interactivity. The way to do that is by building the document with a Web Worker thread. The problem is, Workers don't have access to any DOM APIs. We'll probably need to build the document as a string in the background, then send it over to the main thread for parsing.
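
A minimal sketch of the string-building approach, assuming the worker receives an already-interpreted list of drawing operations. The `ops` format (`rect` and `text` entries) and the function names are hypothetical; the point is only that the function uses no DOM APIs, so it could run inside a Worker.

```javascript
'use strict';

// Escape text so it is safe inside SVG element content and attributes.
function escapeXml(text) {
  return String(text)
    .replace(/&/g, '&amp;')
    .replace(/</g, '&lt;')
    .replace(/>/g, '&gt;')
    .replace(/"/g, '&quot;');
}

// Build a complete SVG document as a string from a list of drawing ops.
function buildSvgString(width, height, ops) {
  var out = ['<svg xmlns="http://www.w3.org/2000/svg" width="' + width +
             '" height="' + height + '">'];

  for (var i = 0; i < ops.length; i++) {
    var op = ops[i];
    if (op.type === 'rect') {
      out.push('<rect x="' + op.x + '" y="' + op.y +
               '" width="' + op.width + '" height="' + op.height +
               '" fill="' + escapeXml(op.fill) + '"/>');
    } else if (op.type === 'text') {
      out.push('<text x="' + op.x + '" y="' + op.y + '">' +
               escapeXml(op.text) + '</text>');
    }
  }

  out.push('</svg>');
  return out.join('');
}

// Inside a worker script (not run here), the result would be posted back:
//   onmessage = function (event) {
//     var d = event.data;
//     postMessage(buildSvgString(d.width, d.height, d.ops));
//   };
// The main thread could then parse it, for example with
//   new DOMParser().parseFromString(svgString, 'image/svg+xml');
```

Collecting fragments in an array and joining once keeps the sketch simple; whether that beats plain string concatenation depends on the engine.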

Big project: Text selection

Option 1: In SVG backend
- Draw to canvas first. On first selection, switch to SVG-rendered content.
- Let Gecko do all text selection in SVG document
Option 2: In canvas backend
- Build data structure representing text drawn to screen (e.g., display list/BSP/etc.). For best results, collapse adjacent and same-height/width "text runs".
- Walk data structure and compute textruns at a particular point and/or within a bounding box
- Add UI for "highlighted" text above PDF and saving selected text to clipboard
  - Corner cases: clipped text, occluded text, non-white backgrounds, non-black text
- Maybe: render without display-list building first, then on first selection re-interpret PDF to build display list. Or pre-build display list in the background.
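
A minimal sketch of Option 2's data structure, assuming each text run is recorded as an axis-aligned box with `x`/`y` at its top-left corner and that runs arrive in left-to-right drawing order. A flat array stands in for the display list or BSP, and all names are hypothetical; rotated, clipped, or occluded text is not handled.

```javascript
'use strict';

// Merge runs that sit on the same line, have the same height, and touch
// horizontally (within `tolerance` page units).
function collapseRuns(runs, tolerance) {
  var merged = [];
  for (var i = 0; i < runs.length; i++) {
    var run = runs[i];
    var last = merged[merged.length - 1];
    if (last && last.y === run.y && last.height === run.height &&
        Math.abs(last.x + last.width - run.x) <= tolerance) {
      merged[merged.length - 1] = {
        text: last.text + run.text,
        x: last.x,
        y: last.y,
        width: run.x + run.width - last.x,
        height: last.height
      };
    } else {
      merged.push(run);
    }
  }
  return merged;
}

// Runs whose box contains the point (px, py).
function runsAtPoint(runs, px, py) {
  return runs.filter(function (run) {
    return px >= run.x && px <= run.x + run.width &&
           py >= run.y && py <= run.y + run.height;
  });
}

// Runs whose box overlaps the selection box {x, y, width, height}.
function runsInBox(runs, box) {
  return runs.filter(function (run) {
    return run.x < box.x + box.width && run.x + run.width > box.x &&
           run.y < box.y + box.height && run.y + run.height > box.y;
  });
}
```

The linear scans are fine for a sketch; a spatial index would only matter once pages carry many runs.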

Big project: Accessibility

Kind of like text selection, except there's no web-visible accessibility API we could hook with canvas. So:

Somehow detect that a11y is enabled, permanently switch to SVG backend
Let Gecko implement a11y interfaces

(Possibly) Big project: Vertical text

Somewhat pervasive mode switch in text-drawing code. Is it just a matter of transform hackery to put glyphs in the right place, or do we need canvas support? Canvas support might be a big project.

Utils

To uncompress a PDF

install pdftk (http://www.pdflabs.com/tools/pdftk-the-pdf-toolkit/)
run |pdftk foo.pdf output uncompressed.foo.pdf uncompress|

Coding Style

add a "use strict"; statement (exactly that!) to the top of your JS files

2 spaces for indentation. (sbarman: it seems like it's 4 currently in pdf.js) (cjones: we're going to fix pdf.js after Type1 fonts merge)

Line breaks are free (I promise); don't hesitate to use them to separate logical blocks inside your functions.

Adding a toString method to an object to print information about that particular object to the console is helpful when debugging.

Be sure to declare a variable with 'var' before using it; you don't want to be hurt by random variables living on the global scope.

Files are named like_this.js.
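
The conventions above, put together in one small sketch. `PageBox` and `area` are hypothetical names used only for illustration; the file would be saved as something like page_box.js.

```javascript
'use strict';

// Hypothetical object, for illustration only.
function PageBox(x, y, width, height) {
  this.x = x;
  this.y = y;
  this.width = width;
  this.height = height;
}

// A toString method makes console output readable when debugging.
PageBox.prototype.toString = function () {
  return 'PageBox(' + this.x + ', ' + this.y + ', ' +
         this.width + ' x ' + this.height + ')';
};

function area(box) {
  // Declared with 'var', so these stay local to the function.
  var width = box.width;
  var height = box.height;

  return width * height;
}
```

Under "use strict", assigning to an undeclared variable throws a ReferenceError instead of silently creating a global, which is part of why the directive and the 'var' rule go together.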

Useful resources:

Also some particular points (sentence stolen from https://developer.mozilla.org/en/JavaScript_style_guide)

Don't use object methods and properties more than you have to. It is often faster to store the result in a temporary variable.
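
A small sketch of that advice, using a hypothetical `sumWidths` helper. Whether the temporary variable is measurably faster depends on the JavaScript engine, so treat this as a style illustration rather than a benchmark.

```javascript
'use strict';

// Sum the widths of a list of objects that each have a numeric `width`.
function sumWidths(runs) {
  var total = 0;
  var count = runs.length; // read the property once, not on every iteration

  for (var i = 0; i < count; i++) {
    total += runs[i].width;
  }
  return total;
}
```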

If you have to do DOM manipulations (hopefully not!):

Don't call getAttribute to see if an attribute exists, call hasAttribute instead.
Prefer to loop through childNodes rather than using first/lastChild with next/previousSibling. But prefer hasChildNodes() to childNodes.length > 0. Similarly prefer document.getElementsByTagName(aTag).item(0) != null to document.getElementsByTagName(aTag).length > 0.

Review (aka pull-request) policy

NB: this isn't being enforced yet

New code has to pass all tests (FORTHCOMING)
New code can't regress performance on (TBD) as measured by (TBD), unless it implements a new feature major enough to justify a temporary perf regression. This is up to common sense.
Major new features should have architectural review from (TBD). Less major patches can be reviewed by (TBD).

@@ Line 37: / Line 37: @@
 Backend
-* Need to parse/fetch/build pageDict ...
-** Type 1 fonts
-*** translate T1 into CFF, code already exists in fontforge (jfkthame thinks this is possible, with some effort)
-*** fixes broken ligatures?
-* flate
-** xref streams
-* linearization
-** byte range requests
 * zooming
 ** the general idea is that the UI will set a zoom factor, say 200%
 ** we'll redraw the canvas, but with a scale transform to 2x, and a translation set to move the content we want to fill the screen to the top-left
 * draw subpage
+* SVG backend
+* linearization
+** byte range requests
 * hyperlinks (hash URLs, intra-doc links)
 * perf (use workers for some stuff?)
-* images
-** decode image streams
-** apply soft masks
 * color spaces (big, pervasive)
 * build something like gecko's display list, for hit testing
 ** click-on-link (easy)
 ** text selection (hard)
-* SVG backend
-* page transitions
 UI
-* zooming
-* pan/zoom gestures
 * animations (page flip, etc.)
 * hyperlinks
 * page transitions
+* dual-page display
+* page-transition animations
+* <s>pan/zoom/next/prev gestures</s> (Edit: Felipe Gomes and cjones discussed a better way to support these, but it will require new web APIs)
 Platform
-* dashed stroking
-* ctx.getTransform
 * TextMetrics.maxHeight (to compute more accurate bounding boxes; can approximate without this, though)
-* (determine extent of SVG text-selection implementation, if any)
+* implement text selection in SVG documents
 * (determine extent of SVG a11y implementation, if any)
@@ Line 97: / Line 86: @@
 ==== Big project: SVG backend ====
-Most of SVG maps well to PDF (was influenced by?).  There are existing PDF->SVG translators.  Perf is the biggest concern.
+Most of SVG maps well to PDF (was influenced by?).  There are existing PDF->SVG translators.  Perf is the biggest concern.  We want to build the SVG document in the background, without affecting main-thread interactivity.  The way to do that is by building the document with a Web Worker thread.  The problem is, Workers don't have access to any DOM APIs.  We'll probably need to build the document as a string in the background, then send it over to the main thread for parsing.
 ==== Big project: Text selection ====

