PDF.js
PDF.js is an HTML5-based Portable Document Format renderer.
Project Manager: Pascal Finette
Developers: Andreas Gal (part-time), Chris Jones (part-time), Vivien Nicolas (part-time), Shaon Barman
Repository: https://github.com/andreasgal/pdf.js
IRC: #pdfjs on irc.mozilla.org
Milestone: Big-splash demo
Probably will be a pixel-perfect rendering of the TraceMonkey paper, with a nontrivial UI (i.e. eye candy).
Pixel-perfect rendering
- Type1 fonts
- Bitmaps and SMask blending
- canvas.setDash()
- even-odd fills
- axial shading
Non-trivial UI
- zooming
- gestures (pinch, next/prev, etc.). (Would only work on Windows 7 and Android, and would need a different implementation for each.)
- pre-rendering pages
- "continuous" scrolling
- dual-page display
- page-transition animations?
Milestone: PDF.js Firefox extension 1.0
Minimum Feature Set
Schedule
(TODO)
Backend
- Need to parse/fetch/build pageDict ...
- Type 1 fonts
- translate T1 into CFF, code already exists in fontforge (jfkthame thinks this is possible, with some effort)
- fixes broken ligatures?
- flate
- xref streams
- linearization
- byte range requests
- zooming
- the general idea is that the UI will set a zoom factor, say 200%
- we'll redraw the canvas with a 2x scale transform, plus a translation that moves the content that should fill the screen to the top-left
- draw subpage
- hyperlinks (hash URLs, intra-doc links)
- perf (use workers for some stuff?)
- images
- decode image streams
- apply soft masks
- color spaces (big, pervasive)
- build something like gecko's display list, for hit testing
- click-on-link (easy)
- text selection (hard)
- SVG backend
- page transitions
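The zooming idea in the Backend list (a scale transform plus a translation that moves the wanted content to the top-left) can be sketched roughly as follows. This is a minimal sketch, not code from pdf.js: zoomTransform, applyTransform, redrawZoomed and the drawPage callback are hypothetical names; only save(), setTransform() and restore() are standard canvas 2D context methods.

```javascript
// Hypothetical helper: build the canvas matrix [a, b, c, d, e, f] for a zoom
// factor (2 means 200%) and the page-space point (originX, originY) that
// should land at the top-left corner of the canvas.
function zoomTransform(zoom, originX, originY) {
  // A page point (x, y) maps to (zoom * (x - originX), zoom * (y - originY)).
  return [zoom, 0, 0, zoom, -zoom * originX, -zoom * originY];
}

// Apply a matrix in canvas order [a, b, c, d, e, f] to a point.
function applyTransform(m, x, y) {
  return [m[0] * x + m[2] * y + m[4], m[1] * x + m[3] * y + m[5]];
}

// Redraw the page with the zoom applied. drawPage is a hypothetical callback
// that issues the page's drawing commands in page coordinates.
function redrawZoomed(ctx, drawPage, zoom, originX, originY) {
  var m = zoomTransform(zoom, originX, originY);

  ctx.save();
  ctx.setTransform(m[0], m[1], m[2], m[3], m[4], m[5]);
  drawPage(ctx);
  ctx.restore();
}
```

With a zoom of 2 and an origin of (100, 50), the page point (100, 50) ends up at the canvas origin, and everything to its right and below is drawn at twice the size.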
UI
- zooming
- pan/zoom gestures
- animations (page flip, etc.)
- hyperlinks
- page transitions
Platform
- dashed stroking
- ctx.getTransform
- TextMetrics.maxHeight (to compute more accurate bounding boxes; can approximate without this, though)
- (determine extent of SVG text-selection implementation, if any)
- (determine extent of SVG a11y implementation, if any)
Testing
- reftest-style harness, compare hand-written PDF commands to hand-written canvas (?)
- compare to poppler output, keep list of differences
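The comparison step of a reftest-style harness might look something like the sketch below. It assumes both renderings are available as RGBA byte arrays (in a browser, for example, the data property of what ctx.getImageData() returns); countDifferingPixels and its tolerance parameter are hypothetical, not part of any existing harness.

```javascript
// Hypothetical comparison step: count pixels whose channels differ by more
// than `tolerance`. Both inputs are RGBA byte arrays of equal length.
function countDifferingPixels(actual, expected, tolerance) {
  if (actual.length !== expected.length)
    throw new Error("Snapshots have different sizes");

  var differing = 0;
  for (var i = 0; i < actual.length; i += 4) {
    for (var c = 0; c < 4; c++) {
      if (Math.abs(actual[i + c] - expected[i + c]) > tolerance) {
        differing++;
        break; // count each pixel at most once
      }
    }
  }

  return differing;
}
```

A test would pass when the count is zero; a small nonzero tolerance is one possible way to absorb antialiasing differences, at the cost of hiding subtle regressions.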
Analysis
- dump stream info
- dump font info
- dump raster image info
Big project: Color spaces
Approach: map input color values (fillcolor, strokecolor etc.) to output color space. Map input bitmaps to output space with SVG color-matrix filter/WebGL shader program/hand-written JS as available. Problem: will this work correctly for interpolated color values, like intermediate colors in a gradient, and other computed values like the result of composition operators? Does canvas need color-space support? Do we care enough? (What do other PDF renderers do?)
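As an illustration of the "hand-written JS" path, here is a minimal sketch that maps DeviceCMYK values and bitmaps to RGB. The conversion formula is the common naive approximation and is an assumption here; it ignores ICC profiles and is not claimed to match what other PDF renderers do. The names cmykToRgb and cmykBitmapToRgba are hypothetical.

```javascript
// Naive DeviceCMYK -> RGB mapping (an assumption; ignores ICC profiles).
// Inputs are in [0, 1]; outputs are integers in [0, 255].
function cmykToRgb(c, m, y, k) {
  return [
    Math.round(255 * (1 - c) * (1 - k)),
    Math.round(255 * (1 - m) * (1 - k)),
    Math.round(255 * (1 - y) * (1 - k))
  ];
}

// Convert a CMYK bitmap (4 bytes per pixel, each 0..255) into an opaque
// RGBA buffer of the same pixel count.
function cmykBitmapToRgba(cmyk) {
  var rgba = new Uint8ClampedArray(cmyk.length);

  for (var i = 0; i < cmyk.length; i += 4) {
    var rgb = cmykToRgb(cmyk[i] / 255, cmyk[i + 1] / 255,
                        cmyk[i + 2] / 255, cmyk[i + 3] / 255);
    rgba[i] = rgb[0];
    rgba[i + 1] = rgb[1];
    rgba[i + 2] = rgb[2];
    rgba[i + 3] = 255; // fully opaque
  }

  return rgba;
}
```

Because this mapping is not linear in general, converting two gradient endpoints and interpolating in RGB can give different intermediate colors than interpolating in CMYK and converting each value, which is the concern about interpolated colors raised above.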
Big project: Hyperlinks
- Parse link data from PDF
- Add UI to highlight/set cursor on link hover
- Implement "go to point X in page Y" interface in backend
- Figure out encoding scheme for absolute links, e.g. http://foo.com/bar.pdf#[encoded link]
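One possible encoding scheme for the "go to point X in page Y" target is sketched below. The hash format "#page=N&x=X&y=Y" and the function names are assumptions made for illustration only; nothing here has been decided for pdf.js.

```javascript
// Hypothetical hash format: "#page=<n>&x=<x>&y=<y>".
function encodeLinkTarget(target) {
  return "#page=" + target.page + "&x=" + target.x + "&y=" + target.y;
}

// Parse a hash back into a target. Missing or malformed fields keep their
// defaults (page 1, top-left corner); unknown fields are ignored.
function decodeLinkTarget(hash) {
  var target = { page: 1, x: 0, y: 0 };
  var fields = hash.replace(/^#/, "").split("&");

  for (var i = 0; i < fields.length; i++) {
    var pair = fields[i].split("=");
    var value = parseFloat(pair[1]);
    if (Object.prototype.hasOwnProperty.call(target, pair[0]) && !isNaN(value))
      target[pair[0]] = value;
  }

  return target;
}
```

An absolute link would then be the document URL followed by the encoded hash, for example "http://foo.com/bar.pdf" + encodeLinkTarget({ page: 3, x: 72, y: 540 }).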
Big project: SVG backend
Most of SVG maps well to PDF (was SVG influenced by it?). There are existing PDF->SVG translators. Perf is the biggest concern.
Big project: Text selection
- Option 1: In SVG backend
- Draw to canvas first. On first selection, switch to SVG-rendered content.
- Let Gecko do all text selection in SVG document
- Option 2: In canvas backend
- Build data structure representing text drawn to screen (e.g., display list/BSP/etc.). For best results, collapse adjacent and same-height/width "text runs".
- Walk data structure and compute textruns at a particular point and/or within a bounding box
- Add UI for "highlighted" text above PDF and saving selected text to clipboard
- Corner cases: clipped text, occluded text, non-white backgrounds, non-black text
- Maybe: render without display-list building first, then on first selection re-interpret PDF to build display list. Or pre-build display list in the background.
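Option 2 can be sketched with a very small display list of text runs. This is only an illustration under simplifying assumptions: TextDisplayList and its methods are hypothetical names, lookups are a linear scan rather than a BSP, and clipping, occlusion and transforms are ignored.

```javascript
// Hypothetical display list of text runs. Each run records the string drawn
// and its bounding box in canvas coordinates.
function TextDisplayList() {
  this.runs = [];
}

TextDisplayList.prototype = {
  // Record a run. It is merged into the previous run when it continues the
  // same line: same y and height, and it starts where the previous run ended.
  addRun: function(text, x, y, width, height) {
    var last = this.runs[this.runs.length - 1];
    if (last && last.y === y && last.height === height &&
        Math.abs(last.x + last.width - x) < 0.5) {
      last.text += text;
      last.width = x + width - last.x;
      return;
    }

    this.runs.push({ text: text, x: x, y: y, width: width, height: height });
  },

  // Return the run under the point (px, py), or null if there is none.
  runAt: function(px, py) {
    for (var i = 0; i < this.runs.length; i++) {
      var r = this.runs[i];
      if (px >= r.x && px < r.x + r.width &&
          py >= r.y && py < r.y + r.height)
        return r;
    }
    return null;
  },

  // Return the text of every run that intersects the given bounding box.
  textInBox: function(bx, by, bw, bh) {
    var found = [];
    for (var i = 0; i < this.runs.length; i++) {
      var r = this.runs[i];
      if (r.x < bx + bw && r.x + r.width > bx &&
          r.y < by + bh && r.y + r.height > by)
        found.push(r.text);
    }
    return found;
  }
};
```

The text-drawing code would call addRun() for each string it paints; the selection UI would then use runAt() for clicks and textInBox() for drag selections.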
Big project: Accessibility
Kind of like text selection, except there's no web-visible accessibility API we could hook into with canvas. So:
- Somehow detect that a11y is enabled, permanently switch to SVG backend
- Let Gecko implement a11y interfaces
(Possibly) Big project: Vertical text
Somewhat pervasive mode switch in text-drawing code. Is it just a matter of transform hackery to put glyphs in the right place, or do we need canvas support? Canvas support might be a big project.
Utils
To uncompress a PDF
- install pdftk (http://www.pdflabs.com/tools/pdftk-the-pdf-toolkit/)
- run |pdftk foo.pdf output uncompressed.foo.pdf uncompress|
Coding Style
- set javascript.options.strict to true in about:config
- 2 spaces for indentation. (sbarman: it seems like it's 4 currently in pdf.js) (cjones: we're going to fix pdf.js after Type1 fonts merge)
- Line breaks are free (I promise); don't hesitate to use them to separate logical blocks inside your functions.
- Adding a toString method to an object to print information about that particular object to the console is helpful when debugging.
- Be sure to declare a variable with 'var' before using it; you don't want to be hurt by random variables living on the global scope.
- Files are named like_this.js.
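A short sketch of what these points look like in practice. PageBox and totalArea are hypothetical names invented for this illustration and do not exist in pdf.js.

```javascript
// Hypothetical object used only to illustrate the style points above.
function PageBox(x, y, width, height) {
  this.x = x;
  this.y = y;
  this.width = width;
  this.height = height;
}

// A toString method makes console output readable when debugging.
PageBox.prototype.toString = function() {
  return "PageBox(" + this.x + ", " + this.y + ", " +
         this.width + "x" + this.height + ")";
};

function totalArea(boxes) {
  // Every variable is declared with 'var', so nothing leaks into the
  // global scope.
  var total = 0;
  var count = boxes.length;

  // A blank line separates the setup from the loop.
  for (var i = 0; i < count; i++) {
    var box = boxes[i];
    total += box.width * box.height;
  }

  return total;
}
```

For example, String(new PageBox(1, 2, 3, 4)) produces "PageBox(1, 2, 3x4)".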
Useful resources:
- https://developer.mozilla.org/En/Developer_Guide/Coding_Style#General_Practices
- https://developer.mozilla.org/En/Developer_Guide/Coding_Style#JavaScript_Practices
- https://developer.mozilla.org/En/Developer_Guide/Coding_Style#Naming_and_Formatting_code
Also some particular points (sentence stolen from https://developer.mozilla.org/en/JavaScript_style_guide)
- Don't use object methods and properties more than you have to. It is often faster to store the result in a temporary variable.
If you have to do DOM manipulations (hopefully not!):
- Don't call getAttribute to see if an attribute exists, call hasAttribute instead.
- Prefer to loop through childNodes rather than using first/lastChild with next/previousSibling. But prefer hasChildNodes() to childNodes.length > 0. Similarly prefer document.getElementsByTagName(aTag).item(0) != null to document.getElementsByTagName(aTag).length > 0.
Review (aka pull-request) policy
NB: this isn't being enforced yet
- New code has to pass all tests (FORTHCOMING)
- New code can't regress performance on (TBD) as measured by (TBD), unless it implements a new feature major enough to justify a temporary perf regression. This is up to common sense.
- Major new features should have architectural review from (TBD). Less major patches can be reviewed by (TBD).