PDF.js
Revision as of 06:23, 19 June 2011
PDF.js is an HTML5-based Portable Document Format renderer.
Project Manager: Pascal Finette
Developers: Andreas Gal (part-time), Chris Jones (part-time), Vivien Nicolas (part-time), Shaon Barman
Repository: https://github.com/andreasgal/pdf.js
IRC: #pdfjs on irc.mozilla.org
Milestone: Big-splash demo
Probably will be a pixel-perfect rendering of the TraceMonkey paper, with a nontrivial UI (i.e. eye candy).
Pixel-perfect rendering
- Type1 fonts
- Bitmaps and SMask blending
- canvas.setDash()
- even-odd fills
- axial shading
- TTF fonts (pass the sanitizer)
Non-trivial UI
- zooming
- pre-rendering pages
- "continuous" scrolling
Milestone: PDF.js Firefox extension 1.0
Minimum Feature Set
Schedule
(TODO)
Backend
- zooming
  - the general idea is that the UI will set a zoom factor, say 200%
  - we'll redraw the canvas with a 2x scale transform, plus a translation that moves the content we want on screen to the top-left
- draw subpage
- SVG backend
- linearization
- byte range requests
- hyperlinks (hash URLs, intra-doc links)
- perf (use workers for some stuff?)
- color spaces (big, pervasive)
- build something like gecko's display list, for hit testing
- click-on-link (easy)
- text selection (hard)
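The zooming idea above (scale by the zoom factor, then translate so the wanted content lands at the top-left) can be sketched as a single canvas transform. This is only an illustration: zoomTransform and applyTransform are hypothetical helper names, not existing pdf.js functions.

```javascript
"use strict";

// Hypothetical helper: compute the canvas transform [a, b, c, d, e, f]
// that scales page content by `zoom` and moves the page point
// (originX, originY) to the top-left corner of the canvas.
function zoomTransform(zoom, originX, originY) {
  return [zoom, 0, 0, zoom, -zoom * originX, -zoom * originY];
}

// Apply a transform to a point the same way a 2D canvas context does:
// x' = a*x + c*y + e, y' = b*x + d*y + f.
function applyTransform(m, x, y) {
  return [m[0] * x + m[2] * y + m[4], m[1] * x + m[3] * y + m[5]];
}

// In a page, the result would be handed to the 2D context, e.g.
//   ctx.setTransform.apply(ctx, zoomTransform(2, 100, 50));
// before re-running the page's drawing commands.
```

With a zoom of 200% and an origin of (100, 50), the page point (100, 50) maps to canvas (0, 0), and a point 50 units to its right lands 100 pixels to the right.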
UI
- animations (page flip, etc.)
- hyperlinks
- page transitions
- dual-page display
- page-transition animations
- pan/zoom/next/prev gestures (Edit: Felipe Gomes and cjones discussed a better way to support these, but it will require new web APIs)
Platform
- TextMetrics.maxHeight (to compute more accurate bounding boxes; can approximate without this, though)
- implement text selection in SVG documents
- (determine extent of SVG a11y implementation, if any)
Testing
- reftest-style harness, compare hand-written PDF commands to hand-written canvas (?)
- compare to poppler output, keep list of differences
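Whatever the harness ends up looking like, a reftest-style check comes down to comparing two rendered pixel buffers. A minimal sketch, assuming both renderings are available as same-sized RGBA byte arrays (for example the data arrays from canvas getImageData()); countDifferingPixels and its tolerance parameter are hypothetical.

```javascript
"use strict";

// Hypothetical helper: count pixels whose RGBA channels differ by more
// than `tolerance` between two same-sized RGBA buffers.
function countDifferingPixels(actual, expected, tolerance) {
  if (actual.length !== expected.length) {
    throw new Error("buffers must have the same size");
  }
  var differing = 0;
  for (var i = 0; i < actual.length; i += 4) {
    for (var c = 0; c < 4; c++) {
      if (Math.abs(actual[i + c] - expected[i + c]) > tolerance) {
        differing++;
        break; // count each pixel at most once
      }
    }
  }
  return differing;
}
```

A test would pass when the count is 0. A nonzero tolerance is one possible way to keep a list of known, acceptable differences against another renderer.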
Analysis
- dump stream info
- dump font info
- dump raster image info
Big project: Color spaces
Approach: map input color values (fillcolor, strokecolor etc.) to output color space. Map input bitmaps to output space with SVG color-matrix filter/WebGL shader program/hand-written JS as available. Problem: will this work correctly for interpolated color values, like intermediate colors in a gradient, and other computed values like the result of composition operators? Does canvas need color-space support? Do we care enough? (What do other PDF renderers do?)
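The interpolation concern can be made concrete with a small sketch. It assumes a naive, uncalibrated CMYK-to-RGB formula purely for illustration (a real implementation may need something quite different); cmykToRgb and lerp are hypothetical helpers.

```javascript
"use strict";

// Naive DeviceCMYK -> RGB conversion (components in [0, 1]). This is an
// uncalibrated textbook formula, used here only for illustration.
function cmykToRgb(c, m, y, k) {
  return [(1 - c) * (1 - k), (1 - m) * (1 - k), (1 - y) * (1 - k)];
}

// Linear interpolation between two equal-length color tuples.
function lerp(from, to, t) {
  return from.map(function (v, i) {
    return v + (to[i] - v) * t;
  });
}

var start = [0, 0, 0, 0]; // white in CMYK
var end = [1, 0, 0, 1];   // full cyan plus full black

// Interpolate in the input space, then convert.
var convertedMidpoint = cmykToRgb.apply(null, lerp(start, end, 0.5));

// Convert the endpoints, then interpolate in the output space.
var interpolatedMidpoint = lerp(cmykToRgb.apply(null, start),
                                cmykToRgb.apply(null, end), 0.5);
```

Here the red channel of the gradient midpoint is 0.25 when interpolating in CMYK and 0.5 when interpolating in RGB, so mapping only the endpoint colors does not reproduce the intermediate colors under this conversion.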
Big project: Hyperlinks
- Parse link data from PDF
- Add UI to highlight/set cursor on link hover
- Implement "go to point X in page Y" interface in backend
- Figure out encoding scheme for absolute links, e.g. http://foo.com/bar.pdf#[encoded link]
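One possible encoding for "go to point X in page Y" links is a set of key=value pairs in the hash. The sketch below assumes a made-up fragment format (#page=...&x=...&y=...); the format and the names encodeLink and decodeLink are hypothetical, not a decided scheme.

```javascript
"use strict";

// Hypothetical encoding: "#page=<n>&x=<x>&y=<y>" appended to the PDF URL.
function encodeLink(pdfUrl, dest) {
  return pdfUrl + "#page=" + dest.page + "&x=" + dest.x + "&y=" + dest.y;
}

// Parse a URL produced by encodeLink back into {url, page, x, y}.
// Returns null if the fragment is missing or incomplete.
function decodeLink(href) {
  var hashIndex = href.indexOf("#");
  if (hashIndex < 0) {
    return null;
  }
  var fields = {};
  href.slice(hashIndex + 1).split("&").forEach(function (pair) {
    var parts = pair.split("=");
    fields[parts[0]] = Number(parts[1]);
  });
  if (isNaN(fields.page) || isNaN(fields.x) || isNaN(fields.y)) {
    return null;
  }
  return { url: href.slice(0, hashIndex), page: fields.page,
           x: fields.x, y: fields.y };
}
```

The decoded page, x and y values are what the backend's "go to point X in page Y" interface would receive.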
Big project: SVG backend
Most of SVG maps well to PDF (was influenced by?). There are existing PDF->SVG translators. Perf is the biggest concern. We want to build the SVG document in the background, without affecting main-thread interactivity. The way to do that is by building the document with a Web Worker thread. The problem is, Workers don't have access to any DOM APIs. We'll probably need to build the document as a string in the background, then send it over to the main thread for parsing.
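A rough sketch of the string-building approach, assuming the page has already been reduced to a list of simple text operations ({x, y, text}); buildSvgString, escapeXml and the operation format are hypothetical. The builder touches no DOM APIs, so it could run inside a Worker.

```javascript
"use strict";

// Escape text for inclusion in XML character data.
function escapeXml(text) {
  return text.replace(/&/g, "&amp;").replace(/</g, "&lt;")
             .replace(/>/g, "&gt;");
}

// Hypothetical builder: turn a list of {x, y, text} operations into an
// SVG document string using only string concatenation.
function buildSvgString(width, height, ops) {
  var parts = ['<svg xmlns="http://www.w3.org/2000/svg" width="' + width +
               '" height="' + height + '">'];
  ops.forEach(function (op) {
    parts.push('<text x="' + op.x + '" y="' + op.y + '">' +
               escapeXml(op.text) + "</text>");
  });
  parts.push("</svg>");
  return parts.join("");
}

// Worker side (sketch):
//   self.onmessage = function (e) {
//     var d = e.data;
//     self.postMessage(buildSvgString(d.width, d.height, d.ops));
//   };
// Main-thread side (sketch):
//   worker.onmessage = function (e) {
//     var doc = new DOMParser().parseFromString(e.data, "image/svg+xml");
//   };
```

Parsing still happens on the main thread, so whether this actually protects interactivity for large pages would need measuring.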
Big project: Text selection
- Option 1: In SVG backend
  - Draw to canvas first. On first selection, switch to SVG-rendered content.
  - Let Gecko do all text selection in SVG document
- Option 2: In canvas backend
  - Build data structure representing text drawn to screen (e.g., display list/BSP/etc.). For best results, collapse adjacent and same-height/width "text runs".
  - Walk data structure and compute textruns at a particular point and/or within a bounding box
  - Add UI for "highlighted" text above PDF and saving selected text to clipboard
  - Corner cases: clipped text, occluded text, non-white backgrounds, non-black text
  - Maybe: render without display-list building first, then on first selection re-interpret PDF to build display list. Or pre-build display list in the background.
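For the canvas-backend option, the data structure and the two queries could look roughly like this. It is a sketch under simplifying assumptions: a flat array with linear scans stands in for a real display list or BSP, and the run shape ({x, y, width, height, text}, with (x, y) the top-left corner) and all function names are hypothetical.

```javascript
"use strict";

// Collapse runs that sit on the same line, have the same height, and
// touch horizontally. Assumes `runs` is in drawing order, left to right.
function collapseRuns(runs) {
  var merged = [];
  runs.forEach(function (run) {
    var last = merged[merged.length - 1];
    if (last && last.y === run.y && last.height === run.height &&
        last.x + last.width === run.x) {
      last.width += run.width;
      last.text += run.text;
    } else {
      merged.push({ x: run.x, y: run.y, width: run.width,
                    height: run.height, text: run.text });
    }
  });
  return merged;
}

// Runs whose bounding box contains the point (px, py).
function runsAtPoint(runs, px, py) {
  return runs.filter(function (r) {
    return px >= r.x && px < r.x + r.width &&
           py >= r.y && py < r.y + r.height;
  });
}

// Runs whose bounding box intersects the given selection box.
function runsInBox(runs, box) {
  return runs.filter(function (r) {
    return r.x < box.x + box.width && box.x < r.x + r.width &&
           r.y < box.y + box.height && box.y < r.y + r.height;
  });
}
```

None of the listed corner cases (clipped or occluded text, for instance) are handled here; they would need extra information recorded per run.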
Big project: Accessibility
Kind of like text selection, except there's no web-visible accessibility API we could hook with canvas. So:
- Somehow detect that a11y is enabled, permanently switch to SVG backend
- Let Gecko implement a11y interfaces
(Possibly) Big project: Vertical text
Somewhat pervasive mode switch in text-drawing code. Is it just a matter of transform hackery to put glyphs in the right place, or do we need canvas support? Canvas support might be a big project.
Utils
To run tests
- Read this commit: https://github.com/andreasgal/pdf.js/commit/f7aa14977849488616ed3d3890198b75034bba94
- Do what it says
- Then,
pdf.js$ python test.py
To uncompress a PDF
- install pdftk (http://www.pdflabs.com/tools/pdftk-the-pdf-toolkit/)
- run: pdftk foo.pdf output uncompressed.foo.pdf uncompress
Coding Style
- add a "use strict"; statement (exactly that!) to the top of your JS files
- 2 spaces for indentation. (sbarman: it seems like it's 4 currently in pdf.js) (cjones: we're going to fix pdf.js after Type1 fonts merge)
- Line breaks are free (I promise); don't hesitate to use them to separate logical blocks inside your functions.
- Be sure to declare a variable with 'var' before using it; you don't want to be hurt by random variables living in the global scope.
- Files are named like_this.js.
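A small illustration of why "use strict" and 'var' go together; the function names are made up for the example. The directive is repeated inside forgetsVar only so the snippet behaves the same if pasted into a file that is not strict.

```javascript
"use strict";

// In strict mode, assigning to an undeclared variable throws a
// ReferenceError instead of silently creating a global.
function forgetsVar() {
  "use strict";
  total = 1; // missing 'var'
  return total;
}

// Declared with 'var', the variable stays local to the function.
function usesVar() {
  var total = 1;

  return total;
}
```

Without strict mode, forgetsVar would quietly create a global named total, which is the kind of random global the rule above warns about.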
Useful resources:
- https://developer.mozilla.org/En/Developer_Guide/Coding_Style#General_Practices
- https://developer.mozilla.org/En/Developer_Guide/Coding_Style#JavaScript_Practices
- https://developer.mozilla.org/En/Developer_Guide/Coding_Style#Naming_and_Formatting_code
Also some particular points (sentence stolen from https://developer.mozilla.org/en/JavaScript_style_guide)
- Don't use object methods and properties more than you have to. It is often faster to store the result in a temporary variable.
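As a small example of the temporary-variable advice, with a hypothetical glyph shape ({width, scale}):

```javascript
"use strict";

// Sum the scaled widths of hypothetical glyph objects. `glyphs.length`
// is read once up front, and each `glyphs[i]` is looked up once per
// iteration and kept in a local.
function totalWidth(glyphs) {
  var total = 0;
  for (var i = 0, count = glyphs.length; i < count; i++) {
    var glyph = glyphs[i];
    total += glyph.width * glyph.scale;
  }
  return total;
}
```

Whether this is measurably faster depends on the engine, so it is worth profiling before restructuring hot loops around it.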
If you have to do DOM manipulations (hopefully not!):
- Don't call getAttribute to see if an attribute exists, call hasAttribute instead.
- Prefer to loop through childNodes rather than using first/lastChild with next/previousSibling. But prefer hasChildNodes() to childNodes.length > 0. Similarly prefer document.getElementsByTagName(aTag).item(0) != null to document.getElementsByTagName(aTag).length > 0.
Review (aka pull-request) policy
NB: this isn't being enforced yet
- New code has to pass all tests (FORTHCOMING)
- New code can't regress performance on (TBD) as measured by (TBD), unless it implements a new feature major enough to justify a temporary perf regression. This is up to common sense.
- Major new features should have architectural review from (TBD). Less major patches can be reviewed by (TBD).