User:Archaeopteryx/Concept:Personal web

Today the internet is an essential part of work and life. People have personal browsing habits, and some of the things they see and read they want to remember later or pass on to someone else. These pages describe very well what the user is interested in, so they should be used for customization. Furthermore, addresses are not available forever, e.g. because of:

  • subscription-only or temporarily available content
  • broken or missing redirects after a site redesign/restructuring
  • removed content or sites

Scrapbook and Scrapbook+ (the latter adds performance improvements and was created because the author of Scrapbook did not respond to the Scrapbook+ author) are first concepts for storing pages offline; Scrapbook also won a Firefox contest in the past. However, the code has not improved for a while, performance is poor for large files, and proper bookmarks integration is missing.

Goals

  • Integration into bookmarks
  • Weave integration
  • Backend support for fulltext address bar search (a minimal indexing sketch follows this list)
  • Backend for extensions that detect updated web pages by comparing source code
  • Making documents and files available offline while still being able to update them via Firefox
  • Save multiple captures of a web page
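
The fulltext search backend could be approached roughly as in the following sketch, which indexes the extracted text of each capture in SQLite so the address bar and the Library can query it. This is an illustrative Python sketch, assuming FTS5 is available; the table layout and names such as capture_id are made up for the example and are not part of Places or any existing Firefox schema.

  import sqlite3

  # Minimal sketch of a fulltext index over captured pages (illustrative only).
  db = sqlite3.connect("captures.sqlite")
  db.execute("CREATE VIRTUAL TABLE IF NOT EXISTS capture_text "
             "USING fts5(capture_id UNINDEXED, title, body)")

  def index_capture(capture_id, title, body):
      """Store the extracted text of one capture so it becomes searchable."""
      db.execute("INSERT INTO capture_text (capture_id, title, body) VALUES (?, ?, ?)",
                 (capture_id, title, body))
      db.commit()

  def search(query, limit=10):
      """Return the best-matching captures for a fulltext query."""
      return db.execute(
          "SELECT capture_id, title FROM capture_text WHERE capture_text MATCH ? "
          "ORDER BY rank LIMIT ?", (query, limit)).fetchall()

The address bar would call something like search() as the user types and merge the results with normal history and bookmark matches.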


User interface

Sidebar

  • Different color and underlining for captured bookmarks
  • Opening with Alt + click? Or right-click (with default opening behavior for the different click types)
  • If more than one capture exists: show as a folder/bookmark hybrid (expandable folder)

Address bar

  • First capture: middle-click
  • Deleting: right-click menu
  • Indicate a captured page with an icon (bookmark star with a book or page in the background); Alt + right arrow switches to the latest captured version

Search

  • Address bar
  • Sidebar
  • Integrate into Library
  • Hooks for desktop search engines
  • Allow scripts to set metadata from files, e.g. author, date, title, description (a metadata-extraction sketch follows this list)
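
As an illustration of such a metadata hook, the sketch below pulls title, author, description and date out of a captured HTML file using only the Python standard library. The keys it returns are assumptions about what the storage backend might expect, not an existing interface.

  from html.parser import HTMLParser

  # Rough sketch of a metadata-extraction hook for captured HTML files.
  class MetaExtractor(HTMLParser):
      def __init__(self):
          super().__init__()
          self.meta = {}
          self._in_title = False

      def handle_starttag(self, tag, attrs):
          attrs = dict(attrs)
          if tag == "title":
              self._in_title = True
          elif tag == "meta" and attrs.get("name") in ("author", "description", "date"):
              self.meta[attrs["name"]] = attrs.get("content", "")

      def handle_endtag(self, tag):
          if tag == "title":
              self._in_title = False

      def handle_data(self, data):
          if self._in_title and data.strip():
              self.meta.setdefault("title", data.strip())

  def extract_metadata(path):
      """Return a dict like {"title": ..., "author": ...} for one captured file."""
      parser = MetaExtractor()
      with open(path, encoding="utf-8", errors="replace") as f:
          parser.feed(f.read())
      return parser.meta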

Updating

  • At least hooks for automated updates
  • Update all captures in a folder if the user desires (a minimal change-detection sketch follows this list)
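
One possible shape for such a hook, and for the source-comparison backend mentioned in the goals, is sketched below: the stored capture and the live page are hashed and compared. This is a simplification; a real implementation would need to normalise dynamic parts of the page (timestamps, rotating ads) before comparing, and the function name is made up for the example.

  import hashlib
  import urllib.request

  # Sketch of a change-detection hook: compare a stored capture with the
  # live page by hashing both versions of the source.
  def page_changed(stored_html, url):
      with urllib.request.urlopen(url, timeout=30) as resp:
          live_html = resp.read()
      return hashlib.sha256(stored_html).digest() != hashlib.sha256(live_html).digest()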

Processing

Input filter/manipulation

  • HTTPS pages should be excluded by default
  • HTML5 distinguishes content and non-content parts of pages; the latter should not be indexed
  • Often only part of a web page is interesting to the user, i.e. the remaining part is navigation, advertising or unrelated material. People usually want to get at the main content, so rule-based capturing of one or more parts of a web page would make sense. With HTML5, content can be classified, but users get the best results if they create the filters themselves, e.g. with XPath support and JavaScript manipulation. If possible, the original page structure should be stored to allow as much post-capture processing as possible.
  • Non-visible (= hidden) content (nodes) should not be stored (many sites hide content this way for their print versions).
  • All content that is not consumed by output filters should be stored in one archive (.jar) per page, because many tiny files cause a large overhead due to the file system's allocation unit (cluster) size, e.g. 4096 bytes on NTFS. A combined capture sketch follows this list.
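
To make the filtering and packaging ideas above concrete, here is a rough Python sketch, assuming lxml is available for XPath over HTML: the user's rule selects the main content, obviously hidden nodes are dropped, and the result is written into a single .jar (zip) archive for the capture. The rule syntax, file layout and function name are assumptions for illustration, not a specification.

  import zipfile
  from lxml import html  # assumption: lxml is available for XPath over HTML

  # Rule-based input filter sketch: keep only the user-selected content,
  # drop hidden nodes, and pack the result into one .jar archive per page.
  def capture(page_source, content_xpath, jar_path):
      doc = html.fromstring(page_source)

      # Keep only the part of the page the user's rule selects.
      selected = doc.xpath(content_xpath)
      if not selected:
          raise ValueError("content rule matched nothing")
      content = selected[0]

      # Drop nodes that are hidden anyway (common on print-friendly pages).
      for hidden in content.xpath('.//*[@hidden or contains(@style, "display:none")]'):
          hidden.drop_tree()

      with zipfile.ZipFile(jar_path, "w", zipfile.ZIP_DEFLATED) as jar:
          jar.writestr("index.html", html.tostring(content, encoding="unicode"))
          # A fuller version would also store images, stylesheets and the
          # original page structure to keep post-capture processing possible.

  # Example rule: capture(source, '//div[@id="article"]', "capture-0001.jar")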

Output filter/manipulation

Certain pages or parts of pages should be accessible from normal file browsers, e.g. media files the user wants to play in an external media player, comic strip images, PDF files, or HTML content that is related to a topic but has to be stored outside the document (e.g. for legal reasons). Allowing exported files to be processed by an external program (e.g. a media converter) would be a nice-to-have.
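
A rough sketch of such an output filter, assuming the one-archive-per-page layout described under the input filters: selected members of the capture's .jar archive are copied into a plain folder and can optionally be handed to an external command. The suffix list and the converter parameter are purely illustrative.

  import subprocess
  import zipfile
  from pathlib import Path

  # Output filter sketch: export selected files from a capture archive so
  # external programs (media players, PDF viewers) can see them.
  def export(jar_path, dest_dir, suffixes=(".mp3", ".pdf", ".png"), converter_cmd=None):
      dest = Path(dest_dir)
      dest.mkdir(parents=True, exist_ok=True)
      with zipfile.ZipFile(jar_path) as jar:
          for name in jar.namelist():
              if name.endswith(suffixes):
                  target = dest / Path(name).name
                  target.write_bytes(jar.read(name))
                  if converter_cmd:  # optional external converter command
                      subprocess.run([*converter_cmd, str(target)], check=True)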

Post-capture processing

Applying customizations after capture is a fundamental part of an extensible data scheme, because customization scripts will be written against the storage system. Recapturing the page and then applying the script will not always work, because content may be only temporarily available or IP-restricted.
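
The following sketch shows one possible shape for such post-capture processing, again assuming the one-archive-per-page layout: a customization script is applied to the stored copy instead of a fresh download, and the result is written as a new archive. The transform callback is an assumption about what a script hook could look like.

  import zipfile

  # Post-capture processing sketch: apply a customization script to the
  # stored capture rather than to a re-downloaded page.
  def apply_script(jar_path, out_path, transform):
      with zipfile.ZipFile(jar_path) as src, \
           zipfile.ZipFile(out_path, "w", zipfile.ZIP_DEFLATED) as dst:
          for name in src.namelist():
              data = src.read(name)
              if name == "index.html":
                  data = transform(data.decode("utf-8")).encode("utf-8")
              dst.writestr(name, data)

  # Example: hide asides without touching the network.
  # apply_script("capture-0001.jar", "capture-0001-clean.jar",
  #              lambda page: page.replace("<aside", "<aside hidden"))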