Firefox/Feature Brainstorming:Archiving and Offline browsing


Autoarchiving

Automatically (and optionally) archive the most recently-visited version of all bookmarked sites. This could be specified per site, page, or globally.
Automatically (and optionally) archive the most recently-visited version of all pages that are annotated or have associated notes/stickies.
Automatically (and optionally) archive only specific types of content found within a web page, such as AOL screen names, phone numbers, or microformatted content such as hCard or hCalendar entries.
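As a rough illustration of the microformat idea above, here is a minimal Python sketch that scans a page for hCard and hCalendar entries by their root class names (`vcard` and `vevent`). The names `MicroformatFinder` and `find_microformats` are hypothetical, and a real implementation would live inside the browser rather than in a standalone script.

```python
from html.parser import HTMLParser

# Root class names used by the hCard and hCalendar microformats.
MICROFORMAT_ROOTS = {"vcard": "hCard", "vevent": "hCalendar"}


class MicroformatFinder(HTMLParser):
    """Collects (microformat name, tag name) pairs for matching elements."""

    def __init__(self):
        super().__init__()
        self.found = []

    def handle_starttag(self, tag, attrs):
        # The class attribute may be missing or valueless, hence the `or ""`.
        classes = (dict(attrs).get("class") or "").split()
        for name in classes:
            if name in MICROFORMAT_ROOTS:
                self.found.append((MICROFORMAT_ROOTS[name], tag))


def find_microformats(html):
    """Return the microformat roots found in an HTML string, in page order."""
    finder = MicroformatFinder()
    finder.feed(html)
    return finder.found
```

The sketch only locates the entries; deciding what to archive from each one (name, phone number, event date) is left open, as in the suggestion itself.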

Save/Archive sites

Automatically suggest the page title as the file name in the File->Save Page As dialog
Ability to automatically or manually archive the contents of history in a variety of ways:
- Option to store all text from all pages ever visited (possibly excluding https and preset sites). This would let you revisit pages that have disappeared from the internet and search your history very effectively. At 100 pages per day and a very generous 10 KB per page (storing text only, not scripts and images), this comes to about 365 MB per year, roughly a third of a gigabyte, which is manageable by modern standards.
- Store the images and scripts too. I'm happy to spend $94 on a 320GB drive dedicated to my Mozilla history: over a year it would save me accumulated days of searching for things I've previously seen (when coupled with the full-text search mentioned elsewhere). The text-only suggestion above could be an "Only Store Page Text" preference on a full history (and I'm sure it would be useful for a subset of users). Either way, this is going to require a faster retrieval engine than we currently have.
- Indexing & OS integration: The text of the archived pages should be indexed for quick retrieval. This index could be local to Firefox, but preferably it would interface with Vista Search and Mac OS X Spotlight. An index is almost mandatory in a comprehensive archival system; otherwise users will accumulate a huge number of web pages that they cannot retrieve.
- UI Control: There should be 3 options for archiving a page: (1) Keep forever, (2) Keep for a certain time (six months by default, for example), or (3) Don't archive. A small UI element in the toolbar, possibly looking like a traffic light, could display the archival level of the current page (green = forever, yellow = temporary, red = don't archive). As mentioned above, there should be a list of preset sites that the user doesn't want to archive, but once a page is loaded, the UI element lets the user override that setting and archive that particular page. Similarly, the same UI element can be used to change the archival status of a web page that has been pulled out of the archive. For instance, when viewing a page that was originally stored as "Keep for 3 months", the user could click on the green light to mark the page as "Keep forever" or on the red light to remove the page from the archive.
  - Pages could simply be archived through the Add/Edit Bookmark interface, perhaps with an expiration date if you want to get advanced. Archived pages could be saved in the profile directory and shown in a special Places bookmark folder that is managed through the bookmarks organizer. I think the feature would be useful for saving things like web receipts, where the user wants a copy but the save dialogs are cumbersome and produce confusing output (e.g. confusing file name and folder combinations).
  - I wanted to archive and browse Wikipedia offline: specifically, to store all the pages referred to by "http://en.wikipedia.org/wiki/Collateralized_debt_obligation", but ONLY WITHIN the domain "http://en.wikipedia.org/wiki/". It would be nice to have this functionality. IE provides 3 levels of page archiving, but it does not provide an option to restrict archiving to the same domain (en.wikipedia.org).
- Market-share / Commercial impact: This feature is a very good way to keep your users loyal: once they have built their own local index and associated web page archives over the course of several months or years, they are not going to move to another browser. This feature could very well be a major distinction between browsers and, if it were implemented first in IE and integrated with the OS, it might prevent IE users from migrating to Firefox.
- Side note: I proposed to implement this feature in May 1998 when I was working at Netscape, with the idea of solving a burgeoning problem described as "I have seen it somewhere on the net but I can't remember where". Everybody loved it but it never went into an actual development plan, and 8 years later the best we have is Google Desktop Search :-(
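The three-level archiving control described in the "UI Control" item could be modelled roughly as follows. This is only a sketch under stated assumptions: the names `ArchiveLevel`, `ArchivedPage`, and `default_level` are hypothetical, the 182-day retention stands in for "six months", and treating non-excluded pages as temporary by default is an assumption, not something the proposal specifies.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from enum import Enum
from typing import List, Optional


class ArchiveLevel(Enum):
    KEEP_FOREVER = "green"       # green light
    KEEP_TEMPORARILY = "yellow"  # yellow light
    DO_NOT_ARCHIVE = "red"       # red light


# Assumption: "six months" approximated as 182 days.
DEFAULT_RETENTION = timedelta(days=182)


@dataclass
class ArchivedPage:
    url: str
    level: ArchiveLevel
    archived_at: datetime
    retention: timedelta = DEFAULT_RETENTION

    def expires_at(self) -> Optional[datetime]:
        """Expiry time for temporary pages; None for the other levels."""
        if self.level is ArchiveLevel.KEEP_TEMPORARILY:
            return self.archived_at + self.retention
        return None

    def should_keep(self, now: datetime) -> bool:
        """True while the page belongs in the archive."""
        if self.level is ArchiveLevel.DO_NOT_ARCHIVE:
            return False
        expiry = self.expires_at()
        return expiry is None or now < expiry


def default_level(url: str, excluded_prefixes: List[str]) -> ArchiveLevel:
    """Initial level for a freshly loaded page, honouring the preset
    exclusion list. The user can still override it from the toolbar."""
    if any(url.startswith(prefix) for prefix in excluded_prefixes):
        return ArchiveLevel.DO_NOT_ARCHIVE
    return ArchiveLevel.KEEP_TEMPORARILY  # assumed default
```

Clicking the green or red light on a page pulled from the archive would then amount to assigning a new `level` to the stored record, after which `should_keep` decides whether the page survives the next cleanup pass.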

Improve saving of a website (e.g. include the originating URL in the saved file, as IE does)
Make it possible to save Flash movies (content) for later playback, either as a file or for offline browsing, but preferably as a file.
Optionally compress the additional files (usually stored in an extra directory) together with the saved website into a single archive, because it sucks having a directory full of the debris of a hundred files with names like "img0,0aa1781,21aa12.gif".
See MAF (Mozilla Archive Format), which was perfect but doesn't work with newer versions of Firefox on Linux and is no longer being improved.
See also Konqueror WAR (Web ARchive), which is a simple tar+gzip archive containing index.html and the other files, and which can be unpacked even by Windows users who have an archiver.[1]
Functionality like the Linux command 'wget -m', allowing links to be followed to a certain depth, with deeper links pointing back to the originating web site.
The data: URI scheme could be used to embed files (especially small ones) inside the saved page. See RFC 2397.
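To illustrate the data: URI suggestion, here is a minimal Python sketch that turns a file's bytes into a base64-encoded data: URI of the kind RFC 2397 describes. The function name `to_data_uri` is hypothetical, and guessing the media type from the file name is an assumption made for brevity; a browser would more likely use the Content-Type it received.

```python
import base64
import mimetypes


def to_data_uri(data: bytes, filename: str) -> str:
    """Encode raw bytes as a base64 data: URI.

    The media type is guessed from the file name; unknown types fall
    back to application/octet-stream.
    """
    mime_type, _ = mimetypes.guess_type(filename)
    if mime_type is None:
        mime_type = "application/octet-stream"
    encoded = base64.b64encode(data).decode("ascii")
    return f"data:{mime_type};base64,{encoded}"
```

A saved page could then replace `src="img0.gif"` with the returned string, so the image travels inside the HTML file itself. Base64 inflates the data by roughly a third, which is why this suits small files best.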

This requirement may be posted elsewhere, but most of my research is on the web, and I use Adobe Acrobat as my primary tool for archiving, offline storage, search, and access of web data. As a Firefox user, I only use IE for one purpose: to create compressed, searchable PDF files of those pages, with text, graphics, and links. I maintain a separate folder with appropriate subfolders for just this purpose. I can then use Adobe Acrobat to search this entire folder for relevant information and have it displayed in context regardless of where it originates. The source links and creation dates are also stored in the PDF. Acrobat is a ubiquitous web reader and is supported by all browsers. The ability to create PDFs from web pages is the ONLY thing I use IE for. If I find a site in Firefox that I want to archive data from, I copy the URL, open IE, and paste the URL there to extract the data and save it on my hard drive.