Firefox/Feature Brainstorming:Archiving and Offline browsing
- Automatically (and optionally) archive the most recently-visited version of all bookmarked sites. This could be specified per site, page, or globally.
- Automatically (and optionally) archive the most recently-visited version of all pages that are annotated or have associated notes/stickies.
- Automatically (and optionally) archive only specific types of content found within a web page, such as AOL screen names, phone numbers, or microformatted content such as hCards or hCalendar entries.
- Automatically suggest the page title as the file name when saving a page through the File->Save Page As dialog
- Ability to automatically or manually archive contents in history in a variety of ways
- Option to store all text from all pages (possibly excluding https and preset sites) ever visited -- this would let you revisit pages that have disappeared from the internet, and would also allow very good searching through history. At 100 pages per day and a very generous 10 KB per page (storing text only, not scripts and images), this comes to 100 x 10 KB x 365 = roughly 365 MB, or about a third of a gig annually, so the storage requirements are manageable by modern standards.
- Store the images and scripts too - I'm happy to spend $94 on a 320GB drive to dedicate to my Mozilla History - it'd save me accumulated days over a year of searching for stuff I've previously seen (coupled with the elsewhere mentioned full-text search). The above text-only suggestion could be a "Only Store Page Text" preference on a full history (and I'm sure would be useful for a subset of users). Anyway, this is going to require a retrieval engine faster than we currently have.
- Indexing & OS integration: The text of the archived pages should be indexed for quick retrieval. This index could be local to Firefox, but would preferably interface with Vista Search and Mac OS X Spotlight. An index is almost mandatory in a comprehensive archival system; otherwise users are going to accumulate a huge number of web pages that they won't be able to retrieve.
- UI Control: There should be 3 options to archive a page: (1) Keep forever, (2) Keep for a certain time (like six months by default), or (3) Don't archive. A small UI element in the toolbar, possibly looking like a traffic light, could display the level of archival of the current page (green = forever, yellow = temporary, red = don't archive). As mentioned above, there should be a list of preset sites that the user doesn't want to archive, but once a page is loaded, the UI element lets the user override the settings and archive that particular page. Similarly, the same UI element can be used to change the archival status of a web page that has been pulled out of the archive. For instance, when viewing a page that was originally stored as "Keep for 3 months", the user could click on the green light to mark the page as "Keep forever" or on the red light to remove the page from the archive.
- Pages could just be archived through the Add/Edit Bookmark interface, perhaps with an expiration date if you want to get advanced. Archived pages could just be saved in the profile directory and shown in a special places bookmark folder which is managed through the bookmarks organizer. I think the feature would be useful for saving things like web receipts, where the user wants a copy, but the save dialogs are cumbersome and produce output that is confusing (i.e. confusing file names and folder combinations).
- I wanted to archive and browse Wikipedia offline (I wanted to store all the pages referred to by "http://en.wikipedia.org/wiki/Collateralized_debt_obligation", but ONLY WITHIN the domain "http://en.wikipedia.org/wiki/"). It would be nice to have this functionality. IE provides 3 levels of page archiving; however, it does not provide an option to specify "within the same domain en.wikipedia.org".
- Market-share / Commercial impact: This feature is a very good way to keep your users faithful: once they have built their own local index and associated web page archives over the course of several months or years, they are not going to move to another browser. This feature could very well be a major distinction between browsers and, if implemented first in IE and integrated with the OS, it might prevent IE users from further migrating to Firefox.
- Side note: I proposed to implement this feature in May 1998 when I was working at Netscape, with the idea of solving a burgeoning problem described as "I have seen it somewhere on the net but I can't remember where". Everybody loved it but it never went into an actual development plan, and 8 years later the best we have is Google Desktop Search :-(
- Improve saving of a website (i.e. include the originating URL in the saved file, as IE does)
- Make it possible to save Flash movies (content) for later playback, either as a file or for offline browsing. Preferably as a file.
- Optionally compress the additional files (usually stored in an extra directory) together with the saved website into a single archive, because it sucks having a directory with the debris of a hundred files named e.g.: "img0,0aa1781,21aa12.gif" and the like.
- See MAF (Mozilla Archive Format), which was perfect but doesn't work with newer versions of Firefox on Linux and is no longer being improved.
- See also Konqueror WAR (Web ARchive), which is a simple tar+gzip archive with index.html and other files, and which can be unpacked even by Windows users who have an archiver.
- Functionality like the Linux command 'wget -m', allowing for a certain link depth, with links beyond that depth pointing back to the originating web site.
- The data: protocol (URI scheme) could be used to include files (esp. small ones) inside the saved page. See the RFC 2397 standard.
- This requirement may be posted elsewhere, but most of my research is on the web, and I use Adobe Acrobat for primary archive/offline storage, search & access of web data. As a Firefox user, I only use IE for that purpose--to create PDF files of those pages, text & graphics (with links), that are compressed and searchable. I maintain a separate folder with appropriate subfolders for just this purpose. I can then use Adobe Acrobat to search this entire folder for relevant information, and have it displayed in context regardless of where it originates. The source links and creation dates are also stored in the PDF. Acrobat is a ubiquitous web reader and supported by all browsers. The ability to create PDFs from web pages is the ONLY thing I use IE for. If I find a site in Firefox that I want to archive data from, I copy the URL, open IE and paste the URL there to extract the data and save it on my hard drive.
- Relative linking in saved pages: when using File->Save Page As... and selecting the "Web page, complete" option, relative links are always converted to absolute links, which is very inconvenient. Say I want to save a section of a website in its entirety for convenient offline browsing, and it is housed entirely at http://www.***.***/x/y/directory_in_question/ . Saving these pages with "Web page, complete" rewrites all of the relative links to other pages in the section (housed entirely in directory_in_question) as absolute links, which prevents convenient offline browsing. It would therefore be a useful and simple modification to offer an option in File->Save Page As... (perhaps a checkbox) to "Convert relative links to absolute links" when "Web page, complete" is selected, instead of applying the conversion every time. If the option were not selected, addresses of other web pages would be left untouched, and only the addresses of images and the like would be modified, since those are saved for offline viewing by the "Web page, complete" option. It might also be useful to have a counter-option, "Convert absolute links to relative links", to cover the case where web pages already contain absolute links to pages that could be reached by relative ones (i.e. pages stored at the same web site). This would also make archiving and offline viewing easier. The way I have presented these options here is very crude, I know, but it is only meant to get the general idea across - somebody would need to think about it more deeply so that the options could be implemented elegantly in the File->Save Page As... user interface, even if they are a minor detail.
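To make the "Convert absolute links to relative links" idea a bit more concrete, here is a rough sketch in Python of how a single link might be rewritten. It is not how Firefox saves pages; the function name `relativize`, its parameters, and the example.org URLs are all hypothetical stand-ins (the real site address is elided above). Links that resolve inside the saved section become relative to the saved page, and everything else stays absolute so it still works offline as an ordinary web link.

```python
import posixpath
from urllib.parse import urljoin, urlsplit


def relativize(link: str, page_url: str, section_url: str) -> str:
    """Rewrite `link` (found on `page_url`) for offline browsing.

    Hypothetical helper: links that resolve inside `section_url` are
    returned relative to the page's directory; all other links are
    returned in absolute form.
    """
    # Resolve the link against the page it appears on.
    absolute = urljoin(page_url, link)

    # Outside the saved section: keep it absolute.
    if not absolute.startswith(section_url):
        return absolute

    target = urlsplit(absolute)
    page_dir = posixpath.dirname(urlsplit(page_url).path) or "/"

    # Path of the target as seen from the directory holding the page.
    rel = posixpath.relpath(target.path, start=page_dir)
    if target.path.endswith("/") and not rel.endswith("/"):
        rel += "/"  # relpath drops a trailing slash; restore it

    # Carry over any query string and fragment unchanged.
    if target.query:
        rel += "?" + target.query
    if target.fragment:
        rel += "#" + target.fragment
    return rel


if __name__ == "__main__":
    # Assumed example URLs, standing in for the elided site.
    section = "http://example.org/x/y/docs/"
    page = "http://example.org/x/y/docs/intro/index.html"
    for href in (
        "http://example.org/x/y/docs/api/list.html",
        "page.html#top",
        "http://other.example.com/a.html",
    ):
        print(href, "->", relativize(href, page, section))
```

A real implementation would also have to find the links inside the HTML (and CSS) in the first place and decide which targets were actually saved; this sketch only covers the path arithmetic for one link.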