Firefox/Feature Brainstorming:Archiving and Offline browsing
Save/Archive sites
- Ability to automatically or manually archive contents in history in a variety of ways
- Option to store all text from every page ever visited (possibly excluding HTTPS pages and a preset list of sites). This would let you revisit pages that have disappeared from the internet, and would also enable thorough full-text search of history. At 100 pages per day and a very generous 10 KB per page (text only, no scripts or images), this comes to 100 x 10 KB x 365 = roughly 365 MB per year, or about a third of a gigabyte, which is manageable by modern standards.
- Store the images and scripts too - I'm happy to spend $94 on a 320GB drive dedicated to my Mozilla history. Coupled with the full-text search mentioned elsewhere, it would save me days per year otherwise spent searching for things I have already seen. The text-only suggestion above could be an "Only Store Page Text" preference on a full history (and I'm sure it would be useful for a subset of users). Either way, this will require a faster retrieval engine than we currently have.
- Improve saving of a web page (e.g., include the originating URL in the saved file, as IE does)
- Optionally compress the additional files (usually stored in an extra directory) together with the saved page into a single archive, because it sucks having a directory littered with hundreds of files named e.g. "img0,0aa1781,21aa12.gif" and the like.
See MAF (Mozilla Archive Format), which was perfect for this but doesn't work with newer versions of Firefox on Linux and is no longer being improved.
- Functionality like the Linux command 'wget -m' (mirroring), with a configurable link depth beyond which links point back to the originating website.
- The data: URI scheme could be used to embed files (especially small ones) directly inside the saved page. See RFC 2397.
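
To illustrate the data: URI suggestion above, here is a minimal sketch in Python of how small local files referenced by a saved page could be folded into the page itself. The function names, the 32 KB size cap, and the regex-based matching of src="..." attributes are all assumptions made for illustration; this is not how Firefox saves pages, and a real implementation would use a proper HTML parser.

```python
import base64
import mimetypes
import re
from pathlib import Path

# Hypothetical cap: only small files are inlined, larger ones stay external.
MAX_INLINE_BYTES = 32 * 1024


def to_data_uri(data: bytes, mime_type: str) -> str:
    """Encode raw bytes as a base64 data: URI (RFC 2397)."""
    payload = base64.b64encode(data).decode("ascii")
    return f"data:{mime_type};base64,{payload}"


def inline_images(html: str, base_dir: Path) -> str:
    """Replace src="relative/file" references with data: URIs.

    References containing a colon (http:, data:, ...) are skipped by the
    pattern, and files that are missing or too large are left untouched.
    """
    def replace(match: re.Match) -> str:
        path = base_dir / match.group(2)
        if not path.is_file() or path.stat().st_size > MAX_INLINE_BYTES:
            return match.group(0)
        mime_type = mimetypes.guess_type(path.name)[0] or "application/octet-stream"
        uri = to_data_uri(path.read_bytes(), mime_type)
        return f"{match.group(1)}{uri}{match.group(3)}"

    # Naive attribute matching, assumed adequate only for this sketch.
    return re.sub(r'(src=")([^":]+)(")', replace, html)
```

For example, to_data_uri(b"GIF89a", "image/gif") returns "data:image/gif;base64,R0lGODlh". Base64 makes the embedded data about a third larger than the original file, which is one reason to limit inlining to small files.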
This requirement may be posted elsewhere, but most of my research is on the web, and I use Adobe Acrobat as my primary tool for archiving, offline storage, search, and access of web data. As a Firefox user, I use IE only for that purpose: to create PDF files of pages, with text, graphics, and links, that are compressed and searchable. I maintain a separate folder with appropriate subfolders for just this purpose. I can then use Adobe Acrobat to search the entire folder for relevant information and have it displayed in context regardless of where it originated. The source links and creation dates are also stored in the PDF. Acrobat is a ubiquitous reader and is supported by all browsers. The ability to create PDFs from web pages is the ONLY thing I use IE for. If I find a site in Firefox that I want to archive data from, I copy the URL, open IE, and paste the URL there to extract the data and save it to my hard drive.