Follow the steps below to archive a Mozilla website that has been abandoned or is being retired.

Archived Mozilla websites will be made available at http://website-archive.mozilla.org

Web Dev Actions

The following actions will be performed by a member of the web development team.

Subversion

The subversion repository for the Mozilla Website Archive is available at http://svn.mozilla.org/projects/website-archive.mozilla.org

Follow the svn instructions for Mozilla subversion access. Once you have access, you can check out the website-archive repository.

 svn checkout svn+ssh://svn.mozilla.org/projects/website-archive.mozilla.org

Initial Archive

The initial archive can be performed using wget. This will scrape the entire site into HTML, JavaScript and CSS files. It will also save each index file with an .html extension.

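 cd website-archive.mozilla.org
 # Recursively mirror the site; the quoted -R patterns skip large or non-page files
 wget -rpEkH -nc --no-check-certificate \
   -R '*.pdf' -R '*.bz2' -R '*.gz' -R '*.mov' -R '*.fla' -R '*.xml' -R '*.json' -R '*.rss' \
   -D mozillaservice.org http://mozillaservice.org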

This method scrapes and archives most of the website. It excludes all files that we don't want to download due to space issues, such as PDF files and zipped files (this may vary on a site-by-site basis).

For a site that is approximately 1,200 pages in size, this process took 1 minute 30 seconds and downloaded 22MB of data. If you're concerned about server usage for this particular site, you can use --wait=n and --random-wait to be less aggressive towards the server.
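As a rough sketch of the less aggressive variant, the wrapper below adds --wait and --random-wait to the same wget options. The function name archive_site_politely, the default 2-second wait and the DRY_RUN switch are illustrative assumptions, not part of the documented process.

```shell
#!/bin/sh
# Hypothetical helper: mirror a site while pausing between requests.
# Usage: archive_site_politely DOMAIN [WAIT_SECONDS]
archive_site_politely() {
  domain="$1"
  wait_seconds="${2:-2}"   # assumed default; tune per site

  # Build the command as positional parameters so it can be printed or run.
  set -- wget -rpEkH -nc --no-check-certificate \
    --wait="$wait_seconds" --random-wait \
    -R '*.pdf,*.bz2,*.gz,*.mov,*.fla,*.xml,*.json,*.rss' \
    -D "$domain" "http://$domain"

  if [ -n "${DRY_RUN:-}" ]; then
    echo "$*"              # show the command without contacting the server
  else
    "$@"
  fi
}
```

Setting DRY_RUN=1 before calling archive_site_politely mozillaservice.org 3 prints the command for review instead of running it.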

Privacy Actions

Once the site has been downloaded locally in its entirety, you will need to remove all code that refers to or collects user identifiable information.

Forms

Forms that request user information like email addresses and passwords will need to be removed from the codebase. Currently, we are handling this process manually.

[Instructions forthcoming]

If you would like to help automate this process, feel free to document that process below.
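As one possible starting point for automation, the sketch below only lists the archived HTML files that contain a form element, so that each can be reviewed by hand. The function name list_form_files is hypothetical, and the options assume a grep that supports --include and --exclude-dir (GNU grep does).

```shell
#!/bin/sh
# Hypothetical helper: list HTML files that contain a <form> tag.
# Usage: list_form_files [DIRECTORY]
list_form_files() {
  # -r recurse, -l print file names only, -i ignore case, -E extended regex
  grep -rliE --include='*.html' --exclude-dir=.svn \
    '<form[[:space:]>]' "${1:-.}" | sort
}
```

This locates candidate files only; removing the forms themselves remains a manual step.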

Email Addresses

All user-identifiable information, such as email addresses, will need to be removed from the code. To locate email addresses in the code base, you can use the following grep command.

 grep -rnE "\w+([._-]\w+)*@\w+([._-]\w+)*\.\w{2,4}" * | grep -v svn | grep -v "mozilla.org"
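For a quick overview of what needs removing, a variant of the search can print each distinct address once instead of every matching line. This is a sketch only: the function name find_emails is hypothetical, POSIX character classes stand in for \w, and --exclude-dir assumes a grep that supports it (GNU grep does).

```shell
#!/bin/sh
# Hypothetical helper: print each unique non-mozilla.org address found.
# Usage: find_emails [DIRECTORY]
find_emails() {
  # -o prints only the matched text, -h suppresses file names
  grep -rEoh --exclude-dir=.svn \
    '[[:alnum:]_]+([._-][[:alnum:]_]+)*@[[:alnum:]_]+([._-][[:alnum:]_]+)*\.[[:alpha:]]{2,4}' \
    "${1:-.}" \
    | grep -v 'mozilla\.org$' \
    | sort -u
}
```

Running find_emails on the downloaded site directory gives a short checklist of addresses; the line-numbered search is still needed to find where each one occurs.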

Resolving Redirects

Once all files have been downloaded, you will need to test the URLs to ensure that they redirect properly.

[Instructions forthcoming]

Commit

Once this site has been downloaded and all privacy concerns have been handled, you will need to commit the site to subversion. Then you will need to file an IT request in Bugzilla to have this code pushed to production.

Archived Header

Finally, you will need to add a note to the top of each page making the user aware that this is an archived website.

[Instructions forthcoming]

Systems Operations Actions

A member of the Mozilla Systems Operations team will need to perform the following actions.

Website

Back up the database for the existing website and take the website offline.

In Apache, issue a 301 (permanent) redirect from every page of the retired website to the corresponding archived page on website-archive.mozilla.org.

For example, a user accessing the retired website: http://mozillaservice.org/activity/stories/en_US

Should be redirected to: http://website-archive.mozilla.org/mozillaservice.org/activity/stories/en_US
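The mapping in this example can be sketched as a small shell helper, together with a spot check of a live redirect. The function names archive_url and check_redirect are hypothetical, and the check assumes curl is installed and that the server answers HEAD requests.

```shell
#!/bin/sh
# Hypothetical helper: map a retired URL to its expected archive URL
# by dropping the scheme and prefixing the archive host.
archive_url() {
  printf 'http://website-archive.mozilla.org/%s\n' "${1#*://}"
}

# Hypothetical helper: succeed only if the retired URL answers with a 301
# that points at the expected archive URL.
check_redirect() {
  expected=$(archive_url "$1")
  # -I sends a HEAD request; -w prints the status code and redirect target
  actual=$(curl -sI -o /dev/null -w '%{http_code} %{redirect_url}' "$1")
  [ "$actual" = "301 $expected" ]
}
```

For instance, check_redirect http://mozillaservice.org/activity/stories/en_US returns success only once the redirect described above is in place.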

Subversion

Perform an update of subversion for http://website-archive.mozilla.org
