Websites/Taskforce/Proposals/Abandoned Sites/Archive

The following steps may be followed in order to archive a Mozilla website that has been abandoned or is being retired.

Archived Mozilla websites will be made available at http://website-archive.mozilla.org

Web Dev Actions

The following actions will be performed by a member of the web development team.

Subversion

The subversion repository for the Mozilla Website Archive is available at http://svn.mozilla.org/projects/website-archive.mozilla.org

Follow the SVN instructions to obtain Mozilla Subversion access. Once you have access, you can check out the website-archive repository.

svn checkout svn+ssh://svn.mozilla.org/projects/website-archive.mozilla.org

Initial Archive

The initial archive can be performed using wget. This will scrape the entire site into HTML, JavaScript and CSS files. The -E option also saves each HTML page, including index files, with an .html extension.

 cd website-archive.mozilla.org
 wget -rpEkH -nc --no-check-certificate \
   -R '*.pdf' -R '*.bz2' -R '*.gz' -R '*.mov' -R '*.fla' -R '*.xml' -R '*.json' -R '*.rss' \
   -D mozillaservice.org http://mozillaservice.org

This command scrapes and archives most of the website. The -R options exclude file types that are skipped to save space, such as PDF files and compressed archives; the exact list may vary from site to site.
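To confirm that the excluded file types really were skipped, it can help to count the downloaded files by extension. The following is a minimal sketch; the function name summarize_extensions is hypothetical and not part of any Mozilla tooling.

```shell
# Hypothetical helper: count the files under a directory by extension,
# ignoring Subversion metadata, so that unwanted types (pdf, gz, ...)
# are easy to spot after the wget run.
summarize_extensions() {
  dir="${1:-.}"
  find "$dir" -type f ! -path '*/.svn/*' \
    | sed -n 's/.*\.\([A-Za-z0-9]\{1,\}\)$/\1/p' \
    | sort | uniq -c | sort -rn
}
```

For example, running summarize_extensions mozillaservice.org from inside the working copy prints one line per extension, most common first. Files without an extension are not listed.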

Privacy Actions

Once the site has been downloaded locally in its entirety, you will need to remove all code that refers to or collects user identifiable information.

Forms

Forms that request user information like email addresses and passwords will need to be removed from the codebase.

[Instructions forthcoming]
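As a starting point for locating such forms, the archived HTML can be searched for form elements and for email or password inputs. This is only a sketch under assumptions: the function name find_forms and the search pattern are illustrative, and every match still needs manual review before anything is removed.

```shell
# Hypothetical helper: list archived HTML files that contain a <form>
# element or an input of type email or password (case-insensitive).
find_forms() {
  grep -rliE --include='*.html' --include='*.htm' \
    '<form[[:space:]>]|type="?(email|password)' "${1:-.}"
}
```

Running find_forms mozillaservice.org prints the path of each file that needs attention. It does not modify any file.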

Email Addresses

All of the user identifiable information, such as email addresses, will need to be removed from the code. To locate email addresses in the code base, you may use the following egrep statement.

egrep -rn "\w+([._-]\w)*@\w+([._-]\w)*\.\w{2,4}" * | grep -v svn | grep -v "mozillaservice.org" | grep -v "mozilla.org"
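When many lines match, a de-duplicated list of the addresses themselves can be easier to review than the full grep output. The sketch below assumes a grep that supports -o and --exclude-dir (GNU and BSD grep both do); the function name list_addresses is hypothetical, and the pattern is a stricter POSIX-class variant of the one above rather than the same expression.

```shell
# Hypothetical helper: print each distinct email address found under a
# directory, skipping .svn metadata and addresses at mozillaservice.org
# or mozilla.org (including their subdomains).
list_addresses() {
  grep -rhoE --exclude-dir=.svn \
    '[[:alnum:]_]+([._-][[:alnum:]_]+)*@[[:alnum:]_]+([._-][[:alnum:]_]+)*\.[[:alpha:]]{2,4}' \
    "${1:-.}" \
    | grep -vE '@([^@]*\.)?(mozillaservice|mozilla)\.org$' \
    | sort -u
}
```

The output is one address per line, which can then be fed back into grep -rn to find where each address occurs.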

Resolving Redirects

For all files that have been downloaded, you will need to test the URLs to ensure that they redirect properly.

[Information forthcoming]


Commit

Once this site has been downloaded and all privacy concerns have been handled, you will need to commit the site to subversion. Then you will need to file an IT request in Bugzilla to have this code pushed to production.

Systems Operations Actions

A member of the Mozilla Systems Operations team will need to perform the following actions.

Website

Back up the database for the existing website and take the website offline.

In Apache, configure a permanent (301) redirect from every page of the retired website to the corresponding archived page on website-archive.mozilla.org.

For example, a user accessing the retired website: http://mozillaservice.org/activity/stories/en_US

Should be redirected to: http://website-archive.mozilla.org/mozillaservice.org/activity/stories/en_US
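The mapping in this example, and a quick way to check it once the redirect is live, can be sketched in shell. Both function names are hypothetical, and the mapping rule (drop the scheme, then prefix the archive host) is inferred from the single example above rather than from a documented specification.

```shell
# Hypothetical helper: map a retired URL to its expected archive URL by
# dropping the scheme and prefixing the archive host.
archive_url() {
  printf 'http://website-archive.mozilla.org/%s\n' "${1#*://}"
}

# Hypothetical helper: print the HTTP status code and redirect target
# that the server returns for a URL. Requires curl and network access.
check_redirect() {
  curl -s -o /dev/null -I -w '%{http_code} %{redirect_url}\n' "$1"
}
```

A redirect is working as intended when check_redirect reports status 301 and a target equal to the output of archive_url for the same URL.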

Subversion

Run a Subversion update on the production checkout for http://website-archive.mozilla.org so that the newly committed archive is served.