QA/Execution/Web Testing/Release Checklist
From MozillaWiki
(rough draft)
New Sites
- Has the site been staged on a sanctioned IT server?
- Was there enough data, of the right kind, to test the fixes/features?
- Has QA verified as many bugs as possible?
Staging Sites / Pre-release
- For the remaining Resolved FIXED bugs, have the project owners signed off on the potential risks/unknowns?
- Has security reviewed/tested the app/site/new features?
- Was any load or performance-testing needed/done?
- Are there SQL/data-migration scripts to be run? Have they been run on staging?
- Are staging and production using the same load-balancers/caching infrastructure (Zeus/Netscaler)?
- Do the MIME types match between staging and production?
- Code-wise (repositories), do/will staging and production match at the latest revision? (e.g. compare the revision reported by svn info, or for Git the commit hash from git rev-parse HEAD)
- Is the push on IT's radar, with a set push time/outage window (if needed), and are all involved parties aware? (Hopefully all are by this step!)
- Is all of the above pertinent info in the push bug, with a chronological sequence of steps to follow (installation of packages, enabling of services, etc.)?
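The MIME-type check above can be partly automated. The following is a minimal sketch, not an established QA tool: the function names (fetch_content_type, normalize, mime_mismatches) are hypothetical, and it assumes that comparing the Content-Type header for a hand-picked list of paths is a good enough proxy. It also ignores parameters such as charset, which may or may not be acceptable for a given site.

```python
from urllib.request import Request, urlopen


def fetch_content_type(base_url, path, timeout=10):
    """Return the Content-Type header served for base_url + path (HEAD request)."""
    request = Request(base_url.rstrip("/") + path, method="HEAD")
    with urlopen(request, timeout=timeout) as response:
        return response.headers.get("Content-Type", "")


def normalize(content_type):
    """Drop parameters such as charset and lowercase the media type."""
    return content_type.split(";", 1)[0].strip().lower()


def mime_mismatches(staging_types, production_types):
    """Compare two {path: Content-Type} mappings.

    Returns {path: (staging_type, production_type)} for every path whose
    normalized media type differs, or which is missing on one side
    (a missing entry is reported as an empty string).
    """
    mismatches = {}
    for path in sorted(set(staging_types) | set(production_types)):
        staging = normalize(staging_types.get(path, ""))
        production = normalize(production_types.get(path, ""))
        if staging != production:
            mismatches[path] = (staging, production)
    return mismatches
```

In use, one would build the two mappings by calling fetch_content_type against the staging and production hosts for the same list of paths (HTML pages, scripts, stylesheets, feeds, downloads), then attach any non-empty result from mime_mismatches to the push bug.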
Push / Release
- Do the slave/master and data-replication architectures/setup match between staging and prod?
- Is Nagios/heartbeat monitoring enabled on staging/prod?
- Is Puppet/configuration management enabled/working correctly?
- Do the system libraries/packages match up version-wise, between staging and prod, if a pre-existing site?
- Is the right branch/tag being pushed?
- Is there a current and verified backup of all relevant (user/app) data, in case of a rollback?
- Is there a read-only mode, and does it need to be enabled during the push?
- If so, does there need to be a user-facing message indicating so?
- If there's no read-only mode, is there an outage page set up in advance, ready to go?
- Are all redirects properly set up/synched?
- Are all necessary cron jobs set up and running?
- Are all services running/modules enabled (e.g. SMTP, Celery, Memcached, Redis)?
- Is error logging (e.g. tracebacks) enabled on all relevant services (Apache, app, database, etc.), with the right permissions, and logging correctly?
- Are any other releases/upgrades/outages planned, which might impact this one?
- If the push exceeds the downtime window, have the right people/aliases been contacted, with a reasonable ETA for recovery/fix?
- Is there a documented contingency plan?
- Who makes the call to roll back?
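The library/package version check above lends itself to a simple diff. The sketch below assumes each host's package list has already been exported as text with one "name version" pair per line, separated by whitespace (the default output of dpkg-query -W has this shape, for example). The function names parse_packages and version_drift are hypothetical, and collecting the listings from each host is left out.

```python
def parse_packages(listing):
    """Parse 'name version' lines into a {name: version} dict.

    Lines without both a name and a version are skipped.
    """
    packages = {}
    for line in listing.splitlines():
        parts = line.split(None, 1)  # split on the first run of whitespace
        if len(parts) == 2:
            packages[parts[0]] = parts[1].strip()
    return packages


def version_drift(staging_listing, production_listing):
    """Return {name: (staging_version, production_version)} for differences.

    A package missing on one side is reported with None for that side.
    """
    staging = parse_packages(staging_listing)
    production = parse_packages(production_listing)
    drift = {}
    for name in sorted(set(staging) | set(production)):
        staging_version = staging.get(name)
        production_version = production.get(name)
        if staging_version != production_version:
            drift[name] = (staging_version, production_version)
    return drift
```

An empty result means the two listings agree. Anything else is worth raising before the push, since a package that exists only on staging is as much of a risk as a version mismatch.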
Notes / Next Steps
- Split the tasks / checks up by teams
- Talk to Mike Alexis about improving the project initiation form
- Development / QA comes up with a checklist for each project (assigns owners)
- Implement what we have in AMO / Socorro for other projects
- Pre-release meeting, toss out the steps the project doesn't need