QA/Execution/Web Testing/Continuous Deployment


Continuous Deployment/Delivery Process

Questions and processes to be determined for each project moving on to CD.

How can we manually test new features?

  • Waffle testing behind flags
    • Each new feature should include a waffle flag
  • Example from the past: SUMO. After moving to CD there was intermittent manual testing with Waffle; now there is none at all.
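The flag-gating idea above can be sketched in plain Python. This is a hypothetical, minimal illustration in the style of django-waffle (the flag names, `FLAGS` dict, and functions here are invented for the example, not SUMO code):

```python
# Minimal sketch of feature-flag gating in the style of django-waffle.
# FLAGS, flag_is_active, and the flag/user names are illustrative only.

FLAGS = {
    # flag name -> set of usernames the feature is enabled for
    "new-aaq-form": {"qa-tester", "jsocol"},
}

def flag_is_active(flag_name, username):
    """Return True if the named flag is enabled for this user."""
    return username in FLAGS.get(flag_name, set())

def render_aaq_page(username):
    """Serve the new feature only to users behind the flag."""
    if flag_is_active("new-aaq-form", username):
        return "new AAQ form"
    return "current AAQ form"
```

The point of the pattern: QA (or selected users) can see the new feature in production while everyone else gets the current behavior, so manual testing can happen after deployment without exposing the feature broadly.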

Where do we test?

  • AWS - creating local test instances
  • Test in production

Mitigating Risk

  • We can highlight the risk of not doing manual testing; team must have an understanding of their level of acceptable risk.
  • We could provide examples of ways to mitigate the risks
  • There needs to be buy-in from the CD team; they must want manual testing
  • Sometimes, I think they might not be fully aware of risks, so we can help them realize those, and perhaps that will cause them to ask for more help (just a thought) - [stephend]

What's the minimum automation we should recommend?

A test plan will highlight the appropriate level of risk, and the minimum automation level can be determined from that: high-risk areas need coverage, while failures in lower-risk areas may be acceptable. Whatever automation we put in place must be part of the team's build and release process; otherwise failures are not highlighted.

What constitutes a failure that would block deployment?

Depends on test plan and risk assessment

SUMO Continuous Deployment

SUMO has moved onto continuous deployment. There will be many changes to both the release process and QA for SUMO. The definition of continuous deployment here is the ability to release as often as desired, without requiring the immediate participation of QA or IT during each release.

JSocol will be writing and sharing a flowchart as well as a projected timeline for the gradual change towards continuous deployment. The intention of this document will be to map out changes from the QA perspective.

The current process includes a weekly SUMO release. Each feature and bug fix includes unit test coverage. The estimated current code coverage is 96% of the codebase.

QA in the current process includes feature and bug fix verification. It includes running automation in both staging and production environments, before and after releases. It also includes manual regression testing, usually at the level of smoke tests, unless it's a new feature, where more attention and exploratory testing is called for. Contributors are also engaged during test days and for testing of new features.


Moving forward there will be a number of significant changes in how QA is done within SUMO.

1. Manual testing:

  • Manual testing will be done only for exploratory testing and new feature verification.
  • Manual testing of bugs, features and exploratory testing will be done by QA.
  • Manual testing may be done in production, instead of in staging.
  • Manual testing of new features will be done by community members.

2. Automated testing:

  • A Selenium automated test suite will be created, including approximately 5 tests. Tests will focus on areas that cannot be covered by unit tests. Example: the Python unit test suite cannot exercise CSRF protection because the suite disables it.
  • Selenium tests will only be for use cases which cannot be covered by Python or qunit.
  • This test suite will be run for each code check-in, and will only be run in staging or the developer environment. Each test will qualify as a deployment blocker if it fails.
  • Tests will cover critical areas such as: AAQ, registration, edit KB article and translation
  • All other automated tests will be removed.
  • Automated tests that verify the environment and services are working will be run in production. We already have a services page, which is being filled out over time. Services cannot be tested in staging because search, celery, etc. rely on external sources.
  • We could have more tests run in production, although only ones that do not create or destroy content. It is important not to create unnecessary data, and to avoid sending out notifications. Features are fully tested after each check-in.
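The "deployment blocker" rule in the list above can be sketched as a simple gate. The suite name, test names, and result format below are hypothetical, not SUMO's actual pipeline:

```python
# Illustrative sketch: any failure in the critical Selenium suite blocks
# the deploy. Test names mirror the critical areas listed above, but the
# exact names and the results format are invented for this example.

BLOCKER_SUITE = {
    "test_aaq",
    "test_registration",
    "test_edit_kb_article",
    "test_translation",
}

def deployment_blocked(results):
    """results maps test name -> True (pass) / False (fail).

    A failure outside the blocker suite does not stop the deploy;
    any failure inside it does.
    """
    return any(name in BLOCKER_SUITE and not passed
               for name, passed in results.items())
```

A gate like this only works if it is wired into the check-in/release process itself, echoing the earlier point that automation must be part of the build release process or failures go unnoticed.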

3. Releases

  • Releases will go out as often as there are fixes ready to go. They will be released by developers, without IT or QA. The developers will be in charge of monitoring services after each release to watch for changes in behavior.

4. Bug fixes:

  • Do not need verification prior to releases. They may be verified by a developer, or by QA in production.

5. New features:

  • Do not need verification prior to release. They may be released to production behind flags. Flags are already used in production.

Risks & Plans & Tools

  • Risk: Less critical tests will not be covered with automation.
    • If the feature breaks it won't cause as much of a problem; it is deemed acceptable to wait an hour for a fix. Examples include the ability to answer a question.
  • Risk: More bugs with less automation, why not automate more rather than less?
    • Time. Writing more Selenium tests would duplicate what the unit tests have already verified, and it slows the process down. Right now there is a lot of duplicated effort. The team will discuss what is not currently covered by unit tests.
  • Risk: One of the major risks with continuous deployment is the change to no manual QA for each release. The risk of releasing with bugs from new features or regressions is increased.
    • New feature testing is already being implemented with flags, and there should be no new issues regarding the ability to release features to designated users.
    • Community involvement will be a key component to alerting the team to any issues which appear in production. We are currently developing plans to encourage user SUMO feedback. Examples include adding a feedback button on each page, similar to Get Satisfaction. There is a plan to bring in UX people to discuss. This will provide an active feedback system to complement the passive monitoring tool feedback.
    • There is a general increased risk with less manual testing. The compensation for that is more data measurement, faster fixes, and more time for QA to work in other areas.
    • Code reviews are already mandatory for developers. Smaller releases make issues easier to determine and easier to resolve. There will be mistakes but everyone will learn quickly from them.
  • Risk: Cannot test notifications
    • We can guarantee a notification is sent, not that it is ever received. Currently there is no answer to this problem, but it is being worked on.
  • Risk: How do you trust use of feature flags in production?
    • We already use Waffle for Django in production. This tool is easy and flexible. It is specific to individual permissions or features. There is always a potential risk that a new permission or feature flag will touch an existing feature in an unexpected way, but up until this point that has not happened and is always taken into consideration.
  • Risk: Releases may have hidden bugs or regressions which affect users.
    • Using Graphite as a tool. This tool will monitor usage of many common SUMO functions. It is already in production and is collecting user data. It is currently monitored closely during releases. Any significant data spikes or dips after a release will quickly indicate if there is an issue.
    • Using StatsD as a tool. Another tool to monitor site usage by users, which may indicate issues if they exist.
  • Risk: Quality is going down, how do we change direction?
    • We are already releasing more quickly, which allows for fixes to go out faster. If we get to a point where we are not satisfied with the quality we will weigh that against the benefits of the speed of fixes. Ultimately we do not want to go back to a slower release cycle. If continuous deployment is not working well enough, it is possible to scale back to releasing 5x or 3x a week.
  • Risk: Quality will go down over time as QA is less involved with the daily release process.
    • There is no endpoint for quality work; the goal is to deliver the best product as fast as possible. Monitoring releases will provide quality statistics to refer to.
  • Risk: Continuous deployment will evolve to no longer need IT during releases. The risk is that deployment may not go well and IT may not be available to assist with a fix.
    • The worst case scenario is deploying incorrectly. A lot of work is being done with IT to automate the deployment process. This can happen now, but currently IT is present during each release. In the future we will no longer be able to roll back releases. Patches or fixes can still go out, but only by adding code, not by undoing a database change. We can redeploy within an hour, which is comparable to what exists now. This change will happen after IT is confident in the new process.
  • Risk: We don't have a current baseline for quality.
    • We are gathering data all the time with new services. It is really difficult to measure the current level of regression and new bugs which are sent out in releases. The entire team is keeping an eye on quality in order to keep the levels up. In order to maintain current levels of quality we will keep an eye on regressions and feedback.
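The Graphite/StatsD monitoring described in the risks above boils down to watching for significant spikes or dips after a release. A hypothetical sketch of that check, with invented metric samples and an invented tolerance threshold:

```python
# Sketch of a post-release sanity check in the spirit of watching a
# Graphite dashboard: compare the metric after the release against its
# pre-release baseline. The 50% tolerance is an arbitrary example value.

def release_looks_bad(baseline, after_release, tolerance=0.5):
    """Flag the release if the metric's average moved by more than
    `tolerance` (as a fraction of the baseline average)."""
    base_avg = sum(baseline) / len(baseline)
    new_avg = sum(after_release) / len(after_release)
    if base_avg == 0:
        return new_avg != 0
    change = abs(new_avg - base_avg) / base_avg
    return change > tolerance
```

For example, if questions answered per hour drop from around 100 to around 10 right after a deploy, the check fires and the team investigates, which is exactly the passive-monitoring complement to the active community feedback described above.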


Continuous deployment will require a lot of fundamental changes in how SUMO is released. This process is a long-range goal whose implementation has already started. The pieces that have been implemented have all gone well. Ultimately this is a path which does carry more risks, but those risks are outweighed by the benefits of continuous deployment.

Going forward, quality can no longer be measured purely by numbers of bugs. Two bugs in a release with fewer than five fixes will give a much higher regression rate compared to current releases. Of course, if there are more regressions or patches needed, that is one signal. Measuring the time between bugs being filed and bugs being resolved/released will become a better measure of quality. Continuous deployment will allow fixes to be released faster than our current process.
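The time-to-resolution measure proposed above can be computed very simply. The data format here (pairs of filed/released timestamps, expressed in hours) is an illustrative assumption:

```python
# Sketch of the proposed quality metric: mean time from a bug being
# filed to its fix being released. Timestamps are given in hours for
# simplicity; real data would come from the bug tracker.

def mean_time_to_resolution(bugs):
    """bugs is a list of (filed_hour, released_hour) pairs."""
    deltas = [released - filed for filed, released in bugs]
    return sum(deltas) / len(deltas)
```

Under continuous deployment this number should shrink, since a fix ships as soon as it lands rather than waiting for the next weekly release.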

We will need to trust the measurements of quality. Right now the new tools to measure and guarantee quality are already in production. They are being improved upon all the time. Data is already being collected and measured. The new tools will only expand on the depth of what can be measured.

The benefits of continuous deployment are numerous. Bugs can be fixed for everyone as soon as the developer fixes them, rather than waiting for the next release date. Response to issues will be faster. Code will be tested at production scale. There will no longer be a dependence on QA or IT to be present for each release. The roles of everyone in the SUMO team will change with this process, but it seems like a positive change.


For additional information on the plan, please refer to JSocol's blog posts on continuous deployment:

Videos on continuous deployment: