CloudServices/Loop/Deploy: Difference between revisions

Revision as of 21:52, 26 January 2016

6,037 bytes added , 26 January 2016

→‎Dev: add setting info

Ianb

@@ Line 25: / Line 25: @@
 ** Bob Michelleto <bobm@mozilla.com>
 * <b>QA</b>
-** James Bonacci <jbonacci@mozilla.com>
+** Richard Pappalardo <rpappalardo@mozilla.com> (Primary)
+** Karl Thiessen <kthiessen@mozilla.com> (Backup)
-=Deployement=
+=Deployment=
 There are three deployed environments. A fourth will be deployed later.
@@ Line 34: / Line 35: @@
 ==Dev==
-* <b>Host:</b> https://loop-dev.stage.mozaws.net/
+* <b>Host:</b> https://loop-dev.stage.mozaws.net
 * <b>Maintainer:</b> DEVs
 * <b>Tokbox mocked?</b> NO.
 * <b>Usage:</b> Development and integration
+* <b>Updates:</b>
+** loop-client is updated automatically every hour and should match the latest master of [https://github.com/mozilla/loop-client the repository]
+** loop-server is updated with the master branch by devs on a regular basis - or upon request. You can get the version by displaying the root URL of the server.
+* <b>Access:</b>  ec2-user@loop.dev.mozaws.net
-This environment is updated with the master branch by devs on a regular basis - or upon request. you can get the version by displaying the root URL of the server.
+To use this in your browser, go to about:config, change the value of <code>loop.server</code> to <code>https://loop-dev.stage.mozaws.net</code>, and then restart your browser.
 '''This environment can be used to test end-to-end scenario until the service hits the Stable channel.'''
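A minimal sketch of checking which loop-server build an environment is running, as described in the Updates note above. It assumes the root URL returns a JSON object with a <code>version</code> field; that response shape is an assumption, and <code>extract_version</code> and <code>fetch_version</code> are hypothetical helper names.

```python
import json
from urllib.request import urlopen

DEV_HOST = "https://loop-dev.stage.mozaws.net"  # Dev host listed on this page


def extract_version(body: str) -> str:
    """Return the version reported in a root-URL response body.

    Assumes the body is a JSON object with a "version" key (an assumption,
    not confirmed by this page). Falls back to "unknown" if the key is absent.
    """
    data = json.loads(body)
    return str(data.get("version", "unknown"))


def fetch_version(host: str = DEV_HOST, timeout: float = 10.0) -> str:
    """Fetch the root URL of the server and return its reported version."""
    with urlopen(host + "/", timeout=timeout) as response:
        return extract_version(response.read().decode("utf-8"))
```

Calling <code>fetch_version()</code> needs network access to the Dev host; <code>extract_version</code> can be exercised offline with a saved response body.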
@@ Line 48: / Line 53: @@
 * <b>Maintainer:</b> OPS
 * <b>Tokbox mocked?</b> Yes (but can change, check the / endpoint for more info)
-* <b>Usage:</b> Server-side QA and Loadtesting
+* <b>Usage:</b> Server-side QA and Loadtesting with the mock server
@@ Line 57: / Line 62: @@
 ==Real-Stage==
-* <b>Host:</b> To be defined
+* <b>Host:</b> https://loop.stage.mozaws.net/
 * <b>Maintainer:</b> OPS
 * <b>Mocked tokbox?:</b> No
-* <b>Usage:</b> Client-side QA
+* <b>Usage:</b> Client-side QA, Server-side QA and Loadtesting with a live third-party/partner server
 Not yet commissioned. This environment will be used for end-to-end testing of the service once it hits the stable channel.
 '''This server will be a perfect mirror of the production environment, updated with the tag of the upcoming release'''
+* (jabonacci) Correct me if I am wrong but we already have this. We are using a configurable Stage environment that can either point to a mock server (previous section) or point to a live server. So, we are able to do end-to-end testing. Host name is the same. Configuration is defined in a file on the server:
+** /data/loop-server/config/settings.json
 ==Production==
@@ Line 82: / Line 90: @@
 =Releasing loop-client=
+Please see [[Loop/Loop-client_Release_Process]] for the deployment and release process details.
-Releasing loop-client is done by extracting from mozilla-central a sub-tree of files and a shared directory
-Process (draft) https://webrtc.etherpad.mozilla.org/26?
 =Release Cycle=
@@ Line 127: / Line 132: @@
 # Backport (cherrypick) the commit in the 0.9.x branch (create it if needed);
 # Tag a new minor release: 0.9.1 and fill a new deployment request.
+= Deployment Versioning =
+Loop-server is backward compatible and uses versioned routes (e.g. /v1, /v2) to segment API changes. Currently this is controlled by the Loop-server, which does not offer the ability to update /v2 route code in isolation from /v1. In other words, Fx34 browsers using the /v1 route will see a production code change when a bug is fixed in the Fx35 /v2 API. Here's a breakdown of the different approaches:
+'''Option A: /v1 and /v2 routes point to different server clusters:'''
+* Pros:
+** Fx34 and /v1 users are not affected by any code change to /v2 api users.
+** Low risk of injecting bugs
+* Cons:
+** Ops has to maintain two server clusters (dev/stage/prod)
+** Both servers need to use the same database... this gets tricky.
+** Mo' computers mo' problems (complexity).
+'''Option B: /v1 and /v2 routes are hosted on a single server:'''
+* Pros:
+** Simpler, fewer resources used and needed
+** Easier to use a single database
+* Cons:
+** More risk: pushing code for /v2 also affects /v1 code.
+For Fx 34/35, we are choosing Option B until we reach a point where we have a history of injecting bugs due to this architecture.
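The trade-off in Option B can be illustrated with a small sketch: both versioned routes live in one process and share helper code, so a change made for /v2 ships to /v1 users as well. Every name here (<code>shared_cleanup</code>, <code>handle_v1</code>, <code>handle_v2</code>, <code>dispatch</code>) is hypothetical and not taken from the loop-server code base.

```python
def shared_cleanup(value: str) -> str:
    """Helper shared by both API versions (hypothetical).

    Under Option B, editing this function for a /v2 bug fix also changes
    what /v1 clients receive, because both handlers call it.
    """
    return value.strip()


def handle_v1(value: str) -> dict:
    """Hypothetical /v1 handler."""
    return {"api": "v1", "value": shared_cleanup(value)}


def handle_v2(value: str) -> dict:
    """Hypothetical /v2 handler, built on the same shared helper."""
    return {"api": "v2", "value": shared_cleanup(value)}


# One server hosts both versioned routes (Option B).
ROUTES = {"/v1": handle_v1, "/v2": handle_v2}


def dispatch(path: str, value: str) -> dict:
    """Route a request to the handler whose version prefix matches the path."""
    for prefix, handler in ROUTES.items():
        if path == prefix or path.startswith(prefix + "/"):
            return handler(value)
    raise LookupError(f"no route for {path}")
```

Under Option A, each prefix would instead map to a separately deployed cluster, which isolates /v1 from /v2 changes at the cost of the extra operational work listed above.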
+= Deploying flow =
+See full version at: https://old.etherpad-mozilla.org/deploy-release-process
+== How does a release get to production? ==
+* QA/DEV creates a stage deployment ticket and adds dependencies and blockers
+** (e.g. "Loop — Please deploy loop-server 0.13.0 to Stage")
+* DEV makes a tag
+** Here we should try to make sure that the changelog has all Resolved/Fixed bugs going into this release.
+* OPS deploys build to stage
+* QA validates that the release was deployed to stage by OPS
+* OPS sets the stage bug to fixed as soon as it is deployed (yes, this is fine)
+* QA runs verification steps, quick tests, and loadtests
+** (after having set a window with partners)
+* QA sets the bug to verified as soon as it is OK to deploy to Production
+* QA creates a deployment bug for production and adds dependencies and blockers
+* OPS deploys the release to production and sets the bug to Resolved/Fixed
+* QA sets the bug to verified as soon as deployment has been verified.
+** (This may include verification by the Loop client and QA teams)
+* OPS should be monitoring the release for a specific period of time
+** (to watch out for unforeseen issues and side-effects)
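The hand-offs above can be written down as an ordered checklist. This is only a sketch of the list as stated on this page; <code>RELEASE_FLOW</code> and <code>next_step</code> are hypothetical names, and no ticketing system is involved.

```python
# Each entry is (owner, action), in the order given in the list above.
RELEASE_FLOW = [
    ("QA/DEV", "create the stage deployment ticket with dependencies and blockers"),
    ("DEV", "make a tag"),
    ("OPS", "deploy the build to stage"),
    ("QA", "validate that the release was deployed to stage"),
    ("OPS", "set the stage bug to fixed"),
    ("QA", "run verification steps, quick tests, and loadtests"),
    ("QA", "set the stage bug to verified"),
    ("QA", "create the production deployment bug"),
    ("OPS", "deploy to production and set the bug to Resolved/Fixed"),
    ("QA", "set the production bug to verified"),
    ("OPS", "monitor the release"),
]


def next_step(completed: int):
    """Return the (owner, action) pair that follows `completed` finished steps.

    Returns None once every step in RELEASE_FLOW is done.
    """
    if completed < 0:
        raise ValueError("completed must be non-negative")
    if completed >= len(RELEASE_FLOW):
        return None
    return RELEASE_FLOW[completed]
```

For example, <code>next_step(2)</code> returns the OPS stage deployment step, which makes it clear who is expected to act next.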
+== What do we do in the following cases? ==
+* A bug is found during the stage validation
+** DEV fixes the issue, makes a new minor release 0.13.1, and creates a new deployment request bug (e.g. "Loop — Please deploy loop-server 0.13.1 to Stage")
+*** (Do we morph the existing one? jbonacci said no the last time I did.) So we close the current deployment bug and create a new one. OK.
+** OPS fixes the issue and makes a new minor release or a re-release of the same build. We have had circumstances where the change is OPS-specific, not DEV-specific.
+** QA closes the previous stage ticket as invalid and the story restarts with the new bug
+*** I am pondering this idea for minor vs. major releases. On the one hand, having a history in the ticket (12.0, 12.1, 12.2) is good. On the other hand, the ticket can get too large (see Loop-Server 12.2)...
+* A bug is found in production
+** DEV fixes the issue and makes a new minor release from the production release (e.g. 0.12.3)
+** DEV creates a stage bug (e.g. "Loop — Please deploy loop-server 0.12.3 to Stage"). Well, QA should create the Stage ticket with information gathered from Dev. But either way works for me...
+*** Then same story as a usual release
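The version numbers in these cases follow a simple pattern: a fix on top of 0.13.0 becomes 0.13.1, and fixes are backported to a branch such as 0.9.x. A minimal sketch of that arithmetic follows; it assumes versions always have exactly three numeric parts, and the function names are hypothetical.

```python
def next_fix_release(version: str) -> str:
    """Return the next fix release, e.g. "0.13.0" -> "0.13.1".

    Assumes a three-part numeric version string (an assumption).
    """
    major, minor, patch = (int(part) for part in version.split("."))
    return f"{major}.{minor}.{patch + 1}"


def maintenance_branch(version: str) -> str:
    """Return the branch a fix is backported to, e.g. "0.9.1" -> "0.9.x"."""
    major, minor, _patch = version.split(".")
    return f"{major}.{minor}.x"
```

A production fix on top of 0.12.2 would therefore be tagged 0.12.3 and cherry-picked into the 0.12.x branch.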
+== Who gives the green light when prod is ready to be updated? ==
+For instance, we recently had a bug in production even though stage validation had been passed by QA.
+In this case, it's a bit tricky to know whether we should deploy to production or not.
+In order to avoid things going wrong, should we wait for QA to give the green light again before pushing something new to production?
+Consider that this can block the resolution of a problem.
+* As soon as the stage ticket has been verified and the production bug is created
+** Then OPS has a QA green light and can start the deployment.
+** Right. And issues specific to Production are a special case anyway. If tests pass in Stage but something goes wrong in Production, then we need to add the fix to both. If there is a Production-specific issue (that we would never see in Stage), then we should approach it on a case-by-case basis. There are cases where we have had to push something special/specific/urgent/break fix for other Production environments. It's not something we should consider "normal procedure" though, because it requires Stage and Prod to be out of sync.
+* There is the idea of a code change that always needs to go through this process. This is DEV driven.
+* Then, there is the idea of a service-level change that always needs to go through this process. This should be OPS driven.
+* Then, sometimes we have a real emergency in Production that requires a change (DEV or OPS). We have not always been good about the process for this case.
+<b>Examples:</b>
+# The service is broken and needs a code change
+# Server issues like stack size, cpu/memory/disk issues, config issues, DB issues