User:Rhelmer:Verification proposals

Overall build verification process

Tag

  • check tag logs for errors with "grep -v '^T'"
    • anything returned by this grep is an error (see the sketch below)
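
A minimal sketch of this check in Python, assuming the tag logs are raw cvs tag output passed on the command line (each successfully tagged file shows up as a line starting with "T"):

  import sys

  def check_tag_log(path):
      """Return every line that does not start with "T", i.e. an error."""
      errors = []
      for line in open(path):
          if not line.startswith("T"):
              errors.append(line.rstrip("\n"))
      return errors

  if __name__ == "__main__":
      failed = False
      for path in sys.argv[1:]:
          for err in check_tag_log(path):
              print("%s: %s" % (path, err))
              failed = True
      sys.exit(1 if failed else 0)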

Build

  • verify build logs (manually scan tinderbox output)
  • tinderbox does an Alive test (ensures that the browser starts up and can stay running for 45 seconds)

Repack

Update

Stage

  • verify that *only* locales listed in shipped-locales are present (see the sketch below)
    • in practice this also checks that other deliverables (XPI, EXE, etc.) are present
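
A minimal sketch of the shipped-locales part of this check in Python. The staging layout (one directory per locale under the release directory) and the shipped-locales format (one locale per line, with optional platform qualifiers after it) are assumptions; the deliverables check would hang off the same listing:

  import os
  import sys

  def read_shipped_locales(path):
      # one locale per line; ignore comments and platform qualifiers
      locales = set()
      for line in open(path):
          line = line.split("#")[0].strip()
          if line:
              locales.add(line.split()[0])
      return locales

  def check_stage(stage_dir, shipped_locales_path):
      expected = read_shipped_locales(shipped_locales_path)
      present = set(os.listdir(stage_dir))
      ok = True
      for loc in sorted(present - expected):
          print("FAIL: %s is on stage but not in shipped-locales" % loc)
          ok = False
      for loc in sorted(expected - present):
          print("FAIL: %s is in shipped-locales but missing from stage" % loc)
          ok = False
      return ok

  if __name__ == "__main__":
      sys.exit(0 if check_stage(sys.argv[1], sys.argv[2]) else 1)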

Sign

  • verify that there is a detached signature (.asc) for every file (see the sketch below)
  • verify that every Authenticode signature on EXE files shows "Mozilla" as the publisher
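
A minimal sketch of the detached-signature half of this in Python, assuming the signed deliverables live under a single directory tree. Checking the Authenticode publisher on the EXE files still needs an external signature tool and is not shown:

  import os
  import sys

  def missing_signatures(root):
      """Return files under root that have no matching .asc detached signature."""
      missing = []
      for dirpath, dirnames, filenames in os.walk(root):
          names = set(filenames)
          for name in filenames:
              if name.endswith(".asc"):
                  continue
              if name + ".asc" not in names:
                  missing.append(os.path.join(dirpath, name))
      return missing

  if __name__ == "__main__":
      bad = missing_signatures(sys.argv[1])
      for path in bad:
          print("FAIL: no detached signature for %s" % path)
      sys.exit(1 if bad else 0)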

Release

  • manual spot checks

Update verification

Really this tests many things:

  • update snippets
  • bouncer
  • AUS
  • that the patches apply
  • that an updated build is the same as a release build

and their interactions. For example, the complete test checks that all configured AUS paths return valid update.xml files, which in turn point to valid partial and complete MAR files. The MAR files must match the checksum and file size advertised in the update.xml.

Each MAR file must apply correctly to the target build, and the result of this update must be exactly the same as a released build (e.g. the Windows installer EXE).
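
A minimal sketch of the checksum/size portion of this check in Python. It assumes the update.xml format where each <patch> element carries type, URL, hashFunction, hashValue and size attributes; the AUS path to test is passed in, and actually applying the MAR and comparing the result to the released build is not shown:

  import hashlib
  import sys
  import urllib.request
  import xml.etree.ElementTree as ET

  def check_aus_path(update_xml_url):
      """Fetch update.xml and verify each advertised MAR's size and checksum."""
      xml = urllib.request.urlopen(update_xml_url).read()
      ok = True
      for patch in ET.fromstring(xml).iter("patch"):
          mar = urllib.request.urlopen(patch.get("URL")).read()
          if len(mar) != int(patch.get("size")):
              print("FAIL: size mismatch for %s" % patch.get("URL"))
              ok = False
          digest = hashlib.new(patch.get("hashFunction"), mar).hexdigest()
          if digest != patch.get("hashValue"):
              print("FAIL: %s mismatch for %s" % (patch.get("hashFunction"),
                                                  patch.get("URL")))
              ok = False
      return ok

  if __name__ == "__main__":
      # usage: pass one AUS update.xml URL on the command line
      sys.exit(0 if check_aus_path(sys.argv[1]) else 1)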

Problems

  • the complete test takes several hours to run
  • the code is somewhat inflexible (bash scripts)
    • the intent was to be as simple as possible, but the problem space is more complex than originally assumed

Benefits

To address these specific problems, I've made some modifications, such as adding different modes to the verification script:

  1) only download update.xml
  2) only test that MARs exist
  3) only test that MARs match checksum/size
  4) complete test

Note that each test builds upon the previous, which allows ad-hoc scripting to test other aspects of the system.

For example, if I have run mode #2 (test-only), which does an HTTP HEAD to figure out the size of the remote file, for both the releasetest and release channels, then it can be shown that both are the same by grepping the "Content-Length" lines out of each log and diffing the result sets (see the sketch below).
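
A minimal sketch of that comparison in Python, assuming the mode #2 output for each channel has been captured to a log file containing the raw HTTP HEAD responses:

  import re
  import sys

  def content_lengths(log_path):
      """Collect every Content-Length value, in order, from a mode #2 log."""
      lengths = []
      for line in open(log_path):
          m = re.search(r"Content-Length:\s*(\d+)", line)
          if m:
              lengths.append(int(m.group(1)))
      return lengths

  if __name__ == "__main__":
      releasetest = content_lengths(sys.argv[1])
      release = content_lengths(sys.argv[2])
      if releasetest == release:
          print("PASS: releasetest and release advertise identical file sizes")
      else:
          print("FAIL: Content-Length values differ between the two channels")
          sys.exit(1)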

Todo

Ad-hoc scripting is incredibly useful and could be used to show correctness in other areas, such as:

  • all update.xml files are consistent across aus2-staging and the web heads
  • for each locale and each OS, all full update patches apply and are identical
  • for each release < n - 1 (where n is the current release), the previous rule holds for both partial and complete patches

I think it would be much easier to do this kind of thing if we rewrote the main logic of this script in a language with tools that can really parse the update.xml and allow us to use more appropriate data structures.
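
As a rough sketch of what that could look like in Python, the update.xml could be parsed into real records instead of being grepped (the field names are assumptions based on the attributes carried by <patch> elements):

  import xml.etree.ElementTree as ET
  from collections import namedtuple

  Patch = namedtuple("Patch", "type url hash_function hash_value size")

  def parse_update_xml(text):
      """Turn an update.xml document into a list of Patch records."""
      patches = []
      for patch in ET.fromstring(text).iter("patch"):
          patches.append(Patch(type=patch.get("type"),
                               url=patch.get("URL"),
                               hash_function=patch.get("hashFunction"),
                               hash_value=patch.get("hashValue"),
                               size=int(patch.get("size"))))
      return patches

  # e.g. completes = [p for p in parse_update_xml(open("update.xml").read())
  #                   if p.type == "complete"]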

Update verification lives at: [1]

l10n verification

There are two parts to the l10n verification, a per-release "diff" and a "meta-diff". The per-release "diff" can be described as:

  • download and unpack en-US build for release x
  • download and unpack each locale build for release x
  • for each locale, recursively diff the unpacked package contents against en-US

If there are any binary differences, this test fails, as l10n builds should be repacks of the core en-US build.
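
A minimal sketch of the recursive diff in Python, assuming the en-US and locale packages have already been downloaded and unpacked into two directories. It only reports which files differ or are missing; deciding which differences are expected localization changes is left to whoever reads the output:

  import filecmp
  import os
  import sys

  def compare_trees(en_us_dir, locale_dir):
      """Return relative paths that differ, or that exist on only one side."""
      problems = []
      seen = set()
      for dirpath, dirnames, filenames in os.walk(en_us_dir):
          for name in filenames:
              rel = os.path.relpath(os.path.join(dirpath, name), en_us_dir)
              seen.add(rel)
              other = os.path.join(locale_dir, rel)
              if not os.path.exists(other):
                  problems.append("%s (missing in locale)" % rel)
              elif not filecmp.cmp(os.path.join(en_us_dir, rel), other, shallow=False):
                  problems.append("%s (contents differ)" % rel)
      for dirpath, dirnames, filenames in os.walk(locale_dir):
          for name in filenames:
              rel = os.path.relpath(os.path.join(dirpath, name), locale_dir)
              if rel not in seen:
                  problems.append("%s (only in locale)" % rel)
      return problems

  if __name__ == "__main__":
      for problem in compare_trees(sys.argv[1], sys.argv[2]):
          print(problem)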

Comparing the results of two different releases produces the "meta-diff", which shows the changes in localizations between the two releases.

l10n lives at [2]

Proposed changes

  • compare the final signed, ready-to-ship bits with the candidate builds that QA tested. This can be done in a similar way to the l10n per-release "diff" process.
  • compare AUS and bouncer configuration on both production and staging
  • the l10n meta-diff and full update verification really should run on the platform of origin, and not on the same Mac we use for repacks and all verification (for a variety of reasons, from known false negatives to the resource problem this causes when doing multiple releases).
  • be able to kick off verification runs from an available pool of servers, instead of having dedicated machines. These could be the pool of available build machines, or equivalent.
  • all tests must be capable of returning a definitive PASS or FAIL; human verification should not be necessary, but should still be possible (all actions should be logged)
  • we should add some verification that the release notes and any in-product pages are online (either live, staging, or both, depending on the changes)