Auto-tools/Projects/Futurama/2012-10-16
Contents
- 1 The Top of the List
- 2 Capacity
- 3 Turn around time
- 4 Reproducibility
- 5 More flexible bbot scheduling
- 6 Bugmail/Personal, Product Bugzilla stati
- 7 Bisect in the cloud
- 8 Orange quarantine
- 9 Opening up of TBPL to allow any automation system to show data
- 10 Streamline Bugzilla integration with Try/Checkin/Reviews
- 11 Test Measurements of how worthwhile/useful tests are
- 12 Make all test runners use the same code/methodology - mozbase, mach front ends - to make tests easy to run, use, and write.
The Top of the List
Capacity
Short Term
- Analyze VMs - *can* we run tests on AWS? Speaking to some people, no! AWS can be slow for Windows (sometimes a 20-minute startup)
- Get more machines (releng) (iX machines)
- Windows and Linux test slave replacements
- Are we going to use these for both talos and unittests? Follow up with Armenzg
- 1044 minis in production
- Identify problematic machines - reflash them to fix them. Can we auto-detect dead machines (reduces intermittency)
- For VMs, we would probably want a one-off script to compare results between VMs and desktop environments.
- Monitor sdcard burnout on mobile
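The auto-detection idea above could start as a trivial reachability check. A minimal sketch, assuming SSH reachability is a fair proxy for slave health; the host names are hypothetical, and a real check would pull its list from slavealloc or the buildbot master:

```python
import socket

# Hypothetical slave list; in production this would come from slavealloc
# or the buildbot master's configuration, not a hard-coded list.
SLAVES = ["talos-r3-w7-001.example.org", "talos-r3-w7-002.example.org"]

def is_alive(host, port=22, timeout=5.0):
    """Treat a refused or timed-out TCP connection as a dead machine."""
    try:
        sock = socket.create_connection((host, port), timeout=timeout)
        sock.close()
        return True
    except OSError:
        return False

def dead_slaves(hosts):
    """Return the subset of hosts that should be flagged for reflashing."""
    return [h for h in hosts if not is_alive(h)]
```

Running this periodically and diffing the result against the previous run would also give the failure-trend data the long-term item below asks for.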
Long Term
- For branch to branch comparisons of intermittency we want this in orange factor
- Tools to monitor failure trends on mobile devices w.r.t. burnout, dead sdcard etc.
Turn around time
Short Term
- Develop a short cycle test - would need to be scrupulously maintained
- Smoke test to start/stop the browser - adding more tests to that doesn't seem to add much value.
- If Windows builds fail but Linux, OS X, and Android are fine, would we skip scheduling the full suite of tests on all platforms because one platform is dead?
- On try you still want all the results, whereas on inbound you don't want to waste cycles on something that will be backed out. A sheriff could retrigger if for some reason we still want those tests.
- Low hanging fruit in some harnesses - like parallelizing xpcshell on multi-core
- xpcshell could also be improved by having fast disks (for tests that use SQLite data, use a ramdisk for tmp/ files). Could we do a ramdisk on Windows?
- Be smarter about when we run various tests
- Run tests off of where they were checked in - don't run robocop if mobile doesn't change etc.
- don't run js tests if you're not actually changing js stuff
- don't run desktop tests if you only change b2g
- Run JS tests as part of make check - could pull these out and optimize them
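The "run tests off of where they were checked in" bullets above amount to a path-to-suite mapping. A minimal sketch; the patterns and suite names are illustrative only, not the real in-tree rules, and the actual mapping would need to be maintained next to the build config:

```python
import fnmatch

# Hypothetical rules: suites that only run when matching paths change.
CONDITIONAL_SUITES = {
    "robocop": ["mobile/*"],    # don't run robocop if mobile didn't change
    "jsreftest": ["js/*"],      # don't run JS tests without JS changes
}
# Suites scheduled by default on any non-b2g-only push.
DESKTOP_SUITES = {"mochitest", "xpcshell", "reftest"}

def suites_for_push(changed_files):
    """Pick the test suites worth scheduling for a given set of changed paths."""
    suites = set()
    # A push that only touches b2g/ skips the desktop suites entirely.
    if not all(f.startswith("b2g/") for f in changed_files):
        suites |= DESKTOP_SUITES
    for suite, patterns in CONDITIONAL_SUITES.items():
        if any(fnmatch.fnmatch(f, p) for f in changed_files for p in patterns):
            suites.add(suite)
    return suites
```

The hard part is not this lookup but keeping the rules honest as directories move; a wrong rule silently un-tests code, which is why bisect in the cloud keeps coming up as a prerequisite.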
Long Term
- makefile optimizations - mozbuild files (build faster)
- make the tests faster?
- only run a portion of tests and at some interval run all (but need bisect in the cloud)
- run the longer tests periodically
- run the high frequency orange tests periodically
- parallelize tests more between machines
- run the tests that have never failed in the last year periodically (e.g. dom-levelX)
- choose statistically a sampling of tests for each changeset
- What is the feasibility of audit of tests and measure what is in there and what is duplicated etc. (test measurement)
- even more awesome would be some sort of test ownership that emerges from this audit
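The "choose statistically a sampling of tests for each changeset" idea can be made deterministic by seeding on the changeset id, so retriggers of the same push run the same subset. A sketch of just the selection step, with the fraction as an assumed tuning knob:

```python
import hashlib
import random

def sample_tests(all_tests, changeset, fraction=0.2):
    """Deterministically sample a fraction of tests for one changeset.

    Seeding the RNG from the changeset id means a retrigger of the same
    push reruns the same subset, which keeps failures reproducible."""
    seed = int(hashlib.sha1(changeset.encode()).hexdigest(), 16)
    rng = random.Random(seed)
    k = max(1, int(len(all_tests) * fraction))
    return sorted(rng.sample(list(all_tests), k))
```

The periodic full run (plus bisect in the cloud) then covers whatever a given push's sample missed.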
Reproducibility
Short Term
- Environment you can get in production - complete with the sendchange scripts to fire off the exact test you're interested in.
- Might be replaced by bisect in the cloud?
- Add more ability to get runtime debugging output of the product for failures/random oranges
- Run a failing test again with more debugging output? Do we have debugging settings we can actually toggle?
- Could potentially re-use the existing NSPR logging and enable it at run time through ENV var, but it's not clear that it would get us enough logging to be useful. TODO Investigate.
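NSPR logging is already controllable at run time through environment variables (NSPR_LOG_MODULES and NSPR_LOG_FILE); the open question in the TODO is whether any module list produces output useful for random oranges. A sketch of toggling it for one rerun - the module list and the harness invocation here are illustrative:

```shell
# NSPR reads these at startup; no rebuild needed.
# Format is module:level pairs, e.g. "nsHttp:5,nsSocketTransport:5";
# "all:3" is a blunt starting point and may be far too chatty.
export NSPR_LOG_MODULES="all:3"
# Defaults to stderr when unset; a file keeps it out of the test log.
export NSPR_LOG_FILE=/tmp/nspr.log

# Illustrative rerun of a single failing test.
./mach mochitest path/to/failing_test.html
```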
Long Term
- Downloadable environment to run tests in that matches the bbot environment (depends on running tests on VMs)
- halt on error for manual investigation
More flexible bbot scheduling
Short Term
- Allow for having more of a staging area to try out various changes to the production environment
- Easier to try new harness automation
- Put some config files into the trees now. à la talos.json
Long Term
- Completely move away from buildbot scheduler
- There are off the shelf products for desktop but not mobile
- But there is an ability to make scheduling for mobile appear to be scheduling things on desktop (using mozharness/mozpool/lifeguard)
Bugmail/Personal, Product Bugzilla stati
Short Term
- Release 4.2 with the dashboards
- See how they get used in the wild and respond to requests and input
- annoying reminder when you have reviews that are >2 days old!
- existing functionality in bugzilla now (time frame is 7 days)
- might look into decreasing time out
- Use X-Bugzilla-Who to filter out TBPL robot signer.
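A filter on X-Bugzilla-Who could look like the sketch below; the robot's address is an assumption and would need to match whatever account the TBPL robot actually comments under:

```python
from email import message_from_string

# Assumed address for the TBPL robot account; the real filter would use
# whatever identity the sheriffs' automation comments under.
ROBOT_WHO = "tbplbot@gmail.com"

def is_robot_bugmail(raw_message):
    """True if a bugmail came from the TBPL robot, judged by the
    X-Bugzilla-Who header Bugzilla adds to outgoing notification mail."""
    msg = message_from_string(raw_message)
    return msg.get("X-Bugzilla-Who", "").strip().lower() == ROBOT_WHO
```

The same predicate works as a procmail/Sieve rule client-side, or server-side if Bugzilla grows a setting for it.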
Long Term
- Extend component watching to make the email easier to fine tune
- Analyze the use of "Product-izing ed's workflow to escape the TBPL robot comments"
Bisect in the cloud
Short Term
- Should solve performance and correctness issues
- depends on buildbot scheduling
- Depends on scheduling builds & tests for itself (or it builds itself)
- should be smart enough to use existing builds where it is possible (recent, non-coalesced)
- What drives this? A script we maintain that works with buildbot or buildbot itself
- The data display - what's the best way to present the output of this tool?
- How is the tool going to be used?
- Use case A) we don't run all the tests all the time - the thing could be fired off from TBPL to go back and figure out what broke on the tree
- Use case B) we have something that has no automated test but we create a failing one and we give this monster that test and let it go find which changeset broke it.
- One idea is to replicate doing the sheriff action with pushing a patch, building, retriggering etc. until you find the smallest range possible - might be one way to do a first cut.
- Ideally go down into individual changesets, i.e. with hg bisect, rather than pushes (but that requires extra builds that aren't mapped to a push) - might be long term
- Depends on having enough capacity to dedicate machines to this
- ted has part of it - mozregression is another part of it
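Whatever drives it, the core loop is an ordinary bisection over the ordered pushes between a known-good and a known-bad build. A sketch of that loop; the build-and-run step is a placeholder for the buildbot interaction (and for reusing existing builds where possible):

```python
def bisect_pushes(pushes, test_passes):
    """Find the first bad push by bisection.

    pushes is ordered oldest-to-newest with pushes[0] known good and
    pushes[-1] known bad. test_passes(push) stands in for the expensive
    step: schedule (or reuse) a build for that push, run the failing
    test, and report pass/fail."""
    lo, hi = 0, len(pushes) - 1
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if test_passes(pushes[mid]):
            lo = mid   # still good; regression is later
        else:
            hi = mid   # already bad; regression is here or earlier
    return pushes[hi]
```

Going below push granularity (the hg bisect idea above) is the same loop over changesets, once builds exist for changesets that were never pushed individually.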
Long Term
- Something stand alone but integrated (visually) with TBPL etc.
- Good UI to showcase the regression hunt, status, and the regression range
- Fire and forget developer use case. It emails them a link to the UI when it is finished
Orange quarantine
Short Term
Long Term
Opening up of TBPL to allow any automation system to show data
Short Term
Long Term
Streamline Bugzilla integration with Try/Checkin/Reviews
Short Term
Long Term
Test Measurements of how worthwhile/useful tests are
Short Term
Long Term
Make all test runners use the same code/methodology - mozbase, mach front ends - to make tests easy to run, use, and write.
Short Term
Long Term
Go through these and decide what the short and long term deliverables are on each one. Then prioritize.
- spokesperson: ???
- Joel will lead the releng + team mtg
- Mcote will set up the Toronto Developer meeting
- On Thurs. Ed M will join us.