Auto-tools/Projects/Futurama/2012-10-16
Contents
- 1 The Top of the List
- 2 Capacity
- 3 Turn around time
- 4 Reproducibility
- 5 More flexible bbot scheduling
- 6 Bugmail/Personal, Product Bugzilla stati
- 7 Bisect in the cloud
- 8 Orange quarantine
- 9 Opening up of TBPL to allow any automation system to show data
- 10 Streamline Bugzilla integration with Try/Checkin/Reviews
- 11 Test Measurements of how worthwhile/useful tests are
- 12 Make all test runners use the same code/methodology - mozbase, mach front ends - to make tests easy to run, use, and write.
The Top of the List
Capacity
Short Term
- Analyze VMs - *can* we run tests on AWS? Speaking to some people, no! AWS can be slow for Windows (sometimes a 20-minute startup)
- Get more machines (releng) (iX machines)
- Windows and Linux test slave replacements
- Are we going to use these for both talos and unittests? Follow up with Armenzg
- 1044 minis in production
- Identify problematic machines - reflash them to fix them. Can we auto-detect dead machines (reduces intermittency)
- For VMs, we would probably want a one-off script to compare results between VMs and desktop environments.
- Monitor sdcard burnout on mobile
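The auto-detection idea above could start as a trivial reachability check. A minimal sketch, assuming SSH reachability is a fair proxy for slave health; the host names are hypothetical, and a real check would pull its list from slavealloc or the buildbot master:

```python
import socket

# Hypothetical slave list; in production this would come from slavealloc
# or the buildbot master's configuration, not a hard-coded list.
SLAVES = ["talos-r3-w7-001.example.org", "talos-r3-w7-002.example.org"]

def is_alive(host, port=22, timeout=5.0):
    """Treat a refused or timed-out TCP connection as a dead machine."""
    try:
        sock = socket.create_connection((host, port), timeout=timeout)
        sock.close()
        return True
    except OSError:
        return False

def dead_slaves(hosts):
    """Return the subset of hosts that should be flagged for reflashing."""
    return [h for h in hosts if not is_alive(h)]
```

Running this periodically and diffing the result against the previous run would also give the failure-trend data the long-term item below asks for.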
Long Term
- For branch to branch comparisons of intermittency we want this in orange factor
- Tools to monitor failure trends on mobile devices w.r.t. burnout, dead sdcard etc.
Turn around time
Short Term
- Develop a short cycle test - would need to be scrupulously maintained
- Smoke test to start/stop the browser - adding more tests to that doesn't seem to add much value.
- If Windows builds fail but Linux, OS X, and Android are fine, would we skip scheduling the full suite of tests on all platforms because one platform is dead?
- On try you still want all the results, whereas on inbound you don't want to waste cycles on something that will be backed out. A sheriff could retrigger if for some reason we still want those tests.
- Low hanging fruit in some harnesses - like parallelizing xpcshell on multi-core
- xpcshell could also be improved by having fast disks (for tests that use SQLite data, use a ramdisk for tmp/ files). Could we do a ramdisk on Windows?
- Be smarter about when we run various tests
- Run tests off of where they were checked in - don't run robocop if mobile doesn't change etc.
- don't run js tests if you're not actually changing js stuff
- don't run desktop tests if you only change b2g
- Run JS tests as part of make check - could pull these out and optimize them
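The "run tests off of where they were checked in" bullets above amount to a path-to-suite mapping. A minimal sketch; the patterns and suite names are illustrative only, not the real in-tree rules, and the actual mapping would need to be maintained next to the build config:

```python
import fnmatch

# Hypothetical rules: suites that only run when matching paths change.
CONDITIONAL_SUITES = {
    "robocop": ["mobile/*"],    # don't run robocop if mobile didn't change
    "jsreftest": ["js/*"],      # don't run JS tests without JS changes
}
# Suites scheduled by default on any non-b2g-only push.
DESKTOP_SUITES = {"mochitest", "xpcshell", "reftest"}

def suites_for_push(changed_files):
    """Pick the test suites worth scheduling for a given set of changed paths."""
    suites = set()
    # A push that only touches b2g/ skips the desktop suites entirely.
    if not all(f.startswith("b2g/") for f in changed_files):
        suites |= DESKTOP_SUITES
    for suite, patterns in CONDITIONAL_SUITES.items():
        if any(fnmatch.fnmatch(f, p) for f in changed_files for p in patterns):
            suites.add(suite)
    return suites
```

The hard part is not this lookup but keeping the rules honest as directories move; a wrong rule silently un-tests code, which is why bisect in the cloud keeps coming up as a prerequisite.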
Long Term
- makefile optimizations - mozbuild files (build faster)
- make the tests faster?
- only run a portion of tests and at some interval run all (but need bisect in the cloud)
- run the longer tests periodically
- run the high frequency orange tests periodically
- parallelize tests more between machines
- run the tests that have never failed in the last year periodically (e.g. dom-levelX)
- choose statistically a sampling of tests for each changeset
- What is the feasibility of audit of tests and measure what is in there and what is duplicated etc. (test measurement)
- even more awesome would be some sort of test ownership that emerges from this audit
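The "choose statistically a sampling of tests for each changeset" idea can be made deterministic by seeding on the changeset id, so retriggers of the same push run the same subset. A sketch of just the selection step, with the fraction as an assumed tuning knob:

```python
import hashlib
import random

def sample_tests(all_tests, changeset, fraction=0.2):
    """Deterministically sample a fraction of tests for one changeset.

    Seeding the RNG from the changeset id means a retrigger of the same
    push reruns the same subset, which keeps failures reproducible."""
    seed = int(hashlib.sha1(changeset.encode()).hexdigest(), 16)
    rng = random.Random(seed)
    k = max(1, int(len(all_tests) * fraction))
    return sorted(rng.sample(list(all_tests), k))
```

The periodic full run (plus bisect in the cloud) then covers whatever a given push's sample missed.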
Reproducibility
Short Term
- Environment you can get in production - complete with the sendchange scripts to fire off the exact test you're interested in.
- Might be replaced by bisect in the cloud?
- Add more ability to get runtime debugging output of the product for failures/random oranges
- Run a failing test again with more debugging output? Do we have debugging settings we can actually toggle?
- Could potentially re-use the existing NSPR logging and enable it at run time through ENV var, but it's not clear that it would get us enough logging to be useful. TODO Investigate.
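NSPR logging is already controllable at run time through environment variables (NSPR_LOG_MODULES and NSPR_LOG_FILE); the open question in the TODO is whether any module list produces output useful for random oranges. A sketch of toggling it for one rerun - the module list and the harness invocation here are illustrative:

```shell
# NSPR reads these at startup; no rebuild needed.
# Format is module:level pairs, e.g. "nsHttp:5,nsSocketTransport:5";
# "all:3" is a blunt starting point and may be far too chatty.
export NSPR_LOG_MODULES="all:3"
# Defaults to stderr when unset; a file keeps it out of the test log.
export NSPR_LOG_FILE=/tmp/nspr.log

# Illustrative rerun of a single failing test.
./mach mochitest path/to/failing_test.html
```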
Long Term
- Downloadable environment to run tests in that matches the bbot environment (depends on running tests on VMs)
- halt on error for manual investigation
More flexible bbot scheduling
Short Term
- Allow for having more of a staging area to try out various changes to the production environment
- Easier to try new harness automation
- Put some config files into the trees now. à la talos.json
Long Term
- Completely move away from buildbot scheduler
- There are off the shelf products for desktop but not mobile
- But there is an ability to make scheduling for mobile appear to be scheduling things on desktop (using mozharness/mozpool/lifeguard)
Bugmail/Personal, Product Bugzilla stati
Short Term
- Release 4.2 with the dashboards
- See how they get used in the wild and respond to requests and input
- annoying reminder when you have reviews that are >2 days old!
- existing functionality in bugzilla now (time frame is 7 days)
- might look into decreasing time out
- Use X-Bugzilla-Who to filter out TBPL robot signer.
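A filter on X-Bugzilla-Who could look like the sketch below; the robot's address is an assumption and would need to match whatever account the TBPL robot actually comments under:

```python
from email import message_from_string

# Assumed address for the TBPL robot account; the real filter would use
# whatever identity the sheriffs' automation comments under.
ROBOT_WHO = "tbplbot@gmail.com"

def is_robot_bugmail(raw_message):
    """True if a bugmail came from the TBPL robot, judged by the
    X-Bugzilla-Who header Bugzilla adds to outgoing notification mail."""
    msg = message_from_string(raw_message)
    return msg.get("X-Bugzilla-Who", "").strip().lower() == ROBOT_WHO
```

The same predicate works as a procmail/Sieve rule client-side, or server-side if Bugzilla grows a setting for it.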
Long Term
- Extend component watching to make the email easier to fine tune
- Analyze the use of "Product-izing ed's workflow to escape the TBPL robot comments"
Bisect in the cloud
Short Term
- Should solve performance and correctness issues
- depends on buildbot scheduling
- Depends on scheduling builds & tests for itself (or it builds itself)
- should be smart enough to use existing builds where it is possible (recent, non-coalesced)
- What drives this? A script we maintain that works with buildbot or buildbot itself
- The data display - what's the best way to present the output of this tool?
- How is the tool going to be used?
- Use case A) we don't run all the tests all the time - the thing could be fired off from TBPL to go back and figure out what broke on the tree
- Use case B) we have something that has no automated test but we create a failing one and we give this monster that test and let it go find which changeset broke it.
- One idea is to replicate doing the sheriff action with pushing a patch, building, retriggering etc. until you find the smallest range possible - might be one way to do a first cut.
- Ideally go down into individual changesets, i.e. with hg bisect, rather than pushes (but that requires extra builds that aren't mapped to a push) - might be long term
- Depends on having enough capacity to dedicate machines to this
- ted has part of it - mozregression is another part of it
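Whatever drives it, the core loop is an ordinary bisection over the ordered pushes between a known-good and a known-bad build. A sketch of that loop; the build-and-run step is a placeholder for the buildbot interaction (and for reusing existing builds where possible):

```python
def bisect_pushes(pushes, test_passes):
    """Find the first bad push by bisection.

    pushes is ordered oldest-to-newest with pushes[0] known good and
    pushes[-1] known bad. test_passes(push) stands in for the expensive
    step: schedule (or reuse) a build for that push, run the failing
    test, and report pass/fail."""
    lo, hi = 0, len(pushes) - 1
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if test_passes(pushes[mid]):
            lo = mid   # still good; regression is later
        else:
            hi = mid   # already bad; regression is here or earlier
    return pushes[hi]
```

Going below push granularity (the hg bisect idea above) is the same loop over changesets, once builds exist for changesets that were never pushed individually.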
Long Term
- Something stand alone but integrated (visually) with TBPL etc.
- Good UI to showcase the regression hunt, status, and the regression range
- Fire and forget developer use case. It emails them a link to the UI when it is finished
Orange quarantine
Short Term
Long Term
Opening up of TBPL to allow any automation system to show data
Short Term
Long Term
Streamline Bugzilla integration with Try/Checkin/Reviews
Short Term
Long Term
Test Measurements of how worthwhile/useful tests are
Short Term
Long Term
Make all test runners use the same code/methodology - mozbase, mach front ends - to make tests easy to run, use, and write.
Short Term
Long Term
Go through these and decide what the short and long term deliverables are on each one. Then prioritize.
- spokesperson: ???
- Joel will lead the releng + team mtg
- Mcote will set up the Toronto Developer meeting
- On Thurs. Ed M will join us.