Mobile/Testing/04 24 13
Previous Action Items
- Gbrown to follow up with the Tree Sheriffs to get robocop tests unhidden once more now that the strategic test disabling seems to have been done
- Follow-on - once the pandas are re-wired, we'll send a try job re-enabling those tests so that we can see if those particular tests were causing reboots due to increased CPU activity thus causing a power spike.
- Jake and Kim will have all the pandas upgraded with new power infrastructure by Monday
- Dan will let us know at the next meeting where we stand w.r.t. the amount of work estimated to replace tegras with pandas running 2.3.x.
- Follow-on once we know that, Joduinn and I (ctalbert) will need to talk with Karen and Blassey about their projected timelines for EOL'ing 2.2 support.
Status reports
Dev team
Rel Eng
- (kmoir) brought down masters to facilitate chassis maintenance. Mozpool/mozharness work for android pandas.
IT
- Still working on a higher density chassis. Just waiting for the prototype chassis to be fabricated.
- bug 860028 Replacing 5v supply wire and adj power supply output in panda chassis in scl1 - COMPLETED
A Team
General
- I am seeing very little change in the frequency for bugs:
- bug 822321 - Intermittent Panda "Could not connect; sleeping for 5 seconds. reconnecting socket"...
- tegra M1, panda rc1, rc2 <- top failure listed above
- bug 663657 - Intermittent Android "command timed out: 2400 seconds without output, attempting to kill"
- panda m2, rc2 <- top failure listed above
- bug 807230 - Intermittent DMError: Automation Error: Timeout in command {ls,ps,isdir,mkdr}, ...
- doesn't happen in talos! but evenly distributed across reftest/mochitest/robocop
- the above bugs should have been reduced with the wiring change.
- investigating "rogue" pandas
- during the smoketests to validate the wiring change, we saw about 10% of the pandas being problematic. Average panda failure rates were 1-5%, but these "rogue" pandas had failure rates of 7-15%.
- running just those pandas standalone yielded the same results as running with all the other pandas
- total smoketest failure rate was 4.5%; excluding the problematic 10% of pandas, it was 3.1%.
- How can we detect these?
- proposal:
- detected 20 jobs in the last 48 hours for a given panda
- detected >=2 failures for that given panda in the last 48 hours
- safeguard: if we detect >15% of the pool, just flag somebody in case there is an infra outage or a few bad builds
- remediate: pull panda, reflash panda, reseat SD card
- correction: if panda is "remediated" 3 times in 30 days, change SD Card
- dead: if we have hit the correction stage 3 times for a given panda, throw away the board
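The proposal above could be sketched roughly as follows. This is only an illustration, not existing tooling: the `PandaHistory` record, the function names, and the action labels are all hypothetical, and "detected 20 jobs" is read here as "at least 20 jobs" in the 48-hour window. How the job, failure, remediation, and correction counts get collected is left out.

```python
from dataclasses import dataclass

# Thresholds taken from the proposal; names are made up for this sketch.
MIN_JOBS_48H = 20                    # assumed to mean "at least 20 jobs"
MIN_FAILURES_48H = 2
POOL_ALERT_FRACTION = 0.15           # above this, flag a human instead
REMEDIATIONS_BEFORE_CORRECTION = 3   # counted over the last 30 days
CORRECTIONS_BEFORE_DEAD = 3


@dataclass
class PandaHistory:
    """Hypothetical per-board counters; the data source is not specified."""
    jobs_48h: int
    failures_48h: int
    remediations_30d: int = 0
    corrections_total: int = 0


def is_suspect(history):
    """Detection: enough jobs in the window and at least 2 failures."""
    return (history.jobs_48h >= MIN_JOBS_48H
            and history.failures_48h >= MIN_FAILURES_48H)


def next_action(history):
    """Pick the escalation stage for a board that was flagged as suspect."""
    if history.corrections_total >= CORRECTIONS_BEFORE_DEAD:
        return "dead"        # throw away the board
    if history.remediations_30d >= REMEDIATIONS_BEFORE_CORRECTION:
        return "correction"  # change the SD card
    return "remediate"       # pull, reflash, reseat the SD card


def triage(pool):
    """Map board name -> action, or raise the alert flag if too many trip."""
    suspects = sorted(name for name, h in pool.items() if is_suspect(h))
    if pool and len(suspects) / len(pool) > POOL_ALERT_FRACTION:
        # Safeguard: likely an infra outage or bad builds, not bad boards.
        return {"alert": True, "actions": {}}
    return {"alert": False,
            "actions": {name: next_action(pool[name]) for name in suspects}}
```

With these thresholds, a board that fails 2 of exactly 20 jobs sits at a 10% failure rate, which falls inside the 7-15% range seen for the "rogue" pandas; a board with many more jobs would be flagged at a lower rate, so a rate-based cutoff might be worth considering instead.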
Android 2.3.5
- Current status is: https://bugzilla.mozilla.org/show_bug.cgi?id=859766.
- Largest issue seems to be losing focus, which I think accounts for a lot of the slowness and timeouts we are seeing.
- Need to discuss prioritization / timelines with respect to other tasks.
x86 automation
- I am running through the mochitests to get a rough idea of how stable the emulator is
- I do see some timeouts and occasional process crashes. I'm planning to rerun some of this on the actual phone to hopefully determine if this is an emulator issue or a product stability issue
Autophone
Eideticker
Round Table
- should we disable tests that are hard to fix and known to cause a lot of failures?
- specifically webgl!
- tbpl starring sometimes posts process crash and timeout/connectivity bugs, even though all the tests have completed
- should we fix this?
- should we detect if a harness has completed and then only report shutdown failures?
- other ideas?