Mobile/Testing/04 24 13


Previous Action Items

  • Gbrown to follow up with the Tree Sheriffs to get robocop tests unhidden again, now that the strategic test disabling appears to be done
    • Follow-on: once the pandas are re-wired, we'll push a try job re-enabling those tests to see whether they were causing reboots by driving up CPU activity and triggering a power spike.
    • -> Panda rc is no longer hidden
    • -> Disabled rc tests still cause failures if enabled
  • Jake and Kim will have all the pandas upgraded with new power infrastructure by Monday
  • Dan will let us know at the next meeting where we stand w.r.t. the amount of work estimated to replace tegras with pandas running 2.3.x.
    • Follow-on: once we know that, Joduinn and I (ctalbert) will need to talk with Karen and Blassey about their projected timelines for EOL'ing 2.2 support.

Status reports

Dev team

  • Found a cause of the "2400 seconds without output" failures (bug 663657)

Rel Eng

  • (kmoir) Brought down masters to facilitate chassis maintenance. Mozpool/mozharness work for Android pandas.

IT

  • Still working on a higher-density chassis; just waiting for the prototype chassis to be fabricated.
  • bug 860028 - Replacing the 5V supply wire and adjusting the power supply output in the panda chassis in scl1 - COMPLETED

A Team

General

  • tegra failure rate: [7.00%]
    • tpn, r4, m1
  • panda failure rate: [14.02%]
    • m2, rc1, rc2, j3, talos-s
  • I am seeing very little change in the frequency of these bugs:
    • bug 822321 - Intermittent Panda "Could not connect; sleeping for 5 seconds. reconnecting socket"...
      • tegra M1, panda rc1, rc2 <- top failure listed above
    • bug 663657 - Intermittent Android "command timed out: 2400 seconds without output, attempting to kill"
      • panda m2, rc2 <- top failure listed above
    • bug 807230 - Intermittent DMError: Automation Error: Timeout in command {ls,ps,isdir,mkdir}, ...
      • doesn't happen in Talos, but is evenly distributed across reftest/mochitest/robocop
  • the above bugs should have been reduced with the wiring change.
  • investigating "rogue" pandas
    • during the smoketests to validate the wiring change, we saw about 10% of the pandas being problematic. Average panda failure rates were 1-5%, but these "rogue" pandas were at 7-15%.
    • running just those pandas standalone yielded the same results as running them with all the other pandas
    • total smoketest failure rate was 4.5%; with the ~10% of rogue pandas excluded it was 3.1%.
    • How can we detect these?
    • proposal (a rough sketch follows at the end of this section):
      • the panda has run at least 20 jobs in the last 48 hours
      • the panda has had >=2 failures in the last 48 hours
      • safeguard: if we flag >15% of the pool, alert somebody in case there is an infra outage or a few bad builds
      • remediate: pull the panda, reflash it, reseat the SD card
      • correction: if a panda is "remediated" 3 times in 30 days, change the SD card
      • dead: if we hit the correction stage 3 times for a given panda, throw away the board
  • collecting network traffic using wireshark might help us to distinguish between connectivity issues due to reboots and other possible connectivity problems
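
A minimal Python sketch of the "rogue" panda detection proposal above, assuming we can query recent job results per panda. The function, the (panda_id, passed) job tuples, and the pool_size argument are illustrative assumptions rather than part of mozpool or any existing tool; the thresholds are just the numbers from the proposal.

    from collections import defaultdict

    # Thresholds taken from the proposal above; still open for discussion.
    MIN_JOBS = 20               # only judge pandas that ran enough recent jobs
    MIN_FAILURES = 2            # >= 2 failures in the window flags a panda
    POOL_ALERT_FRACTION = 0.15  # > 15% of the pool flagged -> suspect infra or bad builds

    def flag_rogue_pandas(jobs_last_48h, pool_size):
        """jobs_last_48h: iterable of (panda_id, passed) tuples.

        Returns the flagged panda ids, or None when so many are flagged that a
        human should look for an infra outage or a run of bad builds instead."""
        totals = defaultdict(int)
        failures = defaultdict(int)
        for panda_id, passed in jobs_last_48h:
            totals[panda_id] += 1
            if not passed:
                failures[panda_id] += 1
        flagged = [p for p in totals
                   if totals[p] >= MIN_JOBS and failures[p] >= MIN_FAILURES]
        if pool_size and float(len(flagged)) / pool_size > POOL_ALERT_FRACTION:
            return None  # safeguard: escalate to a person instead of auto-remediating
        return flagged

The same per-panda bookkeeping could also feed the remediate/correction/dead escalation steps by recording how many times each panda has been remediated in the last 30 days.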

Android 2.3.5

  • Current status is at: bug 859766.
  • Largest issue seems to be timeouts, possibly due to losing focus
    • a patch for this, aimed at B2G, landed recently; I will retest and see if things have improved
  • Need to discuss prioritization / timelines with respect to other tasks.
    • estimate 3 months of work to stand up 2.3.5 on pandas, with another quarter or so of bug fixes / maintenance

x86 automation

  • I am running through the mochitests to get a rough idea of how stable the emulator is
    • I do see some timeouts and occasional process crashes. I'm planning to rerun some of this on an actual phone to help determine whether this is an emulator issue or a product stability issue

Autophone

  • [bc] Adding one additional Samsung GS II and two additional GS III phones.
  • [bc] bug 862456 Security Review for Phonedash
  • [bc] Testing throbber start performance with the original Fennec launch code vs. mozbase's launchFennec, with and without the -W parameter to am.
  • [bc] Planning to investigate using standard deviation to gate retests in an attempt to reduce jitter (sketched below).
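
A rough sketch of the standard-deviation gate mentioned in the last item, assuming we have the throbber-start replicates for a run in hand; the function name and the 10% cutoff are placeholders, not Autophone's actual code.

    import math

    def needs_retest(measurements, max_relative_stddev=0.10):
        """Return True when the replicates look too noisy and the run should be redone.

        Gates on the coefficient of variation (stddev / mean); the 10% cutoff is a
        placeholder that would need tuning against real throbber-start data."""
        if len(measurements) < 2:
            return True  # not enough data to judge the noise
        mean = sum(measurements) / float(len(measurements))
        variance = sum((m - mean) ** 2 for m in measurements) / (len(measurements) - 1)
        return mean > 0 and math.sqrt(variance) / mean > max_relative_stddev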

Eideticker

Round Table

  • should we disable tests that are hard to fix and known to cause a lot of failures?
    • specifically webgl!
  • TBPL starring sometimes posts both a process-crash bug and a timeout/connectivity bug, even though all the tests have completed
    • should we fix this?
    • should we detect that the harness has completed and then report only shutdown failures? (see the sketch after this list)
    • other ideas?
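
One way the harness-completion idea above could work, sketched here: scan the log for the harness's end-of-run marker and, when it is present, report later crashes or connectivity errors as shutdown-only failures. The marker strings and labels are assumptions for illustration, not what the TBPL log parser actually emits.

    # Hypothetical classifier; the marker strings are examples, not the real parser's.
    HARNESS_END_MARKERS = ("SUITE-END", "Running tests: end.")

    def classify_failing_log(log_lines):
        """Label a failing log as a 'test failure' or a 'shutdown failure'.

        If the harness printed its end-of-run marker, the tests themselves
        completed, so any later crash or timeout is only a shutdown problem."""
        harness_finished = any(
            marker in line for line in log_lines for marker in HARNESS_END_MARKERS)
        return "shutdown failure" if harness_finished else "test failure"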

Action Items

  • (jmaher) explain your round table items
  • (ctalbert) get kim a known good build
  • (kim) run tests over the weekend
  • (ctalbert) to email bad news
  • (ateam) split out webgl from mochitest-1
  • (wlach) to add stock and chrome to the new test
  • (blassey) follow up with Karen to get the 2.2 end-of-life plan