EngineeringProductivity/Autophone

Autophone

Autophone is a test framework for Firefox for Android (Fennec) which runs tests on actual Android devices. Autophone is currently used to measure page load performance and to run various Unit tests in Fennec.

Autophone is unlike most test frameworks at Mozilla.

  • Its source is maintained outside of the Mozilla source repositories.
  • It runs on a small number of devices.
  • It is hosted and managed separately from the other test frameworks in use at Mozilla.
  • It only tests builds from mozilla-central, mozilla-inbound, fx-team, mozilla-aurora, mozilla-beta and mozilla-release.

Due to these limitations, Autophone test results on Treeherder are currently Tier-3 and are hidden by default. You will need to click the "Show excluded jobs" icon and select Tier 3 until Autophone achieves Tier 2 status. It is planned to transition Autophone to Tier 2 status by the end of 2016Q2.

In addition to reporting pass/fail results to Treeherder, Autophone reports performance measurements to Perfherder and phonedash.

Autophone Status

  • Autophone caught up on backlog.
  • Landed Bug 1275344 (Autophone - download builds from the TaskCluster index) on 2016-08-10.
    • Date ranges and build IDs on http://phonedash.mozilla.org are now in UTC to match TaskCluster. You need to clear your cache or force reload phonedash to get the changes.

Autophone for Sheriffs

Maintainers

Autophone is maintained by Bob Clary (Bugzilla :bc) with help from Joel Maher (Bugzilla :jmaher) and Geoff Brown (Bugzilla :gbrown). Dan Minor (Bugzilla :dminor) is working to get WebRTC tests running in production. You can usually find at least one of us in #ateam on irc.mozilla.org.

Bugs

File any bugs or infrastructure issues with Autophone in Bugzilla under product Testing, component Autophone.

Disabling an individual failing test

The procedure for disabling tests depends on whether you wish to prevent Autophone from running a test suite entirely or to disable an individual Unit test contained in the mozilla source tree.

To disable an entire Autophone test such as an S1S2 Test, or an entire Unit test suite such as Autophone Mochitest WebRTC, please file a bug naming the test to be disabled and the reason it should be disabled, along with a link to a Treeherder job illustrating the problem if possible. The Autophone maintainers will update the appropriate test manifest files and restart the Autophone instances.

To disable individual tests within Unit test suites, follow the same procedures for disabling the test as you would for disabling the test normally. This typically involves editing the test manifests to skip the test for Android. You can find the Unit test manifests used by Autophone in the Tests table later in this document.

Autophone for Developers

Introduction

Submitting try builds to Autophone

The TryServer offers developers who do not possess a rooted Android device the means to run Autophone tests, as well as the ability to run tests on the exact same devices used in production.

Autophone will only execute tests against try builds if the try build's commit message explicitly specifies Autophone tests. Thus Autophone will not execute any tests when All is selected under Unit Test Suites since the try commit message will contain only '-u all'. This behavior is intentional and is intended to prevent developers from inadvertently scheduling Autophone try jobs.

Autophone has 3 devices dedicated to running try builds:

  • nexus-4-7
  • nexus-5-6
  • nexus-6p-7

and two devices which run both tinderbox and try builds:

  • nexus-6-2
  • nexus-9-2

To help reduce the turnaround time for developers, Autophone prioritizes try builds and will test them before any normal tinderbox build. Note: Be careful not to DoS Autophone by submitting unnecessary requests, since this can prevent the two shared devices from testing new tinderbox builds in a timely fashion.

Trychooser try commit message

Trychooser provides an easy method for selecting Autophone tests. Trychooser lists the tests which run in production on mozilla-inbound, fx-team, mozilla-central, mozilla-aurora, mozilla-beta or mozilla-release. It does not list all of the available tests, however, since some of them (reftest, jsreftest and mochitest) can tie up devices for many hours. If the test you wish to run on try is not available from Trychooser, you can still compose the try commit message manually using any of the supported tests.

  • Under Build Types, select either Opt or Debug. Autophone only supports opt builds when running performance tests such as S1S2 Test or Talos. Autophone does support both Opt and Debug when running Unit tests.
  • Under Platforms, select either one or both of Android api 9-10 constrained or Android api 15+. Note that currently (June 2016) Android api 9-10 is only supported on mozilla-release. After the release of Firefox 48 in August, Android 2.3 will no longer be supported.
  • Under Android-Only Unittest Suites, select only the tests that you need to run.

Trychooser will produce a try commit message of the form:

try: -b o -p android-api-15 -u autophone-mochitest-dom-media -t none

Manual try commit message

Autophone specifies its tests using the unittests argument in the try message. Autophone supports both -u and --unittests.

try: -b o -p android-api-15 -u autophone-mochitest-dom-media -t none

or

try: -b o -p android-api-15 --unittests autophone-mochitest-dom-media -t none

The list of test names which can be used in the try commit message can be found in the Tests table later in this document.

Note that if a test specifies more than one chunk, you can either specify the test name to get all chunks or append a dash followed by a chunk number to only specify that chunk.

For example, to run all 16 reftest chunks:

try: -b o -p android-api-15 --unittests autophone-reftest -t none

To run only reftest chunk 3 use:

try: -b o -p android-api-15 --unittests autophone-reftest-3 -t none
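
The selection rules above (explicit Autophone test names, -u or --unittests, and an optional trailing chunk number) can be illustrated with a small parser. This is only a sketch and not Autophone's own implementation: the function name autophone_tests is hypothetical, and it assumes that several tests may be given to -u as a comma-separated list.

```python
import argparse
import re
import shlex


def autophone_tests(commit_message):
    """Return (test_name, chunk) pairs requested by a try commit message.

    chunk is None when no chunk number was appended, meaning all chunks.
    Hypothetical helper; assumes a comma-separated list after -u/--unittests.
    """
    # Everything after "try:" is treated as the try syntax.
    _, _, syntax = commit_message.partition("try:")
    parser = argparse.ArgumentParser(add_help=False)
    parser.add_argument("-u", "--unittests", default="")
    # Options such as -b, -p and -t are not needed here and are ignored.
    args, _ignored = parser.parse_known_args(shlex.split(syntax))

    selected = []
    for name in args.unittests.split(","):
        if not name.startswith("autophone-"):
            continue  # "all" or "none" never selects an Autophone test
        match = re.fullmatch(r"(.+)-(\d+)", name)
        if match:
            selected.append((match.group(1), int(match.group(2))))
        else:
            selected.append((name, None))
    return selected


print(autophone_tests(
    "try: -b o -p android-api-15 --unittests autophone-reftest-3 -t none"))
```

With this sketch, a message containing only '-u all' yields an empty list, which mirrors the intentional behavior described earlier.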

Following your try build

  • Once you have submitted your build to try, you will be able to follow the execution of the tests on Treeherder using
    https://treeherder.mozilla.org/#/jobs?repo=try&filter-searchStr=autophone&exclusion_profile=all&filter-tier=1&filter-tier=2&filter-tier=3&filter-job_group_name=autophone&author=<youremail>
    where <youremail> is the email account registered with hg.mozilla.org. Note that until Autophone reaches Tier 2 status, results are hidden by default on Treeherder. You must click the "Show/hide hidden jobs" icon to set 'exclusion_profile=all' and include Tier 3 in order to show all jobs.
  • If your tests included the Autophone performance related tests, you can view the results on Perfherder or Phonedash. Phonedash may be a better choice when reviewing your Try server results since it allows you to choose to display only your Try server results.
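
If you follow try pushes often, the Treeherder link above can be generated rather than edited by hand. The following is a minimal sketch; the function name treeherder_try_url and the example address are hypothetical, and the query parameters are copied from the URL shown above.

```python
from urllib.parse import urlencode


def treeherder_try_url(author_email):
    """Build the Treeherder URL listing Autophone try jobs for one author."""
    params = [
        ("repo", "try"),
        ("filter-searchStr", "autophone"),
        ("exclusion_profile", "all"),
        ("filter-tier", "1"),
        ("filter-tier", "2"),
        ("filter-tier", "3"),
        ("filter-job_group_name", "autophone"),
        ("author", author_email),
    ]
    # urlencode percent-encodes the "@" in the email address.
    return "https://treeherder.mozilla.org/#/jobs?" + urlencode(params)


print(treeherder_try_url("dev@example.com"))
```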

Running Autophone tests locally

Once you have built Fennec, you can run Autophone locally, if you have a rooted Android device available, using the command

mach autophone

This will download the necessary Python packages and install them locally into a virtual environment on your computer. You will be able to select the tests you wish to run from a series of text prompts.

Tests

Autophone currently supports the following tests:

Type | Symbol | Name | Chunks | Description | Config | In-tree manifest
Autophone | s | autophone-smoketest | 1 | smoketest - A simple test to determine if we can install, start Fennec and detect Throbber messages | smoketest-settings.ini |
Autophone | t | autophone-s1s2 | 1 | S1S2 blank-local - A test to measure the load times for a local blank page. | s1s2-blank-local.ini |
Autophone | t | autophone-s1s2 | 1 | S1S2 blank-remote - A test to measure the load times for a remote blank page. | s1s2-blank-remote.ini |
Autophone | t | autophone-s1s2 | 1 | S1S2 nytimes-local - A test to measure the load times for a local nytimes page. | s1s2-nytimes-local.ini |
Autophone | t | autophone-s1s2 | 1 | S1S2 nytimes-remote - A test to measure the load times for a remote nytimes page. | s1s2-nytimes-remote.ini |
Autophone | t | autophone-s1s2 | 1 | S1S2 twitter-local - A test to measure the load times for a local twitter page. | s1s2-twitter-local.ini |
Autophone | t | autophone-s1s2 | 1 | S1S2 twitter-remote - A test to measure the load times for a remote twitter page. | s1s2-twitter-remote.ini |
Autophone | tpn | autophone-talos | 1 | Talos - Talos tpn performance tests. | tp4m-remote.ini |
Autophone | svg | autophone-talos | 1 | Talos - Talos svg performance tests. | tsvg-remote.ini |
Unittest | C | autophone-crashtest | 2 | Reftest Crash tests | https://github.com/mozilla/autophone/blob/master/configs/crashtests-settings.ini |
Unittest | J | autophone-jsreftest | 6 | Reftest JavaScript tests | https://github.com/mozilla/autophone/blob/master/configs/jsreftests-settings.ini | https://dxr.mozilla.org/mozilla-central/source/js/src/tests/jstests.list
Unittest | Mdb | autophone-mochitest-dom-browser-element | 1 | Mochitest DOM Browser Element | https://github.com/mozilla/autophone/blob/master/configs/mochitests-dom-browser-element-settings.ini | dom/browser-element/mochitest/mochitest.ini
Unittest | Mdm | autophone-mochitest-dom-media | 1 | Mochitest DOM Media | https://github.com/mozilla/autophone/blob/master/configs/mochitests-dom-media-settings.ini | dom/media/test/mochitest.ini
Unittest | Mm | autophone-mochitest-media | 1 | Mochitest Media | https://github.com/mozilla/autophone/blob/master/configs/mochitests-media.ini | testing/mochitest/manifests/autophone-media.ini
Unittest | M | autophone-mochitest | 16 | Mochitest | https://github.com/mozilla/autophone/blob/master/configs/mochitests-settings.ini | generated file containing all mochitests
Unittest | Msk | autophone-mochitest-skia | 1 | Mochitest Skia | https://github.com/mozilla/autophone/blob/master/configs/mochitests-skia-settings.ini | dom/canvas/test/mochitest.ini
Unittest | Mtw | autophone-mochitest-toolkit-widgets | 1 | Mochitest Toolkit Widgets | https://github.com/mozilla/autophone/blob/master/configs/mochitests-toolkit-widgets-settings.ini | toolkit/content/tests/widgets/mochitest.ini
Unittest | Mw | autophone-mochitest-webrtc | 1 | Mochitest WebRTC | https://github.com/mozilla/autophone/blob/master/configs/mochitests-webrtc.ini | testing/mochitest/manifests/autophone-webrtc.ini
Unittest | Rov | autophone-reftest-ogg-video | 1 | Reftest Ogg Video | https://github.com/mozilla/autophone/blob/master/configs/reftests-ogg-video.ini | layout/reftests/ogg-video/reftest.list
Unittest | R | autophone-reftest | 16 | Reftest | https://github.com/mozilla/autophone/blob/master/configs/reftests-settings.ini | layout/reftests/reftest.list
Unittest | Rwv | autophone-webm-video | 1 | Reftest Webm Video | https://github.com/mozilla/autophone/blob/master/configs/reftests-webm-video.ini | layout/reftests/webm-video/reftest.list
Unittest | rca | autophone-robocoptest-autophone | 1 | Mochitest Robocop test for Adobe Flash | https://github.com/mozilla/autophone/blob/master/configs/robocoptests-autophone-settings.ini | mobile/android/tests/browser/robocop/robocop_autophone.ini
Unittest | rc | autophone-robocoptest | 4 | Mochitest Robocop | https://github.com/mozilla/autophone/blob/master/configs/robocoptests-settings.ini | mobile/android/tests/browser/robocop/robocop.ini

Smoke test

smoketest.py tests if the build can be installed, a profile created and initialized and whether the Throbber messages can be detected in the logcat output. The Smoke test results appear on Treeherder as A(s).

Note: The Smoke test does not automatically run in production though it is available via the Try Server.

S1S2 Test

s1s2test.py measures the Throbber times for loading web pages. Three web pages are used in Autophone:

  1. a blank web page
  2. a saved version of a Twitter web page
  3. a saved version of a NY Times web page.

Note: git.mozilla.org is going away soon. We had permission to host the copyrighted Twitter and NY Times pages on git.mozilla.org, but will be manually distributing the saved pages in the future. If you need access to the actual files, contact one of the Autophone maintainers.

Autophone runs two versions of each test:

  • "Local" tests which load the pages from the device's internal storage.
  • "Remote" tests which load the pages from a web server running on the server hosting the device.

The S1S2 Tests appear on Treeherder as A(t).

As Mark Finkle described in bug 1120511#c6:

"Throbber Start and Throbber Stop should map to the points where Gecko nsIWebProgressListener fires START|NETWORK and STOP|NETWORK notifications, respectively. In the UI we use those to control the visibility of the "page progress" indicator. It used to be a throbber(spinner) but is now a simple progress line.

"The time between Throbber Start and Throbber Stop is a combination of Gecko networking and page parsing & loading, and rendering. We also have some Java UI affects too."

We need to convert the Throbber start and stop times from values relative to the system time to values relative to the time Firefox was started. We use the "Fennec application start" message added in bug 1214810. If it is not available, we fall back on the system time of the first logcat message after starting Fennec which contains the string "Gecko".

The logcat fennec start message used by Autophone is produced in GeckoApplication.java.

The logcat throbber messages used by Autophone are produced in ToolbarDisplayLayout.java and look like:

06-07 11:20:16.035 I/GeckoApplication( 7247): zerdatime 117697 - Fennec application start
06-07 11:20:18.323 I/GeckoToolbarDisplayLayout( 7247): zerdatime 119985 - Throbber start
06-07 11:20:18.548 I/GeckoToolbarDisplayLayout( 7247): zerdatime 120210 - Throbber stop

where "zerdatime" is the value of SystemClock.uptimeMillis(). For those of you who are curious about the meaning of zerda, it comes from the scientific name for Fennec foxes, Vulpes zerda.

For historical reasons due to the initial lack of the "Fennec application start" message, Autophone uses logcat with the "time" format which provides the device's system time with millisecond resolution at the beginning of each logcat message. Autophone uses that value instead of the reported zerdatime. Now that "Fennec application start" is available in the current train of builds, we may be able to revisit the use of the logcat time stamps and begin using the zerda time directly.

The system time of the Throbber start and stop messages is determined similarly. The reported values of the Throbber start and stop times are the differences between the Throbber start and stop system times and the fennec start time.
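
The calculation described above can be sketched as follows. This is not the code in s1s2test.py; the function name throbber_times and the regular expression are illustrative, and the sketch assumes the logcat "time" format shown in the sample lines. Because that format omits the year, an arbitrary year is supplied for parsing.

```python
import re
from datetime import datetime, timedelta

# Matches lines such as:
# 06-07 11:20:18.323 I/GeckoToolbarDisplayLayout( 7247): zerdatime 119985 - Throbber start
LOGCAT_RE = re.compile(
    r"^(?P<time>\d{2}-\d{2} \d{2}:\d{2}:\d{2}\.\d{3}) "
    r"[^(]+\(\s*\d+\): zerdatime \d+ - (?P<event>.+)$"
)


def throbber_times(logcat_lines, year=2016):
    """Return Throbber start/stop in ms relative to the Fennec start message."""
    stamps = {}
    for line in logcat_lines:
        match = LOGCAT_RE.match(line.strip())
        if match:
            # The logcat time stamp has no year, so one is assumed.
            stamps[match.group("event")] = datetime.strptime(
                "%d-%s" % (year, match.group("time")), "%Y-%m-%d %H:%M:%S.%f")
    fennec_start = stamps["Fennec application start"]
    one_ms = timedelta(milliseconds=1)
    return {
        "throbber_start": (stamps["Throbber start"] - fennec_start) // one_ms,
        "throbber_stop": (stamps["Throbber stop"] - fennec_start) // one_ms,
    }


sample = [
    "06-07 11:20:16.035 I/GeckoApplication( 7247): zerdatime 117697 - Fennec application start",
    "06-07 11:20:18.323 I/GeckoToolbarDisplayLayout( 7247): zerdatime 119985 - Throbber start",
    "06-07 11:20:18.548 I/GeckoToolbarDisplayLayout( 7247): zerdatime 120210 - Throbber stop",
]
print(throbber_times(sample))  # {'throbber_start': 2288, 'throbber_stop': 2513}
```

For the sample lines, the results agree with the differences between the reported zerdatime values (119985 - 117697 and 120210 - 117697).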

The blank, twitter and nytimes pages contain JavaScript which invokes Jesse Ruderman's quitter.xpi extension to cleanly shut down the browser after the page completes loading. Shutting down the browser cleanly is important because killing the browser has side-effects which can negatively impact performance measurements.

Each build to be tested is installed once; a test run then consists of performing the following operations 8 times:

  1. Create a new profile containing the quitter extension.
  2. Initialize the profile by starting the browser loading initialize_profile.html. initialize_profile.html is an empty page which calls quitter to shut down the browser.
  3. Measure the "First Run" (uncached) Throbber start and stop values by starting the browser loading the test page.
  4. Measure the "Second Run" (cached) Throbber start and stop values by starting the browser with the same profile loading the test page again.
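
The four steps above can be sketched as a loop. Everything in this example is hypothetical: FakeBrowser merely records calls and stands in for whatever drives a real device, and run_s1s2 is not the function used by s1s2test.py. The sketch only shows the order of operations and that each iteration yields one uncached and one cached measurement.

```python
class FakeBrowser:
    """Stand-in for a device driver; records calls instead of running Fennec."""

    def __init__(self):
        self.calls = []

    def create_profile(self, extensions):
        self.calls.append(("create_profile", tuple(extensions)))
        return "profile-%d" % len(self.calls)

    def load(self, profile, url):
        self.calls.append(("load", url))
        # A real driver would return Throbber times parsed from logcat.
        return {"throbber_start": 0, "throbber_stop": 0}


def run_s1s2(browser, test_url, iterations=8):
    """Run the S1S2 sequence and return one result per iteration."""
    results = []
    for iteration in range(1, iterations + 1):
        # 1. New profile containing the quitter extension.
        profile = browser.create_profile(extensions=["quitter"])
        # 2. Initialize the profile with the empty page.
        browser.load(profile, "initialize_profile.html")
        # 3. First Run (uncached) measurement.
        first_run = browser.load(profile, test_url)
        # 4. Second Run (cached) measurement using the same profile.
        second_run = browser.load(profile, test_url)
        results.append({"iteration": iteration,
                        "first_run": first_run,
                        "second_run": second_run})
    return results


browser = FakeBrowser()
results = run_s1s2(browser, "blank.html")
print(len(results), len(browser.calls))
```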

The values for each iteration are posted to Perfherder and phonedash.mozilla.org where the measurements can be displayed.

Talos Tests

TODO: jmaher?

Tp4m

Tsvg

Unit Tests

runtestsremote.py can run Reftest and Mochitest based tests. However, due to the time required to run each test and the limited number of devices, only the following tests are currently run:

  • Mdb - autophone-mochitest-dom-browser-element
  • Mdm - autophone-mochitest-dom-media
  • Mm - autophone-mochitest-media
  • Msk - autophone-mochitest-skia
  • Mtw - autophone-mochitest-toolkit-widgets
  • Mw - autophone-mochitest-webrtc
  • Rov - autophone-reftest-ogg-video
  • Rwv - autophone-reftest-webm-video
  • rca - autophone-robocoptest-autophone
  • t - autophone-s1s2
  • tpn/svg - autophone-talos

Devices

Autophone currently tests Nexus S (Android 2.3), Nexus 4 (Android 4.2.2), Nexus 5 (Android 4.4.2), Nexus 6 (Android 5.1.1), Nexus 9 (Android 5.0.2), Nexus 6P (Android 6.0.1).

The Nexus S devices are especially good at showing performance changes due to their slow speed and their single core processor. They are being phased out of testing as support for Android 2.3 is dropped and will be completely removed when Firefox 48 is released in August 2016. The other devices are faster, have multiple core processors and behave differently for multi-threaded code paths.

Reviewing Autophone test results

Monitoring Autophone tests on Treeherder

Load https://treeherder.mozilla.org/, then select one of the mozilla-inbound, fx-team, mozilla-central, mozilla-aurora, mozilla-beta or mozilla-release repositories.

Click "Show/hide hidden jobs" and select Tier 3 in order to view Autophone results since they are currently hidden by default on Treeherder until Autophone achieves Tier 2 status.

Autophone jobs appear with the group name A. To see only Autophone jobs, you can set the quick filter for "Platforms & jobs" to Autophone or use the "Filters" drop down to add a new filter based on "group name" Autophone.

For example, the following will show Autophone jobs on mozilla-central:

https://treeherder.mozilla.org/#/jobs?repo=mozilla-central&filter-tier=1&filter-tier=2&filter-tier=3&exclusion_profile=false&filter-job_group_name=autophone

Note that if you change the repository, you will need to click "Show/hide hidden jobs" and filter for Autophone again.

Links to the logcat output, Autophone log and any available tombstone or ANR files are available in the Job Details panel which is opened by clicking on an Autophone test symbol. If the test is an S1S2 test, the Job Details panel will contain a link to phonedash.mozilla.org which will display a graph of the performance measurements.

Note that you can retrigger and cancel Autophone jobs using the Treeherder UI.

Monitoring Autophone Performance tests on Perfherder

TODO

Monitoring Autophone Performance tests on Phonedash

When you first load phonedash, it defaults to loading a summary graph of all repositories, tests and devices for the last day. The graph is scaled to the window size at the time the graph is displayed. If you wish to change the size of the graph, resize the window and reload the page.

Controls

Date selection

Changing the date range controls at the top left of the page will automatically download the data for the date range and redraw the graph with the new date selection.

TIP: If you wish to pick a date range that ends in the past, you can prevent Autophone from unnecessarily loading data by first changing the end date, then changing the start date.

Many of the non-date controls are created from the test runs contained in the selected date range. If you make changes to the non-date controls and then change the date selection, new controls will be created for any new tests, repositories or devices which were detected. In this case, you may need to manually select the new devices and then click the Apply button to update the graph with the new data. You can quickly select all of the non-date controls and redraw the graph by clicking the Reset button.

Apply and Reset buttons

Below the Apply and Reset buttons is a set of inputs which control which data is displayed in the graph. Changing these controls does not automatically redraw the graph.

In order to make your non-date control changes effective and redraw the graph, click on the Apply button. This allows you to change what and how data is graphed without having to download the data again.

To reset all of the non-date controls to their default values and redraw the graph, click the Reset button.

Non-date Controls

  • Binning

Binning controls how the various measurements are combined to create the data series in the graph. Binning combines measurements by taking the geometric mean of all measurements which have the same binning value. This can help when reviewing all of the repositories and/or tests for regressions or improvements. Once an interesting repository, test, metric or phone type is identified, the irrelevant items can be eliminated and the binning increased (made finer) to show the changes in detail.

Note that changes in small values have a small effect on binned results. Relying solely on too coarse a binning may result in missing changes to small values such as the Throbber start times.

The possible binnings are:

  • repo
  • repo phonetype
  • repo phonetype phoneid
  • repo phonetype phoneid test_name
  • repo phonetype phoneid test_name cached_label
  • repo phonetype phoneid test_name cached_label metric

The finest level of binning is repo phonetype phoneid test_name cached_label metric, where the measurements are not binned at all.
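
As an illustration of binning, the sketch below groups measurements by the fields named in a binning and takes the geometric mean of each group. It is not phonedash's code: the function name bin_measurements and the dictionary layout of a measurement (including the "value" key) are assumptions made for the example.

```python
import math
from collections import defaultdict


def bin_measurements(measurements, binning):
    """Combine measurements sharing the same binning value.

    binning is a space-separated list of field names such as "repo phonetype".
    Returns a dict mapping each binning value to the geometric mean.
    """
    fields = binning.split()
    groups = defaultdict(list)
    for measurement in measurements:
        key = tuple(measurement[field] for field in fields)
        groups[key].append(measurement["value"])
    # Geometric mean: exponential of the mean of the logarithms.
    return {
        key: math.exp(sum(math.log(v) for v in values) / len(values))
        for key, values in groups.items()
    }


measurements = [
    {"repo": "mozilla-central", "phonetype": "nexus-6p", "value": 100.0},
    {"repo": "mozilla-central", "phonetype": "nexus-4", "value": 400.0},
]
print(bin_measurements(measurements, "repo"))
print(bin_measurements(measurements, "repo phonetype"))
```

With the binning repo, both values fall into a single series (geometric mean of about 200). With repo phonetype, each device type keeps its own series.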

  • Trim min/max values

When the Trim min/max values checkbox is checked, the minimum and maximum values of the 8 iterations for a data point are ignored when displaying the graph. This can be helpful if a measurement contains outliers which obscure the true behavior.
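
A minimal sketch of trimming, assuming that exactly one minimum and one maximum value are dropped from the iterations of a data point (the function name trim_min_max is hypothetical):

```python
def trim_min_max(values):
    """Drop one minimum and one maximum value from a list of iterations."""
    if len(values) < 3:
        # Too few values to trim without discarding everything.
        return list(values)
    return sorted(values)[1:-1]


iterations = [2288, 2301, 2295, 9999, 2290, 2310, 1200, 2299]
print(trim_min_max(iterations))  # the outliers 1200 and 9999 are dropped
```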

  • Exclude rejected results

In order to deal with the sometimes flaky behavior of the devices, Autophone will "reject" a set of measurements if the estimated standard error percentage exceeds a given threshold which is currently 50% of the value. If a set of measurements is rejected, Autophone will re-run the test in the hope that the variability was temporary. Even though the measurements are "rejected", they are still stored and are available when "Include rejected results" is selected.

By default, phonedash ignores measurements which were originally rejected by Autophone due to a high standard error. You can include these values by changing the value from Exclude rejected results to Include rejected results.
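
The rejection rule can be sketched as follows. This is an illustration rather than Autophone's code: the function names are hypothetical, and the sketch assumes that the standard error is computed from the sample standard deviation and expressed as a percentage of the mean.

```python
import math
import statistics


def standard_error_percent(values):
    """Standard error of the mean, as a percentage of the mean."""
    mean = statistics.mean(values)
    standard_error = statistics.stdev(values) / math.sqrt(len(values))
    return 100.0 * standard_error / mean


def is_rejected(values, threshold_percent=50.0):
    """True if the measurements are too variable and should be re-run."""
    return standard_error_percent(values) > threshold_percent


steady = [2288, 2301, 2295, 2290, 2310, 2299, 2305, 2292]
flaky = [1, 1, 1, 1, 1, 1, 1, 1000]
print(is_rejected(steady), is_rejected(flaky))
```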

  • Error bars

Error bars are only displayed when the maximum binning, repo phonetype phoneid test_name cached_label metric, is selected.

By default, phonedash does not display error bars since they can obscure details in the graph. If you wish to display Error bars, change No Error bars to Error bars.

  • Error type

By default, phonedash displays the standard error, which is calculated by dividing the standard deviation by the square root of the number of observations.

To see the standard deviation instead of the standard error, change the Error type selection.

  • Measurement type

The data reported to the phonedash database consists of the raw data for all iterations in a test. Measurement type refers to how the phonedash web application treats these individual iterations when reporting the value for a test.

  • All
  • Mean
  • Geometric Mean
  • Median
  • Minimum

All displays each iteration separately. The other choices display the respective calculation on the iteration values. Measurement type controls what kinds of values are used in binning.
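
The measurement types can be illustrated with a small lookup of summary functions. This is a sketch under assumptions, not the phonedash implementation; the names SUMMARIES and summarize are hypothetical.

```python
import math
import statistics


def geometric_mean(values):
    """Exponential of the mean of the logarithms of the values."""
    return math.exp(sum(math.log(v) for v in values) / len(values))


SUMMARIES = {
    "Mean": statistics.mean,
    "Geometric Mean": geometric_mean,
    "Median": statistics.median,
    "Minimum": min,
}


def summarize(iterations, measurement_type):
    """Return the values to plot for one test run."""
    if measurement_type == "All":
        return list(iterations)  # every iteration is shown separately
    return [SUMMARIES[measurement_type](iterations)]


iterations = [1, 2, 3, 10]
for measurement_type in ["All", "Mean", "Median", "Minimum"]:
    print(measurement_type, summarize(iterations, measurement_type))
```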

  • Tests

Each test detected in the requested date range is given a checkbox control. By checking or unchecking the tests, you can control which tests are displayed.

  • Metrics

Metrics refers to the measured Throbber start and Throbber stop values, or their difference, Throbber time. By checking or unchecking the metrics, you can control which are displayed.

  • Cached

Cached refers to whether the measurement is for the first or second visit to the test page. By checking or unchecking the cached values, you can control which are displayed.

  • Repositories

Each repository detected in the requested date range is given a checkbox control. By default, all repositories except try are checked. By checking or unchecking the repositories, you can control which repositories are displayed.

  • Phones

Phones are named according to their model and a sequential numeric identifier. For example, nexus-6p-1 is the first Nexus 6P device.

Each phone detected in the requested date range is given a checkbox control. By default, all devices are checked. By checking or unchecking the devices, you can control which devices are displayed.

Individual devices are grouped under their type which is also given a checkbox. Changing this device type checkbox will force the devices of that type to be either checked or unchecked to match the state of the device type checkbox.
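
The naming scheme makes it straightforward to group devices by type, as the Phones control does. The following is a minimal sketch; the function name group_by_type is hypothetical and it assumes the numeric identifier always follows the last dash.

```python
from collections import defaultdict


def group_by_type(phone_ids):
    """Group phone ids such as "nexus-6p-1" by their device type."""
    groups = defaultdict(list)
    for phone_id in phone_ids:
        # Split on the last dash: "nexus-6p-1" -> ("nexus-6p", "1").
        phone_type, _, number = phone_id.rpartition("-")
        groups[phone_type].append(int(number))
    return dict(groups)


print(group_by_type(["nexus-6p-1", "nexus-6p-7", "nexus-4-7"]))
```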