The trade-offs for each technique are mentioned in their respective section.
== Benchmark remotely ==
You can now run benchmarks remotely in CI/automation.
First, you will need to clone and set up mozilla-central or mozilla-unified locally. See [https://firefox-source-docs.mozilla.org/setup/index.html these build instructions] for how to get set up. When prompted for what kind of build you would like, select an Artifact build, as you won't need to build Firefox.
After (or while) building and setting up, you'll also need to get set up for pushing to try; [https://firefox-source-docs.mozilla.org/tools/try/index.html follow this link to do so].
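A rough sketch of that initial setup, assuming you use the bootstrap script from the build instructions linked above and pick the Artifact build option when prompted:
<syntaxhighlight lang="bash">
# Download and run the bootstrap script; it clones mozilla-unified for you
# and asks which kind of build you want (choose the Artifact build option).
curl -O https://hg.mozilla.org/mozilla-central/raw-file/default/python/mozboot/bin/bootstrap.py
python3 bootstrap.py
</syntaxhighlight>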
Once you've finished setting up, you should be able to run <code>./mach try perf --help</code>. Now you're ready to test custom APKs on the try branch with the following instructions (a consolidated command sequence is sketched after the list):
* '''Get the path to your custom APK.'''
** Ensure that you have a nightly APK build so that the activity, intents, and package name line up with existing tasks.
* '''Run <code>./mach try perf --mozperftest-upload-apk /path/to/apk</code> to copy the APK in-tree.'''
** This will replace the APK used in mozperftest tests (e.g. startup tests).
** Use <code>--browsertime-upload-apk</code> if you want to target Browsertime performance tests.
* '''Commit the changes in-tree with: <code>hg bookmark my-upload; hg commit -m "Upload APK"</code>'''
** Note that this will move you out of the default branch/bookmark (central, or unified). To return to the original bookmark, run <code>hg update central</code>.
** You can find a great [https://mikeconley.github.io/documents/How_mconley_uses_Mercurial_for_Mozilla_code.html guide for how to use Hg written by Mike Conley [:mconley<nowiki>]</nowiki> here].
* '''Now you can re-run the performance selector to pick the tests you want to run, and perform the push: <code>./mach try perf --android --show-all</code>'''
** Search for <code>'perftest 'start</code> to find the startup tests.
** You'll be provided with a PerfCompare View link once the try runs are pushed that will show you a before/after comparison of the performance differences.
** You can also find all your try pushes at <code>https://treeherder.mozilla.org/jobs?repo=try&author=YOUR_EMAIL</code>.
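Putting the steps above together, a typical push looks roughly like this (the APK path is a placeholder):
<syntaxhighlight lang="bash">
# Copy your custom nightly APK in-tree (this replaces the APK used by
# mozperftest tests; use --browsertime-upload-apk for Browsertime tests).
./mach try perf --mozperftest-upload-apk /path/to/apk

# Commit the in-tree copy on a separate bookmark.
hg bookmark my-upload
hg commit -m "Upload APK"

# Open the performance selector, pick the tests to run, and push to try.
./mach try perf --android --show-all

# Afterwards, return to your original bookmark.
hg update central
</syntaxhighlight>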
== Benchmark locally ==
A benchmark is an automated test that measures performance, usually the duration from point A to point B. Automated benchmarks have similar trade-offs to automated functionality tests when compared to one-off manual testing: they can continuously catch regressions and minimize human error. For manual benchmarks in particular, it can be tricky to be consistent about how we aggregate each test run into the results.
See the [[#Benchmark remotely|Benchmark remotely]] section for information about how you can run these tests in CI/automation.
'''To benchmark, do the following:'''
</syntaxhighlight>
We '''recommend this approach for non-UI measurements only.''' Since the framework doesn't notify us when the UI is visually complete, it's challenging to instrument that point and thus accurately measure a duration that waits for the UI.
Like automated benchmarks, these tests can accurately measure what users experience. However, while they are fairly quick to write, they are tedious and time-consuming to carry out by hand and have many places to introduce errors.
Here's an '''outline of a typical timestamp benchmark''':
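A minimal sketch of how the manual runs might be driven, assuming the instrumented build logs a single <code>BenchmarkDuration</code> line per run (a placeholder log line, not something Fenix emits by default) and assuming the Nightly package name:
<syntaxhighlight lang="bash">
# Hypothetical harness for collecting manual timestamp-benchmark runs.
PKG=org.mozilla.fenix                        # assumed Nightly package name

for i in $(seq 5); do
  adb shell am force-stop "$PKG"             # start each run from a stopped app
  adb logcat -c                              # clear old log output
  adb shell monkey -p "$PKG" 1               # launch the app
  sleep 30                                   # perform the measured use case by hand in this window
  adb logcat -d | grep "BenchmarkDuration"   # record this run's duration
done
</syntaxhighlight>
Aggregating the per-run numbers by hand (e.g. taking the median) is where the consistency problems mentioned above tend to creep in.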
== Profile ==
You can [https://profiler.firefox.com/docs/#/./guide-remote-profiling take profiles with the Firefox Profiler], identify the start and end points for the duration you're measuring in your profile, and use the difference between them to measure the duration. It's quick to take these profiles but there are big downsides: profilers add overhead so the duration will not be precise, it's difficult to avoid noise in the results because devs can only take so many profiles, and it may be non-trivial to correctly identify the start and end points of the duration especially when the implementations you compare have big differences.
Follow the example below to see how to measure the impact of a change on a duration using profiles.
=== Example: time to display homescreen ===
On a low-end device...
# We pick the specific duration we want to measure: the time from hitting the home button when a tab is open until the homescreen is visually complete.
# We build & install a '''release build''' (e.g. Nightly, Beta; debug builds have unrepresentative perf). You can also use a recent Nightly, like this example does.
# We '''do a "warm up" run''' to populate the JIT's cache (the first run has unrepresentative perf). We start the app, set up the start state (open a tab), do our use case (click the home button and wait for the UI to fully load). Then we force-stop the app (see the command sketch after this example).
# We '''profile:''' start the app (which should launch to the most recent tab), start the profiler (see [https://profiler.firefox.com/docs/#/./guide-remote-profiling here for instructions]), perform the use case (click the home button in the toolbar and wait for the homescreen to finish loading), and stop the profiler. Don't forget to enable the profiler permissions (3-dot menu -> Settings -> Remote debugging via USB).
# We identify the duration of the change [https://share.firefox.dev/3odXiYR in the raw profile]. The most accurate and reproducible way to do this is using the Marker Chart.
## In this case, we can identify '''the start point''' through the <code>dispatchTouchEvent ACTION_UP</code> marker, right-click it, and choose "Start selection here" to narrow the profile's timeline range. We can then click the magnifying glass with the + in the timeline to clamp the range.
## '''The end point''' is more tricky: we don't have a marker to identify that the UI is visually complete. As such, we can use the information in the Marker Chart and Stack Chart to make a best guess as to when the UI is visually complete (notice that this creates a point of inaccuracy). If we temporarily clamp our range to after the last marker (<code>onGlobalLayout</code>) is run, [https://share.firefox.dev/3ccSWfb we see that there is a measure/layout pass for Compose after it]. We make a best guess that the content isn't visually complete until this last measure/layout/draw pass completes. To clamp the range to this, we can double-click on the <code>draw</code> method above <code>measureAndLayout</code> to shrink our range to that method – this lets us accurately capture the end point. Then we can drag the selection handle to re-expand the range all the way to the left, back to our start point. Then we can clamp the range given that the start and end points we want to measure are the start and end points of the range. The final profile – https://share.firefox.dev/3o7EvOI – gives us our final duration, which we can see in the value at the top left of the profiler: 1.4s in this case.
With the measurement in hand, repeat these steps for your changes and compare the resulting times. Note: it's possible the device was under load when you took the profile, so you may wish to take more than one profile if you suspect that is the case.
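The device-preparation parts of this example (steps 2 and 3) can be scripted; here is a rough sketch, assuming the Nightly package name <code>org.mozilla.fenix</code> and a placeholder APK filename (the profiling in step 4 onwards happens through the Firefox Profiler UI):
<syntaxhighlight lang="bash">
PKG=org.mozilla.fenix            # assumed Nightly package name

# Step 2: install a release (Nightly/Beta) build; the filename is a placeholder.
adb install -r fenix-nightly.apk

# Step 3: warm-up run to populate the JIT's cache, then force-stop the app.
adb shell monkey -p "$PKG" 1     # launch the app
# ...open a tab and press the home button by hand, then:
adb shell am force-stop "$PKG"
</syntaxhighlight>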
== Backfilling to determine culprit commit ==
We now have alerting enabled on the firefox-android branch/project. If any changes are detected, an alert will be produced after ~6 days (once enough data has been produced). With the alert in hand, you'll need to determine the culprit commit, i.e. the commit that caused the regression.
To do so, start by backfilling on the firefox-android branch. "Backfilling" is the act of running the regressing test on past pushes that didn't run it, filling in the holes in your data. It's likely that you'll find a different culprit commit than the one the alert originally identified.
If the culprit commit suggests that the regression comes from mozilla-central/geckoview, you'll need to move to mozilla-central and continue searching for the culprit commit there with the same backfilling process. That said, you'll need to use try runs to do this, with custom Fenix APKs built against the mozilla-central/geckoview commit being tested. You should be able to find GeckoView artifacts in autoland/mozilla-central, so you won't need to rebuild them. See the [[#Benchmark remotely|Benchmark remotely]] section for information about how to do those try runs. Please reach out in [https://matrix.to/#/#perftest:mozilla.org #perftest] if you need any help, or if anything is unclear.
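If you end up narrowing the regression range by hand, Mercurial can list the candidate commits between the last known-good and first known-bad revisions; <code>LAST_GOOD</code> and <code>FIRST_BAD</code> below are placeholder hashes you would take from the alert or backfill results:
<syntaxhighlight lang="bash">
# List every commit between the last good and first bad revisions.
hg log -r "LAST_GOOD::FIRST_BAD" --template "{node|short} {desc|firstline}\n"
</syntaxhighlight>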