Performance/Fenix

This page describes some basics of Fenix performance. For an in-depth look at some specific topics, see:

Best Practices for tips to write performant code
Profilers and Tools for comparison of profiling and benchmarking tools
Performance reviews for how to know if your change impacts performance

Performance testing

Performance tests can have a goal of preventing regressions, measuring absolute performance as experienced by users, or measuring performance vs. a baseline (e.g. comparing fenix to fennec). It can be difficult to write tests that manage all of these. We tend to focus on preventing regressions.

Dashboards

We have a series of dashboards that we review during our biweekly performance sync meeting. The dashboards may be unstable and may be difficult to interpret so be cautious when drawing conclusions from the results. A shortcoming is that we only run these tests on Nightly builds. Here are the current dashboards:

Start up duration: this is represents COLD MAIN (app icon launch) to reportFullyDrawn/visual completeness and COLD VIEW (app link) to page load complete (i.e. this includes a variable-duration network call) across a range of devices. We're trying to replace this with more stable tests
Page load duration: we're iterating on the presentation to make this more useful. More complex visualizations are available in the grafana folder, such as the tests for Pixel 2
App size: via Google Play

Unmonitored tests running in fenix

In addition to the tests we actively look at above, there are other tests that run in mozilla-central on fenix or GeckoView example. We're not sure who looks at these. The perftest team is working to dynamically generate the list of tests that run. Some progress can be seen in this query and this treeherder page. Until then, we manually list the tests below.

As of Feb. 23, 2021, we run at least the following performance tests on fenix:

Additional page load duration tests: see the query above for a list of sites (sometimes run in automation, sometimes run manually; todo: details)
media playback tests (TODO: details; in the query above, they are prefixed with ytp)
Start up duration via mach perftest
Speedometer: JS responsiveness tests (todo: details)
tier 3 unity webGL tests (todo: details)

There are other tests that run on desktop that will cover other parts of the platform.

Preventing regressions automatically

We use the following measures:

Crash on main thread IO in debug builds using StrictMode (code)
Use our StartupExcessiveResourceUseTest, for which we are Code Owners, to:
- Avoid StrictMode suppressions
- Avoid runBlocking calls
- Avoid additional component initialization
- Avoid increasing the view hierarchy depth
- Avoid having ConstraintLayout as a RecyclerView child
- Avoid increasing the number of inflations
Use lint to avoid multiple ConstraintLayouts in the same file (code)

How to measure what users experience

When analyzing performance, it's critical to measure the app as users experience it. This section describes how to do that and avoid pitfalls. Note: our automated measurement tools, such as the measure_start_up.py script, will always use our most up-to-date techniques while this page may get outdated. Prefer to use automated systems if practical and read the source if you have questions!

When measuring performance manually, you might follow a pattern like the following (see the footnotes for explanations):

Configure your device and build. Use:
- a low-end device¹ (a Samsung Galaxy A51 is preferred)
- a debuggable=false build such as Nightly²
- enable any compile-time options that are enabled in the production app (e.g. Sentry, Nimbus, Adjust, etc.)³
Warm-up run:
- Start the app, especially after an installation⁴
- Wait at least 60 seconds⁵. To be extra safe, wait 2 minutes.
- Set the state of the app as you want to test it (e.g. clear onboarding)
- Force-stop the app (to make sure you're measuring at least the 2nd run after installation)
Measure or test:
- Start the app and measure what you want to measure
- If you force-stop the app, wait a few seconds before starting the app to let the device settle
- If you're testing code that waits for gecko initialization (e.g. page loads) and need to force-stop the app before measuring, make sure to 1) load a page and 2) wait 15 seconds before force-stopping the app⁶

Footnotes:

1: high-end devices may be fast enough to hide performance problems. For context, a Pixel 2 is still relatively high-end in our user base
2: `debuggable=true` builds (e.g. Debug builds) have performance characteristics that don't represent what users experience. See https://www.youtube.com/watch?v=ZffMCJdA5Qc&feature=youtu.be&t=625 for details
3: if these SDKs are disabled, you may miss performance issues introduced by them or their absence will change the timing of our operations, possibly hiding performance issues. Note: the performance team would prefer for all SDKs to be enabled by default so developers can error-free build an APK similar to production APKs
4: we've observed the first run after installation is always slower than subsequent runs for an unknown reason
5: on first run, we populate certain caches, e.g. we'll fetch Pocket data and start a Gecko cache. 60 seconds will address most of these
6: the ScriptPreloader will generate a new cache on each app start up. If you don't let the cache fill (i.e. by loading a page and waiting until it caches (source), the cache will be empty and you won't page load it as most users experience it

Glossary

Start up "type"

This is an aggregation of all of the variables that make up a start up, described more fully below. Currently, these variables are:

state
path

For example, a type of start up could be described as cold_main.

Start up "state": COLD/WARM/HOT

"State" refers to how cached the application is, which will impact how quickly it starts up.

Google Play provides a set of definitions and our definitions are similar to, but not identical, to them:

COLD = starting up "from scratch": the process and HomeActivity need to be created
WARM = the process is already created but HomeActivity needs to be created (or recreated)
HOT = basically just foregrounding the app: the process and HomeActivity are already created

Start up "path": MAIN/VIEW

"Path" refers to the code path taken for this start up. We name these after the action inside the Intents received by the app such as ACTION_MAIN that tell the app what to do:

MAIN = a start up where the app icon was clicked. If there are no existing tabs, the homescreen will be shown. If there are existing tabs, the last selected one will be restored
VIEW = a start up where a link was clicked. In the default case, a new tab will be opened and the URL will be loaded

Caveat: if an Intent is invalid, we may end up on a different screen (and thus taking a different code path) than the one specified by the Intent. For example, an invalid VIEW Intent may instead be treated as a MAIN Intent.

Performance/Fenix

Contents

Performance testing

Dashboards

Unmonitored tests running in fenix

Preventing regressions automatically

How to measure what users experience

Glossary

Start up "type"

Start up "state": COLD/WARM/HOT

Start up "path": MAIN/VIEW

Navigation menu

Performance/Fenix

Performance testing

Dashboards

Unmonitored tests running in fenix

Preventing regressions automatically

How to measure what users experience

Glossary

Start up "type"

Start up "state": COLD/WARM/HOT

Start up "path": MAIN/VIEW

Navigation menu

Search