== Timestamp benchmark ==
A timestamp benchmark is a manual test where a developer adds temporary code to log the duration they want to measure and then performs the use case on the device themselves to get the values printed. Here's a simple example:
<syntaxhighlight lang="kotlin">
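// A minimal sketch of such temporary code: capture timestamps around the code path
// under test and log the difference. doUseCase() is a placeholder, not an existing function.
val start = SystemClock.elapsedRealtime()
doUseCase()
val end = SystemClock.elapsedRealtime()
Log.e("Benchmark", "duration: ${end - start}ms")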
</syntaxhighlight>
Like automated benchmarks, these tests can accurately measure what users experience. However, while they are fairly quick to write, they are tedious and time-consuming to carry out and have many places to introduce errors. We '''recommend this approach for non-UI measurements only.''' Since the framework doesn't notify us when the UI is visually complete, it's challenging to instrument that point and thus accurately measure a duration that waits for the UI.
Here's an '''outline of a typical timestamp benchmark''':
# Decide the duration you want to measure
# Do the following once for the commit before your changes and once for the commit after your changes...
## Add code to measure the duration (see the sketch after this list).
## Build & install '''a release build''' like Nightly or Beta (debug builds have unrepresentative perf)
## Do a "warm up" run first: the first run will always be slower because the JIT cache isn't primed so you should run and ignore it, i.e. run your test case, wait a few seconds, force-stop the app, clear logcat, and then begin testing & measuring | ## Do a "warm up" run first: the first run after an install will always be slower because the JIT cache isn't primed so you should run and ignore it, i.e. run your test case, wait a few seconds, force-stop the app, clear logcat, and then begin testing & measuring | ||
## Run the use case several times (maybe 10 times if it's quick, 5 if it's slow). You probably want to measure "cold" performance: we assume users will generally only perform a use case a few times per process lifetime. However, the more times a code path is run during the process lifetime, the more likely it'll execute faster because it's cached. Thus, if we want to measure a use case in a way that is similar to what users experience, we must measure the first time an interaction occurs during the process. In practice, this means after you execute your use case once, force-stop the app before executing it again
## Capture the results from logcat. If you log "average <number-in-ms>", you can use [https://github.com/mozilla-mobile/perf-tools/blob/791b290ebe96b9f62a0dcca458fb610ad1f01d5f/analyze_durations.py the following script] to capture all the results and find the median: <code>adb logcat -d > logcat && python3 perf-tools/analyze_durations.py logcat</code>
# Compare the results, generally by comparing the ''median'' of the two runs
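For step 2.1, the temporary measurement code might look something like the sketch below. This is only a sketch: <code>measureCase</code> and the "Benchmark" tag are placeholders rather than existing code, and it assumes you log in the "average <number-in-ms>" format from step 2.5 so the script above can parse the results.
<syntaxhighlight lang="kotlin">
import android.os.SystemClock
import android.util.Log

// Placeholder helper (not existing code): wraps the code path under test so each run
// logs a line in the "average <number-in-ms>" format that the script above parses.
fun <T> measureCase(block: () -> T): T {
    val start = SystemClock.elapsedRealtime()
    val result = block()
    Log.e("Benchmark", "average ${SystemClock.elapsedRealtime() - start}")
    return result
}
</syntaxhighlight>
Between cold runs, <code>adb shell am force-stop <package-name></code> force-stops the app and <code>adb logcat -c</code> clears logcat.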
=== Example: page load ===
TODO... the duration for a page load is a non-UI use case that is more complex than the very simple example provided above
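In the meantime, here's a minimal sketch of the general shape such a measurement could take. The <code>onPageLoadStarted()</code>/<code>onPageLoadFinished()</code> hooks are hypothetical, not real APIs: the timing code would need to be called from wherever the page load actually starts and finishes.
<syntaxhighlight lang="kotlin">
import android.os.SystemClock
import android.util.Log

// Hypothetical hooks, not real APIs: call this timing code from wherever the
// page load actually starts and finishes in the code under test.
private var pageLoadStart = 0L

fun onPageLoadStarted() {
    pageLoadStart = SystemClock.elapsedRealtime()
}

fun onPageLoadFinished() {
    val durationMs = SystemClock.elapsedRealtime() - pageLoadStart
    Log.e("Benchmark", "average $durationMs") // same "average <number-in-ms>" format as above
}
</syntaxhighlight>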
=== Example: very simple UI measurements ===
TODO... if the screen content is drawn synchronously, you can do something like:
<syntaxhighlight lang="kotlin">
// doOnPreDraw is the androidx.core.view.doOnPreDraw KTX extension; `end` is assumed to be
// declared wherever the corresponding `start` timestamp was captured.
view.doOnPreDraw {
    end = SystemClock.elapsedRealtime()
    // Be sure to verify that this draw call is the draw call where the UI is visually complete,
    // e.g. post to the front of the main thread queue and Thread.sleep(5000) and check the device
}
</syntaxhighlight>
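To do the verification mentioned in the comment above, one option is to freeze the main thread right after the frame is drawn so you can inspect the screen on the device. This is a debugging-only sketch (using <code>android.os.Handler</code>/<code>Looper</code>); remove it before taking measurements:
<syntaxhighlight lang="kotlin">
view.doOnPreDraw {
    end = SystemClock.elapsedRealtime()
    // Debugging only: block the main thread right after this frame is drawn so you can
    // check on the device whether the UI is visually complete. Remove before measuring.
    Handler(Looper.getMainLooper()).postAtFrontOfQueue {
        Thread.sleep(5000)
    }
}
</syntaxhighlight>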
== Profile ==
TODO