Necko/MobileCache/MicroBenchmarks

=== Telemetry vs microbenchmarks ===


There has been some discussion about using telemetry instead of xpcshell-based microbenchmarks. Current thinking is that the two approaches are complementary: telemetry captures real-life browsing patterns on real-life platforms and environments, whereas microbenchmarks run artificial browsing patterns on a machine in a lab-environment. It is impractical to experiment with code-changes using telemetry to measure the effect - a benchmark in the lab is much more practical for this. On the other hand, telemetry is the (only?) way to ensure that improvements are also valid in real life. Moreover, with telemetry it is very difficult (if not impossible) to get the context of your measurements. For example, suppose you measure the time to evict a cache-entry from the disk-cache; for this measurement to make sense you also need to know the number of entries in the cache and the total size of the cache. This context-information is hard to get using telemetry alone.
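
As an illustration of the kind of context a lab-benchmark can capture, here is a minimal xpcshell sketch assuming the old (pre-cache2) nsICacheService API; evicting ''all'' disk-entries is used as a stand-in for the single-entry eviction mentioned above:

<pre>
// Sketch: capture cache context (entry count, total size) before
// timing an eviction. Uses the pre-cache2 nsICacheService API.
var cacheService = Cc["@mozilla.org/network/cache-service;1"]
                     .getService(Ci.nsICacheService);

var entryCount = 0, totalSize = 0;
cacheService.visitEntries({
  visitDevice: function(deviceID, deviceInfo) {
    if (deviceID == "disk") {
      entryCount = deviceInfo.entryCount;
      totalSize = deviceInfo.totalSize;
    }
    return false; // context comes from the device info; skip individual entries
  },
  visitEntry: function(deviceID, entryInfo) {
    return false;
  }
});

var t0 = Date.now();
cacheService.evictEntries(Ci.nsICache.STORE_ON_DISK);
var elapsed = Date.now() - t0;

do_print("Evicted " + entryCount + " entries (" + totalSize +
         " bytes) in " + elapsed + " ms");
</pre>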


We plan to use telemetry as described below.


==== Identify areas of interest ====
Telemetry will provide lots of data, and an important part of the job is to read and analyze it. We expect telemetry to reveal unexpected patterns, and such patterns should trigger microbenchmarks that investigate the specific areas involved.

==== Tune parameters to make microbenchmarks more realistic ====
Conditions in the lab are bound to be different from what users see in real life. However, in the lab we can control a number of parameters; by knowing realistic values for these parameters we can run tests under conditions similar to what users experience, making our tests more realistic.

Examples of such parameters include (a sketch of how to control them in an xpcshell test follows the list):

* Bandwidth: What is the typical bandwidth seen by a user? In the lab we normally load resources from the local host, which is far faster than any real network.
* Latency/RTT: How long does it typically take from sending a request to the server until the response starts to appear?
* Cache-sizes: What is the typical cache-size out there?
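
The following is a minimal sketch, assuming the testing-common httpd.js helper; browser.cache.disk.capacity is the real disk-cache capacity pref, but the 50 MB capacity, the 100 ms delay and the path below are placeholder values that telemetry should eventually supply:

<pre>
// Sketch: control cache-size and latency in an xpcshell test.
// The capacity and delay values are placeholders; realistic values
// should come from telemetry.
Cu.import("resource://gre/modules/Services.jsm");
Cu.import("resource://testing-common/httpd.js");

// browser.cache.disk.capacity is given in KiB; pretend telemetry told
// us that 50 MB is a typical cache-size.
Services.prefs.setIntPref("browser.cache.disk.capacity", 50 * 1024);

var timers = []; // keep timers alive so they are not GC'ed early

var server = new HttpServer();
server.registerPathHandler("/resource", function(request, response) {
  response.processAsync();
  // Delay the response to simulate a 100 ms RTT instead of the
  // near-zero latency of the local host.
  var timer = Cc["@mozilla.org/timer;1"].createInstance(Ci.nsITimer);
  timer.initWithCallback(function() {
    response.setStatusLine(request.httpVersion, 200, "OK");
    response.setHeader("Content-Type", "text/plain", false);
    response.write("response body");
    response.finish();
  }, 100, Ci.nsITimer.TYPE_ONE_SHOT);
  timers.push(timer);
});
server.start(-1); // any free port; read it back from server.identity.primaryPort
</pre>

Bandwidth is harder to emulate this way and would need something like throttled writes from the path-handler.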
 
 
==== Real-life verification of results from the lab ====
One way to define network-performance on a given platform is as the product of two factors: the browsing pattern (i.e. which urls are loaded in which sequence), and what exactly is measured. As discussed above, microbenchmarks and telemetry are inherently different with respect to the browsing pattern, but we can do our best to align them with respect to the second factor. Put differently: we should try to use the same code in telemetry and microbenchmarks to capture data, and we should ensure we interpret this data in the same way.

The major benefit of this is to have telemetry give us real-life verification '''after''' using synthetic, isolated and focused benchmarks in the lab. That is, we can use synthetic test-patterns in the lab to identify and qualify code-changes; after landing the changes we should then be able to see predictable effects of them via telemetry. If we measure performance differently in microbenchmarks and telemetry we may quickly end up "comparing apples and oranges", confusing ourselves.
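
A minimal sketch of what this alignment could look like, assuming the telemetry probes are readable from xpcshell (HTTP_PAGE_COMPLETE_LOAD is an existing necko histogram; the helper function is our own):

<pre>
// Sketch: reuse telemetry probes as the measurement code in a
// microbenchmark. After driving the synthetic load-pattern, read the
// same histogram that real-life telemetry reports.
var Telemetry = Cc["@mozilla.org/base/telemetry;1"]
                  .getService(Ci.nsITelemetry);

function reportHistogram(id) {
  var snapshot = Telemetry.getHistogramById(id).snapshot();
  var samples = 0;
  for (var i = 0; i < snapshot.counts.length; ++i)
    samples += snapshot.counts[i];
  do_print(id + ": " + samples + " samples, sum " + snapshot.sum + " ms");
}

// ... run the synthetic browsing pattern here ...

reportHistogram("HTTP_PAGE_COMPLETE_LOAD");
</pre>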


Below is a pro/con list for using telemetry-code vs JS time-functions to capture data for microbenchmarks - feel free to add and comment.
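
For reference when weighing the list, here is a minimal sketch of the JS time-function alternative, timing a single channel-load with Date.now() (the URL is a placeholder for a resource on the local test server):

<pre>
// Sketch: timing with plain JS time-functions instead of telemetry-code.
var ios = Cc["@mozilla.org/network/io-service;1"]
            .getService(Ci.nsIIOService);
var channel = ios.newChannel("http://localhost:8080/resource", null, null);

var t0 = Date.now();
channel.asyncOpen({
  onStartRequest: function(request, context) {},
  onDataAvailable: function(request, context, stream, offset, count) {
    // Drain the data so the channel can make progress.
    var sis = Cc["@mozilla.org/scriptableinputstream;1"]
                .createInstance(Ci.nsIScriptableInputStream);
    sis.init(stream);
    sis.read(count);
  },
  onStopRequest: function(request, context, status) {
    do_print("load took " + (Date.now() - t0) + " ms");
    do_test_finished();
  }
}, null);
do_test_pending();
</pre>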