|
|
| (26 intermediate revisions by the same user not shown) |
| Line 27: |
Line 27: |
| === Telemetry vs microbenchmarks === | | === Telemetry vs microbenchmarks === |
|
| |
|
| There has been some discussion about using telemetry instead of xpcshell-based microbenchmarks. Current thinking is that they are complementary: Telemetry is real-life browsing patterns on real-life platforms and environments, whereas microbenchmarks are artificial browsing patterns running on a machine in a lab-environment. It is impractical to experiment with code-changes using telemetry to measure the effect - a benchmark in the lab is much more practical for this. On the other hand, telemetry is the (only?) way to ensure that improvements also are valid in real-life. Moreover, with telemetry it is very difficult (if not impossible) to get the context of your measurements. For example, suppose you measure the time to evict a cache-entry from disk-cache; for this measurement to make sense you also need to know the number of entries in the cache and the total size of the cache. This context-information is hard to get using telemetry alone. | | There has been some discussion about using telemetry instead of xpcshell-based microbenchmarks. Current thinking is that they are complementary and that telemetry is real-life browsing patterns on real-life platforms and environments, whereas microbenchmarks are artificial browsing patterns running on a controlled platform in a lab-environment. Some points: |
|
| |
|
| We plan to use telemetry as described below.
| | * It is impractical to experiment with code-changes using telemetry to measure the effect - a benchmark in the lab is much more practical for this. |
|
| |
|
| ==== Identify areas of interest ==== | | * Using telemetry it is very difficult (if not impossible) to get the context of your measurements. For example, suppose you measure the time to evict a cache-entry from disk-cache; for such a measurement to make sense you also need to know the number of entries in the cache and the total size of the cache. This context-information is hard to get using telemetry alone. |
| Telemetry will provide lots of data and a really important job is to read and analyze this. It is expected that we will see unexpected patterns from telemetry, and such patterns should trigger microbenchmarks to investigate specific areas. | | |
| | * On the other hand, telemetry mirrors the experience of real users and is the (only?) way to ensure that improvements are valid in real-life. |
| | |
| | Two very different factors determine cache-performance from a user-perspective: |
| | |
| | ; Performance of the platform (efficiency of CPU, memory- and disk-speed etc): We deal with this by implementing efficient datastructures, various file-schemes, multi-threading etc. Measuring this is the focus of microbenchmarks. |
| | ; Browsing patterns (how often do users revisit sites, how deep in sites do they browse etc): We deal with this by scaling caches and implementing smart replacement-algorithms. This factor is not taken into account by microbenchmarks - only telemetry handles this. |
| | |
| | Hence, current thinking is that telemetry and microbenchmarks are complementary mechanisms suitable for optimizing different factors of cache-performance. Microbenchmarks will be used as described in the section above and telemetry will be used for the following |
| | |
| | ==== Identify general areas of interest ==== |
| | Telemetry will provide lots of data and an important job is to read and analyze this. We expect to see unexpected patterns from telemetry, and such patterns should trigger the creation of microbenchmarks to investigate specific areas. |
|
| |
|
| ==== Tune parameters to make microbenchmarks more realistic ==== | | ==== Tune parameters to make microbenchmarks more realistic ==== |
| Original conditions in the lab are bound to be different from what users see in real life. However, in the lab we can control a number of parameters; by knowing realistic values for these parameters we can run tests in the lab under similar conditions as users, making our tests more realistic.
| | Conditions in the lab are bound to be different from those seen by real users. However, in the lab we control most parameters and by knowing realistic values for these parameters we can tune our lab-setup to similar conditions, making our lab-testing more realistic. We can use telemetry to find such values. Examples of this include |
| | |
| Examples of this include | |
|
| |
|
| * Bandwidth: What's the typical bandwidth(s) seen by a user? In the lab we normally load resources from the local host... | | * Bandwidth: What's the typical bandwidth(s) seen by a user? In the lab we normally load resources from the local host... |
| Line 43: |
Line 52: |
| * Cache-sizes: What's the typical cache-size out there? | | * Cache-sizes: What's the typical cache-size out there? |
|
| |
|
| | ==== Real-life verification of results from the lab ==== |
| | Telemetry monitors real-life efficiency of the cache, hence by monitoring the right values we can ensure that improvements we see in the lab also benefit real users. Exactly which measurements we need is not entirely clear yet (work is in progress to determine this). |
|
| |
|
| ==== Real-life verification of results from the lab ====
| | An important point is that in order to verify results like this we must measure the same (or at least very similar) values in microbenchmarks and in telemetry. We should also, of-course, measure other values from telemetry in order to cross-check our results, but in order to verify lab-improvements in real life we must align measurements. The rationale for this is as follows: Let's say we want to improve characteristic A. We make a benchmark to measure A, fix the code, and see a clear improvement of A in the lab so we push the code-fix. Let's say telemetry do ''not'' measure A but rather some other characteristic B. What if B didn't improve (or even degraded)? Is it because A and B are unrelated? Or is it because A actually didn't improve in real-life at all? We must first verify that A actually ''did'' improve in real-life, ''then'' we can discuss why B did ''not'' improve, and finally we can decide whether the code-change is worth keeping. Such results will also increase our understanding of how the caching works in general. |
| One way to define network-performance on a given platform is as the product of two factors: The browsing pattern (i.e. which urls are loaded in which sequence), and what exactly is measured. As discussed above, microbenchmarks and telemetry are inherently different with respect to the browsing pattern, but we can do our best to align telemetry and microbenchmarks wrt the second factor. Put in a different way: We should try to use the same code in telemetry and microbenchmarks to capture data, and we should ensure we interpret this data in the same way.
| | |
| | Thus, the suggested strategy is to first introduce a telemetry-probe to gather baseline-data for some characteristic we want to improve, then introduce a code-fix which improves the characteristic in the lab, then finally monitor changes from telemetry to verify the improvement we saw in the lab. In the last step, the particular characteristic should be monitored as well as other (more general) performance-probes for cross-checking. |
| | |
| | == The benchmarks == |
| | Each benchmark below is described and explained, and we also show output results from selected platforms. Note that results are provided as examples and information only; the benchmarks are meant for evaluating effect of code-changes, i.e. you run the benchmarks without and with your change and compare the results to evaluate the effect of your code. |
| | |
| | === test_stresstest_cache_with_timing.js === |
| | |
| | This test loads a number of '''different''' resources with uniform size. All responses are 200 Ok, i.e. there are no redirections. The resources are loaded sequentially in a tight loop and the test syncs with cache-io thread after finishing the loop. Each loop is executed for a number of different cache-configurations/setups: |
| | |
| | #no cache enabled, loading all urls from server, not writing to cache (clearing cache before this step) |
| | #empty memory-cache, loading all urls from server, write them to cache (clearing cache before this step) |
| | #memory-cache primed by pervious step, read all urls from cache again |
| | #empty disk-cache, loading all urls from server, write them to cache (clearing cache before this step) |
| | #disk-cache primed by pervious step, reading all urls from cache |
| | #memory-cache filled with small (128bytes) entries, loading all urls from server, write them to cache (clear and pre-load cache before this step) |
| | #memory-cache primed by pervious step, read all urls from cache |
| | #memory-cache filled with large (128Kb) entries, loading all urls from server, write them to cache (clear and pre-load cache before this step) |
| | #memory-cache primed by pervious step, read all urls from cache |
| | #disk-cache filled with small (128bytes) entries, loading all urls from server, write them to cache (clear and pre-load cache before this step) |
| | #disk-cache primed by pervious step, read all urls from cache |
| | #disk-cache filled with large (128Kb) entries, loading all urls from server, write them to cache (clear and pre-load cache before this step) |
| | #disk-cache primed by pervious step, read all urls from cache |
| | |
| | This whole procedure is repeated over a given list of datasizes, clearing the cache between each repetition. |
|
| |
|
| The major benefit of this is to have telemetry give us real-life verification '''after''' using synthetic, isolated and focused benchmarks in the lab. I.e. we can use synthetic test-patterns in the lab to identify and qualify code-changes, then after landing code-changes we should be able to see predictable effects of these changes via telemetry. If we measure performance differently in microbenchmarks and telemetry we may quickly end up "comparing apples and oranges", confusing ourselves.
| | Time is measured in two ways: 1) for each individual load by by using the nsITimingChannel interface of the channel, and 2) by measuring run-time of the complete loop from the test itself. |
|
| |
|
| Below is a pro/con list for using telemetry-code vs JS time-functions to capture data for microbenchmarks - feel free to add and comment.
| | Informally one can say that this test triggers loading a large number of uniformly sized resources at the same time, somewhat similar to what happens when starting up firefox with lots of open tabs. Time to finish loading all resources is measured over combinations of memory-cache vs disk-cache, empty cache vs full cache, cache containing small entries vs large entries, and all items found in cache vs no item found in cache. |
|
| |
|
| {| border="1" cellpadding="5" cellspacing="0" align="center"
| | ==== Generated datafiles ==== |
| |+'''Measure cache-performance using JS vs using telemetry-code'''
| |
| |-
| |
| !|
| |
| !| Time-function in JavaScript
| |
| !| Telemetry timestamps compiled into the browser
| |
| |-
| |
| !| Pros
| |
| ||
| |
| * Add and experiment with measurements without recompiling
| |
| ||
| |
| * Avoid JS and JS-thread jitter
| |
| * Better granularity than JS-functions
| |
| * Verify changes using results from real users - results are identical to those used in the lab (i.e. we avoid comparing apples and oranges)
| |
| |-
| |
| !| Cons
| |
| ||
| |
| * Synthetic tests only, no way to '''directly''' verify effects in real life
| |
| ||
| |
| * Increased code-size (<i>also in releases??</i>)
| |
| * Must have an idea up-front about timing-profile (histograms)
| |
| * For timings which vary a lot we may get a large histogram and thus use a lot of data
| |
| |-
| |
| |}
| |
|
| |
|
| == The benchmarks (Obsolete: Will update with relevant information asap) ==
| | The test writes data into a number of files. '''Note that''' all files are appended, meaning that if they exist prior to the run, you may have unexpected results... |
| Each benchmark below is described and explained, and we also show output results from selected platforms. Note that results are provided as examples and information only; the benchmarks are meant for evaluating effect of code-changes, i.e. you run the benchmarks without and with your change and compare the results to evaluate the effect of your code.
| |
|
| |
|
| === test_timing_cache.js ===
| | ;summary.dat |
| This benchmark measures time to load a resource from a server and from the cache, as well as time for call-overhead and clearing the cache.
| | :Format is multi-data with one datablock for each datasize run by the test. Each datablock has a header describing the datasize, then one row for each of the configurations described above, summing up the results from the loop. Datapoints are |
|
| |
|
| The approach is to repeat a single load in a loop for some time and report the average time for one iteration. This eliminates timer-granularity issues and should smoothen out random jitter in the system, but the weakness is that IO-caching at any level will influence disk-cache results. Moreover, memory-cache results are likely to be influenced by HW caches and memory-paging in the OS. In short, this benchmark does not implement a very realistic access-pattern for cache-entries, hence its practical value may be limited. It can be useful, though, to study changes unrelated to the speed of the cache media, e.g. code to optimize datastructures or algorithms.
| | #description of the configuration |
| | #minimum value from the loop |
| | #maximum value from the loop |
| | #calculated mean over the results from the loop |
| | #standard deviation over all data from the loop |
| | #number of iterations in the loop |
| | #time measured by the test using Data.now() for the loop |
| | #time measured by the test using Data.now() for the loop, including syncing with the cache-io thread at the end |
|
| |
|
| Note that there are two different tests for cache-misses: One which clears the cache in each iteration, and one where the response has a header which prevents it from being cached. (The first is mainly present for completeness - it makes a clearCache() call in each iteration, which is not very likely to happen in real life.) One could also imagine a third approach to emulate cache-misses; suffix the url with a different query-string in order to load a new resource in each iteration. However, this doesn't work because it hits the max number of open connections and times out. (See test below though, where we can control this limit in a different manner.)
| | ;timingchannel-<bytes>.dat |
| | :One file for each datasize run by the test. The format is multi-data with one datablock for each configuration. Each row represents one resource-load in actual order. The datapoints in each row are |
|
| |
|
| All operations are reported for different data-sizes, as well as with no cache enabled, memory-cache enabled only and disk-cache enabled only. (Note that, theoretically, searching for an entry in a cache should take some time, hence we measure also with no cache enabled.)
| | #timestamp of create |
| | #time from create to asyncOpen |
| | #time from asyncOpen to cacheEntryAvailable |
| | #time from cacheEntryAvailable to cacheReadStart |
| | #time from cacheReadStart to cacheReadEnd |
|
| |
|
| Some results from a Linux server running 64-bit Ubuntu 10.4
| | alternatively (if loaded from channel, not from cache) |
|
| |
|
| Memory-cache capacity: 51200 Kb / Disk-cache capacity: 1048576 Kb
| | #timestamp of create |
| Max #connections=256 / max per server=256
| | #time from create to asyncOpen |
| Total time #runs avg time/run
| | #time from asyncOpen to cacheEntryAvailable |
| === Set datasize 50 bytes
| | #time from cacheEntryAvailable to responseStart |
| Calloverhead (no-cache) 10016 14594 0.69
| | #time from responseStart to responseEnd |
| Calloverhead+ClearCache (no-cache) 10015 11647 0.86
| |
| CacheMiss (nocache) (no cache) 10047 368 27.30
| |
| Calloverhead (mem-cache) 10015 14532 0.69
| |
| Calloverhead+ClearCache (mem-cache) 10016 11616 0.86
| |
| CacheMiss (clrcache) (mem-cache) 10018 360 27.83
| |
| CacheMiss (nocache) (mem-cache) 10032 367 27.34
| |
| CacheHit (mem-cache) 10017 4674 2.14
| |
| Calloverhead (disk-cache) 10015 13954 0.72
| |
| Calloverhead+ClearCache (disk-cache) 10017 6723 1.49
| |
| CacheMiss (clrcache) (disk-cache) 10033 346 29.00
| |
| CacheMiss (nocache) (disk-cache) 10032 357 28.10
| |
| CacheHit (disk-cache) 10017 4709 2.13
| |
| === Set datasize 1024 bytes
| |
| CacheMiss (nocache) (no cache) 10020 339 29.56
| |
| CacheMiss (nocache) (mem-cache) 10037 351 28.60
| |
| CacheHit (mem-cache) 10019 3634 2.76
| |
| CacheMiss (nocache) (disk-cache) 10048 345 29.12
| |
| CacheHit (disk-cache) 10018 3535 2.83
| |
| === Set datasize 51200 bytes
| |
| CacheMiss (nocache) (no cache) 10095 103 98.01
| |
| CacheMiss (nocache) (mem-cache) 10023 104 96.38
| |
| CacheHit (mem-cache) 10070 306 32.91
| |
| CacheMiss (nocache) (disk-cache) 10110 104 97.21
| |
| CacheHit (disk-cache) 10054 301 33.40
| |
| === Set datasize 524288 bytes
| |
| CacheMiss (nocache) (no cache) 10277 14 734.07
| |
| CacheMiss (nocache) (mem-cache) 10312 14 736.57
| |
| CacheHit (mem-cache) 10443 31 336.87
| |
| CacheMiss (nocache) (disk-cache) 10283 14 734.50
| |
| CacheHit (disk-cache) 10451 31 337.13
| |
|
| |
|
| The important numbers are found in the rightmost column. Note that the first block (datasize 50 bytes) includes more results than the other blocks. "Pure Overhead" measures the call-overhead from JavaScript, and "Overhead+ClearCache" measures the time to Clear the cache from JavaScript. These numbers are unrelated to the size of the data loaded, hence reported only once.
| | Referring to Telemetry-data accumulated by nsLoadGroup, (2+3) corresponds to Telemetry::HTTP_PAGE_OPEN_TO_FIRST_FROM_CACHE alternatively Telemetry::HTTP_PAGE_OPEN_TO_FIRST_RECEIVED, and (2+3+4) corresponds to Telemetry::HTTP_PAGE_COMPLETE_LOAD. |
|
| |
|
| Results above indicate that reading from the disk-cache is about as fast as reading from memory-cache, which probably is due to efficient IO-caching on this platform. Note also that cache-hits for small entries are relatively much faster than cache-hits for larger entries (i.e. for small entries there is a factor 10 whereas for large entries we see a factor 2-3).
| | ==== Files to work with GnuPlot (requires Gnuplot >= 4.4) ==== |
|
| |
|
| There is no conclusive evidence that searching a disk or memory cache takes any significant amount of time. However, the caches in this test are pretty much empty so we should create a separate test for this, making sure the cache contains some number of entries before we measure. (See [[#Wanted_tests]] below)
| | ;settings.gnu |
| | :setup for the histogram, title of plot etc. Edit this to describe your platform. |
|
| |
|
| Some raw results from my Nexus S. '''I find it weird that this device outperforms the Linux server when the data-size becomes larger... any insight is appreciated'''
| | ;plot-summary.gnu |
| | :Plots mean-values from summary.dat in one histogram, grouped by first column. Standard deviation in the dataset is indicated as an errorline. This is the recommended starting-point for studying results from a test-run. |
|
| |
|
| Memory-cache capacity: 51200 Kb / Disk-cache capacity: 1048576 Kb
| | ;plot-minmax-summary.gnu |
| Max #connections=256 / max per server=256
| | :Like above but indicates min and max values from the dataset instead of stddev. |
| Total time #runs avg time/run
| |
| === Set datasize 50 bytes
| |
| Calloverhead (no-cache) 10016 10878 0.92
| |
| Calloverhead+ClearCache (no-cache) 10016 9482 1.06
| |
| CacheMiss (nocache) (no cache) 10020 321 31.21
| |
| Calloverhead (mem-cache) 10015 10975 0.91
| |
| Calloverhead+ClearCache (mem-cache) 10016 8937 1.12
| |
| CacheMiss (clrcache) (mem-cache) 10036 371 27.05
| |
| CacheMiss (nocache) (mem-cache) 10021 348 28.80
| |
| CacheHit (mem-cache) 10016 4639 2.16
| |
| Calloverhead (disk-cache) 10016 11426 0.88
| |
| Calloverhead+ClearCache (disk-cache) 10043 213 47.15
| |
| CacheMiss (clrcache) (disk-cache) 10034 117 85.76
| |
| CacheMiss (nocache) (disk-cache) 10019 347 28.87
| |
| CacheHit (disk-cache) 10017 4206 2.38
| |
| === Set datasize 1024 bytes
| |
| CacheMiss (nocache) (no cache) 10049 332 30.27
| |
| CacheMiss (nocache) (mem-cache) 10024 339 29.57
| |
| CacheHit (mem-cache) 10017 3501 2.86
| |
| CacheMiss (nocache) (disk-cache) 10031 327 30.68
| |
| CacheHit (disk-cache) 10017 3308 3.03
| |
| === Set datasize 51200 bytes
| |
| CacheMiss (nocache) (no cache) 10064 119 84.57
| |
| CacheMiss (nocache) (mem-cache) 10074 119 84.66
| |
| CacheHit (mem-cache) 10049 396 25.38
| |
| CacheMiss (nocache) (disk-cache) 10078 116 86.88
| |
| CacheHit (disk-cache) 10031 426 23.55
| |
| === Set datasize 524288 bytes
| |
| CacheMiss (nocache) (no cache) 10572 19 556.42
| |
| CacheMiss (nocache) (mem-cache) 10052 18 558.44
| |
| CacheHit (mem-cache) 10274 45 228.31
| |
| CacheMiss (nocache) (disk-cache) 10167 18 564.83
| |
| CacheHit (disk-cache) 10177 43 236.67
| |
|
| |
|
| This is one way to represent the results
| | ;plot-summary-per-size.gnu and plot-minmax-summary-per-size.gnu |
| | :Plots summaries like the two plots above, but makes one plot for each datasize to avoid scaling-issues in the plot. |
|
| |
|
| {| border="1" cellpadding="5" cellspacing="0" align="center"
| | ;plot-summary-from-javascript.gnu |
| |+'''Summary of results over different platforms'''
| | :Plots summaries of average loading-times seen from JS. The term "INCLUDING sync" means that the time includes time to sync with the cache-io thread after finishing a loop. |
| |-
| |
| ! colspan="2" |
| |
| ! colspan="4" | No cache
| |
| ! colspan="4" | Memory-Cache
| |
| ! colspan="4" | Disk-Cache
| |
| |-
| |
| ! colspan="2" | || 50 || 1024 || 51200 || 524288 || 50 || 1024 || 51200 || 524288 || 50 || 1024 || 51200 || 524288
| |
| |-
| |
| ! rowspan="3" | Ubuntu 64bit server
| |
| ! | Clear cache
| |
| | colspan="4"| 0.86
| |
| | colspan="4"| 0.86
| |
| | colspan="4"| 1.49
| |
| |-
| |
| ! | Miss
| |
| | 27.30 || 29.56 || 98.01 || 734.07
| |
| | 27.34 || 28.60 || 96.38 || 736.57
| |
| | 28.10 || 29.12 || 97.21 || 734.50
| |
| |-
| |
| ! | Hit
| |
| | colspan="4" | '''n/a'''
| |
| | 2.14 || 2.76 || 32.91 || 336.87
| |
| | 2.13 || 2.83 || 33.40 || 337.13
| |
| |-
| |
| ! rowspan="3" | Nexus S
| |
| ! | Clear cache
| |
| | colspan="4"| 1.06
| |
| | colspan="4"| 1.12
| |
| | colspan="4"| 47.15
| |
| |-
| |
| ! | Miss
| |
| | 31.21 || 30.27 || 84.57 || 556.42
| |
| | 28.80 || 29.57 || 84.66 || 558.44
| |
| | 28.87 || 30.68 || 86.88 || 564.83
| |
| |-
| |
| ! | Hit
| |
| | colspan="4" | '''n/a'''
| |
| | 2.16 || 2.86 || 25.38 || 228.31
| |
| | 2.38 || 3.03 || 23.55 || 236.67
| |
| |-
| |
| ! rowspan="3" | Samsung Galaxy S (from Geoff)
| |
| ! | Clear cache
| |
| | colspan="4"| 1.89
| |
| | colspan="4"| 1.87
| |
| | colspan="4"| 1339.75
| |
| |-
| |
| ! | Miss
| |
| | 31.33 || 27.48 || 74.04 || 524.20
| |
| | 27.27 || 28.32 || 75.05 || 529.32
| |
| | 28.04 || 28.43 || 93.92 || 612.59
| |
| |-
| |
| ! | Hit
| |
| | colspan="4" | '''n/a'''
| |
| | 2.67 || 3.04 || 21.67 || 209.69
| |
| | 3.03 || 3.37 || 24.37 || 218.61
| |
| |}
| |
|
| |
|
| | ;plot-timingchannel.gnu |
| | :You need to copy the file "timingchannel-<size>.dat" to "timingchannel.dat" for the size you want to study. Plots detailed information for each resource loaded, one plot for each configuration. The format used is a rowstacked histogram where each column shows total time from creating the channel to loading has finished. '''Note''' that columns show time for each load separately (all times starting at 0) as opposed to a real time-line for the dataset. |
|
| |
|
| and this is another representation. I prefer the latter since it allows me to easier study the effect of different cache-configurations on a particular testcase
| | ==== Sample results (debug-builds with logging enabled) ==== |
|
| |
|
| {| border="1" cellpadding="5" cellspacing="0" align="center"
| | This benchmark has currently been run successfully on two different Linux-systems and on a Nexus S. Below are extracts from the results, focusing on time to load resources as seen from JavaScript. The plots have been obtained by editing "summary.dat", sorting results by cache-type and splitting up in "hits" and "misses" which are plotted separately. Modifying the generated files and the GnuPlot-files to do this is trivial and I won't discuss it here. |
| |+'''Summary 2 of results over different platforms'''
| |
| |-
| |
| ! colspan="2" | || 50 || 1024 || 51200 || 524288
| |
| |-
| |
| ! rowspan="5" | Ubuntu 64bit server
| |
| ! align="right" | Miss (no-cache)
| |
| | 27.30 || 29.56 || 98.01 || 734.07
| |
| |-
| |
| | align="right" | (mem-cache)
| |
| | 27.34 || 28.60 || 96.38 || 736.57
| |
| |-
| |
| | align="right" | (disk-cache)
| |
| | 28.10 || 29.12 || 97.21 || 734.50
| |
| |-
| |
| ! | Hit (mem-cache)
| |
| | 2.14 || 2.76 || 32.91 || 336.87
| |
| |-
| |
| | align="right" | (disk-cache)
| |
| | 2.13 || 2.83 || 33.40 || 337.13
| |
| |-
| |
| ! rowspan="5" | Nexus S
| |
| ! align="right" | Miss (no-cache)
| |
| | 31.21 || 30.27 || 84.57 || 556.42
| |
| |-
| |
| | align="right" | (mem-cache)
| |
| | 28.80 || 29.57 || 84.66 || 558.44
| |
| |-
| |
| | align="right" | (disk-cache)
| |
| | 28.87 || 30.68 || 86.88 || 564.83
| |
| |-
| |
| ! align="right" | Hit (mem-cache)
| |
| | 2.16 || 2.86 || 25.38 || 228.31
| |
| |-
| |
| | align="right" | (disk-cache)
| |
| | 2.38 || 3.03 || 23.55 || 236.67
| |
| |-
| |
| ! rowspan="5" | Samsung Galaxy S (from Geoff)
| |
| ! align="right" | Miss (no-cache)
| |
| | 31.33 || 27.48 || 74.04 || 524.20
| |
| |-
| |
| | align="right" | (mem-cache)
| |
| | 27.27 || 28.32 || 75.05 || 529.32
| |
| |-
| |
| | align="right" | (disk-cache)
| |
| | 28.04 || 28.43 || 93.92 || 612.59
| |
| |-
| |
| ! align="right" | Hit (mem-cache)
| |
| | 2.67 || 3.04 || 21.67 || 209.69
| |
| |-
| |
| | align="right" | (disk-cache)
| |
| | 3.03 || 3.37 || 24.37 || 218.61
| |
| |}
| |
|
| |
|
| === test_timing_cache_2.js ===
| | The first set of graphs show "misses", i.e. it indicates time to search the cache for each resource, clear space for it, load it and finally set it up to be saved. Resources are served from the in-built httpd.js - loading with cache disabled is included first as a reference. Hence, the first group in the histogram show resources loaded without caching. The next three groups show results with disk-cache enabled and the last three groups show results for the memory-cache. |
| This benchmark loads a number of resources with different cache-keys and then waits for all cache-io to finish. Then it loads all resources again (retrieving them from cache this time). Both sequences are measured and reported for various entry-sizes and cache-configurations.
| |
|
| |
|
| This access-pattern is slightly more realistic than in the benchmark described above; a page is likely to load some number of resources from the (same) server, and if the user visits the page second time, those resources are loaded again (from cache this time). The benchmark aims at emulating this pattern in a simple way.
| | {| |
| | | [[File:Watson-summary-misses-100iter.png|400px]] |
| | | [[File:HPEnvy-summary-misses-100iter.png|400px]] |
| | | [[File:Nexuss-summary-misses-100iter.png|400px]] |
| | |} |
|
| |
|
| Some results from a Linux server running 64-bit Ubuntu 10.4
| |
|
| |
|
| Memory-cache capacity: 51200 Kb / Disk-cache capacity: 1048576 Kb
| | Note that a memory-cache full of small entries (sixth group) seems to perform bad for large resources (surprising, because you'd expect the disk-cache to always be slower). |
| Max #connections=256 / max per server=256
| |
| Total time #runs avg time/run
| |
| Setting datasize 50 bytes
| |
| CacheMiss (nocache) (no cache) 2803 101 27.75
| |
| CacheMiss (q-url) (mem-cache) 2767 101 27.40
| |
| CacheHit (mem-cache) 205 101 2.03
| |
| CacheMiss (q-url) (disk-cache) 2783 101 27.55
| |
| CacheHit (disk-cache) 186 101 1.84
| |
| Setting datasize 1024 bytes
| |
| CacheMiss (nocache) (no cache) 2884 101 28.55
| |
| CacheMiss (q-url) (mem-cache) 2570 101 25.45
| |
| CacheHit (mem-cache) 269 101 2.66
| |
| CacheMiss (q-url) (disk-cache) 2910 101 28.81
| |
| CacheHit (disk-cache) 271 101 2.68
| |
| Setting datasize 51200 bytes
| |
| CacheMiss (nocache) (no cache) 9546 101 94.51
| |
| CacheMiss (q-url) (mem-cache) 9536 101 94.42
| |
| CacheHit (mem-cache) 3274 101 32.42
| |
| CacheMiss (q-url) (disk-cache) 9543 101 94.49
| |
| CacheHit (disk-cache) 3274 101 32.42
| |
| Setting datasize 524288 bytes
| |
| CacheMiss (nocache) (no cache) 75942 101 751.90
| |
| CacheMiss (q-url) (mem-cache) 75840 101 750.89
| |
| CacheHit (mem-cache) 27799 101 275.24
| |
| CacheMiss (q-url) (disk-cache) 79318 101 785.33
| |
| CacheHit (disk-cache) 31333 101 310.23
| |
|
| |
|
| The interesting numbers are found in the rightmost column of the output.
| | Note that a disk-cache full of small entries (third group) is generally slower than the empty disk-cache (second group). |
|
| |
|
| Results above suggest that loading entries up to a certain size from the server (the xpcshell-handler in this case) is rather consistent regardless of whether we use no cache, mem-cache or a disk-cache. Also, when loading entries up to a certain size from cache, the disk-cache seems to be as fast as the memory-cache. This IMO indicates that IO on this platform is pretty efficient (up to a certain limit) because we would always expect to spend more time when using the disk-cache than when using mem-cache. Observe that we need 512K-sized resources to see a clear difference between using disk and memory. (The exact limit here is not identified, but it may not very interesting either since it is probably platform-specific).
| |
|
| |
|
| Also observe that when reading from the cache, small entries load relatively much faster than larger entries. (Memory-page size?)
| | The second set of graphs show "hits", i.e. the time spent to load the resource from cache. It indicates time to search for the resource as well as load it. Columns are sorted as in previous set of graphs. |
|
| |
|
| Results from my Nexus S. '''Like above, the performance on large data is surprisingly good compared to the Linux server... any insight?'''
| | {| |
| | | [[File:Watson-summary-hits.png|400px]] |
| | | [[File:HPEnvy-summary-hits.png|400px]] |
| | | [[File:Nexuss-summary-hits.png|400px]] |
| | |} |
|
| |
|
| Memory-cache capacity: 51200 Kb / Disk-cache capacity: 1048576 Kb
| | Note the time to load 2K and 8K entries from a memory-cache filled with small entries (sixth group)... Could this be caused by the access-pattern or is it generally true? |
| Max #connections=256 / max per server=256
| |
| Total time #runs avg time/run
| |
| Setting datasize 50 bytes
| |
| CacheMiss (nocache) (no cache) 2724 101 26.97
| |
| CacheMiss (q-url) (mem-cache) 2249 101 22.27
| |
| CacheHit (mem-cache) 126 101 1.25
| |
| CacheMiss (q-url) (disk-cache) 2314 101 22.91
| |
| CacheHit (disk-cache) 146 101 1.45
| |
| Setting datasize 1024 bytes
| |
| CacheMiss (nocache) (no cache) 2190 101 21.68
| |
| CacheMiss (q-url) (mem-cache) 2206 101 21.84
| |
| CacheHit (mem-cache) 177 101 1.75
| |
| CacheMiss (q-url) (disk-cache) 2382 101 23.58
| |
| CacheHit (disk-cache) 189 101 1.87
| |
| Setting datasize 51200 bytes
| |
| CacheMiss (nocache) (no cache) 7388 101 73.15
| |
| CacheMiss (q-url) (mem-cache) 7582 101 75.07
| |
| CacheHit (mem-cache) 2205 101 21.83
| |
| CacheMiss (q-url) (disk-cache) 7749 101 76.72
| |
| CacheHit (disk-cache) 2226 101 22.04
| |
| Setting datasize 524288 bytes
| |
| CacheMiss (nocache) (no cache) 57251 101 566.84
| |
| CacheMiss (q-url) (mem-cache) 57939 101 573.65
| |
| CacheHit (mem-cache) 22888 101 226.61
| |
| CacheMiss (q-url) (disk-cache) 62850 101 622.28
| |
| CacheHit (disk-cache) 22626 101 224.02
| |
|
| |
|
| == Wanted tests == | | == Wanted tests == |