Necko/MobileCache/MicroBenchmarks

From MozillaWiki

Revision as of 13:43, 9 September 2011

This Is Work In Progress - Information Is Incomplete

This page describes the cache-related microbenchmarks created so far. Contact BjarneG (bherland@mozilla.com) for a patch which lets you run these benchmarks using the "check-one" target of xpcshell tests.

Each benchmark below is described and explained, and we also show output results from selected platforms. Note that results are provided as examples and for information only; the benchmarks are meant for evaluating the effect of code changes, i.e. you run the benchmarks without and with your change and compare the results to evaluate the effect of your code.

test_timing_cache.js

This benchmark measures time to load a resource from a server and from the cache, as well as time for call-overhead and clearing the cache.

The approach is to repeat a single load in a loop for some time and report the average time for one iteration. This eliminates timer-granularity issues and should smooth out random jitter in the system, but the weakness is that IO-caching at any level will influence disk-cache results. Moreover, memory-cache results are likely to be influenced by HW caches and memory-paging in the OS. In short, this benchmark does not implement a very realistic access pattern for cache entries, hence its practical value may be limited. It can be useful, though, for studying changes unrelated to the speed of the cache media, e.g. code to optimize data structures or algorithms.
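The loop described above can be sketched in plain JavaScript as follows. This is only an illustration under stated assumptions, not the code of test_timing_cache.js: runForDuration and fakeLoad are hypothetical names, the load is a synchronous stand-in, and a real xpcshell test would have to drive asynchronous channel loads instead.

```javascript
// Hypothetical helper (not from the actual test): repeats `operation`
// until at least `durationMs` milliseconds have passed, then reports
// the same three columns as the result tables on this page.
function runForDuration(operation, durationMs) {
  const start = Date.now();
  let runs = 0;
  let elapsed = 0;
  do {
    operation();
    runs++;
    elapsed = Date.now() - start;
  } while (elapsed < durationMs);
  return { totalTime: elapsed, runs: runs, avgTimePerRun: elapsed / runs };
}

// Stand-in for a single load; it only burns a little CPU time.
function fakeLoad() {
  let sum = 0;
  for (let i = 0; i < 1000; i++) {
    sum += i;
  }
  return sum;
}

const result = runForDuration(fakeLoad, 50);
console.log(result.totalTime, result.runs, result.avgTimePerRun.toFixed(4));
```

Because the elapsed time is only checked after each iteration completes, the total ends up at or slightly above the requested duration. Averaging over many iterations is what makes the result insensitive to the granularity of Date.now().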

Note that there are two different tests for cache misses: one which clears the cache in each iteration, and one where the response has a header which prevents it from being cached. (The first is present mainly for completeness; it makes a clearCache() call in each iteration, which is not very likely to happen in real life.) One could also imagine a third approach to emulating cache misses: suffix the URL with a different query string in each iteration so that a new resource is loaded every time. However, this doesn't work because it hits the maximum number of open connections and times out. (See the test below, though, where we can control this limit in a different manner.)

All operations are reported for different data sizes, as well as with no cache enabled, with only the memory-cache enabled and with only the disk-cache enabled. (Note that, theoretically, searching for an entry in a cache should take some time, hence we also measure with no cache enabled.)

Some results from a Linux server running 64-bit Ubuntu 10.04:

Memory-cache capacity: 51200 Kb / Disk-cache capacity: 1048576 Kb
Max #connections=256 / max per server=256
                                       Total time      #runs   avg time/run
=== Set datasize 50 bytes
Calloverhead            (no-cache)      10016           14594   0.69
Calloverhead+ClearCache (no-cache)      10015           11647   0.86
CacheMiss (nocache)     (no cache)      10047           368     27.30
Calloverhead            (mem-cache)     10015           14532   0.69
Calloverhead+ClearCache (mem-cache)     10016           11616   0.86
CacheMiss (clrcache)    (mem-cache)     10018           360     27.83
CacheMiss (nocache)     (mem-cache)     10032           367     27.34
CacheHit                (mem-cache)     10017           4674    2.14
Calloverhead            (disk-cache)    10015           13954   0.72
Calloverhead+ClearCache (disk-cache)    10017           6723    1.49
CacheMiss (clrcache)    (disk-cache)    10033           346     29.00
CacheMiss (nocache)     (disk-cache)    10032           357     28.10
CacheHit                (disk-cache)    10017           4709    2.13
=== Set datasize 1024 bytes
CacheMiss (nocache)     (no cache)      10020           339     29.56
CacheMiss (nocache)     (mem-cache)     10037           351     28.60
CacheHit                (mem-cache)     10019           3634    2.76
CacheMiss (nocache)     (disk-cache)    10048           345     29.12
CacheHit                (disk-cache)    10018           3535    2.83
=== Set datasize 51200 bytes
CacheMiss (nocache)     (no cache)      10095           103     98.01
CacheMiss (nocache)     (mem-cache)     10023           104     96.38
CacheHit                (mem-cache)     10070           306     32.91
CacheMiss (nocache)     (disk-cache)    10110           104     97.21
CacheHit                (disk-cache)    10054           301     33.40
=== Set datasize 524288 bytes
CacheMiss (nocache)     (no cache)      10277           14      734.07
CacheMiss (nocache)     (mem-cache)     10312           14      736.57
CacheHit                (mem-cache)     10443           31      336.87
CacheMiss (nocache)     (disk-cache)    10283           14      734.50
CacheHit                (disk-cache)    10451           31      337.13

The important numbers are found in the rightmost column (avg time/run). Note that the first block (datasize 50 bytes) includes more results than the other blocks. "Calloverhead" measures the call overhead from JavaScript, and "Calloverhead+ClearCache" measures the time to clear the cache from JavaScript. These numbers are unrelated to the size of the data loaded, hence they are reported only once.

Results above indicate that reading from the disk-cache is about as fast as reading from the memory-cache, which is probably due to efficient IO-caching on this platform. Note also that, relative to a cache miss, cache hits for small entries are much faster than cache hits for larger entries (i.e. for small entries a hit is faster than a miss by a factor of 10 or more, whereas for large entries we see a factor of 2-3).
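The factors quoted above can be recomputed from the "avg time/run" column. The snippet below is just that arithmetic, using the mem-cache "CacheMiss (nocache)" and "CacheHit" rows of the table; the names memCacheRows and missToHitFactor are made up for this illustration.

```javascript
// avg time/run values copied from the mem-cache rows of the table above.
const memCacheRows = [
  { size: 50, miss: 27.34, hit: 2.14 },
  { size: 1024, miss: 28.60, hit: 2.76 },
  { size: 51200, miss: 96.38, hit: 32.91 },
  { size: 524288, miss: 736.57, hit: 336.87 },
];

// How many times faster a cache hit is than a cache miss for one row.
function missToHitFactor(row) {
  return row.miss / row.hit;
}

for (const row of memCacheRows) {
  console.log(row.size + " bytes: factor " + missToHitFactor(row).toFixed(1));
}
```

This gives factors of roughly 12.8 and 10.4 for the 50-byte and 1024-byte entries, and roughly 2.9 and 2.2 for the 51200-byte and 524288-byte entries.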

There is no conclusive evidence that searching a disk or memory cache takes any significant amount of time. However, the caches in this test are pretty much empty, so we should create a separate test for this, making sure the cache contains some number of entries before we measure. (See "Wanted tests" below.)

test_timing_cache_2.js

This benchmark loads a number of resources with different cache-keys and then waits for all cache-io to finish. Then it loads all resources again (retrieving them from cache this time). Both sequences are measured and reported for various entry-sizes and cache-configurations.

This access pattern is slightly more realistic than the one in the benchmark described above: a page is likely to load some number of resources from the (same) server, and if the user visits the page a second time, those resources are loaded again (from cache this time). The benchmark aims at emulating this pattern in a simple way.
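A minimal, synchronous sketch of this two-pass pattern is shown below. It makes no attempt to use the real Necko cache: a Map stands in for the cache, a counter stands in for the server, and all names (load, loadFromServer, timePass) are hypothetical. Since everything here is synchronous, the step of waiting for cache-io to finish between the passes has no counterpart.

```javascript
// Hypothetical stand-ins: a Map plays the cache, a counter plays the server.
const cache = new Map();
let serverLoads = 0;

// Pretend to fetch `dataSize` bytes from the server.
function loadFromServer(key, dataSize) {
  serverLoads++;
  return "x".repeat(dataSize);
}

// Return the cached body if present, otherwise fetch it and cache it.
function load(key, dataSize) {
  if (cache.has(key)) {
    return cache.get(key);
  }
  const body = loadFromServer(key, dataSize);
  cache.set(key, body);
  return body;
}

// Load every key once and report total time, #runs and avg time/run.
function timePass(keys, dataSize) {
  const start = Date.now();
  for (const key of keys) {
    load(key, dataSize);
  }
  const total = Date.now() - start;
  return { totalTime: total, runs: keys.length, avgTimePerRun: total / keys.length };
}

// 101 resources with different cache-keys, as in the tables on this page.
const keys = [];
for (let i = 0; i < 101; i++) {
  keys.push("/resource?id=" + i);
}

const firstPass = timePass(keys, 1024);   // all loads go to the "server"
const loadsAfterFirstPass = serverLoads;
const secondPass = timePass(keys, 1024);  // all loads come from the "cache"
console.log(firstPass, secondPass, serverLoads);
```

The server counter does not change during the second pass, which is the property the real benchmark relies on when it labels that pass "Load from cache".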

Some results from a Linux server running 64-bit Ubuntu 10.04:

Memory-cache capacity: 51200 Kb / Disk-cache capacity: 1048576 Kb
Max #connections=256 / max per server=256
                                       Total time      #runs   avg time/run
Setting datasize 50 bytes
Load from server     (no cache)         2796            101     27.68
Load from server     (mem-cache)        2452            101     24.28
Load from cache      (mem-cache)        210             101     2.08
Load from server     (disk-cache)       2805            101     27.77
Load from cache      (disk-cache)       184             101     1.82
Setting datasize 1024 bytes
Load from server     (no cache)         2893            101     28.64
Load from server     (mem-cache)        2529            101     25.04
Load from cache      (mem-cache)        272             101     2.69
Load from server     (disk-cache)       2936            101     29.07
Load from cache      (disk-cache)       250             101     2.48
Setting datasize 51200 bytes
Load from server     (no cache)         9673            101     95.77
Load from server     (mem-cache)        9617            101     95.22
Load from cache      (mem-cache)        3311            101     32.78
Load from server     (disk-cache)       9588            101     94.93
Load from cache      (disk-cache)       3307            101     32.74
Setting datasize 524288 bytes
Load from server     (no cache)         76889           101     761.28
Load from server     (mem-cache)        76569           101     758.11
Load from cache      (mem-cache)        27940           101     276.63
Load from server     (disk-cache)       80344           101     795.49
Load from cache      (disk-cache)       31521           101     312.09

The interesting numbers are found in the rightmost column of the output.

Results above suggest that the time to load entries up to a certain size from the server (the xpcshell-handler in this case) is rather consistent regardless of whether we use no cache, the mem-cache or the disk-cache. Also, when loading entries up to a certain size from cache, the disk-cache seems to be as fast as the memory-cache. This IMO indicates that IO on this platform is pretty efficient (up to a certain limit), because we would always expect to spend more time when using the disk-cache than when using the mem-cache. Observe that we need 512K-sized resources to see a clear difference between using disk and memory. (The exact limit here is not identified, but it may not be very interesting either, since it is probably platform-specific.)
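The comparison between disk-cache and mem-cache can be made explicit by dividing the two "Load from cache" averages for each data size. The snippet below only restates numbers from the table above; cacheLoadRows and diskToMemRatios are names invented for this illustration.

```javascript
// "Load from cache" avg time/run values copied from the table above.
const cacheLoadRows = [
  { size: 50, mem: 2.08, disk: 1.82 },
  { size: 1024, mem: 2.69, disk: 2.48 },
  { size: 51200, mem: 32.78, disk: 32.74 },
  { size: 524288, mem: 276.63, disk: 312.09 },
];

// A ratio above 1 means the disk-cache was slower than the mem-cache.
const diskToMemRatios = cacheLoadRows.map(function (row) {
  return { size: row.size, diskToMem: row.disk / row.mem };
});

for (const entry of diskToMemRatios) {
  console.log(entry.size + " bytes: disk/mem = " + entry.diskToMem.toFixed(2));
}
```

Only the 524288-byte row yields a ratio clearly above 1 (about 1.13); for the three smaller sizes the disk-cache average is at or slightly below the mem-cache average.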

Also observe that when reading from the cache, small entries load much faster relative to a load from the server than larger entries do. (Memory-page size?)

Wanted tests

Unordered and informal list of tests we may also want to create

measure time to search a cache
    mem- and disk-only, but I believe also the combination of them. Make sure we search the cache without finding the entry.
identify the entry-size where the disk-cache is clearly slower than the mem-cache
    probably OS- or device-specific, but it may be worth finding in case we can learn something general from it
driver which can take a list of resources and measure the time to load them all
    we can extract this info from e.g. Talos or by using telemetry, and it should probably be hierarchical (some resources cause others to load)
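As a rough idea of the shape the first wanted test could take, the sketch below fills an index with a given number of entries and then times lookups for keys that are guaranteed to be absent. It is purely illustrative: a JavaScript Map stands in for the cache, timeMissingLookups is a hypothetical name, and none of this exercises the actual mem- or disk-cache code.

```javascript
// Hypothetical sketch: a Map stands in for the cache being searched.
function timeMissingLookups(entryCount, lookupCount) {
  // Populate the "cache" first so that we do not measure an empty one.
  const index = new Map();
  for (let i = 0; i < entryCount; i++) {
    index.set("key-" + i, i);
  }

  // Search only for keys that were never inserted.
  let found = 0;
  const start = Date.now();
  for (let i = 0; i < lookupCount; i++) {
    if (index.has("missing-" + i)) {
      found++;
    }
  }
  const total = Date.now() - start;

  return {
    totalTime: total,
    runs: lookupCount,
    avgTimePerRun: total / lookupCount,
    found: found,
  };
}

const searchResult = timeMissingLookups(10000, 100000);
console.log(searchResult);
```

Running it with several values of entryCount would show whether the search time depends on how full the cache is, which is the question the existing benchmarks leave open.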