:You need to copy the file "timingchannel-<size>.dat" to "timingchannel.dat" for the size you want to study. The plot shows detailed information for each resource loaded, one plot for each configuration. The format used is a rowstacked histogram where each column shows the total time from creating the channel until loading has finished. '''Note''' that columns show the time for each load separately (all times starting at 0), as opposed to a real time-line for the dataset.


==== Sample results (debug-builds with logging enabled) ====


This benchmark has currently been run successfully on two different Linux systems and on a Nexus S. Below are extracts from the results, focusing on the time to load resources as seen from JavaScript. The plots have been obtained by editing "summary.dat", sorting results by cache-type and splitting them into "hits" and "misses", which are plotted separately. Modifying the generated files and the GnuPlot-files to do this is trivial and I won't discuss it here.
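
For illustration, a minimal JavaScript sketch of this kind of post-processing. The column layout assumed here (column 0 = cache-type, column 1 = "hit"/"miss") is a guess at the format of "summary.dat", not the actual format:

 // Hypothetical post-processing: split the rows of summary.dat by
 // cache-type and hit/miss status, so each bucket can be plotted separately.
 function splitSummary(text) {
   var buckets = {};
   var lines = text.split("\n");
   for (var i = 0; i < lines.length; i++) {
     var cols = lines[i].trim().split(/\s+/);
     if (cols.length < 2)
       continue;                          // skip blank or malformed lines
     var key = cols[0] + "-" + cols[1];   // e.g. "mem-hit", "disk-miss"
     if (!buckets[key])
       buckets[key] = [];
     buckets[key].push(lines[i]);
   }
   return buckets;                        // write each bucket to its own .dat file
 }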


Note the time to load 2K and 8K entries from a memory-cache filled with small entries (sixth group)... Could this be caused by the access-pattern, or is it generally true?
 
=== test_timing_cache.js '''(Obsolete)''' ===
This benchmark measures the time to load a resource from a server and from the cache, as well as the time for call-overhead and for clearing the cache.
 
The approach is to repeat a single load in a loop for some time and report the average time for one iteration. This eliminates timer-granularity issues and should smooth out random jitter in the system, but the weakness is that IO-caching at any level will influence disk-cache results. Moreover, memory-cache results are likely to be influenced by HW caches and memory-paging in the OS. In short, this benchmark does not implement a very realistic access-pattern for cache-entries, hence its practical value may be limited. It can be useful, though, to study changes unrelated to the speed of the cache media, e.g. code to optimize data structures or algorithms.
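
The core of the approach can be sketched as follows. TIME_BUDGET_MS and runOneLoad() are illustrative names; the real test drives asynchronous channel loads, so its actual structure is callback-based rather than a plain loop:

 // Timed-loop measurement: repeat one operation until a fixed time budget
 // is spent, then report the average time per iteration.
 var TIME_BUDGET_MS = 10000;            // matches the ~10000 ms totals below
 function benchmark(runOneLoad) {
   var start = Date.now();
   var runs = 0;
   while (Date.now() - start < TIME_BUDGET_MS) {
     runOneLoad();                      // one load of the resource under test
     runs++;
   }
   var total = Date.now() - start;
   return { totalMs: total, runs: runs, avgMs: total / runs };
 }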
 
Note that there are two different tests for cache-misses: one which clears the cache in each iteration, and one where the response has a header which prevents it from being cached. (The first is mainly present for completeness - it makes a clearCache() call in each iteration, which is not very likely to happen in real life.) One could also imagine a third approach to emulate cache-misses: suffixing the URL with a different query-string in order to load a new resource in each iteration. However, this doesn't work here because it hits the max number of open connections and times out. (See the test below, though, where we can control this limit in a different manner.)
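
For the header-based variant, the xpcshell HTTP server can simply mark the response as uncacheable. A sketch along these lines, assuming "server" is the test's nsIHttpServer instance; the path and body are made up, while registerPathHandler() and setHeader() are the standard httpd.js calls:

 // Handler whose response must not be cached, so every load is a miss.
 server.registerPathHandler("/nocache", function(request, response) {
   response.setStatusLine(request.httpVersion, 200, "OK");
   response.setHeader("Cache-Control", "no-store", false);  // forbid caching
   var body = "x";                      // the real test writes <datasize> bytes
   response.bodyOutputStream.write(body, body.length);
 });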
 
All operations are reported for different data-sizes, as well as with no cache enabled, with only the memory-cache enabled, and with only the disk-cache enabled. (Note that, theoretically, searching for an entry in a cache should take some time, hence we also measure with no cache enabled.)
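
Switching between these configurations amounts to toggling the standard cache prefs before each round. Whether the benchmark does exactly this is an assumption, but the pref names are the usual ones:

 // Select cache configuration: no cache, memory-only or disk-only.
 var prefs = Components.classes["@mozilla.org/preferences-service;1"]
                       .getService(Components.interfaces.nsIPrefBranch);
 function setCacheConfig(useMemory, useDisk) {
   prefs.setBoolPref("browser.cache.memory.enable", useMemory);
   prefs.setBoolPref("browser.cache.disk.enable", useDisk);
 }
 setCacheConfig(false, false);          // no cache
 setCacheConfig(true, false);           // memory-cache only
 setCacheConfig(false, true);           // disk-cache only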
 
Some results from a Linux server running 64-bit Ubuntu 10.04:
 
Memory-cache capacity: 51200 Kb / Disk-cache capacity: 1048576 Kb
Max #connections=256 / max per server=256
                                        Total time      #runs  avg time/run
=== Set datasize 50 bytes
Calloverhead            (no-cache)      10016          14594  0.69
Calloverhead+ClearCache (no-cache)      10015          11647  0.86
CacheMiss (nocache)    (no cache)      10047          368    27.30
Calloverhead            (mem-cache)    10015          14532  0.69
Calloverhead+ClearCache (mem-cache)    10016          11616  0.86
CacheMiss (clrcache)    (mem-cache)    10018          360    27.83
CacheMiss (nocache)    (mem-cache)    10032          367    27.34
CacheHit                (mem-cache)    10017          4674    2.14
Calloverhead            (disk-cache)    10015          13954  0.72
Calloverhead+ClearCache (disk-cache)    10017          6723    1.49
CacheMiss (clrcache)    (disk-cache)    10033          346    29.00
CacheMiss (nocache)    (disk-cache)    10032          357    28.10
CacheHit                (disk-cache)    10017          4709    2.13
=== Set datasize 1024 bytes
CacheMiss (nocache)    (no cache)      10020          339    29.56
CacheMiss (nocache)    (mem-cache)    10037          351    28.60
CacheHit                (mem-cache)    10019          3634    2.76
CacheMiss (nocache)    (disk-cache)    10048          345    29.12
CacheHit                (disk-cache)    10018          3535    2.83
=== Set datasize 51200 bytes
CacheMiss (nocache)    (no cache)      10095          103    98.01
CacheMiss (nocache)    (mem-cache)    10023          104    96.38
CacheHit                (mem-cache)    10070          306    32.91
CacheMiss (nocache)    (disk-cache)    10110          104    97.21
CacheHit                (disk-cache)    10054          301    33.40
=== Set datasize 524288 bytes
CacheMiss (nocache)    (no cache)      10277          14      734.07
CacheMiss (nocache)    (mem-cache)    10312          14      736.57
CacheHit                (mem-cache)    10443          31      336.87
CacheMiss (nocache)    (disk-cache)    10283          14      734.50
CacheHit                (disk-cache)    10451          31      337.13
 
The important numbers are found in the rightmost column: avg time/run is simply total time divided by #runs (e.g. 10016 ms / 14594 runs ≈ 0.69 ms per run). Note that the first block (datasize 50 bytes) includes more results than the other blocks: "Calloverhead" measures the call-overhead from JavaScript, and "Calloverhead+ClearCache" measures the time to clear the cache from JavaScript. These numbers are unrelated to the size of the data loaded, hence they are reported only once.
 
The results above indicate that reading from the disk-cache is about as fast as reading from the memory-cache, which is probably due to efficient IO-caching on this platform. Note also that cache-hits for small entries are proportionally much faster than cache-hits for larger entries (for small entries a hit is roughly a factor of 10 faster than a miss, whereas for large entries we see a factor of 2-3).
 
There is no conclusive evidence that searching a disk- or memory-cache takes any significant amount of time. However, the caches in this test are pretty much empty, so we should create a separate test for this, making sure the cache contains some number of entries before we measure. (See [[#Wanted_tests]] below.)
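
Such a test could pre-populate the cache by loading a number of distinct URLs (varying a query-string) before taking any measurements. A rough sketch; loadAndWait() is a hypothetical helper that performs one load and blocks until it completes, and the URL and population size are made up:

 // Fill the cache with NUM_ENTRIES distinct entries before measuring, so
 // that lookups have to search a realistically populated cache.
 var NUM_ENTRIES = 1000;                // hypothetical population size
 for (var i = 0; i < NUM_ENTRIES; i++) {
   loadAndWait("http://localhost:4444/data?fill=" + i);
 }
 // ...then run the hit/miss measurements as before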
 
Some raw results from my Nexus S. '''I find it weird that this device outperforms the Linux server when the data-size becomes larger... any insight is appreciated.'''
 
Memory-cache capacity: 51200 Kb / Disk-cache capacity: 1048576 Kb
Max #connections=256 / max per server=256
                                        Total time      #runs  avg time/run
=== Set datasize 50 bytes
Calloverhead            (no-cache)      10016          10878  0.92
Calloverhead+ClearCache (no-cache)      10016          9482    1.06
CacheMiss (nocache)    (no cache)      10020          321    31.21
Calloverhead            (mem-cache)    10015          10975  0.91
Calloverhead+ClearCache (mem-cache)    10016          8937    1.12
CacheMiss (clrcache)    (mem-cache)    10036          371    27.05
CacheMiss (nocache)    (mem-cache)    10021          348    28.80
CacheHit                (mem-cache)    10016          4639    2.16
Calloverhead            (disk-cache)    10016          11426  0.88
Calloverhead+ClearCache (disk-cache)    10043          213    47.15
CacheMiss (clrcache)    (disk-cache)    10034          117    85.76
CacheMiss (nocache)    (disk-cache)    10019          347    28.87
CacheHit                (disk-cache)    10017          4206    2.38
=== Set datasize 1024 bytes
CacheMiss (nocache)    (no cache)      10049          332    30.27
CacheMiss (nocache)    (mem-cache)    10024          339    29.57
CacheHit                (mem-cache)    10017          3501    2.86
CacheMiss (nocache)    (disk-cache)    10031          327    30.68
CacheHit                (disk-cache)    10017          3308    3.03
=== Set datasize 51200 bytes
CacheMiss (nocache)    (no cache)      10064          119    84.57
CacheMiss (nocache)    (mem-cache)    10074          119    84.66
CacheHit                (mem-cache)    10049          396    25.38
CacheMiss (nocache)    (disk-cache)    10078          116    86.88
CacheHit                (disk-cache)    10031          426    23.55
=== Set datasize 524288 bytes
CacheMiss (nocache)    (no cache)      10572          19      556.42
CacheMiss (nocache)    (mem-cache)    10052          18      558.44
CacheHit                (mem-cache)    10274          45      228.31
CacheMiss (nocache)    (disk-cache)    10167          18      564.83
CacheHit                (disk-cache)    10177          43      236.67
 
This is one way to represent the results:
 
{| border="1" cellpadding="5" cellspacing="0" align="center"
|+'''Summary of results over different platforms (avg time per run in ms; columns are data-sizes in bytes)'''
|-
! colspan="2" |
! colspan="4" | No cache
! colspan="4" | Memory-Cache
! colspan="4" | Disk-Cache
|-
! colspan="2" | || 50 || 1024 || 51200 || 524288 || 50 || 1024 || 51200 || 524288 || 50 || 1024 || 51200 || 524288
|-
! rowspan="3" | Ubuntu 64bit server
! | Clear cache
| colspan="4"| 0.86
| colspan="4"| 0.86
| colspan="4"| 1.49
|-
! | Miss
| 27.30 || 29.56 || 98.01 || 734.07
| 27.34 || 28.60 || 96.38 || 736.57
| 28.10 || 29.12 || 97.21 || 734.50
|-
! | Hit
| colspan="4" | '''n/a'''
| 2.14  || 2.76 || 32.91 || 336.87
| 2.13  || 2.83 || 33.40 || 337.13
|-
! rowspan="3" | Nexus S
! | Clear cache
| colspan="4"| 1.06
| colspan="4"| 1.12
| colspan="4"| 47.15
|-
! | Miss
| 31.21 || 30.27 || 84.57 || 556.42
| 28.80 || 29.57 || 84.66 || 558.44
| 28.87 || 30.68 || 86.88 || 564.83
|-
! | Hit
| colspan="4" | '''n/a'''
| 2.16  || 2.86 || 25.38 || 228.31
| 2.38  || 3.03 || 23.55 || 236.67
|-
! rowspan="3" | Samsung Galaxy S (from Geoff)
! | Clear cache
| colspan="4"| 1.89
| colspan="4"| 1.87
| colspan="4"| 1339.75
|-
! | Miss
| 31.33 || 27.48 || 74.04 || 524.20
| 27.27 || 28.32 || 75.05 || 529.32
| 28.04 || 28.43 || 93.92 || 612.59
|-
! | Hit
| colspan="4" | '''n/a'''
| 2.67  || 3.04 || 21.67 || 209.69
| 3.03  || 3.37 || 24.37 || 218.61
|}
 
 
And this is another representation. I prefer it, since it makes it easier to study the effect of different cache-configurations on a particular testcase:
 
{| border="1" cellpadding="5" cellspacing="0" align="center"
|+'''Summary 2 of results over different platforms (avg time per run in ms; columns are data-sizes in bytes)'''
|-
! colspan="2" | || 50 || 1024 || 51200 || 524288
|-
! rowspan="5" | Ubuntu 64bit server
! align="right" | Miss (no-cache)
| 27.30 || 29.56 || 98.01 || 734.07
|-
| align="right" | (mem-cache)
| 27.34 || 28.60 || 96.38 || 736.57
|-
| align="right" | (disk-cache)
| 28.10 || 29.12 || 97.21 || 734.50
|-
! align="right" | Hit (mem-cache)
| 2.14  || 2.76 || 32.91 || 336.87
|-
| align="right" | (disk-cache)
| 2.13  || 2.83 || 33.40 || 337.13
|-
! rowspan="5" | Nexus S
! align="right" | Miss (no-cache)
| 31.21 || 30.27 || 84.57 || 556.42
|-
| align="right" | (mem-cache)
| 28.80 || 29.57 || 84.66 || 558.44
|-
| align="right" | (disk-cache)
| 28.87 || 30.68 || 86.88 || 564.83
|-
! align="right" | Hit (mem-cache)
| 2.16  || 2.86 || 25.38 || 228.31
|-
| align="right" | (disk-cache)
| 2.38  || 3.03 || 23.55 || 236.67
|-
! rowspan="5" | Samsung Galaxy S (from Geoff)
! align="right" | Miss (no-cache)
| 31.33 || 27.48 || 74.04 || 524.20
|-
| align="right" | (mem-cache)
| 27.27 || 28.32 || 75.05 || 529.32
|-
| align="right" | (disk-cache)
| 28.04 || 28.43 || 93.92 || 612.59
|-
! align="right" | Hit (mem-cache)
| 2.67  || 3.04 || 21.67 || 209.69
|-
| align="right" | (disk-cache)
| 3.03  || 3.37 || 24.37 || 218.61
|}
 
=== test_timing_cache_2.js '''(Obsolete)''' ===
This benchmark loads a number of resources with different cache-keys and then waits for all cache-IO to finish. Then it loads all resources again (retrieving them from the cache this time). Both sequences are measured and reported for various entry-sizes and cache-configurations.
 
This access-pattern is slightly more realistic than in the benchmark described above: a page is likely to load some number of resources from the (same) server, and if the user visits the page a second time, those resources are loaded again (from the cache this time). The benchmark aims at emulating this pattern in a simple way.
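
The distinct cache-keys come from varying a query-string, so each load creates its own entry (the "q-url" rows below). A simplified sketch of the load loop, using the old three-argument nsIIOService.newChannel() signature; the URL, port and listener are illustrative:

 // Load NUM_RESOURCES resources with distinct cache-keys (query-urls).
 var ios = Components.classes["@mozilla.org/network/io-service;1"]
                     .getService(Components.interfaces.nsIIOService);
 var NUM_RESOURCES = 101;               // matches the #runs column below
 for (var i = 0; i < NUM_RESOURCES; i++) {
   var channel = ios.newChannel("http://localhost:4444/data?id=" + i, null, null);
   channel.asyncOpen(listener, null);   // listener counts finished loads
 }
 // once all onStopRequest() calls have arrived, wait for cache-IO to finish,
 // then load the same URLs again to measure the cache-hit sequence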
 
Some results from a Linux server running 64-bit Ubuntu 10.04:
 
Memory-cache capacity: 51200 Kb / Disk-cache capacity: 1048576 Kb
Max #connections=256 / max per server=256
                                        Total time      #runs  avg time/run
Setting datasize 50 bytes
CacheMiss (nocache) (no cache)          2803            101    27.75
CacheMiss (q-url)  (mem-cache)        2767            101    27.40
CacheHit            (mem-cache)        205            101    2.03
CacheMiss (q-url)  (disk-cache)        2783            101    27.55
CacheHit            (disk-cache)        186            101    1.84
Setting datasize 1024 bytes
CacheMiss (nocache) (no cache)          2884            101    28.55
CacheMiss (q-url)  (mem-cache)        2570            101    25.45
CacheHit            (mem-cache)        269            101    2.66
CacheMiss (q-url)  (disk-cache)        2910            101    28.81
CacheHit            (disk-cache)        271            101    2.68
Setting datasize 51200 bytes
CacheMiss (nocache) (no cache)          9546            101    94.51
CacheMiss (q-url)  (mem-cache)        9536            101    94.42
CacheHit            (mem-cache)        3274            101    32.42
CacheMiss (q-url)  (disk-cache)        9543            101    94.49
CacheHit            (disk-cache)        3274            101    32.42
Setting datasize 524288 bytes
CacheMiss (nocache) (no cache)          75942          101    751.90
CacheMiss (q-url)  (mem-cache)        75840          101    750.89
CacheHit            (mem-cache)        27799          101    275.24
CacheMiss (q-url)  (disk-cache)        79318          101    785.33
CacheHit            (disk-cache)        31333          101    310.23
 
The interesting numbers are found in the rightmost column of the output.
 
The results above suggest that loading entries up to a certain size from the server (the xpcshell-handler in this case) takes roughly the same time regardless of whether we use no cache, the mem-cache or the disk-cache. Also, when loading entries up to a certain size from the cache, the disk-cache seems to be as fast as the memory-cache. This IMO indicates that IO on this platform is pretty efficient (up to a certain limit), because we would otherwise expect to spend more time when using the disk-cache than when using the mem-cache. Observe that we need 512K-sized resources to see a clear difference between using disk and memory. (The exact limit is not identified, but it may not be very interesting either, since it is probably platform-specific.)
 
Also observe that when reading from the cache, small entries load proportionally much faster than larger entries. (Memory-page size?)
 
Results from my Nexus S. '''As above, the performance on large data is surprisingly good compared to the Linux server... any insight?'''
 
Memory-cache capacity: 51200 Kb / Disk-cache capacity: 1048576 Kb
Max #connections=256 / max per server=256
                                        Total time      #runs  avg time/run
Setting datasize 50 bytes
CacheMiss (nocache) (no cache)          2724            101    26.97
CacheMiss (q-url)  (mem-cache)        2249            101    22.27
CacheHit            (mem-cache)        126            101    1.25
CacheMiss (q-url)  (disk-cache)        2314            101    22.91
CacheHit            (disk-cache)        146            101    1.45
Setting datasize 1024 bytes
CacheMiss (nocache) (no cache)          2190            101    21.68
CacheMiss (q-url)  (mem-cache)        2206            101    21.84
CacheHit            (mem-cache)        177            101    1.75
CacheMiss (q-url)  (disk-cache)        2382            101    23.58
CacheHit            (disk-cache)        189            101    1.87
Setting datasize 51200 bytes
CacheMiss (nocache) (no cache)          7388            101    73.15
CacheMiss (q-url)  (mem-cache)        7582            101    75.07
CacheHit            (mem-cache)        2205            101    21.83
CacheMiss (q-url)  (disk-cache)        7749            101    76.72
CacheHit            (disk-cache)        2226            101    22.04
Setting datasize 524288 bytes
CacheMiss (nocache) (no cache)          57251          101    566.84
CacheMiss (q-url)  (mem-cache)        57939          101    573.65
CacheHit            (mem-cache)        22888          101    226.61
CacheMiss (q-url)  (disk-cache)        62850          101    622.28
CacheHit            (disk-cache)        22626          101    224.02


== Wanted tests ==