User:Jorend/Deterministic hash tables: Difference between revisions

From MozillaWiki
Jump to navigation Jump to search
Line 46: Line 46:
[[Image:jorendorff-dht-figure-1.png]]
[[Image:jorendorff-dht-figure-1.png]]


CloseTable is more memory-efficient than DenseTable, because the latter is really wasteful of memory.
CloseTable is much more memory-efficient than DenseTable, because the latter is tuned for fast lookups at all costs.


[[Image:jorendorff-dht-figure-2.png]]
[[Image:jorendorff-dht-figure-2.png]]
Line 55: Line 55:


This measures the time required to fill a table to about 100 entries, repeatedly.
This measures the time required to fill a table to about 100 entries, repeatedly.
This graph and the following ones show speed, so '''higher is better'''.
Ideally all these graphs would show three straight horizontal lines, indicating that all three implementations scale beautifully to large work loads. And indeed this seems to be the case, mostly.


[[Image:jorendorff-dht-InsertLargeTest-speed.png]]
[[Image:jorendorff-dht-InsertLargeTest-speed.png]]

Revision as of 01:44, 23 February 2012

Abstract

A deterministic hash table proposed by Tyler Close was implemented in C++ and its performance was compared to two hash table implementations using open addressing. Speed and memory usage were measured.

Speed. The Close table implementation was very fast, faster than the open addressing implementations. It is unclear why; theory suggests it "should" be slower, and measurement confirms that the Close table is doing more memory accesses and more branches. More investigation is needed.

Memory. The Close table implementation allocates 29% more memory on average than the leanest open addressing implementation on 32-bit systems, 37% more on 64-bit systems. The implementation can probably trade speed for compactness, but again, more investigation is needed.

Background

Most hash table APIs (including C++'s unordered_map, Java's java.util.HashMap, Python's dict, Ruby's Hash, and many others) allow users to iterate over all the entries in the hash table in an arbitrary order. This exposes the user to nondeterminism in the implementation.

Map and Set data structures are proposed for a future version of the ECMAScript programming language. The standard committee would like to avoid underspecification if possible. Can a data structure retain the performance of traditional, arbitrary-order hash tables while also storing the order in which entries were added, so that iteration is deterministic?

...


Method

The code I used to make these pictures is available at: https://github.com/jorendorff/dht

The project contains two complete hash map implementations: OpenTable and CloseTable. A third implementation, DenseTable, is a thin wrapper around the dense_hash_map type from Sparsehash. The three classes have the same API and were all benchmarked using the same templates (in hashbench.cpp).

Hash table implementation design notes:

  • The Key and Value types are uint64_t because ECMAScript values are 64 bits in the implementation I'm most familiar with.
  • OpenTable and CloseTable are meant to be as fast and as memory-efficient as possible. Pretty much everything that can be omitted was omitted. For example, the hashing function is trivial: hash(key) = key, and neither OpenTable nor CloseTable further munges the hashcode before using it as a table index. Rationale: Making each implementation as fast as possible should highlight any performance difference between OpenTable and CloseTable, which is the purpose of the exercise. Using a more sophisticated hashing function would slow down both implementations, reducing the observed difference between the two techniques.
  • DenseTable is provided as a baseline. (It's nice to have some realistic numbers in the graphs too.)
  • dense_hash_map and OpenTable both implement straightforward hash tables with open addressing. The main difference between the two is one of tuning. dense_hash_map has a maximum load factor of 0.5. OpenTable has a maximum load factor of 0.75, which causes it to use about half as much memory most of the time.
  • The purpose of the typedefs KeyArg and ValueArg is to make it possible to switch the API from pass-by-value to pass-by-reference by editing just a couple of lines of code. (I tried this. Pass-by-reference is no faster on 64-bit machines.)

Benchmark design notes:

  • There would be more tests but it takes a long time to write them. Feel free to send me pull requests.
  • The program runs each benchmark many different times in order to produce enough numbers that noise is visually obvious in the resulting graph. (Most of the speed graphs are nice and smooth.)

Results

Jorendorff-dht-figure-1.png

CloseTable is much more memory-efficient than DenseTable, because the latter is tuned for fast lookups at all costs.

Jorendorff-dht-figure-2.png

DenseTable and OpenTable must initialize the entire table each time they resize. CloseTable also allocates large chunks of memory, but like a vector, it does not need to write to that memory until there is data that needs to be stored there.

Jorendorff-dht-InsertSmallTest-speed.png

This measures the time required to fill a table to about 100 entries, repeatedly.

This graph and the following ones show speed, so higher is better.

Ideally all these graphs would show three straight horizontal lines, indicating that all three implementations scale beautifully to large work loads. And indeed this seems to be the case, mostly.

Jorendorff-dht-InsertLargeTest-speed.png

This test measures the time required to fill a single gigantic table.

The jagged shape of this graph is robust. It reflects the fact that the table doubles in size as it grows, and rehashing is a significant part of the expense of populating a huge table.

Jorendorff-dht-LookupHitTest-speed.png

Jorendorff-dht-LookupMissTest-speed.png

In an OpenTable, lookups that miss are slower than lookups that hit when there is at least one collision. This is because we keep probing the hash table until we find an empty entry.

DenseTable is only slightly slower for misses, perhaps because more lookups have zero collisions. (?)