User:Jorend/Deterministic hash tables: Difference between revisions

From MozillaWiki
Jump to navigation Jump to search
No edit summary
Line 1: Line 1:
= Abstract =
= Abstract =


A deterministic hash table proposed by Tyler Close was implemented in C++ and its performance was compared to two hash table implementations using open addressing. Speed and memory usage were measured. The Close table implementation was very fast, faster than the tables with open addressing. It uses more memory (up to 75% more). This is somewhat mitigated by the fact that, unlike a hash table with open addressing, the Close table can leave some of the memory it allocates uninitialized until it is needed.
A deterministic hash table proposed by Tyler Close was implemented in C++ and its performance was compared to two hash table implementations using open addressing. Speed and memory usage were measured.
 
'''Speed.''' The Close table implementation was very fast, faster than the open addressing implementations. (It is unclear why! More investigation is needed.)
 
'''Memory.''' The Close table allocates 25% more memory than the leanest open addressing implementation, but it does not write to every byte allocated. Rather it writes only to the bytes it needs. This means that for large tables without remove operations, the Close table will use more address space but ''less physical memory'' than a comparable open addressing implementation. (Note however that programs very often create many small tables and fairly often perform deletes, so the 25% penalty will have some practical impact.)


= Background =
= Background =

Revision as of 18:24, 22 February 2012

Abstract

A deterministic hash table proposed by Tyler Close was implemented in C++ and its performance was compared to two hash table implementations using open addressing. Speed and memory usage were measured.

Speed. The Close table implementation was very fast, faster than the open addressing implementations. (It is unclear why! More investigation is needed.)

Memory. The Close table allocates 25% more memory than the leanest open addressing implementation, but it does not write to every byte allocated. Rather it writes only to the bytes it needs. This means that for large tables without remove operations, the Close table will use more address space but less physical memory than a comparable open addressing implementation. (Note however that programs very often create many small tables and fairly often perform deletes, so the 25% penalty will have some practical impact.)

Background

Most hash table APIs (including C++'s unordered_map, Java's java.util.HashMap, Python's dict, Ruby's Hash, and many others) allow users to iterate over all the entries in the hash table in an arbitrary order. This exposes the user to nondeterminism in the implementation.

Map and Set data structures are proposed for a future version of the ECMAScript programming language. The standard committee would like to avoid underspecification if possible. Can a data structure retain the performance of traditional, arbitrary-order hash tables while also storing the order in which entries were added, so that iteration is deterministic?

...


Method

Source code: https://github.com/jorendorff/dht

Results

Jorendorff-dht-figure-1.png

Jorendorff-dht-figure-2.png