Changes

Jump to: navigation, search

User:Jorend/Deterministic hash tables

3,307 bytes added, 19:12, 27 February 2012
Background
Removing an entry simply replaces its <code>key</code> with some sentinel while leaving the <code>chain</code> intact.
 
== Theory ==
 
'''Memory.''' Every map data structure must store the keys and values. A map with ''N'' entries must use at least ''NE'' memory, where ''E'' is the size of a key-value pair (we assume the data can’t be compressed).
 
In the following discussion we neglect the effect of deletions, which can cause both kinds of table to have extra memory that isn't in use at a given moment.
 
An open addressing hash table has additional overhead depending on the fill factor. <code>OpenTable</code> in particular has a maximum fill factor ''F''=3/4. When it becomes more than 3/4 full, it doubles in size, becoming just over 3/8 full. The amount of overhead in an <code>OpenTable</code> is ''(1/f)NE'', where ''f'' is the actual current fill ratio of the table and ''F/2 &lt; f &le; F''.
 
A Close table has three sources of overhead: the entire <code>hashTable</code>, the <code>chain</code> pointers in the <code>dataTable</code>, and the unused portion of the <code>dataTable</code>. In <code>CloseTable</code>, the <code>chain</code> pointer causes each entry to require ''(3/2)E'' memory rather than ''E''. The <code>dataTable</code> doubles in size when it becomes completely full, so its size is ''(3/2)(1/u)NE'' where ''u'' is the proportion of the <code>dataTable</code> that is in use and ''1/2 &lt; u &le; 1''. The <code>hashTable</code> is always 1/16 the size of the <code>dataTable</code> (1/8 on 64-bit platforms), so the total size of both tables is ''(1+1/16)(3/2)(1/u)NE'' (substitute 1/8 on 64-bit platforms).
 
For back-of-envelope purposes, then, ''f'' may be expected to be about ''3F/4'' (halfway between F/2 and F) and ''u'' about 3/4 on average; plugging these numbers in, we expect an <code>OpenTable</code> to have about (1/(3F/4))&times100%-100% = 78% memory overhead, and a <code>CloseTable</code> about 113% (125% on 64-bit), on average. This would mean that the <code>CloseTable</code> would be 20% larger on average (27% larger on 64-bit). However, see the Results section for a picture of how the two implementations compare in practice.
 
'''Speed.''' We consider the speed of lookups only. The <code>get</code>, <code>has</code>, <code>set</code>, and <code>delete</code> operations all start with a lookup. We will quantify the speed of lookups in terms of how many memory locations must be accessed. To simulate locality effects, we treat multiple reads from a single entry, or from the table object, as a single read.
 
In an open addressing hash table with fill factor ''f'', an unsuccessful lookup performs about ''1/(1-f)'' probes.[https://en.wikipedia.org/wiki/Hash_table#Performance_analysis] In an <code>OpenTable</code>, at worst ''f=F=3/4'' and so an unsuccessful lookup performs 4 probes on average. In the usual case, ''f=3F/4=9/16'', so about 2.3 probes on average.
 
<table>
<tr>
<th rowspan=2>Data structure</th>
<th colspan=2>Expected cost of unsuccessful lookup (miss)</th>
<th colspan=2>Expected cost of successful lookup (hit)</th>
</tr>
<tr>
<th>Maximum load</th>
<th>Typical load</th>
<th>Maximum load</th>
<th>Typical load</th>
</tr>
<tr>
<td><code>OpenTable</code>
<td>5
<td>3.3
<td>-
<td>-
</tr>
<tr>
<td><code>CloseTable</code>
<td>-
<td>-
<td>-
<td>-
</tr>
</table>
= Method =
638
edits

Navigation menu