User:Jorend/Deterministic hash tables: Difference between revisions

User:Jorend/Deterministic hash tables (view source)

Revision as of 19:12, 27 February 2012

3,307 bytes added , 27 February 2012

→‎Background

Jorend

638

edits

@@ Line 35: / Line 35: @@
 Removing an entry simply replaces its <code>key</code> with some sentinel while leaving the <code>chain</code> intact.
+== Theory ==
+'''Memory.''' Every map data structure must store the keys and values. A map with ''N'' entries must use at least ''NE'' memory, where ''E'' is the size of a key-value pair (we assume the data can’t be compressed).
+In the following discussion we neglect the effect of deletions, which can cause both kinds of table to have extra memory that isn't in use at a given moment.
+An open addressing hash table has additional overhead depending on the fill factor. <code>OpenTable</code> in particular has a maximum fill factor ''F''=3/4. When it becomes more than 3/4 full, it doubles in size, becoming just over 3/8 full. The amount of overhead in an <code>OpenTable</code> is ''(1/f)NE'', where ''f'' is the actual current fill ratio of the table and ''F/2 &lt; f &le; F''.
+A Close table has three sources of overhead: the entire <code>hashTable</code>, the <code>chain</code> pointers in the <code>dataTable</code>, and the unused portion of the <code>dataTable</code>. In <code>CloseTable</code>, the <code>chain</code> pointer causes each entry to require ''(3/2)E'' memory rather than ''E''. The <code>dataTable</code> doubles in size when it becomes completely full, so its size is ''(3/2)(1/u)NE'' where ''u'' is the proportion of the <code>dataTable</code> that is in use and ''1/2 &lt; u &le; 1''. The <code>hashTable</code> is always 1/16 the size of the <code>dataTable</code> (1/8 on 64-bit platforms), so the total size of both tables is ''(1+1/16)(3/2)(1/u)NE'' (substitute 1/8 on 64-bit platforms).
+For back-of-envelope purposes, then, ''f'' may be expected to be about ''3F/4'' (halfway between F/2 and F) and ''u'' about 3/4 on average; plugging these numbers in, we expect an <code>OpenTable</code> to have about (1/(3F/4))&times100%-100% = 78% memory overhead, and a <code>CloseTable</code> about 113% (125% on 64-bit), on average. This would mean that the <code>CloseTable</code> would be 20% larger on average (27% larger on 64-bit). However, see the Results section for a picture of how the two implementations compare in practice.
+'''Speed.''' We consider the speed of lookups only. The <code>get</code>, <code>has</code>, <code>set</code>, and <code>delete</code> operations all start with a lookup. We will quantify the speed of lookups in terms of how many memory locations must be accessed. To simulate locality effects, we treat multiple reads from a single entry, or from the table object, as a single read.
+In an open addressing hash table with fill factor ''f'', an unsuccessful lookup performs about ''1/(1-f)'' probes.[https://en.wikipedia.org/wiki/Hash_table#Performance_analysis] In an <code>OpenTable</code>, at worst ''f=F=3/4'' and so an unsuccessful lookup performs 4 probes on average. In the usual case, ''f=3F/4=9/16'', so about 2.3 probes on average.
+<table>
+  <tr>
+    <th rowspan=2>Data structure</th>
+    <th colspan=2>Expected cost of unsuccessful lookup (miss)</th>
+    <th colspan=2>Expected cost of successful lookup (hit)</th>
+  </tr>
+  <tr>
+    <th>Maximum load</th>
+    <th>Typical load</th>
+    <th>Maximum load</th>
+    <th>Typical load</th>
+  </tr>
+  <tr>
+    <td><code>OpenTable</code>
+    <td>5
+    <td>3.3
+    <td>-
+    <td>-
+  </tr>
+  <tr>
+    <td><code>CloseTable</code>
+    <td>-
+    <td>-
+    <td>-
+    <td>-
+  </tr>
+</table>
 = Method =

User:Jorend/Deterministic hash tables: Difference between revisions

User:Jorend/Deterministic hash tables (view source)

Revision as of 19:12, 27 February 2012

Navigation menu

Search