Hash table (hash map)
In computing, a hash table, also known as hash map, is a data structure that implements an array or dictionary. It is an abstract data type that maps keys to values. A hash table uses a hash function to compute an index, also called a hash code, into an array of buckets or slots, from which the desired value can be found. During lookup, the key is hashed and the resulting hash indicates where the corresponding value is stored.
Ideally, the hash function will assign each key to a unique bucket, but most hash table designs employ an imperfect hash function, which might cause hash collisions where the hash function generates the same index for more than one key. Such collisions are typically accommodated in some way.
— Wikipedia
Algorithm | Average | Worst case |
---|---|---|
Space | Θ(n) | O(n) |
Access | N/A | N/A |
Search | Θ(1) | O(n) |
Insert | Θ(1) | O(n) |
Delete | Θ(1) | O(n) |
Process of taking characters and converting them to numbers is known as hashing. And the code that is used to convert those letters into particular numbers is called a hash function. It must convert the same string to the same number every single time it’s applied.
Hash table is a list of paired values.
When a message is hashed, the resulting hash value is significantly shorter than the original message. This means that it’s impossible to reconstruct the original message from the hash value alone.
Looking up a value in a hash table has an efficiency of (big O) ====.
hash table.excalidrawA small phone book as a hash table
Which other names of hash table you knowing?
Hash maps, dictionaries and associative arrays.
Trying to add data to a cell that is already filled (when used different keys, but hash is same) is known as a collision.
One classic approach for handling collisions is known as separate chaining. When
a collision occurs, instead of placing a single value in the cell, it places in
it a reference 👉 to an array. Let’s assume initially key was in
with hash
1
, value was nomoz
. We’re trying to add ni
with value zomon
, but ni
hashed also to 1
. How will look value at hash index 1
if we use separate
chaining, how we get correct value?
index | value
------|------
0 | None
1[in] | nomoz
2 | None
3 | None
index | value
---------|------
0 | None
1[in|ni] | [in|nomoz][ni|zomon] # we get correct value by searching by key
2 | None
3 | None
Let’s assume we want to get value by key ni
:
ni
→ hashes to 1- after lookup we detected we need to use search, since separate chaining used
- we search for key in each
subbarray[0]
- we found index at second
subbarray[0]
and value issubbarray[1] = zomon
Hash table’s efficiency depends on three factors:
- How much data we’re storing in the hash table → increase collisions
- How many cells are available in the hash table → increase collisions
- Which hash function we’re using
The number of elements in a hash table divided by the number of slots. Usually written (alpha). The higher the load factor, the slower the retrieval. With open addressing, the load factor cannot exceed 1. With chaining, the load factor often exceeds 1. Typical load factor for hash table is about ====. where is the number of entries occupied in the hash table. is the number of buckets (slots aviable).
When hash table is useful (some real-world scenarios)?
- Attribute-value pairs storing, dictionary, item_name/price pairs, dictionary app, phone book, anything with operations speed.
- Optimizing conditional logic, which can be represented as key/value pairs
- Algorithms optimization, for example two-sum problem can be rewritten with hash tables from to .
Which analogy of hash table you can apply to a book?
Book has TOC (we can say index) and in fact it’s already hash table, we are able
to jump to required topic (search value) by this TOC. Search index in TOC, get
position and jump to page.