It’s necessary to learn with questions. Otherwise, you may distract your focus when something you don’t know appears, whether should I go deeper or not. With preliminary doubts in your mind, this would not be a problem. Learning can be direct, clear and fruitful, which brings you more sense of achievement.
For this post about the source code of HashMap
, few questions need answers:
- data structure underneath
- why
HashMap
permits nulls and unsynchronized - does it make guarantees as to the order of the map
- what happens when I put and get a key with value, and time complexity
- how the map shrink or expand when size changes
A class diagrams is helpful and lists firstly.
Storage Structure
As the following code shows, key-values are stored in an array, which can be random access with index.
// HashMap.java
public class HashMap<K,V> extends AbstractMap<K,V>
implements Map<K,V>, Cloneable, Serializable {
transient Node<K,V>[] table;
transient int size;
final float loadFactor;
int threshold;
}
Other member variables mean:
size
: the number of key-value mappings contained in this map.loadFactor
: the load factor for the hash table.threshold
: the next size value at which to resize (same ascapacity * load factor
).
The Node
is a static and nested class in HashMap
. It’s basic hash bin node, used for most entries. For briefness, I assume Node
is used for all entries, but remember TreeNode
is the other option.
// HashMap.java
static class Node<K,V> implements Map.Entry<K,V> {
final int hash;
final K key;
V value;
Node<K,V> next;
}
key
, hash
,value
is easy to understand, and the next
? It is used for handling key conflict.
Let’s see its equals
method with compare order:
- address
- instance identity
equals
fromObjects
// HashMap.java
static class Node<K,V> implements Map.Entry<K,V> {
public final boolean equals(Object o) {
if (o == this)
return true;
if (o instanceof Map.Entry) {
Map.Entry<?,?> e = (Map.Entry<?,?>)o;
if (Objects.equals(key, e.getKey()) &&
Objects.equals(value, e.getValue()))
return true;
}
return false;
}
}
put
put
associates the specified value with the specified key in this map. The old value is replaced if the key is already contained in the map.
// HashMap.java
public V put(K key, V value) {
return putVal(hash(key), key, value, false, true);
}
final V putVal(int hash, K key, V value, boolean onlyIfAbsent,
boolean evict) {
Node<K,V>[] tab; Node<K,V> p; int n, i;
if ((tab = table) == null || (n = tab.length) == 0)
n = (tab = resize()).length;
if ((p = tab[i = (n - 1) & hash]) == null)
tab[i] = newNode(hash, key, value, null);
else {
Node<K,V> e; K k;
if (p.hash == hash &&
((k = p.key) == key || (key != null && key.equals(k))))
e = p;
else if (p instanceof TreeNode)
e = ((TreeNode<K,V>)p).putTreeVal(this, tab, hash, key, value);
else {
for (int binCount = 0; ; ++binCount) {
if ((e = p.next) == null) {
p.next = newNode(hash, key, value, null);
if (binCount >= TREEIFY_THRESHOLD - 1) // -1 for 1st
treeifyBin(tab, hash);
break;
}
if (e.hash == hash &&
((k = e.key) == key || (key != null && key.equals(k))))
break;
p = e;
}
}
if (e != null) { // existing mapping for key
V oldValue = e.value;
if (!onlyIfAbsent || oldValue == null)
e.value = value;
afterNodeAccess(e);
return oldValue;
}
}
++modCount;
if (++size > threshold)
resize();
afterNodeInsertion(evict);
return null;
}
Pseudocode that summarizes steps of put
lists here:
- resize the map if table is null or size is 0
- new a node if the key does not exist, and insert it into table
- if the hash exists, append to the end of list
- resize if threshold is exceeded
resize
As we can see from putting a new key-value into HashMap
, it is possible to trigger resize
, which would double table size. The elements from each bin must either stay at same index, or move with a power of two offset in the new table.
// HashMap.java
Node<K,V>[] newTab = (Node<K,V>[])new Node[newCap];
table = newTab;
for (int j = 0; j < oldCap; ++j) {
Node<K,V> e;
if ((e = oldTab[j]) != null) {
oldTab[j] = null;
if (e.next == null)
newTab[e.hash & (newCap - 1)] = e;
else if (e instanceof TreeNode)
((TreeNode<K,V>)e).split(this, newTab, j, oldCap);
else { // preserve order
Node<K,V> loHead = null, loTail = null;
Node<K,V> hiHead = null, hiTail = null;
Node<K,V> next;
do {
next = e.next;
if ((e.hash & oldCap) == 0) {
if (loTail == null)
loHead = e;
else
loTail.next = e;
loTail = e;
}
else {
if (hiTail == null)
hiHead = e;
else
hiTail.next = e;
hiTail = e;
}
} while ((e = next) != null);
if (loTail != null) {
loTail.next = null;
newTab[j] = loHead;
}
if (hiTail != null) {
hiTail.next = null;
newTab[j + oldCap] = hiHead;
}
}
}
}
Two points should be highlighted:
- The elements from each bin stay at same index if only
(e.hash & oldCap) == 0
- The order in new table is same as the one in old table.
get
get
returns the value to which the specified key is mapped, or null if this map contains no mapping for the key.
// HashMap.java
public V get(Object key) {
Node<K,V> e;
return (e = getNode(hash(key), key)) == null ? null : e.value;
}
final Node<K,V> getNode(int hash, Object key) {
Node<K,V>[] tab; Node<K,V> first, e; int n; K k;
if ((tab = table) != null && (n = tab.length) > 0 &&
(first = tab[(n - 1) & hash]) != null) {
if (first.hash == hash && // always check first node
((k = first.key) == key || (key != null && key.equals(k))))
return first;
if ((e = first.next) != null) {
if (first instanceof TreeNode)
return ((TreeNode<K,V>)first).getTreeNode(hash, key);
do {
if (e.hash == hash &&
((k = e.key) == key || (key != null && key.equals(k))))
return e;
} while ((e = e.next) != null);
}
}
return null;
}
The iteration is comprehensive,
- locate the index of table based on hash value
- check if first node is eligible
- traverse the list and check one by one
Conclusion
Now, we can answer the questions mentioned at the beginning of this post.
data structure underneath
Beneath the HashMap
, an array with its elements Node
is used for storage, and it is friendly to random access.
why HashMap
permits nulls and unsynchronized
When putting a new key-value, newNode
is called, which does not check nulls.
After scanning the source code of HashMap
, there is no code to deal with synchronization. So it is unsynchronized.
does it make guarantees as to the order of the map
No, resize of the table may distort the order.
what happens when I put and get a key with value, and time complexity
put
may lead to resize of the map, so it’s time complexity is O(n). While the element can be located with index, so get
costs O(1) for the optimal situation. If key conflicts, it needs to iterate the whole list sharing the same key.
how the map shrink or expand when size changes
The question has already been answered. expand? yes! and shrink? no!