For those who has no patience to look through this post, I conclude it in a word: HashSet is backed by HashMap.
Set is a collection in which every element is unique. So does the keys in HashMap. It is reasonable to infer that the implementation of HashSet is based on HashMap, so it is. Remember resize of HashMap? it makes no guarantees as to the iteration order of the set.
With the help of HashMap, HashSet offers constant time performance for the basic operations, like add
, remove
, contains
and size
. Iterating over this set requires time proportional to the sum of the HashSet instance’s size (the number of elements) plus the “capacity” of the backing HashMap instance (the number of buckets).
HashMap Underneath
HashSet involves a map
variable referring to a HashMap, in which the elements are stored. All operations on HashSet will read or write this map
accordingly.
// HashSet.java
public class HashSet<E>
extends AbstractSet<E>
implements Set<E>, Cloneable, java.io.Serializable
{
private transient HashMap<E,Object> map;
public HashSet() {
map = new HashMap<>();
}
public HashSet(int initialCapacity, float loadFactor) {
map = new HashMap<>(initialCapacity, loadFactor);
}
}
add
add
adds the specified element to this set if not already present.
// HashSet.java
public boolean add(E e) {
return map.put(e, PRESENT)==null;
}
For PRESENT
, it is a dummy value to associate with an Object
in the backing Map.
The details of put
method in HashMap lists here: put
.
remove
remove
removes the specified element from this set if present.
// HashSet.java
public boolean remove(Object o) {
return map.remove(o)==PRESENT;
}
contains
contains
inquire whether or not the given element exist in this set.
// HashSet.java
public boolean contains(Object o) {
return map.containsKey(o);
}
Just like get
in HashMap, containsKey
also call getNode
for further. See getNode
.
size
If you want to get the size of HashSet, you actually query how many keys in HashMap.
// HashSet.java
public int size() {
return map.size();
}
Note that this implementation is not synchronized. If multiple threads access a hash set concurrently, and at least one of the threads modifies the set, it must be synchronized externally. This is typically accomplished by synchronizing on some object that naturally encapsulates the set.