Pablo Napoli wrote:Im studying Collections for my ocp exam and I was researching a bit more about some topics so I don't know if my conclusion about HashSet is right.
It supposed to be useful not just for the OCP exam, but also for your career in general understanding how the hash table based data structures work. In particular, I think that could be useful going through technical interviews.
Pablo Napoli wrote:is used to create the hash bucket if previously there were no such value in the set.
First, there is no such thing as "hash bucket". Hash is used to compute the bucket to which object (more correct its reference) supposed to reside.
In Java, HashSet is backed by HashMap, while HashMap itself is backed by an array. So for the purpose of simplicity, I'll say that HashSet values are actually stored within an array.
When the hash code is computed (int value), such value is being further computed using bit-wise AND operator
ANDing hash code's array's length-1 binary representations:
int index = hashCode & arrayLength-1; (or you can think of it as
int index = hashCode % arrayLength; <-- a slower variant of former approach), that way you get an index where element actually gets stored.
In ideal scenario there supposed to be 1 value per array's element position, but that's not always the case. When such thing happen, it is being called a collision occured. In simple words, two unequal objects produce the same hash code value, hence end up in the same bucket. So in such event, there is no longer a singular element (well, technically one element, but I hope you get the point) stored within an array at certain index, but a List of elements.
i.e.: No longer
A[i] = N;
but
A[i] = List of N's;
So in order to retrieve the element (in case of HashSet
contains() operation only, in variants of HashMap
get()) you are looking for, you no longer can find an element in a constant time O(1) (big-O notation), because you need to go through the list of elements at certain index, and using
equals() method to identify, which element actually is the one you are looking for.
Now, till Java 8, when the collisions did happen, Java used LinkedList to store collided elements, so the worst case scenario to retrieve an element was O(n), because you had to iterate presumably over the whole list. Since Java 8, there is some threshold, i.e. when certain amount of collisions do happen, LinkedLists are being replaced with Red black trees (
binary search tree), which significantly reduces the complexity to O(log n). And that is important to understand about the complexities of certain operations and be aware. In most of the real world applications that most likely wouldn't have a serious impact of you assumed O(1), but I trust there are systems when such inaccurate assumption may have a very big and significant impact.