• Post Reply Bookmark Topic Watch Topic
  • New Topic
programming forums Java Mobile Certification Databases Caching Books Engineering Micro Controllers OS Languages Paradigms IDEs Build Tools Frameworks Application Servers Open Source This Site Careers Other all forums
this forum made possible by our volunteer staff, including ...
Marshals:
  • Campbell Ritchie
  • Liutauras Vilda
  • Junilu Lacar
  • Jeanne Boyarsky
  • Bear Bibeault
Sheriffs:
  • Knute Snortum
  • Tim Cooke
  • Devaka Cooray
Saloon Keepers:
  • Ron McLeod
  • Stephan van Hulst
  • Tim Moores
  • Tim Holloway
  • Carey Brown
Bartenders:
  • Piet Souris
  • Frits Walraven
  • Ganesh Patekar

I'm not sure if mycConclusion about how HashSet works is right

 
Ranch Hand
Posts: 62
2
  • Mark post as helpful
  • send pies
  • Quote
  • Report post to moderator
Hi guys. Im studying Collections for my ocp exam and I was researching a bit more about some topics so I don't know if my conclusion about HashSet is right. May someone help me?

There are two kinds of hash values. One of the hash values is computed by the hashCode() method that is overriden from the object and is used to create the hash bucket if previously there were no such value in the set. The other hash value is the one computed based on the whole object and stored along with the object in an entry of the hash bucket.

So when searching for an element, the implementation of HashSet selects the appropriate hash bucket (based on the hash value from the hash method) and goes through its entries, comparing the hash value (from the object, not the one computed by hashCode()) stored in each entry with the hash value of the desired element.

I specially based on this in this link: https://www.geeksforgeeks.org/importance-hashcode-method-java/

Thanks!

 
Saloon Keeper
Posts: 6243
58
Eclipse IDE Firefox Browser MySQL Database VI Editor Java Windows
  • Mark post as helpful
  • send pies
  • Quote
  • Report post to moderator

Pablo Napoli wrote:... the hash values is computed by the hashCode() method that is overriden from the object and is used to create the hash bucket if previously there were no such value in the set.

True.

There is no 'other'. The hashCode() method may choose to arrive at the hash value by considering all of its fields -- or not. From a HashSet perspective it doesn't matter, the designer determines what algorithm is used to compute the hash code.
 
Pablo Napoli
Ranch Hand
Posts: 62
2
  • Mark post as helpful
  • send pies
  • Quote
  • Report post to moderator
Hi Carey, thanks for your reply. This time my email didn't let me know about it. Excuse me but when you said "There ir no 'other'" what did you mean?. I mean, did you want to say that there are no two hash code values? Because tha's my biggest doubt.
 
Carey Brown
Saloon Keeper
Posts: 6243
58
Eclipse IDE Firefox Browser MySQL Database VI Editor Java Windows
  • Mark post as helpful
  • send pies
  • Quote
  • Report post to moderator
There's only one hash code per object instance, the one returned from the hashCode() method. The hash code is not guaranteed to be unique though. The hash code is not typically stored within the object instance but there's nothing to prevent a developer from caching it in a field though it is questionable what that would achieve.

Now, what a HashSet or HashMap does with the hash code it is given is a different matter. These collections start out small and expand as needed so at any time the number of slots is less than the number of possible ints. So these collections apply some algorithm to reduce the int hash code that it is given to some range of ints that match the current allocated size of the hash Collection. This is transparent to the developer writing a Java application. The hash Collection may also choose to store a copy of the hash code as returned from the object's hashCode() method, if it does, then this is also transparent to the application developer.
 
Sheriff
Posts: 24654
58
Eclipse IDE Firefox Browser MySQL Database
  • Mark post as helpful
  • send pies
  • Quote
  • Report post to moderator

Pablo Napoli wrote:There are two kinds of hash values. One of the hash values is computed by the hashCode() method that is overriden from the object and is used to create the hash bucket if previously there were no such value in the set. The other hash value is the one computed based on the whole object and stored along with the object in an entry of the hash bucket.



No. The hash value for an object is computed from the overridden hashCode() method, that is correct. But like Carey said, there is no "other hash value". The (one and only) hash value which is computed by the hashCode() method is stored in the hashSet.

You might suspect that if the hashCode() method subsequently returns a different value for that object, then problems could arise. This is absolutely correct. One of the requirements for the hashCode() method is that it should always return the same value for a given object. There should only be one hash value.
 
Marshal
Posts: 14053
234
Mac Android IntelliJ IDE Eclipse IDE Spring Debian Java Ubuntu Linux
  • Likes 1
  • Mark post as helpful
  • send pies
  • Quote
  • Report post to moderator
I encourage you to study the contracts for both equals() and hashCode()
https://docs.oracle.com/javase/8/docs/api/java/lang/Object.html#equals-java.lang.Object-
https://docs.oracle.com/javase/8/docs/api/java/lang/Object.html#hashCode--

Take particular note of this: it is NOT required for two objects that are not equals() to have distinct values of hashCode(). That means that two objects can be in the same hash bucket but still be different from each other according to equals(). This is called a collision. Ideally, you'd have enough variety of hashCodes and buckets to avoid or minimize collisions but there is always a chance that one will happen so there has to be some kind of collision resolution strategy. A typical approach is to use the hashCode to quickly find the bucket then iterate through a list maintained for that bucket to find the exact object based on equals().
 
Pablo Napoli
Ranch Hand
Posts: 62
2
  • Mark post as helpful
  • send pies
  • Quote
  • Report post to moderator
Thanks to everyone for your replies. This mornig I've started reading these posts and then I gave with a youtube video that was amazing explained this topic and it enforces everything you tell here.

The link is: https://www.youtube.com/watch?v=c3RVW3KGIIE
 
Pablo Napoli
Ranch Hand
Posts: 62
2
  • Mark post as helpful
  • send pies
  • Quote
  • Report post to moderator
I'd just like to add an answer to a question that I had before. So why does HashSet have no get() method: The answer is related to the hash code contract because we can have two different objects with the same hash (like two persons with different id but same hash code). In case of HashMap we have get method because the hash is not the object itself (or some of its properties) but a key that must be unique.

Thanks again and it's my grain of sand for others.
 
Marshal
Posts: 7181
492
Mac OS X VI Editor BSD Linux
  • Mark post as helpful
  • send pies
  • Quote
  • Report post to moderator

Pablo Napoli wrote:Im studying Collections for my ocp exam and I was researching a bit more about some topics so I don't know if my conclusion about HashSet is right.


It supposed to be useful not just for the OCP exam, but also for your career in general understanding how the hash table based data structures work. In particular, I think that could be useful going through technical interviews.

Pablo Napoli wrote:is used to create the hash bucket if previously there were no such value in the set.


First, there is no such thing as "hash bucket". Hash is used to compute the bucket to which object (more correct its reference) supposed to reside.

In Java, HashSet is backed by HashMap, while HashMap itself is backed by an array. So for the purpose of simplicity, I'll say that HashSet values are actually stored within an array.

When the hash code is computed (int value), such value is being further computed using bit-wise AND operator ANDing hash code's array's length-1 binary representations: int index = hashCode & arrayLength-1; (or you can think of it as int index = hashCode % arrayLength; <-- a slower variant of former approach), that way you get an index where element actually gets stored.

In ideal scenario there supposed to be 1 value per array's element position, but that's not always the case. When such thing happen, it is being called a collision occured. In simple words, two unequal objects produce the same hash code value, hence end up in the same bucket. So in such event, there is no longer a singular element (well, technically one element, but I hope you get the point) stored within an array at certain index, but a List of elements.

i.e.: No longer
A[i] = N;
but
A[i] = List of N's;

So in order to retrieve the element (in case of HashSet contains() operation only, in variants of HashMap get()) you are looking for, you no longer can find an element in a constant time O(1) (big-O notation), because you need to go through the list of elements at certain index, and using equals() method to identify, which element actually is the one you are looking for.

Now, till Java 8, when the collisions did happen, Java used LinkedList to store collided elements, so the worst case scenario to retrieve an element was O(n), because you had to iterate presumably over the whole list. Since Java 8, there is some threshold, i.e. when certain amount of collisions do happen, LinkedLists are being replaced with Red black trees (binary search tree), which significantly reduces the complexity to O(log n). And that is important to understand about the complexities of certain operations and be aware. In most of the real world applications that most likely wouldn't have a serious impact of you assumed O(1), but I trust there are systems when such inaccurate assumption may have a very big and significant impact.
 
Pablo Napoli
Ranch Hand
Posts: 62
2
  • Mark post as helpful
  • send pies
  • Quote
  • Report post to moderator
Thanks Vilda, what you say at the end is also specially useful for the exam because that informaton is not in the book and they could ask about what's new in java 8 with hash structures. And now that I understand how it works it feels good

Maybe this is a silly question (I know this is even from oca topics) but I don't understand how does it possible that HashSet does not implement many methods from Set interface. I mean, at first sight, it's like it shouldn't compile.

These are HashSet's methods:

boolean add(E e)
void clear()
Object clone()
boolean contains(Object o)
boolean isEmpty()
Iterator<E> iterator()
boolean remove(Object o)
int size()

And all of this are Set's methods that are not implemente by HashSet:

boolean addAll(Collection<? extends E> c)  
boolean containsAll(Collection<?> c)
boolean equals(Object o)
int hashCode()
boolean removeAll(Collection<?> c)
boolean retainAll(Collection<?> c)
Object[] toArray()
<T> T[] toArray(T[] a)
 
Carey Brown
Saloon Keeper
Posts: 6243
58
Eclipse IDE Firefox Browser MySQL Database VI Editor Java Windows
 
Marshal
Posts: 65814
250
  • Mark post as helpful
  • send pies
  • Quote
  • Report post to moderator

Pablo Napoli wrote:. . . a youtube video . . .

That is quite a good video, but two things could be improved.
  • 1: It starts by assuming the viewer already knows about Maps
  • 2: It doesn't explain the difference between h % c and h & c − 1
  •  
    Campbell Ritchie
    Marshal
    Posts: 65814
    250
    • Mark post as helpful
    • send pies
    • Quote
    • Report post to moderator
    Please see this old post for the advantages of & over %. It never produces a negative result, and it is faster (as mentioned on the video). It does however only work well on an array whose size is an exact power of two.
     
    She said she got a brazillian. I think owning people is wrong. That is how I learned ... tiny ad:
    Building a Better World in your Backyard by Paul Wheaton and Shawn Klassen-Koop
    https://coderanch.com/wiki/718759/books/Building-World-Backyard-Paul-Wheaton
    • Post Reply Bookmark Topic Watch Topic
    • New Topic
    Boost this thread!