Win a copy of Succeeding with AI this week in the Artificial Intelligence and Machine Learning forum!
  • Post Reply Bookmark Topic Watch Topic
  • New Topic
programming forums Java Mobile Certification Databases Caching Books Engineering Micro Controllers OS Languages Paradigms IDEs Build Tools Frameworks Application Servers Open Source This Site Careers Other all forums
this forum made possible by our volunteer staff, including ...
Marshals:
  • Campbell Ritchie
  • Paul Clapham
  • Ron McLeod
  • Liutauras Vilda
  • Junilu Lacar
Sheriffs:
  • Tim Cooke
  • Jeanne Boyarsky
  • Knute Snortum
Saloon Keepers:
  • Stephan van Hulst
  • Tim Moores
  • Tim Holloway
  • Carey Brown
  • Piet Souris
Bartenders:
  • salvin francis
  • fred rosenberger
  • Frits Walraven

HashSet internal implementation

 
Greenhorn
Posts: 14
  • Mark post as helpful
  • send pies
  • Quote
  • Report post to moderator
why does hashset internally use hashmap? Is it not a design overhead? what are the possible advantages of using hashmap instead of any other collection, like a list?
 
Marshal
Posts: 69065
275
  • Mark post as helpful
  • send pies
  • Quote
  • Report post to moderator
Welcome to the Ranch

Please explain how you think a List would give better performance than a Map behind the Set implementation.
 
Mary Rani
Greenhorn
Posts: 14
  • Mark post as helpful
  • send pies
  • Quote
  • Report post to moderator
Thank you!

While inserting elements, using the contains() method, we can check for duplicate elements in list. Put() operation in hashmap, involves several operations and seems to be time consuming.

I might be missing some important aspects here. It would be helpful if the right pointers can be given to understand why hashmap is the best fit.
 
author
Posts: 23874
142
jQuery Eclipse IDE Firefox Browser VI Editor C++ Chrome Java Linux Windows
  • Mark post as helpful
  • send pies
  • Quote
  • Report post to moderator

The HashMap class does the same hashing algorithm (for the keys) as the HashSet class. None of the List based class does any hashing. The HashMap class already does the same duplicate detection and removal as the HashSet class. None of the List based class does any duplicate detection and removal.

Basically, a HashSet is simply a HashMap where the value is not used. Why not simply use a HashMap instance where the value is not used?

Henry
 
Campbell Ritchie
Marshal
Posts: 69065
275
  • Mark post as helpful
  • send pies
  • Quote
  • Report post to moderator

Mary Rani wrote:Thank you!

That's a pleasure

While inserting elements, using the contains() method, we can check for duplicate elements in list. Put() operation in hashmap, involves several operations and seems to be time consuming.
. . .

The contains() method for a List runs in linear time; the containsKey() method of a hash map runs in amortised constant time if the objects of the “K” have a well implemented hashCode() method. So you will get much faster performance with a hash map for a large set.
The put() method of HashMap requires some time for rehashing and inserting a Map.Entry instance into an array, but rehashing can be done in 4‑6 clock cycles; the add() method of an ArrayList can insert an element into an array probably in one clock cycle; both will from time to time need to enlarge the backing array. So even if the adding operation is faster for the array list than for the hash map, the time to search a List to find whether it already contains an element would overwhelm that and a List&‑based implementation would have much slower performance except in the one case where performance doesn't matter: very small collections. Using hash codes to find elements will give performance just as fast for a 1,000,000‑element collection as for a 10‑element collection, as long as the hash codes are calculated correctly.
As Henry has told you, a hash map can do all the operations you need from a set; you aren't interested in the “V”, so you can use the same object for the “V” in every pair.
 
Mary Rani
Greenhorn
Posts: 14
  • Mark post as helpful
  • send pies
  • Quote
  • Report post to moderator

"but rehashing can be done in 4‑6 clock cycles.."


I understand rehashing is the process of applying the hash function to move the entries to another hashset.

What do we mean by "4-6 clock cycles"?
 
Henry Wong
author
Posts: 23874
142
jQuery Eclipse IDE Firefox Browser VI Editor C++ Chrome Java Linux Windows
  • Mark post as helpful
  • send pies
  • Quote
  • Report post to moderator

Mary Rani wrote:
What do we mean by "4-6 clock cycles"?



A CPU uses a clock signal to drive it. For example, a 3 Gigahertz CPU, has about 3 billion cycles per second. Each cycle is one up to 5 volts and back down of that clock. Considering that most processors can do at least one instruction per cycle these days, that works out to 3 billion instructions per core per second.

Anyway, to answer your question, 6 cycles on a 3 Gigahertz CPU works out to about (1 / 500,000,000) of a second...

Henry
 
Campbell Ritchie
Marshal
Posts: 69065
275
  • Mark post as helpful
  • send pies
  • Quote
  • Report post to moderator
The rehashing requires two or three shift operations and the same number of bitwise exclusive or operations, each of which can probably be reduced to a single clock cycle. That means a bitwise shift/exclusive or can be done as one instruction by the chip.
 
I yam what I yam and that's all that I yam - the great philosopher Popeye. Tiny ad:
Try Free Java/.NET Libraries for Word Excel PowerPoint and PDF
htttp://www.e-iceblue.com/free-apis.html
    Bookmark Topic Watch Topic
  • New Topic