The HashMap class does the same hashing algorithm (for the keys) as the HashSet class. None of the List based class does any hashing. The HashMap class already does the same duplicate detection and removal as the HashSet class. None of the List based class does any duplicate detection and removal.
Basically, a HashSet is simply a HashMap where the value is not used. Why not simply use a HashMap instance where the value is not used?
While inserting elements, using the contains() method, we can check for duplicate elements in list. Put() operation in hashmap, involves several operations and seems to be time consuming.
. . .
The contains() method for a List runs in linear time; the containsKey() method of a hash map runs in amortised constant time if the objects of the “K” have a well implemented hashCode() method. So you will get much faster performance with a hash map for a large set.
The put() method of HashMap requires some time for rehashing and inserting a Map.Entry instance into an array, but rehashing can be done in 4‑6 clock cycles; the add() method of an ArrayList can insert an element into an array probably in one clock cycle; both will from time to time need to enlarge the backing array. So even if the adding operation is faster for the array list than for the hash map, the time to search a List to find whether it already contains an element would overwhelm that and a List&‑based implementation would have much slower performance except in the one case where performance doesn't matter: very small collections. Using hash codes to find elements will give performance just as fast for a 1,000,000‑element collection as for a 10‑element collection, as long as the hash codes are calculated correctly.
As Henry has told you, a hash map can do all the operations you need from a set; you aren't interested in the “V”, so you can use the same object for the “V” in every pair.
posted 1 year ago
"but rehashing can be done in 4‑6 clock cycles.."
I understand rehashing is the process of applying the hash function to move the entries to another hashset.
Mary Rani wrote:
What do we mean by "4-6 clock cycles"?
A CPU uses a clock signal to drive it. For example, a 3 Gigahertz CPU, has about 3 billion cycles per second. Each cycle is one up to 5 volts and back down of that clock. Considering that most processors can do at least one instruction per cycle these days, that works out to 3 billion instructions per core per second.
Anyway, to answer your question, 6 cycles on a 3 Gigahertz CPU works out to about (1 / 500,000,000) of a second...
The rehashing requires two or three shift operations and the same number of bitwise exclusive or operations, each of which can probably be reduced to a single clock cycle. That means a bitwise shift/exclusive or can be done as one instruction by the chip.
I yam what I yam and that's all that I yam - the great philosopher Popeye. Tiny ad:
Try Free Java/.NET Libraries for Word Excel PowerPoint and PDF