Better code for interview question list duplicate characters in a String

 
Mike Simmons
Master Rancher

Campbell Ritchie wrote:So it depends on the quality of the hashCode() implementation.

Well, it depends on the hashCode() quality, and on the nature of the data.

Bob Sherry wrote:I do not understand how HashMap can have an O(1) (worst case) insertion time even with an Integer key because the insertion routine is going to do a mod operation on that hash value and that could result in a collision.

Bob, I suppose it's possible that the worst case could be O(N). With ordinary data I think that's extremely unlikely, though; you'd have to work hard to create input that could exhibit that behavior. If you're dealing with deliberate, malicious input, I suppose it's possible, but even then I think you'd be limited by a shortage of distinct keys evaluating to the same hash bucket.

The average performance is certainly O(1) though.
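For context, here is a minimal sketch of the kind of HashMap-based answer the interview question is usually after (the class and method names are illustrative, not from the thread): each merge is O(1) on average, so listing the duplicate characters of a string of length n takes O(n) time.

import java.util.HashMap;
import java.util.Map;

public class DuplicateChars {
    // Returns the characters that occur more than once in the input.
    // Each put/merge on the HashMap is O(1) on average, so the whole
    // pass is O(n) for a string of length n.
    static Map<Character, Integer> duplicates(String s) {
        Map<Character, Integer> counts = new HashMap<>();
        for (char c : s.toCharArray()) {
            counts.merge(c, 1, Integer::sum);             // count every character
        }
        counts.values().removeIf(count -> count < 2);     // keep only duplicates
        return counts;
    }

    public static void main(String[] args) {
        System.out.println(duplicates("programming"));    // e.g. {r=2, g=2, m=2}
    }
}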
 
Bob Sherry
Ranch Hand
I do not know what the worst case running time of insertion into a HashMap is. It depends on the algorithm used. It should be a function of how much of the HashMap is already used. The problem is collisions.
 
Marshal
For me it's like this: Many people in the past have said that adding an entry to a Map is an O(1) operation. A lot of those people know what they are talking about, much more so than I do. So I'm inclined to accept what they say and not to accept "But collisions" as a reason to doubt it.

On the other hand, if somebody demonstrates how collisions in the addition process make the operation something other than O(1), I'm also inclined to accept that. However, since those people who know what they are talking about also knew that collisions were involved, I'm not expecting to see that demonstration.
 
Campbell Ritchie
Marshal

Mike Simmons wrote:. . . The average performance is certainly O(1) though.

Is that what they call amortised constant time?
 
Mike Simmons
Master Rancher
In this case, yes.  "Average" is my simpler, sloppier way to refer to it.

Also of interest, with a HashMap the biggest performance hit from any one insert operation will probably not come from bucket collisions, but from occasionally needing to resize the whole HashMap when it exceeds its threshold size. That is an O(N) operation for a single insert - but it only happens roughly 1/N of the time, so it balances out.
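A back-of-the-envelope sketch of why that balances out (my own toy simulation, not HashMap's actual code): with the documented defaults of capacity 16 and load factor 0.75, the table doubles every time the threshold is crossed, so the total number of entries ever rehashed stays within a small constant factor of the number of puts.

public class ResizeCost {
    public static void main(String[] args) {
        int n = 1_000_000;          // number of puts we simulate
        int capacity = 16;          // HashMap's default initial capacity
        double loadFactor = 0.75;   // HashMap's default load factor
        long copied = 0;            // entries rehashed across all resizes

        for (int size = 1; size <= n; size++) {
            if (size > capacity * loadFactor) {
                copied += size - 1; // a resize rehashes everything already present
                capacity *= 2;      // the table doubles
            }
        }
        // 'copied' stays within a small constant factor of n,
        // which is why the cost per put is O(1) amortized.
        System.out.printf("puts=%d, entries copied by resizes=%d (%.2f per put)%n",
                n, copied, (double) copied / n);
    }
}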
 
Campbell Ritchie
Marshal

Mike Simmons wrote:. . . it balances out.  

. . . and we are back with “average”. Remember the “N” in MS' formulae refers to the size of the map rather than the number of pairs being “put” into it. You can avoid such rearrangements if you predict the requisite size in advance: to accommodate 56 pairs you need an initial capacity of at least 56 ÷ 0.75 = 75, and new HashMap<>(75) gives you a 128‑bucket map, because 56 is greater than 75% (the default load factor) of 64, so 64 buckets wouldn't suffice.
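A small sketch of that presizing arithmetic (capacityFor is my own illustrative helper, not a JDK method; if memory serves, JDK 19 also added HashMap.newHashMap(int) to do the same division for you):

import java.util.HashMap;
import java.util.Map;

public class Presizing {
    // Capacity needed so that 'expectedPairs' puts never cross the
    // resize threshold (threshold = capacity * loadFactor).
    static int capacityFor(int expectedPairs, float loadFactor) {
        return (int) Math.ceil(expectedPairs / (double) loadFactor);
    }

    public static void main(String[] args) {
        int pairs = 56;
        int capacity = capacityFor(pairs, 0.75f);   // 56 / 0.75 = 75
        // HashMap rounds 75 up to the next power of two, a 128-bucket table,
        // whose threshold (128 * 0.75 = 96) comfortably holds 56 entries.
        Map<Integer, String> map = new HashMap<>(capacity);
        for (int i = 0; i < pairs; i++) {
            map.put(i, "value-" + i);               // no resize along the way
        }
        System.out.println("Stored " + map.size() + " pairs without a resize.");
    }
}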
 
Stephan van Hulst
Saloon Keeper

Campbell Ritchie wrote:Is that what they call amortised constant time?

No. Time complexity is given for different cases, usually worst case, average case and, rarely, best case. For this particular algorithm, the average case runs in amortized constant time, but an algorithm might also run in amortized constant time in the best case or in the worst case. The average case might also run in a different time complexity class.

For a hash map, the best case is when the map's loadFactor * capacity is greater than its current size n and the element type's hash function is optimal. In this case, the time complexity class is O(1).

The average case is when the map doesn't start with a large initial capacity, but the element type's hash function is decent. Collisions will occur, but they will occur less and less often as the map increases in size. The time complexity class is amortized O(1). This is the same as saying that adding m more elements to the map will run in O(m) time. Note that most of the overhead that is amortized does not come from collisions, but rather from increasing the map's capacity when it's full.

The worst case is when every element has the same hash code. In this case, all elements end up in the same bucket. Buckets are implemented by red-black trees so the time complexity class is O(log n), where n is the number of elements already in the map.
 
Stephan van Hulst
Saloon Keeper

Bob Sherry wrote:I do not know what the worst case running time of insertion into a HashMap is. It depends on the algorithm used. It should be a function of how much of the HashMap is already used. The problem is collisions.

Worst case is O(log n), because these days HashMap uses a tree to store colliding elements.

Collisions are only a problem if the element type's hashCode() implementation is poor. This is definitely not the case for Unicode characters, because there is a one-to-one mapping between Unicode characters and their hash code. Even so, collisions will occur, but they will occur less and less often as the capacity of the map increases.
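A throwaway snippet to see that one-to-one mapping for yourself: a Character's hash code is simply its char value, so distinct characters can never share a hash code.

public class CharHash {
    public static void main(String[] args) {
        System.out.println(Character.hashCode('A'));               // 65
        System.out.println(Character.hashCode('a'));               // 97
        System.out.println(Character.valueOf('\u20AC').hashCode()); // 8364, the euro sign
    }
}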
 
Campbell Ritchie
Marshal
I shall have to look at the code and see how those trees work. Can't do it now: busy.

Stephan van Hulst wrote:. . . these days HashMap uses a tree to store colliding elements. . . .

Thank you. As you say, collisions are only really a problem if they are frequent because of a poor hash code algorithm. In the “good old days” the buckets incorporated linked lists, so the worst-case complexity, as Joshua Bloch points out in Effective Java (at least in editions 1 and 2), was linear (O(n)).
 
Campbell Ritchie
Marshal
Still haven't looked at the source. If I were designing the old style Map using a linked list rather than a tree, I would add new elements to the head of the list, so there would be no performance difference caused by collisions. Any performance problems would only occur when searching the Map.
 
Stephan van Hulst
Saloon Keeper
I took a look. HashMap uses a list normally, but switches to a tree when many collisions occur. That means that it's still appropriate to say it has a worst case time complexity of O(log n).

Campbell Ritchie wrote:I would add new elements to the head of the list, so there would be no performance difference caused by collisions. Any performance problems would only occur when searching the Map.

I'm not sure what you mean. It doesn't matter whether you add new elements to the head or the tail, because LinkedList is doubly linked. Even if it wasn't, it would make no difference because an add operation first has to search through the list to make sure the element to add isn't already contained.
 
Campbell Ritchie
Marshal
I thought it was a singly‑linked list comprising the entry pairs, which each have a link to the next entry pair. Maybe I was mistaken. Did you notice how the newer implementations decide which “K”s go left and which right?
Damn! I had forgotten you have to iterate the buckets to verify the “K” is absent before putting it. Yes, that would be linear time if the hash function is really bad. Of course we shouldn't put the blame on the map; it is the bad hashCode() method that deserves the blame.
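A toy sketch of that search-then-link step in an old-style chained bucket (my own code, not the JDK's): put() has to walk the whole chain to rule out a duplicate key before it can link a new node at the head, and that walk is what turns linear when the hash function is really bad.

import java.util.Objects;

// Toy chained hash map, only to illustrate the bucket walk that put() needs.
public class TinyChainedMap<K, V> {
    private static class Node<K, V> {
        final K key;
        V value;
        Node<K, V> next;
        Node(K key, V value, Node<K, V> next) { this.key = key; this.value = value; this.next = next; }
    }

    @SuppressWarnings("unchecked")
    private final Node<K, V>[] buckets = (Node<K, V>[]) new Node[16];

    public V put(K key, V value) {
        int index = Math.floorMod(Objects.hashCode(key), buckets.length);
        // Walk the whole chain first: the key might already be present,
        // and Map semantics require replacing its value, not adding a twin.
        for (Node<K, V> n = buckets[index]; n != null; n = n.next) {
            if (Objects.equals(n.key, key)) {
                V old = n.value;
                n.value = value;
                return old;
            }
        }
        // Not found: link a new node at the head of the chain (cheap),
        // but the search above is what degrades to O(n) with a bad hash.
        buckets[index] = new Node<>(key, value, buckets[index]);
        return null;
    }

    public V get(K key) {
        int index = Math.floorMod(Objects.hashCode(key), buckets.length);
        for (Node<K, V> n = buckets[index]; n != null; n = n.next) {
            if (Objects.equals(n.key, key)) return n.value;
        }
        return null;
    }

    public static void main(String[] args) {
        TinyChainedMap<Character, Integer> map = new TinyChainedMap<>();
        map.put('a', 1);
        map.put('a', 2);                   // replaces, doesn't duplicate
        System.out.println(map.get('a'));  // 2
    }
}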
 
Stephan van Hulst
Saloon Keeper
I took a deeper look into the code. HashMap indeed uses a singly linked list for buckets up to 8 elements, after which it will treeify the bucket.

As I already mentioned, the tree is a red-black tree, but it's a custom implementation and doesn't reuse TreeMap (likely because keys are not necessarily Comparable). Entries are ordered by hash code first and, if the keys are Comparable, by key second.

Interestingly, this means that if the element type's hash function is bad, an O(log n) time complexity can only be maintained if the keys are Comparable. Otherwise, performance degrades to linear time.
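A hedged demo of that last point (my own code; actual timings depend on the JDK and machine): both key types below cram every entry into a single bucket, but only the Comparable one lets the treeified bucket order entries by key, so the non-Comparable one degrades towards linear scans.

import java.util.HashMap;
import java.util.Map;

public class WorstCaseDemo {

    // Every instance reports the same hash code, so all entries
    // share one bucket and HashMap eventually treeifies it.
    static final class ComparableKey implements Comparable<ComparableKey> {
        final int id;
        ComparableKey(int id) { this.id = id; }
        @Override public int hashCode() { return 42; }
        @Override public boolean equals(Object o) {
            return o instanceof ComparableKey && ((ComparableKey) o).id == id;
        }
        @Override public int compareTo(ComparableKey other) { return Integer.compare(id, other.id); }
    }

    // Same pathological hash code, but no Comparable to fall back on:
    // inside the treeified bucket, searches degrade towards linear scans.
    static final class PlainKey {
        final int id;
        PlainKey(int id) { this.id = id; }
        @Override public int hashCode() { return 42; }
        @Override public boolean equals(Object o) {
            return o instanceof PlainKey && ((PlainKey) o).id == id;
        }
    }

    public static void main(String[] args) {
        int n = 10_000;

        Map<ComparableKey, Integer> withComparable = new HashMap<>();
        long t0 = System.nanoTime();
        for (int i = 0; i < n; i++) withComparable.put(new ComparableKey(i), i);
        long comparableTime = System.nanoTime() - t0;

        Map<PlainKey, Integer> withoutComparable = new HashMap<>();
        long t1 = System.nanoTime();
        for (int i = 0; i < n; i++) withoutComparable.put(new PlainKey(i), i);
        long plainTime = System.nanoTime() - t1;

        System.out.printf("Comparable keys: %d ms, non-Comparable keys: %d ms%n",
                comparableTime / 1_000_000, plainTime / 1_000_000);
    }
}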
 