hashCode() method 'initial' value

 
Stephen Bell (Ranch Hand)
Hi All,

I've recently been looking more into the inner workings of Java's HashMap which led me to revisit the hashCode() and equals() methods.

Now, specifically concerning the hashCode() method: I've seen various examples that use an 'initial' value, which is supposed to be a non-zero (odd prime) number. This is not the multiplier, which is also a prime number. The same 'initial' value is present if I let my IDE generate the method for me. Take the following example, if you will...
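
(A sketch of the sort of IDE-generated method I mean; the field names are just placeholders.)

    @Override
    public int hashCode() {
        int result = 7;                    // the 'initial' value in question
        result = 31 * result + noOfLegs;   // 31 is the (prime) multiplier
        result = 31 * result + age;
        return result;
    }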



I've been trying to figure out in what situation(s) this initial value might be beneficial in reducing bucket collisions but I can't seem to find any actual examples anywhere.

Is it possible someone could help me out here? Why this initial value? Could someone possibly show a simple code example where there would be more collisions if this value were either...

  • Omitted altogether / set to 0
  • Set to any value other than an odd prime

I've been running my own tests and as yet they're inconclusive.

The only situation I can think of is when sub-classing. If the equals() method in the super-class were to use getClass() to compare instances, then obviously an instance of the sub-class could not be 'equal' to an instance of the super-class; however, they could still return the same hash code. So if we then overrode the hashCode() method in the sub-class, we could use a different initial value to get a different hash code, potentially avoiding a collision. I realise that the two objects returning the same hash code does not break the hashCode() contract, but if we know the objects aren't of the same type, then we may as well take action.

Am I on the right track, or are there other (non-sub-classing) situations where this initial value is actually beneficial?

Thanks all, your thoughts are really appreciated!
     
    Paul Clapham (Marshal)
    It's just a calculation, broken into three separate steps. Here's another way of writing it:
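
    (A sketch, reusing the placeholder fields from the example above; the method names are just for illustration.)

        // the calculation broken into three steps...
        int hashInSteps(int noOfLegs, int age) {
            int result = 7;
            result = 31 * result + noOfLegs;
            result = 31 * result + age;
            return result;
        }

        // ...is exactly the same calculation written as one expression
        int hashInOneGo(int noOfLegs, int age) {
            return 31 * (31 * 7 + noOfLegs) + age;
        }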



    It doesn't really make any difference at all if you change the 7 to 11 or to pretty much anything else.

    As for your subclassing idea: The rule is "If two objects are equal then they must have the same hashcode." So if two objects are in different classes then they should not be equal, and therefore it doesn't matter if their hashcodes are equal or not. Remember that it's possible for two objects in the same class which are not equal to have the same hashcode.

    And in practice unless your program is going to be spending a large fraction of its time putting things into a Map and getting them out again, it's a waste of your time fiddling around with hashing algorithms. If you've got code written by somebody who seems to know what they are talking about, then it's fine. But there's a method in the standard Java API which takes care of that for you:
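
    (Presumably java.util.Objects.hash(), which combines the given fields using the same 31-based scheme; the fields are the illustrative ones from above.)

        @Override
        public int hashCode() {
            // Objects.hash boxes its arguments and applies the multiply-by-31 recipe for you
            return java.util.Objects.hash(noOfLegs, age);
        }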



    This was written by people who do know what they are doing and it produces a value you can return from your class's hashCode() method. I'd recommend just using that.
     
    Stephen Bell (Ranch Hand)
    Thanks Paul, appreciate your reply.

    I realise that two distinct objects can return the same hash code and that this doesn't break the hashCode() contract (as I stated above). But say we had an Animal class and Dog and Bird subclasses; then it's perfectly possible (and valid) for a Dog and a Bird to return the same hash code, and if they do, they will be placed into the same HashMap bucket. Here we could use different initial values to further reduce the possibility of a collision. I mean, we know a Dog and a Bird can't possibly be the same object, so I guess there's no harm in trying to get them to hash to different buckets! This is the only reason I can think of for the existence of that initial value (although, to be honest, we could just as easily return 'super.hashCode() * someArbitraryValue;' from the subclasses too!).

    I keep seeing this quote all over the internet...

    A nonzero initial value is used in step 1, so the hash value will be affected by initial fields whose hash value, as computed in step 2.a, is zero. If zero was used as the initial value in step 1, the overall hash value would be unaffected by any such initial fields, which could increase collisions. The value 17 is arbitrary



    (Where 17 is 7 in my example).  I understand the first part here, but what I don't get is how this in any way reduces the possibility of collisions (apart from the use-case stated).

    While I do understand that we can just ignore it, there clearly is a reason why this initial value exists; I'm simply trying to find that reason!

    Thanks again for the reply

     
    Campbell Ritchie (Marshal)

    Stephen Bell wrote:. . . what I don't get is how this in any way reduces the possibility of collisions (apart from the use-case stated). . . .

    Nor do I. Not even in the use case quoted. Please show us one of the links you found, so we can have a look at it. Though I am sure it can't do any harm if you start a hash code from a prime number. 7 is, by the way, a Mersenne prime.
     
    Stephen Bell (Ranch Hand)
    Here is one of many

    https://courses.cs.ut.ee/2017/OOP/spring/Main/HashCode

    It appears that this quote is taken from the book 'Effective Java' by Joshua Bloch.

    Re the subclassing example I gave, just to clarify. Suppose we have an Animal superclass and, this time, Dog and Cat subclasses extending it. All of the classes take two parameters, 'int noOfLegs' and 'int age', in their constructors. The equals() method has been overridden using both 'noOfLegs' and 'age' to determine equality, and the hashCode() method has been overridden using the code in the initial post with our two parameters. Now suppose we did the following...
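
    (The constructor arguments are illustrative; both objects get identical field values.)

        Animal dog = new Dog(4, 2);   // 4 legs, age 2
        Animal cat = new Cat(4, 2);   // same number of legs, same age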



    Now, obviously, calling dog.equals(cat) would return false, as they are clearly not 'equal' (because their runtime classes are different). However, calling dog.hashCode() and cat.hashCode() would result (assuming both classes are using the hashCode() method from their superclass and haven't yet implemented their own) in both returning the same hash code, and therefore both objects being shoved into the same bucket in our HashMap.

    However, if we override hashCode() in each of the subclasses but this time use a distinct initial value, then calling hashCode() as above would yield different values, and therefore there would be less chance of the two objects going into the same bucket.

    Like I say, it's just an example and I only mention it in the context of trying to understand this initial value (I don't think this is what is being alluded to in the book, but it's the only real example I can come up with where the initial value would make any difference).

    Cheers
     
    Mike Simmons (Master Rancher)
    For a class with a fixed number of fields, I'm not sure there's a big benefit to having a nonzero initial value for the hash. However, this is also the algorithm used for Objects.hash(Object...) and Arrays.hashCode(Object[]), and in those more general use cases a nonzero initial value is helpful for differentiating leading nulls or zeroes in the array. Here's some code that shows the problem (see the sketch below). The hash() method is equivalent to the standard Objects.hash() method, while altHash1() and altHash2() are two slightly different but functionally equivalent implementations that set the initial value to 0 rather than 1. If you run the main method, you can see that for the last five test cases, both altHash1 and altHash2 give the same result for all five. That, I believe, is what they want to avoid by using this algorithm.
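
    A sketch along those lines (the hash/altHash method names follow the description above; the class name, the power31 helper and the test values are my own illustration):

        import java.util.Arrays;

        public class InitialValueDemo {

            // equivalent to Objects.hash(): start from 1, multiply by 31 at each step
            static int hash(Object... values) {
                int result = 1;
                for (Object value : values) {
                    result = 31 * result + (value == null ? 0 : value.hashCode());
                }
                return result;
            }

            // the same algorithm, but starting from 0 instead of 1
            static int altHash1(Object... values) {
                int result = 0;
                for (Object value : values) {
                    result = 31 * result + (value == null ? 0 : value.hashCode());
                }
                return result;
            }

            // functionally equivalent to altHash1, written with explicit powers of 31
            static int altHash2(Object... values) {
                int result = 0;
                for (int i = 0; i < values.length; i++) {
                    int h = (values[i] == null ? 0 : values[i].hashCode());
                    result += h * power31(values.length - 1 - i);
                }
                return result;
            }

            static int power31(int exponent) {
                int p = 1;
                for (int i = 0; i < exponent; i++) {
                    p *= 31;
                }
                return p;
            }

            public static void main(String[] args) {
                Object[][] tests = {
                    {"a", "b"}, {1, 2, 3},                  // ordinary values
                    {}, {0}, {0, 0}, {null}, {null, null}   // the last five: empty array, leading zeroes, nulls
                };
                for (Object[] test : tests) {
                    System.out.printf("%-14s hash=%-11d altHash1=%-11d altHash2=%-11d%n",
                            Arrays.toString(test), hash(test), altHash1(test), altHash2(test));
                }
            }
        }

    With the initial value of 1, the empty array, {0} and {0, 0} all hash differently; with an initial value of 0 they all collapse to the same value.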
     
    Stephen Bell (Ranch Hand)
    Thanks Mike,

    This is just what I was looking for, an actual code example showing the problem that this initial value fixes.

    Cheers



     
    Tim Holloway (Saloon Keeper)
    If you want an exhaustive guide to hashing, the authoritative reference is Donald Knuth's The Art of Computer Programming, Volume 3: Sorting and Searching.

    There's also a concept known as a "perfect hash", where for a given hash table every slot is used and there are no overflows. That's optimally efficient in terms of space and time, although realistically it's only going to be true for an immutable table of constants. Say, for example, you want a quick way to check whether a word is a SQL (or Java!) reserved word.
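
    For instance, a minimal sketch of the idea (the word set and the slot function are chosen purely for illustration):

        // A hand-built perfect hash for a tiny, fixed set of reserved words.
        // The function word.length() * word.charAt(1), taken mod 5, happens to map
        // each of these five words to a distinct slot 0..4: no chains, no empty slots.
        static final String[] RESERVED = new String[5];
        static {
            for (String w : new String[] {"if", "for", "else", "while", "return"}) {
                RESERVED[slot(w)] = w;
            }
        }

        static int slot(String w) {
            return (w.length() * w.charAt(1)) % 5;
        }

        static boolean isReserved(String w) {
            if (w.length() < 2) return false;   // every word in this set has at least two characters
            return w.equals(RESERVED[slot(w)]);
        }

    Each of the five words lands in its own slot, so membership is a single array probe plus one equals() check.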

    If you know your data, you can fine-tune hash functions and hashtable sizes for optimal storage and access. That will ensure that you don't end up with a lot of empty cells and a few that form long overflow chains. Prime numbers are good for that, but really anything that ensures an even distribution will do, including semi-numerical algorithms.
     
    Campbell Ritchie (Marshal)
    How up to date is Knuth?
     
    Tim Holloway (Saloon Keeper)

    Campbell Ritchie wrote:How up to date is Knuth?



    Knuth's books aren't that kind of publication. He deliberately abstracted the coding examples, and most of the important stuff isn't so much original work as it is a distillation and explication, in one handy reference, of basic research done by many people up until that time. And while that time only runs up to about 1960, it covers most of the bedrock principles that we use to this day. As far as I can recall, for example, every major sorting algorithm in use on present-day platforms is based on one of the ones enumerated in Knuth. Same for hashing. And it was rigorously assembled: Knuth was famous for his "bug bounty".

    Knuth's work pre-dates Structured Programming, and doesn't account for even the primitive forms of Object-Oriented Programming in play at the time, but the algorithms are presented in their basic form, so those advances are generally not material to his work.

    It also doesn't address multi-processing, content-addressable memory or (of course) quantum computing, but nevertheless, you'd find it pretty hard to write most programs of any significance without using one or more of the algorithms in Knuth. Likewise, many modern-day Computer Science textbooks will still reference Knuth.

    The one awful thing about Knuth's works (other than that he promised us seven volumes and only delivered three or so) is his choice of language for code samples. The languages of the day were much more varied in features and syntax than most popular modern-day languages are, so he concocted an "assembly language" called MIX so as not to favor any particular platform. To run MIX, you needed a MIX virtual machine, since no actual physical hardware implemented that particular instruction set.

    Any assembly language is an abomination for illustrating algorithms. Hardly anyone uses assembler anymore and it's not the easiest thing to mentally parse even for those of us who spent years doing so for a living. Fortunately, Knuth also included lots of helpful illustrations.

    But if there's one thing I wish he'd done differently, it would have been to do what Edsger Dijkstra did nearly a decade later. Dijkstra's portable code was an Algol-like language, and thus more aligned with C, Pascal, Java, Ruby, and even Python, making it easier to convert to practical code (and to comprehend).

    On the other hand, Knuth is so fundamental that many of "his" algorithms are now realized in the standard libraries of virtually all modern programming languages where most people take advantage of them without knowing where they originated.
     
    Stephen Bell (Ranch Hand)
    That sounds interesting Tim.  Thanks for the information, much appreciated.
     
    Campbell Ritchie (Marshal)
    What an answer

    Tim Holloway wrote:. . . Knuth's work pre-dates Structured Programming, and doesn't account for even the primitive forms of Object-oriented Programming in play at the time . . . .

    I always thought OO programming started with Simula, which was seven years after Knuth.

    So there aren't any new hashing algorithms since then? And questions like whether there is or isn't an initial value aren't relevant to discussing Knuth's work?
     
    Tim Holloway (Saloon Keeper)

    Campbell Ritchie wrote:What an answer

    Tim Holloway wrote:. . . Knuth's work pre-dates Structured Programming, and doesn't account for even the primitive forms of Object-oriented Programming in play at the time . . . .

    I always thought OO programming started with Simula, which was seven years after Knuth.

    So there aren't any new hashing algorithms since then? And questions like whether there is or isn't an initial value aren't relevant to discussing Knuth's work?



    I was a little surprised when I looked at my books and saw they were copyrighted 1968 and later, although the work began in 1962 and, as I said, draws heavily on papers published in the 1950s, which was a fertile era. Still, what I said about the lack of a "universal" teaching language was as true then as in 1960.

    Alan (Smalltalk) Kay credits LISP, which does indeed date to the 1950s, as a major inspiration for OOP. MIT is recorded as having been playing with OOP in 1960. Our local library once had a book by Ivan Flores that discussed early programming languages, and it probably contained some interesting material written around the same time as Knuth, but alas, I can't even find the title anymore. Speaking of OOP, my 1975 book on the structure and design of programming languages, from what was to have been a premier series on Computer Science, doesn't mention OOP at all!

    New hashing algorithms are being invented all the time. Knuth, as I said, discussed the theory. And if 43 pages on hashing were not enough, one could also obtain much useful information from his extensive treatment of how to "proof" random-number generation schemes in Volume 2. For the hash to be optimal, it is important to construct it to take into account any statistical irregularities in the inputs. The stock Java hash algorithm is, as they say, "close enough for government work", but is unlikely to be truly optimal for a given set of data.


    But as regards the original question:

    The original hash constant is a starting point. Since all of the subsequent calculations are based on multiplication, its primary function is to ensure that the hash won't simply consist of a chain of multiplications by zero and thus result in 0 if the initial value was 0.

    Both the initial hash value and the hash multiplier are prime numbers because a product of primes can be produced in only one way, whereas a product of non-primes can be produced in multiple ways, meaning a higher chance of duplicate (colliding) hash values.

    There does seem to be something missing from that hash function, however. As I recall, there should be a modulus computed, since as an index into a fixed-sized hashtable it is necessary to ensure that the slot that the hash addresses really exists.

     
    Mike Simmons (Master Rancher)

    Tim Holloway wrote:There does seem to be something missing from that hash function, however. As I recall, there should be a modulus computed, since as an index into a fixed-sized hashtable it is necessary to ensure that the slot that the hash addresses really exists.


    Well, there's an implicit modulus of 2^32, the size of an int. But also, within HashMap and similar classes, they take the hash from hashCode() and hash it further, including a modulus for the current size of the hash table.
     
    Campbell Ritchie (Marshal)
    MS is correct. They do some right‑shifting and XOR‑ing (remember the shift operator has a higher precedence than XOR); I think the more recent implementations only do one shift, by 16 bits. They then do a bitwise AND with the maximum array index (remember the subtraction operator has a higher precedence than AND); something like the sketch below. If the hash algorithm works well, that bitwise AND technique gives an even distribution throughout the array indices' range iff the capacity of the array is an exact power of two (including a one‑element array, since 2⁰ = 1). But, as MS said, all this bitwise jiggery‑pokery is usually done by the Map object, not the key.
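
    (A sketch of the two steps; the method and parameter names are illustrative, and the table parameter stands in for the map's internal bucket array.)

        static int bucketIndex(Object key, Object[] table) {
            // spread the high bits of the hash code down into the low bits
            int h = key.hashCode();
            h = h ^ (h >>> 16);                  // >>> binds more tightly than ^
            // choose the bucket: the table length is always a power of two,
            // so (length - 1) is a mask of low-order one bits
            return h & (table.length - 1);       // - binds more tightly than &
        }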
     
    Tim Holloway (Saloon Keeper)
    I think HashMap has changed once or twice since its inception. I recall that a while back I looked at the source and it had special-case logic for map sizes that were powers of two, or something like that. The current open-source implementation has a lot of TreeMap logic that I don't recall seeing back then. And I think that perhaps the original implementation had neither of the above and depended almost entirely on prime-number-sized maps for default optimal performance, although that's going back too far for my muzzy memory to be sure of.

    But yes, thanks for the reminder. The java.lang.Object hashCode() method is generic; hashtables need hash methods specific to their map architecture. The Object hashCode() method is a convenient starting point, but the hash() method within java.util.HashMap is final and thus cannot be customised. You can, however, override hashing by writing custom get()/put() methods, though you need to be aware of what that will do to the underlying assumptions.

    In short, HashMap is a very specific hashtable implementation and has certain assumptions and constraints which may not be ideal in all cases. For such instances you'd have to provide your own custom implementation of the Map interface, or at least do significant method overrides. And note that unless the implementation spec definitely says so, all internal optimisation and organisation algorithms used are dependent on the whims of the library implementer; only the external contract is guaranteed.
     
    Tim Holloway (Saloon Keeper)
    Post Scriptum: The classic hashtable consists of a static set of hash slots. The HashMap is self-tuning, meaning that, like ArrayList, it can transparently expand its slot capacity (transparent in terms of functionality, that is). The cost is that you can potentially take an unexpected performance hit on any given put() operation, and even trigger an OutOfMemoryError!
     
    Paul Clapham (Marshal)

    Stephen Bell wrote:This is just what I was looking for, an actual code example showing the problem that this initial value fixes.



    I don't think it actually shows any problem, it just shows what happens if you do or don't use it. As I understand what Mike wrote, if you have an object with no attributes which can be used to construct a hash code, then its hash code must be a constant. The "initial value" prevents that constant from being zero.

    I don't really see a problem with having objects whose hash code is zero; it's possible for String objects to have zero as their hash code, for example. But maybe there's some other problem which using 7 rather than 0 for the hash code of an attribute-free object is the solution to.
     
    Mike Simmons (Master Rancher)
    Well, a zero hash code is also used for null.  So in cases where you want to distinguish between a null, and an object with a single field with default value 0, it's nice if the latter evaluates to something nonzero.
     
    Mike Simmons (Master Rancher)
    Re: earlier comments on hash table / hash map evolution, yes I do recall that they used to have prime sizes, and switched at some point to powers of two.  I think that was to allow faster division and modulus arithmetic.  And then they also added more complex re-hashing because they'd had problems with some values leading to collisions too often... but the current code I see looks simpler than what they originally added:
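
    (Likely the hash() spreading method; in recent JDKs it reads along these lines:)

        // XOR the high 16 bits of the key's hash code into the low 16 bits,
        // so that power-of-two masking still feels the influence of the high bits
        static final int hash(Object key) {
            int h;
            return (key == null) ? 0 : (h = key.hashCode()) ^ (h >>> 16);
        }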

    This is from HashMap.java in JDK 15.
     
    Paul Clapham (Marshal)

    Mike Simmons wrote:Well, a zero hash code is also used for null.  So in cases where you want to distinguish between a null, and an object with a single field with default value 0, it's nice if the latter evaluates to something nonzero.



    We're down to the tiniest of issues here, but you have to write the Java API to be used by everybody everywhere, so that could possibly make a difference to somebody.
     
    Paul Clapham (Marshal)

    Mike Simmons wrote:Re: earlier comments on hash table / hash map evolution, yes I do recall that they used to have prime sizes, and switched at some point to powers of two.



    And if you're using powers of two then you don't want your hash-code calculation to produce more even numbers than odd numbers. That, I suppose, is why the multiplications by 31 and other odd primes are in many published versions.

    On the other hand if your number of buckets is a prime number p then that doesn't matter. It's only an issue if your hash-codes happen to be frequently divisible by p, which is not likely.

    But prime numbers are harder to find than powers of two, and dividing by a prime number is slower than dividing by a power of two, so you can see why that switch happened.
     
    Mike Simmons (Master Rancher)

    Paul Clapham wrote:We're down to the tiniest of issues here, but you have to write the Java API to be used by everybody everywhere, so that could possibly make a difference to somebody.


    Agreed.  One of the takeaways from Effective Java for me was that apparently minor defects in code can turn out to be big problems for somebody, if your library is widely used enough.
     
    Campbell Ritchie (Marshal)

    Mike Simmons wrote:. . . the current code . . . simpler than what they originally added . . .

    Yes, they used to use a more complicated rehashing technique. They also used to put successive Entries in the same bucket into a linked list. Now they change that List to a sort of red‑black tree if its size is over eight.
     
    Tim Holloway (Saloon Keeper)

    Campbell Ritchie wrote:

    Mike Simmons wrote:. . . the current code . . . simpler than what they originally added . . .

    Yes, they used to use a more complicated rehashing technique. They also used to put successive Entries in the same bucket into a linked list. Now they change that List to a sort of red‑black tree if its size is over eight.



    Which means it's not really a Hashtable in the traditional sense, it's something more complex.
     
    Liutauras Vilda (Marshal)

    Campbell Ritchie wrote:They also used to put successive Entries in the same bucket into a linked list. Now they change that List to a sort of red‑black tree if its size is over eight.


    Just for the record, I think that happened in Java 7. Since then the worst case for a get() operation has been O(log n) as opposed to O(n).
     
    Tim Holloway (Saloon Keeper)

    Liutauras Vilda wrote:

    Campbell Ritchie wrote:They also used to put successive Entries in the same bucket into a linked list. Now they change that List to a sort of red‑black tree if its size is over eight.


    Just for the record, I think that happened in Java 7. Since then the worst case for a get() operation has been O(log n) as opposed to O(n).



    And the worst case for put???

    Note that the worst-case get for a Perfect Hash is O(1). All this fancy re-balancing is using the run-time as a substitute for an optimal initial allocation.
     
    Liutauras Vilda (Marshal)

    Tim Holloway wrote:Note that the worst-case get for a Perfect Hash is O(1)


    That's technically impossible, given that a hash code is an int value.

    Tim Holloway wrote:And the worst case for put???


    Same O(logN) I'd think.

     
    Campbell Ritchie (Marshal)
    I thought put() runs in amortised constant time.
     
    Paul Clapham (Marshal)

    Liutauras Vilda wrote:

    Tim Holloway wrote:Note that the worst-case get for a Perfect Hash is O(1)


    That's technically impossible given hash code is an int value.



    No, really: a perfect hash function can only be constructed if you know all of the keys in advance. The function is then constructed so that it maps each of the N keys to a distinct integer in a set of N integers.
     
    Liutauras Vilda (Marshal)

    Paul Clapham wrote:No, really, a perfect hash function can only be constructed if you know all of the keys in advance. Then the function is constructed so that it maps each of the N keys to each of a set of N distinct integers.


    Oh I see, I slightly misread what was said there. My apologies. I understood it as "... but if the hash function was well implemented...", although I hadn't heard the term "perfect hash" for a while and had forgotten it, so I couldn't reassemble the actual thought.

    Campbell Ritchie wrote:I thought put() runs in amortised constant time.


    But that would be an average, right? Which I'd agree is correct.
     
    Campbell Ritchie (Marshal)
    Yes, amortisation means averaging.
     
    Stephen Bell (Ranch Hand)

    Campbell Ritchie wrote:I thought put() runs in amortised constant time.



    Interesting, as I checked back to ask just this.  I'm trying to get my head around the time complexities for put and get operations of a HashMap and my understanding thus far is:

    get operations are 'expected O(1)' because, with good hashing, most get operations would be expected to complete in O(1) time, but a get could be O(log n) if multiple elements are in the bucket and are stored in a balanced tree, or O(n) if the elements are stored in a linked list. So: expected O(1), with worst case O(n).

    put operations are 'amortised O(1)' because a put where the load factor has not yet been exceeded would again be expected to be O(1) (with similar reasoning to the get operations above), with this rising to O(n) when the load factor is exceeded and re-hashing is required.

    Am I on the right track with these assumptions?


    Thanks :-)
     
    Campbell Ritchie (Marshal)
    For having a question mentioned in the December 2020 CodeRanch Journal, congratulations: this question earns you a cow
     