# Hashmap operational upper bounds.

Paul Yule

Ranch Hand

Posts: 230

posted 8 years ago

I have been confused about this for some time, specifically how it actually works. From what I've been taught, hash maps have a complexity of O(1) when searching for data. I can't quite wrap my head around how a data structure can do this.

If you have a sorted array of items {1, 2, 3, 4, 5, 6, 7, 8, 9, 10}, the quickest search that can be performed is O(log n): start at the middle and halve the remaining entries each time.

How does a hash map work in such a way that, when it gets a key to search for, it jumps straight to the correct spot without any knowledge of what it contains and without checking whether it has found the correct key?

posted 8 years ago

Accessing an indexed element of an array is O(1), right? You just go straight to the numbered element. There's no searching involved.

A hash map works by mapping every possible object onto a particular element in an array using a "hashing function." In Java, the hashing function is generally based on the Object.hashCode() method. For example, the expression

`Math.abs(theObject.hashCode() % 100)`

maps every possible object onto a number between 0 and 99, inclusive.

So you create a 100-element array, and when you need to store an object, you evaluate the above expression, and you know which array element to put it in. Since it's entirely likely that more than one object will map to the same number, you put a linked list into each array element, and store the objects into those linked lists (you can be lazy about initializing them, of course.)

If you need to look up an object, you evaluate the same expression, and search the appropriate linked list, using Object.equals() to choose between the items in the list.
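The scheme just described can be sketched in a few lines of Java. This is a toy illustration (the class name `SimpleHashMap` and the fixed bucket count of 100 are my own choices for the example, not Java's real implementation):

```java
import java.util.LinkedList;

// Toy hash map: a fixed array of buckets, each a lazily created linked
// list of key/value pairs. Illustrates the scheme described above;
// not how java.util.HashMap is actually implemented.
public class SimpleHashMap<K, V> {
    private static class Entry<K, V> {
        final K key;
        V value;
        Entry(K key, V value) { this.key = key; this.value = value; }
    }

    private final LinkedList<Entry<K, V>>[] buckets;

    @SuppressWarnings("unchecked")
    public SimpleHashMap() {
        buckets = new LinkedList[100];  // 100 buckets, as in the example
    }

    private int indexFor(K key) {
        // The expression from the post: maps any object to 0..99.
        return Math.abs(key.hashCode() % 100);
    }

    public void put(K key, V value) {
        int i = indexFor(key);
        if (buckets[i] == null) buckets[i] = new LinkedList<>();  // lazy init
        for (Entry<K, V> e : buckets[i]) {
            if (e.key.equals(key)) { e.value = value; return; }   // replace existing
        }
        buckets[i].add(new Entry<>(key, value));
    }

    public V get(K key) {
        int i = indexFor(key);
        if (buckets[i] == null) return null;
        for (Entry<K, V> e : buckets[i]) {  // short linear search within one bucket
            if (e.key.equals(key)) return e.value;
        }
        return null;
    }
}
```

Java's real HashMap differs in the details: the table length is always a power of two so the bucket index can be computed with bit-masking instead of `%`, and it resizes automatically, but the bucket-array-plus-search idea is the same.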

If the linked lists start to get too full -- i.e., if that "looking-up" part starts to get slow -- you "rehash", which means you choose a larger number instead of 100 and rebuild the map so that the lists are short again. The maximum allowable "fullness" is usually called the *load factor*.

Java's HashMap and Hashtable use one object as the key (which provides the hashCode() method) and another as the value (which is presumably the "interesting" object). You do the looking up with the key, and in the linked lists you store key/value pairs (in HashMap, it's an inner interface called Map.Entry that holds the pairs).

Make sense?
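In Java's actual HashMap, both numbers are tunable: the no-argument constructor gives a default initial capacity of 16 and a default load factor of 0.75, and a two-argument constructor lets you choose both. When the number of entries exceeds capacity times load factor, the table is rehashed into a larger array automatically:

```java
import java.util.HashMap;
import java.util.Map;

public class LoadFactorDemo {
    public static void main(String[] args) {
        // Default constructor: initial capacity 16, load factor 0.75.
        Map<String, Integer> defaults = new HashMap<>();

        // Explicit: start with room for ~100 entries, rehash when the
        // map is 90% full. (HashMap rounds the requested capacity up
        // to the next power of two internally.)
        Map<String, Integer> tuned = new HashMap<>(100, 0.9f);

        for (int i = 0; i < 1000; i++) {
            tuned.put("key" + i, i);  // the map rehashes behind the scenes
        }
        System.out.println(tuned.get("key500"));  // lookups stay fast throughout
    }
}
```

You never call rehash yourself; the map grows whenever the load-factor threshold is crossed.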

Paul Yule

Ranch Hand

Posts: 230

posted 8 years ago

Sort of... where I lose understanding is this: if two objects map to the same hash value (i.e., object1 maps to value 55 and object2 also maps to value 55), doesn't that increase the worst case beyond O(1) when you search for either object? And does HashMap have a default number of buckets whenever you create one (in our example it was 100)?

posted 8 years ago

Hi Paul,

In each of the 100 array elements, there is a list. Initially all the lists are empty (in a real implementation, the lists might not even be created until there's something to put in them).

If multiple items are mapped to the same array element, then they are all put into that list. So yes, sometimes looking up an item involves searching a list.

But as soon as any of the lists reaches some maximum size, the whole structure is "rehashed," so the lists never grow larger than some predetermined amount. After rehashing, there are more lists, all shorter than that maximum size.

Here's the key insight: the maximum lookup time is never greater than a linear search through a list of some predetermined length. That length is **not** proportional to the number of elements in the whole hash table. Therefore, in the limit of large hash tables, access is O(1) or "constant time", which means, by definition, that the time does not go up as the hash table gets larger. It's fixed, even though there's a linear search involved.
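To see that collisions affect only the within-bucket search, never correctness, here is a sketch with a hypothetical key class (`BadKey`, my own example) whose hashCode() always returns 55, forcing every key into the same bucket exactly as in the object1/object2 scenario above; equals() still picks out the right entry:

```java
import java.util.HashMap;
import java.util.Map;

public class CollisionDemo {
    // Hypothetical key class that always collides: every instance
    // hashes to 55, so all entries land in the same bucket.
    static class BadKey {
        final String name;
        BadKey(String name) { this.name = name; }

        @Override public int hashCode() { return 55; }

        @Override public boolean equals(Object o) {
            return o instanceof BadKey && ((BadKey) o).name.equals(name);
        }
    }

    public static void main(String[] args) {
        Map<BadKey, String> map = new HashMap<>();
        map.put(new BadKey("object1"), "first");
        map.put(new BadKey("object2"), "second");  // collides, chained in one bucket

        // equals() distinguishes the colliding keys, so lookups are still
        // correct; they just degrade toward a search within that one bucket.
        System.out.println(map.get(new BadKey("object1")));  // prints "first"
        System.out.println(map.get(new BadKey("object2")));  // prints "second"
    }
}
```

With a well-distributed hashCode() this degradation doesn't happen in practice, which is why the O(1) average holds. (Since Java 8, HashMap additionally converts a badly overloaded bucket from a list into a balanced tree, bounding even the pathological case.)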