• Post Reply Bookmark Topic Watch Topic
  • New Topic

Hash Tables  RSS feed

 
Prasanna Raman
Ranch Hand
Posts: 410
  • Mark post as helpful
  • send pies
  • Quote
  • Report post to moderator
Hello,

In the link below,

https://www.youtube.com/watch?v=UPo-M8bzRrc

between 7:30 and 12:00, the Professor explains how a hash table can be used to store a dictionary. He starts off by saying that the total possible number of words is 26 to the power 45, and we can't possibly have an array of that size. So, we need hash tables.

Then he defines 'n' -> number of actual words stored and 'N' -> number of buckets.

Now, what I don't understand is whether all possible words (26 power 45) could be stored in this hash table. What is it that has made it possible? I thought initially when he said it's possible, that there is just not enough memory to accommodate that big a number.

Thanks,
Prasanna
 
Jeanne Boyarsky
author & internet detective
Marshal
Posts: 37462
537
Eclipse IDE Java VI Editor
  • Mark post as helpful
  • send pies
  • Quote
  • Report post to moderator
A hash table only stores the words that actually show up. (vs an array which needs space for everything that might occur.) Since a hash table can have multiple entires in a bucket, you can have it grows as big as you want - until you run out of memory of course.
 
Prasanna Raman
Ranch Hand
Posts: 410
  • Mark post as helpful
  • send pies
  • Quote
  • Report post to moderator
Thank you, Jeanne. I don't understand this statement: "A hash table only stores the words that actually show up. (vs an array which needs space for everything that might occur.)" Could you please elaborate?
 
Jeanne Boyarsky
author & internet detective
Marshal
Posts: 37462
537
Eclipse IDE Java VI Editor
  • Mark post as helpful
  • send pies
  • Quote
  • Report post to moderator
Forget about words for a minute. Suppose I want to keep track of whether I have all the numbers from 0 to 999. In an array, I'd have an array of size 1000. The index would be the # and the value would be whether it was found. In a hashmap, the key would be the #. If I only have one #, I only need one element in my map. A lot less than 1000!
 
Jeff Verdegan
Bartender
Posts: 6109
6
Android IntelliJ IDE Java
  • Mark post as helpful
  • send pies
  • Quote
  • Report post to moderator
I think the statement, "we can't possibly have an array of that size, so we need hash tables," is misleading, at least in the context of how most people view hash tables. In general it takes more space to store stuff in a hash table than in an array. We don't use hash tables to save space. We use them to save time. It's much faster to look something up in a hash table than in an array, because the hashing function lets us split the total possible domain space up into smaller pieces--buckets. We only have to search through the bucket that contains other entries with the same has value as what we're looking for, vs. starting at the beginning and looking at every element in an array.

Where the instructor is coming from--and what Jeanne is explaining--is if we start from the perspective of, "I'm not going to search through an array. I'll just go right to the index. I don't need a hash table." In that case, we imagine an array with every possible word that can exist, up to some number of letters. Blank-blank-blank-...-blank-A is index 0. Blank-blank-blank-...-blank-B is index 1. And so on. It's easy to compute the index from the characters. Characters are just numbers after all. So then we go right to the index in the array corresponding to the numerical value of the word we're looking up.

Hash tables are normally taught as a faster alternative to a linear search through an array or list that's just big enough to hold the words we're actually dealing with. This instructor is just teaching it as a less memory hungry alternative to a fast indexed lookup in an array.
 
Prasanna Raman
Ranch Hand
Posts: 410
  • Mark post as helpful
  • send pies
  • Quote
  • Report post to moderator
Thank you Jeanne and Jeff. Please bear with me, I still don't get the point.

Let me go back to what got me confused in the first place. He first spoke about a dictionary of all possible 2 letter words - so an array of 675 indices, each holding a word.

Now he talks about expanding the dictionary to all possible English words - 26^45. Now, using the above logic, we would need to declare an array of size 26^45, which he said isn't possible. Then, he goes on to explain hash tables. I don't understand the connection here - what is it that the hash table makes it possible?
 
Jeff Verdegan
Bartender
Posts: 6109
6
Android IntelliJ IDE Java
  • Mark post as helpful
  • send pies
  • Quote
  • Report post to moderator
Prasanna Raman wrote:
Now he talks about expanding the dictionary to all possible English words - 26^45. Now, using the above logic, we would need to declare an array of size 26^45, which he said isn't possible. Then, he goes on to explain hash tables. I don't understand the connection here - what is it that the hash table makes it possible?


We still can't store 26^45 words in a hash table. No matter what data structure we use, the most we'll be able to store is some millions, or maybe a couple billion words. We can store just as many with an array (or even more) as we can with a hash table, but by making an array that only has as many indices as the number of words we're storing, we lose the ability to treat a word as a number and go directly to the one we want.

With a hash table, we can still get fast lookup, even without reserving spots for every possible word. With an array, if we want fast lookup, the array has to be big enough to hold every possible value, not just the one's we're actually storing.
 
Prasanna Raman
Ranch Hand
Posts: 410
  • Mark post as helpful
  • send pies
  • Quote
  • Report post to moderator
"With a hash table, we can still get fast look-up, even without reserving spots for every possible word. With an array, if we want fast look-up, the array has to be big enough to hold every possible value, not just the one's we're actually storing."

This is the same thing the Professor mentioned. I somehow still don't understand this. Here how I am confusing this: He said there'd be about 700 million words in the language, so we define N (number of buckets) to be about 700 million or slightly larger. So, are we not allocating the size in a hash table also?

Also, in an array, why do we have to reserve a place for all possible words in order to achieve constant time performance?
 
Jeff Verdegan
Bartender
Posts: 6109
6
Android IntelliJ IDE Java
  • Mark post as helpful
  • send pies
  • Quote
  • Report post to moderator
Prasanna Raman wrote:He said there'd be about 700 million words in the language, so we define N (number of buckets) to be about 700 million or slightly larger. So, are we not allocating the size in a hash table also?


No. We don't create as many buckets as there are entries. The more buckets there are, the faster the average lookup time, but a hash table can work with just a single bucket. In that case, the lookup time is the same as a linear search through an array. For the case of 700 million words, if we have 1,000 buckets, and a good hash function, then on average, we'll only have to search a 700,000 word list.

Also, in an array, why do we have to reserve a place for all possible words in order to achieve constant time performance?

We can access a particular element in an array in constant time by using its index. If we want to find the word "java", we calculate its index as 10*26^3 (j) + 1*26^2 (a) + 22*26^1 (v) + 1*26^0 (a) and go to that index. If we don't have spots reserved for every possible word, we can't do that. We don't know what word is in what position, so we have to search for the word we want.

If I say, "Here's a 1,000-element array. Where is the word "hash" in that array?" what would you do? How would you jump directly to the index where "hash" is stored?

(Note that if the array is sorted, even though we can't get constant lookup time, we can get logarithmic lookup time, rather than linear.)
 
Prasanna Raman
Ranch Hand
Posts: 410
  • Mark post as helpful
  • send pies
  • Quote
  • Report post to moderator
Thank you. I think I may have got it now! Will think over this.
 
Campbell Ritchie
Marshal
Posts: 56529
172
  • Mark post as helpful
  • send pies
  • Quote
  • Report post to moderator
Prasanna Raman wrote: . . . the total possible number of words is 26 to the power 45, . . .
26⁴⁵ means that you can have words 45 letters long, using the 26 letters of the English alphabet. It assumes that every letter can be followed by every other letter, and does not allow for words less than 45 letters long! So it is a theoretical overestimate!

… And overestimates by many orders of magnitude. 26⁴⁵ = 4.718 × 10⁶³ When I was young there were 300000 words in the English language; there are probably a few million now. The nice thing is, you can create a 1000000‑element hash map and put all those words in it, along with their meanings. And, as you have already been told, you can look up those words very quickly.
 
Campbell Ritchie
Marshal
Posts: 56529
172
  • Mark post as helpful
  • send pies
  • Quote
  • Report post to moderator
Jeff Verdegan wrote: . . . If we want to find the word "java", we calculate its index as 10*26^3 (j) + 1*26^2 (a) + 22*26^1 (v) + 1*26^0 (a) and go to that index. . . .
In an hash collection, it is a bit more complicated than that. You calculate all that, and then take its remainder when divided by the size of the collection.

If you want some of the more advanced details, read this thread and this one, but you might find them a bit complicated.
 
Jeff Verdegan
Bartender
Posts: 6109
6
Android IntelliJ IDE Java
  • Mark post as helpful
  • send pies
  • Quote
  • Report post to moderator
Campbell Ritchie wrote:
Jeff Verdegan wrote: . . . If we want to find the word "java", we calculate its index as 10*26^3 (j) + 1*26^2 (a) + 22*26^1 (v) + 1*26^0 (a) and go to that index. . . .
In an hash collection, it is a bit more complicated than that.


I wasn't talking about a hash collection. I was talking about an array of 26^45 words, for allowing O(1) indexed lookup of any possible word by treating each word as a number.
 
Prasanna Raman
Ranch Hand
Posts: 410
  • Mark post as helpful
  • send pies
  • Quote
  • Report post to moderator
Thank you Jeff and Campbell.

"We can access a particular element in an array in constant time by using its index. If we want to find the word "java", we calculate its index as 10*26^3 (j) + 1*26^2 (a) + 22*26^1 (v) + 1*26^0 (a) and go to that index. If we don't have spots reserved for every possible word, we can't do that. We don't know what word is in what position, so we have to search for the word we want. "

So Jeff, I think I understand what you're trying to say here. Does the above mean that practically we can't use arrays to store a dictionary? If we wanted to, we would have to make space for all possible words in order to achieve constant time performance, correct?

"If I say, "Here's a 1,000-element array. Where is the word "hash" in that array?" what would you do? How would you jump directly to the index where "hash" is stored?"

Did you use the above statement to illustrate the point that we need to reserve memory for all possible words? If so, I think I understand this. The user may want to look up any random word that is not defined, but you still have to hash it to some location and find that it's null in order to get O(1), right?



 
Winston Gutkowski
Bartender
Posts: 10575
66
Eclipse IDE Hibernate Ubuntu
  • Mark post as helpful
  • send pies
  • Quote
  • Report post to moderator
Jeff Verdegan wrote:I wasn't talking about a hash collection. I was talking about an array of 26^45 words, for allowing O(1) indexed lookup of any possible word by treating each word as a number.

Oddly enough, I'm working on exactly that problem at the moment (well, something very similar): an unlimited sparse array with space roughly (very roughly ) proportional to amount of content. Location is admittedly O(log₃₂indexSize), but each "tier" is a direct look-up (no searches involved) and full iteration is O(n).

Winston
 
Jeff Verdegan
Bartender
Posts: 6109
6
Android IntelliJ IDE Java
  • Mark post as helpful
  • send pies
  • Quote
  • Report post to moderator
Prasanna Raman wrote:Thank you Jeff and Campbell.

"We can access a particular element in an array in constant time by using its index. If we want to find the word "java", we calculate its index as 10*26^3 (j) + 1*26^2 (a) + 22*26^1 (v) + 1*26^0 (a) and go to that index. If we don't have spots reserved for every possible word, we can't do that. We don't know what word is in what position, so we have to search for the word we want. "

So Jeff, I think I understand what you're trying to say here. Does the above mean that practically we can't use arrays to store a dictionary?


We could, but lookup would be slow.

If we wanted to, we would have to make space for all possible words in order to achieve constant time performance, correct?


Yes.

"If I say, "Here's a 1,000-element array. Where is the word "hash" in that array?" what would you do? How would you jump directly to the index where "hash" is stored?"


Did you use the above statement to illustrate the point that we need to reserve memory for all possible words?


Yes. If you know the array is indexed by all possible words, using the polynomial I used for the "java" example above, then you can get to any word's index in O(1). If we don't have that precondition, then we don't know, and can't calculate, what index a given word will be at, so we have to search for it.

If so, I think I understand this. The user may want to look up any random word that is not defined, but you still have to hash it to some location and find that it's null in order to get O(1), right?


If it's an array of all possible words, I wouldn't call it hashing, but yes, to find the definition for a word--or for that matter to determine that there is no such word (definition == null)--we need to be able to easily calculate where to find that word. Given the more or less random distribution of "real" words over the set of all possible words, there's no way to calculate the index of a real word in O(1) time in an array that contains only those real words.

 
Prasanna Raman
Ranch Hand
Posts: 410
  • Mark post as helpful
  • send pies
  • Quote
  • Report post to moderator
Thank you, Jeff.

"We could, but lookup would be slow."

How do we do the above? And, could you please explain how the look-up is slow?
 
Jeff Verdegan
Bartender
Posts: 6109
6
Android IntelliJ IDE Java
  • Mark post as helpful
  • send pies
  • Quote
  • Report post to moderator
Prasanna Raman wrote:Thank you, Jeff.

"We could, but lookup would be slow."

How do we do the above? And, could you please explain how the look-up is slow?


It would be a linear search from the beginning of the array. Finding a given word would be O(N). Or, if the array is sorted, we could use binary search, and it would be O(log N). A HashMap lookup, on the other hand, is an "amortized" O(1).
 
Prasanna Raman
Ranch Hand
Posts: 410
  • Mark post as helpful
  • send pies
  • Quote
  • Report post to moderator
Sorry, Jeff. My question was not clear. I meant to ask, how does the look up become O(n). That is, how is the elements stored in the array? I am trying to understand how the array will look like.
 
Jeff Verdegan
Bartender
Posts: 6109
6
Android IntelliJ IDE Java
  • Mark post as helpful
  • send pies
  • Quote
  • Report post to moderator
Prasanna Raman wrote:Sorry, Jeff. My question was not clear. I meant to ask, how does the look up become O(n). That is, how is the elements stored in the array? I am trying to understand how the array will look like.


Okay, so, we're talking about an array that only holds real words, and is just big enough to hold all real words.

Depending on how you count it, there are something like 200,000 to 1,000,000 words in the English language. So let's say we have an array with 200,000 elements, each one holding a real word.

Now we want to find the word "java". How do we do it? We look through the array, from the beginning, and stop when we find the word, or when we reach the end of the array without having found it. That's an O(N) operation. If the array is size N, then, on average, we will have to search through N/2 elements. Multiplicative constants are generally ignored when calculating big-O notation, so O(N/2) is the same as O(N).
 
Prasanna Raman
Ranch Hand
Posts: 410
  • Mark post as helpful
  • send pies
  • Quote
  • Report post to moderator
Oh, OK! I think I now understand clearly. So if we have an array just big enough to hold all the words, then the look up becomes O(n) because we don't hash the key, we just put the words with definitions.

The alternative, if we could afford so much space, would be to declare an array big enough to hold all possible words; so we could hash each word to its location, and then do an O(1) look up.

Is my understanding right?
 
Jeff Verdegan
Bartender
Posts: 6109
6
Android IntelliJ IDE Java
  • Mark post as helpful
  • send pies
  • Quote
  • Report post to moderator
Prasanna Raman wrote:Oh, OK! I think I now understand clearly. So if we have an array just big enough to hold all the words, then the look up becomes O(n) because we don't hash the key, we just put the words with definitions.

The alternative, if we could afford so much space, would be to declare an array big enough to hold all possible words; so we could hash each word to its location, and then do an O(1) look up.

Is my understanding right?


Yep. That's about the size of it. Except, again, it's not really a "hash" in the case of all possible words, just a direct lookup treating the word itself as a base-26 number. (Which could be viewed as a very simple, direct hash, I suppose.)
 
Prasanna Raman
Ranch Hand
Posts: 410
  • Mark post as helpful
  • send pies
  • Quote
  • Report post to moderator
10*26^3 (j) + 1*26^2 (a) + 22*26^1 (v) + 1*26^0 (a) - Is this treating the word as a base-26 number that you're referring to?

And, does the above provide unique locations for all possible words?
 
Jeff Verdegan
Bartender
Posts: 6109
6
Android IntelliJ IDE Java
  • Mark post as helpful
  • send pies
  • Quote
  • Report post to moderator
Prasanna Raman wrote:10*26^3 (j) + 1*26^2 (a) + 22*26^1 (v) + 1*26^0 (a) - Is this treating the word as a base-26 number that you're referring to?


Yes.

And, does the above provide unique locations for all possible words?


Yes. In the same way that the sequence of digits 6789 is distinct from every other base-10 number (if we don't allow leading zeros).
 
Prasanna Raman
Ranch Hand
Posts: 410
  • Mark post as helpful
  • send pies
  • Quote
  • Report post to moderator
OK, thank you very much Jeff! I understand this clearly now!
 
Jeff Verdegan
Bartender
Posts: 6109
6
Android IntelliJ IDE Java
  • Mark post as helpful
  • send pies
  • Quote
  • Report post to moderator
You're very welcome! I'm glad it's clear now.
 
With a little knowledge, a cast iron skillet is non-stick and lasts a lifetime.
  • Post Reply Bookmark Topic Watch Topic
  • New Topic
Boost this thread!