• Post Reply Bookmark Topic Watch Topic
  • New Topic

Help with Populating a Hash Table using ArrayList  RSS feed

 
Sarah Evans
Greenhorn
Posts: 8
Java Netbeans IDE Windows
  • Mark post as helpful
  • send pies
  • Quote
  • Report post to moderator
Hello, I'm new here... and I'm still learning the basics of Java. I was wondering if someone could help me?

I'm having a problem populating a LinkedHashTable <Integer, ArrayList>.

What I'm wanting is to have

Key #1 : word1, word2, word3, word4
Key #2: word1, word2, word3, word4
...
and so on.

Instead, I am getting

Key #1: word1, [word2], [word3], [word4]
Key #2: word1, [word2], [word3], [word4]

I think whenever an additional item is being added to the ArrayList, it adds itself as its own list! (hence the brackets)

Here is my code:



I believe my problem is with the way I'm trying to add the words to the key if the key already exists, because the first entry is always a string but after that it's not.

The whole point of this program is to build a inverted index. My first step is to create this hash table that associates each document # with the words that are contained in the document. After that I plan to use that in order to populate a second hash table that will be <String, ArrayList> to store the words as keys and associate them with the documents that they can be found in.

Any help would be greatly appreciated... Thanks.

 
Campbell Ritchie
Marshal
Posts: 56600
172
  • Mark post as helpful
  • send pies
  • Quote
  • Report post to moderator
Welcome to the Ranch
Do you really mean LinkedHashTable? LinkedHashMap, surely?
I am not sure I understand the problem. Are you trying to put a List<something> as a “V”? How many Lists have you got? When you have variable names like temp3 and temp5, that suggests “confusion” to me.
What I would expect to do it to create a Map<Integer, List<XYZ>>If you are trying to use the Map backwards, I can think of a few possibilities. One is to look for a bi‑directional List. Google Apache Commons; I am pretty sure they have one ready for use. Another would be to use two Maps, one a Map<Integer, List<XYZ>> and the other a Map<XYZ, List<Integer>>. “Put” the two objects into the maps, in one case as K‑V and in the other, the other way round.

I am sure other people will have other useful ideas, too.
 
Winston Gutkowski
Bartender
Posts: 10575
66
Eclipse IDE Hibernate Ubuntu
  • Mark post as helpful
  • send pies
  • Quote
  • Report post to moderator
Sarah Evans wrote:I believe my problem is with the way I'm trying to add the words to the key if the key already exists, because the first entry is always a string but after that it's not.

OK. So let's back up.

Forget HashMap's and ArrayList's - What are you trying to do?

You've run into one of the quintessential things about the business of programming: Code will never, EVER solve your problem.

You need to understand - and be able to describe to a dummy like me - WHAT it is you want to do, before you write your first line of Java code.

So: What do you want to do; and what made you think that a HashMap was the way to solve it?

Then we'll get down to the business of what's wrong with your code.

Winston
 
Greg Charles
Sheriff
Posts: 3015
12
Firefox Browser IntelliJ IDE Java Mac Ruby
  • Mark post as helpful
  • send pies
  • Quote
  • Report post to moderator
Hi Sarah, welcome to Code Ranch!

First, your variable naming is atrocious! It may seem silly, but picking descriptive names for variables is probably the most important way to make your code readable, even to yourself.

Now, yes, you are creating a new ArrayList every time (temp5) and adding it to the existing list if there is one. What you want to do is only create a new list only if there isn't one already in the hash for your key, and then put it into the hash. If there is a list in the hash already, you want to get it from the hash, and add your new element to it. You don't even have to put it back in the hash, because it's already there.
 
Sarah Evans
Greenhorn
Posts: 8
Java Netbeans IDE Windows
  • Mark post as helpful
  • send pies
  • Quote
  • Report post to moderator
Do you really mean LinkedHashTable? LinkedHashMap, surely?


Sorry, I did mean LinkedHashMap not LinkedHashTable! I've been at this all day trying to figure this stuff out so I guess my brain is fried.


Now, yes, you are creating a new ArrayList every time (temp5) and adding it to the existing list if there is one. What you want to do is only create a new list only if there isn't one already in the hash for your key, and then put it into the hash. If there is a list in the hash already, you want to get it from the hash, and add your new element to it. You don't even have to put it back in the hash, because it's already there.


YES! That is exactly what I thought... would I still use the .get() method to specify which document # key I am using?


.. and sorry guys for the poor variable naming. I know it's bad, I kept telling myself I'd clean it up later once I got this hashmap to correctly populate. I will fix it!
 
Winston Gutkowski
Bartender
Posts: 10575
66
Eclipse IDE Hibernate Ubuntu
  • Mark post as helpful
  • send pies
  • Quote
  • Report post to moderator
Sarah Evans wrote:Sorry, I did mean LinkedHashMap not LinkedHashTable! I've been at this all day trying to figure this stuff out so I guess my brain is fried.

LinkedHashMap, LinkedHashTable - it doesn't matter. You're obsessing about things that don't matter, when you need to back up and chill out.

I ask again: What are you trying to do? Explain it to me, and explain the reasons for your choice of classes
- and if you were told to use them, then tell me why you think you were told to use them.

You must understand the problem, Sarah; otherwise you'll simply get more and more frustrated.

Winston
 
Sarah Evans
Greenhorn
Posts: 8
Java Netbeans IDE Windows
  • Mark post as helpful
  • send pies
  • Quote
  • Report post to moderator
OK. So let's back up.

Forget HashMap's and ArrayList's - What are you trying to do?

You've run into one of the quintessential things about the business of programming: Code will never, EVER solve your problem.

You need to understand - and be able to describe to a dummy like me - WHAT it is you want to do, before you write your first line of Java code.

So: What do you want to do; and what made you think that a HashMap was the way to solve it?

Then we'll get down to the business of what's wrong with your code.



I was wanting to use a HashMap because that was the only data structure that I felt comfortable with that I thought I could use to associate what lines of text belonged to what documents.
The whole point of this program is to build a inverted index, where the hashmap reveals which documents the words can be found in.

Currently, I have a BufferedReader inside a loop that goes through each document and reads each line contained in the document. Each line goes through loops to have the numbers, periods, commas, etc removed followed by breaking up each line into tokens and storing that into a new array.
This new array will contain the j^th line from the i^th document like this:
new array = word1, word2, word3, etc...

Eventually all the lines from the i^th document will be read and each word will be added to a new HashMap that contains the words as the keys and the document #'s as the values. This new HashMap must be a <String, ArrayList> so each word will have a list of documents that they can be found in.

 
Sarah Evans
Greenhorn
Posts: 8
Java Netbeans IDE Windows
  • Mark post as helpful
  • send pies
  • Quote
  • Report post to moderator
I ask again: What are you trying to do? Explain it to me, and explain the reasons for your choice of classes
- and if you were told to use them, then tell me why you think you were told to use them.

You must understand the problem, Sarah; otherwise you'll simply get more and more frustrated.


I'm trying to print a list of words with the document #'s that they can be found in.

example:
word1 = doc1, doc5, doc2
word2 = doc7, doc63, doc3

but FIRST, before I can even do that I have to get each line of text associated with what document # they came from by using a HashMap. Then, I wanted to change the ArrayLists that're associated with each document # to contain each word as a separate element, instead of separate lines. That is where I am stuck.
I do not know how to add additional words to the document # keys. I need all the words to be added as strings within the ArrayList associated with each document #... not as ArrayLists within the ArrayList.
 
Sarah Evans
Greenhorn
Posts: 8
Java Netbeans IDE Windows
  • Mark post as helpful
  • send pies
  • Quote
  • Report post to moderator
I thought this might help clarify what I'm trying to do:



This is only for the first HashMap, where I am assigning the document #'s the words that they contain.
I changed all the poorly named variables to something meaningful, and added more comments....
 
Winston Gutkowski
Bartender
Posts: 10575
66
Eclipse IDE Hibernate Ubuntu
  • Mark post as helpful
  • send pies
  • Quote
  • Report post to moderator
Sarah Evans wrote:I was wanting to use a HashMap because that was the only data structure that I felt comfortable with that I thought I could use to associate what lines of text belonged to what documents.
The whole point of this program is to build a inverted index, where the hashmap reveals which documents the words can be found in.

Phew. OK.

I hope you understand that there is more than one way of doing this, but your choice of structure seems quite reasonable.
However, assuming that a "key" is a word, and your document ID is an integer, then:

HashMap<String, ArrayList<Integer>>

would seem like a better choice.

I'll be around for an hour or so if you have questions.

Winston
 
Sarah Evans
Greenhorn
Posts: 8
Java Netbeans IDE Windows
  • Mark post as helpful
  • send pies
  • Quote
  • Report post to moderator
HashMap<String, ArrayList<Integer>>

would seem like a better choice.


I didn't know you could do that. Thanks so much! I applied that same logic with the first HashMap <Integer, ArrayList<String>> and got it to populate correctly.

Now for my second HashMap.. I have encountered a error and I don't know why it is happening. The second HashMap is <String, ArrayList<Integer>> to store the words with their associated document #'s. The words from the first document correctly receive '0' but once a word that was in document #0 appears in document #1 something weird happens. The document number for the second document (in this case, 1) gets added to all the words in document #0.

So my output is:

Word 1 = (0,1) // these words actually DO NOT exist in document #1.
Word 2 = (0,1)
Word 3 = (0, 1)
...
and so on.

I traced through the loops multiple times with the debugger, and I still can't figure out what's causing it. Any ideas?



Thanks again for all your help
 
Campbell Ritchie
Marshal
Posts: 56600
172
  • Mark post as helpful
  • send pies
  • Quote
  • Report post to moderator
Sarah Evans wrote: . . . the poor variable naming. . . . clean it up later . . .
Will you ever clean it up later? Get it right first time and it will be a lot quicker.
 
Winston Gutkowski
Bartender
Posts: 10575
66
Eclipse IDE Hibernate Ubuntu
  • Mark post as helpful
  • send pies
  • Quote
  • Report post to moderator
Sarah Evans wrote:Now for my second HashMap. I have encountered a error and I don't know why it is happening.
I traced through the loops multiple times with the debugger, and I still can't figure out what's causing it. Any ideas?

As I believe Campbell already pointed out, you have an AWFUL lot of code in one place (your main() method),
and you seem to have developed it in a rather "stream of consciousness" manner, which is always a recipe for trouble.
You're tackling things in a straight line, rather than trying to see if you can arrange them.

My first advice would be to StopCoding (←click). Back off, go down the pub (if you're old enough ),
and put some distance between you and this problem.
Right now, you're obsessing about code, and code will never solve your problem.

What is your input? It's a bunch of documents that contain words and, for each word, you need to know what document it's in.

So, what about a Word class? Perhaps something like:and set it up so that two Words are only equal() if they are the same word in the same
document
, and add a hashCode() method that is a combo of the 'word' and 'document' hashcodes.
If you're on version 7, the java.util.Objects class has a lovely hash() method for doing precisely this.

Now you have an object that holds the word/document binding that you want,
and you can create Comparators that order it either by the word itself, or by the document id.

If it was me, what I'd do then is to read in all my documents and, for each word, create a Word object
and add it to a HashSet<Word>.
Obviously, there will be plenty of times that the add fails, but at the end of that process I will have a
complete set of every word/document combination.

Now you're in a much better position to create the HashMaps that you want;
but instead of reading directly from files, read from your HashSet.

I'll leave the details up to you, but it might involve some sorting (although it doesn't have to).

HIH

Winston

PS: It's probably too late now, but your code lines are far too long, which makes your thread
very difficult to read. For future reference, the rule is:
80 characters per line
and that includes string literals AND comments AND long method calls.

 
Campbell Ritchie
Marshal
Posts: 56600
172
  • Mark post as helpful
  • send pies
  • Quote
  • Report post to moderator
Sarah Evans wrote:
HashMap<String, ArrayList<Integer>>

would seem like a better choice.


I didn't know you could do that. . . .
I think you would end up writingSince ArrayList implements the List<E> interface, you can put an ArrayList as a “V” in a Map<Foo, List<E>>.
You know about using immutable objects as the “K” in Maps, I presume? Integer and String are both immutable, so you are OK there.
Your code still reads as confused, and your methods are inordinately long. Particularly the main method, which I think should be 1 line shorter than that FAQ shows. (But never two lines shorter ‍)

It looks like code from somebody who is overthinking, an effective way of making life difficult for yourself.

There are two ways to add things to a List “get”ted from a Map. I have already shown you one. You see whether the Map contains the key, and if so add to that List. If not, (1) create a new List (2), add to it and (3) put the list into the map as a value. You can do that in the order 132 as well.
The other way, which is slightly better, is to get the list from the map, and see whether it is null.
Start by reading about Map#get(java.lang.Object), and its implementation in HashMap, etc. Note whether and when it can or cannot return null. Read about the Map interface in the Java Tutorials, wherer there is an example rather similar to what you want.
 
Campbell Ritchie
Marshal
Posts: 56600
172
  • Mark post as helpful
  • send pies
  • Quote
  • Report post to moderator
Earlier, I wrote: . . . “get”ted from a Map. . . .
 
Campbell Ritchie
Marshal
Posts: 56600
172
  • Mark post as helpful
  • send pies
  • Quote
  • Report post to moderator
You do realise a List will never work?
 
Campbell Ritchie
Marshal
Posts: 56600
172
  • Mark post as helpful
  • send pies
  • Quote
  • Report post to moderator
Winston Gutkowski wrote: . . . As I believe Campbell already pointed out, . . .
No, it wasn't until after that point that I saw the code in its full glory and realised how long it was. I thought it looked confused because of the peculiar variable names.
 
Sarah Evans
Greenhorn
Posts: 8
Java Netbeans IDE Windows
  • Mark post as helpful
  • send pies
  • Quote
  • Report post to moderator
So what would you guys suggest for cleaning up my program?

What about having additional class files to hold the methods for tokenizing the lines? Or just simply removing them from main()?
That shouldn't be a problem.

Correct me if I'm wrong, but I think I figured out the problem with the second HashMap.



What I thought I was telling it to do is not what it is actually doing. :P
I wanted it to check the token to see if it already contained "i" which is the document number. But instead, it is checking the whole HashMap for tokens that do not have the document number and then adds it to them.

by going through the debugger, it appeared to me that if I could get that error straightened out I should be getting correct document numbers for all my words.
 
Campbell Ritchie
Marshal
Posts: 56600
172
  • Mark post as helpful
  • send pies
  • Quote
  • Report post to moderator
You are going to run into problems with a List. You will either get slow performance, with the app running in quadratic time when it should run in linear time, or you will get very long lists.

I think you are looking for a Set to put the numbers in.
 
Sarah Evans
Greenhorn
Posts: 8
Java Netbeans IDE Windows
  • Mark post as helpful
  • send pies
  • Quote
  • Report post to moderator
Thank you Campbell, Winston, and everyone else for your very helpful tips and suggestions.

Campbell, thanks to your advice I was able to get this program working properly!!!

I even used a TreeMap to sort all the words alphabetically.
I added in a if statement to check to see if the word hasn't been already recorded in document #n to make sure there are no repeats.
I switched the HashMaps to Maps, and ArrayLists to Lists.

I'm so happy that this is finally done. I've learned SO much.

Thank yoooooou
 
Campbell Ritchie
Marshal
Posts: 56600
172
  • Mark post as helpful
  • send pies
  • Quote
  • Report post to moderator
Well done

I still think you should use a Set for the “V”s on both sides. A HashSet will give the fastest performance, and a TreeSet will sort the entries automatically. Your app will run in linear time with a HashSet and nlogn time for a TreeSet, but checking the contains() method i na List before adding will make it run in n² time.
 
Winston Gutkowski
Bartender
Posts: 10575
66
Eclipse IDE Hibernate Ubuntu
  • Mark post as helpful
  • send pies
  • Quote
  • Report post to moderator
Sarah Evans wrote:I'm so happy that this is finally done. I've learned SO much.

Ain't it great? Worth all those dents in your forehead, eh? - although you young 'uns have those nice, soft flatscreens now;
I have to use my keyboard or desk for a proper flagellation these days.

Thank yoooooou

You're most welcome. And welcome to programming.

Winston
 
With a little knowledge, a cast iron skillet is non-stick and lasts a lifetime.
  • Post Reply Bookmark Topic Watch Topic
  • New Topic
Boost this thread!