Making Code Reusable And Outputting To Numerous Vectors

 
Greenhorn
Posts: 21
1
Hey all,

I've written some code that works, but I want to make it reusable. The code takes a dictionary object (an instance of another class that I wrote) and reads in a text file to populate it. It then loads another text file, compares it with the dictionary, updates the dictionary's frequency values, and outputs them to a vector. Everything works, but as I said, I'm looking to make it reusable. By that I mean: once the values have been stored in the vector, I'd like the dictionary's frequency values to reset to 0, ready for the next text document to be compared with the dictionary, and the result of that comparison to go to a different vector. This process will be repeated 10-12 times. I've been trying for hours, but whatever code I write overwrites the existing vector.

After some research, I believe the answer is to turn the code inside the try/catch block into a function. The problem is that I can't access the dictionary object from within the function... The code below is what I have working so far.

Any help would be appreciated.

 
Rancher
Posts: 508
15
Notepad Java
You can try this.

Define "dictionary" as a static variable. Store the 10-12 file names to be processed in an ArrayList and process each file in a for-loop in a new function called processDocument or processInputFile:
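A minimal sketch of that suggestion, not the listing originally posted here. It assumes a plain Map<String, Integer> stands in for the poster's own Dictionary class, and the names DocumentLoopSketch and countWords, the vocabulary, and the file names are placeholders:

```java
import java.io.File;
import java.io.FileNotFoundException;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Scanner;

class DocumentLoopSketch {
    // Shared word -> count map; stands in for the poster's Dictionary class (assumption).
    static final Map<String, Integer> dictionary = new LinkedHashMap<>();

    // Counts dictionary words in the tokens of one Scanner and returns a fresh list.
    static List<Integer> countWords(Scanner in) {
        // Reset every count to 0 so the previous document does not leak into this one.
        dictionary.replaceAll((word, count) -> 0);
        while (in.hasNext()) {
            String word = in.next().trim().toLowerCase();
            // Only words that are already in the dictionary are counted.
            dictionary.computeIfPresent(word, (w, count) -> count + 1);
        }
        // Copy the values so that later resets cannot change this result.
        return new ArrayList<>(dictionary.values());
    }

    // One File and one Scanner per document.
    static List<Integer> processDocument(String fileName) throws FileNotFoundException {
        try (Scanner in = new Scanner(new File(fileName))) {
            return countWords(in);
        }
    }

    public static void main(String[] args) {
        for (String word : Arrays.asList("apple", "banana", "cherry")) {
            dictionary.put(word, 0);
        }
        List<String> fileNames = Arrays.asList("Test3.txt", "Test4.txt");
        for (String fileName : fileNames) {
            try {
                System.out.println(fileName + " -> " + processDocument(fileName));
            } catch (FileNotFoundException e) {
                System.out.println(fileName + " not found");
            }
        }
    }
}
```

Because countWords returns a copy of the values, resetting the shared map for the next file does not touch results that were returned earlier.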

 
George Gilliland
Greenhorn
Posts: 21
1

Prasad Saya wrote:You can try this.

Define "dictionary" as a static variable. Store the 10-12 file names to be processed in an ArrayList and process each file in a for-loop in a new function called processDocument or processInputFile:



But would that not overwrite the pre-existing vector(s)? Would the dictionary's values be reset to 0 after each call? If so, would this guarantee each vector would be different? Sorry for all of the questions.
 
Prasad Saya
Rancher
Posts: 508
15
Notepad Java

Would the dictionary's values be reset to 0 after each call?



Yes. The value part of the map.

But would that not overwrite the pre-existing vector(s)? If so, would this guarantee each vector would be different?



Yes, it overwrites.

You need to create another static variable, List<Vector> myVectors = ...., and store the vector from each call in it. At the end of processing all 10-12 files, myVectors will hold the output for each of those input files/documents.
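A rough sketch of keeping every result instead of overwriting one vector. As an assumption for illustration, it swaps the List<Vector> for a Map keyed by document name so each result can be looked up later; VectorStoreSketch, store, and the document names are placeholders:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class VectorStoreSketch {
    // One entry per processed document, keyed by a document name (placeholder names).
    static final Map<String, List<Integer>> vectorsByDocument = new LinkedHashMap<>();

    // Stores a defensive copy so that reusing or clearing `counts` later has no effect.
    static void store(String documentName, List<Integer> counts) {
        vectorsByDocument.put(documentName, new ArrayList<>(counts));
    }

    public static void main(String[] args) {
        List<Integer> working = new ArrayList<>(Arrays.asList(1, 0, 2));
        store("A", working);

        // Reuse the same working list for the next document.
        working.clear();
        working.addAll(Arrays.asList(0, 3, 1));
        store("B", working);

        // Both results are still available under their own names.
        System.out.println(vectorsByDocument.get("A")); // [1, 0, 2]
        System.out.println(vectorsByDocument.get("B")); // [0, 3, 1]
    }
}
```

The copy made in store is what keeps the earlier result intact; storing the same list object twice would leave both entries pointing at the latest values.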
 
George Gilliland
Greenhorn
Posts: 21
1

Prasad Saya wrote:

Would the dictionary's values be reset to 0 after each call?



Yes. The value part of the map.

But would that not overwrite the pre-existing vector(s)? If so, would this guarantee each vector would be different?



Yes, it overwrites.

You need to create another static variable, List<Vector> myVectors = ...., and store the vector from each call in it. At the end of processing all 10-12 files, myVectors will hold the output for each of those input files/documents.



I'm not sure what you mean regarding the vectors, as I want a different one for each call of the function. For example, I have created the contents of the dictionary from 5 text files. I then compare the dictionary with text file A, which outputs its values to Vector A. I then repeat the process with text file B, which outputs its values to Vector B, etc.

I've created the function, which looks like it works so far...

 
Prasad Saya
Rancher
Posts: 508
15
Notepad Java
The main method does this:

- from the original code: creates the Dictionary reader = ... using the two files Text.txt and Text2.txt (or more files)
- create a list of your 10-12 input files (A, B, C,...)
- process each file in a loop: processDocument
 
George Gilliland
Greenhorn
Posts: 21
1

Prasad Saya wrote:The main method does this:

- from the original code: creates the Dictionary reader = ... using the two files Text.txt and Text2.txt (or more files)
- create a list of your 10-12 input files (A, B, C,...)
- process each file in a loop: processDocument



Okay, I believe that I'm almost there... Is the last for loop correct? And how would I store the result for each file in an independent vector containing just the dictionary's values?

public static void main(String[] args) {

    List<String> textFileList = Arrays.asList("Test.txt", "Test2.txt");
    ArrayList<String> files = new ArrayList<String>();

    Dictionary reader = new Dictionary(dictionary);

    for (String text : textFileList) {
        reader.fileScanner(text);
    }
    try {
        Scanner textFile = new Scanner(new File("Test3.txt", "Test4.txt"));

        while (textFile.hasNext()) {
            files.add(textFile.next().trim().toLowerCase());
        }

        textFile.close();

    } catch (FileNotFoundException e) {
        e.printStackTrace();
    }

    for (String word : files) {
        processDocument(word);
    }
 
Master Rancher
Posts: 4465
38
This statement does not make any sense:
Scanner textFile = new Scanner(new File("Test3.txt", "Test4.txt"));
What do you want the code in that statement to do?
 
George Gilliland
Greenhorn
Posts: 21
1

Norm Radder wrote: This statement does not make any sense:
Scanner textFile = new Scanner(new File("Test3.txt", "Test4.txt"));
What do you want the code in that statement to do?



To read in text files to compare with the dictionary... What I want is to read in a text file, compare it with the dictionary, and export the values of the HashMap to a vector. I need to repeat this process 10-12 times, with each call of the function producing a different vector for a different text file... The words in the dictionary should stay the same, but their values should be reset to 0 before each call.
 
George Gilliland
Greenhorn
Posts: 21
1

Prasad Saya wrote:The main method does this:

- from the original code: creates the Dictionary reader = ... using the two files Text.txt and Text2.txt (or more files)
- create a list of your 10-12 input files (A, B, C,...)
- process each file in a loop: processDocument



I was just going off what was advised here
 
Norm Radder
Master Rancher
Posts: 4465
38

To read in text files  


You need to read the API doc for the File class to see how to use it to read a file with the Scanner class.  The File class's constructor takes the name of ONE file that is to be accessed.  When passing two Strings to the constructor, the first String refers to the file's parent directory.
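To illustrate the point about the two-String constructor, here is a small sketch. File(String parent, String child) and Scanner(File) are the standard library signatures; FilePerDocumentSketch, readWords, and the file names are placeholders:

```java
import java.io.File;
import java.io.FileNotFoundException;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Scanner;

class FilePerDocumentSketch {
    // Reads every whitespace-separated token from one Scanner, lower-cased.
    static List<String> readWords(Scanner in) {
        List<String> words = new ArrayList<>();
        while (in.hasNext()) {
            words.add(in.next().trim().toLowerCase());
        }
        return words;
    }

    public static void main(String[] args) {
        // Two Strings mean (parent directory, child name), not two files.
        File misleading = new File("Test3.txt", "Test4.txt");
        System.out.println(misleading.getPath()); // Test3.txt + separator + Test4.txt

        // One File (and one Scanner) per document instead.
        for (String name : Arrays.asList("Test3.txt", "Test4.txt")) {
            try (Scanner in = new Scanner(new File(name))) {
                System.out.println(name + ": " + readWords(in));
            } catch (FileNotFoundException e) {
                System.out.println(name + " not found");
            }
        }
    }
}
```

The try-with-resources form closes each Scanner even if reading fails partway through.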

Note: Be sure to wrap all posted code in code tags.  To do that: select the posted code and press the Code button.
 
Prasad Saya
Rancher
Posts: 508
15
Notepad Java
I just used most of your code and refactored it like this (as I had explained earlier); you may have to figure out some details:

Staff note (Knute Snortum) :

Just a reminder that complete solutions are frowned upon in Beginning Java. This isn't a "complete" solution exactly, but it does a lot of things for the OP that he was asked to do himself.

 
George Gilliland
Greenhorn
Posts: 21
1

Prasad Saya wrote:I just used most of your code and refactored it like this (as I had explained earlier) and you may have to figure some details:




Thanks, that sort of works. I've managed to get the dictionary to clear when the function is called... But what worked well in my previous code was that the vector stored not only the values of the strings present in both the dictionary and the document, but also the values of the dictionary strings not present in the document. For example, the vector in your code stores [1, 1, 1], whereas I'm looking for it to include the 0's too: [0, 1, 1, 0, 0, 0, 1].

Secondly, the other issue I'm having with your example is storing the values of a different comparison in a different vector. Your example overwrites the existing vector, but I want both vectors to be usable.
 
George Gilliland
Greenhorn
Posts: 21
1
I have played with your example and almost have an output that I like...

Firstly, as you can see at the bottom of the code, I have two print statements that call the function with the relevant text file. Is there a way that I can store the output of each function call in a variable?

Secondly, the vectors produce the correct results. However, they only include the frequencies of the words that occur in the text document passed to the function. Is there a way to include the frequencies of the strings that are not in the text document? For example, Test3 returns [1, 1, 1]... But I'd like something like this: [1, 0, 0, 1, 1, 0, 0]... That is, it would also return a value (0) for strings that are in the dictionary but cannot be found in Test3.
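One possible way to get both, sketched under the assumption that the dictionary's words are available as an ordered list (ZeroFilledVectorSketch, countAgainst, and the one-letter vocabulary are made up for illustration): seed every dictionary word with 0 before counting, and return a new list from each call so it can be assigned to its own variable.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class ZeroFilledVectorSketch {
    // Builds one count per vocabulary word, in vocabulary order; absent words stay at 0.
    static List<Integer> countAgainst(List<String> vocabulary, List<String> documentWords) {
        Map<String, Integer> counts = new LinkedHashMap<>();
        for (String word : vocabulary) {
            counts.put(word, 0); // seed every vocabulary word, found or not
        }
        for (String word : documentWords) {
            // Words outside the vocabulary are ignored.
            counts.computeIfPresent(word, (w, n) -> n + 1);
        }
        return new ArrayList<>(counts.values());
    }

    public static void main(String[] args) {
        List<String> vocabulary = Arrays.asList("a", "b", "c", "d", "e", "f", "g");

        // Each call returns its own list, so the results live in separate variables.
        List<Integer> vectorA = countAgainst(vocabulary, Arrays.asList("a", "d", "e"));
        List<Integer> vectorB = countAgainst(vocabulary, Arrays.asList("b", "b", "g"));

        System.out.println(vectorA); // [1, 0, 0, 1, 1, 0, 0]
        System.out.println(vectorB); // [0, 2, 0, 0, 0, 0, 1]
    }
}
```

Every vector has the same length as the vocabulary, which is what makes two documents comparable position by position.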

 
Marshal
Posts: 16594
277
Mac Android IntelliJ IDE Eclipse IDE Spring Debian Java Ubuntu Linux
Seems like I'm constantly harping about the Stroop Effect this week...  Anyway, it's important to use language that's unambiguous. The term "dictionary" as used in this context is confusing. First of all, your usage of that word is not common in "Java-speak". When it comes to Java, we use Map. The API Documentation for Dictionary even says the Dictionary class is obsolete and should be replaced by Map. Your usage of the data structure is really to count word frequencies, so a name like wordCounts or wordFrequencies would convey your intent better.

Same thing goes with the "vector" -- ArrayList is preferred over the Vector because the latter is synchronized and therefore has slower performance. When your code uses both, that's kind of puzzling to experienced people.

The name StringCounter is also a bit off since the program is really counting words. Why not call it WordCounter instead?

As for reusability, the smaller and more focused your classes are, the more reusable they are. Having everything in one class inhibits reusability. Moving code to different methods helps but moving code to different classes will probably give you more mileage.
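As a rough illustration of these naming and structure suggestions, and not a rewrite of the poster's actual classes, a small focused class might look like this. WordCounter and count are hypothetical names, and the vocabulary is assumed to be lower-case already:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical class with a single job: counting words against a fixed vocabulary.
class WordCounter {
    private final List<String> vocabulary;

    WordCounter(List<String> vocabulary) {
        // Copy so later changes to the caller's list do not affect this counter.
        this.vocabulary = new ArrayList<>(vocabulary);
    }

    // Returns a new word -> frequency Map per call; nothing is shared between documents.
    Map<String, Integer> count(Iterable<String> words) {
        Map<String, Integer> wordCounts = new TreeMap<>();
        for (String word : vocabulary) {
            wordCounts.put(word, 0);
        }
        for (String word : words) {
            wordCounts.computeIfPresent(word.toLowerCase(), (w, n) -> n + 1);
        }
        return wordCounts;
    }

    public static void main(String[] args) {
        WordCounter counter = new WordCounter(Arrays.asList("bar", "foo"));
        System.out.println(counter.count(Arrays.asList("Foo", "foo", "baz"))); // {bar=0, foo=2}
    }
}
```

Because each call builds its own Map, there is no static state to reset between documents, and the same counter can be reused for any number of files.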
 
Bartender
Posts: 4633
182
Why is dictionary a static field in Test? Why not a local variable in the processDocument method? And is that Dictionary class really necessary? And, sorry for asking, why not return the HashMap, instead of only the values? That way you would also have knowledge of the Strings involved, making comparison between the several text files much easier. But maybe (likely) I miss your intention.
 
George Gilliland
Greenhorn
Posts: 21
1

Piet Souris wrote: Why is dictionary a static field in Test? Why not a local variable in the processDocument method? And is that Dictionary class really necessary? And, sorry for asking, why not return the HashMap, instead of only the values? That way you would also have knowledge of the Strings involved, making comparison between the several text files much easier. But maybe (likely) I miss your intention.



Yes, the dictionary class is necessary, as my project supervisor wanted me to make one myself. I understand the class name 'Test' is weird, but I was just testing out code, hence the name. And I only need to return the values of the String frequencies: I'm measuring similarity between documents, so the Strings themselves are not important.
 
Junilu Lacar
Marshal
Posts: 16594
277
Mac Android IntelliJ IDE Eclipse IDE Spring Debian Java Ubuntu Linux

George Gilliland wrote:Yes, the dictionary class is necessary, as my project supervisor wanted me to make one myself.


So did your project supervisor say "You need to make a Dictionary class yourself" or "Instantiate a Dictionary and use it in your program" or "Use a data structure that works like a dictionary"? Are you sure you clearly understood his intent?

And i only need to return the values of the String frequencies, as i'm measuring similarity between documents, the Strings themselves are not important.


That doesn't make sense.  Say your first file had the words "Foo", "Qux", "Bar", "Quux", "Baz", and "Quuux" and the second file had "Baz", "Foo", "Bar", and "Blah". If I understand your requirements, then the frequencies you'd get back would be [1, 1, 1, 1, 1, 1] and [1, 1, 1, 1, 0, 0, 0] or something like that. How do you know which counts correspond to which words? How do you know the two files have any words in common at all, given just the counts?
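One way to make the counts interpretable, sketched with the example words above: fix a single shared word order and use it for every document, so index i always refers to the same word. Here the shared order is the sorted union of both word lists, which is an assumption for illustration (the poster builds the dictionary from separate files); AlignedCountsSketch and counts are made-up names.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.TreeSet;

class AlignedCountsSketch {
    // One count per vocabulary entry, in the vocabulary's order.
    static List<Integer> counts(List<String> vocabulary, List<String> words) {
        List<Integer> result = new ArrayList<>();
        for (String entry : vocabulary) {
            result.add(Collections.frequency(words, entry));
        }
        return result;
    }

    public static void main(String[] args) {
        List<String> first = Arrays.asList("Foo", "Qux", "Bar", "Quux", "Baz", "Quuux");
        List<String> second = Arrays.asList("Baz", "Foo", "Bar", "Blah");

        // Shared, sorted vocabulary: index i means the same word in every vector.
        TreeSet<String> union = new TreeSet<>(first);
        union.addAll(second);
        List<String> vocabulary = new ArrayList<>(union);

        System.out.println(vocabulary);                 // [Bar, Baz, Blah, Foo, Quuux, Quux, Qux]
        System.out.println(counts(vocabulary, first));  // [1, 1, 0, 1, 1, 1, 1]
        System.out.println(counts(vocabulary, second)); // [1, 1, 1, 1, 0, 0, 0]
    }
}
```

With the vocabulary list kept alongside the vectors, the strings are not needed inside each vector, but they are still needed once to define what each position means.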
 
Piet Souris
Bartender
Posts: 4633
182
I wanted to air some ideas, knowing (rightly or wrongly) what the intention is. However, this topic is set to be closed? Really?
 
Prasad Saya
Rancher
Posts: 508
15
Notepad Java
Closing a topic is not really an end to anything. George G. can always solve the problem the way he perceives it. He can try to understand what's been posted by the ranchers and figure out the solution. Maybe there is another way. Sometimes pausing a little can clear things up.

Perhaps there are too many thoughts and ideas to focus on. Maybe it's a little overwhelming (it can be, when there is pressure to find a solution on time, etc.). I can't say which factors are at play. Over the years I have come to learn that the biggest obstacle to solving a problem in the software field is the human and his intervention; it's never the technology, hardware, or software.
 
Piet Souris
Bartender
Posts: 4633
182
Closing is indeed not the end. However, describing some techniques for doing statistics on a couple of documents is not such an easy task that you want to start on it when the topic is marked as solved. But the judgement is of course up to the OP.
 
Bartender
Posts: 1868
81
Android IntelliJ IDE MySQL Database Chrome Java
Mooo!

Your posting was just mentioned in the June 2018 CodeRanch Journal and for that you get a cow.
 