find a keyword in a text file

 
Chris Kislow
Greenhorn
Posts: 21
I'm trying to figure out the best way to check whether a word from one text file also appears in a different text file.

So if text1.txt has
hat
glove
sit
and text2.txt has
gum
running
glove
bad

I would want my code to say, "hey, a word from text1.txt equals a word in text2.txt".

I was trying for loops with an ArrayList but am having issues. Any help?
Thanks
 
Carey Brown
Saloon Keeper
Posts: 3328
Read all the words from file 1 into a HashSet. Read file 2, and if a word appears in the HashSet, then both files contain the same word.
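
A minimal sketch of that idea, assuming one word per line as in the sample files above. The class name SharedWordCheck and the method shareAWord are made up for illustration; matching is exact and case-sensitive.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

class SharedWordCheck {

    // Returns true as soon as any word in 'second' is also present in 'first'.
    static boolean shareAWord(List<String> first, List<String> second) {
        Set<String> seen = new HashSet<>(first); // hash lookups instead of nested loops
        for (String word : second) {
            if (seen.contains(word)) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) throws IOException {
        // Assumption: both files exist in the working directory, one word per line.
        List<String> first = Files.readAllLines(Paths.get("text1.txt"));
        List<String> second = Files.readAllLines(Paths.get("text2.txt"));
        System.out.println(shareAWord(first, second)
                ? "A word from text1.txt also appears in text2.txt"
                : "No shared words");
    }
}
```

The HashSet replaces the inner loop over an ArrayList, so each word from the second file is checked with a single lookup.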
 
Campbell Ritchie
Marshal
Posts: 56570
Welcome to the Ranch

You would not need to iterate a List to find whether it contains a word, because Lists already have a method (actually declared in Collection) which does that: contains(). Carey B's suggestion about using a Set will work and will probably give faster execution. You can also have two Sets (one for each text file) and mimic
S₁ ∩ S₂
with the retainAll method, which will give you a Set with the words common to both files. (Copy the Set first; then you will still have the original contents available.)

[edit] Correction: used ∪ by mistake. Should read ∩.
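
A small sketch of the S₁ ∩ S₂ idea with retainAll, using the sample words from the first post. The class and method names (CommonWords, commonWords) are hypothetical.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

class CommonWords {

    // Returns the intersection of the two sets without modifying either argument.
    static Set<String> commonWords(Set<String> s1, Set<String> s2) {
        Set<String> common = new HashSet<>(s1); // copy first, because retainAll mutates its receiver
        common.retainAll(s2);
        return common;
    }

    public static void main(String[] args) {
        Set<String> file1 = new HashSet<>(Arrays.asList("hat", "glove", "sit"));
        Set<String> file2 = new HashSet<>(Arrays.asList("gum", "running", "glove", "bad"));
        System.out.println(commonWords(file1, file2)); // prints [glove]
    }
}
```

Because the copy is what gets modified, file1 and file2 still hold their original contents afterwards.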
 
Chris Kislow
Greenhorn
Posts: 21
I'm still a little confused, so maybe I should clarify: my first text file has lots of data, and my second text file has keywords that I need to find in the first text file.
So are you saying something like HashSet<String> file1 = new HashSet<>() and HashSet<String> file2 = new HashSet<>()?
Then I can use this retainAll method (file2.retainAll(file1)). Is this what you mean?
Thanks for all the help
 
Carey Brown
Saloon Keeper
Posts: 3328
Certainly your second file should be loaded into a HashSet, as it is a list of words. Your first file contains "data", which implies less structure. You should scan through the first file in whatever way the data allows, and when you find a "word" you can look to see whether it's in the HashSet. No need for two Sets.
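
A rough sketch of that single-Set approach, assuming a "word" in the data is any whitespace-separated token and that matching is exact and case-sensitive. The names KeywordScan and keywordsFoundIn are hypothetical; real data may need different tokenizing.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.LinkedHashSet;
import java.util.Scanner;
import java.util.Set;

class KeywordScan {

    // Walks through the tokens in 'data' and collects each one that is a keyword.
    static Set<String> keywordsFoundIn(Scanner data, Set<String> keywords) {
        Set<String> found = new LinkedHashSet<>(); // remembers first-seen order, ignores repeats
        while (data.hasNext()) {
            String token = data.next();
            if (keywords.contains(token)) {
                found.add(token);
            }
        }
        return found;
    }

    public static void main(String[] args) {
        Set<String> keywords = new HashSet<>(Arrays.asList("hat", "glove", "sit"));
        // A String source stands in for the data file here.
        Scanner data = new Scanner("he lost a glove while running");
        System.out.println(keywordsFoundIn(data, keywords)); // prints [glove]
    }
}
```

To read from an actual file, the Scanner can be built from a File or a Path instead of a String; both constructors can throw a checked exception that the caller has to handle.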
 
Junilu Lacar
Sheriff
Posts: 11494
Carey Brown wrote: No need for two Sets.

For the approach you describe, no, there's no need for two Set objects. However, from an object-oriented programming perspective, Campbell's proposed solution is more OO as it keeps the intent in the forefront while hiding the iteration in the Set object under the retainAll() method. If you want to write code that allows you to "see" the gears as they turn, then your solution would be the way to go. If you don't care about the nitty-gritty details and just want to tell a couple of objects to do the heavy lifting and come back to you when they're done, then the two-Set solution is the right fit.
 
Junilu Lacar
Sheriff
Posts: 11494
In fact, the high-level code for Campbell's solution could be very clear and concise:

EDIT: Actually, I think the code above isn't right. This maybe:

 
Chris Kislow
Greenhorn
Posts: 21
Here is some code I wrote, but I'm not getting the results I expect. I'm not sure how to put the code in the correct format for the group to view.

It seems that the sCurrentLine output is the only thing being printed.
 
Junilu Lacar
Sheriff
Posts: 11494
What I was hoping you'd get from the example I gave is how separation of concerns and functional decomposition can greatly simplify your solution.  The 36 lines of code you have there are deeply nested and jam packed with different concerns, from reading files to managing list elements to searching through lists.  Your code is very opaque (difficult to read through and understand) and disorganized.  Break the problem down into smaller more manageable chunks.

When the indentation of your code forms an arrow, that's called the arrow code anti-pattern. It means your code probably has way too many nesting levels and is way more complicated than it needs to be. Arrow code is also often a source of bugs and logic errors.
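
To illustrate the shape, here is a made-up example (not the code from the post above) showing the same check written first as arrow code and then flattened with early returns. The class and method names are hypothetical.

```java
import java.util.Set;

class ArrowExample {

    // Arrow-shaped: every condition adds another nesting level.
    static boolean isKeywordNested(String word, Set<String> keywords) {
        if (word != null) {
            if (!word.isEmpty()) {
                if (keywords != null) {
                    if (keywords.contains(word)) {
                        return true;
                    }
                }
            }
        }
        return false;
    }

    // Flattened: guard clauses reject the bad cases first, so the real check stays at one level.
    static boolean isKeywordFlat(String word, Set<String> keywords) {
        if (word == null || word.isEmpty()) {
            return false;
        }
        if (keywords == null) {
            return false;
        }
        return keywords.contains(word);
    }
}
```

Both methods return the same answers; the second is simply easier to read and to change.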

These two lines of code in the example I gave:

imply that there is a wordsFrom() method that you can write that will take the name of a file and produce a set of words that were read from that file. Now, wouldn't it be a little easier if you could just focus on implementing that bit of functionality without having to worry about all the other stuff you have to do, even for just a little while? When you're done with it and you've tested it, you can move forward confidently, knowing that this part of your program is done and you don't have to think about it anymore. Now on to the next part...
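
One possible way to write such a wordsFrom() method, as a sketch only. The name and the "file name in, set of words out" contract come from the description above; the class name WordReader, the exact signature, and the choice to treat every whitespace-separated token as a word are assumptions.

```java
import java.io.IOException;
import java.nio.file.Paths;
import java.util.HashSet;
import java.util.Scanner;
import java.util.Set;

class WordReader {

    // Reads every whitespace-separated token from the named file into a Set.
    static Set<String> wordsFrom(String filename) throws IOException {
        Set<String> words = new HashSet<>();
        // try-with-resources closes the Scanner (and the file) when the block ends
        try (Scanner in = new Scanner(Paths.get(filename))) {
            while (in.hasNext()) {
                words.add(in.next());
            }
        }
        return words;
    }
}
```

Since the method does one thing, it can be tested on a small file of its own before any comparison logic exists.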

Programming is a discipline and you need to approach it systematically and in an organized way. If you don't, you'll just be spending a lot of time running around, like that little fat mouse in the classic Disney animated feature, "Cinderella", picking up more grains of corn than he could actually carry, only to keep dropping them and having to start all over again.

 
Junilu Lacar
Sheriff
Posts: 11494
Maybe you thought the code I gave was pseudo-code or something. No, it was real Java code that you can actually use.  If you were trying to build a car, that would have been the engine that you could have just mounted. All you have to do really is hook that bad boy up to the three methods that are being called: wordsFrom(), reportKeywordsFound(), and reportNoKeywordsInData().

Here's more of the skeleton, so you can get a better picture of what you'd need to program:

That's it. All you have to do is fill in the code for those three methods.
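
For readers who want to see the pieces fitted together, here is a minimal sketch of how such a program could look once filled in. Only the three method names wordsFrom(), reportKeywordsFound(), and reportNoKeywordsInData() come from this thread; the class name KeywordFinder, the parameter lists, the method bodies, and taking the file names from the command line are assumptions made for illustration.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.HashSet;
import java.util.Set;
import java.util.TreeSet;

class KeywordFinder {

    // Assumed usage: java KeywordFinder <dataFile> <keywordsFile>
    public static void main(String[] args) throws IOException {
        if (args.length < 2) {
            System.out.println("usage: KeywordFinder <dataFile> <keywordsFile>");
            return;
        }
        Set<String> found = wordsFrom(args[0]);
        found.retainAll(wordsFrom(args[1])); // keep only the words that are also keywords

        if (found.isEmpty()) {
            reportNoKeywordsInData();
        } else {
            reportKeywordsFound(found);
        }
    }

    // Assumed behaviour: every whitespace-separated token counts as a word.
    static Set<String> wordsFrom(String filename) throws IOException {
        Set<String> words = new HashSet<>();
        for (String line : Files.readAllLines(Paths.get(filename))) {
            for (String token : line.trim().split("\\s+")) {
                if (!token.isEmpty()) {
                    words.add(token);
                }
            }
        }
        return words;
    }

    static void reportKeywordsFound(Set<String> found) {
        System.out.println("Keywords found: " + new TreeSet<>(found)); // sorted for stable output
    }

    static void reportNoKeywordsInData() {
        System.out.println("No keywords found in the data file.");
    }
}
```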

Do you see how the program is broken down into little component pieces, each piece attacking one small part of the problem? This isn't even the best way to write the program yet but it's a lot better than 36 lines all jammed together in one confusing jumble, right?
 
Chris Kislow
Greenhorn
Posts: 21
Yes, thank you so very much. I'll try this.
 
Knute Snortum
Sheriff
Posts: 4281
This doesn't look correct: It should be "/home/mearts/keywords.txt" or on a Windows OS, "\\home\\mearts\\keywords.txt".
 
Junilu Lacar
Sheriff
Posts: 11494
On my Mac, it's pretty tolerant of redundant "/"s, so "//" or any number of slashes like "////" is treated the same as "/" -- this is also true at the command line. I tried in zsh and got the same behavior. I'm guessing it's the same behavior you'll see in most if not all *nix flavors.

I'm pretty sure that the "/" is acceptable as a directory separator even when running on a Windows system, at least for the File class.
 
Campbell Ritchie
Marshal
Posts: 56570
Many people recommend / in Windows® file path Strings rather than \\.
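
If you would rather not hard-code any separator at all, one option is to hand the path elements to Paths.get separately. A small sketch; the class name PathDemo and the method keywordsPath are hypothetical, and the directory names are the ones from the post above.

```java
import java.io.File;
import java.nio.file.Path;
import java.nio.file.Paths;

class PathDemo {

    // Each element is passed on its own, so the platform supplies the separator.
    static Path keywordsPath() {
        return Paths.get("/home", "mearts", "keywords.txt");
    }

    public static void main(String[] args) {
        System.out.println(keywordsPath());               // separator shown depends on the OS
        System.out.println(keywordsPath().getFileName()); // prints keywords.txt

        // A forward-slash String handed to File resolves to the same file name.
        System.out.println(new File("/home/mearts/keywords.txt").getName()); // prints keywords.txt
    }
}
```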
 
Carey Brown
Saloon Keeper
Posts: 3328
Any disadvantages to using '/' ?
 
Campbell Ritchie
Marshal
Posts: 56570
Not that I know of.
 