
Compare Two Text Files

 
Aditya Sirohi
Ranch Hand
Posts: 93
Eclipse IDE Linux VI Editor
Hello Friends,

In an earlier thread I got help with processing some logic and writing the output to text files; thanks a lot for the help. My question now is that I need to iterate through these two text files and find what is common to both. Is there an API I can use, or would you suggest calling the Unix compare command from Java code?
I have come up with an algorithm too; please tell me whether it would work.


Comments are appreciated. Thanks a lot.

-Aditya
 
Siddhesh Deodhar
Ranch Hand
Posts: 118
A few corrections:

FileInputStream fstream2 = new FileInputStream("textfile1.txt"); -> You are reading the same file.

if(strLine1 = strLine2) -> You should use the .equals() method to compare two object values.

I don't know of any API in Java that can be used to compare files directly. If you want to find common lines, your code above is fine.

Calling the Unix compare command from Java code is the all-time best option.
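
To illustrate the second correction: in Java, `=` is assignment, so `if(strLine1 = strLine2)` does not compile for `String` operands, and `==` compares object references rather than characters. A minimal sketch, assuming the variable names from the post; the class and method names are made up for illustration:

```java
import java.util.Objects;

final class LineEquality {
    // Compares the text of two lines. Objects.equals also tolerates null,
    // which BufferedReader.readLine() returns at end of file.
    static boolean sameLine(String strLine1, String strLine2) {
        return Objects.equals(strLine1, strLine2);
    }

    public static void main(String[] args) {
        String a = "place";
        String b = new String("place"); // same characters, different object
        System.out.println(a == b);         // reference comparison
        System.out.println(sameLine(a, b)); // content comparison
    }
}
```

Note that a content comparison is still strict about whitespace: "place" and " place" are different lines.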

 
Aditya Sirohi
Ranch Hand
Posts: 93
Eclipse IDE Linux VI Editor
Thanks Siddhesh,

I am in the process of implementing it. I will keep the thread updated.

-Aditya
 
Hardik Trivedi
Ranch Hand
Posts: 252
Hi,
I think there is a big misunderstanding, either on your side or on the repliers' side.
I think you want to find the list of words that are common to both files, right?

Then let me tell you, there is no specific method or API for that.
Use your own algorithm:
fetch each word and compare it with all the words in the second file; if it matches anywhere, put it in an array of strings,
and finally return that array.
 
Hardik Trivedi
Ranch Hand
Posts: 252
Hi, I found a very good program for you. Please refer to this link:
http://www.sourcecodesworld.com/source/show.asp?ScriptID=836
 
Ulf Dittmer
Rancher
Posts: 42968
73
Hardik Trivedi wrote:http://www.sourcecodesworld.com/source/show.asp?ScriptID=836

Taking a quick look at this, it seems rather unsophisticated. For example, it makes no allowance for missing or extra lines in one of the files. So if the first line of one file is missing, then *all* subsequent lines will be reported as different, even though they may be identical.

This is generally the realm of the "diff" command, which is available on all Unix/Linux boxes (as opposed to "compare", which is not).
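
To make the point about missing lines concrete, here is a small sketch of the kind of position-by-position comparison described above. It is not the linked program; the class and method names are invented for illustration.

```java
import java.util.List;

final class LockstepCompare {
    // Counts positions where the files differ when line N of one file
    // is compared only against line N of the other.
    static int mismatches(List<String> first, List<String> second) {
        int longest = Math.max(first.size(), second.size());
        int count = 0;
        for (int i = 0; i < longest; i++) {
            String a = i < first.size() ? first.get(i) : null;
            String b = i < second.size() ? second.get(i) : null;
            if (a == null || !a.equals(b)) {
                count++;
            }
        }
        return count;
    }
}
```

If the second file is the first file minus its opening line, every position is reported as a mismatch even though all the remaining lines are identical. A diff algorithm avoids this by aligning the two sequences before comparing them.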
 
David Newton
Author
Rancher
Posts: 12617
IntelliJ IDE Ruby
There are multiple Java implementations of diff, if that's what you actually need. A good diff algorithm goes *way* beyond the code you've posted.
 
Aditya Sirohi
Ranch Hand
Posts: 93
Eclipse IDE Linux VI Editor
Hello Everyone,

I wanted to post an update regarding my question. The earlier code, which goes as below, does not work correctly in all cases. What it does is: if the values in any lines are the same, it displays that string, which is not the expected output.
For example, if the contents of the two text files are as below, the expected output should be
"
A
friendly
place
for
Java
greenhorns"
,
whereas the output I get is "place". What changes should I make to the existing code? Comments are appreciated.






 
David Newton
Author
Rancher
Posts: 12617
IntelliJ IDE Ruby
Why should the output be what you show above? What are you trying to accomplish?

In any case: you're printing the line if the two lines, one from each file, are the same. I'm not even sure why you're getting the line that says "place", since text2.txt has a lot of leading spaces.
 
Aditya Sirohi
Ranch Hand
Posts: 93
Eclipse IDE Linux VI Editor
Hello David,

I want to print the lines which are common in both the text files.

-Aditya
 
David Newton
Author
Rancher
Posts: 12617
IntelliJ IDE Ruby
Then if the formatting above is correct, *none* of the lines should print, since they all differ in spacing.
 
Aditya Sirohi
Ranch Hand
Posts: 93
Eclipse IDE Linux VI Editor
I am sorry, that is true: it does not print anything, whereas it should print the lines common in both the text files.
 
David Newton
Author
Rancher
Posts: 12617
IntelliJ IDE Ruby
But it is--there *aren't* any common lines. Unless you're trying to say that you want to ignore whitespace.

You really need to be specific about your requirements, otherwise we're all just guessing at what you want, and that's not an efficient use of time.
 
Aditya Sirohi
Ranch Hand
Posts: 93
Eclipse IDE Linux VI Editor
Hi David,

Sorry for the inconvenience, i want to consider the white space too and grep all the lines that are present in both the file irrespective of what line number they appear.

-Aditya
 
David Newton
Author
Rancher
Posts: 12617
IntelliJ IDE Ruby
You're not grepping, you're diffing.

So really what you need to do is to keep track of each line in each file and compare them once you've read them in, right?

(That's a hint on how to proceed.)
 
fred rosenberger
lowercase baba
Bartender
Posts: 12186
34
Chrome Java Linux
Aditya Sirohi wrote:i want to consider the white space too and grep all the lines that are present in both the file irrespective of what line number they appear.

This is a perfect example of why specs are so important. We have gone from

"get what is common to both files"

to an example (which by itself is fine, but is incomplete)

to "print the lines which are common in both the text files"

to " it should print the lines common in both the text files"

to " i want to consider the white space too and grep all the lines that are present in both the file irrespective of what line number they appear."

All these statements could mean slightly different things to different people. What does "consider the white space too" mean exactly? If file 'a' has "fred " and file 'b' has "fred", is that a match or not?

If I am interpreting what you want correctly, and I am not sure I am, I think what you need to do is read a single line from file 'a', and see if it's in file 'b', using whatever restriction you need regarding white space.

Then, read the next line of file 'a' and compare against every line again.

You can possibly make your program smarter by checking to make sure that you read from the shorter file, that you don't re-test a line if you've already looked for it (unless you need to know for some reason), and perhaps by using the right data structures to store some info.

But the first thing I would do is nail down EXACTLY what you want in unambiguous terms.
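
The "fred " versus "fred" question can be pinned down in code. The sketch below shows two possible matching rules; which one applies is exactly the requirement that has to be decided. Class and method names are hypothetical.

```java
final class WhitespacePolicy {
    // Strict rule: every character counts, including spaces.
    static boolean exactMatch(String a, String b) {
        return a.equals(b);
    }

    // Lenient rule: leading and trailing whitespace is ignored.
    static boolean trimmedMatch(String a, String b) {
        return a.trim().equals(b.trim());
    }
}
```

Under the strict rule "fred " and "fred" do not match; under the lenient rule they do.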

 
Aditya Sirohi
Ranch Hand
Posts: 93
Eclipse IDE Linux VI Editor
Hello Fred,

I want to apologize for not being clear with my question.

fred rosenberger wrote:All these statements could mean slightly different things to different people. What does "consider the white space too" mean exactly? If file 'a' has "fred " and file 'b' has "fred", is that a match or not?

Yes, if file 'a' has the word fred and file 'b' has the word fred, then it's a match.

I tried to write a piece of code, but it did not work. Comments are appreciated.

 
David Newton
Author
Rancher
Posts: 12617
IntelliJ IDE Ruby
As both Fred and I hinted, if order is not important, then simply looping over the lines isn't going to work--you need to be able to check all previous lines of the first file for each line in the second file. Can you think of some ways you might approach that?
 
Aditya Sirohi
Ranch Hand
Posts: 93
Eclipse IDE Linux VI Editor
I know what the code should look like, but I am finding it hard to implement.

The pseudocode I have in mind is:

1. Read file 'a' line by line.
2. For each line in file 'a', check whether it is present in file 'b'; if it is there, then print the line.


I think that should solve my main problem. If I could find out which constructor and method I can use, or get a skeleton solution to the problem, I can work from there.

Thanks
Aditya
 
David Newton
Author
Rancher
Posts: 12617
IntelliJ IDE Ruby
Constructor for what?

In any case, I don't need to give you a skeleton--you just defined the skeleton by writing out the steps you need to take. So what's next? What's the easiest way you can think of to implement what you just described?
 
salvin francis
Bartender
Posts: 1307
10
Eclipse IDE Google Web Toolkit Java
Aditya Sirohi wrote:
2. For each line in file 'a', check whether it is present in file 'b'; if it is there, then print the line.


Let me rephrase that:

For each line in file 'a', iterate through ALL the lines in file 'b' and check whether it exists there.

If you want a simple optimization,
load all the lines of files 'a' and 'b' into two ArrayLists, A and B.

For each element in A, check whether it exists in B using contains().

CAUTION: The above optimization is not suitable if the files are large.
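
The list-based approach above might look like the following sketch. It works on lists that are already loaded, so the file-reading step is left out; the class and method names are made up for illustration.

```java
import java.util.ArrayList;
import java.util.List;

final class CommonLinesWithLists {
    // Returns the lines of A that also occur somewhere in B, in A's order.
    static List<String> commonLines(List<String> linesA, List<String> linesB) {
        List<String> common = new ArrayList<>();
        for (String line : linesA) {
            // contains() scans B from the start each time and compares with equals().
            if (linesB.contains(line)) {
                common.add(line);
            }
        }
        return common;
    }
}
```

Because every contains() call is a linear scan, the work grows with the product of the two file lengths, and both files sit in memory at once. That is the reason for the caution about large files.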
 
salvin francis
Bartender
Posts: 1307
10
Eclipse IDE Google Web Toolkit Java
My approach would be to compute a simple hash of every line in A and B, store the hashes in an ArrayList as strings,
and then use the contains() method to check for existence.

However, the complexity of handling a hit and a miss then comes into the picture, and thus optimizations (as usual) complicate a simple issue...
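
A related alternative, which is not literally what the post describes, is to let `java.util.HashSet` do the hashing: it hashes each line internally and still confirms a hit with equals(), so hash collisions cannot produce false matches. A sketch with hypothetical names:

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

final class CommonLinesWithSet {
    // Returns each line of A that also occurs in B, once, in A's order.
    static List<String> commonLines(List<String> linesA, List<String> linesB) {
        Set<String> inB = new HashSet<>(linesB);      // B is hashed once
        Set<String> alreadyReported = new HashSet<>();
        List<String> common = new ArrayList<>();
        for (String line : linesA) {
            // add() returns false when the line was reported before.
            if (inB.contains(line) && alreadyReported.add(line)) {
                common.add(line);
            }
        }
        return common;
    }
}
```

Each lookup takes roughly constant time on average, so the total work grows with the sum of the file lengths rather than their product.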
 
Aditya Sirohi
Ranch Hand
Posts: 93
Eclipse IDE Linux VI Editor
Hello All,

I have been working on this the whole day today and made some progress: I can now store all the lines of file 'a' in an array. Now I am trying to iterate over each element in the array and check whether it is present in file 'b'. I wanted to share the code I have so far. My code will look like a novice's; expert comments are appreciated.

Thanks
Aditya











 
Aditya Sirohi
Ranch Hand
Posts: 93
Eclipse IDE Linux VI Editor
Hello,
I have stored the contents of the two files in two arrays, but when I try to compare them I get a NullPointerException on this line: if(arrayLines1[i].contains(arrayLines2[j]))

The code I have so far:












 
David Newton
Author
Rancher
Posts: 12617
IntelliJ IDE Ruby
You have the *capability* of reading in a thousand lines, but the files don't necessarily *contain* a thousand lines. So you don't want to check the length of the array--you want to check against how many lines the file actually has.
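
The original code is not shown here, so the following is only a sketch of the idea under that assumption: the unused slots of a `new String[1000]` stay null, and calling a method on one of them throws a NullPointerException. Keeping a count of the slots actually filled, and looping only up to that count, avoids it. All names are hypothetical.

```java
final class CountedArray {
    // Copies lines into a fixed-size array and returns how many slots were filled.
    static int fill(String[] target, Iterable<String> source) {
        int count = 0;
        for (String line : source) {
            if (count == target.length) {
                break; // array is full; remaining lines are dropped
            }
            target[count++] = line;
        }
        return count;
    }

    // Loops over the filled slots only, so no null element is dereferenced.
    static boolean containsLine(String[] lines, int count, String wanted) {
        for (int i = 0; i < count; i++) {
            if (lines[i].equals(wanted)) {
                return true;
            }
        }
        return false;
    }
}
```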
 
Aditya Sirohi
Ranch Hand
Posts: 93
Eclipse IDE Linux VI Editor
I cannot get the common strings from the two arrays I created in the code above. This is what I have tried so far, but I don't get any output; I get an IOException. Am I doing anything wrong?

 
salvin francis
Bartender
Posts: 1307
10
Eclipse IDE Google Web Toolkit Java
David Newton pointed out a very serious problem in your solution:

1. You do not know the number of lines in the file.
2. You have hard-coded it to 1000.

What if a file contains only 10 lines?

In these situations it is best to use a collection, e.g. an ArrayList, since collections can expand as new elements are added to them.


Secondly, in your code:
i <= arrayLines1.length

Should have been:
i < arrayLines1.length

I don't see any reason why those lines of code should throw an IOException;
perhaps you could paste the first 10 lines of the stack trace?
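
Both suggestions can be sketched together. This is an illustration under assumed names, not a reconstruction of the posted code: the list grows as lines arrive, and the loop bound uses `<` because valid indexes run from 0 to size() - 1.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

final class LineReader {
    // Reads every remaining line; the list grows as needed, so no size is hard-coded.
    static List<String> readAll(BufferedReader reader) throws IOException {
        List<String> lines = new ArrayList<>();
        String line;
        while ((line = reader.readLine()) != null) {
            lines.add(line);
        }
        return lines;
    }

    // Uses "<" rather than "<=": index size() is one past the last element.
    static void printAll(List<String> lines) {
        for (int i = 0; i < lines.size(); i++) {
            System.out.println(lines.get(i));
        }
    }
}
```

With an array, `i <= array.length` reads one element past the end and throws an ArrayIndexOutOfBoundsException on the last iteration.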

 
David Newton
Author
Rancher
Posts: 12617
IntelliJ IDE Ruby
I told you exactly what the problem was.
 
Aditya Sirohi
Ranch Hand
Posts: 93
Eclipse IDE Linux VI Editor
Thanks to everyone; Java Ranch is an awesome place to learn. I would say I am a novice in programming, but when I get some feedback I get motivated to solve the problem. So I am posting the code below, which gives all the lines common to the two files. I still get the exception for lines 13 and 63. Comments are appreciated.












 
Aditya Sirohi
Ranch Hand
Posts: 93
Eclipse IDE Linux VI Editor
I got it: I had to use for (int i = 0 ; i < arrayLines1.length ; i++) instead of for (int i = 0 ; i <= arrayLines1.length ; i++) in displayRecords().

Thanks everyone. Marking the thread as resolved.

-Aditya
 
David Newton
Author
Rancher
Posts: 12617
IntelliJ IDE Ruby
There's no need to read each file twice, but it's a good starting point. Congrats!
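
One way to read each file only once, sketched here with `java.nio.file.Files.readAllLines` (Java 8 or later, UTF-8 assumed) rather than the stream classes used earlier in the thread. The class name is hypothetical, and lines are matched exactly, whitespace included.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.HashSet;
import java.util.LinkedHashSet;
import java.util.Set;

final class ReadOnce {
    // Reads each file exactly once and returns the distinct lines present in both,
    // in the order they first appear in the first file.
    static Set<String> commonLines(Path first, Path second) throws IOException {
        Set<String> common = new LinkedHashSet<>(Files.readAllLines(first));
        common.retainAll(new HashSet<>(Files.readAllLines(second)));
        return common;
    }
}
```

Both files are held in memory, so this suits small and medium files; very large inputs would need a different strategy.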
 
salvin francis
Bartender
Posts: 1307
10
Eclipse IDE Google Web Toolkit Java
Glad to be of assistance
 