Compare two huge text files in java and get non matching records

 
Greenhorn
Posts: 3
I am trying to compare two huge text files in Java and print the non-matching records, but I am not getting the expected results: every record is printed even though both files are identical. Please assist.


 

 
Author and all-around good cowpoke
Posts: 13078
6
If this were my problem, I would print both "a" and "b" at the point of comparison. You would quickly discover why it doesn't work.
 
Jayakumar Mohanavelu
Greenhorn
Posts: 3
I was able to fix this issue by applying the change below. Please let me know if there is a more performance-efficient approach for comparing two huge files and printing the non-matching records.

 
Marshal
Posts: 70211
280

Jayakumar Mohanavelu wrote: I am able to fix this issue by applying below change. . . .

Please explain what you think your solution does. Does it reliably find differences? Does it reliably reject two lines that are the same?
 
William Brogden
Author and all-around good cowpoke
Posts: 13078
6
Actually, the file comparison problem can get very tricky and has consumed a lot of programmer time over the years.

Consider:

Suppose the first line is missing from file b. A comparison program that matches records purely by position will then show differences for all records. Is that what is wanted?
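To make that concrete, here is a minimal sketch of a purely positional comparison. The class and method names are made up for illustration, and the original poster's code is not shown in the thread, so this is not a reconstruction of it. It compares line i of one file with line i of the other, and one missing line at the top is enough to make every record look different.

```java
import java.util.ArrayList;
import java.util.List;

public class PositionalCompare {

    /** Compares line i of a with line i of b and returns the lines of a that differ. */
    static List<String> positionalMismatches(List<String> a, List<String> b) {
        List<String> mismatches = new ArrayList<>();
        for (int i = 0; i < a.size(); i++) {
            // A line past the end of b has nothing to match against.
            if (i >= b.size() || !a.get(i).equals(b.get(i))) {
                mismatches.add(a.get(i));
            }
        }
        return mismatches;
    }

    public static void main(String[] args) {
        List<String> a = List.of("a", "b", "c", "d");
        List<String> b = List.of("b", "c", "d"); // same content, first line missing
        System.out.println(positionalMismatches(a, b)); // prints [a, b, c, d]
    }
}
```

Whether that output is right or wrong depends entirely on what "non-matching" is supposed to mean, which is why the requirement needs pinning down first.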
 
Campbell Ritchie
Marshal
Posts: 70211
280
... and what will happen if a line has an additional word in one version?
 
Jayakumar Mohanavelu
Greenhorn
Posts: 3
I will explain in detail with an example.

File1:

firstrowvalue1 | firstrowValue2 | firstrowvalue3
Secrowvalue4 | SecrowValue5 | Secrowvalue7
Thirdrowvalue8 | ThirdrowValue9 | Thirdrowvalue10

File2:

firstrowvalue1 | firstrowValue2 | firstrowvalue3
Secrowvalue4 | SecrowValue5 | Secrowvalue6


Expected Output:
Secrowvalue4 | SecrowValue5 | Secrowvalue7
Thirdrowvalue8 | ThirdrowValue9 | Thirdrowvalue10


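Note: the changes between the two files are the last field of the second row (Secrowvalue7 in File1, Secrowvalue6 in File2) and the third row, which exists only in File1.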
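One reading of this example is "print every line of File1 that does not appear, character for character, anywhere in File2". A minimal sketch under that assumption follows. The class and method names are hypothetical, and it also assumes that File2 fits in memory as a HashSet and that line order and duplicates do not matter, none of which the thread confirms.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.stream.Collectors;
import java.util.stream.Stream;

public class NonMatchingRecords {

    /** Loads every line of the file into a set for constant-time lookups. */
    static Set<String> loadLines(Path file) throws IOException {
        try (Stream<String> lines = Files.lines(file)) {
            return lines.collect(Collectors.toCollection(HashSet::new));
        }
    }

    /** Returns the lines of file1 that do not occur anywhere in file2, in file1 order. */
    static List<String> recordsOnlyInFirst(Path file1, Path file2) throws IOException {
        Set<String> inSecond = loadLines(file2);
        // file1 is streamed line by line, so only file2 is held in memory.
        try (Stream<String> lines = Files.lines(file1)) {
            return lines.filter(line -> !inSecond.contains(line))
                        .collect(Collectors.toList());
        }
    }

    public static void main(String[] args) throws IOException {
        Path file1 = Files.createTempFile("file1", ".txt");
        Path file2 = Files.createTempFile("file2", ".txt");
        Files.write(file1, List.of(
                "firstrowvalue1 | firstrowValue2 | firstrowvalue3",
                "Secrowvalue4 | SecrowValue5 | Secrowvalue7",
                "Thirdrowvalue8 | ThirdrowValue9 | Thirdrowvalue10"));
        Files.write(file2, List.of(
                "firstrowvalue1 | firstrowValue2 | firstrowvalue3",
                "Secrowvalue4 | SecrowValue5 | Secrowvalue6"));
        // Prints the second and third rows of File1.
        recordsOnlyInFirst(file1, file2).forEach(System.out::println);
    }
}
```

Each line of File1 costs one hash lookup, so the work is roughly linear in the size of both files. If File2 is too large to hold in memory, sorting both files externally and merging them is the usual fallback.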



 
lowercase baba
Posts: 12893
63
what should this return:

file 1:
a
b
c
d

file 2:
b
c
d

 
William Brogden
Author and all-around good cowpoke
Posts: 13078
6


Note that with contains, line a could be longer than b by any arbitrary amount, or could have extra leading characters, and the test would still not return false.

If you want equality, use equals.
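A small illustration of the difference, using made-up strings and a hypothetical helper name, since the code being discussed is not shown here:

```java
public class ContainsVsEquals {

    /** Two records match only when they are identical, character for character. */
    static boolean sameRecord(String a, String b) {
        return a.equals(b);
    }

    public static void main(String[] args) {
        String b = "Secrowvalue4 | SecrowValue5 | Secrowvalue6";
        String a = "junk " + b + " | extra"; // b with leading and trailing additions

        System.out.println(a.contains(b));    // prints true
        System.out.println(sameRecord(a, b)); // prints false
        System.out.println(sameRecord(b, b)); // prints true
    }
}
```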
 
Saloon Keeper
Posts: 6588
160
Android Mac OS X Firefox Browser VI Editor Tomcat Server Safari
If this were my problem, I'd start by using one of the existing diff utilities (which you can invoke from within Java) and then work on its output.

If the values are numerical you should also look into ndiff, which is specialized for that.
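Invoking an external diff from Java could look roughly like the sketch below. It assumes a POSIX-style diff executable is on the PATH and follows the usual exit code convention (0 for identical, 1 for differences, 2 for trouble). The class and method names are made up for illustration.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class ExternalDiff {

    /** Builds the command line; assumes an executable named "diff" is on the PATH. */
    static List<String> diffCommand(Path file1, Path file2) {
        return Arrays.asList("diff", file1.toString(), file2.toString());
    }

    /** Runs diff and returns its output lines; an empty list means the files are identical. */
    static List<String> runDiff(Path file1, Path file2)
            throws IOException, InterruptedException {
        ProcessBuilder builder = new ProcessBuilder(diffCommand(file1, file2));
        builder.redirectErrorStream(true); // merge stderr into stdout so neither pipe blocks
        Process process = builder.start();

        List<String> output = new ArrayList<>();
        try (BufferedReader reader = new BufferedReader(
                new InputStreamReader(process.getInputStream(), StandardCharsets.UTF_8))) {
            String line;
            while ((line = reader.readLine()) != null) {
                output.add(line);
            }
        }

        int exitCode = process.waitFor();
        if (exitCode > 1) { // 0 = identical, 1 = differences found, anything else = error
            throw new IOException("diff failed with exit code " + exitCode);
        }
        return output;
    }
}
```

For really huge files, collecting the whole output in a list may itself be too much. Processing each line as it is read would avoid that.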
 