
Comparing XML files larger than 2 GB

 
Vib Mator
Greenhorn
Posts: 6
I have two XML files, each larger than 2 GB, with the same structure.

I need to compare the files and print the data that is the same in both.

How do I read both files into memory to compare them?
 
Rob Spoor
Sheriff
Posts: 21094
I take it the formatting (indentation, line breaks) should be ignored? If not, you can do a simple byte-by-byte comparison, or, more efficiently, compare the files one byte[] chunk at a time.
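If the files really must be identical byte for byte, a minimal sketch of the chunked comparison might look like this. It assumes Java 9 or later (for `InputStream.readNBytes` and the range form of `Arrays.equals`); the class name `ByteCompare` and the 64 KB buffer size are illustrative choices, not something from this thread. Only two small buffers are held in memory, so the size of the files does not matter.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Arrays;

public class ByteCompare {
    // Compares two files chunk by chunk; memory use is two fixed-size buffers,
    // no matter how large the files are.
    public static boolean sameBytes(Path a, Path b) throws IOException {
        if (Files.size(a) != Files.size(b)) {
            return false; // different lengths can never be byte-identical
        }
        byte[] bufA = new byte[64 * 1024];
        byte[] bufB = new byte[64 * 1024];
        try (InputStream inA = Files.newInputStream(a);
             InputStream inB = Files.newInputStream(b)) {
            while (true) {
                // readNBytes blocks until the buffer is full or the stream ends
                int nA = inA.readNBytes(bufA, 0, bufA.length);
                int nB = inB.readNBytes(bufB, 0, bufB.length);
                if (nA != nB) {
                    return false;
                }
                if (nA == 0) {
                    return true; // both streams exhausted with no difference found
                }
                if (!Arrays.equals(bufA, 0, nA, bufB, 0, nB)) {
                    return false;
                }
            }
        }
    }
}
```

On Java 12 and later, `Files.mismatch(a, b)` returns `-1L` when two files are identical and does the same job in one call. Either way, this only tells you whether (and where) the files differ; it does not tell you which XML data the two files have in common.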

Otherwise you will need some form of stream-based reading, such as SAX. You need to get one token from the first file, then one token from the second file, compare the two, and then get the next pair. With SAX that is not trivial, because SAX parsing is event based: the parser pushes each token to your handler as soon as it reads it, and there is no built-in way to make one parser wait until the other has produced its next token.
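The "one token from each file, then compare" loop is much easier to write with a pull parser, so here is a swapped-in alternative that the thread does not mention: a sketch using StAX (`javax.xml.stream`, shipped with the JDK), where your code asks each reader for its next event instead of the parser pushing events at you. The token format, the class name `StaxLockstep`, and the decision to ignore attributes, comments and whitespace-only text are assumptions made for illustration.

```java
import java.io.InputStream;
import java.util.function.Consumer;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

public class StaxLockstep {
    // Pulls the next significant token from r, or returns null at end of document.
    // Whitespace-only text, comments and processing instructions are skipped.
    static String nextToken(XMLStreamReader r) throws XMLStreamException {
        while (r.hasNext()) {
            int event = r.next();
            if (event == XMLStreamConstants.START_ELEMENT) {
                return "<" + r.getLocalName();
            } else if (event == XMLStreamConstants.END_ELEMENT) {
                return "</" + r.getLocalName();
            } else if (event == XMLStreamConstants.CHARACTERS || event == XMLStreamConstants.CDATA) {
                String text = r.getText().trim();
                if (!text.isEmpty()) {
                    return text;
                }
            }
        }
        return null;
    }

    // Walks both documents in lockstep; onMatch receives each token found in both.
    public static boolean compare(InputStream a, InputStream b, Consumer<String> onMatch)
            throws XMLStreamException {
        XMLInputFactory factory = XMLInputFactory.newInstance();
        // Deliver each run of text as one event so chunking cannot cause false mismatches.
        factory.setProperty(XMLInputFactory.IS_COALESCING, Boolean.TRUE);
        XMLStreamReader ra = factory.createXMLStreamReader(a);
        XMLStreamReader rb = factory.createXMLStreamReader(b);
        try {
            while (true) {
                String x = nextToken(ra); // one token from the first file...
                String y = nextToken(rb); // ...then one from the second
                if (x == null || y == null) {
                    return x == null && y == null; // equal only if both ended together
                }
                if (!x.equals(y)) {
                    return false; // first difference: stop here
                }
                onMatch.accept(x);
            }
        } finally {
            ra.close(); // frees parser resources; does not close the InputStreams
            rb.close();
        }
    }
}
```

Because this is a strict lockstep comparison, it stops at the first difference and does not try to resynchronize; finding all the common data after an inserted or deleted element would need a smarter matching strategy.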

One option is to use a blocking queue that holds a single token. Run each parser in its own thread. Each parser puts its tokens into its own queue, blocking while the previous token is still in there. Once both queues hold a token, you compare the two and take them out of the queues, which lets the parsers put their next tokens in.
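A minimal sketch of that two-thread, blocking-queue approach, assuming plain (non-namespaced) XML. Each SAX parser runs on its own daemon thread and feeds string tokens into an `ArrayBlockingQueue` of capacity 1; the main thread takes one token from each queue and compares them. The token format, the `EOF`/`ERROR` sentinels and the class name `SaxLockstep` are hypothetical choices for illustration. It ignores attributes and whitespace-only text, and treats a parse error in either file as a mismatch.

```java
import java.io.InputStream;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import java.util.function.Consumer;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

public class SaxLockstep {
    // Sentinels compared by identity (==), so no document token can be mistaken for them.
    private static final String EOF = new String("EOF");
    private static final String ERROR = new String("ERROR");

    // Turns SAX callbacks into string tokens; put() blocks while the
    // capacity-1 queue still holds the previous token.
    static class QueueingHandler extends DefaultHandler {
        private final BlockingQueue<String> queue;
        private final StringBuilder text = new StringBuilder();

        QueueingHandler(BlockingQueue<String> queue) { this.queue = queue; }

        private void emit(String token) throws SAXException {
            try {
                queue.put(token);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt(); // keep the flag so the thread can exit
                throw new SAXException(e);
            }
        }

        // SAX may split text into several characters() calls; emit it as one token,
        // skipping whitespace-only text (indentation and line breaks).
        private void flushText() throws SAXException {
            String t = text.toString().trim();
            text.setLength(0);
            if (!t.isEmpty()) emit(t);
        }

        @Override
        public void startElement(String uri, String localName, String qName, Attributes atts)
                throws SAXException {
            flushText();
            emit("<" + qName);
        }

        @Override
        public void endElement(String uri, String localName, String qName) throws SAXException {
            flushText();
            emit("</" + qName);
        }

        @Override
        public void characters(char[] ch, int start, int length) { text.append(ch, start, length); }
    }

    // Parses 'in' on a daemon thread, feeding tokens into 'queue', then a sentinel.
    static Thread startParser(InputStream in, BlockingQueue<String> queue) {
        Thread t = new Thread(() -> {
            String last = EOF;
            try {
                SAXParserFactory.newInstance().newSAXParser().parse(in, new QueueingHandler(queue));
            } catch (Exception e) {
                last = ERROR; // malformed XML, I/O failure, or interrupted by the consumer
            } finally {
                try {
                    queue.put(last);
                } catch (InterruptedException ignored) { /* consumer gone; let the thread end */ }
            }
        });
        t.setDaemon(true);
        t.start();
        return t;
    }

    // Compares two documents token by token; onMatch receives each matching token.
    public static boolean compare(InputStream a, InputStream b, Consumer<String> onMatch)
            throws InterruptedException {
        BlockingQueue<String> qa = new ArrayBlockingQueue<>(1);
        BlockingQueue<String> qb = new ArrayBlockingQueue<>(1);
        Thread ta = startParser(a, qa);
        Thread tb = startParser(b, qb);
        try {
            while (true) {
                String x = qa.take(); // wait for parser A's next token
                String y = qb.take(); // wait for parser B's next token
                if (x == ERROR || y == ERROR) return false; // parse failure counts as mismatch
                if (x == EOF || y == EOF) return x == EOF && y == EOF; // both must end together
                if (!x.equals(y)) return false;
                onMatch.accept(x);
            }
        } finally {
            ta.interrupt(); // unblock any parser still waiting in put()
            tb.interrupt();
        }
    }
}
```

When the comparison stops early, the `finally` block interrupts both parser threads, so a parser blocked in `put` gives up instead of waiting forever; making them daemon threads means a leftover parser cannot keep the JVM alive either.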
 