
java.util.Scanner class - splitting a large text file

 
Jay Chapalamadugu
Greenhorn
Posts: 1
Hi all,

I have a situation in my project where I am writing a parser for text files ranging in size from 100 to 700 MB. All the records inside are delimited by control characters: "\u0002" marks the start of a record and "\u0003" marks the end. In one of the postings I found that Scanner is useful for avoiding loading the whole file into memory, meaning I can read the file record by record. The file will have thousands of records, each delimited by the characters mentioned above.

Could anyone please help me with a piece of code to read the input file record by record, without loading the whole file into memory? I have already written a parser for individual records, which are small enough to hold in memory.

A quick reply would be much appreciated.

regards,
Jay.
 
Joe Ess
Bartender
Posts: 9406
Welcome to the JavaRanch.
We like to help people, but we ask that you show some effort.
Have a look at the Scanner Java Doc and the Java Tutorial and give it a try. If you have any problems, feel free to come back with some code and we'll see what we can do.
 
Alan Moore
Ranch Hand
Posts: 262
I would recommend using the findWithinHorizon() method with a horizon of zero and a regex that matches one complete record, from the start delimiter through the end delimiter. If you don't want the delimiters returned as part of the record, you can use lookbehind and lookahead to match them.
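
The code that went with this post did not survive, so here is only a minimal sketch of the idea, not the original. The class name RecordReader, the method forEachRecord, and the regex itself are assumptions made for illustration. The pattern consumes both delimiters and the code strips them afterwards.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.Scanner;
import java.util.function.Consumer;
import java.util.regex.Pattern;

// Hypothetical helper class, not from the original post.
class RecordReader {
    // Assumed pattern: start delimiter (\x02), any run of characters
    // that are not the end delimiter, then the end delimiter (\x03).
    private static final Pattern RECORD = Pattern.compile("\\x02[^\\x03]*\\x03");

    // Passes each record body (delimiters stripped) to the handler.
    // The caller owns the source and is responsible for closing it.
    static void forEachRecord(Readable source, Consumer<String> handler) {
        Scanner scanner = new Scanner(source);
        String match;
        // A horizon of 0 means "no limit": the scanner keeps reading
        // input until it finds a match or reaches the end of the source.
        while ((match = scanner.findWithinHorizon(RECORD, 0)) != null) {
            // Drop the leading \x02 and the trailing \x03.
            handler.accept(match.substring(1, match.length() - 1));
        }
    }

    public static void main(String[] args) throws IOException {
        // The file path comes from the command line; UTF-8 is an assumption.
        try (BufferedReader in =
                 Files.newBufferedReader(Paths.get(args[0]), StandardCharsets.UTF_8)) {
            forEachRecord(in, record -> System.out.println(record.length()));
        }
    }
}
```

Text outside a delimiter pair is skipped, and only the scanner's buffer plus the current record should need to be held in memory at once. A lookaround pattern such as (?<=\x02)[^\x03]+(?=\x03) should return the record body directly, without the substring() call, though written with + it would skip empty records.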
 
Don't get me started about those stupid light bulbs.