
Reading a complex file

 
amit bose
Greenhorn
Posts: 25
Hi,

I have a file (*.txt) in the following format:

<tag1>content1</tag1>
<tag2>content2</tag2>
...Etc till...
<tagN>contentN</tagN>
{1 : Data1}{2 : Data2..
...Etc till....
}

What would be an optimal way of reading it?
Would a plain buffered stream read suffice, or does a better alternative exist?

Cheers,
Amit
[ December 16, 2006: Message edited by: amit bose ]
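To make the "plain buffered stream" option concrete, here is a minimal sketch that reads line by line with a `BufferedReader` and splits the file into header lines and the raw `{...}` data block. It assumes every header tag sits on its own line and that everything from the first line starting with `{` onward belongs to the data block. The class and method names are hypothetical, and it assumes Java 11+ (for `String.isBlank`), which is newer than this thread.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper for illustration; not from any library.
public class SectionSplitter {

    /** The two parts of the file: header tag lines and the raw data block. */
    public static final class Sections {
        public final List<String> headerLines = new ArrayList<>();
        public final StringBuilder dataBlock = new StringBuilder();
    }

    public static Sections split(Reader source) throws IOException {
        Sections result = new Sections();
        boolean inData = false;
        // BufferedReader fetches large chunks internally, so readLine() is cheap.
        try (BufferedReader in = new BufferedReader(source)) {
            String line;
            while ((line = in.readLine()) != null) {
                // Assumption: the first line starting with '{' begins the data block.
                if (!inData && line.trim().startsWith("{")) {
                    inData = true;
                }
                if (inData) {
                    result.dataBlock.append(line).append('\n');
                } else if (!line.isBlank()) {
                    result.headerLines.add(line.trim());
                }
            }
        }
        return result;
    }

    public static void main(String[] args) throws IOException {
        String sample = "<tag1>content1</tag1>\n<tag2>content2</tag2>\n{1 : Data1}{2 : Data2\n}\n";
        Sections s = split(new StringReader(sample));
        System.out.println(s.headerLines); // [<tag1>content1</tag1>, <tag2>content2</tag2>]
        System.out.print(s.dataBlock);
    }
}
```

For a real file, pass `Files.newBufferedReader(path)` instead of a `StringReader`; the splitting logic stays the same.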
 
Rahul Bhattacharjee
Ranch Hand
Posts: 2308
Hi Amit,

Does the file end with the following?
{1 : Data1}{2 : Data2..
...Etc till....
}

Or is this what you want: tag1 -> value content1, and so on?
Your file format looks a lot like XML, so why not use an XML parser?
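If you try the XML-parser route, note that the header on its own is not a well-formed XML document: it has several top-level elements and no single root. A minimal sketch, assuming the header lines have already been separated from the `{...}` block, wraps them in an invented `<header>` root and parses them with the JDK's built-in DOM parser. The class name and the `<header>` wrapper are hypothetical, and `List.of` assumes Java 9+.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;

// Hypothetical helper for illustration; parses only the header lines.
public class HeaderXmlParser {

    /** Returns tag name -> text content, in document order. */
    public static Map<String, String> parse(List<String> headerLines) throws Exception {
        // Wrap the sibling tags in a synthetic root so the parser accepts them.
        String xml = "<header>" + String.join("", headerLines) + "</header>";
        DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
        Element root = builder
                .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)))
                .getDocumentElement();

        Map<String, String> values = new LinkedHashMap<>();
        NodeList children = root.getChildNodes();
        for (int i = 0; i < children.getLength(); i++) {
            Node child = children.item(i);
            // Skip whitespace text nodes; keep only the tag elements.
            if (child.getNodeType() == Node.ELEMENT_NODE) {
                values.put(child.getNodeName(), child.getTextContent());
            }
        }
        return values;
    }

    public static void main(String[] args) throws Exception {
        System.out.println(parse(List.of("<tag1>content1</tag1>", "<tag2>content2</tag2>")));
        // prints {tag1=content1, tag2=content2}
    }
}
```

One caveat: because this is real XML parsing, a raw `&` or `<` inside a content value makes `parse` throw, so it only fits if the header really follows XML escaping rules.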
 
Ulf Dittmer
Rancher
Posts: 42968
I'd probably create a lexer (using JFlex), because these little custom file formats have a tendency to become more complex over time, which makes a hand-coded parser harder and more error-prone to maintain.
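JFlex works from a separate specification file and generates the Java lexer from it, so no spec is shown here. As a rough Java-only illustration of the token types such a spec might define for this format, here is a small regex-driven tokenizer. This is a swapped-in technique, not JFlex output: the alternatives are tried first-match, whereas JFlex picks the longest match. The class, enum and token names are hypothetical, and records assume Java 16+.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical tokenizer for illustration only; not generated by JFlex.
public class HeaderLexer {

    public enum Type { OPEN_TAG, CLOSE_TAG, LBRACE, RBRACE, COLON, TEXT }

    public record Token(Type type, String text) { }

    // One named alternative per token type, tried left to right (first match wins).
    private static final Pattern TOKEN = Pattern.compile(
            "(?<close></\\w+>)|(?<open><\\w+>)|(?<lbrace>\\{)|(?<rbrace>\\})"
            + "|(?<colon>:)|(?<text>[^<{}:]+)");

    public static List<Token> tokenize(String input) {
        List<Token> tokens = new ArrayList<>();
        Matcher m = TOKEN.matcher(input);
        int pos = 0;
        while (pos < input.length()) {
            // Restrict matching to the unread part and require a match right at its start.
            m.region(pos, input.length());
            if (!m.lookingAt()) {
                throw new IllegalArgumentException("Unexpected input at offset " + pos);
            }
            Type type = m.group("close") != null ? Type.CLOSE_TAG
                    : m.group("open") != null ? Type.OPEN_TAG
                    : m.group("lbrace") != null ? Type.LBRACE
                    : m.group("rbrace") != null ? Type.RBRACE
                    : m.group("colon") != null ? Type.COLON
                    : Type.TEXT;
            String text = m.group().trim();
            // Drop whitespace-only text (newlines between tags, spaces around ':').
            if (type != Type.TEXT || !text.isEmpty()) {
                tokens.add(new Token(type, text));
            }
            pos = m.end();
        }
        return tokens;
    }

    public static void main(String[] args) {
        for (Token t : tokenize("<tag1>content1</tag1>\n{1 : Data1}{2 : Data2\n}")) {
            System.out.println(t);
        }
    }
}
```

A parser on top of this token stream would then check the sequence (open tag, text, matching close tag; or `{`, number, `:`, text, `}`). That separation is what makes the format easier to extend later, which is the point of the lexer suggestion.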
 
amit bose
Greenhorn
Posts: 25
My input file is not an XML file; it only has some header info written as XML tags.
What I need to extract from this file is:
content1, content2, ..., contentN, and
Data1, Data2, ..., DataN

So should I use regexes for this? I'm not sure about regex performance, given that my file size would not exceed, say, 100 lines. However, the bulk of the input files would be quite enormous.
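For files of about 100 lines, a simple regex approach is to read each file into one String and run two precompiled patterns over it. A minimal sketch, assuming tag names are word characters, each closing tag matches its opening tag, and Data values never contain `}` (though they may span lines, as the last entry in the sample does). The class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical extractor for illustration; the patterns encode the assumptions above.
public class RegexExtractor {

    // <name>content</name>; the \1 backreference forces the closing tag to match the opening one.
    private static final Pattern TAG = Pattern.compile("<(\\w+)>(.*?)</\\1>", Pattern.DOTALL);

    // {number : data}; [^}] also matches newlines, so data may span lines.
    private static final Pattern ENTRY = Pattern.compile("\\{\\s*(\\d+)\\s*:\\s*([^}]*?)\\s*\\}");

    /** Returns content1, content2, ... in file order. */
    public static List<String> tagContents(String text) {
        List<String> contents = new ArrayList<>();
        Matcher m = TAG.matcher(text);
        while (m.find()) {
            contents.add(m.group(2));
        }
        return contents;
    }

    /** Returns entry number -> Data, in file order. */
    public static Map<Integer, String> dataEntries(String text) {
        Map<Integer, String> entries = new LinkedHashMap<>();
        Matcher m = ENTRY.matcher(text);
        while (m.find()) {
            entries.put(Integer.parseInt(m.group(1)), m.group(2));
        }
        return entries;
    }

    public static void main(String[] args) {
        String sample = "<tag1>content1</tag1>\n<tag2>content2</tag2>\n{1 : Data1}{2 : Data2\n}\n";
        System.out.println(tagContents(sample)); // [content1, content2]
        System.out.println(dataEntries(sample)); // {1=Data1, 2=Data2}
    }
}
```

Compiling the `Pattern`s once as static fields matters more than the regex engine itself. At this file size, the matching cost is small next to the I/O.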
 
Jim Yingst
Wanderer
Sheriff
Posts: 18671
Well, Ulf's advice still seems pretty good, but regexes would work too. I think it's too early to worry about imaginary performance problems here; try it and see. Chances are good that the time it takes to read the file will be greater than the time needed to parse it.

[amit]: ...given that my file size would not exceed say 100 lines. However, the bulk of the input files would be quite enormous.

That didn't really make sense to me. Are you saying that 100 lines is enormous? Are some of the lines extremely long? Are there many, many files? Or something else?
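One way to "try it and see" is to time the read and the parse separately. This is a minimal, crude sketch using `System.nanoTime`. It assumes UTF-8 files and Java 16+, and the class name and the stand-in parser in `main` are hypothetical. Single runs are noisy because of JIT warm-up and the OS file cache, so a benchmark harness such as JMH gives more trustworthy numbers.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.function.Function;

// Hypothetical timing helper for illustration; not a rigorous benchmark.
public class ReadVsParseTimer {

    /** Nanoseconds spent reading the file and parsing its text. */
    public record Timing(long readNanos, long parseNanos) { }

    /** Times reading a file into a String separately from running the given parser on it. */
    public static Timing time(Path file, Function<String, ?> parser) throws IOException {
        long start = System.nanoTime();
        String text = Files.readString(file); // assumes UTF-8 content
        long afterRead = System.nanoTime();
        parser.apply(text);
        long afterParse = System.nanoTime();
        return new Timing(afterRead - start, afterParse - afterRead);
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 1) {
            System.err.println("usage: java ReadVsParseTimer <file>");
            return;
        }
        Path file = Path.of(args[0]);
        // Stand-in "parser" for illustration: counts '{' characters. Swap in the real one.
        Function<String, Long> parser = text -> text.chars().filter(c -> c == '{').count();
        // Repeat a few times: early runs include JIT warm-up and a cold file cache.
        for (int run = 1; run <= 5; run++) {
            Timing t = time(file, parser);
            System.out.printf("run %d: read %d us, parse %d us%n",
                    run, t.readNanos() / 1_000, t.parseNanos() / 1_000);
        }
    }
}
```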
 
amit bose
Greenhorn
Posts: 25
Hi Jim,

By that line I meant that the number of such files would be large.
About a thousand can safely be assumed for now, and the count is sure to go up in future.

Cheers,
Amit
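With thousands of small files, the main thing to get right is doing per-file work cheaply: compile the pattern once, then loop over the directory. A minimal sketch, assuming all input files are UTF-8 `*.txt` files in one directory and Java 11+. The class name is hypothetical, and the `{n : data}` pattern carries the same assumptions as a plain regex approach (Data never contains `}`).

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.TreeMap;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical batch driver for illustration; not from any library.
public class BatchExtractor {

    // Compiled once and shared: Pattern is immutable and thread-safe; Matchers are per use.
    private static final Pattern ENTRY = Pattern.compile("\\{\\s*(\\d+)\\s*:\\s*([^}]*?)\\s*\\}");

    /** Extracts the {n : data} entries from every *.txt file in dir, keyed by file name. */
    public static Map<String, Map<Integer, String>> extractAll(Path dir) throws IOException {
        Map<String, Map<Integer, String>> results = new TreeMap<>();
        try (DirectoryStream<Path> files = Files.newDirectoryStream(dir, "*.txt")) {
            for (Path file : files) {
                Map<Integer, String> entries = new LinkedHashMap<>();
                Matcher m = ENTRY.matcher(Files.readString(file)); // assumes UTF-8
                while (m.find()) {
                    entries.put(Integer.parseInt(m.group(1)), m.group(2));
                }
                results.put(file.getFileName().toString(), entries);
            }
        }
        return results;
    }

    public static void main(String[] args) throws IOException {
        Path dir = Path.of(args.length > 0 ? args[0] : ".");
        extractAll(dir).forEach((name, entries) -> System.out.println(name + " -> " + entries));
    }
}
```

Because the files are independent, the loop could later be parallelized. At a few thousand files, disk I/O is the more likely limit than the parsing.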
 