
File parsing woes

 
Abhik Sarkar
Ranch Hand
Posts: 61
Hi everyone,

I am having difficulty parsing a file which has variable syntax... any help would be appreciated.

Each "record" in the file has four columns. The first three are ASCII, and the fourth is binary. The record separator is a newline character. The column separator is a comma.

As of now, I am using the newline character to separate records and it is largely successful. However, once in a while a newline creeps into the binary column, creating a bit of chaos. The position of the newline in the binary column is of course unpredictable. Column widths are also not fixed.

Any ideas how I could parse the file? Regexp, etc?

Many thanks,
Abhik.
[ June 27, 2004: Message edited by: Abhik Sarkar ]
 
Ernest Friedman-Hill
author and iconoclast
Sheriff
Posts: 24213
Hmmm. Well, this is a thorny problem. If the binary data is of no fixed size, then in the general case it's an impossible problem: in theory, the second and subsequent records could be interpreted as being part of the binary column in the first record, since every record separator in the file could belong to that first binary column entry.

If there's an upper limit to the size of that fourth column, and the first three columns have some fixed format (e.g., they're always numbers), then your parser could, on encountering a newline in the fourth column, check to see if the next N characters were a number followed by a comma (your column separator); if so, then the newline is a record separator, and otherwise, it was part of the fourth column.
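To make the lookahead idea concrete, here is a rough sketch in Java. It assumes (and this is only an assumption, since we don't know the real format) that the first column is always a run of ASCII digits followed by a comma. The class and method names are made up for illustration. It does not enforce an upper limit on the binary column, and it will still guess wrong whenever the binary data happens to contain a newline followed by digits and a comma.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper: splits raw file bytes into records using a lookahead heuristic.
class HeuristicRecordSplitter {

    // Assumption: a record starts with one or more ASCII digits followed by a comma.
    static boolean looksLikeRecordStart(byte[] data, int pos) {
        int i = pos;
        while (i < data.length && data[i] >= '0' && data[i] <= '9') {
            i++;
        }
        return i > pos && i < data.length && data[i] == ',';
    }

    // Treats a newline as a record separator only if it ends the data
    // or is followed by something that looks like the start of a record.
    static List<byte[]> split(byte[] data) {
        List<byte[]> records = new ArrayList<>();
        int start = 0;
        for (int i = 0; i < data.length; i++) {
            if (data[i] != '\n') {
                continue;
            }
            boolean atEnd = (i + 1 == data.length);
            if (atEnd || looksLikeRecordStart(data, i + 1)) {
                records.add(Arrays.copyOfRange(data, start, i));
                start = i + 1;
            }
            // Otherwise the newline is assumed to belong to the binary column.
        }
        if (start < data.length) {
            // Final record without a trailing newline.
            records.add(Arrays.copyOfRange(data, start, data.length));
        }
        return records;
    }
}
```

Note that it works on bytes rather than on a Reader or String, so the binary column is not mangled by character decoding. The stricter you can make looksLikeRecordStart (for instance, checking all three ASCII columns), the fewer false splits you should see.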

In any event, that would be fairly tricky code. If you can possibly have the format of this file changed, do so! Otherwise, make sure you're using all the information in the file; for example, perhaps the length of each binary field is encoded somehow.
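For comparison, here is a sketch of how simple the parsing becomes if the length is available. The layout used here is purely hypothetical: `col1,col2,col3,length,payload` followed by a newline, where `length` is the decimal byte count of the binary payload. Nothing in your description says your file looks like this; the point is only that with a length field the parser never has to inspect the binary bytes at all.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical parser for an assumed layout: col1,col2,col3,length,payload\n
class LengthPrefixedParser {

    static final class Rec {
        final String[] columns; // the three ASCII columns
        final byte[] payload;   // the binary column, copied verbatim

        Rec(String[] columns, byte[] payload) {
            this.columns = columns;
            this.payload = payload;
        }
    }

    static List<Rec> parse(byte[] data) {
        List<Rec> out = new ArrayList<>();
        int pos = 0;
        while (pos < data.length) {
            // Read three ASCII columns plus the length field, each ended by a comma.
            String[] fields = new String[4];
            for (int f = 0; f < 4; f++) {
                int comma = pos;
                while (comma < data.length && data[comma] != ',') {
                    comma++;
                }
                if (comma == data.length) {
                    throw new IllegalArgumentException("truncated header at offset " + pos);
                }
                fields[f] = new String(data, pos, comma - pos, StandardCharsets.US_ASCII);
                pos = comma + 1;
            }
            int len = Integer.parseInt(fields[3]);
            if (len < 0 || len > data.length - pos) {
                throw new IllegalArgumentException("bad payload length " + len);
            }
            // Take exactly len bytes; newlines and commas inside are just data.
            byte[] payload = Arrays.copyOfRange(data, pos, pos + len);
            pos += len;
            if (pos < data.length) {
                if (data[pos] != '\n') {
                    throw new IllegalArgumentException("missing record separator at offset " + pos);
                }
                pos++;
            }
            out.add(new Rec(Arrays.copyOf(fields, 3), payload));
        }
        return out;
    }
}
```

Because the payload is sliced by count, a stray newline or comma in the binary column causes no trouble, and a corrupt length shows up as an exception rather than as silently misaligned records.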