
A 'push' parsing model

 
David Weitzman
Ranch Hand
Posts: 1365
For people who are somewhat familiar with parsing:
The parser generators I'm familiar with (JavaCC, ANTLR) work on a 'pull' model -- you call parse(InputStream) once and the input has been completely parsed by the time the call returns.
I'd like to parse some potentially complex stuff, but I won't have the full input all at once (I'm using nonblocking IO). It would be really cool if I could somehow get the handy features of a parser generator like ANTLR but with a different pattern of usage.
You would pass a callback object (sort of like a SAX handler) to the parser and tell it what events you're interested in (e.g. parse a sequence of statements and let me know after each complete statement). The callback object would look sort of like this:

Then call Parser.appendInput(byte[] data) or Parser.appendEOF() as bytes arrive.
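A minimal sketch of what such an interface could look like. Everything here is an assumption for illustration: the names StatementHandler and PushParser are hypothetical, and a "statement" is simply a newline-terminated line rather than anything a real grammar would define. Only appendInput(byte[]) and appendEOF() come from the description above.

```java
import java.nio.charset.StandardCharsets;

/** Hypothetical callback, in the spirit of a SAX handler. */
interface StatementHandler {
    void statement(String text); // called after each complete statement
    void endOfInput();           // called once, after appendEOF()
}

/** Hypothetical push parser; a "statement" here is just one line of input. */
class PushParser {
    private final StatementHandler handler;
    private final StringBuilder pending = new StringBuilder(); // input not yet consumed

    PushParser(StatementHandler handler) {
        this.handler = handler;
    }

    /** Feed whatever bytes have arrived so far; returns without waiting for more. */
    void appendInput(byte[] data) {
        // ISO-8859-1 maps one byte to one char, so a chunk boundary cannot split a character.
        pending.append(new String(data, StandardCharsets.ISO_8859_1));
        int newline;
        while ((newline = pending.indexOf("\n")) >= 0) {
            handler.statement(pending.substring(0, newline));
            pending.delete(0, newline + 1);
        }
    }

    /** Signal that no more input will arrive; flushes a trailing unterminated statement. */
    void appendEOF() {
        if (pending.length() > 0) {
            handler.statement(pending.toString());
            pending.setLength(0);
        }
        handler.endOfInput();
    }
}
```

The point of the shape is that all parser state lives in fields between calls, so the caller decides when input is supplied and no thread ever blocks inside the parser.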
Does anyone know if such a parser generator (written in Java) exists? I see at least one instance of something like it at hunnysoft.com, but that's an isolated case (like SAX), written in C, for parsing MIME.
I think GOLD may make this mode of operation possible, but if so it would require some modifications and weirdness. Plus GOLD seems more academic than practical.
Anyway, does anyone have any comments on the matter?
 
Maulin Vasavada
Ranch Hand
Posts: 1873
Hi David,
Just off the top of my head, since I've never used any Java parser generators like those...
Could you use a PipedInputStream, if you don't really need an asynchronous push model instead of pull?
I guess you could create a thread that handles the PipedInputStream while the rest of your code continues to execute.
Just my 2 cents...
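A rough sketch of that idea, under assumptions: the class and method names are made up, and a BufferedReader reading lines stands in for a real pull parser's parse(InputStream) call. The I/O side writes chunks into a PipedOutputStream as they arrive, and a dedicated thread pulls from the connected PipedInputStream.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.PipedInputStream;
import java.io.PipedOutputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;

class PipedParseDemo {
    /** Writes the chunks into a pipe while a worker thread "parses" from the other end. */
    static List<String> parseViaPipe(byte[][] chunks) {
        List<String> statements = new CopyOnWriteArrayList<>();
        try {
            PipedOutputStream out = new PipedOutputStream();
            PipedInputStream in = new PipedInputStream(out);

            // Stand-in for a pull parser: blocks in readLine() until input arrives.
            Thread parserThread = new Thread(() -> {
                try (BufferedReader reader = new BufferedReader(
                        new InputStreamReader(in, StandardCharsets.UTF_8))) {
                    String line;
                    while ((line = reader.readLine()) != null) {
                        statements.add(line);
                    }
                } catch (IOException e) {
                    throw new UncheckedIOException(e);
                }
            });
            parserThread.start();

            for (byte[] chunk : chunks) {
                out.write(chunk); // may block if the pipe's buffer is full
            }
            out.close();          // the reader sees end of stream after draining
            parserThread.join();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return statements;
    }
}
```

Note that this costs one parser thread per input stream, and write() can block when the pipe is full.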
regards,
maulin
 
David Weitzman
Ranch Hand
Posts: 1365
My goal is actually to avoid creating threads. The server-side approach where new threads are created for each process doesn't scale under highly concurrent use.
I think some of the Java parsers are basically Finite State Machines, which should allow them to accept input at any rate -- the interfaces just don't make that an option.
One hack solution is to look for complete statements (recognizing an end of line or semicolon, counting parentheses until they've all been matched, etc.) and then parse those small units on their own. That approach doesn't support formats that aren't so clearly structured, though.
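That hack could be sketched roughly as follows. StatementFramer is a hypothetical name, and the rule it applies (a statement ends at a ';' that sits outside all parentheses) is an assumption chosen for illustration. It deliberately ignores string literals and comments, which is exactly where this approach starts to break down.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical framer: cuts a character stream into statements at ';' outside parentheses. */
class StatementFramer {
    private final StringBuilder current = new StringBuilder(); // statement being accumulated
    private int depth = 0;                                     // count of unmatched '('

    /** Accepts the next chunk and returns the statements it completed (possibly none). */
    List<String> feed(String chunk) {
        List<String> complete = new ArrayList<>();
        for (int i = 0; i < chunk.length(); i++) {
            char c = chunk.charAt(i);
            if (c == '(') {
                depth++;
            } else if (c == ')' && depth > 0) {
                depth--;
            }
            if (c == ';' && depth == 0) {
                // Terminator outside parentheses: hand the unit to a regular pull parser.
                complete.add(current.toString().trim());
                current.setLength(0);
            } else {
                current.append(c);
            }
        }
        return complete;
    }
}
```

Each string returned by feed() is small and complete, so it can be passed to an ordinary parse call without blocking on further input.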
 