Reading multi-line records

 
Ranch Hand
Posts: 548
1
Chrome Linux VI Editor
I parse a lot of log files where one record spans multiple lines.  In Perl all I did was change the EOL (End Of Line) character, problem solved.  I've got a 40-line Java routine to do the same thing.  Seems to me this is a common enough issue that there should be a built-in way to do it.

Heh, may as well ask y'all to solve my real problem.  My records consist of word=value pairs separated by commas, except the last entry doesn't have a trailing comma.  For example:

2 questions:

1)  is there a built in method to read a record and return a single line, aka

2)  I need to extract the foo, bar, etc values.  Is there a quick way to make a map or somesuch, so elsewhere in my code I can say "totalFoo += map.get("foo");"?

Not all lines have foo, bar, etc.  Some lines have other keys, such as user, timestamp, etc.  All they have in common is the key=value format.

I promise this isn't homework, just me being curious.  I currently use regexps to decode the key=value followed by large switch statement, seems like there should be a better way.
 
Marshal
Posts: 64471
225
Use a Scanner with ,\\s* as its delimiter. Count five tokens.
That's one way to do it.
Read all the lines and look for indexOf("berry")
That's another way. I don't suppose either is elegant, but they will match the format of that file.
Once you have parsed the String, you can create objects. You can use the Scanner methods beginning find...() to identify “foo,” etc., followed by finding the number following. You can then put them into an object of some sort, and put that object into a List.
There must be many libraries for reading .csv files which will do all the parsing you want.
 
Ranch Foreman
Posts: 3243
19
You need to identify exactly how to know when one record begins and another ends, and look for that.  In your example, is foo always the first field?  Is berry always the last field?  Are there always five fields in a record?  Is there always a blank line between records?  Are the fields always in the same order?  Will any of those answers change, ever?  Don't rely on a feature that might not always be there... or might occur in a different order.

I don't think the suggestion of \\s* will work very well... that means you could have a delimiter consisting of zero whitespace characters, which can be found anywhere.  You'll just break the input into a bunch of single characters.

I like the idea of using the blank line as a delimiter... I would use:

This way "record" contains all the lines for one record.

The delimiter might just be "\n\n", but adding the repeating pattern with  \\s* in between allows additional whitespace on the "blank" line, and allows multiple blank lines.  May be unnecessary.
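The blank-line delimiter described above can be sketched roughly as follows. This is a minimal sketch and not necessarily the code originally posted: the class name `BlankLineRecords`, the method `readRecords`, and the exact pattern `"\n(\\s*\n)+"` are assumptions chosen to match the description (a newline followed by one or more whitespace-only lines).

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Scanner;

public class BlankLineRecords {

    // Assumed pattern: a line break followed by one or more "blank" lines,
    // where a blank line may still contain spaces or tabs.
    private static final String RECORD_DELIMITER = "\n(\\s*\n)+";

    // Returns one String per record; each String still contains the
    // record's internal line breaks.
    static List<String> readRecords(Readable source) {
        List<String> records = new ArrayList<>();
        try (Scanner scanner = new Scanner(source)) {
            scanner.useDelimiter(RECORD_DELIMITER);
            while (scanner.hasNext()) {
                String record = scanner.next().trim();
                if (!record.isEmpty()) {
                    records.add(record);
                }
            }
        }
        return records;
    }
}
```

Passing a `java.io.FileReader` (or any other `Readable`) reads from a file instead of a String; closing the Scanner also closes the underlying reader.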
 
Jim Venolia
Ranch Hand
Posts: 548
1
Chrome Linux VI Editor
Blank lines separate records.  As I said, I have a routine that works, but it seems like this is something that would come up enough that it would be part of the standard Java libraries.

As for the key=value, I'm currently using


But again, this seems like it would come up often enough it would be baked into the jdk somewhere.

Told you it wasn't a homework assignment

 
Mike Simmons
Ranch Foreman
Posts: 3243
19
Well, the Scanner is a standard library.  There's no standard for record separators, so it's not surprising you'd need to customize the part that identifies that.

I did miss the fact that you're really asking more about how to parse the fields within the record.  Ok.  Honestly, the method you're using is pretty much what I would do.  Well, returning a Map<String,Integer> rather than a raw HashMap.  But same basic idea.  

While it may seem like this is a common pattern that you'd expect in the standard library... I don't think it's quite that standard.  Someone else might use ';' instead of ',', or ':' instead of '=', or allow additional whitespace, or use whitespace as a delimiter, or need values to contain other special characters, or need to specify escape characters, or need a hierarchical data representation like xml, yaml, or json.  There are many options that might need to be customized.  There are probably third-party libraries for this... but I'd be inclined to roll my own, as you have.
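For illustration, a hand-rolled parser that returns a typed `Map<String, Integer>` might look something like this. It is only a sketch under assumptions: the class name `KeyValueRecord` and method `parse` are hypothetical, pairs are assumed to be separated by ',' and keys from values by '=', and non-numeric values (such as a user name) are simply skipped.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class KeyValueRecord {

    // Parses text such as "foo=1,\nbar=2" into a map of key -> integer value.
    static Map<String, Integer> parse(String record) {
        Map<String, Integer> values = new LinkedHashMap<>();
        for (String pair : record.split(",")) {
            String[] parts = pair.split("=", 2);
            if (parts.length != 2) {
                continue; // not a key=value pair, e.g. an empty trailing piece
            }
            try {
                // trim() also removes the line breaks inside a multi-line record
                values.put(parts[0].trim(), Integer.parseInt(parts[1].trim()));
            } catch (NumberFormatException e) {
                // Assumption of this sketch: non-numeric values are ignored
            }
        }
        return values;
    }
}
```

Because not every record has every key, a running total is safer written as `totalFoo += map.getOrDefault("foo", 0);` than with a bare `map.get("foo")`, which would return null for a missing key.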
 
Jim Venolia
Ranch Hand
Posts: 548
1
Chrome Linux VI Editor
I actually started my post asking about grabbing multi-line records from a file, you didn't misread that.  It was only after I got going that I realized key=value parsing was probably the more common problem, so I added that part.

The real issue is now that I'm somewhat decent at Java and look at my old code I realize "damn, that should have been a hash".  "damn, there's a jdk method for that".  "damn, that wheel was invented years ago".

Then again, not many Java folks parse binary bitstreams, nor evidently log files.

Fun afternoon task:  google for how to parse log files.  You'll get lots of hits on how to read log files, how to use the log file classes, what to log in your log files, what not to log in your log files.  Notsomuch how to parse your log files.

I'm guessing we can call this thread closed, there evidently are no jdk built-ins to do these 2 tasks and I've got working code solving my problems.
 
Saloon Keeper
Posts: 3250
128
Hmm. just wrote a complete routine to get a List<Map<String, Integer>>, and now the topic gets closed!
Well, here's the code that I wrote that produces a Map from inputstrings "foo=123, bar=2, hop=etotdemachtimaalpi, et cetera".

beware: had no time to test it!
And for processing all those multi-line inputs I had written a dedicated Collector, but okay.
 
Saloon Keeper
Posts: 2566
323
Android Angular Framework Eclipse IDE Java Linux MySQL Database Redhat TypeScript
If the record delimiter is a blank line, you could use Scanner in a Spliterator and do something like this:

Output
 
Jim Venolia
Ranch Hand
Posts: 548
1
Chrome Linux VI Editor

now the topic gets closed!


Have no fear!  I still think there are better ways to do what I need to do and I will look at your solutions.  Just probably not for a couple days, I've got stuff going on Tuesday and Wednesday.

I think it's a fun exercise.  I do my best and ask people better than I how they would solve it.  No better way to learn.
 
Bartender
Posts: 2277
95
Eclipse IDE Google Web Toolkit Java

Piet Souris wrote:Hmm. just wrote a complete routine ...


Surely you don't want to compile the same pattern over and over again for each line of the file.
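One common way to avoid recompiling is to hold the compiled `Pattern` in a static final field and create only a new `Matcher` per line. The sketch below assumes that idea; the class name `PairMatcher` and the regular expression itself (word key, '=', integer value) are hypothetical and are not taken from the code being quoted.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class PairMatcher {

    // Compiled once when the class is loaded, then reused for every line.
    // Assumed format: key = optional '-' and digits, ending at a word boundary.
    private static final Pattern PAIR =
            Pattern.compile("(\\w+)\\s*=\\s*(-?\\d+)\\b");

    static Map<String, Integer> parse(String line) {
        Map<String, Integer> values = new LinkedHashMap<>();
        Matcher matcher = PAIR.matcher(line); // cheap compared to Pattern.compile
        while (matcher.find()) {
            values.put(matcher.group(1), Integer.valueOf(matcher.group(2)));
        }
        return values;
    }
}
```

Pairs whose value is not an integer, such as `hop=etotdemachtimaalpi`, do not match this pattern and are left out of the map.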
 
salvin francis
Bartender
Posts: 2277
95
Eclipse IDE Google Web Toolkit Java
Here's my idea:
 
Piet Souris
Saloon Keeper
Posts: 3250
128
You can also use:

(for brevity reasons I named your class RLC, and added the method 'combine(RLC other)  {}').

RLC does not need to implement Consumer.
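To illustrate why the Consumer interface is not required, here is a rough sketch using the three-argument `Stream.collect(supplier, accumulator, combiner)`, which accepts plain method references. Everything named here is an assumption: `RecordCollector` is a hypothetical stand-in for RLC, records are assumed to be separated by blank lines, and the combiner is deliberately left unsupported because this sketch only handles sequential streams.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.stream.Stream;

// Hypothetical stand-in for "RLC"; note there is no "implements Consumer".
public class RecordCollector {
    private final List<String> records = new ArrayList<>();
    private final StringBuilder current = new StringBuilder();

    // Accumulator: called once per input line.
    void accept(String line) {
        String trimmed = line.trim();
        if (trimmed.isEmpty()) {
            flush(); // a blank line ends the current record
        } else {
            if (current.length() > 0) current.append(' ');
            current.append(trimmed);
        }
    }

    // Combiner: only used to merge partial results of a parallel stream.
    // Assumption: this sketch supports sequential streams only.
    void combine(RecordCollector other) {
        throw new UnsupportedOperationException("sequential streams only");
    }

    private void flush() {
        if (current.length() > 0) {
            records.add(current.toString());
            current.setLength(0);
        }
    }

    List<String> records() {
        flush(); // the last record has no blank line after it
        return records;
    }

    // Joins the lines of each record into a single line.
    static List<String> group(Stream<String> lines) {
        return lines.sequential()
                .collect(RecordCollector::new, RecordCollector::accept, RecordCollector::combine)
                .records();
    }
}
```

With a file, `Files.lines(path)` would supply the stream of lines; each returned element is one record flattened to a single line.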
 
Jim Venolia
Ranch Hand
Posts: 548
1
Chrome Linux VI Editor
Thanks.  I've just spent a couple hours with setDelimiter() and various patterns with no joy.  I'm now working with Ron's code but can't get it to compile.  Ron's code below for clarity, I embedded it into Record.java for testing.

Big problem here is I have no clue what he's doing, nor what he wants to do.  I've never looked into streams before.

Someone else, I think it was Piet, gave a stream version to make a map out of resource=value tokens, haven't looked at that yet.  I'm still trying to find a better way to read my multi-line records.


This is what you call a learning experience.  I've got working code for reading multiline-records (code I don't like), and code to make my map (which I'm happy with, but plan to look at Piet's anyway).  So instead of solving my original problem what do I do?  Try to make my working routines better.

That my friends is what you call being retired and not having to answer to a boss
 
Jim Venolia
Ranch Hand
Posts: 548
1
Chrome Linux VI Editor
Wow.  Thanks Piet for RLC.  I don't even know where to start figuring that out.  What do I need to import to get it to compile?
 
salvin francis
Bartender
Posts: 2277
95
Eclipse IDE Google Web Toolkit Java
I think by RLC he meant my "RecordLineConsumer" class in the previous post.
 
salvin francis
Bartender
Posts: 2277
95
Eclipse IDE Google Web Toolkit Java

Jim Venolia wrote:......


java.util.Spliterator has been part of Java since version 1.8, so you'll need Java 8 or newer to compile that code.
A quick tip: each javadoc page has a "Since:" header that gives the Java version in which the class was added.
 
Ron McLeod
Saloon Keeper
Posts: 2566
323
Android Angular Framework Eclipse IDE Java Linux MySQL Database Redhat TypeScript
Here's the code with the required imports:

As Salvin mentioned - you will need Java 1.8 or newer to use Streams.
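As a rough idea of what a Scanner-backed stream can look like with all imports spelled out, here is a minimal sketch. It is not the code from this post: the class name `ScannerStream`, the method `records`, and the blank-line delimiter pattern are assumptions. It relies on `Scanner` implementing `Iterator<String>`, so the `Spliterators` utility class (plural) can wrap it in a `Spliterator` (singular).

```java
import java.util.Scanner;
import java.util.Spliterator;
import java.util.Spliterators;
import java.util.stream.Stream;
import java.util.stream.StreamSupport;

public class ScannerStream {

    // Turns a Scanner into a Stream with one element per blank-line-separated record.
    static Stream<String> records(Scanner scanner) {
        // Assumed record separator: a line break followed by one or more blank lines
        scanner.useDelimiter("\n(\\s*\n)+");
        Spliterator<String> spliterator = Spliterators.spliteratorUnknownSize(
                scanner, Spliterator.ORDERED | Spliterator.NONNULL);
        // false = sequential stream; closing the stream also closes the Scanner
        return StreamSupport.stream(spliterator, false).onClose(scanner::close);
    }
}
```

On Java 9 and later, `Scanner.tokens()` returns a comparable stream directly, so the Spliterator plumbing is only needed on Java 8.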
 
Piet Souris
Saloon Keeper
Posts: 3250
128
@Jim
don't worry, I too must study what Ron is doing! Never used such things before.

And we have been throwing quite some violence at you. So have a cow for all that you are going through!
 
Jim Venolia
Ranch Hand
Posts: 548
1
Chrome Linux VI Editor
To be honest I'm loving the hell out of this thread.  Started out with me asking the question of "my naive java newbie self figured out how to do these 2 things that IMHO sound like common issues", and has evolved to that RLC thing Ron posted.  As it stands Salvin's Consumer thingie is working for me, still not sure why I can't make it work without the "implements Consumer" part, nor where accept is coming from.   Don't explain, I'm making myself sound dumber than I am.

Streams.  30 years ago I was doing magic with Unix pipes.  Streams are how those pipes have evolved.  I want to learn them.

I just wish I was 20 years younger so this new knowledge would be more useful.   Actually, scratch that.  I wish I had another 40 years to live, I'd spend lots of them putting this knowledge to use.
 
Jim Venolia
Ranch Hand
Posts: 548
1
Chrome Linux VI Editor

java.util.Spliterator is a part of Java 1.8



This is a large part of my problem.  I get these snippets off the net, none of which give the imports, so I'm suddenly playing a guessing game.  I have this strong hatred of things like "import java.util.*", probably from my C days where every #include not only added to your compile time, but when an include file you didn't use changed you ended up compiling your program anyway.

Whatever, I'm running the latest and greatest Java AFAIK.  Hmmm.

I really don't want to hijack this thread on importing everything or not, I'd rather know how to quickly find out what to feed "import" when I find something unknown like "FooStuff" in code found on the net.
 
Marshal
Posts: 24458
55
Eclipse IDE Firefox Browser MySQL Database

Jim Venolia wrote:I'd rather know how to quickly find out what to feed "import" when I find something unknown like "FooStuff" in code found on the net.



I'm using Eclipse so when I see a class name with a red line over it, I put my mouse over it and Eclipse gives me a list of ways to make the red line go away. Usually one of the options is "Add import java.whatever.FooStuff".
 
salvin francis
Bartender
Posts: 2277
95
Eclipse IDE Google Web Toolkit Java

Jim Venolia wrote:...This is a large part of my problem.  I get these snippets off the net, none of which give the imports, so I'm suddenly playing a guessing game.


There are two solutions for this:
  • Online generated javadocs. You can access it Here for Java 8 or Here for java 9
  • You can generate the javadocs and keep it offline on your machine

  • I am not 100% sure about the second option. Last I remember, in the java installation path, you can find a zip file called "src.zip". You need to unzip it to extract all source code. Next there is a javadoc command that should be executed. It will generate all the javadocs for src offline as HTML pages. I do not recall the exact command line parameters for "javadoc" for complete src javadoc generation.

     
    Jim Venolia
    Ranch Hand
    Posts: 548
    1
    Chrome Linux VI Editor
    Thanks Ron, I've never seen the :: operator before.  It's going to be fun figuring out how this code works, it's way beyond anything I've ever done.

    As for Spliterator, looks like my problem was I didn't realize there was both a Spliterator and a Spliterators.  I tried one or the other and things wouldn't compile.
     
    salvin francis
    Bartender
    Posts: 2277
    95
    Eclipse IDE Google Web Toolkit Java

    Jim Venolia wrote:Thanks Ron, I've never seen the :: operator before.  ...


    The operator is used for method references. I have written a post in the past that covers all the use cases of this operator: https://coderanch.com/t/686059/paradigms/Wanna-learn-Method-References
    Although my example was highly criticized in that post, it nevertheless covers all the use cases from a semantic standpoint.
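As a quick illustration of the `::` operator, the sketch below shows the four kinds of method reference side by side. The class name `MethodRefKinds` and the method `parseAll` are made up for this example and do not come from the linked post.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;
import java.util.function.Function;
import java.util.function.Supplier;

public class MethodRefKinds {

    static List<Integer> parseAll(List<String> raw) {
        // 1. Static method: same as s -> Integer.parseInt(s)
        Function<String, Integer> toInt = Integer::parseInt;
        // 2. Instance method of an arbitrary object: same as s -> s.trim()
        Function<String, String> trim = String::trim;
        // 3. Constructor: same as () -> new ArrayList<>()
        Supplier<List<Integer>> newList = ArrayList::new;

        List<Integer> out = newList.get();
        // 4. Instance method of one particular object: same as i -> out.add(i)
        Consumer<Integer> add = out::add;

        for (String s : raw) {
            add.accept(trim.andThen(toInt).apply(s));
        }
        return out;
    }
}
```

Each reference is shorthand for the lambda shown in its comment; the compiler picks the matching method from the target functional interface.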
     
    Marshal
    Posts: 5980
    155
    Chrome Eclipse IDE Java Postgres Database Ubuntu VI Editor

    Paul Clapham wrote:

    Jim Venolia wrote:I'd rather know how to quickly find out what to feed "import" when I find something unknown like "FooStuff" in code found on the net.



    I'm using Eclipse so when I see a class name with a red line over it, I put my mouse over it and Eclipse gives me a list of ways to make the red line go away. Usually one of the options is "Add import java.whatever.FooStuff".


    And Ctrl-Shift-O will try to add all missing imports and prompt you if there is any question as to which one it should use.
     
    Master Rancher
    Posts: 4072
    47

    Jim Venolia wrote:
    Then again, not many Java folks parse binary bitstreams, nor evidently log files.



    There are several tools that parse log files, but it helps if the log file has some form of standard format (eg one of the common log4j formats).
    Your current structure looks nothing like a log file to me.
    They usually have time stamps, and class names, and line numbers, and then you'd get the sort of stuff you have.

    This then becomes not so much a case of parsing a log file, but parsing part of a log file entry...and that's as wide and varied as parsing any other file.
     
    Jim Venolia
    Ranch Hand
    Posts: 548
    1
    Chrome Linux VI Editor
    There are other types of log files than tracking software flow.  Some I've worked on dealt with cell phones, missile telemetry, and robotic controls.  None of them had file names, class names, nor line numbers.

    They do all have timestamps though.
     