Parsing log files, wondering where the dividing lines between objects are.

 
Bill Crim
Greenhorn
Posts: 11
I want to stream a bunch of records from a log that I am parsing. There is one record class, which represents one log record, but a record might span multiple lines. I am used to using LINQ in .NET, but I don't know where the proper boundary is for objects in Java. .NET tends to favor generics and function composition, whereas Java favors object composition.

I know this can be done several ways in Java, but I wanted advice on the "more appropriate" way to do this. Streams are similar enough to LINQ for me to want to use them, but annoyingly different enough that I don't know if I should.

Here is the naive function in C#.

For Java, even if I could make this work with Streams, it would be horrible for Java folks to look at. What is the appropriate level of encapsulation that a Java person would expect?

Create an overload of the Stream class? Then I am just a simple parsing conduit and let the caller worry about the source?


Or would this be a repository, that might pull from multiple bits of data under the hood?


Create a parser class, that returns a stream of records from a single file for each call?


Or is it better to set it up so that each Parser is instantiated for one file?
 
Marshal
Posts: 60130

Bill Crim wrote: . . . Create an overload of the Stream class? . . .

People avoided suicide because the Calendar class was introduced, which made people too apathetic for suicide. I think it is Cay Horstmann who points out that they managed to get dates wrong twice.
Your question is too difficult for this forum, so I shall move you.
Don't extend Stream, which is not a class but an interface. You don't usually create your own Stream objects, but leave that to methods returning a Stream.

But I think the problem goes further back than that: it is how you are logging things into that file. There are several overloaded versions of Logger#log(). If you simply log things into the file, you would have something like the level of severity followed by a message, and the file would be text (I think), encoded ISO8859‑1 on a Windows® box or UTF‑8 on a “*nix” box. Probably the easiest way to get a Stream would be to use these methods: Files#newBufferedReader(), Paths#get(), BufferedReader#lines().

Unfortunately I don't know how many lines you have per record, so I wouldn't know how to identify your delimiter; maybe somebody else will know. You could concatenate all lines until you encounter a level again (see Level#toString()), but I can see potential errors if any message has “SEVERE” or similar in it. Can't you organise the logging so each object has a line to itself?
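A minimal sketch of how Files#newBufferedReader(), Paths#get() and BufferedReader#lines() might fit together. The class name `LogLines`, the method `readNonBlank` and the choice of UTF-8 are assumptions made for illustration; they are not taken from the thread.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Hypothetical helper: reads a text file through a Stream<String> of lines.
final class LogLines {
    private LogLines() {}

    // Convenience overload that builds the Path with Paths#get().
    static List<String> readNonBlank(String fileName) throws IOException {
        return readNonBlank(Paths.get(fileName));
    }

    // Returns every non-blank line of the file, in file order.
    // UTF-8 is an assumption; pass another Charset if the log uses one.
    static List<String> readNonBlank(Path file) throws IOException {
        // try-with-resources closes the stream first, then the reader.
        try (BufferedReader reader = Files.newBufferedReader(file, StandardCharsets.UTF_8);
             Stream<String> lines = reader.lines()) {
            return lines.filter(line -> !line.trim().isEmpty())
                        .collect(Collectors.toList());
        }
    }
}
```

The Stream is consumed inside the method so that the reader is closed before returning. Handing the open Stream to the caller is also possible, but then the caller has to close it.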
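The "concatenate until the next level" idea could be sketched roughly as follows. It assumes a hypothetical format in which every record starts with a line beginning with a level name and a colon (for example `SEVERE: ...`); the class `RecordGrouper` and that format are illustrative assumptions, not something the thread specifies.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.logging.Level;

// Hypothetical helper: glues continuation lines onto the record they belong to.
final class RecordGrouper {
    // Level#toString() returns the non-localized name, e.g. "SEVERE".
    private static final List<String> LEVEL_NAMES = Arrays.asList(
            Level.SEVERE.toString(), Level.WARNING.toString(), Level.INFO.toString(),
            Level.CONFIG.toString(), Level.FINE.toString(), Level.FINER.toString(),
            Level.FINEST.toString());

    private RecordGrouper() {}

    // Assumption: a record starts with "<LEVEL>:" at the very beginning of a line.
    static boolean startsRecord(String line) {
        for (String name : LEVEL_NAMES) {
            if (line.startsWith(name + ":")) {
                return true;
            }
        }
        return false;
    }

    // Joins each record's lines with '\n'; lines before the first record are dropped.
    static List<String> group(List<String> lines) {
        List<String> records = new ArrayList<>();
        StringBuilder current = null;
        for (String line : lines) {
            if (startsRecord(line)) {
                if (current != null) {
                    records.add(current.toString());
                }
                current = new StringBuilder(line);
            } else if (current != null) {
                current.append('\n').append(line);
            }
        }
        if (current != null) {
            records.add(current.toString());
        }
        return records;
    }
}
```

Matching only at the start of a line means a message that merely mentions “SEVERE” somewhere in the middle is not mistaken for a new record. A continuation line that itself begins with a level name and a colon would still be misread, so this does not remove the risk entirely.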
 
Bill Crim
Greenhorn
Posts: 11

About Streams, is Stream a closer comparison to IEnumerable plus the Enumerable extension methods? I was confused because what is an interface vs. a class isn't always clear in Java. (I use IntelliJ IDEA, and it helps some.)

I am parsing what I assume are Apache HTTP access log files. The general idea is to display a nice table/graph/chart for some dimension of data, e.g. "I want a chart of all the top-level pages (as opposed to images or widgets), coming from google, where the user is mobile." The goal of this is 40% for me to learn Java, and 60% to get basic stats from the log file. I have done this sort of operation before in .NET. I can confirm that parsing log files from disk is VASTLY faster than any sort of database access, assuming this is an "admin looking at graphs a few times per day" operation, and not an "I have 25,000 users who want this page every hour" one. Modern CPU and disk caching of sequential access is a wonder to behold.
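As a rough sketch of that kind of query in Java: the class below assumes the files use Apache's Combined Log Format (host, identity, user, timestamp, request line, status, size, referer, user agent). The class `AccessLogEntry`, the method `hitsByPath` and the regular expression are illustrative assumptions; the pattern is simplified and does not handle escaped quotes inside the quoted fields.

```java
import java.util.Map;
import java.util.Optional;
import java.util.TreeMap;
import java.util.function.Predicate;
import java.util.regex.Matcher;
import java.util.regex.Pattern;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Hypothetical value class for one access-log line (Combined Log Format assumed).
final class AccessLogEntry {
    // Groups: 1 host, 2 timestamp, 3 method, 4 path, 5 status, 6 referer, 7 user agent.
    private static final Pattern COMBINED = Pattern.compile(
            "^(\\S+) \\S+ \\S+ \\[([^\\]]+)\\] \"(\\S+) (\\S+)[^\"]*\" (\\d{3}) \\S+"
            + " \"([^\"]*)\" \"([^\"]*)\"$");

    final String host;
    final String timestamp;   // kept as raw text in this sketch
    final String method;
    final String path;
    final int status;
    final String referer;
    final String userAgent;

    private AccessLogEntry(Matcher m) {
        host = m.group(1);
        timestamp = m.group(2);
        method = m.group(3);
        path = m.group(4);
        status = Integer.parseInt(m.group(5));
        referer = m.group(6);
        userAgent = m.group(7);
    }

    // Returns an empty Optional for lines that do not match the assumed format.
    static Optional<AccessLogEntry> parse(String line) {
        Matcher m = COMBINED.matcher(line);
        if (m.matches()) {
            return Optional.of(new AccessLogEntry(m));
        }
        return Optional.empty();
    }

    // Counts hits per request path for the entries accepted by the filter.
    static Map<String, Long> hitsByPath(Stream<String> lines, Predicate<AccessLogEntry> filter) {
        return lines.map(AccessLogEntry::parse)
                .filter(Optional::isPresent)
                .map(Optional::get)
                .filter(filter)
                .collect(Collectors.groupingBy(e -> e.path, TreeMap::new, Collectors.counting()));
    }
}
```

A call such as `AccessLogEntry.hitsByPath(lines, e -> e.referer.contains("google") && e.userAgent.contains("Mobile"))` would approximate the "from google, on mobile" chart data. The substring checks are crude placeholders, and the pipeline reads much like a LINQ `Select`/`Where`/`GroupBy` chain.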

I know the ACTUAL solution to this problem is just to send it to Logstash/Kibana, and perform all my what-if scenarios there. If this project takes more than 1-2 days, that will be my solution. But I still want to be sure I am writing "Idiomatic Java" and not "Bastardized Java-flavored .NET". I consider it a grievous error that logging frameworks don't make default parsers/readers for their logging. I am not in "charge" of the domains I am monitoring, I am just trying to assist with some insights into the traffic-over-time items. All data is presented as-is.

Perhaps something like a class that gathers all the files into one ordered repository, and then a reader/parser that takes the repo? The repo would put the files into the proper order. So, something like an ApacheLogRepo, which would take a list of files (or a directory) and set up all the files in the proper order to be read sequentially. The repo would then be the source of data for a Parser, which would take in the repo and stitch together the log entries into a continuous sequence. In Java, would the repo be the source of the data files, or the source of the LogRecords?
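One possible split, sketched under assumptions rather than offered as the idiomatic answer: the repo owns file discovery and ordering and exposes a single continuous Stream of lines, and a separate parser maps those lines to records. Only the name `ApacheLogRepo` comes from the description above. The constructor, the `lines()` method, UTF-8 encoding and the premise that file names sort chronologically (for example date-stamped names) are all assumptions.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Hypothetical repository: knows which files exist and in what order to read them.
final class ApacheLogRepo {
    private final List<Path> files;

    // Assumption: sorting the paths by name yields chronological order.
    ApacheLogRepo(Path directory) throws IOException {
        try (Stream<Path> listing = Files.list(directory)) {
            this.files = listing.filter(Files::isRegularFile)
                    .sorted()
                    .collect(Collectors.toList());
        }
    }

    // One lazy stream over all files; flatMap closes each file's stream
    // once its lines have been consumed.
    Stream<String> lines() {
        return files.stream().flatMap(ApacheLogRepo::linesOf);
    }

    private static Stream<String> linesOf(Path file) {
        try {
            return Files.lines(file, StandardCharsets.UTF_8);
        } catch (IOException e) {
            // Streams cannot throw checked exceptions, so wrap it.
            throw new UncheckedIOException(e);
        }
    }
}
```

With this shape the parser stays a plain function from a line (or a group of lines) to a record, and the caller composes the two with `map`. Whether the repo should instead return records directly is a design choice this sketch leaves open.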

The feeling of programming in Java after 18 years of programming in .NET...
