Other log types span multiple lines, and my problem is that I can't figure out how to tell whether a given log entry has ended or not. Below I have added an example of the multi-line types. The added problem is that they don't always span the same number of rows and don't always include the same attributes. Currently I am reading the file line by line, splitting each line into an array of strings on whitespace, then checking the array for keywords such as ciaddr and storing the value of ciaddr as the element at that index + 2, so that I get the IP address after the equals sign. This is fine for checking just one entry, but I have no way of telling when one log entry has ended and another has begun. Could someone help me out with a simple solution? I think I have overthought the problem and am now drawing a blank.
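For reference, the keyword lookup described above could be sketched roughly like this. The class and method names are made up for illustration, and it assumes the attribute is written as `ciaddr = value` with whitespace around the equals sign, which is what the "index + 2" step implies.

```java
import java.util.Optional;

class KeywordValue {
    // Hypothetical helper: returns the token two positions after the keyword,
    // provided the token in between is "=" (i.e. "ciaddr = 0.0.0.0").
    static Optional<String> valueAfterEquals(String line, String keyword) {
        // trim() first so leading indentation does not produce an empty first token
        String[] tokens = line.trim().split("\\s+");
        for (int i = 0; i + 2 < tokens.length; i++) {
            if (tokens[i].equals(keyword) && tokens[i + 1].equals("=")) {
                return Optional.of(tokens[i + 2]);
            }
        }
        return Optional.empty(); // keyword not on this line
    }

    public static void main(String[] args) {
        // Made-up sample line, not taken from a real log
        System.out.println(valueAfterEquals("     ciaddr = 0.0.0.0", "ciaddr"));
    }
}
```

Returning an `Optional` rather than `null` makes the "attribute not present on this line" case explicit, which matters here because not every entry carries the same attributes.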
If that is the case then you could trim each line to remove leading and trailing whitespace and then check each line for 'n' consecutive spaces (or possibly a tab character). I'm not sure how reliable this will be, though, as the details part may contain similar whitespace.
Are there any other markers that can be used, i.e. is it only certain types of log that can be multi-line?
Then within a multi-line log the dhcp,debug,packet is followed by a single space for the first line and 5 spaces for the other lines (or are they tabs?).
Can you use this information to identify multi-line log entries?
Basically you need to identify some pattern that distinguishes single line entries from multiple line entries and then another pattern to distinguish the first line of a multiple line entry.
tom davies wrote: Could someone help me out with a simple solution as I think I have overthought the problem and am now drawing a blank
Well, as Joanne says, you need to find some pattern that distinguishes a first line from any other; and just looking at your sample, it would appear that all first lines start with:
"MMM/dd/yyyy HH:mm:ss " (note the trailing space)
(in SimpleDateFormat terms)
However, depending on content, you might be unlucky enough to run into a line that just happens to start with a date in the same format.
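As a rough sketch of that idea, the check for a first line can be a regular expression for the "MMM/dd/yyyy HH:mm:ss " prefix, with every other line treated as a continuation of the entry before it. The class name, method names and sample lines below are invented for illustration, and the sketch assumes continuation lines never start with a timestamp (the caveat above still applies).

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.regex.Pattern;

class LogEntryGrouper {
    // Three letters, then /dd/yyyy HH:mm:ss and a trailing space, at the start of the line
    private static final Pattern ENTRY_START =
            Pattern.compile("^[A-Za-z]{3}/\\d{2}/\\d{4} \\d{2}:\\d{2}:\\d{2} ");

    static boolean startsNewEntry(String line) {
        return ENTRY_START.matcher(line).find();
    }

    // Each inner list holds the first line of an entry plus its continuation lines.
    static List<List<String>> group(List<String> lines) {
        List<List<String>> entries = new ArrayList<>();
        List<String> current = null;
        for (String line : lines) {
            if (startsNewEntry(line)) {
                if (current != null) {
                    entries.add(current); // previous entry is now complete
                }
                current = new ArrayList<>();
                current.add(line);
            } else if (current != null) {
                current.add(line); // continuation of the open entry
            }
            // lines before the first timestamped line are skipped
        }
        if (current != null) {
            entries.add(current); // flush the last entry at end of input
        }
        return entries;
    }

    public static void main(String[] args) {
        // Made-up sample lines, only loosely modelled on the description in this thread
        List<String> sample = Arrays.asList(
                "Jan/02/2013 10:15:30 dhcp,debug,packet first line",
                "     ciaddr = 0.0.0.0",
                "     yiaddr = 192.168.1.10",
                "Jan/02/2013 10:15:31 system,info single line");
        for (List<String> entry : group(sample)) {
            System.out.println(entry.size() + " line(s): " + entry.get(0));
        }
    }
}
```

The regex only checks the shape of the prefix. If you also want to reject impossible dates, you could parse the prefix with SimpleDateFormat afterwards.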
One thing that a lot of text parsers (eg, shell script interpreters) do is to have a line continuation marker. In bash, for example, it's backslash; so any line ending with a "\" is assumed to be continued on the next line.
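To illustrate the continuation-marker approach, here is a minimal sketch that joins physical lines ending in a backslash into one logical line. It only applies if you control the format of the file being written; the class and method names are made up for the example.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class ContinuationJoiner {
    // Joins lines ending with '\' to the line that follows, dropping the marker.
    static List<String> join(List<String> lines) {
        List<String> logical = new ArrayList<>();
        StringBuilder pending = new StringBuilder();
        for (String line : lines) {
            if (line.endsWith("\\")) {
                // keep everything except the trailing backslash and wait for more
                pending.append(line, 0, line.length() - 1);
            } else {
                pending.append(line);
                logical.add(pending.toString());
                pending.setLength(0);
            }
        }
        if (pending.length() > 0) {
            logical.add(pending.toString()); // file ended on a continued line
        }
        return logical;
    }

    public static void main(String[] args) {
        List<String> joined = join(Arrays.asList("first part \\", "second part", "standalone"));
        for (String s : joined) {
            System.out.println(s);
        }
    }
}
```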
I have just added a check that says if (line.contains(" "))
Is that a good way to do it, or should I find an alternative? I just put a println in to check and it seems to work. I will now try to get it to recognise multiple log entries.
As the next step, I am trying to add this DHCP parser into a complete log parser that will include methods to parse the other log types.
I have encountered two problems. The first is that my date/time values are wrong; I think I am recording the date from the next log entry and not the current one.
The main problem is that I can't get my head around how to find the end of a particular log entry. Currently I am working that out in the DHCP method, which is fine if the only logs are DHCP, but if the DHCP method is not called again, at the end of a file for example, then that entry does not get stored. I have a feeling the date/time problem will be sorted out once I figure out how to find the end of an entry.
If I had a way to go one step forward, check the next entry for log type and/or DHCP type, then go back again and store the previous entry if the next was a different log, it might work. There is no way to do this with Scanner, though. Any advice is much appreciated.
My current parser (sorry if it's a bit long, but it shows how my parser currently works):
tom davies wrote: If I had a method to go one step forward, check the next entry for log type and/or DHCP type then go back again and store the previous entry if the next was a different log it may work.
Have two variables to hold the current and next entries. After you process each entry you discard the current one, make the current entry variable point to what was the next entry and then read the next entry and point the next entry variable at it.
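Something along these lines would do it with a plain Scanner, looking one line ahead. This is only a sketch: the class name is invented, and `isContinuation` uses a placeholder rule (the line starts with whitespace) that you would replace with whatever pattern really marks a continuation line in your logs.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Scanner;

class LookaheadReader {
    // Placeholder rule for illustration only: continuation lines start with whitespace.
    static boolean isContinuation(String line) {
        return !line.isEmpty() && Character.isWhitespace(line.charAt(0));
    }

    static List<String> readEntries(Scanner in) {
        List<String> entries = new ArrayList<>();
        String current = in.hasNextLine() ? in.nextLine() : null;
        while (current != null) {
            // read ahead by one line; null means end of input
            String next = in.hasNextLine() ? in.nextLine() : null;
            StringBuilder entry = new StringBuilder(current);
            while (next != null && isContinuation(next)) {
                entry.append('\n').append(next);
                next = in.hasNextLine() ? in.nextLine() : null;
            }
            // here 'next' is either the first line of another entry or null,
            // so the current entry is complete and can be stored
            entries.add(entry.toString());
            current = next; // what was "next" becomes "current"
        }
        return entries;
    }

    public static void main(String[] args) {
        // Made-up input, not a real log
        String text = "A first\n  cont1\n  cont2\nB single\nC first\n  cont";
        for (String entry : readEntries(new Scanner(text))) {
            System.out.println("--- entry ---");
            System.out.println(entry);
        }
    }
}
```

Because an entry is stored whenever `next` turns out to be a new first line or the end of input, the last entry in the file is stored as well, and the date/time always comes from `current` rather than from the line that was read ahead.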