Jace Sim wrote:Anyhow, would be interested in any recommendations or ideas, and thank you in advance for your help.
"Leadership is nature's way of removing morons from the productive flow" - Dogbert
Articles by Winston can be found here
Now the problem comes in with those log files when they haven't been read before and are over a certain size (around 20 MB). Smaller files don't seem to have any issues, and I am a little confused as to what is going on: when I look at the length of either the String or StringBuffer, they are around the 13-million-character mark, but when I do a StringBuffer.toString(), it is as if nothing is passed. From what information I can find, I am well below any limits on the maximum number of characters in a String (but maybe I'm wrong).
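For what it's worth, one way to sidestep a 13-million-character buffer entirely is to stream the file and apply the pattern a line at a time (a sketch; the TXN pattern here is a placeholder, not the real expression from the system):

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class StreamingScan {
    // Placeholder pattern -- substitute whatever you actually match on.
    private static final Pattern TXN = Pattern.compile("TXN-(\\d+)");

    /** Counts pattern matches while holding only one line in memory at a time. */
    public static int countMatches(BufferedReader in) throws IOException {
        int hits = 0;
        String line;
        while ((line = in.readLine()) != null) {
            Matcher m = TXN.matcher(line);
            while (m.find()) {
                hits++;
            }
        }
        return hits;
    }

    public static void main(String[] args) throws IOException {
        try (BufferedReader in = Files.newBufferedReader(Paths.get(args[0]))) {
            System.out.println("matches: " + countMatches(in));
        }
    }
}
```

This only works when a match never spans lines, of course, which is exactly the multi-line-transaction problem discussed below.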
Winston Gutkowski wrote:
Well, the first thing that springs to mind is: why do you have 13 million characters in a StringBuffer?
Winston Gutkowski wrote:
The second is that you say on the one hand:
"The nature of the system is that only complete transactions are written to the log file"
and on another:
"I might possibly miss a transaction that sits between each of the grabs, as some of the transactions can be over a 100+ lines"
Winston Gutkowski wrote:
I understand that tail isn't the most sophisticated tool in the world (old Unix sysadmin), but either your transactions are getting written in one go or they aren't. Personally, I'd be looking at some form of log switching before I start to pull lines; but there may be other things that prevent you from doing that.
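If tail can't be trusted not to split a transaction, one alternative is to remember the file offset between runs and read only the complete lines added since, restarting when the file has been rotated. A minimal sketch (the rotation check here simply assumes a shorter file means a switch happened):

```java
import java.io.IOException;
import java.io.RandomAccessFile;
import java.util.ArrayList;
import java.util.List;

public class OffsetTail {
    /**
     * Reads all lines added since the saved offset (passed in offset[0]),
     * updating offset[0] to the new position. If the file is shorter than
     * the saved offset, we assume it was rotated and start from the top.
     * Note: RandomAccessFile.readLine() decodes bytes as ISO-8859-1, which
     * is fine for a sketch but worth knowing for real log data.
     */
    public static List<String> readNew(RandomAccessFile raf, long[] offset) throws IOException {
        if (raf.length() < offset[0]) {
            offset[0] = 0;                    // file rotated/truncated -> restart
        }
        raf.seek(offset[0]);
        List<String> lines = new ArrayList<>();
        String line;
        while ((line = raf.readLine()) != null) {
            lines.add(line);
        }
        offset[0] = raf.getFilePointer();     // remember where we got to
        return lines;
    }
}
```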
Winston Gutkowski wrote:
I'm also not sure of the nature of this program. Is it a daemon that simply sits at the far end of a pipe from a tail? Or is it something that you run periodically to process your logs?
Paul Mrozik wrote:
What do you mean it is as if nothing is passed? Perhaps you should try to send the result of the toString() to a file to see what happens. If it comes up empty then you'll know where to look for the problem.
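Paul's diagnostic could look something like this sketch: print both lengths, stream the result out to a file, and compare the file size against what the buffer claims to hold:

```java
import java.io.IOException;
import java.io.Writer;
import java.nio.file.Files;
import java.nio.file.Paths;

public class DumpBuffer {
    /** Writes the buffer's contents to a file and returns the resulting file size. */
    public static long dump(StringBuffer sb, String path) throws IOException {
        String s = sb.toString();
        System.out.println("buffer length: " + sb.length() + ", string length: " + s.length());
        try (Writer out = Files.newBufferedWriter(Paths.get(path))) {
            out.write(s);      // stream it out rather than trying to display it
        }
        return Files.size(Paths.get(path));
    }
}
```

If the string length matches the buffer length and the file comes out the right size, the problem is in whatever displays or consumes the String, not in toString() itself.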
Jace Sim wrote:Mainly I have some data in log files, that I need to be able to run multiple regular expressions over, and based on what I have and how the regular expressions match determines what I do with the data.
Jace Sim wrote:The data has a big mix of stuff, some of which resembles XML but not really, so I have to treat it as non-XML, as well as other chunks of data. That is the main problem here: it is chunks of data, and I also have to match thread IDs in order to track things throughout the system as they flow through business processes.
Jace Sim wrote:The ultimate aim, at the end of the day, is to have a completely flexible monitoring system that looks for things within the data, is completely portable between systems processing different types of data, and stores whatever you want to store in a database.
Winston Gutkowski wrote:
Well, it sounds to me like you have major problems:
1. You're trying to apply an existing (and, it would seem, brittle) program to data it was never designed for.
2. You have "some stuff that resembles XML but not really" that you're still trying to parse.
3. Not even sure where thread IDs come into it, or "track[ing] things throughout the system".
Winston Gutkowski wrote:
It sounds to me like you need to StopCoding (←click) and get a handle on ALL these things.
What is that "XML but not really"? Are regexes really applicable to your new data? What is all this "thread ID" and "tracking things through the system" stuff?
You need to sit down with a pencil and paper (lots of it) and:
(a) Find out why you're being supplied with data that doesn't - apparently - conform to any known standard.
(b) Write a new spec.
Winston
Winston Gutkowski wrote:
Jace Sim wrote:The ultimate aim, at the end of the day, is to have a completely flexible monitoring system that looks for things within the data, is completely portable between systems processing different types of data, and stores whatever you want to store in a database.
OK, but you're not thinking about this empirically. You're applying all sorts of constraints (of which, I suspect, regexes are your main problem) to your solution before you even know what kind of data you're going to be dealing with. i.e., you're deciding HOW you're going to do this before you know WHAT you need to do - always a bad move.
Winston
Winston Gutkowski wrote:Regexes are good, but they're NOT a panacea; and my general rule of thumb is that if they don't work in one line, then I need to find some other way of doing things. grep, awk and perl work well precisely because they were designed with a limited scope; but perl as an "object-oriented" language or grep as a multi-line matcher? Perlease.
Jace Sim wrote:I also chose the regular expressions so that later on I can be flexible in adding additional things to be extracted from the log files as I go along.
[...] The whole thing works perfectly, just got stumped by the String size. [...] but think the work around is going to be the best solution.
Believe me, I would love to stop coding, but I have to move forward [...] The trouble is, people's lives are actually at stake with this system; that is the scary bit. And I have to do whatever I can to get something in place to make sure it is working until the new solution comes in, which is over a year away.
Jayesh A Lalwani wrote:
Have you thought about using an OTS (off-the-shelf) solution like Splunk to do your log analysis? What you are doing here seems a lot like reinventing the wheel.
Jayesh A Lalwani wrote:
If I were to reinvent this wheel, I wouldn't put any kind of business logic in the log parser and I wouldn't use a database. The bit that reads the log files and parses them should [...]
Jayesh A Lalwani wrote:
I can see a couple of pitfalls with your current design:
a) your logic seems very tightly coupled with how the application is logging information. That coupling is bad. Developers generally do not think twice before changing the log messages. The log file cannot be an "interface".
Jayesh A Lalwani wrote:
b) How the heck do you scale this? Really? One machine pulling logs from multiple hosts and grinding away, pushing data into a database? What happens when you add hosts? Or some developer adds a log statement that doubles the size of the logs? Or an Ops person turns on FINE logging?
Winston Gutkowski wrote:
Whoa. "Lives at stake"? Sounds like the sort of emotive thing a manager might say to get me to burn extra candle hours to complete a job he knows is unreasonable.
Winston Gutkowski wrote:
Just a thought, and I have no idea whether it'll work or not: Rather than a single regex of almost unlimited scope, what about two: a 'start' expression and an 'end' one? I'm presuming that your current one has ".*?" (or something like it) in it somewhere, so why not turn it into two expressions that only need to search at most a couple of lines at a time?
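Winston's start/end idea could be sketched like this: each regex is applied per line, and lines between a 'start' match and the next 'end' match are collected into one transaction (BEGIN TXN/END TXN are placeholder delimiters, not the real ones):

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

public class TwoRegexExtract {
    // Placeholder patterns -- substitute your real transaction delimiters.
    private static final Pattern START = Pattern.compile("^BEGIN TXN");
    private static final Pattern END   = Pattern.compile("^END TXN");

    /** Collects each block of lines between a START match and the next END match. */
    public static List<List<String>> extract(List<String> lines) {
        List<List<String>> txns = new ArrayList<>();
        List<String> current = null;
        for (String line : lines) {
            if (current == null) {
                if (START.matcher(line).find()) {
                    current = new ArrayList<>();
                    current.add(line);
                }
            } else {
                current.add(line);
                if (END.matcher(line).find()) {
                    txns.add(current);
                    current = null;      // an unfinished block at EOF is dropped
                }
            }
        }
        return txns;
    }
}
```

Note that a transaction left open at end-of-input is silently dropped here; in a tail-style feed you would instead carry it over to the next read.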
Winston Gutkowski wrote:
However, I fear there may also be some embedding or hierarchy involved in these "transactions"; and if that's the case, regex is NOT the solution (except possibly in a very limited way). You need a parser.
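The nesting problem Winston mentions is exactly what a regex cannot handle and a trivial counter (or stack) can. A minimal sketch, assuming one opening or closing marker per line (the marker strings are hypothetical):

```java
public class NestingCheck {
    /**
     * Regexes can't balance nested delimiters; a depth counter can.
     * Returns true if every opening marker has a matching closing one,
     * in the right order.
     */
    public static boolean balanced(Iterable<String> lines, String open, String close) {
        int depth = 0;
        for (String line : lines) {
            if (line.contains(open))  depth++;
            if (line.contains(close)) depth--;
            if (depth < 0) return false;   // a close appeared before its open
        }
        return depth == 0;
    }
}
```

A real parser would build a tree while tracking this depth, but even this check shows why "transactions inside transactions" push you past what a single expression can match.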