William Brogden wrote: I like that approach, but I don't think there is any way to reason your way to an optimum solution; you are just going to have to measure stuff.
I think Ralph pointed out the problem of having more than one thread reading the file, but that gets entangled with operating system peculiarities.
Between file reading, conversion to Unicode, and regex matching you have three expensive processes, and only experimentation can guide you. Perhaps instrumenting your code with JAMon or something similar could gather that information.
Please report back what you find; this is a fascinating problem.
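A minimal sketch of that kind of instrumentation, timing the three phases separately. It uses plain `System.nanoTime()` as a stand-in for JAMon (JAMon's own API is not used here), and assumes the log fits in memory and is UTF-8. The class name `PhaseTimer`, the sample lines, and the pattern are hypothetical. A single run on a cold JVM is noisy, so repeat it on real data before trusting the split.

```java
import java.io.IOException;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class PhaseTimer {
    // Counts every match of the pattern in text that has already been decoded.
    static int countMatches(Pattern pattern, CharSequence text) {
        Matcher m = pattern.matcher(text);
        int count = 0;
        while (m.find()) {
            count++;
        }
        return count;
    }

    // Times reading, decoding and regex matching separately; returns the match count.
    static int timePhases(Path file, Charset charset, Pattern pattern) throws IOException {
        long t0 = System.nanoTime();
        byte[] raw = Files.readAllBytes(file);      // phase 1: file reading
        long t1 = System.nanoTime();
        String text = new String(raw, charset);     // phase 2: bytes -> Unicode chars
        long t2 = System.nanoTime();
        int matches = countMatches(pattern, text);  // phase 3: regex
        long t3 = System.nanoTime();
        System.out.printf("read %.3f ms, decode %.3f ms, regex %.3f ms%n",
                (t1 - t0) / 1e6, (t2 - t1) / 1e6, (t3 - t2) / 1e6);
        return matches;
    }

    public static void main(String[] args) throws IOException {
        // Hypothetical sample log; point this at a real file when measuring.
        Path tmp = Files.createTempFile("phases", ".log");
        Files.write(tmp, "GET /a 200\nGET /b 404\nPOST /c 200\n".getBytes(StandardCharsets.UTF_8));
        try {
            int n = timePhases(tmp, StandardCharsets.UTF_8,
                    Pattern.compile(" 200$", Pattern.MULTILINE));
            System.out.println(n + " matches");
        } finally {
            Files.delete(tmp);
        }
    }
}
```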
Ralph Cook wrote:
We don't have to measure to know these things. We can make a good guess as to where the problems will be, and a way or two in which multi-threading will NOT help, and I think those should be used in the original design. Even if you're going to write some code to benchmark, it helps to have a good starting place, and avoiding the pathological cases is usually not that hard with some design aforethought.
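One design that avoids the several-threads-reading case is a single reader thread handing lines to a pool of regex workers through a bounded queue. This is a sketch, not code from the thread: the class name `SingleReaderPipeline`, the queue capacity of 1024, and the poison-pill shutdown are illustrative choices.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.regex.Pattern;

public class SingleReaderPipeline {
    // Sentinel compared by reference, so a real log line "<eof>" is not mistaken for it.
    private static final String POISON = new String("<eof>");

    // The calling thread is the only reader; `workers` threads run the regex on each line.
    static long countMatchingLines(BufferedReader in, Pattern pattern, int workers)
            throws IOException, InterruptedException, ExecutionException {
        BlockingQueue<String> queue = new ArrayBlockingQueue<>(1024); // bounded: reader blocks if workers lag
        ExecutorService pool = Executors.newFixedThreadPool(workers);
        List<Future<Long>> results = new ArrayList<>();
        Callable<Long> worker = () -> {
            long hits = 0;
            for (String line = queue.take(); line != POISON; line = queue.take()) {
                if (pattern.matcher(line).find()) {
                    hits++;
                }
            }
            return hits;
        };
        for (int i = 0; i < workers; i++) {
            results.add(pool.submit(worker));
        }
        try {
            String line;
            while ((line = in.readLine()) != null) {
                queue.put(line);            // only this thread touches the file
            }
            for (int i = 0; i < workers; i++) {
                queue.put(POISON);          // one sentinel per worker
            }
            long total = 0;
            for (Future<Long> f : results) {
                total += f.get();
            }
            return total;
        } finally {
            pool.shutdownNow();             // interrupts workers still blocked in take() if reading failed
        }
    }

    public static void main(String[] args) throws Exception {
        String log = "GET /a 200\nGET /b 404\nPOST /c 200\nGET /d 500\n";
        long n = countMatchingLines(new BufferedReader(new StringReader(log)),
                Pattern.compile(" 200$"), 3);
        System.out.println(n + " lines matched");
    }
}
```

Handing over one line at a time costs a queue operation per line; if that shows up in measurements, passing batches of lines is the usual next step. If reading dominates, the extra workers will mostly sit waiting on the queue.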
steve souza wrote: I recently created a program that parsed web logs. With a little tuning it parsed about 10 MB a second, which was plenty fast for my needs. Can you come up with a number that would be acceptable? As I recall, reading the file was the most expensive part of the operation compared to the regular expressions or any other CPU work, so tuning I/O had the biggest impact on performance. If this is true for you, multiple threads may not help much. Of course, as always, you should time things to know where to tune.
It is not necessary to "come up with a number that would be acceptable", certainly not in advance of improving things.
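To check whether I/O dominates, as it did in steve souza's web-log parser, one rough approach is to time a plain read of the file with a few buffer sizes and report MB/s. This is a sketch under assumptions: the class name and buffer sizes are hypothetical, and with no argument it generates an 8 MB temporary file of zeros rather than a real log. Repeated runs usually hit the operating system's file cache, so later numbers may reflect memory speed rather than disk speed.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

public class ReadThroughput {
    // Reads the stream to the end through a buffer of the given size; returns bytes read.
    static long drain(InputStream in, int bufferSize) throws IOException {
        byte[] buf = new byte[bufferSize];
        long total = 0;
        int n;
        while ((n = in.read(buf)) != -1) {
            total += n;
        }
        return total;
    }

    // Converts a byte count and elapsed nanoseconds into decimal megabytes per second.
    static double megabytesPerSecond(long bytes, long nanos) {
        return (bytes / 1e6) / (nanos / 1e9);
    }

    public static void main(String[] args) throws IOException {
        Path file;
        if (args.length > 0) {
            file = Paths.get(args[0]);
        } else {
            // No log given: generate 8 MB of zeros as a stand-in (not realistic log data).
            file = Files.createTempFile("throughput", ".bin");
            file.toFile().deleteOnExit();
            Files.write(file, new byte[8 * 1024 * 1024]);
        }
        for (int size : new int[] {4 * 1024, 64 * 1024, 1024 * 1024}) {
            try (InputStream in = Files.newInputStream(file)) {
                long t0 = System.nanoTime();
                long bytes = drain(in, size);
                long elapsed = System.nanoTime() - t0;
                System.out.printf("buffer %8d bytes: %.1f MB/s%n",
                        size, megabytesPerSecond(bytes, elapsed));
            }
        }
    }
}
```

This measures raw byte reading only, with no decoding or regex, so it gives an upper bound on what the whole parser can reach.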