I need to process huge text files line by line. Using Java 8 streams makes for a speedy and memory-conserving solution:
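A minimal sketch of that kind of stream-based pipeline (the class name, file paths, and the `applyReplacements` method are placeholders standing in for the actual search/replace classes):

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.stream.Stream;

public class LineProcessor {

    // Stand-in for the chain of search/replace classes; the real code
    // would apply several replacements to each line in a fixed order.
    static String applyReplacements(String line) {
        return line.replace("foo", "bar");
    }

    // Files.lines streams the file lazily, so memory use stays flat
    // no matter how large the input file is.
    static void process(Path in, Path out) throws IOException {
        try (Stream<String> lines = Files.lines(in)) {
            Files.write(out,
                    (Iterable<String>) lines.map(LineProcessor::applyReplacements)::iterator);
        }
    }
}
```

The `Stream::iterator` cast lets `Files.write` consume the stream lazily instead of collecting all lines into a list first.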
Now I would like to improve speed even further. My search/replace code consists of a number of classes that each work on one line at a time. The order of the search/replace classes is significant, and I also need to preserve the order of the lines in the resulting file. This stops me from simply making the stream parallel. :-(
Would it make sense to use a ForkJoin solution with some kind of queue? Load a few thousand lines into a queue, wrap the replace class list in a ForkJoinTask that works through the queue, and append the modified lines to the file once the work is complete.
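A sketch of that chunking idea, using a plain `ExecutorService` rather than ForkJoin (chunks are submitted in reading order and their futures collected in that same order, so both the line order and the per-line replacement order survive; all names here are placeholders):

```java
import java.io.BufferedReader;
import java.io.BufferedWriter;
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class ChunkedProcessor {

    // Stand-in for the ordered chain of search/replace classes.
    static String applyReplacements(String line) {
        return line.replace("foo", "bar");
    }

    static void process(Path in, Path out, int chunkSize)
            throws IOException, InterruptedException, ExecutionException {
        ExecutorService pool = Executors.newFixedThreadPool(
                Runtime.getRuntime().availableProcessors());
        List<Future<List<String>>> pending = new ArrayList<>();
        try (BufferedReader reader = Files.newBufferedReader(in)) {
            List<String> chunk = new ArrayList<>(chunkSize);
            String line;
            while ((line = reader.readLine()) != null) {
                chunk.add(line);
                if (chunk.size() == chunkSize) {
                    pending.add(submit(pool, chunk));
                    chunk = new ArrayList<>(chunkSize);
                }
            }
            if (!chunk.isEmpty()) {
                pending.add(submit(pool, chunk));
            }
        }
        try (BufferedWriter writer = Files.newBufferedWriter(out)) {
            // Futures are drained in submission order, so line order is kept.
            for (Future<List<String>> f : pending) {
                for (String processed : f.get()) {
                    writer.write(processed);
                    writer.newLine();
                }
            }
        } finally {
            pool.shutdown();
        }
    }

    private static Future<List<String>> submit(ExecutorService pool, List<String> chunk) {
        return pool.submit(() -> {
            List<String> result = new ArrayList<>(chunk.size());
            for (String s : chunk) {
                // Each line still passes through the replacements in fixed order.
                result.add(applyReplacements(s));
            }
            return result;
        });
    }
}
```

As written this collects every chunk's future before writing, so for truly huge files you would cap the number of in-flight chunks (for example with a bounded queue) to keep memory flat.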
Given the requirement to keep the order of lines in the file and the order of search/replace operations on each line, the first thing that comes to mind is separate threads for reading, processing, and writing lines to the new file.
This could be as simple as using a java.io.PipedReader, or you might maintain a queue of input lines. See also the java.util.concurrent classes (BlockingQueue in particular).
The idea, of course, is that the I/O operations get handed over to the operating system while your search/replace methods proceed in parallel.
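A sketch of that three-stage pipeline using bounded `BlockingQueue`s between the stages (the class name, queue sizes, and `applyReplacements` are placeholders; FIFO queues mean line order is preserved automatically):

```java
import java.io.BufferedWriter;
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import java.util.stream.Stream;

public class PipelinedProcessor {

    // Sentinel marking end of input; compared by identity so it can
    // never collide with a real line from the file.
    private static final String EOF = new String("<EOF>");

    // Stand-in for the ordered chain of search/replace classes.
    static String applyReplacements(String line) {
        return line.replace("foo", "bar");
    }

    static void process(Path in, Path out) throws IOException, InterruptedException {
        BlockingQueue<String> toProcess = new ArrayBlockingQueue<>(10_000);
        BlockingQueue<String> toWrite = new ArrayBlockingQueue<>(10_000);

        // Stage 1: read lines and feed the processing queue.
        Thread reader = new Thread(() -> {
            try (Stream<String> lines = Files.lines(in)) {
                for (String line : (Iterable<String>) lines::iterator) {
                    toProcess.put(line);
                }
                toProcess.put(EOF);
            } catch (IOException | InterruptedException e) {
                throw new RuntimeException(e);
            }
        });

        // Stage 2: run the search/replace chain on each line.
        Thread processor = new Thread(() -> {
            try {
                String line;
                while ((line = toProcess.take()) != EOF) {
                    toWrite.put(applyReplacements(line));
                }
                toWrite.put(EOF);
            } catch (InterruptedException e) {
                throw new RuntimeException(e);
            }
        });

        reader.start();
        processor.start();

        // Stage 3: write results on the current thread.
        try (BufferedWriter writer = Files.newBufferedWriter(out)) {
            String line;
            while ((line = toWrite.take()) != EOF) {
                writer.write(line);
                writer.newLine();
            }
        }
        reader.join();
        processor.join();
    }
}
```

The bounded queues apply backpressure: if processing is the bottleneck, the reader simply blocks instead of flooding memory. Error handling is minimal here; a production version would propagate failures from the worker threads instead of letting the writer block.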