I have two CSV text files, one with about 17,000 rows and the other with about 3,000,000 rows. The 3,000,000-row file has a field that I need to attach to each of the 17,000 rows wherever the two rows match on three values.
My current setup is:
Read the 17,000 rows into an ArrayList using the Ostermiller CSVParser.
Read the 3,000,000 rows in using the Ostermiller CSVParser.
For each of the 3,000,000 rows, iterate through the ArrayList and look for a match.
If a match is found, write it out to a text file and exit the loop.
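The setup above can be sketched roughly as follows. This is only an illustration under some assumptions: rows are already parsed into String arrays (the Ostermiller CSVParser calls and the file I/O are left out), the three matching values sit in columns 0 to 2 of both files, and the field to attach is column 3 of the large file. The class name, method names, and column positions are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

class NestedLoopMatch {

    // Hypothetical column positions; the real files may be laid out differently.
    static final int KEY_A = 0;
    static final int KEY_B = 1;
    static final int KEY_C = 2;
    static final int EXTRA_FIELD = 3; // field in the large file to attach

    /** Returns true when both rows agree on all three key columns. */
    static boolean matches(String[] small, String[] large) {
        return small[KEY_A].equals(large[KEY_A])
            && small[KEY_B].equals(large[KEY_B])
            && small[KEY_C].equals(large[KEY_C]);
    }

    /**
     * For each row of the large file, scans the small list from the start
     * and stops at the first match. Matched lines are collected in a list
     * here instead of being written to a text file.
     */
    static List<String> join(List<String[]> smallRows, Iterable<String[]> largeRows) {
        List<String> out = new ArrayList<>();
        for (String[] large : largeRows) {
            for (String[] small : smallRows) {
                if (matches(small, large)) {
                    out.add(String.join(",", small) + "," + large[EXTRA_FIELD]);
                    break; // exit the inner loop after the first match
                }
            }
        }
        return out;
    }
}
```

A large-file row that matches nothing is compared against every entry in the list, so in the worst case this does 17,000 x 3,000,000 = 51 billion row comparisons.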
This solution works and I am able to process roughly 200 to 210 rows per second using the full file of 17,000 and a sample of about 75,000 from the large file.
My question is: is this a good solution, or is there a better, more efficient way of doing this? If my calculations are correct, it will take close to 3 hours to process the full file, and I may have to process the file several times.