I have a lookup file (around 3 GB) as below, and I would like to load that file into Java memory. I would like to know whether this is possible. Could anyone please guide me?
But before you do that, you should revisit your design to make sure you really need to have all that data in memory at once - is it possible to load just part of the data at a time? If so, it means you will be less restricted in what machines your program can run on.
Once you've done that, you need to read up on the collections I mentioned to decide which is most suitable for your needs.
RAM: 3 GB
I have around 20,000 files (around 1 or 2 million records in each file) with column A, and another file with columns A and B (this is like a lookup file). So now I have to iterate through all those 20,000 files, and column A has to be replaced with column B (from the lookup file). This is the requirement.
I am looking for an option that does not require loading the data into a database.
Thanks in advance
Major memory-handling and performance issues were fixed in Java 1.5+. I would highly suggest an upgrade.
If you have 3 GB of RAM, how on earth do you expect to load a 3 GB file without swapping/paging the system to death?
Is this running on Windoze, 32 or 64bit?
As Adrian said, maybe you need to rethink your needs. What about a random access file? Only load what you actually need.
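A sketch of that random-access idea: if the lookup file could be rewritten once as a *sorted*, fixed-width record file, each lookup becomes a binary search of disk seeks instead of a 3 GB in-memory map. The field widths and layout below are invented purely for illustration:

```java
import java.io.IOException;
import java.io.RandomAccessFile;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

public class RandomLookup {
    // Each record is padded to a fixed width so record i starts at i * RECORD_LEN.
    static final int KEY_LEN = 8, VAL_LEN = 8;
    static final int RECORD_LEN = KEY_LEN + VAL_LEN + 1; // +1 for '\n'

    static String pad(String s, int len) {
        StringBuilder sb = new StringBuilder(s);
        while (sb.length() < len) sb.append(' ');
        return sb.toString();
    }

    // Binary search the sorted fixed-width file; returns null if the key is absent.
    static String lookup(RandomAccessFile raf, String key) throws IOException {
        long lo = 0, hi = raf.length() / RECORD_LEN - 1;
        while (lo <= hi) {
            long mid = (lo + hi) / 2;
            raf.seek(mid * RECORD_LEN);
            byte[] rec = new byte[RECORD_LEN];
            raf.readFully(rec);
            String k = new String(rec, 0, KEY_LEN, StandardCharsets.US_ASCII).trim();
            int cmp = k.compareTo(key);
            if (cmp == 0)
                return new String(rec, KEY_LEN, VAL_LEN, StandardCharsets.US_ASCII).trim();
            if (cmp < 0) lo = mid + 1; else hi = mid - 1;
        }
        return null;
    }

    public static void main(String[] args) throws IOException {
        Path p = Files.createTempFile("lookup", ".dat");
        // Keys must be written in sorted order for the binary search to work.
        String data = pad("apple", KEY_LEN)  + pad("1", VAL_LEN) + "\n"
                    + pad("banana", KEY_LEN) + pad("2", VAL_LEN) + "\n"
                    + pad("cherry", KEY_LEN) + pad("3", VAL_LEN) + "\n";
        Files.write(p, data.getBytes(StandardCharsets.US_ASCII));
        try (RandomAccessFile raf = new RandomAccessFile(p.toFile(), "r")) {
            System.out.println(lookup(raf, "banana"));
            System.out.println(lookup(raf, "durian"));
        }
        Files.delete(p);
    }
}
```

The one-off sort and fixed-width rewrite costs time up front, but after that memory usage is a single record buffer, and each probe is O(log n) seeks - around 27 for 125 million records.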
Balasubramaniam Muthusamy wrote:Yes. I need to keep only that lookup file, which is around 3 GB. Is it possible?
A 3 GB lookup file? Even assuming it's text (which it probably shouldn't be), I would reckon a 20,000-line lookup file would fit into a few meg.
Methinks your problems start a lot further back than this.
Why on earth would anyone keep 20,000 files around to support a system? Especially ones of that size?
The only possible reason I can think of is that it's independently distributed and that this is some sort of 'batched' update involving temporary files, or some "database" made up of a bunch of redundant copies of "data"; in which case why not just bite the bullet and implement a proper one?
If it's a one-off, you could read in part of the lookup file, process the 20,000 data files, then read the next chunk. You'd need logic to handle interruptions, but I think those would be solvable. So write it, and let it run for as long as it takes.
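That chunked approach might look something like the sketch below. The tab-separated lookup format and the chunk size are assumptions for illustration, not anything from the original post; the driver would loop over chunks of the lookup, running every data file through each chunk:

```java
import java.io.*;
import java.util.*;

public class ChunkedReplace {
    static final int CHUNK_SIZE = 1000000; // lookup entries held in memory at once

    // Read the next chunk of "A<TAB>B" lines into a map; an empty map means EOF.
    static Map<String, String> nextChunk(BufferedReader lookup) throws IOException {
        Map<String, String> map = new HashMap<String, String>();
        String line;
        while (map.size() < CHUNK_SIZE && (line = lookup.readLine()) != null) {
            int tab = line.indexOf('\t');
            if (tab > 0) map.put(line.substring(0, tab), line.substring(tab + 1));
        }
        return map;
    }

    // Rewrite one data file, replacing any line whose key is in this chunk's map.
    static void applyChunk(Map<String, String> map, File in, File out) throws IOException {
        BufferedReader r = new BufferedReader(new FileReader(in));
        PrintWriter w = new PrintWriter(new BufferedWriter(new FileWriter(out)));
        try {
            String line;
            while ((line = r.readLine()) != null) {
                String replacement = map.get(line);
                w.println(replacement != null ? replacement : line);
            }
        } finally {
            r.close();
            w.close();
        }
    }
}
```

With a 1-million-entry chunk and 125 million lookup records, that's 125 passes over the 20,000 data files - slow, but bounded memory.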
Alternatively, you could process one of the 20,000 files in its entirety, writing the changes to a .tmp file. Then, when you've run it through the whole lookup file, you rename the .tmp to the original name. By looking at timestamps, you could figure out which files had been done and which hadn't. If the job is killed, the .tmp file can be discarded, and you would restart on the untouched source file...
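A rough sketch of that .tmp-and-rename idea. The transform step here is a stand-in (it just uppercases lines) for the real column replacement; the point is the restartability: a stale .tmp from a killed run is simply discarded and the untouched source is processed again:

```java
import java.io.*;

public class RestartableJob {
    // Placeholder for the real per-file rewrite; here it just uppercases lines.
    static void transform(File src, File dst) throws IOException {
        BufferedReader r = new BufferedReader(new FileReader(src));
        PrintWriter w = new PrintWriter(new BufferedWriter(new FileWriter(dst)));
        try {
            String line;
            while ((line = r.readLine()) != null) w.println(line.toUpperCase());
        } finally {
            r.close();
            w.close();
        }
    }

    static void processFile(File src) throws IOException {
        File tmp = new File(src.getPath() + ".tmp");
        // A leftover .tmp means a previous run died mid-file: throw it away.
        if (tmp.exists() && !tmp.delete())
            throw new IOException("Could not remove stale " + tmp);
        transform(src, tmp);
        // Swap the finished .tmp over the original. renameTo over an existing
        // file works on POSIX; on Windows we may need to delete first.
        if (!tmp.renameTo(src)) {
            if (!src.delete() || !tmp.renameTo(src))
                throw new IOException("Could not replace " + src);
        }
    }
}
```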
Sure, this will take a while, but what do you expect when you have 20,000,000,000 records to process against a 3 GB file?
This is just a one-shot fix and not going to run any more. My lookup file has around 125 million records. Is there any way we can split it into chunks of data and process them? I am also wondering whether there is any kind of index option, or RandomAccessFile?
Thanks much again
Balasubramaniam Muthusamy wrote:This is just a one-shot fix and not going to run any more.
Hmmm. Hate to say, but I've heard that before.
My lookup file has around 125 million records. Is there any way we can split it into chunks of data and process them? I am also wondering whether there is any kind of index option, or RandomAccessFile?
Sure, there are plenty of splitter utilities out there; or you could simply write one yourself (possibly better if the "splitting" depends on the data you're working on) and run it before your main update. Perl or awk are also very good for that sort of thing.
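If you did want to roll your own splitter in Java rather than Perl or awk, a minimal line-count-based one might look like this (the ".partN" naming scheme is made up; a data-dependent split would replace the simple line counter):

```java
import java.io.*;

public class Splitter {
    // Split a text file into numbered chunks of linesPerChunk lines each;
    // returns the number of chunk files written.
    static int split(File in, int linesPerChunk) throws IOException {
        BufferedReader r = new BufferedReader(new FileReader(in));
        PrintWriter w = null;
        int lineNo = 0, chunk = 0;
        try {
            String line;
            while ((line = r.readLine()) != null) {
                if (lineNo % linesPerChunk == 0) {
                    if (w != null) w.close();
                    w = new PrintWriter(new BufferedWriter(
                            new FileWriter(in.getPath() + ".part" + chunk++)));
                }
                w.println(line);
                lineNo++;
            }
        } finally {
            r.close();
            if (w != null) w.close();
        }
        return chunk;
    }
}
```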
But like I said before, from the little you've given us to go on, I suspect your problems start long before this. It just sounds 'off'. And unless you fix that you're probably doomed to repeat this exercise.