I need to queue a large number of objects (each object is Serializable) using a Queue in Java 6.x. But, having a limited amount of memory, the queue will run out of memory (OutOfMemoryError) if we queue a large number of objects. My question is: if the object implements Serializable and is queued (i.e. queue.offer(object)), will it automatically be flattened to disk so that it doesn't consume memory? Or, we can have an id (wrapped in an Integer) and a queue declared as Queue&lt;Integer&gt;. We then manually associate an id with each object and manually serialize the object to disk.
All the objects have to be queued before they are dequeued. What is the best way to resolve this issue?
Taggu Gupta wrote: My question is: if the object implements Serializable and is queued (i.e. queue.offer(object)), will it automatically be flattened to disk so that it doesn't consume memory?
No. At least, not unless the documentation for the queue implementation specifically says that will happen. I very much doubt that the queue implementations in the standard API do that, but you could read their documentation to find out. If they don't say they use disk as a backing store, then they don't.
Or, we can have an id (wrapped in an Integer) and a queue declared as Queue&lt;Integer&gt;. We then manually associate an id with each object and manually serialize the object to disk.
Sure, you could do that if you liked. But writing all queue entries to disk just to avoid possible memory problems seems like overkill to me. Why not just write entries to disk if there is actually a memory problem? You could use SoftReference objects and a ReferenceQueue to allow Java's memory management to help you out.
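To make the manual spill-to-disk idea concrete, here's a minimal sketch (the class name DiskSpillQueue and method shapes are my own invention, not any standard API): serialize each entry to a temp file and keep only the small File handle in the in-memory queue.

```java
import java.io.*;
import java.util.ArrayDeque;
import java.util.Queue;

// Sketch: keep only small File handles in memory and spill each
// Serializable payload to disk. Names here are hypothetical.
public class DiskSpillQueue {
    private final Queue<File> handles = new ArrayDeque<File>();

    // Serialize the entry to a temp file and remember only the handle.
    public void offer(Serializable entry) throws IOException {
        File f = File.createTempFile("queue-entry", ".ser");
        f.deleteOnExit();
        ObjectOutputStream out = new ObjectOutputStream(new FileOutputStream(f));
        try {
            out.writeObject(entry);
        } finally {
            out.close();
        }
        handles.add(f);
    }

    // Read the next entry back from disk and delete its backing file.
    public Object poll() throws IOException, ClassNotFoundException {
        File f = handles.poll();
        if (f == null) return null;
        ObjectInputStream in = new ObjectInputStream(new FileInputStream(f));
        try {
            return in.readObject();
        } finally {
            in.close();
            f.delete();
        }
    }

    public static void main(String[] args) throws Exception {
        DiskSpillQueue q = new DiskSpillQueue();
        q.offer("message-1");
        q.offer("message-2");
        if (!"message-1".equals(q.poll()) || !"message-2".equals(q.poll())) {
            throw new AssertionError("roundtrip failed");
        }
        System.out.println("roundtrip OK");
    }
}
```

Note that this trades memory for a lot of disk I/O on every entry, which is why it feels like overkill unless memory pressure is real.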
On the other hand... what is the point of putting millions of entries into a queue? I assume you have something consuming those entries, and that your producers are working faster than your consumers. Then the best strategy is to tell the producers to slow down. Using a bounded queue of only a few hundred entries should be just fine.
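A sketch of that bounded-queue back-pressure, assuming the producer can afford to block: ArrayBlockingQueue.put() blocks whenever the queue is full, so the producer is automatically slowed down to the consumer's pace. The capacity of 4 is arbitrary for the demo.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;

// Sketch: a small bounded queue. put() blocks the producer when the
// queue is full, so memory stays bounded no matter how many entries
// flow through in total.
public class BoundedQueueDemo {
    public static void main(String[] args) throws InterruptedException {
        final BlockingQueue<Integer> queue = new ArrayBlockingQueue<Integer>(4);
        final int total = 100;

        Thread producer = new Thread(new Runnable() {
            public void run() {
                try {
                    for (int i = 0; i < total; i++) {
                        queue.put(i); // blocks when 4 entries are pending
                    }
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            }
        });
        producer.start();

        int consumed = 0;
        while (consumed < total) {
            queue.take(); // consumer drains at its own pace
            consumed++;
        }
        producer.join();
        System.out.println("consumed " + consumed + " entries");
    }
}
```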
Or is there something about your producers that can't be told to slow down? If so, then perhaps you should put more consumers to work. Anyway, useful advice can't really be given in the absence of any information about the problem.
And no, your problem isn't "I need to put millions of entries in a queue". The problem is whatever business decision led you to think that.
Thanks! The business logic is:
There is an archive file which may contain millions of email message files (possibly with attachments). I want to unzip this archive and process the email message files. As part of processing each message file, its attachments have to be extracted. Also, the metadata of the top-level message file should be inserted into the database first, and then its children. Because of this, I'm queueing each message file followed by its children (so that on dequeue I get the parent first and then its children). After all the message files are queued up, I start dequeueing to actually process each file (parent first, then its children, and so on). But the problem is that the queue could easily run out of memory if there is a large number of message files in a directory.
Also, after extracting the archive file, I call listFiles() on the directory to get the list of files. It is possible that there could again be a large number of files in a directory, which could also cause a memory overrun.
What is the best option I can use to solve these memory overrun issues, for both listFiles() and the queue?
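As a sketch of one way to sidestep the listFiles() problem entirely (assuming a standard zip archive): stream the archive entry by entry with ZipInputStream instead of extracting everything and then listing the directory, so the full file list is never materialized in memory. The tiny archive built here is just to make the sketch self-contained.

```java
import java.io.*;
import java.util.zip.*;

// Sketch: stream archive entries one at a time instead of extracting
// them all and calling listFiles() on the result.
public class StreamArchive {
    public static void main(String[] args) throws Exception {
        // Build a tiny archive so the sketch is self-contained; in the
        // real scenario this would be the existing million-entry archive.
        File archive = File.createTempFile("mail", ".zip");
        archive.deleteOnExit();
        ZipOutputStream out = new ZipOutputStream(new FileOutputStream(archive));
        for (String name : new String[] {"msg-1.eml", "msg-2.eml"}) {
            out.putNextEntry(new ZipEntry(name));
            out.write("body".getBytes("UTF-8"));
            out.closeEntry();
        }
        out.close();

        // Stream the entries: no extracted directory, no listFiles(),
        // no full file list held in memory at once.
        int count = 0;
        ZipInputStream in = new ZipInputStream(new FileInputStream(archive));
        try {
            ZipEntry entry;
            while ((entry = in.getNextEntry()) != null) {
                count++; // process one message file here
                in.closeEntry();
            }
        } finally {
            in.close();
        }
        System.out.println("streamed " + count + " entries");
    }
}
```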
I missed the part which explained why you couldn't start the consumers before all of the messages had been queued up. I know you said "Because of this", but it wasn't obvious to me why that followed.
But let me put it this way. Your goal is to have a bounded queue. Arrange your production end so that you can do that. Don't try to justify why you have to have an unbounded queue.
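To make that concrete for the parent-before-children requirement (the message names below are made up for the demo): a FIFO queue hands entries to the consumer in exactly the order the producer enqueued them, so enqueueing each parent immediately followed by its children preserves the required order even when the queue is small and bounded and the consumer runs concurrently. Nothing forces you to queue everything first.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;

// Sketch: parent-first ordering survives a tiny bounded queue because
// FIFO order is preserved end to end.
public class ParentFirstDemo {
    public static void main(String[] args) throws InterruptedException {
        final BlockingQueue<String> queue = new ArrayBlockingQueue<String>(2);
        final String[] produced = {
            "parent-1", "child-1a", "child-1b",
            "parent-2", "child-2a"
        };

        Thread producer = new Thread(new Runnable() {
            public void run() {
                try {
                    for (String name : produced) {
                        queue.put(name); // blocks when 2 entries are pending
                    }
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            }
        });
        producer.start();

        List<String> consumed = new ArrayList<String>();
        while (consumed.size() < produced.length) {
            consumed.add(queue.take());
        }
        producer.join();

        // Each parent still precedes its children on the consumer side.
        for (int i = 0; i < produced.length; i++) {
            if (!produced[i].equals(consumed.get(i))) {
                throw new AssertionError("order changed at index " + i);
            }
        }
        System.out.println("order preserved: " + consumed);
    }
}
```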