My multithreaded program is giving Java out of memory error when run on large data

 
Ranch Hand
Posts: 2966
13

Using a thread pool, I created a multithreaded application that reads millions of files sequentially; different threads then process these files and write to different output files. The multiple threads are there to make the overall task faster. There is a parameter for the maximum number of threads running at any time, and I have set it to some number. How do I decide what this number should be? Should it be 25, 50, 100, or 1000?

When I tested this program on fewer files it worked fine, but for millions of files it gives an out of memory error. I think that each thread writing to an output file means numerous files would be open at one time, and that might be the reason for the out of memory error. Please advise on how to resolve this issue.


thanks
 
Ranch Hand
Posts: 145
8
Hey Monica,

When you spawn a new thread, at least one object gets instantiated: an instance of your implementation of Runnable. Add to this all fields of this instance, all fields of those fields, and so on.

Every object takes some memory, and while it is hard to tell exactly how much, you can estimate it.

If your object has two fields of type java.lang.Long, an int[] array with 10 elements, and a String initialized with "Hello World", the data alone will take about 2 * 8 + 10 * 4 + 11 * 2 = 78 bytes.
This is because two Longs hold 2 * 8 bytes of data, an int[] array with 10 elements holds 40 bytes, and the "Hello World" string holds 11 * 2 = 22 bytes at two bytes per character. The real footprint is somewhat larger, because every object and array also carries a header and each field holds a reference to its object.
You can find a summary of the sizes of primitive data types here. Non-primitive objects are ultimately compositions of primitive values.
It is not always this simple: some objects are referenced from more than one object, so the structure is a graph rather than a tree, but a rough estimate of object size is still possible.

Now, back to threads. If you spawn 100 threads, at least 100 objects will be instantiated and will take some RAM. You should be able to roughly estimate the size of the object(s) created by each thread, and from that get an idea of how much space 100 of them will take.
This memory will not be available for new objects at least until these threads terminate (that is, return from the run() method). It may stay unavailable for longer, because you never know when the garbage collector will decide to collect the objects that are no longer used, but never for less than the duration of run().
So if your threads are long running (they take seconds or minutes to complete) and you spawn new threads at a rate of, say, 10 threads a second, then you are in trouble. New objects will be created faster than older threads complete their jobs, and before you know it you'll hit OutOfMemoryError.
But if your threads only take milliseconds to complete and you create new threads rarely, like one thread every second, then you will probably be fine.

So, for you the factors to consider are

1) Size of objects created with every new thread,
2) New threads spawning rate,
3) Time for thread to complete,
4) Memory available to JVM.

When you do all the math, you'll get the answer to your question.
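The four factors above can be turned into a back-of-the-envelope calculation. The following is only a minimal sketch: the class name, the helper methods, and all the numbers in main are assumptions made up for illustration, not values from this thread. It treats the number of tasks in flight as spawning rate times task duration (a steady-state average) and ignores thread stacks and delayed garbage collection, so the real requirement will be higher.

```java
/** Rough heap estimate for concurrently running tasks (hypothetical helper). */
class ThreadMemoryEstimate {

    // Factors 2 and 3: average tasks in flight = spawning rate * time per task.
    static double tasksInFlight(double tasksPerSecond, double secondsPerTask) {
        return tasksPerSecond * secondsPerTask;
    }

    // Factor 1: heap held by the tasks in flight, given an estimated size per task.
    static double bytesInFlight(double tasksPerSecond, double secondsPerTask, long bytesPerTask) {
        return tasksInFlight(tasksPerSecond, secondsPerTask) * bytesPerTask;
    }

    public static void main(String[] args) {
        long bytesPerTask = 2L * 1024 * 1024; // assumption: each task holds about 2 MB
        double rate = 10.0;                   // assumption: 10 tasks started per second
        double duration = 30.0;               // assumption: each task runs for 30 seconds

        // Factor 4: the maximum heap the JVM will try to use.
        long maxHeap = Runtime.getRuntime().maxMemory();

        double needed = bytesInFlight(rate, duration, bytesPerTask);
        System.out.printf("tasks in flight: %.0f, needed: %.0f MB, max heap: %d MB%n",
                tasksInFlight(rate, duration), needed / (1024 * 1024), maxHeap / (1024 * 1024));
        System.out.println(needed > maxHeap ? "likely to run out of memory" : "should fit in the heap");
    }
}
```

If the estimate comes out close to or above the maximum heap, the options are to lower the rate, shorten the tasks, shrink what each task holds, or give the JVM more memory.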




 
Monica Shiralkar
Ranch Hand
Posts: 2966
13
Thanks.

So does that mean I should call the sleep method in the thread, as below? Is that advisable?

Should it be like below:



Instead of :



 
Marshal
Posts: 5982
412
With regard to deciding how big to make your thread pool, it depends on the hardware you are running your application on. Suppose you have a quad-core machine: then you can only run 4 threads in parallel. If you're writing files to a physical mechanical disk, how many write heads does it have? You can't write 100 files in parallel if your disk only has 2 write heads.

Often a thread pool that is too large will actually degrade performance, because you are introducing contention on your physical resources. Martin Thompson calls designing with the hardware in mind Mechanical Sympathy.
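As a starting point, the pool size can be derived from the hardware instead of being guessed. This is a minimal sketch, not a tuning rule: the class name, the helper boundedPoolSize, and the cap of 10 are assumptions for illustration. Runtime.availableProcessors() reports the number of processors available to the JVM, and the cap stands in for whatever the disk can reasonably handle.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

/** Derives a pool size from the core count (hypothetical helper). */
class PoolSizing {

    // Use one thread per core, but never fewer than 1 and never more than the cap.
    static int boundedPoolSize(int cores, int cap) {
        return Math.max(1, Math.min(cores, cap));
    }

    public static void main(String[] args) throws InterruptedException {
        int cores = Runtime.getRuntime().availableProcessors();
        int size = boundedPoolSize(cores, 10); // assumption: cap of 10 for disk-bound work

        ExecutorService pool = Executors.newFixedThreadPool(size);
        System.out.println("cores: " + cores + ", pool size: " + size);

        pool.shutdown();
        pool.awaitTermination(10, TimeUnit.SECONDS);
    }
}
```

The number this produces is only a first guess; measuring throughput at a few different sizes on the real workload is what settles it.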
 
Rancher
Posts: 43081
77

So does that mean I should call the sleep method in the thread, as below? Is that advisable?


No, it means that you should do the math as Mike suggested. If the combined memory needs of all threads are a large portion of the available JVM memory, then you have a problem.

Putting a thread to sleep is the opposite of what you intend to do, isn't it? It gets less done, whereas you want to get more done. As I already pointed out in the other topic where you asked about this, the best number of output threads is likely < 10 due to the number of write heads on the disk. Whether it makes sense to have more read threads than write threads depends on the specifics of your situation, about which we know nothing. It would mean more synchronization overhead, that's for sure.
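If the goal is to keep pending work from piling up in memory, one common alternative to sleeping is to give the pool a bounded queue and let the submitting thread slow down when the queue is full. The sketch below is an illustration under assumptions, not the code discussed in this thread: the class name, the runAll helper, and the sizes are made up. It uses ThreadPoolExecutor with an ArrayBlockingQueue and CallerRunsPolicy, so when the queue is full the thread that submits the task runs it itself.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

/** Bounded task queue with back-pressure (hypothetical example). */
class BoundedSubmission {

    // Runs taskCount trivial tasks and returns how many of them completed.
    static int runAll(int poolSize, int queueCapacity, int taskCount) {
        ThreadPoolExecutor pool = new ThreadPoolExecutor(
                poolSize, poolSize, 0L, TimeUnit.MILLISECONDS,
                new ArrayBlockingQueue<>(queueCapacity),      // at most queueCapacity tasks wait
                new ThreadPoolExecutor.CallerRunsPolicy());   // queue full: submitter runs the task

        AtomicInteger done = new AtomicInteger();
        for (int i = 0; i < taskCount; i++) {
            pool.execute(done::incrementAndGet); // stand-in for processing one file
        }

        pool.shutdown();
        try {
            pool.awaitTermination(1, TimeUnit.MINUTES);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // preserve the interrupt for the caller
        }
        return done.get();
    }

    public static void main(String[] args) {
        // assumption: 4 worker threads, at most 8 queued tasks, 1000 tasks in total
        System.out.println("completed: " + runAll(4, 8, 1000));
    }
}
```

With this arrangement no more than poolSize tasks run in pool threads and no more than queueCapacity tasks wait at any time, regardless of how many files there are.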
 
Monica Shiralkar
Ranch Hand
Posts: 2966
13

Thanks

Putting a thread to sleep is the opposite of what you intend to do, isn't it? It gets less done, whereas you want to get more done.





So if I add Thread.sleep, it would not run as fast as desired, but would there be a lower probability of getting an out of memory problem?



As I already pointed out in the other topic where you asked about this, the best number of output threads is likely < 10





I will take care of this.

No, it means that you should do the math as Mike suggested. If the combined memory needs of all threads is a large portion of the available JVM memory, then you have a problem.





Actually, if the number of files received in the folder changes, the math will change again.
 
Marshal
Posts: 80771
488

Monica Shiralkar wrote: . . .
So if I add Thread.sleep, it would not run as fast as desired, but would there be a lower probability of getting an out of memory problem?
. . .

Don't know, but it will probably not prevent your problem.
 
Monica Shiralkar
Ranch Hand
Posts: 2966
13

and you create new threads rarely, like one thread every second, then you will probably be fine.



Does this depend on the parameter for the maximum number of threads running at a time that we set when creating a thread pool? So if that parameter is low, will fewer threads be created in a given period of time?
 
Monica Shiralkar
Ranch Hand
Posts: 2966
13
I made some changes to the program and set the maximum number of threads parameter to 5:


But the error message received was:

Exception in thread "1034" Exception in thread "1032"



It is surprising that thread number 1034 got started so soon despite specifying

ExecutorService tpes = Executors.newFixedThreadPool(threadPoolSize);


where I have set the value of threadPoolSize to 5.

There is one more parameter: numberWorkers


where I loop from 1 to the number of files in the folder,
and I have set this parameter equal to the number of files in the folder.



If I set this numWorkers parameter to a lower value, the entire processing would not be done. Please advise on what can be done about the second parameter.
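The number of tasks and the number of threads are separate things, and a fixed pool keeps them separate. The sketch below is a hypothetical illustration, since the original code is not shown here: the class name and the distinctWorkerThreads helper are assumptions. It submits one task per file to Executors.newFixedThreadPool and counts how many distinct threads actually ran them. Threads created by this factory are named pool-N-thread-M by default, so a thread named "1034" hints that threads may be created or renamed elsewhere in the program, though without the code that is only a guess.

```java
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

/** Shows that many tasks share a small, fixed set of threads (hypothetical example). */
class FixedPoolDemo {

    // Submits taskCount tasks and returns the number of distinct threads that ran them.
    static int distinctWorkerThreads(int threadPoolSize, int taskCount) {
        ExecutorService pool = Executors.newFixedThreadPool(threadPoolSize);
        Set<String> names = ConcurrentHashMap.newKeySet();

        for (int i = 0; i < taskCount; i++) {
            // One task per file; the task only records which thread executed it.
            pool.execute(() -> names.add(Thread.currentThread().getName()));
        }

        pool.shutdown();
        try {
            pool.awaitTermination(1, TimeUnit.MINUTES);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // preserve the interrupt for the caller
        }
        return names.size();
    }

    public static void main(String[] args) {
        // assumption: 5 threads, 10,000 tasks standing in for files
        System.out.println("distinct threads used: " + distinctWorkerThreads(5, 10_000));
    }
}
```

The count never exceeds the pool size, so the loop can run over every file without the number of threads growing. Note that newFixedThreadPool queues waiting tasks without a limit, so with millions of files the queued task objects themselves still occupy heap.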
 
Monica Shiralkar
Ranch Hand
Posts: 2966
13
Thanks all. The program worked best when the number of threads was 15. Beyond that, performance started to decrease.
 