
Running several Python scripts in concurrent threads taking too long

 
Greenhorn
Posts: 6
Good day.

I posted in another thread about a problem I was having, where the threads I was running never finished - that problem is solved. Now, though, I have another problem, which I described in the last post of that thread, but I'll talk about it here.

The threads are taking too long to finish. When I schedule several processes via Linux's "cron", they're all done after one and a half to two minutes at most. When they run via my Java scheduler, however, they take about five or six minutes to finish. What could be the problem here? How do I solve it?
 
Bartender
Posts: 4179
I took a look at your code. My feeling is that there are issues with (1) thread scheduling and (2) the logging - especially logging to files.

As for the logging to a file: a typical hard disk moves its heads together, which means that it can effectively only write to one place at a time. When you write to the disk from multiple threads, the head has to move to file_location_1 and write a chunk, then seek to file_location_2 and write a chunk, then seek to file_location_3, and so on. These file locations can be in very different parts of the disk, so the head ends up spending more time seeking from one spot to another than writing. When you condense the writes into a single thread, the disk can write larger chunks to the same area before being diverted to another location, and the write efficiency increases. Try feeding all your logging lines into a single thread: maybe have a LogLine class that pairs a file with the String to write to it. The Gobblers append LogLines to a synchronized queue, and a writer thread pops lines off the queue and writes each string to the proper file.
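The single-writer idea above could be sketched roughly as follows. This is only an illustration, not your actual code: LogLine follows the suggestion above, while LogWriter, submit, shutdown, and the STOP sentinel are names made up for this sketch, and a LinkedBlockingQueue stands in for the "synchronized queue".

```java
import java.io.BufferedWriter;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

/** One line of output plus the file it belongs in (hypothetical class). */
final class LogLine {
    final Path file;
    final String text;

    LogLine(Path file, String text) {
        this.file = file;
        this.text = text;
    }
}

/** Drains a shared queue on one thread, so only that thread writes to disk. */
final class LogWriter implements Runnable {
    // Sentinel (hypothetical) that tells the writer to close its files and exit.
    private static final LogLine STOP = new LogLine(null, null);

    private final BlockingQueue<LogLine> queue = new LinkedBlockingQueue<>();

    /** Called by the gobbler threads; only enqueues, never touches the disk. */
    void submit(Path file, String text) {
        queue.add(new LogLine(file, text));
    }

    /** Asks the writer to finish after everything already queued is written. */
    void shutdown() {
        queue.add(STOP);
    }

    @Override
    public void run() {
        // One open writer per target file, reused across lines.
        Map<Path, BufferedWriter> writers = new HashMap<>();
        try {
            while (true) {
                LogLine line = queue.take();
                if (line == STOP) {
                    break;
                }
                BufferedWriter w = writers.get(line.file);
                if (w == null) {
                    w = Files.newBufferedWriter(line.file, StandardCharsets.UTF_8,
                            StandardOpenOption.CREATE, StandardOpenOption.APPEND);
                    writers.put(line.file, w);
                }
                w.write(line.text);
                w.newLine();
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        } catch (IOException e) {
            e.printStackTrace();
        } finally {
            for (BufferedWriter w : writers.values()) {
                try {
                    w.close();
                } catch (IOException ignored) {
                    // Nothing sensible to do if close fails during shutdown.
                }
            }
        }
    }
}
```

Each Gobbler would call submit(...) instead of writing to its file directly, and the scheduler would start one LogWriter thread at startup and call shutdown() once all tasks are done.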

The other thing to worry about is thread scheduling. It seems like you have lots of tasks, and each task involves (1) an external process (the Python script), (2) a thread which starts and waits for the external process, and (3) a StreamGobbler which consumes the process output and logs the data. Your system can only execute a finite number of threads in parallel, so if you have that many tasks, a lot of them are going to be put into a waiting state - waiting for processor time and access to the file system. Since all your tasks compete for the same limited resources, you could be wasting time on processor time-sharing and/or starving some of the tasks, so they don't get the 'fair' share of the processor they need to execute. Pushing the file IO into a single thread may help here as well, since then your StreamGobblers won't be competing with each other for file IO.

But you should also tune the number of threads running at any given time to reflect the resources your system actually has. Use a thread pool of maybe 1x, 2x, or 4x threads (where x is the number of threads your processors can execute at once), and see if reducing the number of threads increases performance (then fine-tune the multiple to get the best performance). This might be something already built in to cron4j; I don't know, as I don't use it.
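A rough sketch of the bounded pool described above, assuming each task boils down to "run one external command and wait for it". BoundedScriptRunner, poolSize, and runAll are names invented for this example, and inheritIO() is used only to keep the sketch short (it stands in for the StreamGobblers so the child's output pipe cannot fill up and block).

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

final class BoundedScriptRunner {

    /** Pool size = multiplier x the number of processors the JVM reports (at least 1). */
    static int poolSize(int multiplier) {
        return Math.max(1, multiplier * Runtime.getRuntime().availableProcessors());
    }

    /**
     * Runs every command as an external process, with at most poolSize
     * processes alive at once. Returns the exit codes in submission order.
     */
    static List<Integer> runAll(List<List<String>> commands, int poolSize) throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(poolSize);
        try {
            List<Future<Integer>> futures = new ArrayList<>();
            for (List<String> command : commands) {
                futures.add(pool.submit(() -> {
                    // inheritIO: the child shares this JVM's stdout/stderr.
                    Process p = new ProcessBuilder(command).inheritIO().start();
                    return p.waitFor();
                }));
            }
            List<Integer> exitCodes = new ArrayList<>();
            for (Future<Integer> f : futures) {
                exitCodes.add(f.get()); // blocks until that process has finished
            }
            return exitCodes;
        } finally {
            pool.shutdown();
        }
    }
}
```

Timing runAll(commands, BoundedScriptRunner.poolSize(1)) against poolSize(2) and poolSize(4) on the real scripts is one way to find the multiple that works best on a given machine.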

Finally, you should not really be guessing about what is happening. Get a profiler and attach it to your system. See which threads are in which states, how much CPU is being consumed, and so on. That will help you really nail down the problem areas, whereas the suggestions above are just generalized strategies.
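If attaching a profiler is not possible right away, a crude first look at thread states can be had from inside the JVM itself. This is a minimal sketch and no substitute for a real profiler; ThreadStateSnapshot and countByState are names made up for the example.

```java
import java.util.EnumMap;
import java.util.Map;

final class ThreadStateSnapshot {

    /** Counts the live threads in this JVM by their current Thread.State. */
    static Map<Thread.State, Integer> countByState() {
        Map<Thread.State, Integer> counts = new EnumMap<>(Thread.State.class);
        // getAllStackTraces() returns one entry per live thread.
        for (Thread t : Thread.getAllStackTraces().keySet()) {
            counts.merge(t.getState(), 1, Integer::sum);
        }
        return counts;
    }
}
```

Printing ThreadStateSnapshot.countByState() every few seconds while the scheduler runs shows roughly how many threads are RUNNABLE versus BLOCKED or WAITING. A large and growing BLOCKED or WAITING count would be consistent with the contention described above.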
 
Aroldo Bettega
Greenhorn
Posts: 6
Oh my god. Even before I got a chance to actually read your post, everything kinda exploded. We tried using the system (which is working - slowly, but working) on our test server to see if it was a local problem. Every freaking script decided to not work for no reason. I naturally assumed it was a library problem - it was not; I packaged everything into my runnable jar and, just to be sure, set the libraries up on our server. Still, no dice. Maybe it was permissions, I thought. Again, it was not. I'm seriously stumped as to what it could be. I am aware this is not a thread-specific problem, but damn, everything in this thing seems to blow up in my face.

Anywho, back to the main issue. I tried using Quartz Scheduler to see if it was a problem with Cron4J or something. Alas, it was not; everything ran just as slowly (if not more so) with Quartz. I'll read your post more thoroughly later on and post my thoughts on it. Right now, I'm trying to solve this incompatibility issue.

Thanks in advance!
 