
Loading files in parallel using Threads

 
Niall Loughnane
Ranch Hand
Posts: 209
Hi,

Any advice on this would be great.

What I am trying to do is read multiple (100) files into a Map in parallel rather than reading one file at a time.

I have read that there may be restrictions on this due to I/O processing, but is there a way to use an ExecutorService to run multiple threads in parallel and load the contents of the files into a Map of DTOs?

Thanks
 
Paul Clapham
Sheriff
Posts: 21892
Sure, that sounds like a pretty typical use for an ExecutorService.
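
For anyone looking for a starting point, here is a minimal sketch of that approach. The names `ParallelFileLoader`, `FileDto` and `loadAll` are made up for illustration, and it assumes each file is UTF-8 text small enough to read into memory in one go. Each file becomes a `Callable`, `invokeAll` runs them on a fixed-size pool, and the calling thread fills the Map from the returned futures, so the Map itself never needs to be thread-safe.

```java
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

class ParallelFileLoader {

    // Hypothetical DTO: holds one file's path and its text content.
    static final class FileDto {
        final String path;
        final String content;

        FileDto(String path, String content) {
            this.path = path;
            this.content = content;
        }
    }

    // Reads every file on a pool of the given size and returns a Map keyed by path.
    static Map<String, FileDto> loadAll(List<Path> files, int threads)
            throws InterruptedException, ExecutionException {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<Callable<FileDto>> tasks = new ArrayList<>();
            for (Path file : files) {
                // Assumption: files are UTF-8 text and fit comfortably in memory.
                tasks.add(() -> new FileDto(file.toString(),
                        new String(Files.readAllBytes(file), StandardCharsets.UTF_8)));
            }
            Map<String, FileDto> result = new HashMap<>();
            // invokeAll blocks until every task has finished (or failed).
            for (Future<FileDto> future : pool.invokeAll(tasks)) {
                FileDto dto = future.get(); // a failed read surfaces here as ExecutionException
                result.put(dto.path, dto);
            }
            return result;
        } finally {
            pool.shutdown();
        }
    }
}
```

Calling `ParallelFileLoader.loadAll(files, 4)` would read the files on four threads. Whether that is any faster than one thread depends on the storage underneath.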
 
Steve Luke
Bartender
Posts: 4181
The limit usually comes from the throughput and the read-heads of the disk(s) involved. If the files are all stored on a single disk, and that disk has only a single read-head, then reading with multiple threads will likely take longer than reading with just one, because the read-head spends more time seeking between the files that the different threads are asking for. If the files are on multiple hard disks, or on a RAID with multiple disks and a controller that supports parallel reads, then you can take advantage of multiple threads. So depending on your scenario, you might consider making the size of the thread pool used by the Executor configurable, so you can tune the system to the available hardware.
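
One way to make the pool size configurable is a system property, roughly like the sketch below. The property name `loader.threads` and the class `LoaderPoolConfig` are invented for this example, and defaulting to a single thread is just an assumption that suits the single-disk case.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

class LoaderPoolConfig {

    // Hypothetical property name, set with e.g. -Dloader.threads=4 on the command line.
    static final String THREADS_PROPERTY = "loader.threads";

    // Returns the configured pool size, falling back to 1 when the property
    // is missing, is not a valid integer, or is less than 1.
    static int configuredThreads() {
        int configured = Integer.getInteger(THREADS_PROPERTY, 1);
        return Math.max(1, configured);
    }

    // Builds a fixed-size pool using the configured thread count.
    static ExecutorService newLoaderPool() {
        return Executors.newFixedThreadPool(configuredThreads());
    }
}
```

With this in place the same build can run with one thread on a laptop disk and with more threads on a RAID, without recompiling.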
 
Niall Loughnane
Ranch Hand
Posts: 209
Hi,

Thanks for your answers.

What I'm trying to do is load the files faster on a single PC with a single CPU.

Do you think this is possible?

Thanks
 
Paul Clapham
Sheriff
Posts: 21892
The number of CPUs doesn't really matter, because from the point of view of a CPU, file access consists almost entirely of waiting for the disk to spin to the right position. So depending on what you do with the data from the file, it's quite likely that even if you could process 100 files from 100 different disks simultaneously you still wouldn't be CPU-bound.

However, what Steve said still applies: reading multiple files in multiple threads tends to be less useful than you might think. You really won't know whether using multiple threads speeds things up until you try it.
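
A rough way to try it is to time the same set of files with different pool sizes. The sketch below is only an illustration: `LoadTimer`, `totalBytes` and `timeMillis` are made-up names, the directory comes from the first command-line argument, and the thread counts 1, 2, 4 and 8 are arbitrary. Keep in mind that after the first pass the operating system may serve the files from its cache, so later runs can look faster for reasons that have nothing to do with threads.

```java
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.stream.Collectors;
import java.util.stream.Stream;

class LoadTimer {

    // Reads every file fully on a pool of the given size and returns the total byte count.
    static long totalBytes(List<Path> files, int threads) throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<Callable<Long>> tasks = new ArrayList<>();
            for (Path file : files) {
                tasks.add(() -> (long) Files.readAllBytes(file).length);
            }
            long total = 0;
            for (Future<Long> size : pool.invokeAll(tasks)) {
                total += size.get();
            }
            return total;
        } finally {
            pool.shutdown();
        }
    }

    // Wall-clock time in milliseconds for one full pass over the files.
    static long timeMillis(List<Path> files, int threads) throws Exception {
        long start = System.nanoTime();
        totalBytes(files, threads);
        return TimeUnit.NANOSECONDS.toMillis(System.nanoTime() - start);
    }

    public static void main(String[] args) throws Exception {
        List<Path> files;
        try (Stream<Path> entries = Files.list(Paths.get(args[0]))) {
            files = entries.filter(Files::isRegularFile).collect(Collectors.toList());
        }
        // Arbitrary thread counts; adjust to whatever the hardware suggests.
        for (int threads : new int[] {1, 2, 4, 8}) {
            System.out.println(threads + " thread(s): " + timeMillis(files, threads) + " ms");
        }
    }
}
```

Running each configuration several times, and on the real data set rather than a handful of small test files, should give a more trustworthy picture than a single pass.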
 
William Brogden
Author and all-around good cowpoke
Rancher
Posts: 13078
Also note that operating system buffering and disk drive electronics buffering sit between your program and the physical disk. Therefore, it is time to experiment! Why not write up your results and let us know what happened, so future readers can learn?

Bill
 