
Writing files to HDD becomes slower as the number of threads increases

 
Tapas Chand
Ranch Hand
Posts: 614
Hi All
I am creating a multithreaded environment using ExecutorService. All my threads do the same thing.
They get the data from the DB, prepare the PDF using iText, and write the PDF to a location on the D drive.
But I noticed something odd. As I increase the number of threads, my end-to-end process becomes slower.
For 1 thread - 4000 PDFs generated in 1 hour
For 2 threads - 3500 PDFs generated in 1 hour
For 3 threads - 3200 PDFs generated in 1 hour
For 4 threads - 3000 PDFs generated in 1 hour

Using a logger, it became clear that getting the data from the DB is very fast; the bottleneck is the PDF writing operation.

Somewhere I read that on Windows, writing multiple files to the same directory simultaneously is slower than writing them sequentially.
If that is true, what other approach can I implement to get higher performance?
Thank you.

Environment details
--------------------------
OS - Windows 7, 32-bit
RAM - 3 GB
Processor - Core i3
JDK - 1.6
DB - PostgreSQL 9.3
Size of PDF - varies between 500KB and 2 MB
 
Henry Wong
author
Sheriff
Posts: 23275
Tapas Chand wrote:
Somewhere I read that on Windows, writing multiple files to the same directory simultaneously is slower than writing them sequentially.


It isn't specific to Windows. It is true for most disk drives and on most OSes. Depending on how many heads the disk has, writing more files at once than that number causes the drive head to constantly switch between files, instead of staying sequential and servicing only one file at a time.

Henry
 
Chris Hurst
Ranch Hand
Posts: 443
I'd be tempted to look at how you're doing your file writes, e.g. whether you use buffered output streams. If the threads are mostly writing to memory and flushing only occasionally, you would expect better scaling with the number of threads.
 
Mahender Parkipandla
Greenhorn
Posts: 23
I would like to know from the OP what logic is used to write to disk. We have an application that writes to various files in the same disk location using FileOutputStream. It was changed to use BufferedOutputStream to improve the performance of disk operations.
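
For reference, a minimal sketch of the FileOutputStream to BufferedOutputStream change described above. It is not the OP's code: the class name, the 64 KB buffer size, and the 512-byte chunking (standing in for the many small writes a PDF library may issue) are assumptions for illustration. It uses try/finally rather than try-with-resources so that it also compiles on JDK 1.6.

```java
import java.io.BufferedOutputStream;
import java.io.File;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.OutputStream;

public class BufferedWriteSketch {

    // Hypothetical buffer size; the right value should be found by measuring.
    static final int BUFFER_SIZE = 64 * 1024;

    /**
     * Writes the payload in small chunks through a BufferedOutputStream, so
     * most write() calls only copy into memory and the disk is touched when
     * the buffer fills up or the stream is closed.
     */
    public static void writeBuffered(File target, byte[] payload) throws IOException {
        OutputStream out = new BufferedOutputStream(new FileOutputStream(target), BUFFER_SIZE);
        try {
            for (int offset = 0; offset < payload.length; offset += 512) {
                int length = Math.min(512, payload.length - offset);
                out.write(payload, offset, length);
            }
        } finally {
            out.close(); // close() flushes whatever is still in the buffer
        }
    }

    public static void main(String[] args) throws IOException {
        File target = File.createTempFile("buffered-sketch", ".bin");
        writeBuffered(target, new byte[1024 * 1024]); // 1 MB of dummy data
        System.out.println(target.length() + " bytes written to " + target);
        target.delete();
    }
}
```

Whether this helps in the OP's case depends on how the PDF library writes to the stream it is given, so it is worth timing both variants.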
 
William Brogden
Author and all-around good cowpoke
Rancher
Posts: 13078
Instead of multiple Threads doing the same thing I would try division of labor.

One Thread performing the DB read and PDF document creation (the compute-intensive operation) - placing the completed document byte[] in a queue.

A second Thread taking the finished byte[] and writing it to disk - it spends most of its time waiting for the OS to complete a write.

Bill
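
The division of labor described above can be sketched with a BlockingQueue inside a single JVM. This is only an illustration under assumptions: the class name, the Job holder, the queue capacity of 16, and renderDocument (a stand-in for the DB read plus PDF creation that just returns dummy bytes) are all hypothetical. The producer runs on the calling thread here to keep the sketch short.

```java
import java.io.File;
import java.io.FileOutputStream;
import java.io.IOException;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;

public class PdfPipelineSketch {

    /** A finished document waiting to be written (hypothetical holder type). */
    static final class Job {
        final String fileName;
        final byte[] content;

        Job(String fileName, byte[] content) {
            this.fileName = fileName;
            this.content = content;
        }
    }

    // Sentinel that tells the writer thread there is no more work.
    private static final Job POISON = new Job("", new byte[0]);

    /** Stand-in for the DB read and PDF rendering; returns dummy bytes. */
    static byte[] renderDocument(int id) {
        return ("document " + id).getBytes();
    }

    /** Produces documentCount documents and returns how many the writer stored. */
    public static int process(final File outputDir, int documentCount) throws InterruptedException {
        // Bounded queue: the producer blocks when the writer falls behind,
        // which caps the number of byte[] documents held in memory.
        final BlockingQueue<Job> queue = new ArrayBlockingQueue<Job>(16);
        final int[] written = {0};

        Thread writer = new Thread(new Runnable() {
            public void run() {
                try {
                    for (Job job = queue.take(); job != POISON; job = queue.take()) {
                        try {
                            FileOutputStream out = new FileOutputStream(new File(outputDir, job.fileName));
                            try {
                                out.write(job.content);
                            } finally {
                                out.close();
                            }
                            written[0]++;
                        } catch (IOException e) {
                            e.printStackTrace(); // report and keep draining the queue
                        }
                    }
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            }
        });
        writer.start();

        // Producer side: create documents and hand them to the single writer.
        for (int id = 0; id < documentCount; id++) {
            queue.put(new Job("doc-" + id + ".pdf", renderDocument(id)));
        }
        queue.put(POISON);
        writer.join(); // join() makes the writer's count visible to this thread
        return written[0];
    }

    public static void main(String[] args) throws InterruptedException {
        File dir = new File(System.getProperty("java.io.tmpdir"), "pdf-pipeline-sketch");
        dir.mkdirs();
        System.out.println(process(dir, 10) + " files written to " + dir);
    }
}
```

Because only one thread ever touches the disk, the files are written one at a time, while several producer threads could feed the same queue if document creation turns out to be the slower half.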
 
Tapas Chand
Ranch Hand
Posts: 614
Thank you guys for your time.

@Chris Hurst - I am using iText for PDF generation. So my code goes like

I guess the PDF is written to the HDD when the document.close() method is executed.

@William Brogden - Initially I also thought of this idea. So I tried an MDB, but got an error that the PdfPTable class is not serializable. Thus I got stuck.

Kindly shed some light.
Thank you.
 
Chris Hurst
Ranch Hand
Posts: 443
So did you try BufferedOutputStream?




and play with the buffer size as the third parameter, e.g.

 
Paul Clapham
Sheriff
Posts: 22374
Tapas Chand wrote: Initially I also thought of this idea. So I tried an MDB, but got an error that the PdfPTable class is not serializable. Thus I got stuck.


That makes it sound like you're trying to send the entire contents of the file over the network, or at least trying to load the entire contents of the file into memory to be copied around by your JMS implementation. That doesn't sound too promising as a way to speed things up; instead, I would think you're adding a whole lot of overhead there.

But if you want to persist with the MDB idea, and if the MDB and your client are on the same machine, then I would suggest just sending the file name in the message.
 