Using multithreading for Backup Process

 
Mukti chandnani
Greenhorn
Posts: 6
We have developed an application in Java that does remote data backup. The program reads the directory contents recursively and saves them to an HSQLDB database. This works fine, but the backup is very slow when the amount of data is large. We are thinking of using multithreading to make the process faster. The problem is how to divide the work of reading the directories among the threads. Can you suggest an algorithm for doing this?
 
Joe Ess
Bartender
Posts: 9406
My wild guess is that multithreading will make this process slower. The bottleneck in the situation you describe is likely disk access. If you introduce several threads reading different directories, the disk will have to do more work seeking around the platter to find where the various directories and their files are stored as each thread becomes active in turn.
It may make sense to have one thread reading the disk and one thread manipulating the database, since those two threads will probably block on different resources (provided the database isn't on the same disk you're backing up!).
You should profile your application to see where the bottleneck really is. Once you've established that, think about whether having more than one thread banging away at that bottleneck will help or hurt.
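
The one-reader, one-writer split described above could be sketched roughly as follows. This is only an illustration under stated assumptions: the class name ScanAndStore, the queue capacity of 1000, and the in-memory list standing in for the database are all made up for the example and are not from the original application.

```java
import java.io.File;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;

// Hypothetical sketch: one thread walks the disk, one thread "stores" entries.
class ScanAndStore {
    // Sentinel object that tells the writer thread no more entries are coming.
    private static final File DONE = new File("");

    /** Walks the tree depth-first and hands every entry to the queue. */
    static void scan(File dir, BlockingQueue<File> queue) throws InterruptedException {
        File[] children = dir.listFiles();
        if (children == null) {
            return; // not a directory, or not readable
        }
        for (File child : children) {
            queue.put(child); // blocks while the writer is falling behind
            if (child.isDirectory()) {
                scan(child, queue);
            }
        }
    }

    /** Runs the scan on the calling thread and the writer on a second thread. */
    static List<String> run(File root) throws InterruptedException {
        BlockingQueue<File> queue = new ArrayBlockingQueue<>(1000); // assumed capacity
        List<String> stored = new ArrayList<>(); // stand-in for the database

        Thread writer = new Thread(() -> {
            try {
                // Compare by identity so a real file can never look like the sentinel.
                for (File f = queue.take(); f != DONE; f = queue.take()) {
                    stored.add(f.getPath()); // a real writer would do its INSERT here
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        writer.start();

        scan(root, queue);
        queue.put(DONE);
        writer.join(); // after join, the list is safe to read from this thread
        return stored;
    }

    public static void main(String[] args) throws InterruptedException {
        File root = new File(args.length > 0 ? args[0] : ".");
        System.out.println(run(root).size() + " entries");
    }
}
```

The bounded queue means the disk reader pauses when the writer cannot keep up, so memory use stays flat. Whether this helps at all still depends on what the profiler says about where the time goes.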
 
Mukti chandnani
Greenhorn
Posts: 6
I have run the application through the JProfiler profiling tool. It shows that the database operations are taking up the most time. So I am thinking that if we create multiple threads that write to the database concurrently, it might speed up the process.
 
Joe Ess
Bartender
Posts: 9406
It's important to identify why database access is taking so long before we can say whether multiple threads will help. Have you profiled your DB server (it is on a different machine, correct)?
Database access is usually IO-bound, meaning the limitation on DB speed is the data transfer to/from disk. When uploading large amounts of information (as opposed to running a select query), you may also be constrained by network throughput. Neither of these problems can be solved by throwing threads at them.
 
Mukti chandnani
Greenhorn
Posts: 6
We are using sockets to transfer the data over the network. When the data is large, it takes a long time to transfer. Can something be done to increase the speed of the data transfer over the network?
 
Joe Ess
Bartender
Posts: 9406
Originally posted by Mukti chandnani:
Can something be done to increase the speed of the data transfer over the network?


Buy a network with more throughput.
 
William Brogden
Author and all-around good cowpoke
Rancher
Posts: 13078
Originally posted by Mukti chandnani:
We are using sockets to transfer the data over the network. When the data is large, it takes a long time to transfer. Can something be done to increase the speed of the data transfer over the network?


What have you tried so far? Glancing at the JavaDocs, it looks like you could try different buffer sizes for a start.
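
As a rough illustration of experimenting with buffer sizes: the class BufferedCopy and its method names below are hypothetical, not from the original application. The sketch copies a stream in chunks of a configurable size and asks the socket for larger send and receive buffers through java.net.Socket's setSendBufferSize and setReceiveBufferSize, which the platform treats as hints rather than guarantees.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.net.Socket;
import java.net.SocketException;

// Hypothetical helper for trying out different buffer sizes.
class BufferedCopy {
    /** Copies everything from in to out using a buffer of the given size. */
    static long copy(InputStream in, OutputStream out, int bufferSize) throws IOException {
        byte[] buffer = new byte[bufferSize];
        long total = 0;
        int n;
        while ((n = in.read(buffer)) != -1) {
            out.write(buffer, 0, n); // write only the bytes actually read
            total += n;
        }
        out.flush();
        return total;
    }

    /** Requests larger socket buffers; the OS may choose different sizes. */
    static void tune(Socket socket, int bytes) throws SocketException {
        socket.setSendBufferSize(bytes);
        socket.setReceiveBufferSize(bytes);
    }
}
```

Timing the same transfer with a few sizes (for example 8 KB, 64 KB, 256 KB) would show whether the buffer is a factor at all; if the link itself is saturated, no buffer size will help.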

Bill
 
steve souza
Ranch Hand
Posts: 862
Do you need to back up operating system files to a database? If you do, it sometimes helps not to commit after each insert/update/delete. You might not be aware that you are issuing commits, as sometimes they are implicit. You could commit, say, every 500 inserts instead of after every one.
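
A minimal sketch of committing every N inserts with plain JDBC might look like this. The table FILES(NAME), the class BatchedInsert, and the method insertAll are made up for the example; only the java.sql calls (setAutoCommit, commit, rollback, prepareStatement) are standard JDBC.

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.SQLException;
import java.util.List;

// Hypothetical helper: turns off auto-commit and commits in groups.
class BatchedInsert {
    /** Inserts all names, committing every commitEvery rows; returns the commit count. */
    static int insertAll(Connection con, List<String> names, int commitEvery) throws SQLException {
        boolean oldAutoCommit = con.getAutoCommit();
        con.setAutoCommit(false); // stop the implicit commit after each statement
        int commits = 0;
        // FILES(NAME) is an assumed table, not the schema from this thread.
        try (PreparedStatement ps = con.prepareStatement("INSERT INTO FILES(NAME) VALUES (?)")) {
            int pending = 0;
            for (String name : names) {
                ps.setString(1, name);
                ps.executeUpdate();
                if (++pending == commitEvery) {
                    con.commit();
                    commits++;
                    pending = 0;
                }
            }
            if (pending > 0) { // commit the final partial group
                con.commit();
                commits++;
            }
        } catch (SQLException e) {
            con.rollback(); // discard the uncommitted partial group
            throw e;
        } finally {
            con.setAutoCommit(oldAutoCommit);
        }
        return commits;
    }
}
```

With commitEvery set to 500, a run of 1,200 rows would issue three commits instead of 1,200. The trade-off is that a failure loses up to one uncommitted group, so the backup needs to be able to resume or redo that group.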
 
Tim Holloway
Bartender
Posts: 18531
Doesn't HSQLDB commonly support in-memory databases?

Before tuning the network, make sure that the database server isn't overloaded or, worse yet, page-thrashing.
 
Murthy Tanniru
Greenhorn
Posts: 15
Try using "rsync" or similar software for incremental file transfer to the backup machine or local network, to reduce the impact of network latency, and back up only the changes.
 
Mukti chandnani
Greenhorn
Posts: 6
I had created cached tables in HSQLDB. I read that cached tables give lower performance and text tables give better performance, so I switched to text tables. The first time I ran my application it ran very fast, but since then it has been running even slower than when I had cached tables. What could be the reason for that?

Below are my properties file and script file.

#HSQL Database Engine 1.8.0.9
#Thu Jul 03 22:35:51 IST 2008
hsqldb.script_format=0
runtime.gc_interval=100,000
sql.enforce_strict_size=false
hsqldb.cache_size_scale=20
readonly=false
hsqldb.nio_data_file=true
hsqldb.cache_scale=18
version=1.8.0
hsqldb.default_table_type=text
hsqldb.cache_file_scale=1
hsqldb.log_size=200
modified=yes
hsqldb.cache_version=1.7.0
hsqldb.original_version=1.8.0
hsqldb.compatible_version=1.8.0


script file
---------------

CREATE SCHEMA PUBLIC AUTHORIZATION DBA
CREATE TEXT TABLE INDICES(ID BIGINT NOT NULL PRIMARY KEY,FS INTEGER NOT NULL,MP INTEGER NOT NULL,PARENT BIGINT NOT NULL,ISDIR BOOLEAN NOT NULL,NAME VARCHAR(256),EXT VARCHAR(50),PARENTPATH VARCHAR(4096),MODTIME TIMESTAMP NOT NULL,BYTES BIGINT NOT NULL,STAMPUPD TIMESTAMP DEFAULT NOW NOT NULL,ISDELETED BOOLEAN NOT NULL)
CREATE INDEX U_INDICES_FS_MP_SELF ON INDICES(ID,MP,ISDELETED)
CREATE INDEX INDICES_COUNT ON INDICES(ISDIR,ISDELETED)
CREATE INDEX INDICES_FS_MP_PARENT ON INDICES(MP,PARENT,ISDELETED)
CREATE INDEX INDICES_NAME_PARENTPATH ON INDICES(NAME,PARENTPATH)
CREATE INDEX INDICES__PARENT ON INDICES(PARENT,ISDELETED)
CREATE INDEX INDICES_MP_ISDIR_EXT ON INDICES(MP,ISDIR,EXT,ISDELETED)
SET TABLE INDICES SOURCE "indices.csv"
CREATE TEXT TABLE INVERTEDINDEX(FILEID BIGINT NOT NULL,FOLDERID BIGINT NOT NULL)
CREATE INDEX INVERTEDINDEX_FOLDERID_ISDELETED ON INVERTEDINDEX(FOLDERID)
SET TABLE INVERTEDINDEX SOURCE "invertedindex.csv"
CREATE TEXT TABLE DB_VERSION(DBVERSION NUMERIC,REFRESHCOUNT NUMERIC,UPDATEDATE NUMERIC)
SET TABLE DB_VERSION SOURCE "db_version.csv"
CREATE USER SA PASSWORD ""
GRANT DBA TO SA
SET WRITE_DELAY 10