scp multiple files concurrently

 
Ranch Hand
Posts: 193
I need to scp multiple files to an SSH server concurrently.
My question is: how many files can I transfer concurrently in an efficient way? (I know it also depends on the network, but let's ignore that factor for now, since we have a gigabit network.)

For example, suppose I need to transfer 10,000+ files. Obviously, scp-ing them sequentially is unacceptable, since each transfer would have to finish before the next one starts.
So I decided to use multiple processes to transfer the files concurrently. Can anybody tell me the right approach for such a case?

Thanks
 
Jiafan Zhou
Ranch Hand
Posts: 193
To be more specific: what is the maximum number of scp processes it makes sense to run concurrently?
 
Saloon Keeper
Posts: 27763
Actually, I think I'd look at secure rsync for something like that.
 
Jiafan Zhou
Ranch Hand
Posts: 193
Using an SSH server is a requirement, so bad luck: secure rsync is out of consideration.
 
Ranch Hand
Posts: 1923
Did you find out what your bottleneck is?

If your source files are coming from the same hard drive, I wouldn't expect much from threads.
I would ask the same hard-drive question about the second host, too.

The drive's head will jump back and forth between the different threads' files; that may well be slower than a single thread.
 
Jiafan Zhou
Ranch Hand
Posts: 193

Originally posted by Stefan Wagner:
Did you find out what your bottleneck is?

If your source files are coming from the same hard drive, I wouldn't expect much from threads.
I would ask the same hard-drive question about the second host, too.

The drive's head will jump back and forth between the different threads' files; that may well be slower than a single thread.


I am not sure I totally understand this. Yes, I would say they come from the same hard drive. (Actually, these files all come from the same directory.)

Also, on Linux each scp execution runs as a separate process, though I have thought about using threads instead.
 
author and jackaroo
Posts: 12200

Originally posted by Jiafan Zhou:
Using SSH server is a requirement, so bad luck, rsync secure is out of consideration.



From the man pages for rsync:

For remote transfers, a modern rsync uses ssh for its communications, but it may have been configured to use a different remote shell by default, such as rsh or remsh.



My thoughts are that trying to transfer them in parallel will make bandwidth the bottleneck. Assuming the total size of all the files exceeds what the link can carry, it doesn't matter how many concurrent transfers you attempt; you can never push more than a gigabit per second of data through your pipes at any given time.
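For a sense of scale, here is a back-of-the-envelope floor on the transfer time once the link is saturated, assuming (hypothetically) 10,000 files of about 1 MB each, and the roughly 125 MB/s a gigabit link can carry:

```shell
# ~10,000 MB total, link moves at most ~125 MB/s, so the link itself
# sets a lower bound on elapsed time no matter how many scp's run.
total_mb=$(( 10000 * 1 ))     # hypothetical: 10,000 files x ~1 MB
link_mb_per_s=125             # ~1 Gbit/s expressed in MB/s
echo $(( total_mb / link_mb_per_s ))   # ~80 seconds, ignoring per-file overhead
```

Anything beyond that floor comes from per-file overhead (logins, round trips), which is what parallelism or batching can actually reduce.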

That being the case, I would probably just set up an rsync with wildcards, and come back when it is done.

As an alternative: if the data is large enough (terabytes?), it might be worth considering sneaker-net. Have someone with physical access to the computers copy the files to a large portable hard drive and transport them that way.

Andrew
 
Tim Holloway
Saloon Keeper
Posts: 27763
Actually, Your Mileage May Vary, but I learned to respect SANs when someone pointed out that the disk-to-RAM data transfer rate on a local hard drive is typically MUCH slower than Gigabit Ethernet. So the theoretical penalties are not as bad as they seem, assuming low levels of contention from other parties sharing the network between source and destination.

In the case of running many scp's in parallel, a bigger issue would be, as noted, the latency of the source disk, especially if it's a single disk and not a tuned array.

Of course, as is typical in today's complicated world, simply "knowing" isn't enough - too many variables apply. The only real way to tell is to benchmark and tune.
 
author and iconoclast
Posts: 24207
I have no experience using these pssh tools, but I've heard of them. I know that pscp is designed to send one file to many machines; maybe it can also send many files to one machine?
 
Jiafan Zhou
Ranch Hand
Posts: 193
Thank you all for sharing this invaluable knowledge with me. A couple of good ideas are outlined below:

1. I agree that the "rsync" command (or pssh) is a better replacement for scp; however, I cannot use them, for a couple of reasons (believe it or not). Also, "rsync" still transmits one file at a time, so it does not use the full bandwidth of our network.

2. Again, I agree that reading from the hard disk into RAM is definitely a bottleneck, no doubt. But I'm not too concerned about that right now, because the biggest issue at the moment is how to use the maximum bandwidth of the network, i.e. how to transmit multiple files with scp in minimum time. (I might come back to the slow disk-to-RAM reads, but not at the moment.)

3. I don't agree with using a single thread/process (rsync or scp) to transmit the files. (I am not yet convinced.) I forgot to mention that the files being transferred to the server are relatively small, probably less than 1 MB each. That motivates the whole idea of performing parallel transfers, i.e. multiple scp processes.

4. Physically copying the files is a great idea, but technically impossible in my case.

5. I will definitely run some benchmark tests and tune.

This concurrent scp-ing is handled by a Java program, which creates a separate Process for every file it transfers. The original problem I posted, along with my initial code proposal, is described at the following link:

https://coderanch.com/t/234297/threads/java/Create-new-thread-each-entity

Any suggestions are welcome.

Again thanks a lot.
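One way to bound the number of concurrent scp processes without hand-rolling a pool (in Java or otherwise) is xargs -P. Below is a minimal local sketch: cp stands in for scp so it can run anywhere, and the host/path in the comment are placeholders.

```shell
set -e
# Demo of a bounded worker pool with xargs.  For a real transfer, replace
# the cp stand-in with something like:  scp -q "$@" user@host:/dest/
src=$(mktemp -d); dst=$(mktemp -d)
for i in 1 2 3 4 5 6 7 8 9 10; do printf 'x' > "$src/f$i"; done

# -P 4: at most 4 workers run at once; -n 3: up to 3 files per invocation,
# so each worker amortizes its startup (for scp: its login) over a batch.
find "$src" -type f -print0 \
  | xargs -0 -n 3 -P 4 sh -c 'cp "$@" "$0"' "$dst"

ls "$dst" | wc -l    # all 10 files arrive
```

The two knobs map directly onto the thread questions in this thread: -P caps concurrency (so the receiving sshd isn't flooded with logins), and -n batches files per connection.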
 
Tim Holloway
Saloon Keeper
Posts: 27763
Hmmm. I hadn't realized that rsync could only parallelize in certain specific cases. Oh well.

The simplest solution would be the pscp utility, which is the parallel-scp part of the suite that includes pssh. Taking the Not Invented Here approach and writing your own solution costs you geek points in the Unix world. Adding to the inventory of source code that has to be maintained and kept up to date, when you could take advantage of someone else's work, costs you business points. And using Runtime.exec() just to spawn multiple scp commands costs you double geek points, since you'd have less overhead doing this as a shell script and keeping Java for something that needs it. Just saying.

100,000 threads is not realistic. While there is a certain economy to be gained by using sharable code, the core variables for each thread are unique, so you're talking a lot of memory. More importantly, you're going to put a real strain on the thread dispatcher. But the real killer is on the receiving end.

If you did the brute-force approach and did a Runtime.exec() on the scp command, you'd be requiring the receiving machine to process 100,000+ login requests in a very short period of time (in addition to almost the same amount of overhead on the sending machine as it created new shell environments). It would almost certainly buckle under the strain. If it did not, you'd still have the issue that you'd not only need to bump the thread limit on the sending machine, you'd have to do the same on the receiving machine. At best, excess requests would bounce. At worst, you could interfere with other processes and risk crashing the whole system.

There's a certain overhead to setting up and tearing down a file transfer context even without the overhead of setting up a new user environment (login). The most efficient approach is a batched one where multiple files (especially small ones) can be sent within a single transfer request.

In some ways, your problem resembles what BitTorrent was designed to handle, although torrents distribute the process among multiple hosts.
 
Stefan Wagner
Ranch Hand
Posts: 1923

Originally posted by Jiafan Zhou:
... Because we have a Giga network).

For example, if I need to transfer 10,000 (plus) files. Obviously scp them sequentially is unacceptable. (That means need to wait one transmission finish before starting another one).



Ah, so you're afraid of needing 10,000 logins?
Well, no problem.
Tar all 10,000 files together into one single archive and transfer it in one go, instead of running scp once per file. (My suggestions here aren't tested.)
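A minimal sketch of the tar-then-transfer idea. The host and paths are placeholders; the scp/ssh lines are commented out and shown for illustration, with local temp directories standing in for the two hosts:

```shell
set -e
src=$(mktemp -d); dst=$(mktemp -d)
printf 'a' > "$src/file1"; printf 'b' > "$src/file2"

# 1) One archive, one scp session, one login -- instead of one login per file:
tar czf "$src.tar.gz" -C "$src" .
# scp "$src.tar.gz" user@host:/dest/
# ssh user@host 'tar xzf /dest/archive.tar.gz -C /dest'
tar xzf "$src.tar.gz" -C "$dst"      # local stand-in for the remote unpack

# 2) Or stream the archive through ssh, with no intermediate file at all:
# tar czf - -C "$src" . | ssh user@host 'tar xzf - -C /dest'
```

The streaming variant also overlaps the disk reads with the network transfer, which helps with many small files.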

If your real concern is only the login overhead, you should read about using ssh with public keys.
I'm sorry, I could only find a German wiki link.
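The public-key setup itself is short; a sketch, with an empty passphrase for the demo and a placeholder host in the commented line:

```shell
set -e
d=$(mktemp -d)
# Generate a key pair non-interactively (demo only; use a passphrase in practice):
ssh-keygen -q -t ed25519 -N '' -f "$d/id_ed25519"
# Then install the public key on the server so scp/ssh stop prompting:
# ssh-copy-id -i "$d/id_ed25519.pub" user@host
ls "$d"
```

After that, each scp login is non-interactive, though it still costs a connection setup per invocation.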
 
Ranch Hand
Posts: 225
You should look at the pure Java SSH clients, instead of trying to run SCP in separate processes. With Trilead SSH you can do multiple SCP sessions over a single SSH connection, to make the most of a single login and TCP connection. Their SCPClient class is thread-safe, and is really easy to use.
 
Greenhorn
Posts: 1
use pssh

http://code.google.com/p/parallel-ssh/
 
Stefan Wagner
Ranch Hand
Posts: 1923
@Joshua, don't wake the zombies.

Did you expect us to have been waiting almost two years for that answer, which was a tip EFH already gave?
 