
What is the best way to exchange files with NFSv3 to HDFS?

 
Raghavendra Desoju
Hi,

We have a use case in which files need to be copied back and forth between NFS and HDFS.

Could you please suggest if there is any easy way to do it?

Thanks, Raghu
 
Karthik Shiraly
The normal way is to use the "hadoop fs" commands. Are you facing some problem with them?
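For reference, here is what those invocations look like for copying between a local (NFS-mounted) path and HDFS. The paths are just placeholders, not anything from your setup:

```shell
# Copy a local file (e.g. on an NFS mount) into HDFS
hadoop fs -put /mnt/nfs/data/input.csv /user/raghu/input.csv

# Copy a file from HDFS back onto the local/NFS filesystem
hadoop fs -get /user/raghu/output.csv /mnt/nfs/data/output.csv

# -copyFromLocal / -copyToLocal behave the same way for local paths
hadoop fs -copyFromLocal /mnt/nfs/data/input.csv /user/raghu/
```

These run from any machine that has the Hadoop client installed and configured to point at your cluster.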
 
Raghavendra Desoju
I have a Linux server where my Java program runs; it creates files on a NAS share that is NFS-mounted on that server.

Should we mount HDFS on the Linux server, or mount the NFS share on the HDFS cluster?

Thanks, Raghu
 
Karthik Shiraly
What is the volume of data and rough number of files you expect to transfer? And how often?
 
Raghavendra Desoju
Use Case 1: data files ranging from roughly 500 MB to 5 GB, transferred about 50 times a day.
Use Case 2: data files ranging from 10 to 50 MB, transferred about 2000 times a day.
 
Karthik Shiraly
The clusters I have worked with used two approaches:
1. If there were a lot of files and a high transfer frequency, the source NFS volume(s) would be mounted on the HDFS nodes. This had the nice benefit that a Hadoop job could be run so that multiple nodes pulled in data concurrently. (They were Hadoop clusters, by the way; I'm not sure if you are also running a Hadoop cluster.)

2. A much simpler approach was to have a couple of dedicated client nodes with Hadoop installed and the same configuration files as the cluster, but not running any actual Hadoop/HDFS services. They were used for transferring files (using fs commands), submitting jobs, cluster monitoring, and so on. It is possible to mount HDFS as a local filesystem, but I never came across such a setup (that doesn't mean it's bad or anything, just that I can't comment on how beneficial it was).
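As a sketch of approach 1: when the NFS volume is mounted at the same path on every cluster node, `hadoop distcp` can copy from a `file://` URI in parallel, with each map task reading through the shared mount. The hostnames and paths below are hypothetical:

```shell
# Parallel copy from an NFS mount (visible at the same path on all nodes) into HDFS.
# Each distcp map task reads its share of files through the local mount point.
hadoop distcp file:///mnt/nfs/incoming hdfs://namenode:8020/data/incoming

# And the reverse direction, from HDFS back onto the shared NFS volume
hadoop distcp hdfs://namenode:8020/data/results file:///mnt/nfs/results
```

Note the `file://` source only works correctly here because every node sees the same data at `/mnt/nfs`; on a mount visible to just one machine, you would fall back to plain `hadoop fs -put` from that machine.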

Generally, a server hosting an application is dedicated to running that application and its secondary services, and is rarely loaded up with additional tasks like transferring heavy files; the risk is that its disk and/or network become bottlenecks. So I wouldn't recommend mounting HDFS on your application server.
 