
How to exclude a directory in Hadoop?

 
ruchika sharma
Greenhorn
Posts: 4
Hello,

I have just started with Hadoop and have been stuck on this for the last 1-2 days.

I have several directories under /user and I am copying data between two clusters. I would like to exclude one directory that holds 15 TB of data. I didn't find a way to exclude a directory in the Apache docs, so I am hoping someone can help me here.

 
Karthik Shiraly
Bartender
Posts: 1210
Use distcp -f.
Create a text file listing the directories to be copied, one source URI per line, leaving out the directory you want to exclude (let's call the file "listing_file" here). Put it on the source filesystem, then run hadoop distcp -f <listing_file_URL> <destination_cluster_URL>.
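
As a rough sketch of that approach, the listing can be generated rather than typed by hand. The snippet below is only an illustration: the cluster addresses (source-nn, dest-nn), the port, the directory names, and the helper functions build_listing and distcp_command are all made up for the example. It builds the list of source URIs with one directory left out and assembles the distcp -f command; it does not talk to a cluster.

```python
from typing import Iterable, List


def build_listing(source_dirs: Iterable[str], excluded: str) -> List[str]:
    """Return the source directories to copy, leaving out `excluded`."""
    skip = excluded.rstrip("/")
    return [d for d in source_dirs if d.rstrip("/") != skip]


def distcp_command(listing_url: str, destination_url: str) -> List[str]:
    """Build the argument list for `hadoop distcp -f <listing> <destination>`."""
    return ["hadoop", "distcp", "-f", listing_url, destination_url]


# Hypothetical cluster addresses and directory names, for illustration only.
SOURCE = "hdfs://source-nn:8020"
dirs = [
    f"{SOURCE}/user/alice",
    f"{SOURCE}/user/bigdata",  # the large directory to skip
    f"{SOURCE}/user/bob",
]

listing = build_listing(dirs, f"{SOURCE}/user/bigdata")
listing_text = "\n".join(listing) + "\n"  # one source URI per line

# Assumed location of the uploaded listing file on the source filesystem.
command = distcp_command(f"{SOURCE}/tmp/listing_file", "hdfs://dest-nn:8020/user")
```

To use it, write listing_text to a local file, upload that file to the source filesystem (for example with hadoop fs -put), and then run the command held in command.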