
Hadoop: unzipping and processing XML files

 
Rahul Mahindrakar
Ranch Hand
Posts: 1868
Hi

I have tar files that contain text files with XML in them, like



I am just starting out with Hadoop and need some high-level guidance on how I should go about doing this:

1) How do I scp the files over to a place where I can provide them to Hadoop? Is there some component or framework for this?
2) How do I untar a file once it is received? I have googled and there seem to be some components for this, but does someone here have prior experience with them?
3) How do I convert the multi-line text + XML into a single line for me to process, like
4) How do I then process this line? Should I process it as text or as XML? I guess for a beginner text is OK.
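
To make steps 2 and 3 concrete, here is a minimal sketch in plain Python, run outside Hadoop before the files are loaded. It is only an illustration under assumptions: the function name `flatten_tar_members` is hypothetical, the archive members are assumed to be UTF-8 text, and collapsing all whitespace is assumed to be acceptable (it would also alter whitespace inside XML text content).

```python
import io
import tarfile


def flatten_tar_members(tar_bytes):
    """Yield (member_name, single_line_text) for each regular file in a tar archive.

    Hypothetical helper: reads the archive from memory, so it suits
    small archives only.
    """
    # "r:*" lets tarfile detect plain, gzip, or bzip2 archives on its own
    with tarfile.open(fileobj=io.BytesIO(tar_bytes), mode="r:*") as archive:
        for member in archive:
            if not member.isfile():
                continue  # skip directories, links, and so on
            raw = archive.extractfile(member).read()
            # Assumption: members are UTF-8 text; bad bytes are replaced
            text = raw.decode("utf-8", errors="replace")
            # Collapse every run of whitespace, newlines included, into one space
            one_line = " ".join(text.split())
            yield member.name, one_line
```

Each yielded line could then be written to one output file, one record per line, which is the shape that line-oriented text processing expects.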

I just need some ideas.

Thanks
Rahul M.

 