Extract data from Hadoop File system using nutch

syruss kumar
Ranch Hand

Joined: Jul 23, 2009
Posts: 98
Hi,

I'm a newbie to Nutch. I have installed and configured Nutch to crawl a site, and I want to extract the data from the crawl db. Is there any way to get the data programmatically?

Thanks in advance,

All search starts with beginner's luck and all search ends with victor's severely tested.
syruss kumar
Ranch Hand

Joined: Jul 23, 2009
Posts: 98
Hi all,

Here is the solution: use the Nutch API to extract the data. Under the crawl/segments folder, Nutch stores the content, parsed text, parsed data, etc. for each segment.

Sample code to read data from the Hadoop file system using the Nutch 1.6 API
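What follows is only a minimal sketch of how such a reader could look, not necessarily the code that was originally posted. It assumes Nutch 1.6 with its bundled Hadoop 1.x jars on the classpath, and that each segment subfolder holds a single part named part-00000 (typical for a small local crawl; larger crawls may have several parts). The class name SegmentReaderSketch and the helper dataFile are made up for illustration.

```java
import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.SequenceFile;
import org.apache.hadoop.io.Text;
import org.apache.nutch.parse.ParseText;
import org.apache.nutch.protocol.Content;
import org.apache.nutch.util.NutchConfiguration;

// Hypothetical class name; a sketch, not the original poster's code.
public class SegmentReaderSketch {

    // Assumption: the segment has a single part called part-00000.
    // Builds <segment>/<dirName>/part-00000/data.
    public static Path dataFile(Path segment, String dirName) {
        return new Path(new Path(segment, dirName), "part-00000/data");
    }

    // Prints URL, content type and raw size for every fetched document.
    static void dumpContent(FileSystem fs, Configuration conf, Path segment)
            throws IOException {
        SequenceFile.Reader reader =
                new SequenceFile.Reader(fs, dataFile(segment, Content.DIR_NAME), conf);
        try {
            Text url = new Text();
            Content content = new Content();
            while (reader.next(url, content)) {
                System.out.println(url + "\t" + content.getContentType()
                        + "\t" + content.getContent().length + " bytes");
            }
        } finally {
            reader.close();
        }
    }

    // Prints the extracted plain text for every parsed document.
    static void dumpParseText(FileSystem fs, Configuration conf, Path segment)
            throws IOException {
        SequenceFile.Reader reader =
                new SequenceFile.Reader(fs, dataFile(segment, ParseText.DIR_NAME), conf);
        try {
            Text url = new Text();
            ParseText text = new ParseText();
            while (reader.next(url, text)) {
                System.out.println(url + "\n" + text.getText() + "\n");
            }
        } finally {
            reader.close();
        }
    }

    public static void main(String[] args) throws IOException {
        // args[0] is the path of one segment folder under crawl/segments
        Configuration conf = NutchConfiguration.create();
        FileSystem fs = FileSystem.get(conf);
        Path segment = new Path(args[0]);
        dumpContent(fs, conf, segment);
        dumpParseText(fs, conf, segment);
    }
}
```

Run it with the path of one segment folder as the only argument. The same loop should work for the parse_data folder by swapping in ParseData as the value class.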



parin jogani
Greenhorn

Joined: Apr 06, 2013
Posts: 1
Thank you! This was of great help.
Is there any way to extract only a particular file format (e.g. PDF)?