
xml in mapreduce error

 
Gauravvyas Vyas
Greenhorn
Posts: 1
Hi,
I am trying to parse an XML file using MapReduce and Mahout's XmlInputFormat class, but I am getting blank output. When I run the code in Eclipse, it reports number of map input records=0 and number of map output records=0.

I am attaching my code. Could someone please help?

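Code for the driver class:

import java.io.IOException;
import java.util.List;
import javax.xml.parsers.ParserConfigurationException;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.FileInputFormat;
import org.apache.hadoop.mapred.FileOutputFormat;
import org.apache.hadoop.mapred.JobClient;
import org.apache.hadoop.mapred.JobConf;
import org.apache.hadoop.mapred.MapReduceBase;
import org.apache.hadoop.mapred.Mapper;
import org.apache.hadoop.mapred.OutputCollector;
import org.apache.hadoop.mapred.Reporter;
import org.apache.hadoop.mapred.TextOutputFormat;
import org.xml.sax.SAXException;
// XmlInputFormat comes from Mahout; its package depends on the Mahout version in use.

public class Books {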
    public static class Map extends MapReduceBase implements Mapper<LongWritable, Text, IntWritable, Text> {

        // Each input value is one XML record delimited by the configured start and end tags.
        public void map(LongWritable key, Text values, OutputCollector<IntWritable, Text> output, Reporter reporter) {
            try {
                XMLParse xp = new XMLParse(values.toString());
                List<Integer> bookidlist = xp.getBookId();

                for (int id : bookidlist) {
                    // The price was appended twice in the original; it is emitted once here.
                    String mapoutput = xp.getAuthor() + ", " + xp.getTitle() + ", " + xp.getGenre() + ", "
                            + xp.getPrice() + ", " + xp.getPublishDate() + ", " + xp.getDescription();
                    output.collect(new IntWritable(id), new Text(mapoutput));
                }
            } catch (SAXException e) {
                // Failures are only printed, so a record that cannot be parsed produces no output.
                e.printStackTrace();
            } catch (IOException e) {
                e.printStackTrace();
            } catch (ParserConfigurationException e) {
                e.printStackTrace();
            }
        }
    }

    public static void main(String[] args) throws IOException {
        JobConf job = new JobConf(Books.class);
        // The start and end tags must match the record element in the input XML exactly.
        job.set(XmlInputFormat.START_TAG_KEY, "<catlog>");
        job.set(XmlInputFormat.END_TAG_KEY, "</catlog>");
        job.setJarByClass(Books.class);
        job.setMapperClass(Map.class);
        job.setNumReduceTasks(0); // map-only job
        job.setOutputKeyClass(IntWritable.class);
        job.setOutputValueClass(Text.class);

        job.setJobName("Books-job");

        job.setInputFormat(XmlInputFormat.class);
        job.setOutputFormat(TextOutputFormat.class);

        String in = "/home/cloudera/Books";
        String out = "/home/cloudera/Booksouput";

        FileInputFormat.setInputPaths(job, new Path(in));
        FileOutputFormat.setOutputPath(job, new Path(out));

        JobClient.runJob(job);
    }
}
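
A counter of map input records=0 means the input format produced no records at all, so the mapper was never called. The sketch below is a simplified, plain-Java illustration of tag-delimited record splitting; it is not Mahout's implementation, and the class name TagSplitSketch, the method records, and the sample XML are all hypothetical (the real input file was not posted). It only shows that a start tag that does not appear literally in the input yields zero records.

```java
import java.util.ArrayList;
import java.util.List;

public class TagSplitSketch {

    // Returns every substring that begins with startTag and ends with endTag.
    // Simplified illustration only; not Mahout's actual record reader.
    public static List<String> records(String xml, String startTag, String endTag) {
        List<String> out = new ArrayList<>();
        int from = 0;
        while (true) {
            int start = xml.indexOf(startTag, from);
            if (start < 0) {
                break; // no further start tag: no more records
            }
            int end = xml.indexOf(endTag, start + startTag.length());
            if (end < 0) {
                break; // unterminated record is dropped
            }
            out.add(xml.substring(start, end + endTag.length()));
            from = end + endTag.length();
        }
        return out;
    }

    public static void main(String[] args) {
        // Hypothetical sample input; an assumption for illustration.
        String xml = "<catalog><book id=\"1\"><author>A</author></book>"
                   + "<book id=\"2\"><author>B</author></book></catalog>";

        System.out.println(records(xml, "<catlog>", "</catlog>").size());   // prints 0
        System.out.println(records(xml, "<catalog>", "</catalog>").size()); // prints 1
        System.out.println(records(xml, "<book", "</book>").size());        // prints 2
    }
}
```

If the root element of the real file is spelled differently from the configured tags, the job would see no records in the same way. Using a prefix such as "<book" as the start tag is one way to match elements that carry attributes, assuming the input format compares the tag text literally.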




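/* Code for the XMLParse class: */

import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

class XMLParse {
    public Document doc;

    // Takes the XML content itself (the mapper passes values.toString()), not a file name.
    public XMLParse(String xml) throws ParserConfigurationException, SAXException, IOException {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        DocumentBuilder builder = factory.newDocumentBuilder();

        // InputSource(String) treats its argument as a system ID (a URI),
        // so the content is wrapped in a StringReader instead.
        InputSource in = new InputSource(new StringReader(xml));
        doc = builder.parse(in);
    }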

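    public List<Integer> getBookId() {
        List<Integer> bookid = new ArrayList<Integer>();
        // Element names cannot contain spaces, so "book id" never matches anything.
        // Assumption: the id is an attribute, as in <book id="...">; the XML was not posted.
        NodeList nlist = doc.getElementsByTagName("book");

        for (int i = 0; i < nlist.getLength(); i++) {
            String id = ((Element) nlist.item(i)).getAttribute("id");
            // Integer.parseInt throws NumberFormatException if the id is not purely numeric.
            bookid.add(Integer.parseInt(id));
        }
        return bookid;
    }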

    public String getAuthor() {
        NodeList nList = doc.getElementsByTagName("author");
        return nList.item(0).getTextContent();
    }

    public String getTitle() {
        NodeList nList = doc.getElementsByTagName("title");
        return nList.item(0).getTextContent();
    }

    public String getGenre() {
        NodeList nlist = doc.getElementsByTagName("genre");
        return nlist.item(0).getTextContent();
    }

    public int getPrice() {
        NodeList nList = doc.getElementsByTagName("price");
        return Integer.parseInt(nList.item(0).getTextContent());
    }

    public String getDescription() {
        NodeList nList = doc.getElementsByTagName("description");
        return nList.item(0).getTextContent();
    }

    public String getPublishDate() {
        NodeList nlist = doc.getElementsByTagName("publish_date");
        return nlist.item(0).getTextContent();
    }
}
 
arumugarani sundaram
Greenhorn
Posts: 9
Hi,

I feel like the XMLParse class itself has some issues.

Did you check the output of the XMLParse class on its own? Does it display the results?

Can you post your XML so we can debug further?

Thanks,
Arumugarani
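
One way to run the check suggested above, outside MapReduce, is a small standalone program. This is a minimal sketch under stated assumptions: the class name XmlParseCheck, the helper methods, and the sample XML (including the id attribute on book) are hypothetical, since the real file was not posted. It uses only the JDK's DOM parser and shows parsing XML held in a String through a StringReader, reading an attribute, and what getElementsByTagName returns for a name containing a space.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

public class XmlParseCheck {

    // Parses XML content held in a String. InputSource(String) would treat the
    // argument as a system ID (a URI), so a StringReader is used instead.
    public static Document parse(String xml) throws Exception {
        DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
        return builder.parse(new InputSource(new StringReader(xml)));
    }

    // Returns the text of the first element with the given tag, or null if none exists.
    public static String firstText(Document doc, String tag) {
        NodeList nodes = doc.getElementsByTagName(tag);
        return nodes.getLength() == 0 ? null : nodes.item(0).getTextContent();
    }

    public static void main(String[] args) throws Exception {
        // Hypothetical sample record; an assumption for illustration.
        String xml = "<catalog><book id=\"1\"><author>A. Writer</author>"
                   + "<title>Sample</title></book></catalog>";
        Document doc = parse(xml);

        Element book = (Element) doc.getElementsByTagName("book").item(0);
        System.out.println(book.getAttribute("id"));                         // prints 1
        System.out.println(firstText(doc, "author"));                        // prints A. Writer
        System.out.println(doc.getElementsByTagName("book id").getLength()); // prints 0
    }
}
```

Feeding one real record from the input file into parse would show whether the parsing code or the job configuration is responsible for the empty output.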
 