
XML in MapReduce error

 
Gauravvyas Vyas
Greenhorn
Posts: 1
Hi,
I am trying to parse an XML file using MapReduce and Mahout's XmlInputFormat class, but I am getting blank output. When I run the code in Eclipse, it reports Map input records=0 and Map output records=0.

I have attached my code below; could someone please help?

Code for the driver class:
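import java.io.IOException;
import java.util.List;

import javax.xml.parsers.ParserConfigurationException;

import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.FileInputFormat;
import org.apache.hadoop.mapred.FileOutputFormat;
import org.apache.hadoop.mapred.JobClient;
import org.apache.hadoop.mapred.JobConf;
import org.apache.hadoop.mapred.MapReduceBase;
import org.apache.hadoop.mapred.Mapper;
import org.apache.hadoop.mapred.OutputCollector;
import org.apache.hadoop.mapred.Reporter;
import org.apache.hadoop.mapred.TextOutputFormat;
import org.xml.sax.SAXException;
// XmlInputFormat is Mahout's class; its package differs between Mahout versions, so its import is not shown.

public class Books {
    public static class Map extends MapReduceBase implements Mapper<LongWritable, Text, IntWritable, Text>
    {
        // Each value is one XML record (start tag through end tag) produced by XmlInputFormat
        public void map(LongWritable key, Text values, OutputCollector<IntWritable, Text> output, Reporter reporter)
        {
            XMLParse xp;

            try {
                xp = new XMLParse(values.toString());

                List<Integer> bookidlist = xp.getBookId();

                for (int id : bookidlist)
                {
                    // One line per book id: author, title, genre, price, publish date, description
                    String mapoutput = xp.getAuthor() + ", " + xp.getTitle() + ", " + xp.getGenre() + ", "
                            + xp.getPrice() + ", " + xp.getPublishDate() + ", " + xp.getDescription();
                    output.collect(new IntWritable(id), new Text(mapoutput));
                }
            }
            catch (SAXException e) {
                e.printStackTrace();
            } catch (IOException e) {
                e.printStackTrace();
            } catch (ParserConfigurationException e) {
                e.printStackTrace();
            }
        }
    }

    public static void main(String[] args) throws IOException
    {
        JobConf job = new JobConf(Books.class);
        // These tags must match the element names in the input file exactly
        job.set(XmlInputFormat.START_TAG_KEY, "<catlog>");
        job.set(XmlInputFormat.END_TAG_KEY, "</catlog>");
        job.setJarByClass(Books.class);
        job.setMapperClass(Map.class);
        job.setNumReduceTasks(0); // map-only job
        job.setOutputKeyClass(IntWritable.class);
        job.setOutputValueClass(Text.class);

        job.setJobName("Books-job");

        job.setInputFormat(XmlInputFormat.class);
        job.setOutputFormat(TextOutputFormat.class);

        String in = "/home/cloudera/Books";
        String out = "/home/cloudera/Booksouput";

        FileInputFormat.setInputPaths(job, new Path(in));
        FileOutputFormat.setOutputPath(job, new Path(out));

        JobClient.runJob(job);
    }
}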
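Map input records=0 means the input format produced no records, so the mapper was never called. Roughly, XmlInputFormat scans each split for the configured start tag, then for the end tag, and hands everything in between (tags included) to the mapper as one value; if the start tag never appears, there are no records. The simplified, in-memory sketch below illustrates that idea. The class `XmlRecordSplitter` is hypothetical and is not Mahout's code, and the sample assumes the file uses `<catalog>` (as the widely used books.xml sample does), in which case the configured `<catlog>` would never match.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical, simplified illustration of start/end-tag record splitting.
// NOT Mahout's XmlInputFormat source: it works on an in-memory String.
public class XmlRecordSplitter {

    // Returns each substring that starts with startTag and ends with the next
    // endTag (both tags included), scanning left to right with no nesting.
    public static List<String> split(String input, String startTag, String endTag) {
        List<String> records = new ArrayList<>();
        int from = 0;
        while (true) {
            int start = input.indexOf(startTag, from);
            if (start < 0) {
                break; // no further start tag: no more records
            }
            int end = input.indexOf(endTag, start + startTag.length());
            if (end < 0) {
                break; // start tag without an end tag: incomplete record is dropped
            }
            int stop = end + endTag.length();
            records.add(input.substring(start, stop));
            from = stop;
        }
        return records;
    }

    public static void main(String[] args) {
        String xml = "<catalog><book id=\"1\"><title>A</title></book></catalog>";
        // Tags that match the data yield one record...
        System.out.println(split(xml, "<catalog>", "</catalog>").size());
        // ...while a misspelled tag yields none, i.e. "Map input records=0".
        System.out.println(split(xml, "<catlog>", "</catlog>").size());
    }
}
```

With this sample, `main` prints 1 and then 0, which is why checking the tag spelling against the actual file is a cheap first step.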




Code for the XMLParse class:


import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;

import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;

import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

class XMLParse {
    public Document doc;

    // "xml" is the text of one record handed to the mapper, not a file name
    public XMLParse(String xml) throws ParserConfigurationException, SAXException, IOException
    {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        DocumentBuilder builder = factory.newDocumentBuilder();

        // new InputSource(String) treats the string as a system ID (URI); wrap the XML text in a reader instead
        InputSource in = new InputSource(new StringReader(xml));

        doc = builder.parse(in);
    }

    public List<Integer> getBookId()
    {
        List<Integer> bookid = new ArrayList<Integer>();
        // id is an attribute of <book id="...">, not an element, so read it from each <book> element
        NodeList nlist = doc.getElementsByTagName("book");

        for (int i = 0; i < nlist.getLength(); i++)
        {
            String id = ((Element) nlist.item(i)).getAttribute("id");
            bookid.add(Integer.parseInt(id));
        }
        return bookid;
    }

    public String getAuthor()
    {
        NodeList nList = doc.getElementsByTagName("author");
        return nList.item(0).getTextContent();
    }

    public String getTitle()
    {
        NodeList nList = doc.getElementsByTagName("title");
        return nList.item(0).getTextContent();
    }

    public String getGenre()
    {
        NodeList nlist = doc.getElementsByTagName("genre");
        return nlist.item(0).getTextContent();
    }

    public int getPrice()
    {
        NodeList nList = doc.getElementsByTagName("price");
        return Integer.parseInt(nList.item(0).getTextContent());
    }

    public String getDescription()
    {
        NodeList nList = doc.getElementsByTagName("description");
        return nList.item(0).getTextContent();
    }

    public String getPublishDate()
    {
        NodeList nlist = doc.getElementsByTagName("publish_date");
        return nlist.item(0).getTextContent();
    }
}
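When the mapper receives an XML record as text, it has to be parsed from the string itself: `new InputSource(String)` interprets its argument as a system ID (a URI to open), not as XML content, so the text should be wrapped in a `StringReader`. Two related points: `id` in `<book id="...">` is an attribute and is read with `getAttribute`, and per-book fields should be looked up inside each `<book>` element, because `item(0)` on the whole document returns the first book's value every time. The sketch below shows this; the class name `BookRecordParser`, the tab-separated output line, and the sample element layout are assumptions based on the tag names in the code above.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;

import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;

import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

// Hypothetical sketch: parse one XML record held in a String and build one
// line per <book>, assuming <book id="..."><author>..</author><title>..</title></book>.
public class BookRecordParser {

    public static List<String> parse(String record)
            throws ParserConfigurationException, SAXException, IOException {
        DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
        // Parse the text itself; a bare String in InputSource would be taken as a URI.
        Document doc = builder.parse(new InputSource(new StringReader(record)));

        List<String> lines = new ArrayList<>();
        NodeList books = doc.getElementsByTagName("book");
        for (int i = 0; i < books.getLength(); i++) {
            Element book = (Element) books.item(i);
            // "id" is an attribute; a tag-name lookup such as "book id" never matches.
            String id = book.getAttribute("id");
            // Look up children of THIS book, not item(0) of the whole document.
            lines.add(id + "\t" + child(book, "author") + ", " + child(book, "title"));
        }
        return lines;
    }

    // Text of the first descendant element with the given tag, or "" if absent.
    private static String child(Element parent, String tag) {
        NodeList nodes = parent.getElementsByTagName(tag);
        return nodes.getLength() > 0 ? nodes.item(0).getTextContent() : "";
    }

    public static void main(String[] args) throws Exception {
        String record = "<catalog>"
                + "<book id=\"1\"><author>Ann</author><title>First</title></book>"
                + "<book id=\"2\"><author>Bob</author><title>Second</title></book>"
                + "</catalog>";
        for (String line : parse(record)) {
            System.out.println(line);
        }
    }
}
```

Run on the sample record, this prints one line per book (`1`, Ann, First and `2`, Bob, Second) instead of repeating the first book's author and title for every id.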
 
arumugarani sundaram
Greenhorn
Posts: 9
Hi,


I feel the XMLParse class itself has some issues.

Did you check the output of the XMLParse class on its own? Is it returning the results you expect?

Can you post your XML so we can debug further?

Thanks,
Arumugarani
 