Hello Manoj,
the fact that nobody has answered your question for several days may be a hint at how hard this task is. I cannot solve it for you either, but I can show you how some other people see it:
- "Converting a Word file or an Acrobat PDF to XML is a challenge I wouldn't wish on anyone -- I'm not even sure it's possible." (http://www.xml.com/pub/a/2000/09/27/qanda.html)
- "Can I convert files from PDF to XML? No. With XMLMill it is only possible to generate PDF from XML/XSLT. It is nearly impossible to generate XML from PDF as the PDF format does not indicate which elements there are (as paragraphs, headers, footers, ...). The PDF format only defines graphical elements such as text, lines and rectangles (among others)." (http://www.xmlmill.com/controller.jsp?actionid=110)
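To illustrate why this is hard: a PDF gives you, at best, pieces of text with positions and font sizes, and any structure has to be guessed. The following is only a rough sketch of such guessing, not a working converter. The `Fragment` class, the `fragments_to_xml` function, the tag names and the thresholds are all made up for this example; a real PDF library would have to supply the fragments, and real documents (columns, tables, footers) will break heuristics this simple.

```python
from dataclasses import dataclass
from xml.sax.saxutils import escape


@dataclass
class Fragment:
    """Hypothetical stand-in for one positioned piece of text from a PDF page."""
    text: str
    x: float          # distance from the left edge of the page
    y: float          # distance from the top edge of the page
    font_size: float


def fragments_to_xml(fragments, body_size=10.0, line_tolerance=2.0):
    """Guess lines and headings from positions alone (assumed heuristics)."""
    # Read the page top to bottom, then left to right.
    ordered = sorted(fragments, key=lambda f: (f.y, f.x))

    # Fragments whose vertical positions are close together form one line.
    lines = []
    for frag in ordered:
        if lines and abs(lines[-1][0].y - frag.y) <= line_tolerance:
            lines[-1].append(frag)
        else:
            lines.append([frag])

    parts = ["<document>"]
    for line in lines:
        line.sort(key=lambda f: f.x)
        text = escape(" ".join(f.text for f in line))
        # Assumption: anything set larger than body text is a heading.
        is_heading = max(f.font_size for f in line) > body_size
        tag = "heading" if is_heading else "para"
        parts.append(f"  <{tag}>{text}</{tag}>")
    parts.append("</document>")
    return "\n".join(parts)
```

Calling `fragments_to_xml` with a handful of `Fragment` objects returns a small XML string with one `heading` or `para` element per guessed line. Everything beyond that (paragraph merging, footers, tables) would need more guesswork of the same kind, which is the point the two quotes above are making.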
Nevertheless, there *are* some libraries, for example at http://www.pdf2text.com. But this one is for Windows only... :-(
Another "solution" is TextCafé by http://www.texterity.com. But this runs as a service...
So I think that if you do find a solution that really works, it will probably not be free and/or open source.
For an overview of PDF libraries, also look at http://www.geocities.com/marcoschmidt.geo/java-libraries-pdf.html
Hope it helps
Detlev