I want to programmatically (using Java) convert a PDF to XML, i.e., my application is given a PDF file as input and needs to convert it into a meaningful XML (or HTML) file. Please let me know whether this is possible. Secondly, my PDF file contains some superscripts and subscripts; does your product support converting these?
My operating system is Mac OS X. Also, please advise me how to access Adobe Acrobat through a Java API. It would be of great help if you could respond ASAP, as we urgently need this.

Thanks,
Manoj
Hello Manoj,

the fact that nobody has answered your question for several days may be a hint at how hard this task is. I cannot help you either, but I can show how some other people look at this:

- "Converting a Word file or an Acrobat PDF to XML is a challenge I wouldn't wish on anyone -- I'm not even sure it's possible." (http://www.xml.com/pub/a/2000/09/27/qanda.html)
- "Can I convert files from PDF to XML ? No. With XMLMill it is only possible to generate PDF from XML/XSLT. It is nearly impossible to generate XML from PDF as the PDF format does not indicate which elements there are (as paragraphs, headers, footers, ...). The PDF format only defines graphical elements such as text, lines and rectangles (among others)." (http://www.xmlmill.com/controller.jsp?actionid=110)
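To make the second quote concrete, here is a rough Java sketch of the kind of guessing a converter has to do. It assumes some PDF library has already given you positioned text runs; the `TextRun` class, its fields, and the "smaller font plus shifted baseline" rule are my own illustrative assumptions, not the API of any real product. The point is only that superscripts and subscripts are not marked in the PDF and have to be inferred from geometry.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical stand-in for what a PDF text extractor might report:
// a run of characters plus where and how large it was drawn.
final class TextRun {
    final String text;
    final double baselineY; // PDF user space: larger y is higher on the page
    final double fontSize;

    TextRun(String text, double baselineY, double fontSize) {
        this.text = text;
        this.baselineY = baselineY;
        this.fontSize = fontSize;
    }
}

final class RunsToXml {
    // Escape the characters that may not appear raw in XML text content.
    static String escape(String s) {
        return s.replace("&", "&amp;").replace("<", "&lt;").replace(">", "&gt;");
    }

    // Heuristic (an assumption, not a standard): the largest run on the line
    // is body text; a smaller run drawn above its baseline is a superscript,
    // a smaller run drawn below it is a subscript.
    static String lineToXml(List<TextRun> runs) {
        if (runs.isEmpty()) {
            return "<line/>";
        }
        TextRun body = runs.get(0);
        for (TextRun r : runs) {
            if (r.fontSize > body.fontSize) {
                body = r;
            }
        }
        StringBuilder xml = new StringBuilder("<line>");
        for (TextRun r : runs) {
            String tag = null;
            if (r.fontSize < body.fontSize) {
                if (r.baselineY > body.baselineY) {
                    tag = "sup";
                } else if (r.baselineY < body.baselineY) {
                    tag = "sub";
                }
            }
            if (tag == null) {
                xml.append(escape(r.text));
            } else {
                xml.append('<').append(tag).append('>')
                   .append(escape(r.text))
                   .append("</").append(tag).append('>');
            }
        }
        return xml.append("</line>").toString();
    }

    public static void main(String[] args) {
        List<TextRun> water = Arrays.asList(
            new TextRun("H", 100, 12),
            new TextRun("2", 97, 8),   // smaller and lower than the body text
            new TextRun("O", 100, 12));
        System.out.println(lineToXml(water)); // prints <line>H<sub>2</sub>O</line>
    }
}
```

Real documents break such a rule quickly (footnote markers, mixed font sizes, rotated text), which is probably why the people quoted above are so pessimistic.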
Nevertheless, there *are* some libraries, for example at http://www.pdf2text.com. But this one is for Windows only... :-( Another "solution" is TextCafé by http://www.texterity.com, but this runs as a service. So I think that if you do find a solution that really works, it will probably not be free and/or open source. For an overview of PDF libraries, also look at http://www.geocities.com/marcoschmidt.geo/java-libraries-pdf.html

Hope it helps,
Detlev