PDF reader

Hi

I want to convert a PDF to XML programmatically (using Java). That is, my application receives a PDF file as input and needs to convert it into a meaningful XML (or HTML) file.
(Please let me know whether this is possible.)
Secondly, my PDF file contains some superscripts and subscripts. Does your product support converting them?
It would be a great help if you could respond ASAP, as we need this urgently.

My operating system is Mac OS X.
Also, please advise me on how to access Adobe Acrobat through a Java API.
Thx
Manoj
 
Hello Manoj,
the fact that nobody has answered your question for several days may hint at how hard this task is. I can't help you directly either, but here is how some other people see it:
- "Converting a Word file or an Acrobat PDF to XML is a challenge I wouldn't wish on anyone -- I'm not even sure it's possible." (http://www.xml.com/pub/a/2000/09/27/qanda.html)
- "Can I convert files from PDF to XML ? No. With XMLMill it is only possible to generate PDF from XML/XSLT. It is nearly impossible to generate XML from PDF as the PDF format does not indicate which elements there are (as paragraphs, headers, footers, ...). The PDF format only defines graphical elements such as text, lines and rectangles (among others)." (http://www.xmlmill.com/controller.jsp?actionid=110)
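As the second quote says, a PDF only records where glyphs are drawn, so superscripts and subscripts have to be inferred from their position on the page. Below is a minimal sketch of that idea, assuming some extractor has already produced runs of text with a baseline y coordinate and a font size. `TextRun`, `lineToXml` and the 15% threshold are hypothetical names and values, not part of any library, and y is assumed to grow upward as in PDF user space.

```java
import java.util.List;

public class SubSupSketch {

    /** Hypothetical input: a run of text with its baseline y and font size.
     *  Assumes PDF user-space coordinates, where larger y means higher on the page. */
    static final class TextRun {
        final String text;
        final double baselineY;
        final double fontSize;

        TextRun(String text, double baselineY, double fontSize) {
            this.text = text;
            this.baselineY = baselineY;
            this.fontSize = fontSize;
        }
    }

    // Assumed threshold: a baseline shift larger than 15% of the body
    // font size counts as raised (superscript) or lowered (subscript).
    static final double SHIFT_RATIO = 0.15;

    /** Converts one line of runs to XML, tagging raised runs as sup and lowered runs as sub. */
    static String lineToXml(List<TextRun> runs) {
        // Treat the run with the largest font as body text; its baseline is the reference.
        TextRun body = runs.get(0);
        for (TextRun r : runs) {
            if (r.fontSize > body.fontSize) {
                body = r;
            }
        }
        double threshold = SHIFT_RATIO * body.fontSize;

        StringBuilder xml = new StringBuilder("<line>");
        for (TextRun r : runs) {
            double shift = r.baselineY - body.baselineY;
            String text = escape(r.text);
            if (shift > threshold) {
                xml.append("<sup>").append(text).append("</sup>");
            } else if (shift < -threshold) {
                xml.append("<sub>").append(text).append("</sub>");
            } else {
                xml.append(text);
            }
        }
        return xml.append("</line>").toString();
    }

    /** Escapes the characters that are special in XML text content. */
    static String escape(String s) {
        return s.replace("&", "&amp;").replace("<", "&lt;").replace(">", "&gt;");
    }

    public static void main(String[] args) {
        // Made-up coordinates for "E = mc2" and "H2O" as an extractor might report them.
        List<TextRun> einstein = List.of(
                new TextRun("E = mc", 700.0, 12.0),
                new TextRun("2", 704.0, 8.0));
        List<TextRun> water = List.of(
                new TextRun("H", 680.0, 12.0),
                new TextRun("2", 677.0, 8.0),
                new TextRun("O", 680.0, 12.0));
        System.out.println(lineToXml(einstein)); // <line>E = mc<sup>2</sup></line>
        System.out.println(lineToXml(water));    // <line>H<sub>2</sub>O</line>
    }
}
```

Real documents need more care (rotated text, several font sizes on one line, text drawn with the PDF text-rise operator), so this is only a starting point.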

Nevertheless, there *are* some libraries, for example the one at http://www.pdf2text.com. But this one is for Windows only... :-(
Another "solution" is TextCafé by http://www.texterity.com, but it runs as a service...
So I suspect that any (really working) solution you find will not be free and/or open source.
For an overview of PDF libraries, also have a look at http://www.geocities.com/marcoschmidt.geo/java-libraries-pdf.html
Hope it helps
Detlev
 
http://www.pdfbox.org/ might be helpful.
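PDFBox's `PDFTextStripper` class can pull the plain text out of a PDF; that step needs the PDFBox jar and is not shown here. Assuming you already have that text as a `String`, a minimal sketch of turning it into XML with only the JDK's `javax.xml` APIs might look like this. The class and method names, the `<document>`/`<para>` element names, and the rule that a blank line starts a new paragraph are all hypothetical choices, not anything PDFBox produces.

```java
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

public class TextToXmlSketch {

    /** Hypothetical helper: wraps each blank-line-separated block of text in a <para> element. */
    static String toXml(String extractedText) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder().newDocument();
            Element root = doc.createElement("document");
            doc.appendChild(root);

            // Assumption: a blank line marks a paragraph break; single line
            // breaks inside a block are just wrapped lines and become spaces.
            for (String block : extractedText.split("\\R\\s*\\R")) {
                String para = block.trim().replaceAll("\\s*\\R\\s*", " ");
                if (para.isEmpty()) {
                    continue;
                }
                Element p = doc.createElement("para");
                p.setTextContent(para); // the serializer escapes & and < for us
                root.appendChild(p);
            }

            // Identity transform serializes the DOM tree to a string.
            Transformer t = TransformerFactory.newInstance().newTransformer();
            t.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
            StringWriter out = new StringWriter();
            t.transform(new DOMSource(doc), new StreamResult(out));
            return out.toString();
        } catch (Exception e) {
            // ParserConfigurationException / TransformerException wrapped for brevity.
            throw new IllegalStateException("XML conversion failed", e);
        }
    }

    public static void main(String[] args) {
        // Stand-in for text returned by a PDF text extractor.
        String text = "PDF reader\n\nPDF has no paragraph markup,\nso wrapped lines\n"
                + "must be rejoined.\n\nA < B & C";
        System.out.println(toXml(text));
    }
}
```

Building a DOM rather than concatenating strings means escaping is handled by the serializer; guessing headings, tables or reading order is the hard part that the quotes above warn about.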
 