Generate text from PDF

 
accnit Jai
Ranch Hand
Posts: 33
I have a requirement to convert a PDF document to HTML5. I do not want to use any available tool to achieve this; I want to write my own code. Being a Java developer, I started with iText, but I saw that iText just extracts the text from the PDF and does not keep the formatting and layout of the PDF.

Can someone please advise which API I should use to achieve this? Below are my high-level requirements.

1-Extract the text from the PDF without losing the formatting and layout.

2-Extract the images, if any.

3-Retain the formatting in the converted HTML5 page so that it matches the PDF page.

Thanks in Advance.
 
Ulf Dittmer
Rancher
Posts: 42975
I'm confused - you do not want to use any available tool (why? PDF is hugely complicated, do you really want to write all that code yourself?), but you considered using iText? There's a disconnect that you need to resolve for us before we can usefully recommend an approach.

AFAIK there is no free tool to convert PDF to anything that keeps the formatting. You can use the PDFRenderer project as a basis - it can display PDFs in Swing, so obviously it knows what to do with the formatting information.
 
accnit Jai
Ranch Hand
Posts: 33
Thanks Ulf, and sorry for the confusion. What I meant is that I do not want to use any paid software. I am looking for an open source Java API. I wrote a program using iText, but it just extracts the text from the PDF.
 
Ulf Dittmer
Rancher
Posts: 42975
As I said, I'm unaware of any free tool that extracts layout information from PDFs. If you are prepared to put a lot of work into it, you can go the route I suggested with the PDFRenderer source code.
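
To give an idea of what the last step of that route might look like, here is a minimal sketch. It assumes you have already obtained positioned text runs from the PDF by some other means (for example by adapting the PDFRenderer code); it does not parse PDF itself. `TextChunk` and `PdfLayoutToHtml` are names made up for this example and are not part of any library. PDF user space has its origin at the bottom-left of the page and is measured in points, so the sketch flips the y axis to get a CSS `top` offset. Treating the font size as the height above the baseline is only a rough approximation.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Locale;

// Hypothetical value type: one run of text with its position on the PDF page.
// Coordinates are in points, with the origin at the bottom-left of the page.
final class TextChunk {
    final String text;
    final double x;        // left edge of the run
    final double baseline; // height of the text baseline above the page bottom
    final double fontSize; // font size in points

    TextChunk(String text, double x, double baseline, double fontSize) {
        this.text = text;
        this.x = x;
        this.baseline = baseline;
        this.fontSize = fontSize;
    }
}

// Hypothetical helper: turns positioned text runs into an HTML5 page
// that places each run with absolute CSS positioning.
final class PdfLayoutToHtml {

    // Escape the characters that are significant in HTML text content.
    static String escape(String s) {
        return s.replace("&", "&amp;").replace("<", "&lt;").replace(">", "&gt;");
    }

    static String toHtml(double pageWidth, double pageHeight, List<TextChunk> chunks) {
        StringBuilder sb = new StringBuilder();
        sb.append("<!DOCTYPE html>\n<html><head><meta charset=\"utf-8\"></head><body>\n");
        // One relatively positioned container per page, sized like the PDF page.
        sb.append(String.format(Locale.ROOT,
                "<div style=\"position:relative;width:%.1fpt;height:%.1fpt\">\n",
                pageWidth, pageHeight));
        for (TextChunk c : chunks) {
            // Flip the y axis: CSS measures from the top, PDF from the bottom.
            // Approximation (assumption): the glyphs rise one font size above the baseline.
            double top = pageHeight - c.baseline - c.fontSize;
            sb.append(String.format(Locale.ROOT,
                    "<span style=\"position:absolute;left:%.1fpt;top:%.1fpt;"
                            + "font-size:%.1fpt;white-space:pre\">%s</span>\n",
                    c.x, top, c.fontSize, escape(c.text)));
        }
        sb.append("</div>\n</body></html>\n");
        return sb.toString();
    }

    public static void main(String[] args) {
        // A US Letter page (612 x 792 points) with two made-up text runs.
        List<TextChunk> chunks = Arrays.asList(
                new TextChunk("Title", 72, 720, 18),
                new TextChunk("Body text & more", 72, 690, 12));
        System.out.print(toHtml(612, 792, chunks));
    }
}
```

Images could be handled the same way, with an absolutely positioned `img` element per extracted image. Fonts, rotated text and vector graphics are where most of the real work lies, and none of that is covered here.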
 