Reading PDF text with font styles

 
Sanjoo Singh
Ranch Hand
Posts: 33
I am working on a project where I need to convert PDF to XML and XSLT.

I am able to extract the text from a PDF, but I cannot read the layout and formatting information of the text and paragraphs. Specifically, I want to read the font size, font name, style, color, and other formatting attributes of each piece of text or paragraph.

I have tried iText and PDFBox but have not been able to work out a solution.

Any help in this regard is highly appreciated.
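
To make the target of the conversion concrete, here is a minimal sketch of the XML side only, assuming the font information has already been extracted by some PDF library. The `StyledRun` class, the `paragraph` and `run` element names, and the attribute names are all hypothetical and chosen for illustration; they do not come from iText, PDFBox, or any other library mentioned in this thread. Only the JDK's built-in XML APIs are used.

```java
import java.io.StringWriter;
import java.util.Arrays;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

public class StyledRunsToXml {

    /** Hypothetical holder for one run of text and its formatting (not a library type). */
    public static final class StyledRun {
        final String text;
        final String fontName;
        final float fontSize; // in points
        final String color;   // assumed to be a hex string such as "#000000"

        public StyledRun(String text, String fontName, float fontSize, String color) {
            this.text = text;
            this.fontName = fontName;
            this.fontSize = fontSize;
            this.color = color;
        }
    }

    /** Serializes the runs as one <paragraph> element with a <run> child per styled run. */
    public static String toXml(List<StyledRun> runs) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder().newDocument();
        Element paragraph = doc.createElement("paragraph");
        doc.appendChild(paragraph);

        for (StyledRun run : runs) {
            Element element = doc.createElement("run");
            element.setAttribute("font-name", run.fontName);
            element.setAttribute("font-size", String.valueOf(run.fontSize));
            element.setAttribute("color", run.color);
            // The DOM serializer escapes special characters such as '<' and '&'.
            element.setTextContent(run.text);
            paragraph.appendChild(element);
        }

        Transformer transformer = TransformerFactory.newInstance().newTransformer();
        transformer.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
        StringWriter out = new StringWriter();
        transformer.transform(new DOMSource(doc), new StreamResult(out));
        return out.toString();
    }

    public static void main(String[] args) throws Exception {
        // Sample values made up for the demonstration.
        List<StyledRun> runs = Arrays.asList(
                new StyledRun("Hello ", "Helvetica-Bold", 12f, "#000000"),
                new StyledRun("world", "Times-Italic", 10.5f, "#FF0000"));
        System.out.println(toXml(runs));
    }
}
```

Keeping the formatting as attributes on each run means an XSLT stylesheet can later match on them (for example, selecting runs by `@font-name`), which is why the sketch uses attributes rather than nested elements. The extraction step that fills in the `StyledRun` values is the open question of this thread and is not covered here.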
 
Ulf Dittmer
Rancher
Posts: 42974
iText and PDFBox can't do that.

The http://java.net/projects/pdf-renderer/ library can display PDFs, so it includes code that extracts layout information from PDFs; you can try to find the bits and pieces that are of interest to you in that.
 
Sanjoo Singh
Ranch Hand
Posts: 33
Thanks for the help. I am going through it and will update you soon.
 
Sanjoo Singh
Ranch Hand
Posts: 33
The PDF-renderer project is pretty big, and analyzing it is consuming a considerable amount of time (cont..).

Is there any other lightweight library (jar) available that would achieve the same result in a shorter span of time?
 
Ulf Dittmer
Rancher
Posts: 42974
No non-commercial ones.
 
Sanjoo Singh
Ranch Hand
Posts: 33
Thanks. I have found a lightweight library: the JPOD PDF library. It is very small and provides everything I wanted to extract from the PDF.
 
Greenhorn
Posts: 1
I am trying to do the same thing. Could you share your code, or at least provide an example of how to get started?
Thanks for your time.
 
Greenhorn
Posts: 1
Hi.
I am working on my final year project and I am facing the same problem. If you could please share your code snippet, it would be really helpful.
Thank you!
 