
How to convert HTML to PDF?

 
Imre Tokai
Ranch Hand
Posts: 130
Hello!


I want to convert, for example, http://www.google.com to PDF.
I found some iText-based solutions, but I still haven't gotten them to work. Does anyone have experience with this? Any other suggestions?

All useful hints are welcome!


Regards
 
Ulf Dittmer
Rancher
Posts: 42968
Do you mean for arbitrary HTML? That's tough. Any solution would probably start by changing the HTML into well-formed XML (using a library like TagSoup or NekoHTML), and then parsing that XML and creating the PDF as appropriate.

If it were CSS-styled XHTML, you could use https://xhtmlrenderer.dev.java.net/
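
To see why arbitrary HTML needs that clean-up step first, here is a minimal sketch that only checks whether a piece of markup already parses as well-formed XML. It uses nothing but the JDK's built-in XML parser; the class and method names are made up for illustration, and it does not do any repairing itself (that would be the job of TagSoup or NekoHTML).

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.xml.sax.ErrorHandler;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.SAXParseException;

// Hypothetical helper, not part of any library mentioned in this thread.
public class WellFormedCheck {

    /** Returns true if the markup parses as well-formed XML, false otherwise. */
    public static boolean isWellFormedXml(String markup) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            // Do not download external DTDs (e.g. the XHTML DTD) while parsing.
            factory.setFeature(
                "http://apache.org/xml/features/nonvalidating/load-external-dtd", false);
            DocumentBuilder builder = factory.newDocumentBuilder();
            // Rethrow parse errors instead of printing them to stderr.
            builder.setErrorHandler(new ErrorHandler() {
                public void warning(SAXParseException e) { }
                public void error(SAXParseException e) throws SAXException { throw e; }
                public void fatalError(SAXParseException e) throws SAXException { throw e; }
            });
            builder.parse(new InputSource(new StringReader(markup)));
            return true;
        } catch (Exception e) {
            // Any parser or configuration problem is treated as "not well-formed".
            return false;
        }
    }

    public static void main(String[] args) {
        // Typical browser-tolerated HTML: <p> and <br> are never closed.
        System.out.println(isWellFormedXml("<html><body><p>Hello<br></body></html>"));
        // The same content written as well-formed XML.
        System.out.println(isWellFormedXml("<html><body><p>Hello<br/></p></body></html>"));
    }
}
```

Most real-world pages fail a check like this, which is why an XML-based renderer cannot be pointed at them directly.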
 
Imre Tokai
Ranch Hand
Posts: 130
Thank you for the reply, Ulf!


I mean conversion of generic HTML. I suppose I'm not the first one who needs this...
Any ideas or solutions?


Regards
 
Imre Tokai
Ranch Hand
Posts: 130
I found:
http://html-to-pdf-converter-free.software.informer.com/

How can I create an application like this?


Regards

 
Joe Ess
Bartender
Posts: 9312
I used the Open Office Java API to open HTML documents and then export them as PDFs.
 
Imre Tokai
Ranch Hand
Posts: 130
Thank you Joe!


I'm new to Open Office.
Could you please post more details about the setup and the Java code for PDF generation?
I posted this on the Open Office forum too.


Regards
 
Ulf Dittmer
Rancher
Posts: 42968
Using OO in server mode is decidedly non-trivial - the API has a steep learning curve. But I believe that the JODConverter library makes the process significantly easier, so you may want to look into that first.
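
As a rough idea of what "server mode" means in practice: the office suite is started headless and told to listen on a socket, and a library such as JODConverter then connects to that socket. The sketch below only assembles the start command. It assumes an `soffice` executable on the PATH and uses port 8100 purely as an example; the single-dash options are the older OpenOffice.org style (newer LibreOffice builds expect double dashes). The class name is made up for illustration.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical helper that builds the command line for a headless office listener.
public class OfficeServerCommand {

    /** Builds the argument list; nothing is started here. */
    public static List<String> serverCommand(String executable, int port) {
        return Arrays.asList(
            executable,
            "-headless",                                             // no GUI
            "-accept=socket,host=127.0.0.1,port=" + port + ";urp;",  // listen locally for UNO clients
            "-nofirststartwizard");                                  // skip the first-run dialog
    }

    public static void main(String[] args) {
        List<String> command = serverCommand("soffice", 8100); // assumed name and port
        System.out.println(String.join(" ", command));
        // To actually launch it (requires an installed office suite):
        // new ProcessBuilder(command).inheritIO().start();
    }
}
```

Passing the arguments as a list to `ProcessBuilder` avoids the shell quoting that the `-accept` string would otherwise need.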
 
Imre Tokai
Ranch Hand
Posts: 130
I found JODConverter at:
http://sourceforge.net/projects/jodconverter/
I successfully started the version for Tomcat.

It doesn't seem to support the html extension? Are there any useful hints or code for adapting JODConverter to convert HTML pages to PDF? I'd like to pass in a link and get the page back as a PDF document.

Regards
 
Darya Akbari
Ranch Hand
Posts: 1855
iText is definitely the wrong API for what you want. Have you heard of DocBook XML and DocBook XSL? Give them a try; they convert not only HTML but a lot more.
 
Imre Tokai
Ranch Hand
Posts: 130
Thank you Darya!


I haven't worked with DocBook XML and DocBook XSL yet.
I'm looking for examples.

JODConverter is working on Tomcat. That is exactly what I need.
If you have any hints that would speed up my digging, please post them.


Regards
 
Darya Akbari
Ranch Hand
Posts: 1855
DocBook XML and DocBook XSL are meant for technical writing, such as producing technical documents. When you have an HTML document, you can transform it into DocBook XML and from there to PDF.
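
The "transform" step in a chain like this is plain XSLT, which the JDK can run out of the box. The following is only a sketch of that mechanism: the tiny stylesheet is a made-up stand-in that maps two XHTML elements to DocBook-style `title` and `para` elements, not the real DocBook XSL, and the class name is hypothetical. It also assumes the input is already well-formed, and it stops before PDF; the real DocBook XSL stylesheets emit XSL-FO, which a separate FO processor such as Apache FOP then renders.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

// Hypothetical illustration of the XSLT step only.
public class XhtmlToDocBookSketch {

    // Stand-in stylesheet (assumption): <h1> becomes <title>, <p> becomes <para>.
    static final String STYLESHEET =
        "<xsl:stylesheet version='1.0' xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>"
      + "<xsl:output method='xml' omit-xml-declaration='yes'/>"
      + "<xsl:template match='/html'>"
      + "<article><xsl:apply-templates select='body/*'/></article>"
      + "</xsl:template>"
      + "<xsl:template match='h1'><title><xsl:value-of select='.'/></title></xsl:template>"
      + "<xsl:template match='p'><para><xsl:value-of select='.'/></para></xsl:template>"
      + "</xsl:stylesheet>";

    /** Applies the given XSLT stylesheet to the given XML and returns the result as text. */
    public static String transform(String xml, String xslt) {
        try {
            Transformer transformer = TransformerFactory.newInstance()
                .newTransformer(new StreamSource(new StringReader(xslt)));
            StringWriter out = new StringWriter();
            transformer.transform(new StreamSource(new StringReader(xml)), new StreamResult(out));
            return out.toString();
        } catch (TransformerException e) {
            throw new IllegalStateException("XSLT transformation failed", e);
        }
    }

    public static void main(String[] args) {
        String xhtml = "<html><body><h1>Report</h1><p>Hello</p></body></html>";
        System.out.println(transform(xhtml, STYLESHEET));
    }
}
```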
 
Imre Tokai
Ranch Hand
Posts: 130
How can I put together a working example with DocBook XML and DocBook XSL?

In the meantime I found the class:
org.w3c.tidy.Tidy
http://jtidy.sourceforge.net/apidocs/org/w3c/tidy/Tidy.html
Tidy cleans up HTML and can output XHTML. I set it up, but it doesn't work for all websites...
Does anyone have experience with this approach?
Is there any other way to create a generic HTML (URL) to PDF converter?


Regards


 
Imre Tokai
Ranch Hand
Posts: 130
http://www.pdfonfly.com
works for all the websites that I have tried so far.

How can I build something like this in Java?


Regards
 
Darya Akbari
Ranch Hand
Posts: 1855
Imre Tokai wrote: How can I put together a working example with DocBook XML and DocBook XSL?


http://www.docbook.org/ has everything you need to know.
 
 
Imre Tokai
Ranch Hand
Posts: 130
Another posting on another forum:

http://forums.sun.com/thread.jspa?threadID=5374819&tstart=0


Regards
 
Mozellam Babich
Greenhorn
Posts: 1
Hey Imre Tokai, here are some tools you can use to convert an HTML file to a PDF file.

HTML to PDF Converters

*PDFonFly
A free online converter that takes the URL of any web page that is live on the web (and not password-protected) and converts it to a PDF file.

*PDFCrowd
A free online converter that takes a URL, an HTML file, or direct HTML input and converts it to a PDF file that is downloaded to your computer. It adds a footer with a logo and an advertisement to each page.

*Total HTML Converter
A Windows program that converts web pages by URL, or batches of HTML documents from the command line, to PDF.

*Click to Convert
A Windows program that converts HTML to PDF or PDF to HTML.

I hope this is useful to you. If you want to know more about HTML to PDF converters, have a look at the following sites:

http://www.html2pdfrocket.com/
http://betanews.com/2014/03/28/html-to-pdf-tools-a-different-kind-of-pdf-converter/
http://www.evopdf.com/


 