
Convert PDF files to Tiff files

 
Anup Bansal
Ranch Hand
Posts: 69
Hi,

I need to convert multipage PDF files to single-page TIFF files.
Can I achieve this using the JAI library?
What other ways are there to do this?

Thanks & regards,
Anup
 
Ulf Dittmer
Rancher
Posts: 42968
73
The http://pdfbox.apache.org/ library can create PNGs from PDFs. Then you can use the javax.imageio.ImageIO class to convert those to TIFFs (after TIFF-enabling ImageIO).
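
The second step (PNG to TIFF) can be sketched with ImageIO alone. This is a minimal sketch, not a definitive implementation: the class name PngToTiff and its convert method are made up for illustration, and it assumes a TIFF writer plugin is registered with ImageIO (otherwise convert simply returns false).

```java
import java.awt.image.BufferedImage;
import java.io.File;
import java.io.IOException;
import javax.imageio.ImageIO;

/** Illustrative helper (hypothetical name): re-encodes a readable image file as TIFF. */
final class PngToTiff {

    /**
     * Reads the image at source and writes it as TIFF to target.
     * Returns false if no TIFF writer plugin is registered with ImageIO.
     */
    static boolean convert(File source, File target) throws IOException {
        // ImageIO.read returns null when no registered reader understands the file.
        BufferedImage image = ImageIO.read(source);
        if (image == null) {
            throw new IOException("Unsupported or unreadable image: " + source);
        }
        // ImageIO.write returns false when no suitable writer is found.
        return ImageIO.write(image, "tiff", target);
    }
}
```

Usage would be along the lines of PngToTiff.convert(new File("page.png"), new File("page.tiff")); the file names are placeholders.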
 
Anup Bansal
Ranch Hand
Posts: 69
PDFBox seems to me more like a command-line utility than an API-based solution.
Is there an API-based way to do this?
 
Ulf Dittmer
Rancher
Posts: 42968
73
The shell script is just a thin wrapper around the Java library. Look for the PDFToImage class in the PDFBox source code, and it should be clear how it works.
 
Anup Bansal
Ranch Hand
Posts: 69
I get the following error when I use the PDFToImage class:
Throwable occurred: java.lang.NoClassDefFoundError: org.apache.fontbox.afm.FontMetric
at org.apache.pdfbox.pdmodel.font.PDFont.getAFM(PDFont.java:313)
at org.apache.pdfbox.pdmodel.font.PDFont.getFontWidthFromAFMFile(PDFont.java:262)
at org.apache.pdfbox.pdmodel.font.PDSimpleFont.getFontWidth(PDSimpleFont.java:175)
at org.apache.pdfbox.util.PDFStreamEngine.processEncodedText(PDFStreamEngine.java:323)
at org.apache.pdfbox.util.operator.ShowText.process(ShowText.java:45)
at org.apache.pdfbox.util.PDFStreamEngine.processOperator(PDFStreamEngine.java:552)
at org.apache.pdfbox.util.PDFStreamEngine.processSubStream(PDFStreamEngine.java:248)
at org.apache.pdfbox.util.PDFStreamEngine.processStream(PDFStreamEngine.java:207)
at org.apache.pdfbox.pdfviewer.PageDrawer.drawPage(PageDrawer.java:106)
at org.apache.pdfbox.pdmodel.PDPage.convertToImage(PDPage.java:698)
at org.apache.pdfbox.util.PDFImageWriter.writeImage(PDFImageWriter.java:137)
at com.abnarmo.nl.scan.pdfconvertor.split.PDFToImage.main(PDFToImage.java:204)
Caused by: java.lang.ClassNotFoundException: org.apache.fontbox.afm.FontMetric
at java.net.URLClassLoader.findClass(URLClassLoader.java:419)
at java.lang.ClassLoader.loadClass(ClassLoader.java:643)
at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:300)
at java.lang.ClassLoader.loadClass(ClassLoader.java:609)
... 12 more

It seems that the class org.apache.fontbox.afm.FontMetric is not present. When I check fontbox-0.1.0.jar, I cannot find this class. Instead, I find the class as org.fontbox.afm.FontMetric (without the "apache" in it).
I had downloaded this jar from http://pdfbox.apache.org/. Where can I find the correct fontbox.jar?

Thanks & regards,
Anup
 
Ulf Dittmer
Rancher
Posts: 42968
73
"I had downloaded this jar from http://pdfbox.apache.org/. Where can I find the correct fontbox.jar?"

At the same place. You may want to grab jempbox as well, just in case it's needed.
 
Anup Bansal
Ranch Hand
Posts: 69
Thanks, I got the correct version.
I would like to convert the PDF file to a TIFF file. So, as mentioned, the first step is to convert it into a PNG file and then using JAI convert the PNG into TIFF.

In PDFBox, the PDFImageWriter class is used to convert the PDF to the desired PNG file. However, the PNG file is created in the file system.
As I need to convert this further to a TIFF file, I would like to know if it is possible to get a byte array of the PNG file, without it actually being created in the file system, which I can use as input for the ImageIO classes.
 
Ulf Dittmer
Rancher
Posts: 42968
73
"... and then using JAI convert the PNG into TIFF."

You'd be using the TIFF-enabled ImageIO to create the TIFF. As I said, JAI is not involved.

"In PDFBox, the PDFImageWriter class is used to convert the PDF to the desired PNG file. However, the PNG file is created in the file system."

Is having an interim file an actual problem? ImageIO can read a PNG file in one line of code - it doesn't get much easier than that.

"I would like to know if it is possible to get a byte array of the PNG file, without it actually being created in the file system, which I can use as input for the ImageIO classes."

It is possible, but you'll have to dig deeper into PDFBox and ImageIO. The solution would involve adapting PDFImageWriter.writeImage to obtain a MemoryCacheImageOutputStream in the ImageIO.createImageOutputStream call instead of a FileImageOutputStream. Unless you've ascertained (how?) that creating interim PNG files is an actual problem, this is a lot of effort for little gain.
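
To illustrate the in-memory idea on the ImageIO side only, here is a rough sketch. It assumes the rendered page is already available as a java.awt.image.BufferedImage (the stack trace earlier shows PDPage.convertToImage doing the rendering; how you get hold of its result depends on your PDFBox version). The class and method names are invented for this example, and it does not modify PDFImageWriter itself.

```java
import java.awt.image.BufferedImage;
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import javax.imageio.ImageIO;
import javax.imageio.stream.ImageOutputStream;
import javax.imageio.stream.MemoryCacheImageInputStream;
import javax.imageio.stream.MemoryCacheImageOutputStream;

/** Illustrative helper (hypothetical name): encodes and decodes images without interim files. */
final class InMemoryImages {

    /** Encodes the image in the given format (e.g. "png") into a byte array. */
    static byte[] encode(BufferedImage image, String formatName) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        // The memory-cached stream buffers in RAM; closing it pushes everything into 'bytes'.
        try (ImageOutputStream out = new MemoryCacheImageOutputStream(bytes)) {
            if (!ImageIO.write(image, formatName, out)) {
                throw new IOException("No ImageIO writer registered for format: " + formatName);
            }
        }
        return bytes.toByteArray();
    }

    /** Decodes image bytes back into a BufferedImage, again without touching the disk. */
    static BufferedImage decode(byte[] data) throws IOException {
        BufferedImage image =
                ImageIO.read(new MemoryCacheImageInputStream(new ByteArrayInputStream(data)));
        if (image == null) {
            throw new IOException("No registered ImageIO reader can decode the data");
        }
        return image;
    }
}
```

If the page image is already in memory, the PNG step can arguably be skipped altogether by passing "tiff" as the format name, provided a TIFF writer is registered.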
 
Anup Bansal
Ranch Hand
Posts: 69
I am using RAD version 7.5. It does not contain the class com.sun.media.imageioimpl.plugins.tiff.TIFFImageWriterSpi.
Where can I find the jar file with the TIFF plugin?
 
Ulf Dittmer
Rancher
Posts: 42968
73
The link "TIFF-enabling ImageIO" I posted earlier explains that.
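
Once the plugin jar is on the class path, a quick check like the following can confirm that ImageIO actually sees a TIFF writer. This is only a sketch with made-up class and method names; it uses nothing beyond the standard javax.imageio API. (Newer JDKs, from Java 9 on, ship a TIFF plugin of their own, so the check may succeed there without any extra jar.)

```java
import java.util.Iterator;
import javax.imageio.ImageIO;
import javax.imageio.ImageWriter;

/** Illustrative helper (hypothetical name): reports whether ImageIO can write TIFF. */
final class TiffPluginCheck {

    /** Returns true if at least one TIFF writer is registered with ImageIO. */
    static boolean tiffWriterAvailable() {
        // Re-scan the class path so that newly added plugin jars are picked up.
        ImageIO.scanForPlugins();
        return ImageIO.getImageWritersByFormatName("tiff").hasNext();
    }

    /** Returns the class name of the first registered TIFF writer, or "none". */
    static String describeTiffWriter() {
        Iterator<ImageWriter> writers = ImageIO.getImageWritersByFormatName("tiff");
        return writers.hasNext() ? writers.next().getClass().getName() : "none";
    }
}
```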
 
Anup Bansal
Ranch Hand
Posts: 69
I am able to convert PNG format to TIFF; however, the TIFF file created is large.
How can I compress the TIFF file?

The following is a snippet of my code:
 
Ulf Dittmer
Rancher
Posts: 42968
73
TIFFs will generally be larger than PNGs or JPEGs - they use less efficient compression. Is that an actual problem?
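
If size does turn out to matter, the writer can be asked for a specific compression through ImageWriteParam. A hedged sketch follows: the class name is invented, and the compression type name "LZW" is an assumption about what the registered TIFF plugin offers, so the code checks getCompressionTypes() before using it.

```java
import java.awt.image.BufferedImage;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.util.Arrays;
import java.util.Iterator;
import javax.imageio.IIOImage;
import javax.imageio.ImageIO;
import javax.imageio.ImageWriteParam;
import javax.imageio.ImageWriter;
import javax.imageio.stream.ImageOutputStream;
import javax.imageio.stream.MemoryCacheImageOutputStream;

/** Illustrative helper (hypothetical name): writes a TIFF with an explicit compression type. */
final class TiffCompression {

    /**
     * Encodes the image as TIFF. compressionType is a plugin-defined name such as "LZW";
     * pass null to keep the writer's default settings.
     */
    static byte[] encode(BufferedImage image, String compressionType) throws IOException {
        Iterator<ImageWriter> writers = ImageIO.getImageWritersByFormatName("tiff");
        if (!writers.hasNext()) {
            throw new IOException("No TIFF writer registered with ImageIO");
        }
        ImageWriter writer = writers.next();
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ImageOutputStream out = new MemoryCacheImageOutputStream(bytes)) {
            ImageWriteParam param = writer.getDefaultWriteParam();
            if (compressionType != null) {
                // Only use a type the plugin actually advertises.
                String[] types = param.canWriteCompressed() ? param.getCompressionTypes() : null;
                if (types == null || !Arrays.asList(types).contains(compressionType)) {
                    throw new IOException("Compression type not offered: " + compressionType);
                }
                param.setCompressionMode(ImageWriteParam.MODE_EXPLICIT);
                param.setCompressionType(compressionType);
            }
            writer.setOutput(out);
            writer.write(null, new IIOImage(image, null, null), param);
        } finally {
            writer.dispose();
        }
        return bytes.toByteArray();
    }
}
```

How much this helps depends on the image content; scanned black-and-white pages and photographs behave very differently.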

BTW, you shouldn't use PNGImageReaderSpi directly. This will do nicely (and work for all supported image formats, not just PNG):
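
A minimal sketch of format-independent reading, assuming the plain ImageIO.read call is what is meant here (the wrapper class and method names are made up for illustration):

```java
import java.awt.image.BufferedImage;
import java.io.File;
import java.io.IOException;
import javax.imageio.ImageIO;

/** Illustrative helper (hypothetical name): loads any image format ImageIO has a reader for. */
final class ImageLoader {

    /** Lets ImageIO pick the reader from the file contents instead of naming a reader SPI. */
    static BufferedImage load(File file) throws IOException {
        BufferedImage image = ImageIO.read(file);
        if (image == null) {
            // ImageIO.read signals "no suitable reader" with null rather than an exception.
            throw new IOException("No registered ImageIO reader can decode " + file);
        }
        return image;
    }
}
```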

 
Anup Bansal
Ranch Hand
Posts: 69
OK, thanks for your input!
I can use the approach described in this thread to convert PDF to PNG and then to TIFF.
I want to convert a multipage PDF to single-page TIFFs.
What would be the best approach: to split the final TIFF, or to first split the PDF and then create the individual TIFFs?
Is there a better approach?
Also, at the following link I found a different approach (using Jpedal).
Are there any differences or downsides to using it?
Link using Jpedal
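
One possible way to sidestep the splitting question, sketched under the assumption that each PDF page has already been rendered to its own BufferedImage (for instance page by page via PDFBox): write one TIFF per page directly. The class name, method name, and file naming pattern below are invented for illustration, and a registered TIFF writer is assumed.

```java
import java.awt.image.BufferedImage;
import java.io.File;
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;
import javax.imageio.ImageIO;

/** Illustrative helper (hypothetical name): writes each rendered page to its own TIFF file. */
final class PerPageTiffWriter {

    /**
     * Writes pages as baseName-001.tiff, baseName-002.tiff, ... into outputDir
     * and returns the files in page order.
     */
    static List<File> writePages(List<BufferedImage> pages, File outputDir, String baseName)
            throws IOException {
        List<File> written = new ArrayList<>();
        for (int i = 0; i < pages.size(); i++) {
            File target = new File(outputDir, String.format("%s-%03d.tiff", baseName, i + 1));
            // ImageIO.write returns false when no TIFF writer is registered.
            if (!ImageIO.write(pages.get(i), "tiff", target)) {
                throw new IOException("No TIFF writer registered with ImageIO");
            }
            written.add(target);
        }
        return written;
    }
}
```

With this layout neither the PDF nor a multipage TIFF has to be split afterwards; whether that beats the Jpedal route is not something this sketch can answer.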
 