TIFF, DOC, EXCEL to PDF Converter

 
Anup Bansal
Ranch Hand
Posts: 69
Hi,

I need to write an application that converts a TIFF image, Word document or Excel file to a PDF document and stores it in the database.
It should be possible to later create a single TIFF image from this PDF document.

Can anyone please suggest APIs I could use to realise the above?
Can all this be achieved using the Java Advanced Imaging API (JAI)?

Thanks in advance,
Anup
 
Ulf Dittmer
Rancher
Posts: 43081
Creating a PDF that contains nothing but an image is quite easy using the iText library; its web site has an example that shows how to do that.

Converting Excel files is not hard; the Apache POI library can be used for reading the Excel file, and then again the iText library can be used for creating PDFs that contain tables.

Word can be dealt with in a similar manner (POI also supports it), but it'll be quite a bit trickier, especially if the file contains tables and images: the POI API for handling DOC/DOCX isn't as advanced as the one handling XLS/XLSX, and Word files have a less regular structure than Excel files.

JAI won't be of any help with this.

There are commercial packages available that can be used from Java applications; you may want to investigate those before embarking on writing your own, especially if you need to deal with complex documents - writing your own converter that handles those and generates good quality output could easily take a couple of weeks (or a month) of your time.
 
Anup Bansal
Ranch Hand
Posts: 69
Hi Ulf,

Thank you for the information.
I tried the example to convert a TIFF file into a PDF.
However, the generated PDF file contains only a part of the TIFF image.
Are there any specific parameters that we need to take care of in order to preserve the complete TIFF image?

Following is the snippet of code:
import java.io.FileOutputStream;
import java.io.IOException;

import com.itextpdf.text.Document;
import com.itextpdf.text.DocumentException;
import com.itextpdf.text.Image;
import com.itextpdf.text.Paragraph;
import com.itextpdf.text.pdf.PdfWriter;

public class TiffToPDFConversion {

    public static void main(String[] args) {

        System.out.println("Images");
        // step 1: creation of a document-object
        Document document = new Document();
        try {
            // step 2:
            // we create a writer that listens to the document
            // and directs a PDF-stream to a file
            PdfWriter.getInstance(document, new FileOutputStream("D:\\!Anup\\Project\\iText\\Temp\\Images.pdf"));
            // step 3: we open the document
            document.open();
            // step 4:
            document.add(new Paragraph("iText.tif"));
            Image tiff = Image.getInstance("D:\\!Anup\\Project\\iText\\Temp\\iText.tif");
            document.add(tiff);
        } catch (DocumentException de) {
            System.err.println(de.getMessage());
        } catch (IOException ioe) {
            System.err.println(ioe.getMessage());
        }
        // step 5: we close the document
        document.close();
    }
}
 
Ulf Dittmer
Rancher
Posts: 43081
Is the page large enough to hold the image? If not you'll need to scale it down.

The image's getWidth, getHeight, getDpiX and getDpiY methods will be helpful in figuring this out.
 
Anup Bansal
Ranch Hand
Posts: 69
The TIFF contains a logo in the upper left-hand corner. Only this logo is written to the PDF. All the data other than the logo is missing.
 
Anup Bansal
Ranch Hand
Posts: 69
Printing the values of the image shows the following:
System.out.println(tiff.getAbsoluteX());
System.out.println(tiff.getAbsoluteY());
System.out.println(tiff.getAlignment());
System.out.println(tiff.getDpiX());
System.out.println(tiff.getDpiY());
System.out.println(tiff.getHeight());
System.out.println(tiff.getWidth());
Output:
NaN
NaN
0
200
200
2309.0
1632.0

This does not make much sense to me. I suspect that only a part of the TIFF is read, rather than the complete image, and that only this part is converted to PDF.
Is there a way to influence how the image is read?
 
Ulf Dittmer
Rancher
Posts: 43081
Well, *is* the image 2309x1632 pixels large? If so, then it's probably being read correctly. It's possible that iText doesn't honor the DPI setting, in which case you'd need to do the scaling yourself - play around with the Image.scalePercent method to see if that helps.
 
Anup Bansal
Ranch Hand
Posts: 69
Thanks. The scale percentage works. But how can I determine the right percentage so that it works for every TIFF image?
Is it possible to calculate it from values fetched from the image itself?
 
Ulf Dittmer
Rancher
Posts: 43081
Well, if iText really uses 72 DPI internally no matter what (and you should make sure that's what it's doing), then you can calculate how big the image would be since you know its pixel dimensions. You also know how big the page is (based on 72 DPI), so you can perform some math with that.
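The math mentioned above could look roughly like the following. This is a minimal sketch in plain Java, not iText code: it assumes, as the post suggests, that an unscaled image is laid out at one point per pixel, and it assumes an A4 page (595 x 842 points) with 36-point margins. The class and method names are hypothetical.

```java
// Sketch only: page size and margins are assumptions (A4, 36 pt margins),
// and PageFit / fitPercent are hypothetical names, not part of iText.
final class PageFit {

    /**
     * Largest scale percentage (never above 100) at which an image of the
     * given pixel size fits inside the page area left over by the margins.
     */
    static double fitPercent(double imgWidthPx, double imgHeightPx,
                             double pageWidthPt, double pageHeightPt,
                             double marginPt) {
        double usableWidth = pageWidthPt - 2 * marginPt;
        double usableHeight = pageHeightPt - 2 * marginPt;
        // Assumption: unscaled, one pixel occupies one point (72 DPI),
        // so pixel counts can be compared directly with page points.
        double ratio = Math.min(usableWidth / imgWidthPx,
                                usableHeight / imgHeightPx);
        // Never enlarge an image that already fits.
        return Math.min(100.0, ratio * 100.0);
    }

    public static void main(String[] args) {
        // Pixel dimensions reported earlier in the thread: 1632 x 2309.
        double percent = fitPercent(1632, 2309, 595, 842, 36);
        System.out.printf("scale to %.2f%% to fit inside the margins%n", percent);
    }
}
```

The resulting value would be passed to Image.scalePercent before the image is added to the document. Taking the smaller of the two ratios keeps the aspect ratio intact.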
 
Anup Bansal
Ranch Hand
Posts: 69
Thanks Ulf!

It seems that by default it is indeed 72 DPI for iText.
Checked the following site: http://www.mail-archive.com/[email protected]/msg44706.html
Also, on testing it I found that with a scaling percentage of 72/200 * 100 = 36, my TIFF fits perfectly in the PDF.
(200 being the DPI for my Tif image)
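The calculation above generalizes to any image whose DPI is known. The following is a small sketch in plain Java, assuming the 72 DPI default concluded in this thread. The class and method names are hypothetical, and treating a non-positive DPI as "no resolution stored" is an assumption of this sketch.

```java
// Sketch: assumes an unscaled image is laid out at 72 pixels per inch,
// as concluded in the thread. DpiScale and its methods are hypothetical names.
final class DpiScale {

    static final double LAYOUT_DPI = 72.0;

    /** Percentage that renders an image of the given DPI at its physical size. */
    static double percentForDpi(double imageDpi) {
        // Assumption: a non-positive DPI means the file stores no resolution,
        // in which case the image is left unscaled.
        if (imageDpi <= 0) {
            return 100.0;
        }
        return 100.0 * LAYOUT_DPI / imageDpi;
    }

    /** Size in points that a pixel dimension occupies after DPI scaling. */
    static double scaledPoints(double pixels, double imageDpi) {
        return pixels * percentForDpi(imageDpi) / 100.0;
    }

    public static void main(String[] args) {
        // Values from the thread: 1632 x 2309 pixels at 200 DPI.
        System.out.println("scale percent: " + percentForDpi(200));
        System.out.println("width in points: " + scaledPoints(1632, 200));
        System.out.println("height in points: " + scaledPoints(2309, 200));
    }
}
```

For the image in this thread the scaled size comes to roughly 588 x 831 points, which is just inside an A4 page of 595 x 842 points. If getDpiX and getDpiY differ, the two-argument Image.scalePercent(percentX, percentY) can take one percentage per axis.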
 
Anup Bansal
Ranch Hand
Posts: 69
The image conversion worked as per your suggestions.
I am, however, confused about how to do the same for .doc/.docx/.xls/.xlsx files.

In one of your posts you have mentioned:
"I'd probably create the PDF at the same as the XLS file, using the iText API. Or, if it's not feasible to do it at the same time, use POI to open it later, and then use iText to create the PDF."
https://coderanch.com/t/420976/Other-Open-Source-Projects/Java-API-convert-Excel-PDF


I could not locate any method in POI that reads the doc/x, xls/x files in one go and whose output could be fed directly to an iText method to get the PDF.
Data from a Word or Excel file can be extracted part by part and fed to iText for PDF creation. However, all the formatting is lost.

Is it actually possible to use POI with iText to convert doc/x, xls/x to PDF?


 
Ulf Dittmer
Rancher
Posts: 43081

I could not locate any method in POI that reads the doc/x, xls/x files in one go and whose output could be fed directly to an iText method to get the PDF.


You're right, there's no such method - you'll have to code that yourself. For XLS/X documents you'd use POI to read the cell contents and formatting, and create PDF tables as appropriate using the iText API. Same for DOC/X, except that the range of possible inputs in a text document is much wider (text, images, tables, ...) and consequently the code will be more complicated than for spreadsheet documents. My first post in this topic talks about this.
 
Anup Bansal
Ranch Hand
Posts: 69
This would make the entire processing really heavy.
We will be receiving thousands of documents in DOC/X and XLS/X formats to be converted to PDF.
Is there no other API that I could use, such as OpenOffice or JODConverter?
 
Ulf Dittmer
Rancher
Posts: 43081
I don't understand what you mean by "really heavy".

JODConverter is certainly an option if you can require OO to be installed, and the resulting documents properly reflect the input documents.
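As an illustration of the "office suite installed" route, here is a rough sketch that does not use JODConverter at all. Instead it shells out to LibreOffice's command-line converter (soffice --headless --convert-to pdf). It assumes that a LibreOffice installation provides the soffice binary; the class name, method names and file paths are hypothetical. As noted above, the output still has to be checked against the input documents.

```java
import java.io.IOException;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.List;

// Sketch only: this calls the LibreOffice command line directly instead of
// JODConverter. OfficeToPdf and its methods are hypothetical names.
final class OfficeToPdf {

    /** Builds the headless conversion command for one input document. */
    static List<String> command(String officeBinary, Path input, Path outputDir) {
        return List.of(officeBinary, "--headless", "--convert-to", "pdf",
                       "--outdir", outputDir.toString(), input.toString());
    }

    /** Runs the conversion and returns the exit code of the office process. */
    static int convert(String officeBinary, Path input, Path outputDir)
            throws IOException, InterruptedException {
        Process process = new ProcessBuilder(command(officeBinary, input, outputDir))
                .inheritIO()   // show the converter's own messages on the console
                .start();
        return process.waitFor();
    }

    public static void main(String[] args) throws Exception {
        if (args.length < 2) {
            System.out.println("usage: OfficeToPdf <input-document> <output-dir>");
            return;
        }
        // Assumption: "soffice" is on the PATH.
        int exitCode = convert("soffice", Paths.get(args[0]), Paths.get(args[1]));
        System.out.println("converter exited with code " + exitCode);
    }
}
```

Starting one office process per document is slow for thousands of files, which is one reason a library that keeps a pool of office instances running may be preferable.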
 
Anup Bansal
Ranch Hand
Posts: 69
The documents being scanned do not share a common format: each Excel file, for example, can have a different structure, and so can each Word document.
So will the POI/iText combination be limited to the structures I specifically code for, or can it work generally for all documents?
 
Ulf Dittmer
Rancher
Posts: 43081
Since there's no ready-made solution using POI/iText, whatever solution you come up with will support exactly those features that you care to implement (which probably will be just those features that the documents you're dealing with are using).

If you're looking for general-purpose solutions (and OO/JODConverter doesn't cut it) then you're probably better off buying a commercial package.
 
Anup Bansal
Ranch Hand
Posts: 69
Hi, could you direct me to an example that uses POI and iText to convert Word/Excel to PDF while preserving the format of the initial document?
 
Ulf Dittmer
Rancher
Posts: 43081
Both the iText and POI web sites have plenty of examples on how to use them; beyond that it's a matter of searching the javadocs for methods/classes that accomplish the rest. If you're serious about using iText I strongly recommend getting the book "iText in Action"; it'll save you a lot of time figuring out stuff.
 
Anup Bansal
Ranch Hand
Posts: 69
I tried to generate a PDF document from a WORD doc but I get the following Exception:
ExceptionConverter: java.io.IOException: No message found for the.document.has.no.pages
at com.itextpdf.text.pdf.PdfPages.writePageTree(PdfPages.java:113)
at com.itextpdf.text.pdf.PdfWriter.close(PdfWriter.java:1171)
at com.itextpdf.text.pdf.PdfDocument.close(PdfDocument.java:780)
at com.itextpdf.text.Document.close(Document.java:409)
at com.abnamro.nl.scan.pdfconvert.process.MSWordToPDFConversion.convert(MSWordToPDFConversion.java:51)
at com.abnamro.nl.scan.pdfconvert.process.MSWordToPDFConversion.main(MSWordToPDFConversion.java:61)

Following is a snippet of the code:


What is causing this issue? I even set the PDF writer to accept blank pages, but the issue still occurs.
The AFM file for the font being used in the DOC file is also present in the jar.




 
Ulf Dittmer
Rancher
Posts: 43081
Please edit your post to UseCodeTags. It's unnecessarily hard to read the code as it is, making it less likely that people will bother to do so.
 
Anup Bansal
Ranch Hand
Posts: 69
Added code tags. I hope it is readable now. Any idea why the error is thrown?
 
Ulf Dittmer
Rancher
Posts: 43081
Are any paragraphs being added? I take it there's no exception?

One thing I'd try is to use iText 2.1 instead of iText 5.
 
Anup Bansal
Ranch Hand
Posts: 69
This was a configuration issue. (Not exactly sure what specific configuration)
I re-created the workspace and ran the code. It worked.
 