PDF writing woes

 
Dave T Taylor
Greenhorn
Posts: 4
Hi, I'm trying to write a simple program that will read and write PDF documents - however, I'm having a few problems with the code.

My program seems to read the document fine, but will not write it out again properly. Comparing the two files side by side before and after writing, it appears there's a small number of control characters missing from the output file. Any clues as to why this is happening?


import java.io.*;
import java.awt.*;
import java.applet.*;

public class readFile
{
    public static void main(String[] args)
    {
        try
        {
            String line;

            // open the input stream
            FileInputStream fis = new FileInputStream("c:/mypdf.pdf");
            BufferedInputStream bis = new BufferedInputStream(fis);
            DataInputStream data = new DataInputStream(bis);

            // open the output stream
            FileOutputStream fos = new FileOutputStream("c:/mynewpdf.pdf");
            BufferedOutputStream bos = new BufferedOutputStream(fos);
            DataOutputStream dos = new DataOutputStream(bos);

            System.out.println("Reading data...");

            // read the input file
            while ((line = data.readLine()) != null)
            {
                // write the output file
                dos.writeBytes(line);
            }

            System.out.println("OK, done...");

        }
        catch (Exception e)
        {
            System.err.println(e);
        }
    }
}


Any help is much appreciated!!

Many thanks,

Dave.
 
Joe Ess
Bartender
Posts: 9340
Welcome to the JavaRanch, Dave.
Have a look at the Java documentation and you'll see this:

public final String readLine() throws IOException

Deprecated. This method does not properly convert bytes to characters.

Java API Documentation - java.io.DataInputStream

What's more, you can't treat a binary file such as a PDF as a plain text file. A PDF file doesn't have "lines". It has some text data, but it also contains a lot of binary data describing what to do with that text. If you read the binary data in as text, Java decodes the bytes into Unicode characters using a character encoding. Byte values that are not valid in that encoding cannot be mapped back, so you lose information. In your code, readLine() also strips the line terminator (\n, \r or \r\n) from each line and writeBytes() never writes it back, which would explain the control characters missing from your output.
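
To make the information loss concrete, here is a minimal sketch of a round trip through a String. The class name RoundTrip, the helper throughString and the choice of UTF-8 are assumptions made for illustration; they are not taken from Dave's program. Plain ASCII bytes survive the trip, while arbitrary bytes above 0x7F do not.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

public class RoundTrip {
    // Decodes the bytes as UTF-8 text, then encodes that text back to bytes.
    // Malformed input is replaced with U+FFFD during decoding, so the
    // original byte values cannot be recovered afterwards.
    static byte[] throughString(byte[] raw) {
        String text = new String(raw, StandardCharsets.UTF_8);
        return text.getBytes(StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // Plain ASCII text: every byte maps to one character and back.
        byte[] ascii = {'%', 'P', 'D', 'F'};
        // Arbitrary binary bytes (illustrative values, not valid UTF-8).
        byte[] binary = {(byte) 0xE2, (byte) 0xE3, (byte) 0xCF, (byte) 0xD3};

        System.out.println("ASCII survives:  " + Arrays.equals(ascii, throughString(ascii)));
        System.out.println("Binary survives: " + Arrays.equals(binary, throughString(binary)));
    }
}
```

The same kind of loss can occur with any Reader or String based approach, whether the data is read a line or a character at a time.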
 
Dave T Taylor
Greenhorn
Posts: 4
Thanks, Joe.

Yes, I've noticed that myself now. I've converted the program to read the files character by character, and while it's converting a lot more of the characters properly, there are still certain ones that are getting changed.

I'm having to go through my output files with a hex editor and a fine-tooth comb to find exactly where it's going wrong.

Thanks for the help.

Dave
 
Paul Clapham
Sheriff
Posts: 21588
You just want to copy the file from one place to another? Then do not read the files one character at a time. What Joe said (about binary data versus Unicode characters) still applies no matter how many characters at a time you read. To copy any file, PDF or otherwise, just read bytes (not characters) from the input and write them to the output.
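
A byte-for-byte copy along those lines might look like the sketch below. The class name CopyFile, the helper copy and the 8192-byte buffer size are assumptions for illustration; the default paths are the ones from the original post. No Reader, Writer or String is involved, so no character conversion can take place.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

public class CopyFile {
    // Copies every byte from in to out unchanged and returns the byte count.
    static long copy(InputStream in, OutputStream out) throws IOException {
        byte[] buffer = new byte[8192]; // buffer size is an arbitrary choice
        long total = 0;
        int n;
        // read() returns -1 at end of stream
        while ((n = in.read(buffer)) != -1) {
            out.write(buffer, 0, n);
            total += n;
        }
        return total;
    }

    public static void main(String[] args) throws IOException {
        Path source = Paths.get(args.length > 0 ? args[0] : "c:/mypdf.pdf");
        Path target = Paths.get(args.length > 1 ? args[1] : "c:/mynewpdf.pdf");
        // try-with-resources closes (and therefore flushes) both streams
        try (InputStream in = Files.newInputStream(source);
             OutputStream out = Files.newOutputStream(target)) {
            System.out.println("Copied " + copy(in, out) + " bytes");
        }
    }
}
```

If all you need is a plain file-to-file copy, java.nio.file.Files.copy(source, target) does the same job in a single call.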
 
Dave T Taylor
Greenhorn
Posts: 4
Yes, thanks. That's what I'm doing now!


Dave
 
Jason Moors
Ranch Hand
Posts: 188
Hi Dave,

There is an open source library called iText which enables you to create, manipulate and also copy PDF files. It may be overkill for what you are trying to do, but it's worth knowing about, as it lets you copy only certain pages, for example.

http://www.lowagie.com/iText/
 