OK, I know this isn't entirely a Java question, but...
I have a Java program that, among other things, has to look at a file and determine if it's a PDF. I have, basically, a byte array to work with.
I know I can convert the byte array to a String and look at the first 5 characters for "%PDF-".
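To make that concrete, here is a minimal sketch of the kind of check I mean. The class and method names are purely illustrative (not my real code), and I'm assuming ISO-8859-1 decoding so each byte maps to exactly one character:

```java
import java.nio.charset.StandardCharsets;

// Illustrative name only; this is a sketch of the naive check, not production code.
class PdfHeaderCheck {
    private static final String PDF_MAGIC = "%PDF-";

    /** Returns true only if the data begins with the literal "%PDF-" marker at offset 0. */
    static boolean startsWithPdfHeader(byte[] data) {
        if (data == null || data.length < PDF_MAGIC.length()) {
            return false;
        }
        // Decode just the first five bytes; ISO-8859-1 maps one byte to one char.
        String head = new String(data, 0, PDF_MAGIC.length(), StandardCharsets.ISO_8859_1);
        return PDF_MAGIC.equals(head);
    }

    public static void main(String[] args) {
        byte[] sample = "%PDF-1.5\n".getBytes(StandardCharsets.ISO_8859_1);
        System.out.println(startsWithPdfHeader(sample));
    }
}
```

Anything that puts even one byte in front of the marker makes this return false, which is exactly the problem described next.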
However, this apparently isn't a hard-and-fast rule. In fact, I'm now seeing some files coming into my program which have something like:
Now, the code I've got already obviously doesn't recognize this - yet Adobe Reader handles it just fine!
So, what's the "correct" way to see if a file that comes in as a .pdf is actually legitimately a .pdf that is readable by Adobe Reader? I think trying to look at the string and yank out all of what looks like HTML while NOT pulling out anything that is legitimately part of the PDF would be an exercise in self-induced insanity.
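The only alternative I can think of, short of a real parser, is to scan the leading bytes for the marker instead of requiring it at offset 0. A rough sketch of that idea follows. The 1024-byte window is an assumption on my part (it is the tolerance I understand Acrobat viewers apply to the header position), and the names are made up. It only tells me the marker is present near the start, not that the rest of the file is a valid PDF:

```java
import java.nio.charset.StandardCharsets;

// Illustrative name only; a lenient header scan, not a PDF validator.
class LenientPdfCheck {
    private static final byte[] PDF_MAGIC = "%PDF-".getBytes(StandardCharsets.US_ASCII);
    // Assumed tolerance: the marker must lie entirely within the first 1024 bytes.
    private static final int SCAN_LIMIT = 1024;

    /** Returns true if "%PDF-" occurs anywhere within the first SCAN_LIMIT bytes. */
    static boolean hasPdfHeaderNearStart(byte[] data) {
        if (data == null) {
            return false;
        }
        // Last offset at which the marker can start and still fit in the window.
        int lastStart = Math.min(data.length, SCAN_LIMIT) - PDF_MAGIC.length;
        for (int i = 0; i <= lastStart; i++) {
            if (matchesAt(data, i)) {
                return true;
            }
        }
        return false;
    }

    // Compares the marker byte by byte against data starting at the given offset.
    private static boolean matchesAt(byte[] data, int offset) {
        for (int j = 0; j < PDF_MAGIC.length; j++) {
            if (data[offset + j] != PDF_MAGIC[j]) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        byte[] sample = "<html>junk</html>\n%PDF-1.5\n".getBytes(StandardCharsets.US_ASCII);
        System.out.println(hasPdfHeaderNearStart(sample));
    }
}
```

Working on the raw bytes avoids converting the whole array to a String, but I have no idea whether this is what "correct" looks like.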
Is there some sort of library or class available that does this? Is the file I'm seeing, despite being readable in Adobe Reader, really an "improper" .pdf? When I right-click it in Windows, and choose Properties, then the PDF tab, it says it was created by Adobe Acrobat 6.0 and is PDF Version 1.5.
Any guidance would be appreciated.