A quick look at the ASCII character set shows it only uses byte values 0 through 127 (128 values in all). You could read the file with a stream and see if you find any bytes with a value over 127. Since Java's byte type is signed, those would show up as negative numbers.
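Here is a rough sketch of that check, just to make the idea concrete. The class and method names (AsciiCheck, isSevenBitAscii) are made up for illustration, and it only tells you "pure 7-bit ASCII or not", nothing more.

```java
import java.io.IOException;
import java.io.InputStream;

// Hypothetical helper, not part of the Java API.
final class AsciiCheck {

    /** Returns true if the first len bytes of buf are all in the range 0..127. */
    static boolean isSevenBitAscii(byte[] buf, int len) {
        for (int i = 0; i < len; i++) {
            // Java bytes are signed, so values 128..255 appear as -128..-1.
            if (buf[i] < 0) {
                return false;
            }
        }
        return true;
    }

    /** Reads the whole stream and stops at the first byte above 127. */
    static boolean isSevenBitAscii(InputStream in) throws IOException {
        byte[] buf = new byte[8192];
        int n;
        while ((n = in.read(buf)) != -1) {
            if (!isSevenBitAscii(buf, n)) {
                return false;
            }
        }
        return true;
    }
}
```

Keep in mind that perfectly readable text in UTF-8 or ISO-8859-1 will also contain bytes above 127, so a false result here does not by itself mean "binary".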
Uh oh, the "extended ASCII" sets use all 256 values. With that, you're out of luck.
Can you define "binary" and "ASCII" any better ... what kinds of files are you likely to run into?
It's hard to give a useful answer without more information than you've given here, but the problem you raise is a general one: how does one tell by looking at the contents of a file what kind of file it is?
The short answer is: it can't be done in general. You can use various heuristic tricks and make educated guesses. (For example, compressed files typically come with specific bytes at the very beginning; I'm told that viruses and worms tend to have signatures that betray them; and so on.) But there isn't a general-purpose algorithm for looking at a stream of bytes and saying, "Yes, that's an executable" or "No, that's just an email message." So you won't find a class in the Java API that enables you to do this.
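To illustrate the "specific bytes at the very beginning" trick, here is a minimal sketch that recognizes two well-known signatures: gzip files start with 0x1F 0x8B, and ZIP archives normally start with "PK" followed by 0x03 0x04. The class name MagicSniffer and its methods are hypothetical, and two signatures are nowhere near a complete list.

```java
// Hypothetical helper: guesses a file type from its leading bytes.
final class MagicSniffer {

    private static final byte[] GZIP = {0x1F, (byte) 0x8B};
    private static final byte[] ZIP = {0x50, 0x4B, 0x03, 0x04}; // "PK\3\4"

    /** Returns true if head begins with the given signature bytes. */
    static boolean startsWith(byte[] head, byte[] magic) {
        if (head.length < magic.length) {
            return false;
        }
        for (int i = 0; i < magic.length; i++) {
            if (head[i] != magic[i]) {
                return false;
            }
        }
        return true;
    }

    /** Returns "gzip", "zip", or "unknown" for the leading bytes of a file. */
    static String guess(byte[] head) {
        if (startsWith(head, GZIP)) {
            return "gzip";
        }
        if (startsWith(head, ZIP)) {
            return "zip";
        }
        return "unknown";
    }
}
```

An "unknown" result says nothing either way, which is exactly the limitation described above: this is an educated guess, not a general-purpose classifier.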
I would focus on the characters lower than 32. Every character except '\t', '\n' and '\r' would indicate a binary file in my eyes, but as mentioned before, the term 'binary' isn't so clearly defined - at least not to me.
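As a sketch of that rule (the names ControlCharCheck and looksBinary are invented for this example), you could scan a buffer for any byte below 32 that is not a tab, line feed or carriage return:

```java
// Hypothetical helper implementing the "control characters mean binary" rule.
final class ControlCharCheck {

    /** Returns true if the first len bytes contain a control byte other than \t, \n or \r. */
    static boolean looksBinary(byte[] data, int len) {
        for (int i = 0; i < len; i++) {
            int b = data[i] & 0xFF; // treat the byte as unsigned 0..255
            if (b < 32 && b != '\t' && b != '\n' && b != '\r') {
                return true;
            }
        }
        return false;
    }
}
```

One caveat: text saved as UTF-16 is full of zero bytes, so this rule would call it binary.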
By binary I mean non-human-readable. Basically, I have a search/replace program that runs on a directory with a deeply nested tree structure (many files and directories). I want to skip binary (non-textual) files so as to speed up the program.
Why don't you give it a file filter so that it knows what types of files (probably by file extension) to process?
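Something along these lines would do as a starting point. The class name ExtensionFilter and the list of extensions are only assumptions for the example; you would fill in whatever file types your search/replace tool should touch.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Locale;
import java.util.Set;

// Hypothetical helper: decides by file extension whether to process a file.
final class ExtensionFilter {

    // Example list only; adjust to the file types you actually care about.
    private static final Set<String> TEXT_EXTENSIONS = new HashSet<>(
            Arrays.asList("txt", "java", "xml", "properties", "html"));

    /** Returns true if the name ends in one of the known text extensions. */
    static boolean isTextFileName(String name) {
        int dot = name.lastIndexOf('.');
        if (dot < 0 || dot == name.length() - 1) {
            return false; // no extension at all, or a trailing dot
        }
        String ext = name.substring(dot + 1).toLowerCase(Locale.ROOT);
        return TEXT_EXTENSIONS.contains(ext);
    }
}
```

While walking the tree you could pass it to java.io.File, for example `dir.listFiles(f -> f.isDirectory() || ExtensionFilter.isTextFileName(f.getName()))`, so directories are still descended into and only matching files are opened.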
James Carman, President, Carman Consulting, Inc.
posted 13 years ago
For a "guess" you might read a thousand bytes and see if they are all "printable" as defined by regular expressions. See Pattern in the JavaDoc for a start on that. That could give you some level of confidence short of absolute certainty that a human might be interested in the file's contents.
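A minimal sketch of that guess might look like the following. The names PrintableSample, looksPrintable and sampleLooksPrintable are invented here, and decoding the bytes as ISO-8859-1 is my own assumption so that every byte maps to exactly one char. Note that `\p{Print}` in java.util.regex.Pattern covers only printable US-ASCII by default, so tab, CR and LF are allowed explicitly and any byte above 127 counts as non-printable.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.regex.Pattern;

// Hypothetical helper: samples the start of a file and tests it against a regex.
final class PrintableSample {

    // Printable ASCII plus tab, carriage return and line feed.
    private static final Pattern PRINTABLE = Pattern.compile("[\\p{Print}\\t\\r\\n]*");

    /** Returns true if the first len bytes are all printable ASCII or common whitespace. */
    static boolean looksPrintable(byte[] data, int len) {
        // ISO-8859-1 maps each byte to one char, so nothing is lost in decoding.
        String s = new String(data, 0, len, StandardCharsets.ISO_8859_1);
        return PRINTABLE.matcher(s).matches();
    }

    /** Checks up to the first 1000 bytes of a file. */
    static boolean sampleLooksPrintable(Path file) throws IOException {
        byte[] buf = new byte[1000];
        try (InputStream in = Files.newInputStream(file)) {
            // A single read may return fewer than 1000 bytes; good enough for a guess.
            int n = in.read(buf);
            return looksPrintable(buf, Math.max(n, 0));
        }
    }
}
```

Since only a sample is read, a file that starts with readable text and turns into binary data later will still pass, so treat the result as a level of confidence rather than a verdict.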
A good question is never answered. It is not a bolt to be tightened into place but a seed to be planted and to bear more seed toward the hope of greening the landscape of the idea. John Ciardi