I have a program that parses the files in a directory. The files are supposed to be UTF-8 encoded. However, if a file mistakenly has another encoding, the program should not crash; it should simply skip that file.
I started out using Files.lines to read the lines, but this does not work as required.
A much simplified version:
The file is ANSI encoded. This code crashes; "End" is never printed.
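The original snippet was not included, so here is a minimal reconstruction of the pattern described (the class name and the temp-file setup standing in for the "ANSI" file are mine). The outer catch is only there to contain the crash and show which exception actually escapes:

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

public class LinesCrashDemo {
    public static void main(String[] args) throws IOException {
        // Stand-in for the "ANSI" file: ISO-8859-1 bytes that are not valid UTF-8.
        Path path = Files.createTempFile("demo", ".txt");
        Files.write(path, "h\u00e9llo w\u00f6rld".getBytes(StandardCharsets.ISO_8859_1));

        try {
            try {
                // Files.lines decodes as UTF-8 by default.
                Files.lines(path).forEach(System.out::println);
                System.out.println("End");
            } catch (IOException e) {
                // One might expect the MalformedInputException (an IOException)
                // to land here -- it never does.
                System.out.println("End");
            }
        } catch (UncheckedIOException e) {
            // What actually happens: the stream wraps the decoding failure in an
            // UncheckedIOException, which bypasses the IOException handler above.
            System.out.println("Escaped: " + e.getCause().getClass().getSimpleName());
        }
    }
}
```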
Why is the exception not caught?
When I change the code to "pre-Java 8"
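The pre-Java-8 variant was also not included; a sketch of that style (class name and temp-file setup are mine) might look like this. The key point is that FileReader decodes with the platform default charset and, unlike the NIO readers, its decoder silently replaces malformed bytes instead of reporting them, so readLine never throws on a bad encoding:

```java
import java.io.BufferedReader;
import java.io.File;
import java.io.FileOutputStream;
import java.io.FileReader;
import java.io.IOException;

public class OldReaderDemo {
    public static void main(String[] args) throws IOException {
        File file = File.createTempFile("demo", ".txt");
        try (FileOutputStream out = new FileOutputStream(file)) {
            out.write("h\u00e9llo".getBytes("ISO-8859-1")); // not valid UTF-8
        }

        // FileReader uses the platform default charset; malformed input is
        // replaced with U+FFFD rather than thrown as an exception.
        try (BufferedReader reader = new BufferedReader(new FileReader(file))) {
            String line;
            while ((line = reader.readLine()) != null) {
                System.out.println(line);
            }
        } catch (IOException e) {
            System.out.println("Caught: " + e);
        }
        System.out.println("End");
    }
}
```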
the program reads and prints all the lines without throwing an exception.
When I obtain the BufferedReader from Files, the program cannot read the file.
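A sketch of this variant (again, class name and temp-file setup are mine): Files.newBufferedReader uses UTF-8 by default and its decoder reports malformed input, so readLine throws a plain, checked MalformedInputException that an ordinary catch (IOException e) handles:

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

public class NioReaderDemo {
    public static void main(String[] args) throws IOException {
        Path path = Files.createTempFile("demo", ".txt");
        Files.write(path, "h\u00e9llo".getBytes(StandardCharsets.ISO_8859_1));

        // Files.newBufferedReader defaults to UTF-8 and REPORTs malformed input.
        try (BufferedReader reader = Files.newBufferedReader(path)) {
            String line;
            while ((line = reader.readLine()) != null) {
                System.out.println(line);
            }
        } catch (IOException e) {
            // readLine throws a checked MalformedInputException, caught here.
            System.out.println("Caught: " + e.getClass().getSimpleName());
        }
        System.out.println("End");
    }
}
```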
The exception is caught. "End" is printed.
Is there an explanation for the different behavior of the BufferedReaders?
Is there any way to make the first version work?
If you look at the first line of that stack trace, you can see it is actually an UncheckedIOException, which wraps the root-cause MalformedInputException. You'd have to catch UncheckedIOException rather than IOException to handle this.
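For the original goal of skipping non-UTF-8 files, that could look like the following sketch (class name and temp-file setup are mine). Checking the cause lets you skip only encoding failures and still propagate genuine I/O errors:

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.MalformedInputException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.stream.Stream;

public class SkipBadFilesDemo {
    public static void main(String[] args) throws IOException {
        Path path = Files.createTempFile("demo", ".txt");
        Files.write(path, "h\u00e9llo".getBytes(StandardCharsets.ISO_8859_1));

        try (Stream<String> lines = Files.lines(path)) {
            lines.forEach(System.out::println);
        } catch (UncheckedIOException e) {
            if (e.getCause() instanceof MalformedInputException) {
                // Encoding problem: ignore this file and carry on.
                System.out.println("Skipping non-UTF-8 file");
            } else {
                throw e; // some other I/O failure: rethrow
            }
        }
        System.out.println("End");
    }
}
```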
I think the issue lies in the encoding used. If no charset is explicitly specified, the methods in Files use UTF-8. The "old" I/O classes (FileReader, or InputStreamReader without an explicit charset) use the platform default, which depends on your operating system and is often not UTF-8 (on Windows, for example).
Can you try the following two bits of code and post the results?
The first uses your existing Files.lines code but uses the system charset instead of UTF-8.
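Something like this, assuming a temp file stands in for yours (the class name is mine). Note that if your platform default happens to be UTF-8 too, this will still fail the same way, which the catch block makes visible:

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.stream.Stream;

public class DefaultCharsetLinesDemo {
    public static void main(String[] args) throws IOException {
        Path path = Files.createTempFile("demo", ".txt");
        Files.write(path, "h\u00e9llo".getBytes(StandardCharsets.ISO_8859_1));

        // Same Files.lines approach, but with the system default charset.
        try (Stream<String> lines = Files.lines(path, Charset.defaultCharset())) {
            lines.forEach(System.out::println);
            System.out.println("Read OK with " + Charset.defaultCharset());
        } catch (UncheckedIOException e) {
            System.out.println("Still malformed under " + Charset.defaultCharset());
        }
        System.out.println("End");
    }
}
```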
The second uses your existing BufferedReader code but uses UTF-8 instead of the system charset. Note that I had to use FileInputStream + InputStreamReader instead of FileReader for that.
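A sketch of that second test (class name and temp-file setup are mine). One thing worth knowing in advance: InputStreamReader's decoder replaces malformed input rather than reporting it, so even with UTF-8 specified this version is expected to read the file without throwing, printing replacement characters instead:

```java
import java.io.BufferedReader;
import java.io.File;
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;

public class Utf8OldStyleDemo {
    public static void main(String[] args) throws IOException {
        File file = File.createTempFile("demo", ".txt");
        try (FileOutputStream out = new FileOutputStream(file)) {
            out.write("h\u00e9llo".getBytes(StandardCharsets.ISO_8859_1));
        }

        // Pre-Java 11, FileReader has no charset parameter, hence the
        // FileInputStream + InputStreamReader combination.
        try (BufferedReader reader = new BufferedReader(
                new InputStreamReader(new FileInputStream(file), StandardCharsets.UTF_8))) {
            String line;
            while ((line = reader.readLine()) != null) {
                System.out.println(line);
            }
        } catch (IOException e) {
            System.out.println("Caught: " + e);
        }
        System.out.println("End");
    }
}
```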