Hi
I have a file that contains the text:
TEST NAÏVE SUBJECT
I wrote a Java program to read this file on Red Hat Linux.
The Java code that reads the file is similar to the following:
File inputFile = new File(fileName.toString());
FileInputStream in = new FileInputStream(inputFile);
// No charset argument, so InputStreamReader decodes with the platform default charset
LineNumberReader lnr = new LineNumberReader(new InputStreamReader(in));
String streamInput = null;
while ((streamInput = lnr.readLine()) != null) {
    System.out.println(streamInput);
}
The output of the program is:
TEST NA?VE SUBJECT
(Note that the character Ï is not read correctly.)
I guess that the Java program is reading the input file with a different encoding from the one the file was actually written in. If so, what is the solution to this problem?
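A minimal sketch of one common fix, under two assumptions: the file really is ISO-8859-1 as `file` reports on this server, and the terminal displays UTF-8. It keeps the original reading loop but passes the charset explicitly to InputStreamReader instead of relying on the platform default, and it also encodes System.out explicitly, because with LANG=C the output side may fall back to ASCII as well. The class name ReadWithCharset and the readLines helper are hypothetical.

```java
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.LineNumberReader;
import java.io.PrintStream;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

public class ReadWithCharset {

    // Same reading loop as the original, but the bytes are decoded with an
    // explicit charset instead of the platform default.
    static List<String> readLines(String fileName, Charset charset) throws IOException {
        List<String> lines = new ArrayList<>();
        try (LineNumberReader lnr = new LineNumberReader(
                new InputStreamReader(new FileInputStream(fileName), charset))) {
            String streamInput;
            while ((streamInput = lnr.readLine()) != null) {
                lines.add(streamInput); // readLine() strips the CRLF terminator
            }
        }
        return lines;
    }

    public static void main(String[] args) throws IOException {
        // Assumption: the input file is ISO-8859-1, as `file` reports.
        List<String> lines = readLines(args[0], StandardCharsets.ISO_8859_1);
        // Assumption: the terminal expects UTF-8, so encode the output
        // explicitly rather than with the LANG=C default.
        PrintStream out = new PrintStream(System.out, true, "UTF-8");
        for (String line : lines) {
            out.println(line);
        }
    }
}
```

Usage would be along the lines of `java ReadWithCharset input.txt`. If the program must accept files in either encoding, the charset has to be chosen per file, since the JDK itself does not guess encodings.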
Here are the details I observed on this Linux server:
The environment variable is LANG=C.
Running the file command on the input file displays:
ISO-8859 text, with CRLF line terminators
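To check which charset the JVM falls back to on each server, a small diagnostic like the following could be run on both machines; the class and method names are hypothetical. On JDK 17 and earlier, LANG=C typically yields US-ASCII (with file.encoding reported as ANSI_X3.4-1968), in which case every byte above 0x7F in the input cannot be decoded; from JDK 18 onward the default charset is UTF-8 regardless of locale (JEP 400). Different output on the two servers could indicate that the JVMs differ, not just the files.

```java
import java.nio.charset.Charset;

public class ShowDefaultCharset {

    // Returns the charset the JVM uses when none is passed explicitly,
    // as with new InputStreamReader(in) in the code above.
    static String describeDefaults() {
        return "defaultCharset=" + Charset.defaultCharset().name()
                + ", file.encoding=" + System.getProperty("file.encoding");
    }

    public static void main(String[] args) {
        System.out.println(describeDefaults());
    }
}
```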
The same program reads the characters correctly when I run it on a different Linux server with the same locale setting (LANG=C).
However, on that server the file command reports the input file as:
UTF-8 Unicode English text, with very long lines, with CRLF line terminators
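The two `file` results suggest that the same letter is stored differently in the two files. The sketch below (class name hypothetical) prints the bytes of Ï in each encoding and shows what happens when the single ISO-8859-1 byte is decoded as US-ASCII: the result is U+FFFD, the replacement character, which an ASCII-encoded System.out then prints as ?.

```java
import java.nio.charset.StandardCharsets;

public class EncodingBytes {

    // Formats bytes as space-separated uppercase hex, e.g. "C3 8F".
    static String hex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            if (sb.length() > 0) sb.append(' ');
            sb.append(String.format("%02X", b & 0xFF));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String letter = "\u00CF"; // LATIN CAPITAL LETTER I WITH DIAERESIS
        // One byte in ISO-8859-1, two bytes in UTF-8.
        System.out.println("ISO-8859-1: " + hex(letter.getBytes(StandardCharsets.ISO_8859_1))); // CF
        System.out.println("UTF-8:      " + hex(letter.getBytes(StandardCharsets.UTF_8)));      // C3 8F
        // Decoding the ISO-8859-1 byte as US-ASCII (the usual LANG=C default
        // on older JDKs) yields the replacement character U+FFFD.
        byte[] latin1 = letter.getBytes(StandardCharsets.ISO_8859_1);
        String asAscii = new String(latin1, StandardCharsets.US_ASCII);
        System.out.println("as US-ASCII: U+" + String.format("%04X", (int) asAscii.charAt(0))); // U+FFFD
    }
}
```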
Thank you!
Regards
Ganni