InputStreamReader is not properly reading some characters in Linux.

 
Ganni Kal
Greenhorn
Hi

I have a file that contains the text "TEST NAÏVE SUBJECT". I wrote a Java program to read this file on Red Hat Linux.
The Java code that reads the file is similar to the following:

File inputFile = new File(fileName.toString());
FileInputStream in = new FileInputStream(inputFile);
LineNumberReader lnr = new LineNumberReader(new InputStreamReader(in));
String streamInput = null;
while ((streamInput = lnr.readLine()) != null) {
    System.out.println(streamInput);
}

The output of the program is:

TEST NA?VE SUBJECT

(observe that the character Ï is not read properly.)

I can guess that the Java program is reading the input file with a different encoding than the one it was actually written in. If so, what is the solution to this problem?
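That guess can be tested directly. Here is a small sketch (byte values chosen to match the example text, class name made up for illustration) that decodes the same bytes with two charsets; under LANG=C a JVM typically defaults to US-ASCII, where the byte 0xCF has no mapping:

```java
import java.nio.charset.Charset;

public class DecodeDemo {
    public static void main(String[] args) {
        // 0xCF is 'Ï' in ISO-8859-1, but it is not a valid US-ASCII byte.
        byte[] bytes = {'N', 'A', (byte) 0xCF, 'V', 'E'};

        // Decodes to "NAÏVE".
        System.out.println(new String(bytes, Charset.forName("ISO-8859-1")));

        // The unmappable byte becomes U+FFFD (the replacement character),
        // which many terminals display as '?'.
        System.out.println(new String(bytes, Charset.forName("US-ASCII")));
    }
}
```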

Here are the details I observed on the Linux server:

The environment variable LANG is set to C.
Running the file command on the input file displays:

ISO-8859 text, with CRLF line terminators


The same program reads the characters properly when I execute it on a different Linux server with the same locale (LANG=C) settings.
But there the file's type was reported as UTF-8 Unicode English text, with very long lines, with CRLF line terminators

Thank you!

Regards
Ganni
 
Paul Clapham
Marshal
Just use the InputStreamReader constructor which accepts an encoding name. It looks like you know the actual encoding of the document so that shouldn't be a problem.
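A minimal sketch of that suggestion (the class name is illustrative; StandardCharsets requires Java 7+, on older JVMs pass the charset name "ISO-8859-1" as a plain String instead):

```java
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.LineNumberReader;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

public class EncodingAwareReader {

    // Reads all lines of the file with an explicit charset, so the result
    // does not depend on the platform's default encoding.
    static List<String> readLines(String fileName) throws IOException {
        List<String> lines = new ArrayList<String>();
        LineNumberReader lnr = new LineNumberReader(
                new InputStreamReader(new FileInputStream(fileName),
                                      StandardCharsets.ISO_8859_1));
        try {
            String line;
            while ((line = lnr.readLine()) != null) {
                lines.add(line);
            }
        } finally {
            lnr.close();
        }
        return lines;
    }
}
```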
 
Ganni Kal
Greenhorn

Thanks for your suggestion.

I do not want to specify the encoding name in the InputStreamReader constructor because characters like Ï are an exceptional case. I do not even know how they were entered into the file (I did not create the file), and I cannot handle every such character that might come from a different encoding.

I just want the read operation to use the default encoding set in the OS or JVM.
This works fine on a different Linux server (with the same Java code). Mainly I want to understand how a file with the same characters is read properly on one Linux server but not on another, when both servers have the same locale settings.

I know this may be a question about the Linux OS/JVM, but I cannot find the answer anywhere, including the LinuxQuestions forum.

Any more ideas?

Thanks
Ganni
 
Paul Clapham
Marshal
Well, I predict that if you print out Java's default file encoding on the two systems which work differently, you're going to print out different values.

And if you use the wrong encoding to read a file, you're going to have errors like that. But it sounds like you don't consider that a problem?
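A quick way to compare the two servers (a sketch; the class name is made up for illustration) is to print the JVM's default charset on each:

```java
import java.nio.charset.Charset;

public class DefaultEncoding {
    public static void main(String[] args) {
        // The system property the JVM derives from the locale (e.g. LANG).
        System.out.println("file.encoding  = " + System.getProperty("file.encoding"));
        // The charset actually used when none is specified explicitly.
        System.out.println("defaultCharset = " + Charset.defaultCharset());
    }
}
```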
 
Ganni Kal
Greenhorn


I executed the following code to list all the charsets supported by the JVM on both RH servers.
There is no difference in the output.

Map charSetMap = Charset.availableCharsets();
Iterator itr1 = charSetMap.keySet().iterator();
while (itr1.hasNext()) {
    Object key = itr1.next();
    System.out.println(key + " - " + charSetMap.get(key));
}




Regards
Ganni
 
Paul Clapham
Marshal

Ganni Kal wrote:I executed the following code to list all the charsets supported by the JVM on both RH servers.
There is no difference in the output.



Okay. But that isn't useful information.

Paul Clapham wrote:Well, I predict that if you print out Java's default file encoding on the two systems which work differently, you're going to print out different values.



You didn't try that yet.
 
Rob Prime
Sheriff

Ganni Kal wrote:(observe that the character Ï is not read properly.)


Are you sure? Are you sure it's simply not printed properly?

Most terminals, including Windows' CMD.EXE, simply cannot handle anything outside plain old ASCII. Try using JOptionPane.showMessageDialog to show the message (if you have an Xorg session running, that is).
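One way to tell a decoding problem from a display problem (a small sketch; the class and method names are made up for illustration) is to dump the code points of the string you read instead of printing it:

```java
public class CodePointDump {

    // Shows each char as U+XXXX, so you can see whether 'Ï' arrived as
    // U+00CF (decoded correctly) or as U+FFFD, often shown as '?'
    // (decoded wrongly).
    static String hexDump(String s) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < s.length(); i++) {
            sb.append(String.format("U+%04X ", (int) s.charAt(i)));
        }
        return sb.toString().trim();
    }
}
```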
 
Ganni Kal
Greenhorn

Rob Prime wrote:

Ganni Kal wrote:(observe that the character Ï is not read properly.)


Are you sure? Are you sure it's simply not printed properly?

Most terminals, including Windows' CMD.EXE, simply cannot handle anything outside plain old ASCII. Try using JOptionPane.showMessageDialog to show the message (if you have an Xorg session running, that is).




Actually I use a Swing-based UI. The actual application code stores the characters in a database; that data is read back from the DB and displayed in a JTextField.
I tried printing the characters to the console only for debugging purposes.

The characters are not read properly in either the console or Swing.


Thanks
Ganni
 
Ganni Kal
Greenhorn

Hi

I tried reading the same file by passing the charset name ISO-8859-1 to the InputStreamReader constructor.
Now characters like Ï are read and printed as they are.

LineNumberReader lnr = new LineNumberReader(new InputStreamReader(in, "ISO-8859-1"));

I understand that the input file is encoded in the ISO-8859 charset, but Java reads the file using its default encoding (reported as ANSI).
I use WebSphere's JVM for compiling and running the program.

Will WebSphere affect the JVM's default encoding?
How do I change the JVM's default file encoding?


Regards
Ganni
 
Ganni Kal
Greenhorn


Ganni Kal wrote:The same program reads the characters properly when I execute it on a different Linux server with the same locale (LANG=C) settings.
But there the file's type was reported as UTF-8 Unicode English text, with very long lines, with CRLF line terminators



Comparing the two servers' settings (Red Hat Enterprise Linux AS release 4), the patch update versions differ:
one server is at Nahant Update 5 and the other at Nahant Update 7.

Could this difference cause the problem?
 