
Unable to get charset and InputStreamReader to work

 
Trond Myklebust
Greenhorn
Posts: 2
I'm trying to read a file encoded in ISO-8859-1, but I'm having some issues. No matter what I do, the special Norwegian characters only come out right if I either set LANG in the Linux console or pass -Dfile.encoding when starting the application.

Can someone explain why reading the file doesn't work without setting LANG or file.encoding?
Code:
BufferedReader rd=new BufferedReader(new InputStreamReader(new FileInputStream("test.txt"),"ISO-8859-1"));
System.out.println(rd.readLine());
rd.close();

Output:
1. Not working, only getting ?
[et4378@pjokken ~]$ export LANG=en_EN.utf-8
[et4378@pjokken ~]$ java test
Property file.encoding == ANSI_X3.4-1968
Lillestr?m </Name1></Name><Street><Street1>?r?sen Stadion, ?r?ssvingen 2</Street1></Street><PostalIn

2. These work, but only because LANG and/or file.encoding is set.
[et4378@pjokken ~]$ export LANG=no_NO.ISO-8859-1
[et4378@pjokken ~]$ java test
Property file.encoding == ISO-8859-1
Lillestrøm </Name1></Name><Street><Street1>Åråsen Stadion, Åråssvingen 2</Street1></Street><PostalIn
[et4378@pjokken ~]$ export LANG=en_EN.utf-8
[et4378@pjokken ~]$ java -Dfile.encoding=iso8859-1 test
Property file.encoding == iso8859-1
Lillestrøm </Name1></Name><Street><Street1>Åråsen Stadion, Åråssvingen 2</Street1></Street><PostalIn
[et4378@pjokken ~]$ java -Dfile.encoding=windows-1252 test
Property file.encoding == windows-1252
Lillestrøm </Name1></Name><Street><Street1>Åråsen Stadion, Åråssvingen 2</Street1></Street><PostalIn
[ November 26, 2008: Message edited by: Trond Myklebust ]
 
Paul Clapham
Sheriff
Posts: 21886
Actually the input is working correctly. But input is only half of what that code does. The problem is in the other half.
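To make the "other half" concrete, here is a minimal sketch, not your actual program. It assumes the line has already been decoded correctly and only varies the charset used for printing; the class name OutputHalfDemo and the helper encodeLikePrintStream are made up for illustration. System.out is a PrintStream, and on a JVM that reports file.encoding as ANSI_X3.4-1968 (an alias for US-ASCII) it cannot represent the Norwegian characters, much like the US-ASCII case below.

```java
import java.io.ByteArrayOutputStream;
import java.io.PrintStream;
import java.io.UnsupportedEncodingException;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

class OutputHalfDemo {
    // Hypothetical helper: returns the bytes a PrintStream using the
    // named charset emits when asked to print the given text.
    static byte[] encodeLikePrintStream(String text, String charsetName) {
        try {
            ByteArrayOutputStream sink = new ByteArrayOutputStream();
            PrintStream out = new PrintStream(sink, true, charsetName);
            out.print(text);
            out.flush();
            return sink.toByteArray();
        } catch (UnsupportedEncodingException e) {
            throw new IllegalArgumentException(e);
        }
    }

    public static void main(String[] args) {
        // "Araasen" with A-ring and a-ring, i.e. a correctly decoded string.
        String line = "\u00C5r\u00E5sen";
        System.out.println("default charset: " + Charset.defaultCharset());

        // US-ASCII has no A-ring, so the encoder substitutes '?'.
        byte[] ascii = encodeLikePrintStream(line, "US-ASCII");
        System.out.println(new String(ascii, StandardCharsets.US_ASCII)); // ?r?sen

        // ISO-8859-1 maps each of these characters to exactly one byte.
        byte[] latin1 = encodeLikePrintStream(line, "ISO-8859-1");
        System.out.println(latin1.length); // 6
    }
}
```

The string is the same in both calls; only the encoder on the way out differs, which is why a question mark on the console says nothing about whether the read succeeded.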

And please check your private messages for an important administrative message.
 
Trond Myklebust
Greenhorn
Posts: 2
I've checked the output. We process the input and write the output to a file using UTF-8. The output file is only correct for the examples in case 2, not case 1. (More precisely, the output file is sent via FTP to another server, where it is loaded into an Oracle DB using SQL Loader; the resulting data in the database tables is only correct for the examples in case 2.)

new PrintStream(new FileOutputStream(file), false, "UTF-8");

My issue is quite similar to this one, I think, but I never found a proper explanation.
http://forums.sun.com/thread.jspa?forumID=16&threadID=433354
[ November 26, 2008: Message edited by: Trond Myklebust ]
 
Paul Clapham
Sheriff
Posts: 21886
If things change when you're using different values of the file.encoding property then you're doing something which uses that property.
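As a rough illustration of what "uses that property" can mean: on older JDKs (before Java 18 made UTF-8 the default), calls such as String.getBytes(), new String(byte[]), FileReader, FileWriter and InputStreamReader without a charset argument all fall back on the default charset. The sketch below is only a way to see the effect; the class name DefaultCharsetUsers and the method survivesRoundTrip are invented for this example.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

class DefaultCharsetUsers {
    // Hypothetical check: does the text come back unchanged after being
    // encoded to bytes and decoded again with the given charset?
    static boolean survivesRoundTrip(String text, Charset cs) {
        byte[] encoded = text.getBytes(cs);
        return new String(encoded, cs).equals(text);
    }

    public static void main(String[] args) {
        String text = "Lillestr\u00F8m"; // o-slash written as an escape
        Charset dflt = Charset.defaultCharset();

        // This is what any no-argument conversion in the program would do.
        System.out.println(dflt + ": " + survivesRoundTrip(text, dflt));

        // Explicit charsets give the same answer on every machine.
        System.out.println("UTF-8: " + survivesRoundTrip(text, StandardCharsets.UTF_8));       // true
        System.out.println("US-ASCII: " + survivesRoundTrip(text, StandardCharsets.US_ASCII)); // false
    }
}
```

If the first line prints false on the machine where things go wrong, then some conversion without an explicit charset is a likely suspect.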

But as I mentioned, there are two parts to your code. Input and output. Your question assumed that the input was at fault. You should test input separately from output (or alternatively: don't trust your output as a test of the input). Read lines of the file and see if any of them contain the Å character. If yes, then it's an output problem. If no, then it's an input problem.
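One way to run that test without trusting the console is to print the code points of the line in plain ASCII. This is a sketch under assumptions: the class name InputCheck and both helper methods are invented here, and the sample bytes stand in for your test.txt (swap in a FileInputStream to check the real file).

```java
import java.io.BufferedReader;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;

class InputCheck {
    // Renders each char as U+XXXX, so the result is pure ASCII and
    // cannot be mangled by whatever charset the console uses.
    static String toCodePoints(String s) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < s.length(); i++) {
            if (i > 0) sb.append(' ');
            sb.append(String.format("U+%04X", (int) s.charAt(i)));
        }
        return sb.toString();
    }

    // Reads the first line, decoding explicitly as ISO-8859-1.
    static String decodeFirstLine(InputStream in) {
        try (BufferedReader rd = new BufferedReader(
                new InputStreamReader(in, StandardCharsets.ISO_8859_1))) {
            return rd.readLine();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        // ISO-8859-1 bytes for the word with A-ring and a-ring: C5 72 E5 73 65 6E
        byte[] sample = {(byte) 0xC5, 'r', (byte) 0xE5, 's', 'e', 'n'};
        String line = decodeFirstLine(new ByteArrayInputStream(sample));

        System.out.println(toCodePoints(line));
        // U+00C5 U+0072 U+00E5 U+0073 U+0065 U+006E
        System.out.println("contains A-ring: " + (line.indexOf('\u00C5') >= 0)); // true
    }
}
```

If U+00C5 shows up here even with the failing LANG setting, the read is fine and the damage happens later.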

Edit: Actually there are four parts to your process: input, output, FTP, and SQL Loader. I doubt that FTP is causing the problem, but SQL Loader could be. At any rate you should test each part separately.
[ November 26, 2008: Message edited by: Paul Clapham ]
 