
Special Character Handling on Unix with Java FileStreams

 
sandy gupta
Ranch Hand
Posts: 228
I am having a really weird issue on Unix where some of the characters coming from a Windows CMS (where else) are showing up as a ?
These characters are not normal, in the sense that the byte values assigned to them look really messed up.
Has anyone faced this issue before? If so, how did you go about solving it?

Here are the details.
I am using the following to get the input stream for a file on the web server:

// webserver is assumed to be a java.net.URL
HttpURLConnection h = (HttpURLConnection) webserver.openConnection();
h.connect();
// Get the input stream
InputStream in = h.getInputStream();

When I then download that stream and save it as an HTML file, the weird character gets translated into a ?

Byte:83-->Char:S-->HexCode:3
Byte:121-->Char:y-->HexCode:9
Byte:110-->Char:n-->HexCode:E
Byte:100-->Char:d-->HexCode:4
Byte:114-->Char:r-->HexCode:2
Byte:111-->Char:o-->HexCode:F
Byte:109-->Char:m-->HexCode:D
Byte:101-->Char:e-->HexCode:5
Byte:32-->Char: -->HexCode:0
Byte:-106-->Char:?-->HexCode:6
Byte:32-->Char: -->HexCode:0
Byte:70-->Char:F-->HexCode:6
Byte:114-->Char:r-->HexCode:2
Byte:111-->Char:o-->HexCode:F
Byte:109-->Char:m-->HexCode:D
Byte:32-->Char: -->HexCode:0

Notice the byte value of -106 for that character. I have tried using readers with the charset set to ISO-8859-1 as well as UTF-8, but both gave the same result.
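
For anyone following along: the -106 in the dump is simply how Java prints a byte above 127, because byte is a signed type. The sketch below prints a byte as signed, unsigned, and full two-digit hex, which makes the odd one easier to look up in a code page table. The class name ByteDump and the method describe are made up for illustration.

```java
final class ByteDump {
    // Formats one byte as its signed value, unsigned value, and two-digit hex.
    static String describe(byte b) {
        int unsigned = b & 0xFF; // masks off sign extension, so -106 becomes 150
        return String.format("signed=%d unsigned=%d hex=0x%02X", b, unsigned, unsigned);
    }

    public static void main(String[] args) {
        // The bytes from the dump above; (byte) 0x96 is the problem character.
        byte[] sample = {83, 121, 110, 100, 114, 111, 109, 101, 32,
                         (byte) 0x96, 32, 70, 114, 111, 109, 32};
        for (byte b : sample) {
            System.out.println(describe(b));
        }
    }
}
```

For the problem byte this gives signed=-106 unsigned=150 hex=0x96.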

I am currently running out of options so please help.

Thanks
 
Stefan Wagner
Ranch Hand
Posts: 1923
Does the page specify an encoding like:

[ October 13, 2005: Message edited by: Stefan Wagner ]
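
If the server does declare an encoding, one place it can show up is the HTTP Content-Type header, which the connection exposes through getContentType(). Here is a minimal sketch of pulling the charset out of that header value. ContentTypeCharset and charsetOf are hypothetical names, and falling back to a caller-supplied default is an assumption rather than anything from this thread. It does not look at meta tags inside the page itself.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

final class ContentTypeCharset {
    // Returns the charset named in a Content-Type value such as
    // "text/html; charset=windows-1252", or the fallback if none is usable.
    static Charset charsetOf(String contentType, Charset fallback) {
        if (contentType == null) {
            return fallback;
        }
        for (String part : contentType.split(";")) {
            String p = part.trim();
            // Case-insensitive check for a "charset=" parameter.
            if (p.regionMatches(true, 0, "charset=", 0, 8)) {
                String name = p.substring(8).trim().replace("\"", "");
                try {
                    return Charset.forName(name);
                } catch (IllegalArgumentException e) {
                    // Illegal or unsupported charset name: use the fallback.
                    return fallback;
                }
            }
        }
        return fallback;
    }

    public static void main(String[] args) {
        System.out.println(charsetOf("text/html; charset=UTF-8", StandardCharsets.ISO_8859_1));
        System.out.println(charsetOf("text/html", StandardCharsets.ISO_8859_1));
    }
}
```

With the connection from the first post, this could be called as charsetOf(h.getContentType(), StandardCharsets.ISO_8859_1) and the result passed to an InputStreamReader.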
 
sandy gupta
Ranch Hand
Posts: 228
I have tried doing that and it does not help.
 
Harald Kirsch
Ranch Hand
Posts: 37
I tried decoding the byte with code 150 (-106) using input encoding windows-1250. That makes it a kind of dash. I'll try to paste it here: "–". Let's see if it survives.
[ October 14, 2005: Message edited by: Harald Kirsch ]
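
The same experiment can be repeated in Java for several charsets at once. This is only a sketch: DecodeCheck and codePointOf are illustrative names, and it assumes the windows-125x charsets are available in the running JDK, which is the case for standard full JDK builds.

```java
import java.nio.charset.Charset;

final class DecodeCheck {
    // Decodes a single byte with the named charset and reports the
    // first resulting character as a Unicode code point.
    static String codePointOf(byte b, String charsetName) {
        String decoded = new String(new byte[] { b }, Charset.forName(charsetName));
        return String.format("U+%04X", (int) decoded.charAt(0));
    }

    public static void main(String[] args) {
        String[] names = {"windows-1250", "windows-1252", "ISO-8859-1", "UTF-8", "US-ASCII"};
        for (String name : names) {
            System.out.println(name + " -> " + codePointOf((byte) 0x96, name));
        }
    }
}
```

Both windows-1250 and windows-1252 decode the byte to U+2013 (en dash). ISO-8859-1 maps it to U+0096, an invisible control character, while UTF-8 and US-ASCII cannot decode it at all and yield the replacement character U+FFFD. That would be consistent with the earlier report that ISO-8859-1 and UTF-8 readers both failed to produce a dash.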
 
sandy gupta
Ranch Hand
Posts: 228
It does survive, and it is a dash character. It shows up fine with the windows-1252 encoding that your browser sets. But try changing the browser's encoding to UTF-8 and you will see what I am talking about.

BTW, I realized that the culprit was my Java application, which is using the US-ASCII charset on the Unix box. I have tried everything from using readers instead of streams to using the String API to change the charset encoding, but nothing seems to work.
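
A US-ASCII default would explain the ? on the output side: when Java encodes a character that the target charset cannot represent, String.getBytes substitutes the charset's replacement byte, which for US-ASCII is ? (0x3F). The sketch below shows the effect for the en dash. EncodeCheck and encodeToHex are hypothetical names used only for this illustration.

```java
import java.nio.charset.Charset;

final class EncodeCheck {
    // Encodes the text with the named charset and returns the bytes
    // as space-separated two-digit hex.
    static String encodeToHex(String text, String charsetName) {
        byte[] bytes = text.getBytes(Charset.forName(charsetName));
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            if (sb.length() > 0) {
                sb.append(' ');
            }
            sb.append(String.format("%02X", b & 0xFF));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // What the JVM uses when no charset is given explicitly.
        System.out.println("default charset: " + Charset.defaultCharset());
        String dash = "\u2013"; // en dash
        System.out.println("US-ASCII:     " + encodeToHex(dash, "US-ASCII"));
        System.out.println("windows-1252: " + encodeToHex(dash, "windows-1252"));
        System.out.println("UTF-8:        " + encodeToHex(dash, "UTF-8"));
    }
}
```

The en dash encodes to 3F (a literal ?) in US-ASCII, to 96 in windows-1252, and to E2 80 93 in UTF-8. Once the ? has been written, the original character cannot be recovered, so the charset has to be set before the data is encoded.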

If anyone has any idea of how you can force stuff in java to follow a particular encoding, please do let me know.
 
Paul Clapham
Sheriff
Posts: 22844
If anyone has any idea of how you can force stuff in java to follow a particular encoding, please do let me know.


Sure. To force Java to write out data in a particular encoding:

To force Java to read in data using a particular encoding:

Or do you have some other code that you would like to specify the character encoding for, where those don't apply? Post it in that case.
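
The code samples from this reply did not survive in the page, so what follows is not the original code. As a minimal sketch of the usual approach, an InputStreamReader and an OutputStreamWriter can each be constructed with an explicit charset, so the platform default is never consulted. Transcode, copy, and transcode are hypothetical names, and the windows-1252 to UTF-8 direction in main is an assumption based on the bytes discussed earlier in the thread.

```java
import java.io.BufferedReader;
import java.io.BufferedWriter;
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.OutputStream;
import java.io.OutputStreamWriter;
import java.io.Reader;
import java.io.UncheckedIOException;
import java.io.Writer;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

final class Transcode {
    // Reads bytes as 'from' and writes the same characters as 'to'.
    static void copy(InputStream in, Charset from, OutputStream out, Charset to) throws IOException {
        Reader reader = new BufferedReader(new InputStreamReader(in, from));
        Writer writer = new BufferedWriter(new OutputStreamWriter(out, to));
        char[] buf = new char[4096];
        int n;
        while ((n = reader.read(buf)) != -1) {
            writer.write(buf, 0, n);
        }
        writer.flush(); // push buffered characters through to 'out'
    }

    // In-memory convenience wrapper around copy().
    static byte[] transcode(byte[] input, Charset from, Charset to) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        try {
            copy(new ByteArrayInputStream(input), from, out, to);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return out.toByteArray();
    }

    public static void main(String[] args) {
        byte[] windowsBytes = {83, (byte) 0x96, 70}; // 'S', en dash in windows-1252, 'F'
        byte[] utf8 = transcode(windowsBytes, Charset.forName("windows-1252"), StandardCharsets.UTF_8);
        System.out.println(utf8.length + " bytes after conversion");
    }
}
```

In the downloading code from the first post, copy could take h.getInputStream() as the input and a FileOutputStream as the output. StandardCharsets needs Java 7 or later; on older versions the charset can be passed by name as a String instead.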
 