Special Character Handling on Unix with Java FileStreams

 
Ranch Hand
Posts: 228
I am having a really weird issue on Unix where some of the characters coming from a Windows CMS (where else) are showing up as a ?
These characters are not normal, in the sense that I can see that the byte value assigned to them is really messed up.
Has anyone faced this issue before? If so, how did you go about solving it?

Here are the details.
I am using the following to get the input stream from a web file:
h = (HttpURLConnection) webserver.openConnection();
h.connect();
// Get the input stream
in = h.getInputStream();

Downloading the page as an HTML file through that stream then translates the weird character into a ?

Byte:83-->Char:S-->HexCode:3
Byte:121-->Char:y-->HexCode:9
Byte:110-->Char:n-->HexCode:E
Byte:100-->Char:d-->HexCode:4
Byte:114-->Char:r-->HexCode:2
Byte:111-->Char:o-->HexCode:F
Byte:109-->Char:m-->HexCode:D
Byte:101-->Char:e-->HexCode:5
Byte:32-->Char: -->HexCode:0
Byte:-106-->Char:?-->HexCode:6
Byte:32-->Char: -->HexCode:0
Byte:70-->Char:F-->HexCode:6
Byte:114-->Char:r-->HexCode:2
Byte:111-->Char:o-->HexCode:F
Byte:109-->Char:m-->HexCode:D
Byte:32-->Char: -->HexCode:0

Notice the byte value of -106 on that character. I have tried using readers with the charset encoding set to ISO-8859-1 as well as UTF-8, but both gave the same result.

I am currently running out of options so please help.

Thanks
 
Ranch Hand
Posts: 1923
Does the page specify an encoding like:

[ October 13, 2005: Message edited by: Stefan Wagner ]
 
sandy gupta
Ranch Hand
Posts: 228
I have tried doing that and it does not help.
 
Ranch Hand
Posts: 37
I tried the byte with code 150 (-106) with input encoding windows-1250. That makes it a kind of dash. I'll try to paste it here: "–". Let's see if it survives.
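
A minimal sketch of that experiment, assuming the byte really is 0x96 (150 unsigned, -106 as a signed Java byte). The class name Byte150Demo and the helper decode are made up for illustration. windows-1252 is used here because it is the encoding mentioned later in the thread; the other three charsets are the ones already tried or suspected.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

class Byte150Demo {
    // Hypothetical helper: decode a single byte under the given charset.
    static String decode(byte b, Charset cs) {
        return new String(new byte[] { b }, cs);
    }

    public static void main(String[] args) {
        byte b = (byte) 150; // same bit pattern as -106 (0x96)
        Charset[] charsets = {
            Charset.forName("windows-1252"),
            StandardCharsets.ISO_8859_1,
            StandardCharsets.UTF_8,
            StandardCharsets.US_ASCII
        };
        for (Charset cs : charsets) {
            String s = decode(b, cs);
            // Print the Unicode code point each charset produces for the byte.
            System.out.printf("%-12s -> U+%04X%n", cs.name(), (int) s.charAt(0));
        }
    }
}
```

Only windows-1252 should yield U+2013 (en dash). ISO-8859-1 maps the byte to the control character U+0096, while UTF-8 and US-ASCII cannot decode it and substitute U+FFFD, which would explain why neither ISO-8859-1 nor UTF-8 readers helped.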
[ October 14, 2005: Message edited by: Harald Kirsch ]
 
sandy gupta
Ranch Hand
Posts: 228
It does survive, and it is a dash character. It shows up fine under the windows-1252 encoding that your browser sets. But try changing the browser's encoding to UTF-8 and you will see what I am talking about.

BTW, I realized that the culprit was my Java application, which is using the US-ASCII charset on the Unix box. I have tried everything from using readers instead of streams to using the String API to change the charset encoding, but nothing seems to work.
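
A small sketch of why a US-ASCII default would produce a ?, assuming the character is the en dash (U+2013). The class name AsciiDefaultDemo and the helper encode are hypothetical. It prints the JVM's default charset and then encodes the dash explicitly with two charsets for comparison.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

class AsciiDefaultDemo {
    // Hypothetical helper: encode text the way a Writer built on this charset would.
    static byte[] encode(String text, Charset cs) {
        return text.getBytes(cs);
    }

    public static void main(String[] args) {
        // What the JVM falls back to when no charset is named explicitly.
        System.out.println("default charset: " + Charset.defaultCharset());

        String dash = "\u2013"; // en dash
        byte[] ascii = encode(dash, StandardCharsets.US_ASCII);
        byte[] cp1252 = encode(dash, Charset.forName("windows-1252"));

        // US-ASCII has no en dash, so the encoder substitutes '?' (byte 63).
        System.out.println("US-ASCII byte:     " + ascii[0]);
        // windows-1252 maps it back to the original byte, -106.
        System.out.println("windows-1252 byte: " + cp1252[0]);
    }
}
```

Any Reader, Writer, or String conversion created without an explicit charset uses that default, so the substitution can happen silently at whichever step omits it.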

If anyone has any idea of how you can force stuff in java to follow a particular encoding, please do let me know.
 
Marshal
Posts: 25811

If anyone has any idea of how you can force stuff in java to follow a particular encoding, please do let me know.



Sure. To force Java to write out data in a particular encoding, name the charset when you construct the OutputStreamWriter. To force Java to read in data using a particular encoding, name the charset when you construct the InputStreamReader.

Or do you have some other code that you would like to specify the character encoding for, where those don't apply? Post it in that case.
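
The code from the original reply is not available here, so the following is only a sketch of the usual approach with InputStreamReader and OutputStreamWriter. The class name Transcode and its transcode method are hypothetical, and the choice of windows-1252 as the source encoding and UTF-8 as the target is an assumption based on this thread. In-memory streams stand in for the HTTP input stream and the output file.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.OutputStream;
import java.io.OutputStreamWriter;
import java.io.Reader;
import java.io.Writer;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

class Transcode {
    // Copy characters from in to out, decoding with 'from' and encoding with 'to'.
    static void transcode(InputStream in, Charset from, OutputStream out, Charset to)
            throws IOException {
        try (Reader r = new InputStreamReader(in, from);
             Writer w = new OutputStreamWriter(out, to)) {
            char[] buf = new char[4096];
            int n;
            while ((n = r.read(buf)) != -1) {
                w.write(buf, 0, n);
            }
        } // closing the writer flushes any buffered output
    }

    public static void main(String[] args) throws IOException {
        // Stand-in for the downloaded bytes: "A", space, 0x96, space, "B".
        byte[] source = { 'A', ' ', (byte) 150, ' ', 'B' };
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        transcode(new ByteArrayInputStream(source),
                  Charset.forName("windows-1252"), out, StandardCharsets.UTF_8);
        for (byte b : out.toByteArray()) {
            System.out.printf("%02X ", b & 0xFF);
        }
        System.out.println();
    }
}
```

In the original scenario, the stream returned by h.getInputStream() would take the place of the ByteArrayInputStream and a FileOutputStream would take the place of the ByteArrayOutputStream. Because both charsets are named explicitly, the platform default (US-ASCII on the Unix box) never comes into play.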
 