
Reading Polish Characters from URL

 
Chris Mack
Greenhorn
Posts: 13
I am using an HttpURLConnection to request a page that contains Polish text.
For example, the page contains "Sprawdzenie nośności z opiekunem", but when I print the response to the console I get "Sprawdzenie no?no?ci z opiekunem".

This is how I am making the request:



The response page is encoded in ISO-8859-2, a charset that covers Polish. This is how I am reading the response:




Any help or suggestions would be greatly appreciated.
Please let me know if you need any more information (chris.mack@centimark.com)

Also, I have tried using java.nio.charset.CharsetDecoder to decode the page: I read the stream in as bytes and placed the bytes into a ByteBuffer for decoding, but that didn't work either.
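For reference, a minimal sketch of what that byte-level approach can look like, assuming the whole response body is already in a byte array. This is not the code from the attempt described above; the class name Latin2Decoder and the method decodeLatin2 are made up for illustration.

```java
import java.nio.ByteBuffer;
import java.nio.CharBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.Charset;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.CodingErrorAction;

// Hypothetical helper, not taken from the thread.
final class Latin2Decoder {
    /** Decodes a complete ISO-8859-2 byte array into a Java (UTF-16) String. */
    static String decodeLatin2(byte[] body) {
        CharsetDecoder decoder = Charset.forName("ISO-8859-2")
                .newDecoder()
                // Fail loudly instead of silently substituting characters.
                .onMalformedInput(CodingErrorAction.REPORT)
                .onUnmappableCharacter(CodingErrorAction.REPORT);
        try {
            CharBuffer chars = decoder.decode(ByteBuffer.wrap(body));
            return chars.toString();
        } catch (CharacterCodingException e) {
            throw new IllegalArgumentException("Input is not valid ISO-8859-2", e);
        }
    }

    public static void main(String[] args) {
        // The word from the question as ISO-8859-2 bytes; 0xB6 is s-acute.
        byte[] body = {'n', 'o', (byte) 0xB6, 'n', 'o', (byte) 0xB6, 'c', 'i'};
        String text = decodeLatin2(body);
        // Print code points rather than the text itself, so the console
        // encoding cannot hide the result.
        for (char c : text.toCharArray()) {
            System.out.printf("U+%04X ", (int) c);
        }
        System.out.println();
    }
}
```

If the code points come out right (U+015B for ś), the decoding step is fine and any remaining "?" is introduced later, when the text is written out.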

Thanks,

Chris

 
Carey Evans
Ranch Hand
Posts: 225
You may be reading the data correctly (although you could just write new InputStreamReader(istream,"ISO-8859-2")), but not getting it to display on the console. Try displaying the data you receive with the GUI:
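A minimal sketch of that suggestion, not the listing originally posted here. It assumes istream would come from connection.getInputStream(); a byte array stands in for the server so the example runs offline. The names ShowDecodedText and readAll are hypothetical.

```java
import java.awt.GraphicsEnvironment;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.Reader;
import java.io.UncheckedIOException;
import javax.swing.JOptionPane;

// Hypothetical helper, not taken from the thread.
final class ShowDecodedText {
    /** Reads the whole stream, decoding its bytes with the named charset. */
    static String readAll(InputStream istream, String charsetName) {
        StringBuilder text = new StringBuilder();
        try (Reader in = new InputStreamReader(istream, charsetName)) {
            char[] buf = new char[4096];
            int n;
            while ((n = in.read(buf)) != -1) {
                text.append(buf, 0, n);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return text.toString();
    }

    public static void main(String[] args) {
        // Stand-in for connection.getInputStream(): ISO-8859-2 bytes, 0xB6 is s-acute.
        byte[] page = {'n', 'o', (byte) 0xB6, 'n', 'o', (byte) 0xB6, 'c', 'i'};
        String text = readAll(new ByteArrayInputStream(page), "ISO-8859-2");
        if (GraphicsEnvironment.isHeadless()) {
            // No display to test with; report something that is safe in any console.
            System.out.println("No display available; decoded " + text.length() + " characters");
        } else {
            // Swing renders Unicode directly, bypassing the console encoding.
            JOptionPane.showMessageDialog(null, text);
        }
    }
}
```

If the dialog shows the Polish letters correctly, the reading code is fine and the "?" characters come from the console.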
 
Chris Mack
Greenhorn
Posts: 13
I would just like to say thank you for your reply... I appreciate it.

I am using RAD as my IDE, and I put a breakpoint in the code right before the string buffer is printed to the console. When execution stops at the breakpoint, I check the contents of the string buffer. It also has the ? in the Polish text. I suspect that when the text from the in.readLine() method is assigned to the String inputLine, the text is being converted to UTF-8 instead of keeping its original charset encoding.

Any other suggestions?

Thanks again,

Chris
 
Carey Evans
Ranch Hand
Posts: 225
Debian Eclipse IDE Java
There won’t be any character conversion happening when assigning strings, since Java only copies a reference to the String object, and all Java strings are encoded in UTF-16 anyway. The InputStreamReader does the initial conversion from ISO-8859-2 to UTF-16, and the System.out.println() converts from UTF-16 back to the encoding in the file.encoding system property.

I wrote a short test program, and I can’t reproduce your problem. Can you see whether this works for you?

In this case, I get ? instead of ś and ż on my console, because its encoding is Cp1252, but JOptionPane displays the string correctly.
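The two conversions described above can also be tried without a server or a console. The following is a rough sketch, not the test program mentioned in this post: it decodes ISO-8859-2 bytes into a String and then simulates the console step by encoding to a given charset and reading the bytes back. The class ConsoleEncodingDemo and the method asSeenOn are invented for this illustration, and "windows-1252" is used as the charset name for Cp1252.

```java
import java.nio.charset.Charset;

// Hypothetical demo class, not taken from the thread.
final class ConsoleEncodingDemo {
    /**
     * Mimics what a console stream does: encode the UTF-16 string to the
     * console charset, then interpret those bytes in the same charset.
     */
    static String asSeenOn(String text, String consoleCharset) {
        Charset cs = Charset.forName(consoleCharset);
        byte[] encoded = text.getBytes(cs); // unmappable characters become '?'
        return new String(encoded, cs);
    }

    public static void main(String[] args) {
        // ISO-8859-2 bytes as they would arrive from the server; 0xB6 is s-acute.
        byte[] fromServer = {'n', 'o', (byte) 0xB6, 'n', 'o', (byte) 0xB6, 'c', 'i'};
        String text = new String(fromServer, Charset.forName("ISO-8859-2"));

        System.out.println(asSeenOn(text, "windows-1252"));            // no?no?ci
        System.out.println(text.equals(asSeenOn(text, "ISO-8859-2"))); // true
        System.out.println(text.equals(asSeenOn(text, "UTF-8")));      // true
    }
}
```

The String itself is intact in all three cases; only an output charset that lacks ś turns it into "?".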
 
Chris Mack
Greenhorn
Posts: 13
I was able to get the characters to display in my RAD console by changing the JVM encoding to UTF-8 and changing the console font to one that includes the Polish characters.
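The thread does not say how the JVM encoding was changed. As a related sketch under that caveat, the code below prints the current file.encoding value and, as a different technique from changing the JVM default, wraps System.out in a PrintStream with an explicit UTF-8 encoding. The class Utf8Console and the method utf8 are made-up names, and the console still has to interpret its input as UTF-8 and use a suitable font.

```java
import java.io.OutputStream;
import java.io.PrintStream;
import java.io.UnsupportedEncodingException;

// Hypothetical helper, not taken from the thread.
final class Utf8Console {
    /** Wraps a stream so that text written to it is encoded as UTF-8. */
    static PrintStream utf8(OutputStream out) {
        try {
            return new PrintStream(out, true, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            // UTF-8 is one of the charsets every Java platform must support.
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // The default encoding System.out was created with.
        System.out.println("file.encoding = " + System.getProperty("file.encoding"));

        PrintStream out = utf8(System.out);
        // Unicode escapes keep this source file pure ASCII.
        out.println("Sprawdzenie no\u015Bno\u015Bci z opiekunem");
    }
}
```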

Thanks for your replies, much appreciated!

 