Encoding Norwegian characters as UTF-8

 
Ranch Hand
Posts: 82
Hi all,

I am trying to encode a String with the content "Gratulerer! Du har n \u00e5" for Norwegian, where the \u00e5 should appear as "å". I have tried the following:

// Attempt 1: round trip through String.getBytes()
public static String doEncode(String text) throws IOException {
    return new String(text.getBytes(), "ISO-8859-1");
}


// Attempt 2: encode explicitly with a CharsetEncoder
public static String doEncode(String text) throws IOException {
    CharsetEncoder charSetEncode = Charset.forName("ISO-8859-1").newEncoder();
    charSetEncode.reset();
    ByteBuffer buffer = ByteBuffer.allocate(text.length());
    charSetEncode.encode(CharBuffer.wrap(text.toCharArray()), buffer, true);
    return new String(buffer.array());
}

Both of the above methods return the following: "Gratulerer! Du har n å"

If I replace ISO-8859-1 with UTF8, I get "Gratulerer! Du har n å"

I run the program with the -Dfile.encoding=UTF8 JVM option to emulate the default encoding of GlassFish. If the same option is set to ISO8859_1, I get the expected behavior, and since UTF8 is a superset of ISO8859_1, I was hoping to get the same result. I am not permitted to change the encoding of GlassFish, because other parts of the UI would then be displayed incorrectly.

Thanks,

vijai
 
Ranch Hand
Posts: 781
Strings in Java are encoded as UTF-16. Always always always UTF-16. You cannot convert a String to a different encoding since they are always encoded as UTF-16.
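A small sketch of what this means for the string in the question, assuming the \u00e5 is written as a Java escape in source code (not read as literal text from a file). The class name `Utf16Demo` is hypothetical. The compiler turns the escape into the single character U+00E5 before the program runs, so the String already contains "å" and needs no further "encoding":

```java
public class Utf16Demo {
    // javac translates the \u00e5 escape into the single character U+00E5
    // at compile time, so this String already holds the Norwegian letter.
    static final String MESSAGE = "Gratulerer! Du har n \u00e5";

    // Returns the last UTF-16 code unit of the message as an int.
    static int lastCodeUnit() {
        return MESSAGE.charAt(MESSAGE.length() - 1);
    }

    public static void main(String[] args) {
        // One UTF-16 code unit with the value 0xE5, not six separate characters.
        System.out.println(Integer.toHexString(lastCodeUnit()));
        System.out.println(MESSAGE.length());
    }
}
```

Running it should print `e5` and `22`: the escape counts as one character, not six.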

Your code


return new String(text.getBytes(), "ISO-8859-1");


says: take the String referenced by text and convert it to bytes using your default encoding, then assume those bytes are ISO-8859-1 and convert them back to a String. If your default encoding is ISO-8859-1 and your string contains no characters that cannot be represented in ISO-8859-1, the new string will be exactly the same as the original, i.e. you have a null operation. If your default character encoding is not ISO-8859-1, you will possibly (in your case certainly) corrupt the string.
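The corruption can be reproduced without depending on `-Dfile.encoding` by naming both charsets explicitly. This is a minimal sketch assuming a default encoding of UTF-8 (as with `-Dfile.encoding=UTF8`) and Java 7 or later for `StandardCharsets`; the class and method names are hypothetical:

```java
import java.nio.charset.StandardCharsets;

public class MisdecodeDemo {
    // Reproduces new String(text.getBytes(), "ISO-8859-1") on a JVM whose
    // default encoding is UTF-8, with the charsets spelled out explicitly.
    static String misdecode(String text) {
        // Step 1: String -> bytes as UTF-8; U+00E5 becomes the two bytes C3 A5.
        byte[] utf8 = text.getBytes(StandardCharsets.UTF_8);
        // Step 2: read those bytes as ISO-8859-1, which maps each byte to one
        // char, so C3 A5 turns into the two chars U+00C3 U+00A5.
        return new String(utf8, StandardCharsets.ISO_8859_1);
    }

    public static void main(String[] args) {
        String original = "Gratulerer! Du har n \u00e5";
        String corrupted = misdecode(original);
        System.out.println(original.length() + " chars before, "
                + corrupted.length() + " chars after");
    }
}
```

Pure ASCII text comes back unchanged (the "null operation" case), while the single "å" turns into two characters, so the message grows from 22 to 23 chars.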

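Your second approach has a similar problem: it encodes the text into ISO-8859-1 bytes, but then new String(buffer.array()) decodes those bytes with your default encoding (UTF-8 here), not with ISO-8859-1.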
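A minimal sketch of the second approach with the decoding step made consistent, assuming Java 7+ and hypothetical names. It uses the `CharsetEncoder.encode(CharBuffer)` convenience method, which allocates a correctly sized buffer and, with a fresh encoder's default settings, throws instead of silently dropping characters ISO-8859-1 cannot represent. Note the bytes are only useful if whatever consumes them expects ISO-8859-1; the Java String itself never changes encoding:

```java
import java.nio.ByteBuffer;
import java.nio.CharBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.CharsetEncoder;
import java.nio.charset.StandardCharsets;

public class EncoderRoundTrip {
    // Encodes text to ISO-8859-1 bytes; throws on unmappable characters.
    static byte[] toLatin1(String text) throws CharacterCodingException {
        CharsetEncoder encoder = StandardCharsets.ISO_8859_1.newEncoder();
        // Returns a flipped buffer holding exactly the encoded bytes.
        ByteBuffer buffer = encoder.encode(CharBuffer.wrap(text));
        byte[] bytes = new byte[buffer.remaining()];
        buffer.get(bytes);
        return bytes;
    }

    // Decodes with the SAME charset used for encoding, not the platform default.
    static String fromLatin1(byte[] bytes) {
        return new String(bytes, StandardCharsets.ISO_8859_1);
    }

    public static void main(String[] args) throws CharacterCodingException {
        String text = "Gratulerer! Du har n \u00e5";
        byte[] latin1 = toLatin1(text);
        System.out.println(latin1.length);                   // one byte per char
        System.out.println(fromLatin1(latin1).equals(text)); // round trip succeeds
    }
}
```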

If you just want the UTF-8 bytes of a string, then just use

byte[] utfBytesOfMyString = "my string".getBytes("utf-8");
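The same idea in a sketch assuming Java 7+: `StandardCharsets.UTF_8` produces the same bytes as `getBytes("utf-8")` without the checked `UnsupportedEncodingException`, and when text is written out, the charset can be named on the `Writer` rather than left to the platform default. The class and method names are hypothetical; the general point is to choose the encoding at the place where the String becomes bytes:

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStreamWriter;
import java.io.Writer;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

public class Utf8Output {
    // Explicit charset: the result does not depend on -Dfile.encoding.
    static byte[] utf8Bytes(String s) {
        return s.getBytes(StandardCharsets.UTF_8);
    }

    // Same idea for streamed output: name the charset on the Writer.
    static byte[] writeUtf8(String s) throws IOException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        try (Writer w = new OutputStreamWriter(out, StandardCharsets.UTF_8)) {
            w.write(s);
        } // closing the writer flushes its buffered bytes into out
        return out.toByteArray();
    }

    public static void main(String[] args) throws IOException {
        String text = "Gratulerer! Du har n \u00e5";
        byte[] a = utf8Bytes(text);
        byte[] b = writeUtf8(text);
        // The letter U+00E5 takes two bytes in UTF-8; both paths agree.
        System.out.println(a.length + " " + Arrays.equals(a, b));
    }
}
```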
 
Ranch Hand
Posts: 262
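Also, UTF-8 is not a superset of ISO-8859-1 at the byte level. Every ISO-8859-1 character can be represented in UTF-8, but only the ASCII range (0x00 through 0x7F) is encoded identically; a character such as "å" is the single byte 0xE5 in ISO-8859-1 but the two bytes 0xC3 0xA5 in UTF-8.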
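The difference is easy to see by printing the bytes each encoding produces for an ASCII letter and for "å". A quick sketch assuming Java 7+; the class name and the `hex` helper are hypothetical:

```java
import java.nio.charset.StandardCharsets;

public class Latin1VsUtf8 {
    // Formats bytes as space-separated uppercase hex, e.g. "C3 A5".
    static String hex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            if (sb.length() > 0) sb.append(' ');
            sb.append(String.format("%02X", b & 0xFF)); // mask to treat byte as unsigned
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String[] samples = {"A", "\u00e5"};
        for (String s : samples) {
            System.out.println(String.format("U+%04X  ISO-8859-1: %s  UTF-8: %s",
                    (int) s.charAt(0),
                    hex(s.getBytes(StandardCharsets.ISO_8859_1)),
                    hex(s.getBytes(StandardCharsets.UTF_8))));
        }
    }
}
```

For "A" both encodings give `41`; for "å" ISO-8859-1 gives `E5` while UTF-8 gives `C3 A5`, which is why bytes written in one cannot simply be read as the other.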
 
It looks like it's time for me to write you a reality check! Or maybe a tiny ad!
Smokeless wood heat with a rocket mass heater
https://woodheat.net
reply
    Bookmark Topic Watch Topic
  • New Topic