Encoding Norwegian characters as UTF-8

 
Ranch Hand
Hi all,

I am trying to encode a String with the content "Gratulerer! Du har n \u00e5" into Norwegian, where the \u00e5 should be replaced by "å". I have tried the following:

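import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.CharBuffer;
import java.nio.charset.Charset;
import java.nio.charset.CharsetEncoder;

// Attempt 1: getBytes() encodes with the platform default charset,
// then the bytes are decoded as ISO-8859-1.
public static String doEncode(String text) throws IOException {
    return new String(text.getBytes(), "ISO-8859-1");
}

// Attempt 2 (an alternative to attempt 1, hence the same name): encode with
// an ISO-8859-1 encoder, then decode with the platform default charset.
public static String doEncode(String text) throws IOException {
    CharsetEncoder charSetEncode = Charset.forName("ISO-8859-1").newEncoder();
    charSetEncode.reset();
    // ISO-8859-1 uses one byte per char, so text.length() bytes is enough
    ByteBuffer buffer = ByteBuffer.allocate(text.length());
    charSetEncode.encode(CharBuffer.wrap(text.toCharArray()), buffer, true);
    // new String(byte[]) decodes using the platform default charset
    return new String(buffer.array());
}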

Both of the above methods return "Gratulerer! Du har n å".

If I replace ISO-8859-1 with UTF8, I get "Gratulerer! Du har n å".

I run the program with the -Dfile.encoding=UTF8 JVM option so as to emulate GlassFish's default encoding. If the same option is set to ISO8859_1, I get the expected behavior, but since UTF8 is a superset of ISO8859_1, I was hoping to get the same result. I am not permitted to change GlassFish's encoding, since certain other things in the UI would then be displayed incorrectly.

Thanks,

vijai
 
Ranch Hand
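Strings in Java are encoded as UTF-16. Always, always, always UTF-16. You cannot convert a String to a different encoding; an encoding only comes into play when you convert a String to bytes (String.getBytes) or bytes back to a String (new String(bytes, charset)).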
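A minimal sketch of that point (the class and method names are made up for illustration; StandardCharsets needs Java 7 or later): the String itself just holds the char 'å' (U+00E5), and only the byte count differs depending on which charset you encode with.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class Utf16Demo {
    // The String holds UTF-16 chars; "\u00e5" is the single char 'å' (U+00E5).
    static final String TEXT = "n\u00e5";

    // An encoding is applied only when chars are turned into bytes,
    // and the number of bytes depends on the charset chosen.
    static int byteCount(String s, Charset cs) {
        return s.getBytes(cs).length;
    }

    public static void main(String[] args) {
        System.out.println(TEXT.length());                                // 2 chars
        System.out.println((int) TEXT.charAt(1));                         // 229 (0xE5)
        System.out.println(byteCount(TEXT, StandardCharsets.ISO_8859_1)); // 2 bytes
        System.out.println(byteCount(TEXT, StandardCharsets.UTF_8));      // 3 bytes
        System.out.println(byteCount(TEXT, StandardCharsets.UTF_16BE));   // 4 bytes
    }
}
```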

Your code


return new String(text.getBytes(), "ISO-8859-1");


says: take the String referenced by text and convert it to bytes using your default encoding. Then assume that those bytes are ISO-8859-1 and convert them back to a String. If your default encoding is ISO-8859-1 and your string contains no characters that cannot be represented in ISO-8859-1, the new string will be exactly the same as the original, i.e. the whole operation is a no-op. If your default encoding is not ISO-8859-1, you will possibly (in your case certainly) corrupt the string.
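To see this without depending on -Dfile.encoding, here is a small sketch (names are hypothetical) that passes the "default" charset in explicitly. With ISO-8859-1 the string survives unchanged; with UTF-8, 'å' becomes the two bytes C3 A5, which ISO-8859-1 reads back as the two characters "Ã¥".

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class MojibakeDemo {
    // Mirrors new String(text.getBytes(), "ISO-8859-1"), but with the
    // default charset made an explicit parameter.
    static String reinterpret(String text, Charset assumedDefault) {
        byte[] bytes = text.getBytes(assumedDefault);          // chars -> bytes
        return new String(bytes, StandardCharsets.ISO_8859_1); // bytes read as Latin-1
    }

    public static void main(String[] args) {
        String text = "n\u00e5";
        // Default ISO-8859-1: a no-op.
        System.out.println(reinterpret(text, StandardCharsets.ISO_8859_1).equals(text));           // true
        // Default UTF-8: "nå" comes back as "nÃ¥".
        System.out.println(reinterpret(text, StandardCharsets.UTF_8).equals("n\u00c3\u00a5"));     // true
    }
}
```

The booleans are printed instead of the strings so the console's own encoding cannot muddy the result.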

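Your second approach has a similar problem: the encoder produces ISO-8859-1 bytes, but new String(buffer.array()) decodes them using your default encoding.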
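A rough sketch of the second approach with the decoding step made explicit (class and method names are illustrative only). The ISO-8859-1 encoder turns 'å' into the single byte 0xE5, which is not a complete UTF-8 sequence, so decoding with UTF-8 (the default under -Dfile.encoding=UTF8) replaces it with U+FFFD; decoding with the charset that produced the bytes round-trips.

```java
import java.nio.ByteBuffer;
import java.nio.CharBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.StandardCharsets;

public class EncoderDemo {
    // Encode with an ISO-8859-1 encoder, as in the second attempt.
    static byte[] encodeLatin1(String text) {
        try {
            ByteBuffer buf = StandardCharsets.ISO_8859_1.newEncoder()
                    .encode(CharBuffer.wrap(text)); // reports unmappable chars as an exception
            byte[] out = new byte[buf.remaining()];
            buf.get(out);
            return out;
        } catch (CharacterCodingException e) {
            throw new IllegalArgumentException("not representable in ISO-8859-1", e);
        }
    }

    public static void main(String[] args) {
        byte[] latin1 = encodeLatin1("n\u00e5");                         // {0x6E, 0xE5}
        String wrong = new String(latin1, StandardCharsets.UTF_8);      // 0xE5 -> U+FFFD
        String right = new String(latin1, StandardCharsets.ISO_8859_1); // same charset both ways
        System.out.println(wrong.equals("n\u00e5")); // false
        System.out.println(right.equals("n\u00e5")); // true
    }
}
```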

If you just want the bytes of the UTF-8 encoding, then just use

byte[] utfBytesOfMyString = "my string".getBytes("utf-8");
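A small sketch of the same idea (the hex helper is just for illustration). It uses StandardCharsets.UTF_8 (Java 7+), which avoids the checked UnsupportedEncodingException that getBytes("utf-8") declares, and shows that decoding with the same charset gives the original String back.

```java
import java.nio.charset.StandardCharsets;

public class Utf8BytesDemo {
    // Formats bytes as lowercase hex pairs separated by spaces, e.g. "c3 a5".
    static String hex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            if (sb.length() > 0) sb.append(' ');
            sb.append(String.format("%02x", b & 0xFF));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        byte[] utf8 = "\u00e5".getBytes(StandardCharsets.UTF_8);
        System.out.println(hex(utf8)); // c3 a5
        // Same charset in both directions: a lossless round trip.
        System.out.println(new String(utf8, StandardCharsets.UTF_8).equals("\u00e5")); // true
    }
}
```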
 
Ranch Hand
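Also, UTF-8 is not a superset of ISO-8859-1. Every ISO-8859-1 character can be represented in UTF-8, but only the ASCII range (0x00 to 0x7F) is encoded with the same bytes; characters 0x80 to 0xFF, such as å (0xE5), take one byte in ISO-8859-1 and two bytes in UTF-8 (0xC3 0xA5 for å).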
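One way to check this, as a quick sketch (the class name is made up): encode each of the 256 ISO-8859-1 characters both ways and count how many produce identical bytes.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

public class Latin1VsUtf8 {
    // Counts how many of the 256 ISO-8859-1 characters encode to
    // exactly the same bytes in UTF-8.
    static int identicalEncodings() {
        int same = 0;
        for (int cp = 0; cp <= 0xFF; cp++) {
            String s = String.valueOf((char) cp);
            byte[] latin1 = s.getBytes(StandardCharsets.ISO_8859_1); // always 1 byte
            byte[] utf8 = s.getBytes(StandardCharsets.UTF_8);        // 1 or 2 bytes
            if (Arrays.equals(latin1, utf8)) same++;
        }
        return same;
    }

    public static void main(String[] args) {
        System.out.println(identicalEncodings()); // 128: only U+0000..U+007F match
    }
}
```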
 