Encoding difficulty -> how/where do I set it?

 
Ranch Hand
Posts: 3695
IntelliJ IDE Java Ubuntu
It's kinda this again:
https://coderanch.com/t/374049/java/java/should-default-charset

Our webapp is behaving very oddly. The database contains French characters, and the web pages display them correctly. So JDBC and the DB are behaving, and apparently so is our servlet container (Tomcat).

But emails... I get '?' for each non-English character.
Same with System.out.println or log4j debug output.

I ran this code in a JSP page, expecting "UTF8", but I got "ASCII" (platform is Slackware 9)

WtF ?
 
Wanderer
Posts: 18671
Well, I see you found the same workaround I did to determine the platform's default file encoding. Sad that there's no better way to do this (like a standard system property or something). Oh well.

As to your question: as far as I know, there's no reason to expect UTF-8 as the default encoding on Java platforms. There are some specific places where Java standards mandate UTF-8 or something very much like it, but there are many others where the encoding is intentionally not mandated and is instead deferred to the mysterious "platform default". The two places I know of where UTF-8 is required are: (a) the class file format, which uses (modified) UTF-8 to represent all String literals, and (b) the DataInput and DataOutput interfaces (implemented by RandomAccessFile and others), which provide the readUTF() and writeUTF() methods (again, with a slightly modified version of UTF-8).
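To make the "slightly modified" part concrete, here is a minimal sketch comparing standard UTF-8 with what writeUTF() produces. The class and method names (ModifiedUtf8Demo, writeUtfBytes) are made up for illustration, and it assumes Java 8 or later for StandardCharsets and UncheckedIOException.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

public class ModifiedUtf8Demo {

    // Hypothetical helper: encodes a string with DataOutputStream.writeUTF.
    // The first two bytes of the result are a big-endian length prefix,
    // not character data.
    static byte[] writeUtfBytes(String s) {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(buffer)) {
            out.writeUTF(s);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return buffer.toByteArray();
    }

    public static void main(String[] args) {
        String text = "\u00e9\u0000"; // 'é' followed by the NUL character

        // Standard UTF-8: é -> C3 A9, NUL -> 00
        System.out.println(Arrays.toString(text.getBytes(StandardCharsets.UTF_8)));

        // Modified UTF-8: length prefix 00 04, é -> C3 A9, NUL -> C0 80
        System.out.println(Arrays.toString(writeUtfBytes(text)));
    }
}
```

For ordinary accented text the two encodings agree. The differences are the length prefix, the two-byte form of NUL, and the handling of supplementary characters, so writeUTF() output is not a drop-in replacement for a UTF-8 file.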

In contrast, there are many more places where the default encoding creeps in if you're not careful. Anything which implicitly converts between bytes and chars is suspect. Some key places where such conversion happens:

new String(byte[] bytes)
new InputStreamReader(InputStream in)
new FileReader(String fileName)
new PrintStream(OutputStream out)
new Scanner(File file)
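As a rough illustration of why an ASCII default turns French text into '?', the sketch below encodes and decodes a string with a named charset. It assumes Java 7 or later (for StandardCharsets; Charset.defaultCharset() itself needs Java 5), and the class and method names are hypothetical. Passing US-ASCII explicitly stands in for what the no-argument forms above would do on a machine whose default is ASCII.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class DefaultCharsetDemo {

    // Hypothetical helper: encodes text with the given charset, then decodes
    // the resulting bytes with that same charset.
    static String roundTrip(String text, Charset charset) {
        byte[] encoded = text.getBytes(charset);
        return new String(encoded, charset);
    }

    public static void main(String[] args) {
        String french = "d\u00e9j\u00e0 vu"; // "déjà vu"

        // What the JVM will use whenever no charset is named
        System.out.println("default charset: " + Charset.defaultCharset());
        System.out.println("file.encoding:   " + System.getProperty("file.encoding"));

        // US-ASCII cannot represent é or à, so each one is replaced on encoding
        System.out.println(roundTrip(french, StandardCharsets.US_ASCII)); // d?j? vu

        // UTF-8 can represent every character, so the text survives intact
        System.out.println(french.equals(roundTrip(french, StandardCharsets.UTF_8))); // true
    }
}
```

The replacement happens at encoding time, so once the bytes are written (to a log, a console, or a mail body) the original characters cannot be recovered.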

I don't know of any standard way to change the default encoding on a machine. (Though it's entirely possible there's something in your OS which allows this.) What you can do is replace the above constructors with an alternate form that explicitly specifies an encoding:

new String(byte[] bytes, String charset)
new InputStreamReader(InputStream in, String charset)
new InputStreamReader(new FileInputStream(String fileName), String charset)
new PrintWriter(new OutputStreamWriter(OutputStream out, String charset))
new Scanner(File file, String charset)
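Here is a small sketch of the explicit forms in use, writing and reading in memory so it runs anywhere. The class and method names are invented for the example, "UTF-8" is just one reasonable choice of charset, and UncheckedIOException assumes Java 8 or later.

```java
import java.io.BufferedReader;
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.OutputStreamWriter;
import java.io.PrintWriter;
import java.io.UncheckedIOException;

public class ExplicitCharsetDemo {

    // Hypothetical helper: writes text through a PrintWriter whose encoding
    // is named explicitly instead of taken from the platform default.
    static byte[] writeLine(String line, String charsetName) {
        ByteArrayOutputStream sink = new ByteArrayOutputStream();
        try (PrintWriter out = new PrintWriter(new OutputStreamWriter(sink, charsetName))) {
            out.print(line);
        } catch (IOException e) { // UnsupportedEncodingException for an unknown charset name
            throw new UncheckedIOException(e);
        }
        return sink.toByteArray();
    }

    // Hypothetical helper: reads the first line back through an
    // InputStreamReader that uses the same explicit encoding.
    static String readLine(byte[] data, String charsetName) {
        try (BufferedReader in = new BufferedReader(
                new InputStreamReader(new ByteArrayInputStream(data), charsetName))) {
            return in.readLine();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        String french = "fran\u00e7ais"; // "français"
        byte[] bytes = writeLine(french, "UTF-8");

        System.out.println(bytes.length);                            // 9: ç takes two bytes in UTF-8
        System.out.println(french.equals(readLine(bytes, "UTF-8"))); // true
    }
}
```

The same idea applies to files and sockets: swap the in-memory streams for a FileOutputStream or FileInputStream and keep the charset argument. Both ends have to agree on the encoding, or the text will still come out garbled.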

Hope that helps...
 