
International Characters

 
Ranch Hand
Posts: 42
I have a Java application that processes a lot of text representing data from many countries and languages.

I have to read in the data, process it (via web service calls), and write out the logs.

I changed the encoding used when reading streams from the system default to UTF-8 in order to support Chinese characters, and that worked fine.

When I was using the default system encoding, the application handled German characters correctly, but since I switched to UTF-8 it no longer does: German characters show up as ? and so on.

I can switch back to the default and process German characters again, but is there a way to read a file, detect its encoding, configure the InputStreamReader to use that encoding, and configure log4j to write the text out again in the same encoding?

Any pointers in the right direction are much appreciated.

Thanks
Kasi
 
Rancher
Posts: 43081
77
Just to be clear, with the default encoding the code works with German text, and with UTF-8 it works with Chinese text? That sounds as if the code is not properly processing inputs that come in various encodings. You should never rely on the default encoding - the code should always be aware of what encoding any input is in, and act accordingly.
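To illustrate what "being aware of the encoding" could look like in code, here is a minimal sketch that always passes an explicit Charset to the InputStreamReader instead of relying on the platform default. The class and method names (ExplicitCharsetReader, readAll) are made up for this example and are not from the original application.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.nio.charset.Charset;

// Hypothetical helper: the caller must say which encoding the stream is in.
class ExplicitCharsetReader {

    /** Reads the whole stream as text, decoding with the charset the caller names. */
    static String readAll(InputStream in, Charset charset) throws IOException {
        StringBuilder sb = new StringBuilder();
        // The two-argument constructor avoids the platform default encoding.
        try (BufferedReader reader = new BufferedReader(new InputStreamReader(in, charset))) {
            char[] buf = new char[4096];
            int n;
            while ((n = reader.read(buf)) != -1) {
                sb.append(buf, 0, n);
            }
        }
        return sb.toString();
    }
}
```

If the German file was saved in a single-byte encoding such as ISO-8859-1, reading it with that charset returns the umlauts intact, while reading the same bytes as UTF-8 produces replacement characters, which would be consistent with the ? symptom described above.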

There is no easy way to determine the encoding of a file, but you can try http://jchardet.sourceforge.net/
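Short of a detection library, one common heuristic is to try a strict UTF-8 decode first and fall back to another encoding only if the bytes are not valid UTF-8. The sketch below uses only the JDK; the class name Utf8OrFallback and the choice of fallback charset are assumptions for illustration. It cannot tell different single-byte encodings apart, so the fallback is still a guess the caller has to supply.

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.Charset;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

// Hypothetical helper: heuristic only, not real encoding detection.
class Utf8OrFallback {

    /** Decodes as UTF-8 if the bytes are valid UTF-8, otherwise with the given fallback. */
    static String decode(byte[] bytes, Charset fallback) {
        // REPORT makes the decoder throw on invalid input instead of
        // silently substituting replacement characters.
        CharsetDecoder strictUtf8 = StandardCharsets.UTF_8.newDecoder()
                .onMalformedInput(CodingErrorAction.REPORT)
                .onUnmappableCharacter(CodingErrorAction.REPORT);
        try {
            return strictUtf8.decode(ByteBuffer.wrap(bytes)).toString();
        } catch (CharacterCodingException e) {
            // Not valid UTF-8: assume the caller's fallback encoding.
            return new String(bytes, fallback);
        }
    }
}
```

For example, Utf8OrFallback.decode(fileBytes, StandardCharsets.ISO_8859_1) would handle a UTF-8 Chinese file and an ISO-8859-1 German file with the same call, assuming those really are the only two encodings in play.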
 
Kasi Viswan
Ranch Hand
Posts: 42
Is it safe to assume all input files are UTF-8?

I converted the German file to UTF-8 with Notepad++ and my application works now, so it supports both German and Chinese with UTF-8.
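The same conversion Notepad++ did by hand could also be done in Java as a preprocessing step, provided the source encoding of each file is known. This is only a sketch; the class name TranscodeToUtf8 and its parameters are invented for the example.

```java
import java.io.BufferedReader;
import java.io.BufferedWriter;
import java.io.IOException;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical helper: rewrites a text file as UTF-8, given its known source encoding.
class TranscodeToUtf8 {

    static void transcode(Path source, Charset sourceCharset, Path target) throws IOException {
        // Files.newBufferedReader decodes strictly and throws an IOException
        // on bytes that are invalid in sourceCharset, rather than writing garbage.
        try (BufferedReader in = Files.newBufferedReader(source, sourceCharset);
             BufferedWriter out = Files.newBufferedWriter(target, StandardCharsets.UTF_8)) {
            char[] buf = new char[8192];
            int n;
            while ((n = in.read(buf)) != -1) {
                out.write(buf, 0, n);
            }
        }
    }
}
```

After a step like this, the rest of the application can read everything as UTF-8, which is only safe to assume when the conversion has actually been applied to every input.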
 