Handling languages other than English in Java

 
Greenhorn
Posts: 6
Hi,

I am reading several feeds whose titles and text are in languages other than English, such as Chinese, Japanese, and Arabic.

Example title in another language: 认清世界 读懂中国: 老百姓将杨佳案矛头指向沪政法书记吴志明

When I read such a string in a Java program and display it, I get question marks instead of the language-specific characters, as below:

???(?)

Can anyone guide me on how to resolve this language-specific issue?

--rama
 
Sheriff
Posts: 14691

and display it


Where are you displaying it? If you're displaying it in the console, your environment might not support the fonts necessary to display such languages.
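If the console is the suspect, one quick check is to write through a PrintStream that encodes as UTF-8 explicitly, rather than relying on the platform default. The sketch below is only an illustration: the class name Utf8Console and the helper utf8Stream are made up for this example, and it assumes the terminal itself is set to UTF-8 and has a font with the required glyphs.

```java
import java.io.OutputStream;
import java.io.PrintStream;
import java.io.UnsupportedEncodingException;

// Hypothetical helper class, not part of any library.
class Utf8Console {

    // Wraps any OutputStream in a PrintStream that always encodes text as UTF-8.
    static PrintStream utf8Stream(OutputStream target) {
        try {
            return new PrintStream(target, true, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            // Every Java platform is required to support UTF-8.
            throw new IllegalStateException("UTF-8 not available", e);
        }
    }

    public static void main(String[] args) {
        PrintStream out = utf8Stream(System.out);
        // Unicode escapes keep this source file independent of its own encoding.
        out.println("\u8BA4\u6E05\u4E16\u754C");
    }
}
```

If the characters still show up as question marks or boxes after this, the terminal encoding or font is the more likely culprit, not the Java side.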
 
Rama Vadakattu
Greenhorn
Posts: 6
Instead of displaying them, I am storing those characters in a MySQL database, and the language-specific characters still appear as question marks.
 
Rama Vadakattu
Greenhorn
Posts: 6
Any clue on how to resolve this?
 
Ranch Hand
Posts: 102
It might be a problem with the character sets configured on the database server.
 
Bartender
Posts: 10336
...or it might be an issue with the capabilities of whatever client you use to view the data.

Have a read of this very good article.
 
Marshal
Posts: 24585
And this one.
 
Rama Vadakattu
Greenhorn
Posts: 6
Thanks all, I have resolved the problem.

The link below clearly explains what the problem is and how to solve it.
http://java.sun.com/developer/technicalArticles/Intl/HTTPCharset/

Problem:

The characters in the feed are UTF-8 encoded, whereas by default the Tomcat server assumes that all characters are encoded in ISO-8859-1.

As a result, Tomcat tries to read the UTF-8 encoded characters in the feed as ISO-8859-1, which is why it cannot print the international characters.
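To make the mismatch concrete, here is a small sketch (my own illustration, with made-up class and method names) of what happens when UTF-8 bytes are decoded as ISO-8859-1, and of how question marks appear once text is forced into a charset that cannot represent it.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical demo class, not part of any library.
class CharsetMismatchDemo {

    // Decodes bytes with the wrong charset: every UTF-8 byte becomes one Latin-1 character.
    static String decodeAsLatin1(byte[] utf8Bytes) {
        return new String(utf8Bytes, StandardCharsets.ISO_8859_1);
    }

    // Encodes text to ISO-8859-1 and back; unmappable characters are replaced by '?'.
    static String roundTripThroughLatin1(String text) {
        byte[] latin1 = text.getBytes(StandardCharsets.ISO_8859_1);
        return new String(latin1, StandardCharsets.ISO_8859_1);
    }

    public static void main(String[] args) {
        String title = "\u8BA4\u6E05"; // first two characters of the example title
        byte[] utf8 = title.getBytes(StandardCharsets.UTF_8);

        // Two characters turn into six garbage characters (three bytes each).
        System.out.println(decodeAsLatin1(utf8).length());

        // Chinese characters cannot be represented in ISO-8859-1, so this prints "??".
        System.out.println(roundTripThroughLatin1(title));

        // Decoding with the right charset recovers the original text.
        System.out.println(new String(utf8, StandardCharsets.UTF_8).equals(title));
    }
}
```

Once a character has been replaced by '?' the original is gone for good, so the encoding has to be right at every hop: reading the feed, talking to the database, and writing the output.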

How to resolve?
~~~~~~~~~~~~~~~
We need to tell the Java servlet that those characters are UTF-8 and not the default ISO-8859-1 encoding.

How to tell it?
~~~~~~~~~~~~~~~
URL ffeedurl = new URL(feedurl);
HttpURLConnection.setFollowRedirects(true);
URLConnection connection = ffeedurl.openConnection();
HttpURLConnection httpConnection = (HttpURLConnection) connection;

Note the second argument of the InputStreamReader constructor in the line below: it is "UTF-8", which tells the servlet that the characters retrieved from the URL are UTF-8 encoded and not encoded in the default ISO format.

InputStreamReader defaultReader = new InputStreamReader(httpConnection.getInputStream(), "UTF-8");
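For anyone who wants to try the reader part in isolation, here is a self-contained sketch of the same idea. The class Utf8FeedReader and the method readAll are names invented for this example, and a ByteArrayInputStream stands in for httpConnection.getInputStream() so that it runs without a network connection.

```java
import java.io.BufferedReader;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;

// Hypothetical helper class, not part of any library.
class Utf8FeedReader {

    // Reads the whole stream as text, always decoding it as UTF-8.
    static String readAll(InputStream in) {
        StringBuilder text = new StringBuilder();
        try (BufferedReader reader = new BufferedReader(
                new InputStreamReader(in, StandardCharsets.UTF_8))) {
            char[] buffer = new char[4096];
            int count;
            while ((count = reader.read(buffer)) != -1) {
                text.append(buffer, 0, count);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return text.toString();
    }

    public static void main(String[] args) {
        // Stand-in for the bytes a UTF-8 encoded feed would deliver.
        byte[] feedBytes = "\u8BA4\u6E05\u4E16\u754C".getBytes(StandardCharsets.UTF_8);
        String title = readAll(new ByteArrayInputStream(feedBytes));
        System.out.println(title.length()); // four characters, not twelve bytes
    }
}
```

Using StandardCharsets.UTF_8 instead of the string "UTF-8" avoids the checked UnsupportedEncodingException; the behaviour is otherwise the same.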

That's it. In addition, you need to take care of the things below.

1) The MySQL connection URL should be
jdbc:mysql://localhost/databasename?useUnicode=true&characterEncoding=UTF-8
instead of
jdbc:mysql://localhost/databasename

2) In the MySQL database, each table and each text/varchar column should use the utf8 character set with the utf8_general_ci collation.

3) If you are using log4j and want to see the UTF-8 characters in the log messages, add the param below to each appender:
<param name="Encoding" value="UTF-8"/> (I don't know why, but even after setting this I could not see the characters properly in the log file/console.)

4) Important links that discuss this problem and its solution:
http://java.sun.com/developer/technicalArticles/Intl/HTTPCharset/ (to clearly understand what the problem is and how to resolve it)
http://blogs.warwick.ac.uk/kieranshaw/entry/utf-8_internationalisation_with
http://stackoverflow.com/questions/138948/how-to-get-utf-8-working-in-java-webapps
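
Points 1 and 2 above could look roughly like this in code. This is only a sketch under assumptions: the class Utf8Jdbc and its methods are names invented for this example, the host, database, and table names are placeholders, and MySQL Connector/J has to be on the classpath for open() to actually connect.

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.SQLException;

// Hypothetical helper class, not part of any library.
class Utf8Jdbc {

    // Builds a MySQL JDBC URL that asks the driver to exchange text as UTF-8.
    static String buildUrl(String host, String database) {
        return "jdbc:mysql://" + host + "/" + database
                + "?useUnicode=true&characterEncoding=UTF-8";
    }

    // Builds the statement that converts an existing table and its text columns to utf8.
    // The table name must come from trusted code, never from user input.
    static String convertTableSql(String table) {
        return "ALTER TABLE `" + table
                + "` CONVERT TO CHARACTER SET utf8 COLLATE utf8_general_ci";
    }

    // Opens a connection with the UTF-8 URL; needs a running MySQL server.
    static Connection open(String host, String database, String user, String password)
            throws SQLException {
        return DriverManager.getConnection(buildUrl(host, database), user, password);
    }

    public static void main(String[] args) {
        System.out.println(buildUrl("localhost", "databasename"));
        System.out.println(convertTableSql("feed_items")); // placeholder table name
    }
}
```

The ALTER TABLE statement only needs to be run once per table, for example from the mysql command-line client, not on every connection.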

--rama
 
Ranch Hand
Posts: 378
Great! And thanks for posting the solution.
 