UTF-8 problem again, but still a little different

 
Greenhorn
Posts: 3
Hello all,

I'm a newbie in Struts development and I've just run into a strange problem with accented letters.
I read a list of entries from a UTF-8 file and fill a listbox on a JSP page with the parsed entries.
Unfortunately, the accented letters don't appear as they were written in the file (CSV format, parsed before filling).
The response is assembled from HtmlHead.jsp, xxx.jsp (let's say, the body) and HtmlFoot.jsp. HtmlHead.jsp contains
<?xml version="1.0" encoding="utf-8"?>
and in <head>:
<meta http-equiv="Content-Type" content="text/html; charset=utf-8"/>

When I request a copy of the CSV file placed outside /WEB-INF directly in the browser, the accented characters show the same problem. If I turn that copy into HTML by adding the <html>, <head> and <body> tags with the declarations above, the letters are OK.
Any ideas what I missed and what I should do to get proper accented letters in the listbox?
I also have a second problem with accented letters. For internationalisation I keep the texts shown on the page in a property file. The messages contain accented letters as well. In JSP pages I reference them like this, for example:
<s:text name="properties.something123"/>
and it works well, EXCEPT the stuff in <head>:
<head>
....
<meta http-equiv="Content-Type" content="text/html; charset=utf-8"/>
<meta name="description" content="<s:text name="properties.description"/>"/>
<meta name="keywords" content="<s:text name="properties.keywords"/>"/>
<title><s:text name="properties.header"/></title>
....
</head>

Any idea what went wrong?

Thanks in advance, krnl
 
Marshal
Posts: 28193
"No Matter", please check your private messages for an important administrative matter.
 
Misi Nyilas
Greenhorn
Posts: 3
Does nobody have any idea?
 
Ranch Hand
Posts: 37
I can't give you a precise answer, since I don't know all the factors, but here are some things you should check:

- Open your file in an editor as UTF-8 to make sure it really is UTF-8.
- Make sure you are reading the file as UTF-8. If you are using InputStreamReader, specify the encoding explicitly. Don't make any assumptions about what the OS defaults to. If your file is not UTF-8, pass the character set it actually uses to InputStreamReader.
- Make sure that the Content-Type response header is "text/html; charset=utf-8".
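
As a rough illustration of the second point, here is a minimal sketch of reading lines with an explicit charset instead of the platform default. The class name Utf8CsvReader, the method readLines and the sample data are made up for this example, and StandardCharsets needs Java 7 or later (on older versions Charset.forName("UTF-8") does the same job).

```java
import java.io.BufferedReader;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.UncheckedIOException;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of Struts or of the original poster's code.
public class Utf8CsvReader {

    // Reads all lines from the stream, decoding the bytes with the given
    // charset rather than whatever the OS default happens to be.
    public static List<String> readLines(InputStream in, Charset charset) {
        List<String> lines = new ArrayList<>();
        try (BufferedReader reader =
                 new BufferedReader(new InputStreamReader(in, charset))) {
            String line;
            while ((line = reader.readLine()) != null) {
                lines.add(line);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return lines;
    }

    public static void main(String[] args) {
        // Sample CSV content; accented letters are written as Unicode
        // escapes so the example does not depend on the source file encoding.
        byte[] csvBytes = "1;caf\u00e9\n2;na\u00efve\n"
                .getBytes(StandardCharsets.UTF_8);

        List<String> lines =
                readLines(new ByteArrayInputStream(csvBytes), StandardCharsets.UTF_8);

        // Each accented letter is decoded as one character, so the first
        // line "1;cafe-with-accent" has 6 characters.
        System.out.println(lines.size());
        System.out.println(lines.get(0).length());
    }
}
```

In a web application the InputStream would come from the CSV file itself (for example via a FileInputStream); the only point here is that the charset is passed in explicitly.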

It should be noted that Java uses UTF-16 internally. Additionally, if you are using Tomcat, you should add a character-encoding filter to handle POST requests carrying UTF-8 data: http://wiki.apache.org/tomcat/Tomcat/UTF-8
 
Greenhorn
Posts: 21
Hi,
I have faced and solved this problem; it is a common issue in internationalization.
The accented characters (such as French characters) are not shown correctly on the web page. They look like empty boxes or question-mark characters.
The basic problem lies in reading these characters from the properties file: we need to provide the proper encoding. In an internationalization scenario, InputStreamReader must be created with the 'InputStreamReader(java.io.InputStream in, java.nio.charset.Charset cs)' constructor, which specifies the encoding/character set used while reading from the file.
We usually have the perception that UTF, being the default, is the solution, but it is definitely not. For example, for Western European languages with diacritics, like French or Dutch, 'ISO-8859-1' should be used and not UTF.
If the encoding/character set we use to read the resource bundle has no glyph for a particular character it comes across while reading, a distorted character is returned/displayed on screen.
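
To make the properties-file point concrete, here is a small sketch, assuming the messages are loaded through java.util.Properties. Properties.load(InputStream) always decodes the bytes as ISO-8859-1, so a file saved as UTF-8 comes out garbled unless it is loaded through a Reader (available since Java 6) or its accented letters are stored as Unicode escapes. The class and method names are invented for the example; the key properties.header is borrowed from the first post.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.Reader;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.Properties;

// Hypothetical demo class; it only uses java.util.Properties from the JDK.
public class PropertiesEncodingDemo {

    // Loads through an InputStream: the bytes are decoded as ISO-8859-1.
    public static Properties loadFromStream(byte[] content) {
        Properties props = new Properties();
        try {
            props.load(new ByteArrayInputStream(content));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return props;
    }

    // Loads through a Reader that decodes the bytes as UTF-8.
    public static Properties loadWithUtf8Reader(byte[] content) {
        Properties props = new Properties();
        try (Reader reader = new InputStreamReader(
                new ByteArrayInputStream(content), StandardCharsets.UTF_8)) {
            props.load(reader);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return props;
    }

    public static void main(String[] args) {
        // A properties file saved as UTF-8 with one accented letter.
        byte[] utf8File = "properties.header=Caf\u00e9\n"
                .getBytes(StandardCharsets.UTF_8);
        // The same entry with the accented letter stored as a Unicode
        // escape, which is plain ASCII on disk.
        byte[] escapedFile = "properties.header=Caf\\u00e9\n"
                .getBytes(StandardCharsets.ISO_8859_1);

        String expected = "Caf\u00e9";
        System.out.println(expected.equals(
                loadFromStream(utf8File).getProperty("properties.header")));     // false
        System.out.println(expected.equals(
                loadWithUtf8Reader(utf8File).getProperty("properties.header"))); // true
        System.out.println(expected.equals(
                loadFromStream(escapedFile).getProperty("properties.header")));  // true
    }
}
```

Whether a given framework reads its message files through a stream or a reader depends on the framework and Java version, so treat this as a way to test the hypothesis rather than a diagnosis.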
For further details about which encoding to use for which language, have a look at the following link:
Character_encoding
[ December 01, 2008: Message edited by: Arpit Purohit ]
 
André-John Mas
Ranch Hand
Posts: 37
Arpit: UTF-8 can certainly be used for representing European characters, amongst many others. I would go as far as to say that standardising your web site on UTF-8 will solve many problems in the long run, though you do need to understand limitations posed by conversion to systems which are assuming a specific writing script.

To do things correctly you should ensure UTF-8 end to end, since if any part of the process assumes a more restricted character set such as ISO-8859-1 or GB2312, then you risk losing information on the conversion if the character is not supported.

Java uses UTF-16 internally, so no matter what you are doing, some sort of conversion has already taken place when communicating outside of the VM.
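
The information loss described above can be sketched in a few lines. The class and method names are invented for illustration; only String.getBytes(Charset) and new String(byte[], Charset) from the JDK are used.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical demo class showing what happens at the VM boundary.
public class CharsetRoundTrip {

    // Encodes the text to bytes and decodes it again with the same charset.
    public static String roundTrip(String text, Charset charset) {
        return new String(text.getBytes(charset), charset);
    }

    // Encodes as UTF-8 but decodes as ISO-8859-1, which is what happens
    // when one part of the chain assumes the wrong character set.
    public static String decodeUtf8AsLatin1(String text) {
        return new String(text.getBytes(StandardCharsets.UTF_8),
                          StandardCharsets.ISO_8859_1);
    }

    public static void main(String[] args) {
        String eAcute = "\u00e9";        // e with acute accent, exists in ISO-8859-1
        String oDoubleAcute = "\u0151";  // o with double acute, not in ISO-8859-1

        // UTF-8 preserves both characters.
        System.out.println(roundTrip(oDoubleAcute, StandardCharsets.UTF_8)
                .equals(oDoubleAcute));                                  // true
        // ISO-8859-1 keeps the character it supports ...
        System.out.println(roundTrip(eAcute, StandardCharsets.ISO_8859_1)
                .equals(eAcute));                                        // true
        // ... but replaces the unsupported one with '?'.
        System.out.println(roundTrip(oDoubleAcute, StandardCharsets.ISO_8859_1)
                .equals("?"));                                           // true
        // Wrong decoding turns one accented letter into two characters.
        System.out.println(decodeUtf8AsLatin1(eAcute).length());         // 2
    }
}
```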

It should be noted that if your content type is text/html then ISO-8859-1 is what is meant to be the default handling. If you wish to use an alternative character set then you need to specify it. For example:

text/html; charset=UTF-8
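
To show how a receiver might interpret that header value, here is a simplified sketch. It is not a full HTTP header parser, the class and method names are made up, and the ISO-8859-1 fallback simply mirrors the default described above.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.Locale;

// Hypothetical helper for illustration only.
public class ContentTypeCharset {

    // Returns the charset named in a Content-Type value, or ISO-8859-1
    // when no charset parameter is present. Charset.forName throws an
    // unchecked exception if the name is illegal or unsupported.
    public static Charset charsetOf(String contentType) {
        String[] parts = contentType.split(";");
        // parts[0] is the media type; parameters follow.
        for (int i = 1; i < parts.length; i++) {
            String param = parts[i].trim();
            if (param.toLowerCase(Locale.ROOT).startsWith("charset=")) {
                String name = param.substring("charset=".length()).trim();
                // Strip optional surrounding double quotes.
                if (name.length() >= 2 && name.startsWith("\"") && name.endsWith("\"")) {
                    name = name.substring(1, name.length() - 1);
                }
                return Charset.forName(name);
            }
        }
        return StandardCharsets.ISO_8859_1;
    }

    public static void main(String[] args) {
        System.out.println(charsetOf("text/html; charset=UTF-8")); // UTF-8
        System.out.println(charsetOf("text/html"));                // ISO-8859-1
    }
}
```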
 