
Some (not all!) UTF8 Characters render as "?" only from UNIX server -- works perfectly local (Win)..

 
Greenhorn
Posts: 4
I'm beating my head against the wall here, so I thought I'd ping you to see if you had some insight..

I didn't see a natural area to post this in, so let me know if I should move it somewhere else. My app is struts based, so....

I've localized my application - it's designed to run as UTF-8. I have bundles for 5 languages -- and the application works perfectly on my local Windows box, but when I run it off the UNIX server, *some* of the UTF-8 characters don't render (they render as ? marks). I'm positive that the character encoding is set correctly, and the browser is correctly detecting the encoding as UTF-8. Again, this works 100% on my local box, but not on the server.

I've been eliminating variables as best I can, and now I'm down to something at the OS level? It also doesn't add up that SOME UTF8 encoded text works (Spanish and French letters, which are encoded as UTF8) -- but Asian languages (Traditional Chinese and Japanese) do not...

Completely at a loss.

It's (probably) not a Tomcat setting, since I'm running the same Tomcat locally and it works... Grrrr... Could the property files be read differently on UNIX vs. Windows? Java .properties files are supposed to be ISO-8859-1 encoded - and when I look at the file on the UNIX file system it looks OK...

Any ideas?

If you have any insight I'd appreciate it. I'm happy to provide access to the page I'm working on, I just didn't want it posted in a public place (yet).

Thanks in advance!

Dan

- edit -

A little more information, in case it helps:

1) My dev environment: Eclipse 3.5 on Windows Vista 64
2) All of the property files use the \uxxxx format to encode the text. I know it's supposed to be an ISO-8859-1 file with these \uxxxx escapes.

A sample of the bundles:

myBundle_en.properties:
address.city=City
address.country=Country
address.state=State

myBundle_es.properties:
address.city=Ciudad
address.country=Pa\u00eds <--- THIS WORKS EVEN ON UNIX. Odd....
address.state=Estado

myBundle_ja.properties:
address.city=\u753a <--- Renders as ?
address.country=\u56fd <--- Renders as ?
address.state=\u72b6\u614b <--- Renders as ??


 
Marshal
Posts: 28193
When something renders as ?, that means that it went through an encoding which couldn't encode it.

Normally there's two steps between a properties file and what you are looking at: (1) read the data into memory from the properties file which as you know uses ISO-8859-1 plus some special handling, and (2) ... well, you didn't say how you are looking at that data.

You seem to have dealt with (1) correctly. So the problem would most likely be with (2) ... whatever it is.
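To make the "encoding which couldn't encode it" point concrete, here is a small sketch that pushes the sample strings from the bundles through ISO-8859-1 and through UTF-8. The class and method names are invented for the example. String.getBytes(Charset) substitutes the charset's replacement byte for anything it cannot map, and for ISO-8859-1 that byte is '?'.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class QuestionMarkDemo {

    // Encodes text to bytes in the given charset and decodes it again,
    // roughly what happens when a server writes text and a client reads it back.
    static String roundTrip(String text, Charset charset) {
        return new String(text.getBytes(charset), charset);
    }

    public static void main(String[] args) {
        String spanish = "Pa\u00eds";       // value of address.country in the es bundle
        String japanese = "\u72b6\u614b";   // value of address.state in the ja bundle

        // ISO-8859-1 can represent the Spanish text, so it survives unchanged.
        System.out.println(roundTrip(spanish, StandardCharsets.ISO_8859_1).equals(spanish)); // true

        // The Japanese characters have no ISO-8859-1 mapping; each becomes '?'.
        System.out.println(roundTrip(japanese, StandardCharsets.ISO_8859_1)); // ??

        // UTF-8 can represent both, so nothing is lost.
        System.out.println(roundTrip(japanese, StandardCharsets.UTF_8).equals(japanese)); // true
    }
}
```

This matches the symptom in the first post: Spanish and French survive because their accented letters exist in ISO-8859-1, while every Japanese or Chinese character turns into one question mark.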
 
Dan Cane
Greenhorn
Posts: 4

Paul Clapham wrote:When something renders as ?, that means that it went through an encoding which couldn't encode it.

Normally there's two steps between a properties file and what you are looking at: (1) read the data into memory from the properties file which as you know uses ISO-8859-1 plus some special handling, and (2) ... well, you didn't say how you are looking at that data.

You seem to have dealt with (1) correctly. So the problem would most likely be with (2) ... whatever it is.



Paul,

Ahh - you might need that detail :)

I'm looking at it in a browser. I've checked IE8 and FF - and the page encoding is being picked up correctly as UTF-8. I'm super stumped, as the "lower" UTF characters work (e.g. a Spanish "ñ"), but anything "higher" doesn't.... The fact that it works when served from Windows, but not from UNIX, is a clue - but a clue to what, I have no idea...

The URL to take a peek at this is pbd.modernizingderm.com/xmd/ (just plain ol' http, I just didn't want spiders crawling it yet)

As you can see, latin chars OK - UTF8 = check, but asian chars are a no-go... :(

Dan
 
Paul Clapham
Marshal
Posts: 28193
And presumably there's some code which writes that data to the output stream of a request? That's where I would be looking. You might also find some enlightenment from reading Character Conversions from Browser to Database, even though there isn't a database involved and the data is going to the browser, not from it.
 
Paul Clapham
Marshal
Posts: 28193

Dan Cane wrote:The URL to take a peek at this is pbd.modernizingderm.com/xmd/


And when I look at the headers sent to Firefox, I see things like this:

Content-Type: text/html;charset=ISO-8859-1


This was when I clicked on the "French" link at the bottom. When I click on the "Japanese" link I see this:

Content-Type: text/html


So there's more to this content-type business than you thought. You might try to find out why this happens.
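For anyone wanting to check this mechanically, the sketch below pulls the charset parameter out of a Content-Type header value. The class name ContentTypeCharset and the method charsetOf are hypothetical, and the parsing is deliberately simple (it is not a full HTTP header parser). The sample values are the two headers quoted above.

```java
import java.util.Locale;
import java.util.Optional;

public class ContentTypeCharset {

    // Returns the charset parameter of a Content-Type header value, if one is present.
    static Optional<String> charsetOf(String contentType) {
        String[] parts = contentType.split(";");
        // parts[0] is the media type itself; parameters start at index 1.
        for (int i = 1; i < parts.length; i++) {
            String param = parts[i].trim();
            if (param.toLowerCase(Locale.ROOT).startsWith("charset=")) {
                String value = param.substring("charset=".length()).trim();
                // Strip optional surrounding double quotes.
                if (value.length() >= 2 && value.startsWith("\"") && value.endsWith("\"")) {
                    value = value.substring(1, value.length() - 1);
                }
                return value.isEmpty() ? Optional.empty() : Optional.of(value);
            }
        }
        return Optional.empty();
    }

    public static void main(String[] args) {
        System.out.println(charsetOf("text/html;charset=ISO-8859-1")); // Optional[ISO-8859-1]
        System.out.println(charsetOf("text/html"));                    // Optional.empty
        System.out.println(charsetOf("text/html; charset=utf-8"));     // Optional[utf-8]
    }
}
```

An empty result means the response does not declare a charset in its header at all, so the browser has to fall back on something else, such as a meta tag in the page or its own guess.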
 
Dan Cane
Greenhorn
Posts: 4

Paul Clapham wrote:
And when I look at the headers sent to Firefox, I see things like this:

Content-Type: text/html;charset=ISO-8859-1




Where are you able to see that? I'm looking at Firebug's view of the request to the Struts action, and here is what I have:

Response Headers
Date Tue, 24 Nov 2009 13:37:23 GMT
Content-Type text/html
Content-Language zh
Vary Accept-Encoding
Content-Encoding gzip
Content-Length 2183
Keep-Alive timeout=15, max=100
Connection Keep-Alive

Request Headers
Host pbd.modernizingderm.com
User-Agent Mozilla/5.0 (Windows; U; Windows NT 6.0; en-US; rv:1.9.1.5) Gecko/20091102 Firefox/3.5.5 (.NET CLR 3.5.30729)
Accept text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8
Accept-Language en-us,en;q=0.5
Accept-Encoding gzip,deflate
Accept-Charset ISO-8859-1,utf-8;q=0.7,*;q=0.7
Keep-Alive 300
Connection keep-alive
Referer http://pbd.modernizingderm.com/xmd/app/FirmHome.action?locale=en
Cookie JSESSIONID=028CC2436C8702324EA9E90DE70B8588
Cache-Control max-age=0

I see the "Accept-Charset" request header listing both ISO-8859-1 and utf-8 (what my local browser can support), but the response Content-Type only has text/html... I think you've found the issue, but I'm trying to figure out how to reproduce that (so I can fix it :) )

FWIW: All pages have the meta-tag: <meta http-equiv="Content-Type" content="text/html; charset=utf-8" />
and my taglib for the "header" (used on all pages) contains <%@ page contentType="text/html; charset=UTF-8" language="java" %>

Thanks!
 
Dan Cane
Greenhorn
Posts: 4
Solved!

PEBKAC

(Problem Exists Between Keyboard and Chair)

Well, sorta.. The server I was publishing on had #%@ SET JAVA_OPTS -Djavax.servlet.request.encoding="ISO-8859-1" in the shell script, changing the default to ISO-8859-1 in places where I didn't "force" UTF-8.... I looked everywhere in the environment, but didn't think to look at the Tomcat startup script...
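A cheap way to spot this kind of override is to dump the JVM's default charset and every system property with "encoding" in its name, since anything passed with -D in JAVA_OPTS shows up as a system property. This is a sketch, not something from the thread: EncodingSettings and encodingProperties are hypothetical names, and the code makes no claim about which properties a given container actually honors.

```java
import java.nio.charset.Charset;
import java.util.Locale;
import java.util.Map;
import java.util.Properties;
import java.util.TreeMap;

public class EncodingSettings {

    // Collects every property whose name mentions "encoding", sorted by name.
    static Map<String, String> encodingProperties(Properties props) {
        Map<String, String> found = new TreeMap<>();
        for (String name : props.stringPropertyNames()) {
            if (name.toLowerCase(Locale.ROOT).contains("encoding")) {
                found.put(name, props.getProperty(name));
            }
        }
        return found;
    }

    public static void main(String[] args) {
        // The charset the JVM uses when code does not name one explicitly.
        System.out.println("Default charset: " + Charset.defaultCharset());

        // Any -D...encoding... option from the startup script is listed here.
        encodingProperties(System.getProperties())
                .forEach((name, value) -> System.out.println(name + " = " + value));
    }
}
```

Run once on the Windows box and once on the UNIX server (for example from a scratch JSP or a small main class started with the same JAVA_OPTS); any difference between the two outputs is a good place to start looking.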

Problem solved!!!

The site looks cool in Chinese

Thanks for all of your help!

Dan
 