I'm working on a JSP-based webapp, but I'm having trouble understanding encodings. I already tried asking in the JSP forum and never got any bites, so here I am!
I made a simple WAR file (similar to a JAR, but designed to be dropped into a web server and run as-is) that demonstrates output based on a user's preferred language. The problem is that it doesn't always work, and the web hasn't been very helpful so far.
For example, if you set your browser language to Japanese and go to Google, the page encoding will be UTF-8 and, assuming you have the right fonts installed, you will see Google in Japanese. Other Japanese sites often declare their encoding as Shift_JIS. Both work. I've also seen a page whose encoding is set to UTF-8 but which contains text in multiple languages.
I understand how to use the JSTL i18n tags, but I'm still missing something. I also converted the properties files with native2ascii, but that didn't work either.
How does a browser, or any other app, really 'know' which encoding to use? If I take a Japanese text file and encode it as UTF-8, then do the same with a French file, how does an app know that one file is French and not Japanese?
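A small Java sketch of the point behind this question: nothing inside a run of bytes says which charset produced it, so a reader has to be told (for web pages, typically through the charset parameter of the HTTP Content-Type header or a meta tag) or has to guess. The encoding also says nothing about language; UTF-8 can hold French and Japanese in the same string. The class and method names are illustrative only.

```java
import java.nio.charset.StandardCharsets;

// Sketch: the same bytes decode differently depending on which charset the
// reader is told to use. Nothing in the bytes themselves names the charset.
class DecodeDemo {
    // French and Japanese in one string ("cafe" with an accent, then three kanji).
    // Unicode escapes keep this source file pure ASCII.
    static final String MIXED = "caf\u00e9 \u65e5\u672c\u8a9e";

    static byte[] utf8Bytes() {
        return MIXED.getBytes(StandardCharsets.UTF_8);
    }

    // Correct: decode with the charset the bytes were actually written in.
    static String decodeAsUtf8(byte[] bytes) {
        return new String(bytes, StandardCharsets.UTF_8);
    }

    // Wrong: decode the same bytes as ISO-8859-1, which maps each byte to one
    // Latin-1 character and so produces mojibake.
    static String decodeAsLatin1(byte[] bytes) {
        return new String(bytes, StandardCharsets.ISO_8859_1);
    }

    public static void main(String[] args) {
        byte[] bytes = utf8Bytes();
        System.out.println(bytes.length + " bytes");
        System.out.println("as UTF-8:      " + decodeAsUtf8(bytes));
        System.out.println("as ISO-8859-1: " + decodeAsLatin1(bytes));
    }
}
```

Decoded as UTF-8 the text comes back intact; decoded as ISO-8859-1 the same 15 bytes turn into 15 unrelated Latin-1 characters, which is the kind of garbage you see when a page's declared charset is wrong.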
I can take a text file saved in Japanese and drop it onto Netscape (with the proper fonts installed, of course) and it displays perfectly. But how did Netscape 'figure' this out, especially if the file was saved in the native encoding? In a regular text editor the special characters show up as binary garbage, which makes sense to me.
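When no charset is declared, a program can only guess. One crude heuristic, shown here as a minimal sketch and not as what Netscape actually did, is to try a strict UTF-8 decode and fall back to a caller-supplied charset (Shift_JIS in the example) if the bytes are not valid UTF-8. Real browsers use more elaborate detectors; `guessDecode` is a hypothetical helper.

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.Charset;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

// Sketch of a very simple charset-guessing heuristic (hypothetical helper).
class GuessDecode {
    static String guessDecode(byte[] bytes, Charset fallback) {
        // A strict decoder throws instead of silently inserting replacement chars.
        CharsetDecoder strictUtf8 = StandardCharsets.UTF_8.newDecoder()
                .onMalformedInput(CodingErrorAction.REPORT)
                .onUnmappableCharacter(CodingErrorAction.REPORT);
        try {
            return strictUtf8.decode(ByteBuffer.wrap(bytes)).toString();
        } catch (CharacterCodingException e) {
            // Not valid UTF-8, so assume the fallback charset instead.
            return new String(bytes, fallback);
        }
    }

    public static void main(String[] args) {
        Charset sjis = Charset.forName("Shift_JIS");
        String japanese = "\u65e5\u672c\u8a9e"; // three kanji, escaped to keep the source ASCII
        System.out.println(guessDecode(japanese.getBytes(StandardCharsets.UTF_8), sjis));
        System.out.println(guessDecode(japanese.getBytes(sjis), sjis));
    }
}
```

This works only because Shift_JIS text usually contains byte sequences that are illegal in UTF-8; it cannot tell apart encodings whose byte sequences are valid in both (pure ASCII, for example).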
Wow, you're missing a lot here. HTTP headers, for one, which include things such as "Accept-Charset" and "Accept-Language". These are set up by the user in the browser's configuration. JSTL's i18n tags use them (the Accept-Language header specifically) to determine which ResourceBundle to load. You mark the default bundle by NOT appending a language code to its name, and you provide one resource bundle for each supported language.
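Roughly how a base name turns into the bundle names that get tried for a locale, sketched with the JDK's own `ResourceBundle.Control`. `messages` is a hypothetical base name; in the JSP case the locale would come from the request's Accept-Language header. The last name in each list is the unsuffixed default bundle. `ResourceBundle.getBundle` may also try the JVM's default locale before settling on that default.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.ResourceBundle;

// Sketch: list the bundle names the standard lookup considers, most specific first.
class BundleCandidates {
    static List<String> candidateBundleNames(String baseName, Locale locale) {
        ResourceBundle.Control control =
                ResourceBundle.Control.getControl(ResourceBundle.Control.FORMAT_DEFAULT);
        List<String> names = new ArrayList<>();
        for (Locale candidate : control.getCandidateLocales(baseName, locale)) {
            // Locale.ROOT maps to the bare base name, i.e. the default bundle.
            names.add(control.toBundleName(baseName, candidate));
        }
        return names;
    }

    public static void main(String[] args) {
        System.out.println(candidateBundleNames("messages", Locale.JAPAN));
        // prints [messages_ja_JP, messages_ja, messages]
        System.out.println(candidateBundleNames("messages", Locale.FRENCH));
        // prints [messages_fr, messages]
    }
}
```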
Changing the charset takes a little extra work and is often overlooked. Since the charset is just a string in the app, put the encoding name in the resource bundle and use JSTL to set the page encoding at the top of the page.
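A sketch of keeping the charset name in each language's bundle, written against plain `java.util.Properties` so it runs outside a servlet container. The key `page.charset` and both bundle contents are hypothetical. In a servlet or JSP, the `contentType` string would go to `response.setContentType(...)`; the point is that the declared charset and the charset actually used to encode the output come from the same value.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.nio.charset.Charset;
import java.util.Properties;

// Sketch: one bundle value drives both the declared charset and the real encoding.
class CharsetFromBundle {
    static final String CHARSET_KEY = "page.charset"; // hypothetical key name
    static final String DEFAULT_CHARSET = "UTF-8";

    // Hypothetical Japanese bundle; the title is escaped so the text stays ASCII.
    static final String JA_BUNDLE = "page.charset=Shift_JIS\ntitle=\\u65e5\\u672c\\u8a9e\n";
    // Hypothetical English bundle with no charset entry, so the default applies.
    static final String EN_BUNDLE = "title=Japanese\n";

    // Parse text in .properties syntax.
    static Properties load(String propertiesText) {
        Properties props = new Properties();
        try {
            props.load(new StringReader(propertiesText));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return props;
    }

    static String charsetName(Properties bundle) {
        return bundle.getProperty(CHARSET_KEY, DEFAULT_CHARSET);
    }

    // Value suitable for a Content-Type header.
    static String contentType(Properties bundle) {
        return "text/html; charset=" + charsetName(bundle);
    }

    // Encode the body with the SAME charset the header declares.
    static byte[] encodeBody(Properties bundle, String body) {
        return body.getBytes(Charset.forName(charsetName(bundle)));
    }

    public static void main(String[] args) {
        for (String text : new String[] {JA_BUNDLE, EN_BUNDLE}) {
            Properties bundle = load(text);
            byte[] body = encodeBody(bundle, bundle.getProperty("title"));
            System.out.println(contentType(bundle) + " -> " + body.length + " bytes");
        }
    }
}
```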
Creating a large bundle for a language like Japanese really requires some finesse. Each non-ASCII character has to be written as a \uXXXX escape (one escape per character for almost all Japanese text; characters outside the Basic Multilingual Plane need two), so you really need to know what you're doing, and the file quickly becomes hard to read.
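A minimal sketch of that escaping, assuming the goal is text that `java.util.Properties.load` reads back unchanged. It mimics the kind of output native2ascii produces but is not its implementation. It escapes each UTF-16 code unit, which is why a character outside the BMP comes out as two escapes, and it does not handle characters that are special in properties syntax (a literal backslash, for instance).

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.Properties;

// Sketch: every non-ASCII UTF-16 code unit becomes a backslash-u escape
// with four lowercase hex digits.
class UnicodeEscaper {
    static String escape(String text) {
        StringBuilder out = new StringBuilder();
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            if (c < 0x80) {
                out.append(c);                                 // plain ASCII passes through
            } else {
                out.append(String.format("\\u%04x", (int) c)); // one escape per code unit
            }
        }
        return out.toString();
    }

    // Read one "key=value" line the way a properties file is read,
    // which turns the escapes back into characters.
    static String loadValue(String key, String escapedValue) {
        Properties props = new Properties();
        try {
            props.load(new StringReader(key + "=" + escapedValue));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return props.getProperty(key);
    }

    public static void main(String[] args) {
        String japanese = "\u65e5\u672c\u8a9e"; // three kanji
        String escaped = escape(japanese);
        System.out.println("title=" + escaped);
        System.out.println("round trip ok: " + loadValue("title", escaped).equals(japanese));
    }
}
```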
Thanks for the reply! Actually, I was aware of the Accept headers; that is how the JSP was determining which language file to use.
I talked with someone on another forum, and I may have the solution now. I have a bunch of properties files saved in native encodings (Japanese, Russian, English, etc.), and I was running native2ascii with UTF8 as the encoding parameter. Apparently I needed to specify on the command line the encoding the file was actually in, rather than the encoding I wanted it to be.
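A Java sketch of why that parameter matters: native2ascii's `-encoding` option names the encoding of the input file, because the bytes have to be decoded correctly before they can be escaped. `toAscii` is a hypothetical helper, and the Shift_JIS bytes stand in for a properties file saved in that encoding.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Sketch of a native2ascii-style conversion (hypothetical helper, not the real tool).
class Native2AsciiSketch {
    // Decode the raw file bytes using the encoding the file was SAVED in,
    // then escape everything outside ASCII.
    static String toAscii(byte[] fileBytes, Charset sourceEncoding) {
        String text = new String(fileBytes, sourceEncoding);
        StringBuilder out = new StringBuilder();
        for (char c : text.toCharArray()) {
            if (c < 0x80) {
                out.append(c);
            } else {
                out.append(String.format("\\u%04x", (int) c));
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        Charset sjis = Charset.forName("Shift_JIS");
        // Stand-in for a one-line properties file saved in Shift_JIS.
        byte[] fileBytes = "title=\u65e5\u672c\u8a9e\n".getBytes(sjis);
        System.out.print("right: " + toAscii(fileBytes, sjis));
        System.out.print("wrong: " + toAscii(fileBytes, StandardCharsets.UTF_8));
    }
}
```

Decoded with the right charset, the three kanji come out as three clean escapes; decoded as UTF-8, the malformed bytes become U+FFFD replacement characters, which is the kind of garbage that ends up in the page when the wrong encoding is passed.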
I haven't had a chance to try it out yet, but I'm hopeful that was the issue.
Is there any other advice you have?
Thanks for pointing out the editor as well. I'll give it a try.