Internationalization (specifically with Chinese characters)

 
Greenhorn
Posts: 16
I have a weird problem I can't quite figure out. I am also not even sure if I'm posting in the correct forum, so please move my post if necessary.

Basically, I am trying to get an HTML form POST to work with a Chinese character.

I have a form that is multipart/form-data encoded:



The form has a single input in which I am trying to send the Chinese character 我. For some reason, I am not getting the right character back when I stop at a breakpoint and inspect my command object in Eclipse.

I've tried a few things:

a) I don't explicitly set a page directive. When I submit, I get this back as a string: &#25105;

This strikes me as incorrect, because I expect a single Unicode character along the lines of '\uxxxx'.

b) I set the content type via a page directive:



This gets me a little closer; when I stop at the breakpoint, I see three characters: ���

I understand that in UTF-8 a character can take a variable number of bytes (1 to 4), so I'm not surprised if 我 needs a multi-byte encoding. However, I would expect to get one character back, not three.
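Both results can be checked in plain Java. This sketch rests on two assumptions the thread does not confirm: that result (a) was the HTML numeric character reference &#25105;, which browsers typically submit when the page's charset cannot represent a character, and that the page in (b) was UTF-8. `decodeDecimalRef` and `utf8Hex` are hypothetical helpers written for this illustration, not servlet or JSP APIs.

```java
import java.nio.charset.StandardCharsets;

public class CharCheck {
    // Hypothetical helper: turns a decimal HTML numeric character reference
    // such as "&#25105;" into the character it names. Hex forms are not handled.
    static String decodeDecimalRef(String ref) {
        if (!ref.startsWith("&#") || !ref.endsWith(";")) {
            throw new IllegalArgumentException("not a decimal reference: " + ref);
        }
        int codePoint = Integer.parseInt(ref.substring(2, ref.length() - 1));
        return new String(Character.toChars(codePoint));
    }

    // Hypothetical helper: the UTF-8 bytes of s as space-separated uppercase hex.
    static String utf8Hex(String s) {
        StringBuilder sb = new StringBuilder();
        for (byte b : s.getBytes(StandardCharsets.UTF_8)) {
            if (sb.length() > 0) {
                sb.append(' ');
            }
            sb.append(String.format("%02X", b & 0xFF));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String wo = "\u6211"; // 我

        // Case (a): the reference names the same code point, since 25105 == 0x6211.
        System.out.println(decodeDecimalRef("&#25105;").equals(wo)); // true

        // Case (b): one Java char, but three bytes once encoded as UTF-8.
        System.out.println(wo.length()); // 1
        System.out.println(utf8Hex(wo)); // E6 88 91
    }
}
```

So 我 is a single character on the Java side, but three bytes on the wire in UTF-8. Any decoder that maps each of those bytes to its own character will produce three characters.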

Does anybody know how to resolve this issue?
 
Greenhorn
Posts: 7
^_^ You should use this:
<%@ page contentType="text/html; charset=gb2312" %>

PS: Are you Chinese? hehe.
 
Min Huang
Greenhorn
Posts: 16
Setting the charset to gb2312 doesn't work; I don't see the character I want. I've crafted a JSP that illustrates my problem:



Specifically, I have:
1) <%@page pageEncoding="UTF-8"%> set.
2) <%@page contentType="text/html;charset=UTF-8"%> set after the first directive.
3) <meta http-equiv="Content-Type" content="text/html;charset=UTF-8"/> in the head.
4) enctype="multipart/form-data" attribute in the form.
5) accept-charset="UTF-8" attribute in the form.

The results I see are:
For a GET: �ˆ‘ is the result.
For a POST with enctype="application/x-www-form-urlencoded": �ˆ‘ is the result.
For a POST with any other encoding (e.g. multipart/form-data): there is no entry in the request parameter map.

Btw, yes I am Chinese.
 
Min Huang
Greenhorn
Posts: 16
I figured it out. You have to put <% request.setCharacterEncoding("UTF-8"); %> at the top of the JSP. The character encoding has to be set before any parameters are read, or else it won't work.
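Why the encoding matters can be illustrated without a container. A URL-encoded form field carrying 我 from a UTF-8 page arrives as %E6%88%91, and the container turns those bytes into a String using the request's character encoding; per the `ServletRequest.setCharacterEncoding` documentation, the call has no effect once parameters have been read. This sketch assumes the container falls back to ISO-8859-1 when no encoding is set (the Servlet specification's default) and uses `java.net.URLDecoder` only as a stand-in for the container's own parameter parsing. `FormDecoding` and `decodeAs` are hypothetical names.

```java
import java.net.URLDecoder;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class FormDecoding {
    // 我 as it appears in a URL-encoded form field submitted from a UTF-8 page.
    static final String ENCODED_VALUE = "%E6%88%91";

    // Percent-decodes the value with the given charset
    // (this Charset overload of URLDecoder.decode needs Java 10+).
    static String decodeAs(String encoded, Charset charset) {
        return URLDecoder.decode(encoded, charset);
    }

    public static void main(String[] args) {
        // Assumed container default when no request encoding is set:
        // ISO-8859-1 maps each byte to one char, giving three chars.
        String latin1 = decodeAs(ENCODED_VALUE, StandardCharsets.ISO_8859_1);
        System.out.println(latin1.length()); // 3

        // What setCharacterEncoding("UTF-8") asks the container to use instead:
        // the three bytes decode to the single character 我.
        String utf8 = decodeAs(ENCODED_VALUE, StandardCharsets.UTF_8);
        System.out.println(utf8.equals("\u6211")); // true
    }
}
```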

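Having a scriptlet in your JSP is not good practice, so you can write a servlet filter that does the same thing. Make sure it's the first filter in the chain, or it might not work. Order matters: if an earlier filter reads a parameter, the encoding is already fixed by the time yours runs.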

You can use Spring's CharacterEncodingFilter, or write your own:



You can replace the page directives with this in your web.xml:
 
Ranch Hand
Posts: 30
Hi,
Use the character encoding 'Unicode' instead of 'UTF-8', and your problem should be solved.
 