UTF8 Support in Ruby on Rails but not Java?

 
Liz Reynolds
Greenhorn
Posts: 3
I developed an application that displays a listing of Japanese verbs, originally using Ruby on Rails scaffolding. Since I have Japanese fonts installed and my OS regional options include Japanese keyboard support, I have no problem there (I'm interacting with a MySQL DB and just had to set the encoding in the db config section to UTF8).

However, I'm trying to develop the same application in Swing using an XML document rather than a db as my model. I have no problem listing the verbs if I use an InputSource or an InputStreamReader with the encoding set to UTF8, but when I try to enter a new verb using a form embedded in an HTML document (all in Swing), the form returns garbage. I don't think Swing supports the "accept-charset" attribute of the form element, but still, I thought the underlying charset encoding of Java was UTF8 or UTF16. The form fields are returned as strings. Do I need to convert those to bytes, or is it just not possible to receive these fields as the Japanese characters that were entered (kanji and kana) with the default locale (US) that I'm using? Still, it appears to work under Ruby.
 
Brian Cole
Author
Ranch Hand
Posts: 954
Originally posted by Liz Reynolds:
However, I'm trying to develop the same application in Swing using an XML document rather than a db as my model. I have no problem listing the verbs if I use an InputSource or an InputStreamReader with the encoding set to UTF8, but when I try to enter a new verb using a form embedded in an HTML document (all in Swing), the form returns garbage. I don't think Swing supports the "accept-charset" attribute of the form element, but still, I thought the underlying charset encoding of Java was UTF8 or UTF16.


If the form is embedded in an HTML document, then how can it be "all in Swing"? Please explain what exactly you are doing.

I guess you could say that the "underlying charset encoding of Java" is essentially UTF16, in that java.lang.String uses an array of 16-bit characters, but presumably you are doing some kind of I/O to grab your Japanese characters and something is going wrong. Java shouldn't have trouble with UTF8 but you might have to tell it to use UTF8, depending on what exactly you are doing.
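To illustrate what "telling it to use UTF8" can look like, here is a minimal sketch rather than anything taken from the application being discussed. The class name DecodeDemo, the helper decode, and the sample verb are all hypothetical. It decodes the same UTF-8 bytes twice through an InputStreamReader, once with the matching charset and once with a mismatched one.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.Reader;
import java.io.UncheckedIOException;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical demo class; not part of the original poster's application.
class DecodeDemo {

    // Decodes raw bytes into a String using the named charset.
    static String decode(byte[] bytes, String charsetName) {
        Charset charset = Charset.forName(charsetName);
        try (Reader reader = new InputStreamReader(new ByteArrayInputStream(bytes), charset)) {
            StringBuilder sb = new StringBuilder();
            int c;
            while ((c = reader.read()) != -1) {
                sb.append((char) c);
            }
            return sb.toString();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        // Sample verb (three characters), written with Unicode escapes so the
        // source file's own encoding does not matter.
        String verb = "\u98DF\u3079\u308B";
        byte[] utf8 = verb.getBytes(StandardCharsets.UTF_8);

        // Matching charset: the text round-trips intact.
        System.out.println(decode(utf8, "UTF-8").equals(verb)); // prints true

        // Mismatched charset: each of the 9 bytes becomes its own character.
        System.out.println(decode(utf8, "ISO-8859-1").length()); // prints 9
    }
}
```

The String itself is never the problem here; the damage happens at the boundary where bytes are converted to characters (or back) with the wrong charset.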

I'm not sure what to say about "accept-charset". That sounds like a web thing, pretty much orthogonal to Swing.

[btw, please delete your duplicate posting.]
 
Liz Reynolds
Greenhorn
Posts: 3
Originally posted by Brian Cole:
If the form is embedded in an HTML document, then how can it be "all in Swing"? Please explain what exactly you are doing.


Swing supports HTML documents (and style sheets) to a limited extent. You could say my GUI operates somewhat like a web application, but it's Swing. I'm using a subclassed version of JEditorPane (HTMLPane) and a subclass of HTMLEditorKit (HTMLPaneEditorKit) to get a form view that is similarly subclassed to process submissions locally for my pane. I tried various ways of setting the content type for HTMLPane (JEditorPane), such as setContentType("text/html; charset=EUC-JP"), setContentType("text/html charset=\"EUC-JP\""), setContentType("text/html; charset=UTF-8"), etc., but none of them appears to work. Whatever kanji or kana I enter into the value field on the form with the IME 2000 Japanese keyboard comes back as the same garbage characters (i.e. %3F%3F%3F).

I hope this explanation is a little clearer. Thanks in advance for any help you can provide.

BTW, I did not make a duplicate post. My connection was slow so I hit post twice, but the 2nd one was stopped with an error message.
[ June 13, 2007: Message edited by: Liz Reynolds ]
 
Brian Cole
Author
Ranch Hand
Posts: 954
Originally posted by Liz Reynolds:
I'm using a subclassed version of JEditorPane (HTMLPane) and a subclass of HTMLEditorKit (HTMLPaneEditorKit) to get a form view that is similarly subclassed to process submissions locally for my pane. I tried various ways of setting the content type for HTMLPane (JEditorPane), such as setContentType("text/html; charset=EUC-JP"), setContentType("text/html charset=\"EUC-JP\""), setContentType("text/html; charset=UTF-8"), etc., but none of them appears to work.


OK, but it's still unclear exactly how you are getting your EUC-JP or UTF-8 (which is it?) text into your JEditorPane/HTMLPane. Probably best to just show us your code.

In the absence of code, it seems to me it should work if you try something like this:

// Decode the stream explicitly as EUC-JP rather than relying on the platform default.
InputStreamReader reader = new InputStreamReader(yourInputStream, "EUC-JP");
yourEditorPane.read(reader, "description");

Tell me, is it only the text in the field that gets munged? If you can see kanji and kana text in non-form HTML, just not in the form fields, then I guess something trickier is going on.
 
Liz Reynolds
Greenhorn
Posts: 3
Thanks Brian, for responding. I think I figured out my problem. (I also didn't make it clear that the data is coming in from the form and then stored in the XML document to update the "model." I have no problem just displaying data because I set the encoding on the InputSource to UTF-8.)

The problem appears to be that FormView (in its appendBuffer method) is using the deprecated single-argument URLEncoder.encode(String) method, which falls back to the platform's default encoding. However, appendBuffer is private, so I can't override it. Ow!
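A minimal sketch of why the encoding choice would produce the %3F%3F%3F output described earlier in the thread. It does not reproduce FormView itself; the class name FormEncodingDemo, the helper encode, and the sample verb are hypothetical, and ISO-8859-1 merely stands in for a platform default that cannot represent Japanese text. Only the two-argument URLEncoder.encode(String, String) is used, so the charset is always explicit.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;

// Hypothetical demo class; it only shows how the charset affects URL encoding.
class FormEncodingDemo {

    // URL-encodes a form value using an explicitly named charset.
    static String encode(String value, String charsetName) {
        try {
            return URLEncoder.encode(value, charsetName);
        } catch (UnsupportedEncodingException e) {
            throw new IllegalArgumentException("Unknown charset: " + charsetName, e);
        }
    }

    public static void main(String[] args) {
        // Sample verb (three characters), written with Unicode escapes so the
        // source file's own encoding does not matter.
        String verb = "\u98DF\u3079\u308B";

        // UTF-8 can represent every character, so each one becomes three escaped bytes.
        System.out.println(encode(verb, "UTF-8")); // prints %E9%A3%9F%E3%81%B9%E3%82%8B

        // ISO-8859-1 has no kanji or kana, so each character is replaced by '?'
        // (byte 0x3F) before escaping.
        System.out.println(encode(verb, "ISO-8859-1")); // prints %3F%3F%3F
    }
}
```

Once the characters have been replaced by '?' they cannot be recovered on the receiving side, so the charset has to be right at the point where the form data is encoded.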
 