
UTF8 java + arabic

 
Lucy Sommerman
Ranch Hand
Posts: 61
Hi, I need to get Arabic text into a Java String. I have saved the Arabic as UTF-8 and am wondering about the correct way to get that into a String. Googling gives me lots of suggestions, so I am just wondering which is the correct one.

Thanks

L
 
Max Habibi
town drunk
( and author)
Sheriff
Posts: 4118
How is this data currently stored? Binary in a database?
 
Lucy Sommerman
Ranch Hand
Posts: 61
A text file, saved as UTF-8.
 
Max Habibi
town drunk
( and author)
Sheriff
Posts: 4118
OK, then what you need to be concerned with is how you read that data in. That is, use a ByteBuffer with UTF-8 encoding, as follows:


[ September 15, 2005: Message edited by: Max Habibi ]
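The code from the post above did not survive in this copy of the thread. As a stand-in, here is a minimal sketch of one way to read a UTF-8 text file into a String through a ByteBuffer. It is not Max's original code: the class name Utf8FileReader, its method names, and the temporary file are assumptions made for illustration, and it uses java.nio.file and StandardCharsets, which are newer than this thread.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical helper class, not taken from the original post.
final class Utf8FileReader {

    // Wraps the raw bytes in a ByteBuffer and decodes them as UTF-8.
    static String decodeUtf8(byte[] bytes) {
        return StandardCharsets.UTF_8.decode(ByteBuffer.wrap(bytes)).toString();
    }

    // Reads the whole file into memory, then decodes it.
    static String readFile(Path path) throws IOException {
        return decodeUtf8(Files.readAllBytes(path));
    }

    public static void main(String[] args) throws IOException {
        // Write a small Arabic sample (unicode escapes keep the source ASCII-only)
        // to a temporary file so the sketch can run on its own.
        String sample = "\u0633\u0644\u0627\u0645";
        Path path = Files.createTempFile("arabic", ".txt");
        Files.write(path, sample.getBytes(StandardCharsets.UTF_8));

        String text = readFile(path);
        System.out.println(text.equals(sample)); // prints true
        Files.delete(path);
    }
}
```

The important part is that the charset is named explicitly when the bytes are decoded, so the result does not depend on the platform default encoding.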
 
Lucy Sommerman
Ranch Hand
Posts: 61
Thanks, you are a lifesaver. L
 
Lucy Sommerman
Ranch Hand
Posts: 61
Just to check:

The String itself will be UTF-8, though, and not converted to UTF-16? This is plugging into something else that will not handle UTF-16. Thanks.

L
 
Grahamsmit Smith
Greenhorn
Posts: 2
Hi:

I am having some difficulty with UTF-8 encoded characters in Java.

My XML has a question which contains Cyrillic characters. My Java servlet renders this as HTML with a form for the reply. The HTML produced displays OK in the browser (the response content type on the servlet has to be set to "text/html; charset=UTF-8" for this to work).

I have to send Cyrillic characters back in the response to the question, in a text field on the form. The browser is sending back a byte stream (which I am printing here as hex): d0b3d0bed180d0bed0b4 (this is a Cyrillic word correctly encoded as UTF-8).

However, on collecting the response (using request.getParameterValues(fieldname)), the servlet returns the byte stream: d0b3d0bed13fd0bed0b4. There is a mistake in the sixth byte: 80 has become 3f!

Has anyone heard of this problem? I suspect the problem is in the Java UTF-8 converter.

Regards

Graham
 
Grahamsmit Smith
Greenhorn
Posts: 2
I now know the answer, thanks to Bruno Van Haetsdaele.

Before calling request.getParameterValues(fieldname), one should call request.setCharacterEncoding("UTF-8").

Grahamsmit
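To see why the encoding has to be set before the parameters are read, here is a small sketch that uses only the standard library rather than the servlet API. It assumes the form data arrived percent-encoded and that the container, not having been told otherwise, fell back to the servlet default of ISO-8859-1. The class name FormDecodingDemo is hypothetical, and URLDecoder stands in for the container's own parameter parsing, so this illustrates the general effect and does not reproduce the exact byte corruption reported above.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLDecoder;

// Hypothetical demo class; URLDecoder stands in for the servlet container.
final class FormDecodingDemo {

    // The bytes d0b3d0bed180d0bed0b4 from the post, as a browser would
    // percent-encode them in a submitted form.
    static final String ENCODED = "%D0%B3%D0%BE%D1%80%D0%BE%D0%B4";

    // Decodes the form value using the given charset name.
    static String decode(String charsetName) {
        try {
            return URLDecoder.decode(ENCODED, charsetName);
        } catch (UnsupportedEncodingException e) {
            throw new IllegalArgumentException("Unknown charset: " + charsetName, e);
        }
    }

    public static void main(String[] args) {
        // Decoded as UTF-8, the ten bytes become the five Cyrillic letters.
        System.out.println(decode("UTF-8").length());      // prints 5
        // Decoded as ISO-8859-1, each byte becomes its own (wrong) character.
        System.out.println(decode("ISO-8859-1").length()); // prints 10
    }
}
```

In a servlet, request.setCharacterEncoding("UTF-8") plays the role of the charset argument here, which is why it must be called before the first getParameter or getParameterValues call.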
 
Vlado Zajac
Ranch Hand
Posts: 245
Originally posted by Lucy Sommerman:
Just to check:

The String itself will be UTF-8, though, and not converted to UTF-16? This is plugging into something else that will not handle UTF-16. Thanks.

L


A Java String is a sequence of 16-bit chars (UTF-16 code units), regardless of the encoding the text was read from. You can (and probably need to) convert the String to a byte array or write it to a stream to plug it into "something else". In both cases the character encoding can be specified.
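Both routes mentioned above can be sketched as follows. The class name Utf8Output and its method names are made up for illustration, and the sample word is the Cyrillic one from Graham's post, written with unicode escapes so the source file stays ASCII-only.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStreamWriter;
import java.io.UncheckedIOException;
import java.io.Writer;
import java.nio.charset.StandardCharsets;

// Hypothetical helper class showing both ways of producing UTF-8 bytes.
final class Utf8Output {

    // Route 1: convert the String directly to a UTF-8 byte array.
    static byte[] toUtf8Bytes(String s) {
        return s.getBytes(StandardCharsets.UTF_8);
    }

    // Route 2: write the String to a stream through a Writer that encodes as UTF-8.
    static byte[] writeUtf8(String s) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        try (Writer w = new OutputStreamWriter(out, StandardCharsets.UTF_8)) {
            w.write(s);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return out.toByteArray();
    }

    // Formats bytes as lowercase hex for inspection.
    static String hex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            sb.append(String.format("%02x", b & 0xff));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String word = "\u0433\u043e\u0440\u043e\u0434";
        System.out.println(word.length());          // prints 5 (UTF-16 code units)
        System.out.println(hex(toUtf8Bytes(word))); // prints d0b3d0bed180d0bed0b4
        System.out.println(hex(writeUtf8(word)));   // prints d0b3d0bed180d0bed0b4
    }
}
```

Inside the JVM the text is always UTF-16; UTF-8 only exists at the boundary, in the bytes handed to the other system.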
 