
UTF-8 representation - \u00fa (ú) to Ãº

 
sri ranayama
Greenhorn
Posts: 3
Hi,

I'd like to obtain a String containing "mÃº" (the UTF-8 "representation") out of a String containing "mú" (UTF-16 encoded). I'm new to these encoding issues.
How should I proceed?

Thanks a lot,

Sri
[ March 02, 2007: Message edited by: sri ranayama ]
 
Alan Moore
Ranch Hand
Posts: 262
Java strings always use the same encoding; you can't change it. Encodings only come into play when you have to convert strings to bytes or vice versa, and the only situation I know of where you have to do such a conversion within your program is when you're encrypting or decrypting text. You need to describe what you're trying to accomplish before we can help you any further.
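To illustrate the point about encodings only mattering at the String/byte boundary, here is a minimal sketch (not from the original post). The class and method names are made up for illustration, and it assumes Java 7 or later for `StandardCharsets`. The same String yields a different number of bytes depending on the charset used to encode it, while the String itself never changes.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical helper class, for illustration only.
class EncodingLengths {
    // Number of bytes produced when the String is encoded with the given charset.
    static int encodedLength(String s, Charset charset) {
        return s.getBytes(charset).length;
    }

    public static void main(String[] args) {
        String text = "m\u00fa"; // "mú": two characters in the String
        System.out.println(encodedLength(text, StandardCharsets.UTF_8));      // ú takes two bytes in UTF-8
        System.out.println(encodedLength(text, StandardCharsets.ISO_8859_1)); // one byte per character
        System.out.println(encodedLength(text, StandardCharsets.UTF_16BE));   // two bytes per character here
    }
}
```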
 
sri ranayama
Greenhorn
Posts: 3
Hi Alan and thanks for your reply.

I hope this will make things clearer but it's not even clear to me.
During a DB import, the UTF-8 content of the table records was automatically transcoded to the default DB encoding (Latin-1). This gives us "xxÃº" instead of "xxú", for example.
I'm trying to do the same kind of "transcoding" (not sure this is the correct term) in Java, i.e. to get "xxÃº" out of "xxú", so I can query those tables with Strings I loaded from the file system (UTF-8 is assumed). In the file: "xxú"; in the WHERE clause of my query Strings: "xxÃº".

I found a workaround (reasonable solution?) but I'm definitely not sure this is the way to go:



Any corrections appreciated
[ March 03, 2007: Message edited by: sri ranayama ]
 
Alan Moore
Ranch Hand
Posts: 262
If you really need to read a UTF-8 file as if it were Latin1, you can do that in one step by decoding the file's bytes as Latin1 when you read it.

But this is a very bad idea; there's no guarantee that every byte in a UTF-8 encoded file will be valid in Latin1. Any byte that doesn't represent a character in the database's Latin1 charset will be decoded as '?', and there will be no way to recover the real character. The same thing will have happened when the table was imported, which means you're now matching garbage with garbage. If at all possible, you need to re-import that table, using the right encoding this time.
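As a rough sketch of what "reading UTF-8 as if it were Latin1" looks like in Java (this is not the code from the original post; the class and method names are hypothetical, and Java 7+ is assumed for `StandardCharsets` and `Files`): the UTF-8 bytes are decoded with the ISO-8859-1 charset, which turns "mú" into "mÃº". Note that Java's own ISO-8859-1 charset assigns a character to every byte value, so within Java this particular step can be reversed; a database whose "latin1" behaves differently (for example a Windows-1252 variant) may not preserve every byte.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical helper class, for illustration only.
class Latin1View {
    // Encode the String as UTF-8, then decode those bytes as ISO-8859-1.
    static String utf8SeenAsLatin1(String s) {
        return new String(s.getBytes(StandardCharsets.UTF_8), StandardCharsets.ISO_8859_1);
    }

    // The reverse step: take the mis-decoded text back to the original String.
    static String latin1SeenAsUtf8(String s) {
        return new String(s.getBytes(StandardCharsets.ISO_8859_1), StandardCharsets.UTF_8);
    }

    // Read a whole file, decoding its raw bytes as ISO-8859-1 regardless of how it was written.
    static String readFileAsLatin1(Path path) throws IOException {
        return new String(Files.readAllBytes(path), StandardCharsets.ISO_8859_1);
    }

    public static void main(String[] args) {
        String original = "m\u00fa";                 // "mú"
        String garbled = utf8SeenAsLatin1(original); // "mÃº", i.e. m, U+00C3, U+00BA
        System.out.println(garbled.equals("m\u00c3\u00ba"));
        System.out.println(latin1SeenAsUtf8(garbled).equals(original));
    }
}
```

If such a String is used in a WHERE clause, the JDBC driver's connection encoding also affects what the database finally receives, so treat this as a starting point for experiments rather than a fix.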
 
sri ranayama
Greenhorn
Posts: 3
Hi Alan,

Thanks for your input; it helps me understand a little better, and thanks for letting me know that the solution I came up with is flawed at the root. I was expecting this.
Unfortunately, the DB is as it is and I don't manage it. Can you think of any cleaner workarounds?

thanks,

Sri
[ March 04, 2007: Message edited by: sri ranayama ]
 