UTF-16 encoding decoded as UTF-8

 
Ranch Hand
Posts: 89
Hi, I know this sounds strange, so I'm hoping someone with more knowledge of encodings can answer this.
I'm trying to pass a String as UTC2 to a third party (they render it as an SMS). I am using UTF-16 because, as far as I know, it is a superset of UTC2.
Unfortunately I have been running into problems. After a lot of trial and error, the String Виталий А Терниевский comes back as Виталий<corrupted char><corrupted char>Терниевский.

What I don't understand is why.
I am getting this result by encoding the String as UTF-16BE -> %04%12%04%38%04%42%04%30%04%3B%04%38%04%39+%04%10+%04%22%04%35%04%40%04%3D%04%38%04%35%04%32%04%41%04%3A%04%38%04%39

Then decoding this as UTF-8 -> 8B0;89 "5@=852A:89

I thought this would simply corrupt the data, but it is being rendered as something. Does anyone have any idea what the second String represents?
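For anyone who wants to reproduce the effect, here is a minimal sketch of the encode-as-UTF-16BE, decode-as-UTF-8 step. The class and method names are made up for illustration, and it assumes Java 10 or later for the Charset overloads of URLEncoder and URLDecoder. Every letter in the U+04xx Cyrillic block becomes the byte 0x04 followed by a byte below 0x80, and each of those bytes is a valid one-byte UTF-8 sequence, so the decoder returns the control character U+0004 followed by an ASCII character instead of failing.

```java
import java.net.URLDecoder;
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

// Hypothetical demo class, not the code from the original post.
class MismatchedDecodeDemo {

    // Percent-encode the UTF-16BE bytes of the text, then decode those bytes as UTF-8.
    static String encodeUtf16ThenDecodeUtf8(String text) {
        String encoded = URLEncoder.encode(text, StandardCharsets.UTF_16BE);
        return URLDecoder.decode(encoded, StandardCharsets.UTF_8);
    }

    // Render every char as U+XXXX so that control characters become visible.
    static String codePoints(String s) {
        StringBuilder sb = new StringBuilder();
        for (char c : s.toCharArray()) {
            if (sb.length() > 0) {
                sb.append(' ');
            }
            sb.append(String.format("U+%04X", (int) c));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // "Виталий", written with escapes so the source file encoding does not matter.
        String russian = "\u0412\u0438\u0442\u0430\u043B\u0438\u0439";
        // prints %04%12%04%38%04%42%04%30%04%3B%04%38%04%39
        System.out.println(URLEncoder.encode(russian, StandardCharsets.UTF_16BE));
        // prints U+0004 U+0012 U+0004 U+0038 U+0004 U+0042 U+0004 U+0030
        //        U+0004 U+003B U+0004 U+0038 U+0004 U+0039 (on one line)
        System.out.println(codePoints(encodeUtf16ThenDecodeUtf8(russian)));
    }
}
```

Dropping the non-printable U+0004 and U+0012 characters from that result leaves 8B0;89, which matches the start of the second String above.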
Cheers,
Barry
 
Ranch Hand
Posts: 378
What is UTC2?
[ October 22, 2008: Message edited by: Gamini Sirisena ]
 
Barry Higgins
Ranch Hand
Posts: 89
Apologies, that was a typo. I meant UCS-2.
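As a rough illustration of how UCS-2 relates to UTF-16: for text that contains no surrogate pairs (that is, only characters from the Basic Multilingual Plane), the UTF-16BE bytes are the same as big-endian UCS-2. The sketch below uses hypothetical helper names (`fitsUcs2`, `hexUtf16be`) and says nothing about what the third party actually expects on the wire.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical helper class for illustration only.
class Ucs2CheckDemo {

    // True when the text has no surrogates, so UTF-16 and UCS-2 agree on it.
    static boolean fitsUcs2(String text) {
        for (char c : text.toCharArray()) {
            if (Character.isSurrogate(c)) {
                return false;
            }
        }
        return true;
    }

    // Hex dump of the UTF-16BE bytes: two bytes per char, no byte order mark.
    static String hexUtf16be(String text) {
        StringBuilder sb = new StringBuilder();
        for (byte b : text.getBytes(StandardCharsets.UTF_16BE)) {
            sb.append(String.format("%02X", b & 0xFF));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String russian = "\u0412\u0438"; // "Ви"
        System.out.println(fitsUcs2(russian));   // prints true
        System.out.println(hexUtf16be(russian)); // prints 04120438
    }
}
```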
Thanks,
Barry
 
Gamini Sirisena
Ranch Hand
Posts: 378
Can you post the relevant pieces of code? That might help the ranchers here get an idea of what is going on.
 
Barry Higgins
Ranch Hand
Posts: 89
In short this will print the String that I am passing

8B0;89 "5@=852A:89=

This seems to "nearly" render correctly.
I have no idea how this can create anything but a garbled mess, but somehow it produces nearly the correct text.



Thanks,
Barry
 
Ranch Hand
Posts: 781

Originally posted by Barry Higgins:
In short this will print the String that I am passing

8B0;89 "5@=852A:89=

This seems to "nearly" render correctly.
I have no idea how this can create anything but a garbled mess, but somehow it produces nearly the correct text.



Thanks,
Barry




What am I missing? You URL-encode using UTF-16BE and then decode that output as if it had been URL-encoded using UTF-8. This does not make sense. If you encode with X, then you need to decode with X.
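A minimal sketch of the "encode with X, decode with X" rule, assuming Java 10 or later for the Charset overloads; the class name and the `roundTrip` helper are made up for this example. With the same charset on both sides the text survives, including the spaces that URLEncoder writes as `+`.

```java
import java.net.URLDecoder;
import java.net.URLEncoder;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical demo class for illustration only.
class RoundTripDemo {

    // URL-encode and URL-decode the text with one and the same charset.
    static String roundTrip(String text, Charset charset) {
        String encoded = URLEncoder.encode(text, charset);
        return URLDecoder.decode(encoded, charset);
    }

    public static void main(String[] args) {
        String russian = "\u0412\u0438 \u0410"; // "Ви А"
        // Both lines print true: either charset works as long as it is used on both sides.
        System.out.println(russian.equals(roundTrip(russian, StandardCharsets.UTF_16BE)));
        System.out.println(russian.equals(roundTrip(russian, StandardCharsets.UTF_8)));
    }
}
```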

It is not obvious to me what you are trying to do.
 
Gamini Sirisena
Ranch Hand
Posts: 378
I believe James is right.

Check your program modified below. Most probably your console does not support printing Unicode characters, so this program generates an HTML file instead; hopefully your browser supports Unicode display. It comes out nicely for me.

I think the following is what you need to do, and I think this explanation is correct.

A Java String is always UTF-16 internally.
So when you call
URLEncoder.encode(russian, "UTF-8")
the String russian is converted to UTF-8 bytes and then URL-encoded.
The third party then decodes the URL-encoded string, knowing and
specifying that it is a UTF-8 string.
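The explanation above can be sketched roughly as follows. This is not the modified program from the post; `send` and `receive` are hypothetical names standing in for the two sides, and the Charset overloads assume Java 10 or later.

```java
import java.net.URLDecoder;
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

// Hypothetical demo class for illustration only.
class Utf8UrlDemo {

    // Sender side: the String is converted to UTF-8 bytes, then percent-encoded.
    static String send(String text) {
        return URLEncoder.encode(text, StandardCharsets.UTF_8);
    }

    // Receiver side: percent-decoding, interpreting the bytes as UTF-8.
    static String receive(String encoded) {
        return URLDecoder.decode(encoded, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        String russian = "\u0412\u0438"; // "Ви"
        String wire = send(russian);
        System.out.println(wire);                          // prints %D0%92%D0%B8
        System.out.println(russian.equals(receive(wire))); // prints true
    }
}
```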


 
Barry Higgins
Ranch Hand
Posts: 89
Thank you both for your responses. I have probably not been very clear.
I understand that encoding in UTF-16 and then decoding as UTF-8 does not make sense.
My question is rather: what is the result if one does this?
The Strings produced by the code snippet I posted earlier are very close to what the third-party provider requires (I know this from the SMS messages I have been receiving). However, I do not know why anyone would want text in this format. Does it represent some differently formatted data, and is there a more structured, better-understood way of producing it?
I hope this makes a bit more sense.
Cheers,
Barry
 
author
Posts: 14112

Originally posted by Barry Higgins:

I understand that encoding in UTF-16 and then decoding in UTF-8 does not make sense.
My question is more to do with what is the result if one were to do this?



Garbage, as far as I can tell...
 