
[J2ME] From Unicode to UTF-8

 
Adriano Bellavita
Hi all,

I have to convert a Unicode String to its UTF-8 encoding.

I'm working with emoticons so:

this is my input:

U+1F600 (or \uD83D\uDE03, chars associated with it)

this should be the output

f0 9f 98 80

How can I get this?

Ty and BR,

Adriano
 
Jesper de Jong
Something like this:
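The code for this reply did not survive. Going by the follow-up, it encoded the string with getBytes("UTF-8") and printed the bytes in hex. Here is a minimal sketch of that approach. It assumes the emoticon is given as its surrogate pair, and the class and method names are hypothetical:

```java
import java.io.UnsupportedEncodingException;

// Sketch of the getBytes("UTF-8") approach: encode a String, then print
// each byte as two lowercase hex digits separated by spaces.
class Utf8HexDemo {

    // Encodes s as UTF-8 using the runtime's built-in charset support.
    static byte[] utf8(String s) {
        try {
            return s.getBytes("UTF-8");
        } catch (UnsupportedEncodingException e) {
            // Java SE platforms are required to support UTF-8.
            throw new RuntimeException("UTF-8 not supported", e);
        }
    }

    // Formats bytes as space-separated hex, e.g. "f0 9f 98 80".
    static String toHex(byte[] bytes) {
        StringBuffer sb = new StringBuffer();
        for (int i = 0; i < bytes.length; i++) {
            int b = bytes[i] & 0xFF; // Java bytes are signed; mask to 0..255
            if (i > 0) sb.append(' ');
            if (b < 0x10) sb.append('0'); // keep two digits per byte
            sb.append(Integer.toHexString(b));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // Surrogate pair used in the first post (code point U+1F603)
        System.out.println(toHex(utf8("\uD83D\uDE03")));
        // Surrogate pair for U+1F600
        System.out.println(toHex(utf8("\uD83D\uDE00")));
    }
}
```

On a current Java SE runtime this prints f0 9f 98 83 and then f0 9f 98 80. The & 0xFF mask matters because Java bytes are signed, so 0xF0 would otherwise come out as -16. An older or embedded runtime may encode surrogates differently.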

By the way, that gives me f0 9f 98 83, not f0 9f 98 80. \uD83D\uDE03 is the surrogate pair for U+1F603, while U+1F600 is \uD83D\uDE00.
 
Adriano Bellavita
It doesn't work...

If I try this solution, I'm wondering about 2-byte chars. Each char of the "Hello world" String is built from 2 bytes.

In my case, my String is "😀": an emoticon!

To better understand what I'm trying to do, I'll make an example:

We can easily convert a String using the getBytes method when the Unicode code point of every char in the String lies between 0x0000 and 0xFFFF.

The "😀" code point overflows that range: to encode it as Java chars we need 2 chars, not one (so more than 2 bytes...), as we can see here:

http://www.utf8-chartable.de/unicode-utf8-table.pl

The "😀" code point is 0x1F600 (so something like 0001|F600???) and its UTF-8 encoding is f0 9f 98 80 (hex).

So I have to represent a single character ("😀") as if it were composed of three (or four???) bytes...

How can I do this?
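The two chars mentioned above form a UTF-16 surrogate pair. Splitting a code point into that pair is pure arithmetic, defined by the UTF-16 encoding form. Below is a sketch that needs nothing beyond Java 1.4 because it avoids Character.toChars, which only arrived in Java 5. The class and method names are hypothetical:

```java
// Splits a supplementary code point (U+10000..U+10FFFF) into its UTF-16
// surrogate pair using plain int arithmetic (no Java 5 APIs needed).
class SurrogateDemo {

    static char[] toSurrogatePair(int codePoint) {
        if (codePoint < 0x10000 || codePoint > 0x10FFFF) {
            throw new IllegalArgumentException("not a supplementary code point");
        }
        int v = codePoint - 0x10000;               // 20-bit offset
        char high = (char) (0xD800 + (v >> 10));   // top 10 bits
        char low = (char) (0xDC00 + (v & 0x3FF));  // bottom 10 bits
        return new char[] { high, low };
    }

    public static void main(String[] args) {
        char[] pair = toSurrogatePair(0x1F600);
        // Prints the two chars in hex: "d83d de00"
        System.out.println(Integer.toHexString(pair[0]) + " " + Integer.toHexString(pair[1]));
    }
}
```

For 0x1F600 this yields d83d de00. The \uD83D\uDE03 used in the first post is the pair for U+1F603, which is why the getBytes result ends in 83 rather than 80.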

 
Adriano Bellavita
Jesper de Jong wrote: Something like this:

By the way, that gives me f0 9f 98 83, not f0 9f 98 80.


Wow.... Give me a moment....

Ok, you use getBytes("UTF-8")...

But then? What do you do?

How could you obtain f0 9f 98 83?

If I print the byte array in a "for" loop, I get:

-19
-96
-67
-19
-72
-125

........
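Read as unsigned hex (add 256 to each negative value), those six numbers are ed a0 bd ed b8 83. That is what you get by encoding each of the two surrogate chars on its own as a 3-byte sequence instead of first combining them into one code point: ed a0 bd is \uD83D and ed b8 83 is \uDE03. This form is known as CESU-8. Java's "modified UTF-8" (used by DataOutputStream.writeUTF) treats supplementary characters the same way.

The output is consistent with a runtime that encodes chars one at a time. That is an inference from the bytes, not something confirmed in the thread. The following sketch, with hypothetical names, reproduces the six bytes for illustration only. It is not a recommended encoder:

```java
// Encodes each 16-bit char independently as a 1-, 2- or 3-byte sequence,
// WITHOUT combining surrogate pairs first. This is CESU-8 style output,
// not standard UTF-8.
class Cesu8Demo {

    static String encodeCharsSeparately(String s) {
        StringBuffer sb = new StringBuffer();
        for (int i = 0; i < s.length(); i++) {
            int c = s.charAt(i);
            if (c < 0x80) {                      // 0xxxxxxx
                appendHex(sb, c);
            } else if (c < 0x800) {              // 110xxxxx 10xxxxxx
                appendHex(sb, 0xC0 | (c >> 6));
                appendHex(sb, 0x80 | (c & 0x3F));
            } else {                             // 1110xxxx 10xxxxxx 10xxxxxx
                appendHex(sb, 0xE0 | (c >> 12));
                appendHex(sb, 0x80 | ((c >> 6) & 0x3F));
                appendHex(sb, 0x80 | (c & 0x3F));
            }
        }
        return sb.toString().trim(); // drop the trailing space
    }

    // Appends one byte value (0..255) as two hex digits plus a space.
    static void appendHex(StringBuffer sb, int b) {
        if (b < 0x10) sb.append('0');
        sb.append(Integer.toHexString(b)).append(' ');
    }

    public static void main(String[] args) {
        // Prints "ed a0 bd ed b8 83", i.e. -19 -96 -67 -19 -72 -125 as signed bytes
        System.out.println(encodeCharsSeparately("\uD83D\uDE03"));
    }
}
```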
 
Paul Clapham
I don't know how you could get that. You can never get more than four UTF-8 bytes for a Unicode character. When I run that code, the bytes in the resulting array are -16, -97, -104, -125. But that's the decimal representation, assuming the byte value is signed. The hexadecimal representation of those bytes is F0, 9F, 98, 83.
 
Paul Clapham
... Well, that's interesting. When I take the six bytes you say you got, and convert them to a String assuming they were UTF-8, I do actually get "\uD83D\uDE03". Here's the code I wrote:



I'm using Java 7. I recall seeing something in some JVM change report about fixing code to use canonical UTF-8, but don't remember when that was. What version of Java are you using?

And just in case we are on the wrong track here, why do you have to convert a String to the hexadecimal representation of its UTF-8 encoding?
 
Adriano Bellavita
Hi,

I'm using Java 1.4, MID profile.

I only want to obtain what this table shows:

http://www.utf8-chartable.de/unicode-utf8-table.pl

If you go to the "U+1F600 ... U+1F64F - Emoticons" section, you'll see that the block starts at code point U+1F600 and ends at U+1F64F.

So I want each Unicode entry to be converted into the corresponding UTF-8 bytes.

My starting point is the Unicode code point (or its char representation), not the String.

My end point is its hexadecimal representation.

TY in advance,

Adriano
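Since the starting point is a code point, one option is to apply the UTF-8 bit layout from RFC 3629 directly. That avoids depending on how a particular runtime's getBytes("UTF-8") handles surrogates. The sketch below has hypothetical names. It sticks to int arithmetic, StringBuffer and Integer.toHexString so it stays close to what Java 1.4 offers, but it has not been checked against any particular MIDP/CLDC device:

```java
// Encodes one Unicode code point as standard UTF-8 using the bit layout
// from RFC 3629, then formats the bytes as hex. Plain int arithmetic only.
class CodePointToUtf8 {

    static int[] encode(int cp) {
        if (cp < 0 || cp > 0x10FFFF || (cp >= 0xD800 && cp <= 0xDFFF)) {
            throw new IllegalArgumentException("invalid code point: " + Integer.toHexString(cp));
        }
        if (cp < 0x80) {            // 1 byte:  0xxxxxxx
            return new int[] { cp };
        } else if (cp < 0x800) {    // 2 bytes: 110xxxxx 10xxxxxx
            return new int[] { 0xC0 | (cp >> 6), 0x80 | (cp & 0x3F) };
        } else if (cp < 0x10000) {  // 3 bytes: 1110xxxx 10xxxxxx 10xxxxxx
            return new int[] { 0xE0 | (cp >> 12), 0x80 | ((cp >> 6) & 0x3F),
                               0x80 | (cp & 0x3F) };
        } else {                    // 4 bytes: 11110xxx 10xxxxxx 10xxxxxx 10xxxxxx
            return new int[] { 0xF0 | (cp >> 18), 0x80 | ((cp >> 12) & 0x3F),
                               0x80 | ((cp >> 6) & 0x3F), 0x80 | (cp & 0x3F) };
        }
    }

    // Formats the UTF-8 bytes of cp as space-separated two-digit hex.
    static String toHex(int cp) {
        int[] bytes = encode(cp);
        StringBuffer sb = new StringBuffer();
        for (int i = 0; i < bytes.length; i++) {
            if (i > 0) sb.append(' ');
            if (bytes[i] < 0x10) sb.append('0');
            sb.append(Integer.toHexString(bytes[i]));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // One row per code point in the Emoticons block U+1F600..U+1F64F
        for (int cp = 0x1F600; cp <= 0x1F64F; cp++) {
            System.out.println("U+" + Integer.toHexString(cp).toUpperCase() + "  " + toHex(cp));
        }
    }
}
```

The first line printed is U+1F600  f0 9f 98 80, matching the table's third column. The last is U+1F64F  f0 9f 99 8f.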

 
Paul Clapham
Okay, you're using Java 1.4, which means that you have to use the UTF-16 encoding of the character (as you did) rather than using the character directly, which Java 5 allows you to do.
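To illustrate that difference, here is a sketch of what Java 5 and later allow. It creates the String straight from the code point with Character.toChars and inspects it with String's code-point methods. None of these methods exist on Java 1.4 or MIDP, and the class and method names are hypothetical:

```java
// Java 5+ only: String and Character gained code-point methods, so a
// supplementary character no longer has to be written as a surrogate pair.
class CodePointApiDemo {

    // Builds a String holding exactly one code point.
    static String fromCodePoint(int codePoint) {
        return new String(Character.toChars(codePoint));
    }

    public static void main(String[] args) {
        String s = fromCodePoint(0x1F600);
        System.out.println(s.length());                            // 2 (UTF-16 chars)
        System.out.println(s.codePointCount(0, s.length()));       // 1 (code point)
        System.out.println(Integer.toHexString(s.codePointAt(0))); // 1f600
    }
}
```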

At any rate it seems that you are generating something which appears to be a UTF-8 version of that character in some way, at least it converts back to the character via new String(bytearray, "UTF-8"). However I still think you need to explain your original problem, rather than trying to discuss a (possibly) failed solution to that unknown problem.
 
Adriano Bellavita
TY for your reply.

Let's take a look at the table shown at this URL:

unicode-utf8

I must obtain the result in the third column, starting from the value in the first one.

That's my problem...
 
Paul Clapham
Let me be clearer, then. The problem I am asking about is the one for which "I must obtain the result of the third column, starting from the value of the first one" is your idea of a solution. There may be better ways of solving that unknown problem, but we can't know until we know what that problem is.
 