[J2ME] From Unicode to UTF-8

+Pie Number of slices to send: Send
Hi all,

I have to convert a Unicode String to its UTF-8 encoding.

I'm working with emoticons so:

this is my input:

U+1F600 (or \uD83D\uDE03, chars associated with it)

this should be the output

f0 9f 98 80

How can I get this?

Ty and BR,

Adriano
+Pie Number of slices to send: Send
Something like this:

By the way, that gives me f0 9f 98 83, not f0 9f 98 80.
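
A minimal sketch of that idea, assuming the String is encoded with getBytes("UTF-8") and each byte is then printed as two hex digits. The class name Utf8Hex and the helper toHex are made up for illustration, and the code sticks to APIs that already exist in Java 1.4:

```java
import java.io.UnsupportedEncodingException;

class Utf8Hex {
    // Formats each byte as two lowercase hex digits, separated by spaces.
    static String toHex(byte[] bytes) {
        StringBuffer sb = new StringBuffer();
        for (int i = 0; i < bytes.length; i++) {
            int b = bytes[i] & 0xFF; // mask so the signed byte becomes 0..255
            if (i > 0) {
                sb.append(' ');
            }
            if (b < 0x10) {
                sb.append('0'); // pad single-digit values
            }
            sb.append(Integer.toHexString(b));
        }
        return sb.toString();
    }

    public static void main(String[] args) throws UnsupportedEncodingException {
        // The surrogate pair from the question; it corresponds to U+1F603.
        String s = "\uD83D\uDE03";
        System.out.println(toHex(s.getBytes("UTF-8")));
    }
}
```

The result depends on how the platform's UTF-8 encoder treats surrogate pairs, so the output on a J2ME device may differ from a desktop JVM.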
+Pie Number of slices to send: Send
It doesn't work...

If I try this solution, I'm wondering about 2-byte chars. Each char of the "Hello world" String takes 2 bytes.

In my case, my String is "😀": an emoticon!

To better understand what I'm trying to do, I'll make an example:

We can easily convert a String using the getBytes method when the Unicode code point of every char in the String lies between 0x0000 and 0xFFFF.

The "😀" code point is outside that range: to encode it as chars, we need 2 chars (not one, so more than 2 bytes) as we can see here:

http://www.utf8-chartable.de/unicode-utf8-table.pl

The "😀" representation is: 0x1F600 (unicode: so something like 0001|F600???) and f0 9f 98 80 (hex)

So I have to represent a single character ("😀") as if it were composed of three (or four???) bytes...

How can I do this?
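
For reference, the relation between the code point 0x1F600 and its two chars can be written out as plain arithmetic. This is only a sketch of the standard UTF-16 surrogate calculation; the class name SurrogateSplit and the method toSurrogates are hypothetical, and the input is assumed to be a valid code point in the range 0x10000 to 0x10FFFF:

```java
class SurrogateSplit {
    // Splits a supplementary code point (0x10000..0x10FFFF) into its
    // UTF-16 high and low surrogate. No validation is done (assumption).
    static char[] toSurrogates(int codePoint) {
        int v = codePoint - 0x10000;               // 20 significant bits remain
        char high = (char) (0xD800 + (v >> 10));   // top 10 bits
        char low = (char) (0xDC00 + (v & 0x3FF));  // bottom 10 bits
        return new char[] { high, low };
    }

    public static void main(String[] args) {
        char[] pair = toSurrogates(0x1F600);
        System.out.println(Integer.toHexString(pair[0]) + " "
                + Integer.toHexString(pair[1])); // prints "d83d de00"
    }
}
```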

+Pie Number of slices to send: Send
 

Jesper de Jong wrote:Something like this:

By the way, that gives me f0 9f 98 83, not f0 9f 98 80.



Wow.... Give me a moment....

Ok, you use getBytes("UTF-8")...

But then? What do you do?

How could you obtain f0 9f 98 83?

If I print the byte array, the "for" returns:

-19
-96
-67
-19
-72
-125

........
+Pie Number of slices to send: Send
I don't know how you could get that. You can never get more than four UTF-8 bytes for a Unicode character. When I run that code the bytes in the resulting array are -16, -97, -104, -125. But that's the decimal representation, assuming the byte value is signed. The hexadecimal string representation of those bytes is F0, 9F, 98, 83.
+Pie Number of slices to send: Send
... Well, that's interesting. When I take the six bytes you say you got, and convert them to a String assuming they were UTF-8, I do actually get "\uD83D\uDE03". Here's the code I wrote:



I'm using Java 7. I recall seeing something in some JVM change report about fixing code to use canonical UTF-8, but don't remember when that was. What version of Java are you using?
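
One way to see where those six bytes could come from: they match what you get when each char of the surrogate pair is encoded on its own as a 3-byte sequence, instead of the pair being combined into one code point first. The sketch below only demonstrates that arithmetic; it is not a claim about what any particular JVM does internally, and the names SurrogateBytes and encode3 are invented for the example:

```java
class SurrogateBytes {
    // Encodes one 16-bit value in the range 0x0800..0xFFFF using the
    // 3-byte pattern 1110xxxx 10xxxxxx 10xxxxxx.
    static byte[] encode3(char c) {
        return new byte[] {
            (byte) (0xE0 | (c >> 12)),
            (byte) (0x80 | ((c >> 6) & 0x3F)),
            (byte) (0x80 | (c & 0x3F))
        };
    }

    public static void main(String[] args) {
        char[] pair = { (char) 0xD83D, (char) 0xDE03 };
        for (int i = 0; i < pair.length; i++) {
            byte[] b = encode3(pair[i]);
            for (int j = 0; j < b.length; j++) {
                System.out.println(b[j]); // signed decimal, as in the post above
            }
        }
        // Prints -19, -96, -67, -19, -72, -125 on separate lines.
    }
}
```

In hex those values are ED A0 BD ED B8 83, which is the same character as F0 9F 98 83 but with each surrogate encoded separately.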

And just in case we are on the wrong track here, why do you have to convert a String to the hexadecimal representation of its UTF-8 encoding?
+Pie Number of slices to send: Send
Hi,

I'm using Java 1.4, MID profile.

I only want to obtain what this table shows:

http://www.utf8-chartable.de/unicode-utf8-table.pl

If you go to the "U+1F600 ... U+1F64F - Emoticons" section, you'll see that the block starts at the U+1F600 code point and ends at U+1F64F.

So I want each Unicode entry to be converted into its corresponding UTF-8 bytes.

My starting point is the Unicode code point (or its char representation), not the String.

My end point is its hexadecimal representation.

TY in advance,

Adriano

+Pie Number of slices to send: Send
Okay, you're using Java 1.4, which means that you have to use the UTF-16 encoding of the character (as you did) rather than using the character directly, which Java 5 allows you to do.
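
If the goal is just the mapping from a code point to its bytes, one option that does not depend on the platform's encoder is to do the bit arithmetic by hand. The following is a sketch under a few assumptions: the input is a valid surrogate pair or a valid code point up to 0x10FFFF (nothing is validated), only Java 1.4 level APIs are used, and the names CodePointUtf8, toCodePoint, encode and toHex are invented for this example:

```java
class CodePointUtf8 {
    // Combines a UTF-16 surrogate pair into the code point it represents.
    static int toCodePoint(char high, char low) {
        return 0x10000 + ((high - 0xD800) << 10) + (low - 0xDC00);
    }

    // Encodes a code point (0..0x10FFFF) as 1 to 4 UTF-8 bytes.
    static byte[] encode(int cp) {
        if (cp < 0x80) {
            return new byte[] { (byte) cp };
        } else if (cp < 0x800) {
            return new byte[] {
                (byte) (0xC0 | (cp >> 6)),
                (byte) (0x80 | (cp & 0x3F)) };
        } else if (cp < 0x10000) {
            return new byte[] {
                (byte) (0xE0 | (cp >> 12)),
                (byte) (0x80 | ((cp >> 6) & 0x3F)),
                (byte) (0x80 | (cp & 0x3F)) };
        } else {
            return new byte[] {
                (byte) (0xF0 | (cp >> 18)),
                (byte) (0x80 | ((cp >> 12) & 0x3F)),
                (byte) (0x80 | ((cp >> 6) & 0x3F)),
                (byte) (0x80 | (cp & 0x3F)) };
        }
    }

    // Formats bytes as two-digit lowercase hex values separated by spaces.
    static String toHex(byte[] bytes) {
        StringBuffer sb = new StringBuffer();
        for (int i = 0; i < bytes.length; i++) {
            int b = bytes[i] & 0xFF;
            if (i > 0) sb.append(' ');
            if (b < 0x10) sb.append('0');
            sb.append(Integer.toHexString(b));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(toHex(encode(0x1F600))); // prints "f0 9f 98 80"
        int cp = toCodePoint((char) 0xD83D, (char) 0xDE03);
        System.out.println(toHex(encode(cp)));      // prints "f0 9f 98 83"
    }
}
```

Starting from the code point avoids the surrogate question entirely; starting from two chars, the pair has to be combined first, otherwise each surrogate ends up as its own 3-byte sequence.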

At any rate it seems that you are generating something which appears to be a UTF-8 version of that character in some way, at least it converts back to the character via new String(bytearray, "UTF-8"). However I still think you need to explain your original problem, rather than trying to discuss a (possibly) failed solution to that unknown problem.
+Pie Number of slices to send: Send
TY for your reply.

Let's take a look at the table shown at this URL

unicode-utf8

I must obtain the result of the third column, starting from the value of the first one.

That's my problem...
+Pie Number of slices to send: Send
Let me be more clear, then. The problem I am asking about is the problem to which "I must obtain the result of the third column, starting from the value of the first one" is your idea of a solution. There may be better ways of solving that unknown problem, but we can't know until we know what that problem is.