What is the range of Unicode values for the char data type?

Feb 23, 2005 19:42:00

What is the range of all the Unicode values that I can use to initialize a char?

Feb 23, 2005 20:56:00

A char can range from 0 to 65535. I don't know how many of these values have been assigned a Unicode graphic.
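A quick way to confirm that range is to print Character.MIN_VALUE and Character.MAX_VALUE as ints. This is a minimal sketch; the class and helper method names are made up for illustration.

```java
public class CharRange {
    // Character.MIN_VALUE and Character.MAX_VALUE are the smallest and largest char values;
    // returning them as int shows their numeric value instead of the character itself.
    static int minAsInt() { return Character.MIN_VALUE; }
    static int maxAsInt() { return Character.MAX_VALUE; }

    public static void main(String[] args) {
        System.out.println(minAsInt()); // prints 0
        System.out.println(maxAsInt()); // prints 65535

        // char is an unsigned 16-bit type, so incrementing past the maximum wraps around to 0.
        char c = Character.MAX_VALUE;
        c++;
        System.out.println((int) c); // prints 0
    }
}
```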

Feb 23, 2005 22:56:00

You might be interested in the method Character.isDefined(char ch), which returns true if the argument char is defined in Unicode and false otherwise.
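A minimal sketch of calling it: 'A' is an assigned character, while U+FFFF is a noncharacter that Unicode guarantees will never be assigned, so it should always come back false. The wrapper class and method name are illustrative only.

```java
public class IsDefinedDemo {
    // Thin wrapper around the standard Character.isDefined(char) method.
    static boolean defined(char ch) {
        return Character.isDefined(ch);
    }

    public static void main(String[] args) {
        System.out.println(defined('A'));           // true: LATIN CAPITAL LETTER A
        System.out.println(defined((char) 0xFFFF)); // false: U+FFFF is a permanent noncharacter
    }
}
```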

In general, you'll find numerous unassigned gaps in Unicode within the range of possible char values. For example, \u0237 through \u0249 are not defined in Unicode 4.0, the version Java 1.5 uses (later Unicode versions do assign characters to them). You can assign these values to a char, but they don't correspond to any Unicode character.
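Because the answer depends on which Unicode version the JVM was built with, it can be handy to ask the running JVM directly. The sketch below scans a range with Character.isDefined; on Java 1.5 the \u0237 to \u0249 range should print as undefined, while newer JVMs may print nothing. The class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

public class UndefinedScan {
    // Collects every char value in [from, to] that this JVM's Unicode tables mark as unassigned.
    static List<Integer> undefinedIn(int from, int to) {
        List<Integer> result = new ArrayList<>();
        for (int v = from; v <= to; v++) {
            if (!Character.isDefined((char) v)) {
                result.add(v);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        // Output varies with the Unicode version bundled with the JVM.
        for (int v : undefinedIn(0x0237, 0x0249)) {
            System.out.printf("U+%04X is undefined%n", v);
        }

        // Assigned or not, any value from 0 to 65535 can still be stored in a char.
        char gap = (char) 0xFFFF;
        System.out.println((int) gap); // prints 65535
    }
}
```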

Also note that in Java 1.5, some of the values within the char range are used for "surrogate pairs," which allow the representation of supplementary characters, that is, Unicode characters with code points greater than U+FFFF. A single surrogate value (\uD800 to \uDFFF) does not represent a character by itself; it is only meaningful as one half of such a pair.

"...supplementary characters are represented as a pair of char values, the first from the high-surrogates range, (\uD800-\uDBFF), the second from the low-surrogates range (\uDC00-\uDFFF). A char value, therefore, represents Basic Multilingual Plane (BMP) code points [\u0000 to \uFFFF], including the surrogate code points..."

Ref: http://java.sun.com/j2se/1.5.0/docs/api/java/lang/Character.html
[ February 24, 2005: Message edited by: marc weber ]
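To see a surrogate pair in action, the sketch below encodes the first supplementary code point, U+10000, with the standard Character.toChars method (available since Java 1.5) and checks both halves. The class name and the encode helper are illustrative.

```java
public class SurrogateDemo {
    // Converts a code point to UTF-16 chars: one char for BMP code points, two for supplementary ones.
    static char[] encode(int codePoint) {
        return Character.toChars(codePoint);
    }

    public static void main(String[] args) {
        int cp = 0x10000; // the first supplementary code point
        char[] pair = encode(cp);
        System.out.println(pair.length); // prints 2
        System.out.printf("%04X %04X%n", (int) pair[0], (int) pair[1]); // prints D800 DC00
        System.out.println(Character.isHighSurrogate(pair[0])); // prints true
        System.out.println(Character.isLowSurrogate(pair[1]));  // prints true

        // Recombining the two halves gives back the original code point.
        System.out.println(Character.toCodePoint(pair[0], pair[1]) == cp); // prints true

        // A BMP code point such as 'A' still fits in a single char.
        System.out.println(encode('A').length); // prints 1
    }
}
```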

Feb 24, 2005 00:29:00

Originally posted by Mike Gershman:
A char can range from 0 to 65535. I don't know how many of these values have been assigned a Unicode graphic.

I count 59177.
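One way to get such a count is to loop over every char value and ask Character.isDefined. This is a sketch, not necessarily how the 59177 figure was obtained: the result depends on the JVM's Unicode version, and isDefined also counts surrogate and private-use code points as defined, so the number you get may differ. The class name is hypothetical.

```java
public class CountDefined {
    // Counts the char values (0 to 65535) that the running JVM reports as defined in Unicode.
    static int countDefined() {
        int count = 0;
        // Loop with an int so the counter can pass 65535 and end the loop instead of wrapping.
        for (int v = Character.MIN_VALUE; v <= Character.MAX_VALUE; v++) {
            if (Character.isDefined((char) v)) {
                count++;
            }
        }
        return count;
    }

    public static void main(String[] args) {
        System.out.println(countDefined());
    }
}
```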

Feb 24, 2005 06:14:00

I know they range from 0 to 65535 in terms of integers, but I would really like to know the range in terms of Unicode characters.
The point is that I've seen questions on mock exams that ask, for example, whether

char a = '\u000d'

is valid. In this case it's not, but it really looks like it would be all right. I also checked that

char b = '\u0101'

is also valid. Isn't that weird? So, am I supposed to memorize all the valid Unicode initializations for the exam?

Feb 24, 2005 06:56:00

The point is that I've seen questions on mock exams that ask, for example, whether

char a = '\u000d'

is valid. In this case it's not, but it really looks like it would be all right.

'\u000d' is the carriage return character (not 'a') and is a legal Unicode character. However, '\u000d' and '\u000a' (newline) should not appear anywhere in a Java source program, because the compiler translates Unicode escapes into the actual characters before it parses anything else. The escape therefore becomes a real line break in your program text, splitting the char literal across two lines so that it no longer compiles. Use '\r' and '\n' instead.
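A small sketch of the safe alternatives: the escape sequences '\r' and '\n' give the same char values (13 and 10) without ever putting a real line break inside the literal, and a cast from the numeric value works too. The class and constant names are illustrative.

```java
public class LineBreakChars {
    // Escape sequences for carriage return (0x0D) and line feed (0x0A).
    static final char CR = '\r';
    static final char LF = '\n';

    public static void main(String[] args) {
        System.out.println((int) CR); // prints 13
        System.out.println((int) LF); // prints 10

        // Building the char from its numeric value also avoids the problem.
        char cr = (char) 0x000D;
        System.out.println(cr == CR); // prints true
    }
}
```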

If you really want to learn some Unicode, just remember those two, plus these: '\u0020' is a space; the digits start at '\u0030' for 0 and '\u0031' for 1, and so on; capital letters start at '\u0041' for A; and lowercase letters start at '\u0061' for a. That is more than enough for the SCJP exam and for ordinary programming in English.
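The sketch below checks those landmarks and uses the fact that digits and letters are consecutive, so an offset from the first one gives the rest. Keep in mind that a Java Unicode escape always takes exactly four hex digits after \u. The class and helper names are made up for illustration.

```java
public class AsciiEscapes {
    // Digits and letters are consecutive, so adding an offset to the first one reaches the others.
    static char digit(int n) { return (char) ('\u0030' + n); }
    static char upper(int i) { return (char) ('\u0041' + i); }
    static char lower(int i) { return (char) ('\u0061' + i); }

    public static void main(String[] args) {
        // The compiler turns each escape into the character it names, so these all print true.
        System.out.println('\u0020' == ' ');
        System.out.println('\u0030' == '0');
        System.out.println('\u0041' == 'A');
        System.out.println('\u0061' == 'a');

        System.out.println(digit(7));  // prints 7
        System.out.println(upper(2));  // prints C
        System.out.println(lower(25)); // prints z
    }
}
```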

Feb 24, 2005 07:51:00

Thanks for the explanation Mike, you got to the point I need.
