
ASCII characters AND Java characters

 
alfred jones
Ranch Hand
Posts: 279
What is the difference between ASCII characters and Java characters?

An ASCII character takes 1 byte, but a Java character takes 2 bytes. Why are there two types of characters?
 
alfred jones
Ranch Hand
Posts: 279
It's confusing that there are two types of characters. When will a character take 2 bytes, and when 1 byte?
 
Steve Morrow
Ranch Hand
Posts: 657
There are many different ways of encoding characters. Java happens to use Unicode: each char is a 16-bit value (a UTF-16 code unit).
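A minimal sketch of the size difference being discussed, assuming Java 8 or later (for `Character.BYTES`). It prints the width of a `char` and the number of bytes the letter `A` occupies once encoded as US-ASCII. The class and method names are made up for illustration.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical example class: compares a Java char with an ASCII-encoded byte.
class CharVersusAscii {

    // Bytes needed for "A" after encoding it as US-ASCII.
    static int asciiBytesForA() {
        return "A".getBytes(StandardCharsets.US_ASCII).length;
    }

    public static void main(String[] args) {
        System.out.println("bits in a char:  " + Character.SIZE);  // 16
        System.out.println("bytes in a char: " + Character.BYTES); // 2
        System.out.println("'A' in US-ASCII: " + asciiBytesForA() + " byte(s)"); // 1
        // The numeric code of 'A' is 65 both in ASCII and in Unicode.
        System.out.println("code of 'A':     " + (int) 'A');       // 65
    }
}
```

Put differently, a `char` is a fixed-size 16-bit value in memory, while "1 byte per character" describes one particular encoding (ASCII) that is used when characters are turned into bytes.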
 
Steve Morrow
Ranch Hand
Posts: 657
Originally posted by alfred:
It's confusing that there are two types of characters. When will a character take 2 bytes, and when 1 byte?

A Java char will always be two bytes in size. It will also always be unsigned.
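Both properties can be checked with a short sketch (class and method names are illustrative). It compares `char` with `short`, which is also 16 bits wide but signed, so the same bit pattern reads differently.

```java
// Hypothetical example class: shows that char is an unsigned 16-bit type.
class CharIsUnsigned {

    // Widening a char to int: the result is never negative.
    static int widen(char c) {
        return c;
    }

    public static void main(String[] args) {
        System.out.println((int) Character.MIN_VALUE); // 0
        System.out.println((int) Character.MAX_VALUE); // 65535

        // Same 16-bit pattern 0xFFFF: -1 as a signed short, 65535 as a char.
        short s = (short) 0xFFFF;
        char c = (char) 0xFFFF;
        System.out.println((int) s);  // -1
        System.out.println(widen(c)); // 65535
    }
}
```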
 
Stan James
(instanceof Sidekick)
Ranch Hand
Posts: 8791
Can anyone describe the interfaces between Unicode & ASCII? For example, I can read an ASCII file into Java Unicode strings, and write the strings back to an ASCII file. Are the readers and writers doing the conversion? What if I wanted a file in Unicode?
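A sketch of where that conversion happens, assuming in-memory byte streams stand in for files: `OutputStreamWriter` encodes chars to bytes and `InputStreamReader` decodes bytes back to chars, each using the charset it is given (the platform default if none is given). For a real file, `FileOutputStream`/`FileInputStream` could replace the byte-array streams; a "Unicode file" then just means choosing a Unicode charset such as UTF-16 or UTF-8. The class and method names are illustrative.

```java
import java.io.*;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical example class: Writers encode, Readers decode.
class ReaderWriterConversion {

    // Encode text through a Writer using the given charset; return the raw bytes.
    static byte[] write(String text, Charset cs) throws IOException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        try (Writer w = new OutputStreamWriter(out, cs)) {
            w.write(text);
        }
        return out.toByteArray();
    }

    // Decode bytes back into chars through a Reader using the given charset.
    static String read(byte[] bytes, Charset cs) throws IOException {
        StringBuilder sb = new StringBuilder();
        try (Reader r = new InputStreamReader(new ByteArrayInputStream(bytes), cs)) {
            int ch;
            while ((ch = r.read()) != -1) {
                sb.append((char) ch);
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) throws IOException {
        String text = "Hello";
        byte[] ascii = write(text, StandardCharsets.US_ASCII);
        byte[] utf16 = write(text, StandardCharsets.UTF_16BE);
        System.out.println("US-ASCII bytes: " + ascii.length); // 5
        System.out.println("UTF-16BE bytes: " + utf16.length); // 10
        // Both round-trip to the same Java String.
        System.out.println(read(ascii, StandardCharsets.US_ASCII).equals(text)); // true
        System.out.println(read(utf16, StandardCharsets.UTF_16BE).equals(text)); // true
    }
}
```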
 
Steve Morrow
Ranch Hand
Posts: 657
 
alfred jones
Ranch Hand
Posts: 279
>A Java char will always be two bytes in size. It will also always be unsigned

How can I believe that? My program contradicts it.





output:
========
6


You claimed "A Java char will always be two bytes in size", so I should get 6 * 2 = 12 bytes (there are 6 chars in the string, and each char takes 2 bytes).

So you are wrong.
 
Junilu Lacar
Bartender
Posts: 7765
 
"alfred"

Please change your profile so that your publicly displayed name complies with the JavaRanch Naming Policy. Thanks for your cooperation.
[ May 11, 2005: Message edited by: Junilu Lacar ]
 
Joel McNary
Bartender
Posts: 1840

Try this code.

It would seem that the .getBytes() method of String does not always properly convert the bytes. (The Javadoc says: "Encodes this String into a sequence of bytes using the platform's default charset, storing the result into a new byte array.") Note that even when a Unicode character with a non-zero high-order byte was used, .getBytes() returned only one byte.

That said, you generally don't have to worry about the size of a char; Readers and Writers will handle this for you. Note that the Reader I used properly returned the character as a two-byte character.
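The exact bytes `getBytes()` produces depend on the platform default charset, so the sketch below pins the charset explicitly. It relies on the documented behaviour of `String.getBytes(Charset)`, which replaces characters the charset cannot represent with that charset's replacement bytes (a single `?` for US-ASCII). Whether that explains the output described above depends on the platform charset and on the code, which isn't shown here. The class and method names are illustrative.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Hypothetical example class: encoding a char that has a non-zero high-order byte.
class UnmappableCharacter {

    // Encode as US-ASCII; characters outside ASCII become the replacement byte '?'.
    static byte[] asAscii(String s) {
        return s.getBytes(StandardCharsets.US_ASCII);
    }

    public static void main(String[] args) {
        String s = "\u4e2d"; // one CJK character, char value 0x4E2D
        System.out.println((int) s.charAt(0));           // 20013, still one char in memory
        System.out.println(Arrays.toString(asAscii(s))); // [63], the code of '?'
        // A Unicode encoding keeps both bytes of the char.
        System.out.println(Arrays.toString(s.getBytes(StandardCharsets.UTF_16BE))); // [78, 45]
    }
}
```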
 
Joel McNary
Bartender
Posts: 1840
alfred:

Welcome to JavaRanch! Please take a moment to read the JavaRanch naming policy and then please change your display name to comply. (We are looking for first and last names that are not obviously fictitious).

Thanks!
 
M Beck
Ranch Hand
Posts: 323
Quick question for Java language lawyers and/or implementation specialists:

What does Java do with Unicode code points that won't fit into 16 bits? Is Java's "Unicode" really UTF-16, or what?
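Since Java 5 (JSR 204), a `char` is a UTF-16 code unit, and a code point above U+FFFF is stored as two chars (a surrogate pair). A small sketch, assuming Java 5 or later; U+1F600 is just an arbitrary example of a supplementary code point, and the class and method names are illustrative.

```java
// Hypothetical example class: a code point above U+FFFF needs two chars.
class SupplementaryCharacters {

    // Build a String from a single code point.
    static String fromCodePoint(int codePoint) {
        return new String(Character.toChars(codePoint));
    }

    public static void main(String[] args) {
        String s = fromCodePoint(0x1F600);
        System.out.println(s.length());                             // 2 chars
        System.out.println(s.codePointCount(0, s.length()));        // 1 code point
        System.out.println(Character.isHighSurrogate(s.charAt(0))); // true
        System.out.println(Character.isLowSurrogate(s.charAt(1)));  // true
        System.out.println(Integer.toHexString(s.codePointAt(0)));  // 1f600
    }
}
```

So `String.length()` counts chars (UTF-16 code units), while `codePointCount` counts the characters a reader would actually see.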
 
Steve Morrow
Ranch Hand
Posts: 657
 
alfred jones
Ranch Hand
Posts: 279
I am really very confused.

Suppose my friend gives me a string and asks me how many bytes it will take.

What should my answer be?
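One way to see that "how many bytes?" has no single answer until an encoding is chosen: a sketch that encodes the same five-character string (h, e-acute, l, l, o, with the e-acute written as the escape `\u00e9`) in several standard charsets. The string and the class name are arbitrary choices for illustration.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical example class: one string, several encodings, several byte counts.
class BytesDependOnEncoding {

    static int encodedLength(String s, Charset cs) {
        return s.getBytes(cs).length;
    }

    public static void main(String[] args) {
        String s = "h\u00e9llo"; // 5 chars; the second one (e-acute) is not ASCII
        Charset[] charsets = {
            StandardCharsets.US_ASCII,   // 1 byte per char; e-acute becomes '?'
            StandardCharsets.ISO_8859_1, // 1 byte per char
            StandardCharsets.UTF_8,      // 1 byte for ASCII chars, 2 for e-acute
            StandardCharsets.UTF_16BE,   // 2 bytes per char
            StandardCharsets.UTF_16      // 2 bytes per char plus a 2-byte byte-order mark
        };
        for (Charset cs : charsets) {
            System.out.printf("%-10s %2d bytes%n", cs.name(), encodedLength(s, cs));
        }
        System.out.println("chars in the String: " + s.length());
    }
}
```

The counts come out as 5, 5, 6, 10 and 12 respectively, while the String itself always holds 5 chars.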



Side note:
-----------
Do you think my logic was wrong? It was simple mathematics.

Or are you telling me that what the getBytes() method returns is basically wrong, because it hides the actual result, which happens to be 12? That there is no absolute method by which you can calculate the number of bytes, and the actual number of bytes truly is 12? The link you gave was more complex than my question.

Can anybody tell me what's going on?
 
Jeremy French
Greenhorn
Posts: 13
The number of bytes will always be 2 * the number of characters.

The relationship between ASCII and Unicode is that Unicode creates a much higher number of possible characters. However, I believe the original numbers still carry over, so what was character #126 in ASCII is now character #000126. (That's not technically correct as written, but if you apply the principle to binary notation rather than decimal notation, then my statement becomes true: it just adds zeroes to the front to use up 2 bytes.)
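This point can be checked directly. In the sketch below (class and method names are illustrative), the code of `~` is 126 both in ASCII and in Unicode (U+007E), and a 16-bit encoding such as UTF-16BE stores it with a zero high-order byte. The same holds for all 128 ASCII codes, which occupy U+0000 to U+007F.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Hypothetical example class: ASCII codes keep their values in Unicode.
class AsciiInsideUnicode {

    static byte[] utf16be(String s) {
        return s.getBytes(StandardCharsets.UTF_16BE);
    }

    public static void main(String[] args) {
        char tilde = '~';
        System.out.println((int) tilde);                   // 126
        System.out.println(Integer.toHexString(tilde));    // 7e
        // UTF-16BE: the same value padded to 16 bits, high byte first.
        System.out.println(Arrays.toString(utf16be("~"))); // [0, 126]
        // US-ASCII: one byte with the same value.
        System.out.println(Arrays.toString("~".getBytes(StandardCharsets.US_ASCII))); // [126]
    }
}
```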
 
Jeremy French
Greenhorn
Posts: 13
As to your logic, you simply incorrectly assumed the function of getBytes. It doesn't return the number of bytes the string is using, rather it converts the string to something else entirely, which confusingly uses fewer bytes.
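A sketch of the distinction, using the string "123456" from earlier in the thread (the class and method names are made up). `length()` counts chars, `length() * Character.BYTES` is the size of those char values, and `getBytes()` with no argument encodes with `Charset.defaultCharset()`, so its result depends on that charset. On an ASCII-compatible default charset such as UTF-8, it gives 6 for this string.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Hypothetical example class: counting chars versus counting encoded bytes.
class CountingTheString {

    // Bytes occupied by the char values themselves (2 per char).
    static int charBytes(String s) {
        return s.length() * Character.BYTES;
    }

    public static void main(String[] args) {
        String s = "123456";
        byte[] implicit = s.getBytes();                         // platform default charset
        byte[] explicit = s.getBytes(Charset.defaultCharset()); // same charset, named explicitly

        System.out.println("default charset: " + Charset.defaultCharset());
        System.out.println("same bytes? " + Arrays.equals(implicit, explicit)); // true
        System.out.println("chars:           " + s.length());      // 6
        System.out.println("bytes in chars:  " + charBytes(s));    // 12
        System.out.println("getBytes():      " + implicit.length); // 6 on ASCII-compatible defaults
        System.out.println("UTF-16BE bytes:  " + s.getBytes(StandardCharsets.UTF_16BE).length); // 12
    }
}
```

Both 6 and 12 are correct answers to different questions. One caveat for newer JVMs: `charBytes` counts the char values, not necessarily the String's internal storage, since Java 9 may store Latin-1-only strings at one byte per char (compact strings).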
 
alfred jones
Ranch Hand
Posts: 279

As to your logic, you simply incorrectly assumed the function of getBytes. It doesn't return the number of bytes the string is using, rather it converts the string to something else entirely, which confusingly uses fewer bytes.


That's a nice answer.






>The number of bytes will always be 2 * the number of characters.

>The relationship between ASCII and Unicode is that Unicode creates a much higher number of possible characters.

Yes, some odd-looking chars (Japanese, Arabic?).



>However, the original numbers still carry over I believe, as such, what was character # 126 in ASCII is now character # 000126. (That's not technically correct, but if you apply the principle to binary notation rather than decimal notation, then my statement becomes true. It just adds zeroes to the front to use up 2 bytes.)


OK, so tell me about this example, the string "123456".

Take out the first char, i.e. "1". What do you call it: an ASCII char or a Unicode char? I assume you will call it a Unicode char, padded with imaginary leading zeros. Right?


Now here is the crucial point: if you say "1" is an ASCII char, then you will get 6, because an ASCII char takes 1 byte. But if you say "1" is a Unicode char with your imaginary leading zeros (and also because this is the Java language, and Java chars are Unicode), then the string will take 2 * 6 = 12 bytes.


So which should I consider the char "1" to be: a Unicode char or an ASCII char?

The whole thing depends on deciding its status.




I assume you will call it a Unicode char, so the string actually takes 12 bytes. But we cannot show that programmatically, because we don't have a method for it.



But there is a getBytes() method in the docs, and if I use it, it treats "1" as an ASCII char and gives me a result accordingly.


Am I right?
 
Jeremy French
Greenhorn
Posts: 13
No.

getBytes() does not return the number of bytes in a string. It sounds like it does. It doesn't. getBytes() returns an array of bytes (not characters), produced by encoding the string in a charset (the platform default, if you don't name one), which may or may not coincide with the byte the character would have if it were represented in ASCII.

All characters in Java are 2 bytes. Absolutely. All the time. Ignore getBytes(). It's just confusing the issue for you. All characters in Java are 2 bytes.
 