
Byte code rep. of chars in string not 16-bit unicode

 
Marco Loskamp
Greenhorn
Posts: 1
Hi,

The char type in Java is a 16-bit (Unicode) type. Many Java references (for example SCJP&Dev 2, p. 352, bottom line) state that Java Strings are composed of 16-bit Unicode characters.

Fine so far.

But the compiled StringQuestion.class file stores the string literal "Hello" (without the quotes) as 8-bit characters. Why not as 16-bit characters, i.e. with an ASCII NUL (^0) before each letter, like ^0H^0e^0l^0l^0o?

To make sure this is not just an artifact of how vi displays the file, I ran "xxd" on the class file to get a hexdump, whose output contains the following line:



Even changing the explicit string literal "Hello" to "H\u0065llo" doesn't make a difference.

Thanks for your answers,
Marco



Note: this code does compile, but you can't run it, of course...
[ September 26, 2004: Message edited by: Marco Loskamp ]
 
Ernest Friedman-Hill
author and iconoclast
Sheriff
Posts: 24217
Hi,

Welcome to JavaRanch!

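The class file format (which is quite well documented) stores string constants in a slightly modified UTF-8 encoding, not as raw 16-bit values. UTF-8 is identical to ASCII for the printable ASCII characters, and uses two or three bytes to encode every other char. The 16-bit claim describes the char values a String holds at run time; the class file is only a storage format. Saves a lot of space in the U.S.A., anyway.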
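
If you want to see the difference without opening a class file in a hex editor, here is a minimal sketch. It relies on DataOutputStream.writeUTF, which writes the same modified UTF-8 encoding (preceded by a 2-byte length) that the class file uses for its string constants. The class name StringEncodingDemo and the helpers modifiedUtf8 and hex are made up for this illustration.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

public class StringEncodingDemo {

    // Hypothetical helper: encode a String as modified UTF-8 and
    // drop the 2-byte length prefix that writeUTF puts in front.
    static byte[] modifiedUtf8(String s) {
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(buf)) {
            out.writeUTF(s);
        } catch (IOException e) {
            // Not expected for an in-memory stream.
            throw new UncheckedIOException(e);
        }
        byte[] all = buf.toByteArray();
        return Arrays.copyOfRange(all, 2, all.length);
    }

    // Hypothetical helper: render bytes as lowercase hex, two digits per byte.
    static String hex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            sb.append(String.format("%02x", b & 0xff));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // One byte per ASCII character, as seen in the class file.
        System.out.println(hex(modifiedUtf8("Hello")));
        // The Unicode escape is resolved by the compiler, so the bytes are identical.
        System.out.println(hex(modifiedUtf8("H\u0065llo")));
        // What a raw 16-bit layout would look like: a zero byte before each letter.
        System.out.println(hex("Hello".getBytes(StandardCharsets.UTF_16BE)));
        // A non-ASCII char (e with acute accent) takes two bytes.
        System.out.println(hex(modifiedUtf8("\u00e9")));
    }
}
```

The first two lines of output should match each other, which is why changing the literal to "H\u0065llo" made no difference in the class file.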