
UTF vs Unicode (JQ+)

 
Mariusz Szurnacki
Ranch Hand
Posts: 44
Question ID: 988397479953
Which of the following encoding schemes is used by the jvm internally for storing identifiers etc.?
- Unicode
- UTF8
- ASCII
- 8859_1
- It depends on the platform.
I answered Unicode, but that's the wrong answer: according to JQ+ it should be UTF8. But I think I was right, because
inside the JVM, text is represented as 16-bit Unicode, and UTF is used for I/O.
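For reference, here is a minimal sketch of the "16-bit Unicode inside" part. It assumes a modern JDK (`Character.SIZE` only arrived in Java 5, so this is not 2001-era code), and the class `CharWidth` and helper `codeUnit` are just illustrative names:

```java
public class CharWidth {
    // Returns the numeric value of a char. A char is an unsigned 16-bit
    // UTF-16 code unit, so it widens to int without any decoding step.
    static int codeUnit(char c) {
        return c;
    }

    public static void main(String[] args) {
        System.out.println(Character.SIZE);            // 16 (bits per char)
        System.out.println(codeUnit('A'));             // 65
        System.out.println(codeUnit('\u00e9'));        // 233 (e with acute accent)
        System.out.println((int) Character.MAX_VALUE); // 65535
    }
}
```

Every char has the same fixed width, whether it holds an ASCII letter or an accented one.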
Could you put me right?
Have a nice day,
Mariusz
 
Fei Ng
Ranch Hand
Posts: 1245
For efficiency reasons they use UTF8, since Unicode (at 2 bytes per character) is not
particularly space efficient. But the VM does translate them back from UTF8 to Unicode efficiently.
Correct me if I am wrong.
 
Marcus Green
arch rival
Rancher
Posts: 2813
Characters stored as Unicode (Java chars) always occupy 2 bytes. UTF8 is a way of storing Unicode text that keeps ASCII text compact. If a character is within the ASCII range it occupies 1 byte; if it is outside the 1-byte range of ASCII, it is encoded as a multi-byte sequence and occupies 2 or 3 bytes.
As much of the world's text falls within the ASCII range, UTF8 offers considerable space savings while still being able to represent the huge Unicode character set.
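To put numbers on this, here is a small sketch comparing byte counts for a few strings. It assumes Java 7+ for `StandardCharsets`, uses the JDK's standard UTF-8 charset (not the JVM's internal variant), and uses UTF-16BE without a byte-order mark to stand in for "2 bytes per char". `EncodedSize` and its helpers are illustrative names:

```java
import java.nio.charset.StandardCharsets;

public class EncodedSize {
    // Bytes needed for the string in standard UTF-8.
    static int utf8Length(String s) {
        return s.getBytes(StandardCharsets.UTF_8).length;
    }

    // Bytes needed in big-endian UTF-16 with no byte-order mark:
    // always 2 bytes per char, which is how a Java char is sized.
    static int utf16Length(String s) {
        return s.getBytes(StandardCharsets.UTF_16BE).length;
    }

    public static void main(String[] args) {
        // Labels are kept ASCII so the console encoding does not matter.
        String[][] samples = {
            {"A", "A"},
            {"e-acute", "\u00e9"},
            {"euro sign", "\u20ac"},
            {"Hello", "Hello"}
        };
        for (String[] sample : samples) {
            System.out.println(sample[0] + ": UTF-8 = " + utf8Length(sample[1])
                    + " bytes, UTF-16 = " + utf16Length(sample[1]) + " bytes");
        }
    }
}
```

This should report 1 vs 2 bytes for "A", 2 vs 2 for e-acute, 3 vs 2 for the euro sign, and 5 vs 10 for "Hello". So UTF8 wins for ASCII-heavy text and can lose for some non-ASCII characters.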
Marcus

------------------

http://www.jchq.net Mock Exams, FAQ,
Tutorial, Links, Book reviews
Java 2 Exam Prep, 2nd Edition by Bill Brogden and Marcus Green
=================================================
JCHQ, Almost as good as JavaRanch
=================================================
 
Mariusz Szurnacki
Ranch Hand
Posts: 44
Hi again!!!
Thanks for your answers. I know what Unicode and UTF are, but I'm still not sure about the right answer to my question: "Which of the following encoding schemes (Unicode or UTF) is used by the jvm internally for storing identifiers etc.?".
According to RHE:
"Java uses two kinds of text representation:
- Unicode for internal representation of characters and strings
- UTF for input and output.
(...)
The outside-the-computer format for Unicode is known as UTF.".
So I think we all agree that Java's char data type uses Unicode encoding (and so does the String class), and that UTF is used for I/O.
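The "UTF for I/O" side can be seen with `DataOutputStream.writeUTF` and `DataInputStream.readUTF`. Here is a minimal sketch (assuming Java 7+; `UtfRoundTrip` and its helpers are illustrative names) that writes a String out, dumps the bytes, and reads it back into ordinary 16-bit chars:

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

public class UtfRoundTrip {
    // Writes s the way DataOutputStream.writeUTF does: a 2-byte
    // big-endian length followed by the (modified) UTF-8 bytes.
    static byte[] encode(String s) {
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(buf)) {
            out.writeUTF(s);
        } catch (IOException e) {
            throw new UncheckedIOException(e); // not expected for an in-memory stream
        }
        return buf.toByteArray();
    }

    // Reads the bytes back into an ordinary String of 16-bit chars.
    static String decode(byte[] bytes) {
        try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(bytes))) {
            return in.readUTF();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Formats bytes as space-separated uppercase hex pairs.
    static String hex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            if (sb.length() > 0) sb.append(' ');
            sb.append(String.format("%02X", b & 0xFF));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String text = "Hi\u00e9";
        byte[] bytes = encode(text);
        System.out.println(hex(bytes));                 // 00 04 48 69 C3 A9
        System.out.println(decode(bytes).equals(text)); // true
    }
}
```

In memory the String holds 3 chars (6 bytes). On the stream it becomes a 2-byte length plus 4 bytes of UTF data.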
But which encoding is used by the jvm internally for storing identifiers?
Have a nice day,
Mariusz
 
Paul Anilprem
Enthuware Software Support
Ranch Hand
Posts: 3762
Please read section "4.4.7 The CONSTANT_Utf8_info Structure" of the JVM spec (http://java.sun.com/docs/books/vmspec/2nd-edition/html/ClassFile.doc.html#7963).
-Paul.
------------------
SCJP2, SCWCD Resources, Free Question A Day, Mock Exam Results and More!
www.jdiscuss.com
Get Certified, Guaranteed!
JQPlus - For SCJP2
JWebPlus - For SCWCD
JDevPlus - For SCJD
 
Paul Anilprem
Enthuware Software Support
Ranch Hand
Posts: 3762
The method names, field names etc. are all represented using this CONSTANT_Utf8_info structure.
-Paul.
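Section 4.4.7 lays out a CONSTANT_Utf8_info entry as a 1-byte tag (value 1), a 2-byte length, and then `length` bytes of modified UTF-8. Below is a minimal sketch of building and reading one such entry, assuming you already have its bytes in hand. A real class-file parser would walk the whole constant pool, and `Utf8InfoReader`, `build` and `parse` are hypothetical names. The sketch relies on `DataInput.readUTF` expecting the same 2-byte length plus modified UTF-8 layout:

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

public class Utf8InfoReader {
    static final int CONSTANT_UTF8 = 1; // tag value for CONSTANT_Utf8_info

    // Builds a single CONSTANT_Utf8_info entry by hand (demonstration only).
    static byte[] build(String name) {
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(buf)) {
            out.writeByte(CONSTANT_UTF8); // u1 tag
            out.writeUTF(name);           // u2 length + modified UTF-8 bytes
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return buf.toByteArray();
    }

    // Parses one entry: checks the tag, then decodes length + bytes.
    static String parse(byte[] entry) {
        try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(entry))) {
            int tag = in.readUnsignedByte();
            if (tag != CONSTANT_UTF8) {
                throw new IllegalArgumentException("not a CONSTANT_Utf8_info entry, tag = " + tag);
            }
            return in.readUTF();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        byte[] entry = build("main");
        System.out.println(entry.length); // 7: tag (1) + length (2) + "main" (4)
        System.out.println(parse(entry)); // main
    }
}
```

So an identifier like `main` costs 4 bytes of name data in the class file, rather than the 8 it would take as Java chars.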
 
Marcus Green
arch rival
Rancher
Posts: 2813
  • Mark post as helpful
  • send pies
  • Quote
  • Report post to moderator
I have read that explanation in RHE several times and I have concluded that it doesn't tell me much at all. I did a lot of research on the web to find supporting information, without any luck. I have a copy of the excellent Rusty Harold I/O book, and that doesn't throw much light on the topic either.
I have not heard of this topic coming up in the exam, even though the objectives imply that it might.
Mr Ernest?
Marcus
[This message has been edited by Marcus Green (edited November 06, 2001).]
 
Mariusz Szurnacki
Ranch Hand
Posts: 44
Thanks Paul!!!
 
Jose Botella
Ranch Hand
Posts: 2120
But the JVM doesn't use UTF8 itself; it uses a modified version of it, so I don't think the answer UTF8 is right either.
I guess this question doesn't have an exact answer, and besides, it may not be likely to appear in the exam:

I think that an SCJP-to-be should know that the source of a Java program can use Unicode for String and character literals, identifiers and comments. But I don't think that she/he needs to know the exact format in which descriptors, special strings and the contents of Strings are stored within the JVM.
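A short sketch of the part an SCJP candidate does need: identifiers may contain non-ASCII Unicode letters. It uses a `\u` escape so the source file itself stays pure ASCII. `UnicodeIdentifiers` and `isIdentifier` are illustrative names, and the check deliberately ignores keywords:

```java
public class UnicodeIdentifiers {
    // Rough identifier check using the JDK's character rules.
    // Keywords such as "class" are NOT rejected by this sketch.
    static boolean isIdentifier(String s) {
        if (s.isEmpty() || !Character.isJavaIdentifierStart(s.charAt(0))) {
            return false;
        }
        for (int i = 1; i < s.length(); i++) {
            if (!Character.isJavaIdentifierPart(s.charAt(i))) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        // The escape is translated before parsing, so this declares
        // a variable whose name ends in an e with an acute accent.
        int caf\u00e9 = 3;
        System.out.println(caf\u00e9 + 1);              // 4
        System.out.println(isIdentifier("caf\u00e9")); // true
        System.out.println(isIdentifier("1abc"));      // false
    }
}
```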
 
Paul Anilprem
Enthuware Software Support
Ranch Hand
Posts: 3762
10
Well yes, the spec does say that this is a little different from "standard" UTF8, but throughout the spec they still call it UTF8.
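The two main differences are easy to see in a small sketch, assuming Java 7+. It compares the JDK's standard UTF-8 charset with `DataOutputStream.writeUTF`, which, per its documentation, writes modified UTF-8. The differences show up with the null character and with characters outside the 16-bit range (here U+1F600, held in a String as a surrogate pair). `ModifiedUtf8` and its helpers are illustrative names:

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

public class ModifiedUtf8 {
    // Standard UTF-8, as produced by the JDK's UTF-8 charset.
    static byte[] standard(String s) {
        return s.getBytes(StandardCharsets.UTF_8);
    }

    // Modified UTF-8, as produced by DataOutputStream.writeUTF,
    // with its 2-byte length prefix stripped off.
    static byte[] modified(String s) {
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(buf)) {
            out.writeUTF(s);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        byte[] all = buf.toByteArray();
        return Arrays.copyOfRange(all, 2, all.length);
    }

    // Formats bytes as space-separated uppercase hex pairs.
    static String hex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            if (sb.length() > 0) sb.append(' ');
            sb.append(String.format("%02X", b & 0xFF));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String nul = "\0";             // the null character
        String emoji = "\uD83D\uDE00"; // U+1F600 as a surrogate pair
        System.out.println(hex(standard(nul)));   // 00
        System.out.println(hex(modified(nul)));   // C0 80
        System.out.println(hex(standard(emoji))); // F0 9F 98 80
        System.out.println(hex(modified(emoji))); // ED A0 BD ED B8 80
    }
}
```

Modified UTF-8 never contains a zero byte. It also encodes each surrogate separately (3 bytes each) instead of using a single 4-byte sequence, which is why "UTF8" on the exam is only approximately right.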
 