
character encoding

 
Abigail Decan
Ranch Hand
Posts: 65
This is my program.


I wanted to check that the integer 41 is 'A' in UTF-8, although it's ')' in UTF-16,
so I compiled the program with javac -encoding UTF-8 C4.java,
but the result was still ')'.

How do I fix this?

Also, do constants for the maximum values of ASCII and UTF-8 exist in Java?
I need to show how many bits are used to represent characters in both encodings.
 
Paul Clapham
Sheriff
Posts: 22832
Abigail Decan wrote: I wanted to check that the integer 41 is 'A' in UTF-8, although it's ')' in UTF-16


This is an incorrect assumption. Where did you get that idea from?
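To illustrate why the assumption doesn't hold, here is a minimal sketch (the class name SameCodePoint and its hex helper are hypothetical, written only for this example). A character's numeric value is the same whichever encoding is used; the encoding only decides how that value is laid out in bytes. 'A' is 65 (0x41) in both cases: UTF-8 stores it as the single byte 41, while UTF-16 (big-endian, no byte-order mark) stores it as 00 41.

```java
import java.nio.charset.StandardCharsets;

public class SameCodePoint {
    // Hypothetical helper: formats bytes as space-separated hex, e.g. "00 41"
    static String hex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            if (sb.length() > 0) sb.append(' ');
            sb.append(String.format("%02X", b & 0xFF));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String s = "A";
        // The numeric value of 'A' does not depend on any encoding
        System.out.println((int) s.charAt(0));                          // 65
        // Only the byte layout differs between encodings
        System.out.println(hex(s.getBytes(StandardCharsets.UTF_8)));    // 41
        System.out.println(hex(s.getBytes(StandardCharsets.UTF_16BE))); // 00 41
    }
}
```

Note that javac's -encoding flag only tells the compiler how to read the bytes of the .java source file; it does not change the value a cast such as (char) 41 produces at run time.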

Also, do constants for the maximum values of ASCII and UTF-8 exist in Java?
I need to show how many bits are used to represent characters in both encodings.


ASCII is defined to use the numbers from 0 to 127 to represent characters. I guess you could say that 127 exists in Java, but if you were expecting a member of an enum or something similar that contains that value, there isn't one. And UTF-8 can represent any character in Unicode; the number of bytes required to represent a Unicode character in UTF-8 varies depending on which character it is. I get the feeling there are some misconceptions here, so have a look at the Wikipedia article on UTF-8 for better information.
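As a sketch of the ASCII point: since the API has no named "max ASCII" constant, the example below defines its own MAX_ASCII (a hypothetical name, not part of the JDK) and checks it two ways: with a plain comparison and with the US-ASCII CharsetEncoder's canEncode method. It also shows that 127 needs 7 bits, and prints Character.MAX_CODE_POINT, the largest Unicode code point, which UTF-8 is able to encode. The class name AsciiRange is likewise hypothetical.

```java
import java.nio.charset.CharsetEncoder;
import java.nio.charset.StandardCharsets;

public class AsciiRange {
    // Hypothetical constant: the JDK has no named "max ASCII" value, so we define one.
    // ASCII covers 0..127.
    static final int MAX_ASCII = 127;

    static boolean isAscii(char c) {
        return c <= MAX_ASCII;
    }

    public static void main(String[] args) {
        CharsetEncoder ascii = StandardCharsets.US_ASCII.newEncoder();
        // 'A', '~' (the last printable ASCII character) and e-acute (outside ASCII)
        for (char c : new char[] {'A', '~', '\u00E9'}) {
            System.out.printf("U+%04X ascii? %b, encoder says %b%n",
                    (int) c, isAscii(c), ascii.canEncode(c));
        }
        // Bits needed for the largest ASCII value: 127 is 1111111 in binary
        System.out.println(Integer.SIZE - Integer.numberOfLeadingZeros(MAX_ASCII)); // 7
        // Largest Unicode code point; UTF-8 can encode every code point up to this
        System.out.printf("U+%X%n", Character.MAX_CODE_POINT); // U+10FFFF
    }
}
```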
 
Abigail Decan
Ranch Hand
Posts: 65
I now realize that the table I referred to listed the numbers in hex.

For the constant, I was expecting something like MAXIMUM_SYMBOL in the Character class.
 
Paul Clapham
Sheriff
Posts: 22832
Abigail Decan wrote: I now realize that the table I referred to listed the numbers in hex.


Yes, I just realized that a moment ago when I looked at the ASCII table on a web page.
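A small sketch of the hex-versus-decimal mix-up (the class name HexVsDecimal and its fromHexCode helper are hypothetical). The "41" in an ASCII table is hexadecimal, which is 65 in decimal; casting the decimal literal 41 gives ')' instead.

```java
public class HexVsDecimal {
    // Hypothetical helper: converts a hex code from a table (e.g. "41") to a char
    static char fromHexCode(String hex) {
        return (char) Integer.parseInt(hex, 16);
    }

    public static void main(String[] args) {
        char fromDecimal = (char) 41; // decimal 41 is ')'
        char fromHex = (char) 0x41;   // hex 41 (decimal 65) is 'A'
        System.out.println(fromDecimal + " " + fromHex);   // ) A
        System.out.println(Integer.toHexString('A'));      // 41
        System.out.println(Integer.parseInt("41", 16));    // 65
        System.out.println(fromHexCode("41"));             // A
    }
}
```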

For the constant, I was expecting something like MAXIMUM_SYMBOL in the Character class.


There are methods in the standard API that tell you which Unicode block a character is in, so for example you could find out that ω is in the GREEK block in Unicode, but that really has nothing to do with the encoding of Unicode characters into bytes, which is what UTF-8 and other encodings are designed to do.
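To make the distinction concrete, here is a sketch using Character.UnicodeBlock.of from the standard API to look up the block of ω (U+03C9), next to a count of how many bytes UTF-8 uses for the same character. The class name BlockVsEncoding and its utf8Length helper are hypothetical. The block is a property of the character itself; the byte count is a property of the chosen encoding.

```java
import java.nio.charset.StandardCharsets;

public class BlockVsEncoding {
    // Hypothetical helper: number of bytes UTF-8 uses for the given text
    static int utf8Length(String s) {
        return s.getBytes(StandardCharsets.UTF_8).length;
    }

    public static void main(String[] args) {
        char omega = '\u03C9'; // GREEK SMALL LETTER OMEGA
        // Which Unicode block the character belongs to
        System.out.println(Character.UnicodeBlock.of(omega)); // GREEK
        // How many bytes UTF-8 needs to store it
        System.out.println(utf8Length(String.valueOf(omega))); // 2
    }
}
```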
 
Abigail Decan
Ranch Hand
Posts: 65
Thank you.

Is there a method in Java that gives the number of bits required for a data?
I searched on Google myself, and the answer seems to be that there isn't one,
but I may have overlooked something, so I'm asking.
 
Paul Clapham
Sheriff
Posts: 22832
What's "a data"? I don't understand your question, nor why you are asking it.
 
Abigail Decan
Ranch Hand
Posts: 65
By data I mean a variable.

For example, a short uses 16 bits to represent an integer,
and the Short class's SIZE constant gives me this value.

I want to show that UTF-8 uses 8 bits to represent a character,
but I don't know how, because Java uses UTF-16, so Character's SIZE will give me 16.
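The SIZE constants mentioned here are real: Short.SIZE and Character.SIZE are both 16, and Byte.SIZE is 8. As a sketch (the class name JavaCharSize and its charsNeeded helper are hypothetical), note that Character.SIZE is the size of one UTF-16 code unit, not of every character: Character.charCount shows that a code point outside the Basic Multilingual Plane, such as U+1F600, needs two chars (a surrogate pair, 32 bits).

```java
public class JavaCharSize {
    // Hypothetical helper: how many 16-bit char values Java needs for a code point (1 or 2)
    static int charsNeeded(int codePoint) {
        return Character.charCount(codePoint);
    }

    public static void main(String[] args) {
        System.out.println(Short.SIZE);     // 16
        System.out.println(Byte.SIZE);      // 8
        System.out.println(Character.SIZE); // 16: one UTF-16 code unit
        System.out.println(charsNeeded('A'));     // 1
        System.out.println(charsNeeded(0x1F600)); // 2 (surrogate pair, 32 bits)
    }
}
```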
 
Paul Clapham
Sheriff
Posts: 22832
Abigail Decan wrote: I want to show that UTF-8 uses 8 bits to represent a character.


But it doesn't. Didn't you read the Wikipedia article I linked to? It clearly shows examples of UTF-8 encoding characters using 8, 16, 24, or 32 bits.
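Here is a minimal sketch of how the varying width can be shown in Java: encode a single code point to UTF-8 and multiply the byte count by Byte.SIZE. The class name Utf8Bits and the sample code points are chosen only for illustration. A plain ASCII letter takes 8 bits, while other characters take 16, 24, or 32.

```java
import java.nio.charset.StandardCharsets;

public class Utf8Bits {
    // Bits UTF-8 uses for one code point: number of encoded bytes * Byte.SIZE (8)
    static int utf8Bits(int codePoint) {
        String s = new String(Character.toChars(codePoint));
        return s.getBytes(StandardCharsets.UTF_8).length * Byte.SIZE;
    }

    public static void main(String[] args) {
        // LATIN CAPITAL LETTER A, LATIN SMALL LETTER E WITH ACUTE, EURO SIGN, GRINNING FACE
        int[] samples = {0x41, 0xE9, 0x20AC, 0x1F600};
        for (int cp : samples) {
            System.out.printf("U+%04X -> %d bits%n", cp, utf8Bits(cp));
        }
        // Prints 8, 16, 24 and 32 bits respectively
    }
}
```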
 