
Characters cannot be displayed for codes between 0 and 65,535

 
Marius Constantin
Ranch Hand
Posts: 62
Notepad Java Windows
Dear experts,

I can't understand why my JVM doesn't display the characters that have codes from 0 to 65,535. What is causing this?

Screenshots of the output from my JVM are attached.

Thank you very much for all your help and time!

Kind regards,

marius

Here is my code:

[Attachment: c1.PNG, print screen 1]
[Attachment: c2.PNG, print screen 2]
 
Jeff Verdegan
Bartender
Posts: 6109
6
Android IntelliJ IDE Java
1. Not every value between 0 and 65535 is a valid Unicode character.

2. Even those that are valid characters are not valid in every encoding.

3. In general, you need to specify an appropriate encoding for your display app (looks like Windows cmd.exe, in this case) to use.

4. Even if your display tool is using the same encoding as what the character was written in, if the font that's being used to display the character does not have a glyph for that character, you'll get a default, such as a box or a question mark.

5. I haven't even looked at your getRandomChar() method, so there may be problems there.

6. Finally, just randomly generating characters and then displaying them in an arbitrary tool with an unspecified encoding and unspecified font is not a recipe for success. Perhaps if you could explain what you're actually trying to accomplish?
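Points 2 and 3 can be checked from Java itself. What follows is only a minimal sketch: the class name `EncodingCheck` and the helper `canEncode` are made up for illustration, and wrapping `System.out` in a UTF-8 `PrintStream` only helps if the console is also set to UTF-8 (on Windows cmd.exe that would typically mean running `chcp 65001` first) and the console font has a glyph for the character.

```java
import java.io.PrintStream;
import java.io.UnsupportedEncodingException;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

final class EncodingCheck {

    /** Returns true if the given charset is able to encode the character (point 2). */
    static boolean canEncode(Charset charset, char c) {
        return charset.newEncoder().canEncode(c);
    }

    public static void main(String[] args) throws UnsupportedEncodingException {
        char omega = '\u03A9'; // GREEK CAPITAL LETTER OMEGA

        System.out.println(canEncode(StandardCharsets.US_ASCII, omega));   // false
        System.out.println(canEncode(StandardCharsets.ISO_8859_1, omega)); // false
        System.out.println(canEncode(StandardCharsets.UTF_8, omega));      // true

        // Point 3: say explicitly which encoding the bytes sent to the console use.
        // The console must be configured to interpret them the same way.
        PrintStream out = new PrintStream(System.out, true, "UTF-8");
        out.println(omega);
    }
}
```

If the last line still shows a box or a question mark after the console encoding matches, the font is the most likely remaining suspect (point 4).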
 
Campbell Ritchie
Marshal
Posts: 79151
377
Try displaying the output on a JOptionPane. Go through the Character wrapper class and see whether there are any methods that tell you whether a particular char can be printed at all. There probably are.
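As a rough sketch of that idea: `Character` has no single "is printable" method, so the helper `looksPrintable` below is a hypothetical combination of existing `Character` methods. It is a heuristic only; a char that passes can still show up as a box if the font has no glyph for it.

```java
final class PrintableCheck {

    /**
     * Heuristic filter: rejects unassigned code units, control characters,
     * lone surrogates and private-use characters. It says nothing about
     * whether the display font actually has a glyph.
     */
    static boolean looksPrintable(char c) {
        return Character.isDefined(c)
                && !Character.isISOControl(c)
                && !Character.isSurrogate(c)
                && Character.getType(c) != Character.PRIVATE_USE;
    }

    public static void main(String[] args) {
        System.out.println(looksPrintable('A'));           // true
        System.out.println(looksPrintable((char) 0x0007)); // false: BEL control character
        System.out.println(looksPrintable((char) 0xD800)); // false: lone high surrogate
        System.out.println(looksPrintable((char) 0xE000)); // false: private use
        System.out.println(looksPrintable((char) 0xFFFF)); // false: noncharacter
    }
}
```

A random-character generator could call such a filter in a loop and retry until it returns true.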
 
Marius Constantin
Ranch Hand
Posts: 62
Notepad Java Windows

Thank you very much, Jeff and Ritchie!

Jeff, regarding your answers, I have some more questions. I hope you have some more time to spare. Thank you very much for everything. I really appreciate your help.

1. Are the invalid UTF-8 byte values the decimal codes 192, 193, and 245 to 255?

"Red cells must never appear in a valid UTF-8 sequence. The first two (C0 and C1) could only be used for overlong encoding of basic ASCII characters. The remaining red cells indicate start bytes of sequences that could only encode numbers larger than the 0x10FFFF limit of Unicode. The byte 244 (hex 0xF4) could also encode some values greater than 0x10FFFF; such a sequence is also invalid. "

Source: Wikipedia, UTF-8, "Codepage layout"

2. Are codes that are invalid in UTF-8 valid character codes in another encoding?

3. How can I specify an encoding for my display app (cmd.exe and Notepad++ on Windows)?

6. I am just trying to display 175 characters selected randomly.

Thank you very much for all your help!

Kind regards,
marius
 
Paul Clapham
Marshal
Posts: 28176
95
Eclipse IDE Firefox Browser MySQL Database

Marius Constantin wrote:6. I am just trying to display 175 characters selected randomly



The question "Why?" still applies here.

Is there a purpose for this display? Is it important that you be able to choose Cyrillic and Coptic characters (just for example) and have them be displayed? If so then why did you choose to restrict your choices to only characters from the BMP of Unicode?
 
Marius Constantin
Ranch Hand
Posts: 62
Notepad Java Windows


Hi Paul!

Thank you so much for answering! 175 was just an arbitrary number. I simply want to display 175 randomly chosen characters of the UTF-8 character set in Notepad++ or cmd, in any encoding and in any font. This is for study purposes: I am learning how to program in Java. Just for the sake of programming, just for fun.

Is it clearer now?

Please help.

Kind regards,
marius
 
Paul Clapham
Marshal
Posts: 28176
95
Eclipse IDE Firefox Browser MySQL Database
You're still confused, I think. There's no such thing as "the UTF-8 character set". UTF-8 is an "encoding" or a "charset"; it's Unicode that is the character set. Java characters are Unicode characters, which require 16 bits to represent. (Let's ignore the "astral" planes of Unicode, which go beyond 65535, for now.) But often people want to store Strings (which are arrays of characters internally) in arrays of bytes, which as you know are 8 bits. So there has to be an "encoding" process to do that, and there is a very long list of encodings that do it in various ways.

Many of those encodings, like ISO-8859-1 and its relatives, can only represent a subset of Unicode characters, and when they are given a character outside that subset they just encode it as a question mark. But others, like UTF-8, can represent any Unicode character. They do that by encoding each character as one or more bytes, as you will have seen from what you read in Wikipedia.
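To make the difference concrete, here is a small sketch that encodes the same strings with two charsets. The class `EncodeDemo` and its `encode` helper are illustrative names only; `String.getBytes(Charset)` is the standard call doing the work.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

final class EncodeDemo {

    /** Encodes the text with the given charset, replacing unmappable characters. */
    static byte[] encode(String text, Charset charset) {
        return text.getBytes(charset);
    }

    public static void main(String[] args) {
        String omega = "\u03A9"; // GREEK CAPITAL LETTER OMEGA, not part of ISO-8859-1

        // ISO-8859-1 cannot represent it, so it becomes '?' (byte value 63).
        System.out.println(Arrays.toString(encode(omega, StandardCharsets.ISO_8859_1))); // [63]

        // UTF-8 represents it as two bytes, 0xCE 0xA9 (printed as signed bytes).
        System.out.println(Arrays.toString(encode(omega, StandardCharsets.UTF_8))); // [-50, -87]

        // Plain ASCII characters still take a single byte in UTF-8.
        System.out.println(Arrays.toString(encode("A", StandardCharsets.UTF_8))); // [65]
    }
}
```

Note that the question mark in the first case is written into the bytes themselves, so the original character cannot be recovered when the bytes are decoded again.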
 
Campbell Ritchie
Marshal
Posts: 79151
377
Does Java™ directly use UTF-8 at all? I thought it used UTF-16 whenever there are chars greater than 0xffff.
 
Jeff Verdegan
Bartender
Posts: 6109
6
Android IntelliJ IDE Java

Campbell Ritchie wrote:Does Java™ directly use UTF-8 at all? I thought it used UTF-16 whenever there are chars greater than 0xffff.



The char type in the Java language is always UTF-16 (and hence so are Character and String). Classes like Readers and Writers, which deal with converting back and forth between Java chars and bytes in a particular encoding, can be told which encoding to use.
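A minimal sketch of telling a Writer and a Reader which encoding to use, working on in-memory byte arrays rather than files. `WriterDemo`, `write` and `read` are names invented for this example; the constructors taking a `Charset` are part of `java.io`.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.OutputStreamWriter;
import java.io.Reader;
import java.io.UncheckedIOException;
import java.io.Writer;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

final class WriterDemo {

    /** Converts chars to bytes through a Writer that uses the given charset. */
    static byte[] write(String text, Charset charset) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (Writer writer = new OutputStreamWriter(bytes, charset)) {
            writer.write(text);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytes.toByteArray();
    }

    /** Converts bytes back to chars through a Reader that uses the given charset. */
    static String read(byte[] data, Charset charset) {
        StringBuilder result = new StringBuilder();
        try (Reader reader = new InputStreamReader(new ByteArrayInputStream(data), charset)) {
            int c;
            while ((c = reader.read()) != -1) {
                result.append((char) c);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return result.toString();
    }

    public static void main(String[] args) {
        String text = "h\u00E9"; // 'h' followed by e-acute: two Java chars
        byte[] utf8 = write(text, StandardCharsets.UTF_8);

        System.out.println(utf8.length);                                    // 3
        System.out.println(write(text, StandardCharsets.UTF_16BE).length);  // 4
        System.out.println(read(utf8, StandardCharsets.UTF_8).equals(text)); // true

        // Decoding with the wrong charset gives three chars instead of two.
        System.out.println(read(utf8, StandardCharsets.ISO_8859_1).length()); // 3
    }
}
```

The last line shows the usual source of garbage output: the bytes are fine, but the reading side assumes a different encoding than the writing side.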
 
Marius Constantin
Ranch Hand
Posts: 62
Notepad Java Windows

Campbell Ritchie wrote:Does Java™ directly use UTF-8 at all? I thought it used UTF-16 whenever there are chars greater than 0xffff.



Could you give me an example of such a character?

Thank you!
 
Campbell Ritchie
Marshal
Posts: 79151
377
As JV has said, Java™ uses UTF-16 throughout. I suggest you start with the Unicode FAQ.
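As one example of a character above 0xFFFF, here is a sketch using U+1F600 (GRINNING FACE), chosen arbitrarily. The class `SupplementaryDemo` and the helper `fromCodePoint` are illustrative names; the point is that a single code point occupies two `char` values (a surrogate pair) in a Java String.

```java
final class SupplementaryDemo {

    /** Builds a String holding the single Unicode code point given. */
    static String fromCodePoint(int codePoint) {
        return new String(Character.toChars(codePoint));
    }

    public static void main(String[] args) {
        String s = fromCodePoint(0x1F600); // beyond the 16-bit range

        System.out.println(s.length());                      // 2 (UTF-16 code units)
        System.out.println(s.codePointCount(0, s.length())); // 1 (actual character)
        System.out.println(Integer.toHexString(s.charAt(0))); // d83d (high surrogate)
        System.out.println(Integer.toHexString(s.charAt(1))); // de00 (low surrogate)
        System.out.println(Integer.toHexString(s.codePointAt(0))); // 1f600
    }
}
```

This is also why picking a random value between 0 and 65,535 can land on half of such a pair, which is not a character on its own.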
 