
Is String represented using UTF-16?

Hello there,

I read in Java blogs that String objects are represented in UTF-16 format.
Can we prove it with a piece of code?
I mean, is there a program that can show us that a String is represented in UTF-16?

Thanks in advance,
Arfeen.

arfeen khan wrote: I read in Java blogs that String objects are represented in UTF-16 format.


Internally, yes, generally (see below). It also uses "surrogate pairs" for characters outside the Basic Multilingual Plane (code points above U+FFFF); those are part of the UTF-16 standard itself, and they mean that one character can take up two chars. Strings also do not contain BOMs (Byte Order Marks), since Java's internal byte order is always the same.
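
To make the surrogate-pair point concrete, here is a minimal sketch (the class name SurrogateDemo is just for illustration). It builds U+1F600, a character outside the Basic Multilingual Plane, and shows that it occupies two chars, a high and a low surrogate, while still counting as a single code point:

```java
public class SurrogateDemo {
    public static void main(String[] args) {
        // U+1F600 lies outside the Basic Multilingual Plane, so UTF-16
        // needs two code units (a surrogate pair) to represent it.
        String s = new String(Character.toChars(0x1F600));

        // length() counts UTF-16 code units; codePointCount() counts characters.
        System.out.println("length()         = " + s.length());                      // 2
        System.out.println("codePointCount() = " + s.codePointCount(0, s.length())); // 1

        char high = s.charAt(0);
        char low = s.charAt(1);
        System.out.printf("char 0 = 0x%04X, high surrogate? %b%n", (int) high, Character.isHighSurrogate(high));
        System.out.printf("char 1 = 0x%04X, low surrogate?  %b%n", (int) low, Character.isLowSurrogate(low));

        // Recombining the pair gives back the original code point (prints U+1F600).
        System.out.printf("recombined = U+%X%n", Character.toCodePoint(high, low));
    }
}
```

Expect 0xD83D and 0xDE00 for the two chars, both reported as surrogates.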

Can we prove it with a piece of code? I mean, is there a program that can show us that a String is represented in UTF-16?


Sure. Bang some text, especially text containing some esoteric characters, into a String, and print out the numeric value of each char.
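
A rough sketch of that experiment follows; the class name CodeUnitDump and the helper hexCodeUnits are illustrative, not part of any library. The text is written with Unicode escapes so the result does not depend on the encoding of the source file. Note that this shows the UTF-16 view the String API exposes, not necessarily how the bytes are laid out in memory.

```java
public class CodeUnitDump {
    // Formats each UTF-16 code unit (char) of s as 0xXXXX, separated by spaces.
    static String hexCodeUnits(String s) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < s.length(); i++) {
            if (i > 0) {
                sb.append(' ');
            }
            sb.append(String.format("0x%04X", (int) s.charAt(i)));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // 'A' (U+0041), e-acute (U+00E9), euro sign (U+20AC) and the
        // musical G clef (U+1D11E), which needs a surrogate pair.
        String text = "A\u00E9\u20AC\uD834\uDD1E";
        System.out.println(hexCodeUnits(text));
        // prints: 0x0041 0x00E9 0x20AC 0xD834 0xDD1E
    }
}
```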

However, my question would be: why would you want to? The JLS clearly states that char "values are 16-bit unsigned integers representing UTF-16 code units". And since Strings are (generally) made up of chars, it stands to reason that Strings are made up of UTF-16 code units.
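
As a small cross-check of that reasoning, the sketch below (the class Utf16Check and its method are hypothetical names) confirms that a char is 16 bits wide and that, for well-formed text, each char of a String equals the corresponding 16-bit unit produced by encoding it with UTF-16BE. Like any code using the public API, it checks what the API reports, not the JVM's internal storage; a String with an unpaired surrogate would fail the check, because getBytes replaces malformed input.

```java
import java.nio.charset.StandardCharsets;

public class Utf16Check {
    // True if every char of s equals the matching big-endian 16-bit unit
    // obtained by encoding s with UTF-16BE.
    static boolean charsMatchUtf16BE(String s) {
        byte[] bytes = s.getBytes(StandardCharsets.UTF_16BE);
        if (bytes.length != 2 * s.length()) {
            return false;
        }
        for (int i = 0; i < s.length(); i++) {
            // Reassemble one 16-bit unit from two bytes (high byte first).
            int unit = ((bytes[2 * i] & 0xFF) << 8) | (bytes[2 * i + 1] & 0xFF);
            if (unit != s.charAt(i)) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        System.out.println("Character.SIZE = " + Character.SIZE);                 // 16
        System.out.println("Character.MAX_VALUE = " + (int) Character.MAX_VALUE); // 65535
        System.out.println(charsMatchUtf16BE("A\u00E9\u20AC\uD834\uDD1E"));       // true
    }
}
```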

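I say "generally" because Strings can also use bytes internally to save space. In HotSpot since Java 9 (JEP 254, "Compact Strings"), a String whose characters all fit in Latin-1 is stored as one byte per character, and UTF-16 is used otherwise; the feature is on by default and can be disabled with -XX:-CompactStrings. Either way, the public API still presents a String as a sequence of UTF-16 chars.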

Winston

arfeen khan wrote:
I mean, is there a program that can show us that a String is represented in UTF-16?



Even if Java internally represented Strings some other way, say in UTF-8 or UTF-32, you could not tell or prove it: the API does not give access to the internal representation. You can of course check the source of String, but one could imagine a different implementation of the same API.
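
To illustrate the point, here is a sketch (EncodingViews and roundTrip are illustrative names) of what the API actually offers: counts of UTF-16 code units and code points, plus explicit conversions to and from byte encodings. None of these calls reveals how the String is stored; decoding from any supported encoding simply yields an equal String.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class EncodingViews {
    // Encodes s with the given charset and decodes the bytes back into a String.
    static String roundTrip(String s, Charset cs) {
        return new String(s.getBytes(cs), cs);
    }

    public static void main(String[] args) {
        // "hello" with an e-acute, a space, and the G clef (U+1D11E).
        String s = "h\u00E9llo \uD834\uDD1E";

        // The String API exposes UTF-16 code units and Unicode code points...
        System.out.println("chars:          " + s.length());                      // 8
        System.out.println("code points:    " + s.codePointCount(0, s.length())); // 7

        // ...while byte encodings appear only when you ask for a conversion.
        System.out.println("UTF-8 bytes:    " + s.getBytes(StandardCharsets.UTF_8).length);    // 11
        System.out.println("UTF-16BE bytes: " + s.getBytes(StandardCharsets.UTF_16BE).length); // 16

        // Whatever the internal storage, decoding gives back an equal String.
        System.out.println(roundTrip(s, StandardCharsets.UTF_8).equals(s));    // true
        System.out.println(roundTrip(s, StandardCharsets.UTF_16BE).equals(s)); // true
    }
}
```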
 
arfeen khan
Thank you, Winston Gutkowski, for your reply.

Thank you, Ivan Jozsef Balazs, for the suggestion.