
UTF

 
Ranch Hand
Posts: 56
Java uses a system called UTF for I/O to support international character sets.
What does UTF mean? Could anyone give me a clear idea?
 
Ranch Hand
Posts: 5040

You are right about Java using UTF.
UTF stands for UCS Transformation Format, where UCS is the Universal Coded Character Set defined by ISO/IEC 10646 (the same character repertoire as Unicode).
If you look at the API documentation for the DataInputStream
class, this is part of what it says:


Data input streams and data output streams represent Unicode strings in a format that is a slight modification of UTF-8. (For more information, see X/Open Company Ltd., "File System Safe UCS Transformation Format (FSS_UTF)", X/Open Preliminary Specification, Document Number: P316. This information also appears in ISO/IEC 10646, Annex P.)


So if you are really interested, you might want to dig into the
document mentioned above.
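To see the "slight modification" concretely, here is a minimal sketch (the class and helper names are my own, purely illustrative; assumes Java 8 or later) that writes strings with DataOutputStream.writeUTF, reads them back with DataInputStream.readUTF, and prints the encoded bytes next to standard UTF-8 from String.getBytes:

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

class ModifiedUtf8Demo {

    // Writes s with DataOutputStream.writeUTF and returns the raw bytes,
    // including the 2-byte big-endian length prefix that writeUTF emits first.
    static byte[] writeUtf(String s) {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(buffer)) {
            out.writeUTF(s);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return buffer.toByteArray();
    }

    // Reads the string back with DataInputStream.readUTF.
    static String readUtf(byte[] bytes) {
        try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(bytes))) {
            return in.readUTF();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Drops the 2-byte length prefix so only the encoded characters remain.
    static byte[] payload(byte[] withPrefix) {
        return Arrays.copyOfRange(withPrefix, 2, withPrefix.length);
    }

    // Formats bytes as space-separated uppercase hex, e.g. "C0 80".
    static String hex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            if (sb.length() > 0) sb.append(' ');
            sb.append(String.format("%02X", b & 0xFF));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // "A" encodes the same in both formats; U+0000 (null) and U+1F600
        // (stored in a Java String as the surrogate pair U+D83D U+DE00)
        // are where modified UTF-8 and standard UTF-8 differ.
        String[] samples = {"A", "\u0000", "\uD83D\uDE00"};
        for (String s : samples) {
            byte[] modified = payload(writeUtf(s));
            byte[] standard = s.getBytes(StandardCharsets.UTF_8);
            System.out.println("modified: " + hex(modified) + "  | standard: " + hex(standard));
            // readUTF must give back exactly what writeUTF was given.
            if (!readUtf(writeUtf(s)).equals(s)) throw new AssertionError("round trip failed");
        }
    }
}
```

writeUTF puts a 2-byte length in front of the characters. The null character comes out as C0 80 instead of a single 00 byte, and a character outside the Basic Multilingual Plane takes six bytes (three per surrogate) instead of the four that standard UTF-8 uses.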
For the purposes of Java, though, it is necessary (if perhaps not sufficient) to understand that strings are represented
as Unicode and that, in this modified UTF-8 format:
All characters in the range '\u0001' to '\u007F' are represented by a single byte.
The null character '\u0000' and characters in the range '\u0080' to '\u07FF' are represented by a pair of bytes.
Characters in the range '\u0800' to '\uFFFF' are represented by three bytes.
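The three rules above map directly onto bit patterns. A rough hand-written encoder (ModifiedUtf8Encoder is a hypothetical class, for illustration only) can apply them one char at a time and check the result against what writeUTF actually produces:

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.Arrays;

class ModifiedUtf8Encoder {

    // Applies the three byte-length rules to each Java char (UTF-16 code unit).
    // Illustrative only: no length prefix and no 65535-byte limit check.
    static byte[] encode(String s) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            if (c >= 0x0001 && c <= 0x007F) {
                // 1 byte: 0xxxxxxx
                out.write(c);
            } else if (c <= 0x07FF) {
                // 2 bytes, also used for U+0000: 110xxxxx 10xxxxxx
                out.write(0xC0 | (c >> 6));
                out.write(0x80 | (c & 0x3F));
            } else {
                // 3 bytes: 1110xxxx 10xxxxxx 10xxxxxx
                out.write(0xE0 | (c >> 12));
                out.write(0x80 | ((c >> 6) & 0x3F));
                out.write(0x80 | (c & 0x3F));
            }
        }
        return out.toByteArray();
    }

    // Reference bytes from DataOutputStream.writeUTF, minus its 2-byte length prefix.
    static byte[] viaWriteUtf(String s) {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try (DataOutputStream data = new DataOutputStream(buffer)) {
            data.writeUTF(s);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        byte[] all = buffer.toByteArray();
        return Arrays.copyOfRange(all, 2, all.length);
    }

    public static void main(String[] args) {
        // 'A' (1 byte), null (2 bytes), e-acute U+00E9 (2 bytes), euro sign U+20AC (3 bytes)
        String s = "A\u0000\u00e9\u20ac";
        byte[] mine = encode(s);
        System.out.println(mine.length + " bytes, matches writeUTF: "
                + Arrays.equals(mine, viaWriteUtf(s)));
    }
}
```

Running main prints "8 bytes, matches writeUTF: true". Because the encoder works per char, a supplementary character comes out as two 3-byte surrogates, just as it does with writeUTF.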
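Only 7-bit ASCII characters, which cover plain English text, take a single byte.
Accented Latin letters and scripts such as Greek and Cyrillic fall in the two-byte range,
and most Asian scripts (Chinese, Japanese, Korean) need three bytes per character.
(Single-byte 8-bit encodings such as ISO-8859-1 are a separate, non-UTF scheme.)
This variable-length encoding is part of how Java supports I18N (internationalization) in its I/O.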
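A quick way to check these per-script byte counts is to measure a few sample words. In this sketch the ScriptByteCounts class and the sample words are my own illustrative choices; it counts only the bytes writeUTF spends on the characters, leaving out the 2-byte length prefix:

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

class ScriptByteCounts {

    // Bytes written by writeUTF for the characters alone
    // (DataOutputStream.size() minus the 2-byte length prefix).
    static int modifiedUtf8Length(String s) {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(buffer)) {
            out.writeUTF(s);
            return out.size() - 2;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        // Non-ASCII text is written as escapes so the source file encoding doesn't matter.
        String[][] samples = {
            {"English", "hello"},
            {"French", "caf\u00e9"},
            {"Russian", "\u041f\u0440\u0438\u0432\u0435\u0442"},
            {"Japanese", "\u65e5\u672c\u8a9e"},
        };
        for (String[] sample : samples) {
            String text = sample[1];
            System.out.printf("%-8s %d chars -> %d bytes%n",
                    sample[0], text.length(), modifiedUtf8Length(text));
        }
    }
}
```

The English word costs one byte per char (5), "café" costs 5 bytes for 4 chars, the Russian word 12 bytes for 6 chars, and the Japanese word 9 bytes for 3 chars.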

regds.
- satya
 