What is the difference between the Java SE 9 String methods chars() and codePoints()?

 
Ranch Hand
Posts: 170
1
Oracle Java
Hello everyone,

As I was reading about the String methods introduced in Java SE 9, I came across these two methods in the official documentation:

public IntStream chars()
Returns:
an IntStream of char values from this sequence


public IntStream codePoints()
Returns:
an IntStream of Unicode code points from this sequence



I tried out this example and both methods returned exactly the same result. I cannot really seem to understand the point of these...




thanks
 
Saloon Keeper
Posts: 12806
278
A string in Java can be seen as an array of UTF-16 encoded "code units". A code unit is a 16-bit value that represents an abstract character, or part of one. All ASCII characters can be represented by exactly one UTF-16 code unit. When you call chars(), you get a stream of "code units".

A "code unit" is not the same thing as a "code point". A Unicode code point is a number that indexes an abstract character. Some characters have a higher index than you can represent with 2 bytes, such as some of the rarer East Asian ideographs and many emojis. For those, you need two "code units" (a surrogate pair) per "code point". When you call codePoints(), each such pair is combined into a single value.
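
To make the distinction concrete, here is a minimal sketch that counts code units and code points for two sample strings. The class name, helper methods, and sample strings are only illustrative; the emoji is written with Unicode escapes so the source does not depend on file encoding.

```java
public class CodeUnitsVsCodePoints {

    // Number of UTF-16 code units, which is what String.length() reports.
    static int codeUnits(String s) {
        return s.length();
    }

    // Number of Unicode code points in the whole string.
    static int codePoints(String s) {
        return s.codePointCount(0, s.length());
    }

    public static void main(String[] args) {
        String ascii = "A";
        String emoji = "\uD83D\uDE00"; // U+1F600, stored as a surrogate pair

        System.out.println(codeUnits(ascii) + " / " + codePoints(ascii)); // 1 / 1
        System.out.println(codeUnits(emoji) + " / " + codePoints(emoji)); // 2 / 1

        // How many code units a given code point needs in UTF-16.
        System.out.println(Character.charCount(0x1F600)); // 2

        // The first code unit of the emoji is only half of a character.
        System.out.println(Character.isHighSurrogate(emoji.charAt(0))); // true
    }
}
```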

Take a look at the following application:
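
A minimal sketch of such an application, assuming one plain ASCII string and one string that ends with an emoji (the class name, the hex helper, and the sample strings are illustrative and not taken from this thread):

```java
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.IntStream;

public class CharsVsCodePoints {

    // Renders each int of the stream as lowercase hex so the two streams are easy to compare.
    static List<String> hex(IntStream values) {
        return values.mapToObj(Integer::toHexString).collect(Collectors.toList());
    }

    public static void main(String[] args) {
        String ascii = "Hi";
        String mixed = "Hi\uD83D\uDE00"; // "Hi" followed by U+1F600

        // For ASCII-only text, both methods yield the same values.
        System.out.println(hex(ascii.chars()));      // [48, 69]
        System.out.println(hex(ascii.codePoints())); // [48, 69]

        // chars() exposes the two surrogate code units separately,
        // while codePoints() combines them into one code point.
        System.out.println(hex(mixed.chars()));      // [48, 69, d83d, de00]
        System.out.println(hex(mixed.codePoints())); // [48, 69, 1f600]
    }
}
```

With a string made only of characters that fit in one code unit, the two streams are identical, which would explain seeing exactly the same output from both methods.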
 