
What is the precise use of the codePointBefore() method?

 
krishnadhar Mellacheruvu
Ranch Hand
Posts: 118
Android Java Objective C
Hi

Is there any specific, precise use of the codePointBefore() method, which returns the Unicode value of the char that is present before the specified index?

Say we have a String that starts with "JA" (see the sketch below).

Calling codePointBefore(1) generates the Unicode value of the char 'J', which is before the specified index 1; the char at the specified index 1 in this case is 'A'.
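(A minimal sketch of the kind of snippet being described; the full String "JAVA" is an assumption, the post only implies it begins with "JA".)

String s = "JAVA";                      // assumed example string
int before = s.codePointBefore(1);      // code point of the char before index 1, i.e. 'J'
System.out.println(before);             // prints 74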

We also have codePointAt() (see the sketch below).

This generates the Unicode value of the char at the specified index, which in this case is 65, because the char at index 1 is 'A'.
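(Again a minimal sketch, using the same assumed String "JAVA".)

String s = "JAVA";                      // assumed example string
int at = s.codePointAt(1);              // code point of the char at index 1, i.e. 'A'
System.out.println(at);                 // prints 65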


Since we have codePointAt(), which can also generate the Unicode value (i.e. if I want the Unicode value of 'J', which is before 'A', I can directly use codePointAt(0) rather than codePointBefore(1)), what is the use of the codePointBefore() method?

Thanks
 
Peter Muster
Ranch Hand
Posts: 74
Eclipse IDE Python Ubuntu
It might have to do with UTF-16 which takes twice as much space as UTF-8 and as such occupies two char values in a string (a low surrogate and a high surrogate). If you look at the concrete implementation in the Character class you will see that the two approaches differ:
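(A paraphrased sketch of what those implementations look like; the JDK delegates to package-private helpers in java.lang.Character along these lines, though the exact code differs by version. The class and method names here are made up for illustration.)

class CodePointSketch {

    // Roughly how codePointAt works: pair up a high surrogate with the char AFTER it.
    static int codePointAtSketch(CharSequence seq, int index) {
        char c1 = seq.charAt(index);
        if (Character.isHighSurrogate(c1) && index + 1 < seq.length()) {
            char c2 = seq.charAt(index + 1);
            if (Character.isLowSurrogate(c2)) {
                return Character.toCodePoint(c1, c2);   // combine the pair into one code point
            }
        }
        return c1;                                      // ordinary BMP char (or unpaired surrogate)
    }

    // Roughly how codePointBefore works: pair up a low surrogate with the char BEFORE it.
    static int codePointBeforeSketch(CharSequence seq, int index) {
        char c2 = seq.charAt(--index);
        if (Character.isLowSurrogate(c2) && index > 0) {
            char c1 = seq.charAt(index - 1);
            if (Character.isHighSurrogate(c1)) {
                return Character.toCodePoint(c1, c2);   // combine the pair into one code point
            }
        }
        return c2;                                      // ordinary BMP char (or unpaired surrogate)
    }
}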



I don't know if one method is preferable in some use cases while the other is in others. Perhaps someone who deals with UTF-16 frequently might shed some light on this. Perhaps there is no real reason for this. They don't seem to differ in performance if you just use them for "normal" UTF-8 strings.
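(For what it's worth, here is a hypothetical case where the two calls actually give different answers once a surrogate pair is involved:)

String s = "A\uD801\uDC00";             // 'A' followed by one supplementary character (U+10400) stored as a surrogate pair
System.out.println(Integer.toHexString(s.codePointAt(2)));      // dc00  -> just the low surrogate on its own
System.out.println(Integer.toHexString(s.codePointBefore(3)));  // 10400 -> the pair combined into one code point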
 
Campbell Ritchie
Marshal
Posts: 55687
Christian Pflugradt wrote:It might have to do with UTF-16 which takes twice as much space as UTF-8 . . .
Nearly, but not quite. When Java® came out, all Unicode characters in the known world could be fitted into 16 bits = a char. Unfortunately, there were nearly as many characters in the known world which weren't in Unicode at all; if you went to China, Japan or Korea, you could find them easily. So Unicode was expanded to accommodate about 1.1 million characters, which means they no longer fit into 16 bits. So the classes using Unicode were enhanced; the String#codePointAt() method was added in Java 5 (you can look at that link and it says “Since” at the bottom of the link). That was about 12 years ago.
Now, if you use a codePoint method, you get an int in the range 0‑0x00_00_ff_ff for “old” Unicode characters, and in a larger range (I think up to 0x00_10_ff_ff) for the new ones, so you get some code points > 0xff_ff. I think the left 16 bits represent the high surrogate and the right half the low surrogate, but I am not certain. So you can actually get code points of different lengths. If I try "\ud801\udc00" as a String, I find it has two chars and one code point, which my terminal doesn't seem able to display. You can find which chars are the surrogates from the tutorial link above.
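(A little sketch of that "\ud801\udc00" experiment, plus the standard Character.toCodePoint() helper that combines a surrogate pair; nothing here beyond the ordinary java.lang API.)

String s = "\uD801\uDC00";                               // one supplementary character (U+10400) as a high + low surrogate pair
System.out.println(s.length());                          // 2 -> two chars
System.out.println(s.codePointCount(0, s.length()));     // 1 -> but only one code point
System.out.println(Integer.toHexString(s.codePointAt(0)));                           // 10400
System.out.println(Integer.toHexString(Character.toCodePoint('\uD801', '\uDC00')));  // 10400 as well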

Moving question as too difficult for “beginning”.
 