Printing multi character languages

 
Ranch Hand
Posts: 128
Recently I started using a framework called BeanIO, which is very good for fixed-length file I/O. The only issue is printing strings of a particular length in languages other than English (such as Japanese or Thai).

E.g., suppose I want to print "こんにちは" at a particular position. It prints perfectly (UTF-8 encoding), but the length of the string is miscalculated, so the values that follow it are not printed at the right positions.

Suppose I want to print "こんにちは" at position 5 and the next value at position 6, and I use length 2 for printing (which means print only 2 characters and ignore the rest). Printing the Japanese text throws off the lengths of all the values in that row of the text file.

Can anyone help me calculate the length of a Japanese/Thai string the same way as for English? A solution would save me a lot of time.
 
Saloon Keeper
Posts: 15659
367
No, this is not easy. You will run into lots of trouble with combining characters and decomposition modes and stuff.

Use a library like ICU4J to find the number of grapheme clusters in a string. You should also ask yourself what "fixed length" means: a fixed number of grapheme clusters? A fixed number of Unicode code points? A fixed number of bytes?
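
To make those three questions concrete, here is a minimal sketch that measures the same string each way. It uses only the JDK: `java.text.BreakIterator` stands in for ICU4J here, which is an assumption made to keep the example dependency-free (ICU4J follows newer Unicode rules and may segment some scripts differently). The class name `FieldLengths` and its methods are hypothetical, not part of BeanIO or ICU4J.

```java
import java.nio.charset.StandardCharsets;
import java.text.BreakIterator;
import java.util.Locale;

// Hypothetical helper: four different answers to "how long is this string?"
final class FieldLengths {

    // Number of UTF-16 code units, which is what String.length() reports.
    static int utf16Units(String s) {
        return s.length();
    }

    // Number of Unicode code points (a surrogate pair counts once).
    static int codePoints(String s) {
        return s.codePointCount(0, s.length());
    }

    // Number of bytes the string occupies when encoded as UTF-8.
    static int utf8Bytes(String s) {
        return s.getBytes(StandardCharsets.UTF_8).length;
    }

    // Approximate number of user-perceived characters, using the JDK's
    // character BreakIterator (an assumption; ICU4J may give other results).
    static int graphemes(String s) {
        BreakIterator it = BreakIterator.getCharacterInstance(Locale.ROOT);
        it.setText(s);
        int count = 0;
        while (it.next() != BreakIterator.DONE) {
            count++;
        }
        return count;
    }

    public static void main(String[] args) {
        // The Japanese greeting from the question, written with escapes.
        String hello = "\u3053\u3093\u306B\u3061\u306F";
        System.out.println("UTF-16 units: " + utf16Units(hello));
        System.out.println("code points:  " + codePoints(hello));
        System.out.println("UTF-8 bytes:  " + utf8Bytes(hello));
        System.out.println("graphemes:    " + graphemes(hello));
    }
}
```

For "こんにちは" the first two counts are both 5, but the UTF-8 byte count is 15, so a layout that is fixed in bytes and one that is fixed in characters will disagree about where the next field starts.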
 
Bartender
Posts: 3323
86
Not sure about Japanese, but Thai could be particularly difficult to handle, as it has tonal shift characters as well as letter characters. Whilst I'm no expert on Thai, I would think that if you are printing 'n' characters you would need to include any tonal shift characters without counting them as characters printed. Thai also has some single vowels that are written with multiple separate characters which wrap around the adjacent consonant, i.e. part of the vowel goes before the consonant and part of it goes after the consonant.

If my Thai is correct (and, as I've already said, I'm not an expert in Thai, so I may be wrong here), the following is considered to be 2 characters (and is the name of a fruit):

เงาะ



The consonant is the ง character. The other three characters can be used separately or combined in this arrangement to form a new single vowel character.
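
As a rough illustration of "include the tonal shift characters without counting them", the sketch below takes the first n base characters of a string and carries along any non-spacing marks attached to them. The class `BaseCharPrefix` and its methods are hypothetical, and the rule it applies (treat Unicode category Mn as "not counted") is a simplifying assumption rather than a full grapheme-cluster algorithm.

```java
// Hypothetical helper: count and truncate by base characters,
// keeping non-spacing marks with the character they follow.
final class BaseCharPrefix {

    // True for Unicode non-spacing marks (category Mn), which includes
    // the Thai tone marks and the vowel signs written above or below.
    static boolean isMark(int codePoint) {
        return Character.getType(codePoint) == Character.NON_SPACING_MARK;
    }

    // Number of code points that are not non-spacing marks.
    static int countBase(String s) {
        return (int) s.codePoints().filter(cp -> !isMark(cp)).count();
    }

    // First n base characters of s, plus any marks that follow them.
    static String take(String s, int n) {
        if (n <= 0) {
            return "";
        }
        StringBuilder out = new StringBuilder();
        int taken = 0;
        int i = 0;
        while (i < s.length()) {
            int cp = s.codePointAt(i);
            if (!isMark(cp)) {
                if (taken == n) {
                    break; // the next base character would exceed the limit
                }
                taken++;
            }
            out.appendCodePoint(cp);
            i += Character.charCount(cp);
        }
        return out.toString();
    }
}
```

This only handles marks that sit above or below a letter. The parts of the wrap-around vowel in the fruit example are ordinary spacing letters, so this sketch would still count each of them separately; treating them as one unit needs language-specific rules or a library.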
 
Tony Docherty
Bartender
Posts: 3323
86
You also have the additional problem in Thai of what constitutes the correct order of characters, because in Thai the order in which characters are written is not necessarily the order in which they are pronounced.

For example:

โทนี่

is the written form of my name "Tony", but the first character in that written sequence is the 'o' sound and the second character is the 't' sound. So if you want to get the first character of my name, which is a 't', you need to return the second character in that sequence.

And for anyone who is wondering, the third character is the 'n' sound, and the superscript character over the third character, which also carries a tonal shift character, is the 'e' sound (the letter y).
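
The ordering point can be seen directly in Java, since a Thai string is stored in written order. The sketch below is a naive illustration only: the class `ThaiOrder` and its methods are hypothetical, and it assumes that skipping the five Thai leading vowels (U+0E40 to U+0E44) is enough to reach the first consonant, which will not hold for every word.

```java
// Hypothetical helper: find the first character that is not one of the
// Thai vowels written before the consonant they are pronounced after.
final class ThaiOrder {

    // U+0E40..U+0E44 are the Thai vowels stored before their consonant.
    static boolean isLeadingVowel(int codePoint) {
        return codePoint >= 0x0E40 && codePoint <= 0x0E44;
    }

    // First code point that is not a leading vowel, or -1 if there is none.
    static int firstNonLeadingVowel(String s) {
        int i = 0;
        while (i < s.length()) {
            int cp = s.codePointAt(i);
            if (!isLeadingVowel(cp)) {
                return cp;
            }
            i += Character.charCount(cp);
        }
        return -1;
    }

    public static void main(String[] args) {
        // The name from the post above, written with escapes.
        String tony = "\u0E42\u0E17\u0E19\u0E35\u0E48";
        // Stored order: the 'o' vowel (U+0E42) comes first.
        System.out.printf("first stored: U+%04X%n", tony.codePointAt(0));
        // Skipping the leading vowel reaches the 't' consonant (U+0E17).
        System.out.printf("first consonant: U+%04X%n", firstNonLeadingVowel(tony));
    }
}
```

So truncating this name to one character by position keeps the vowel and drops the consonant it belongs to, which is another reason to decide up front what unit a "fixed length" is measured in.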
 