Reading a file that has Japanese characters encoded in UTF-8 format

 
Greenhorn
Posts: 2
Hi Ranchers,
I need help reading a file that contains Japanese characters encoded in UTF-8.

For example, my file has the following lines:
Left = 892
Width=79
Caption= "#20250#35336#24773#22577#31649#29702"  //It is 会計情報管理

When reading the file, I want to get the decoded value of the caption, i.e. 会計情報管理.
I tried reading it with a BufferedReader, but I get #20250#35336#24773#22577#31649#29702 as output.
Is there another way to read this file?
Please help. Thank you in advance for your time.
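For reference, here is a minimal sketch of reading the Caption value with a BufferedReader and an explicit UTF-8 charset. It assumes the file consists of key=value lines like the ones above and that the "//It is ..." note is the poster's annotation rather than part of the file; the class and method names are hypothetical. Even with the correct charset, this returns the literal #20250#35336... text, because '#' and the digits are plain ASCII in the file, so turning them into kanji has to be a separate decoding step.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

public class CaptionReader {

    // Hypothetical helper: returns the raw value of the first "Caption=" line,
    // with surrounding whitespace and double quotes removed, or null if there is none.
    static String readCaption(Path file) throws IOException {
        // Decode the file's bytes explicitly as UTF-8 instead of relying on the platform default
        try (BufferedReader reader = Files.newBufferedReader(file, StandardCharsets.UTF_8)) {
            String line;
            while ((line = reader.readLine()) != null) {
                int eq = line.indexOf('=');
                if (eq < 0 || !line.substring(0, eq).trim().equals("Caption")) {
                    continue; // not the Caption line
                }
                String value = line.substring(eq + 1).trim();
                if (value.length() >= 2 && value.startsWith("\"") && value.endsWith("\"")) {
                    value = value.substring(1, value.length() - 1); // strip the surrounding quotes
                }
                return value;
            }
        }
        return null;
    }

    public static void main(String[] args) throws IOException {
        // Usage: java CaptionReader.java <path-to-file>
        System.out.println(readCaption(Path.of(args[0])));
    }
}
```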
 
Saloon Keeper
Posts: 13280
Welcome to CodeRanch!

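Your Japanese caption isn't UTF-8 encoded. It's encoded as numeric character references, similar to XML's &#20250; form: each number after a '#' is the decimal Unicode code point of one character.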

You can easily decode the characters yourself by parsing each decimal value as a Unicode code point.
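A minimal sketch of this approach (not necessarily the code originally posted), assuming the caption value has already had its surrounding quotes removed; the class and method names are hypothetical. As a sanity check on the idea, 20250 is 0x4F1A, which is U+4F1A (会). The `var` keyword needs Java 10 or later.

```java
import java.util.Arrays;

public class CaptionDecoder {

    // Hypothetical helper: converts a string such as "#20250#35336" into the
    // characters whose decimal Unicode code points it lists.
    static String decode(String encoded) {
        // var is inferred as int[] here, because IntStream.toArray() returns int[]
        var codePoints = Arrays.stream(encoded.split("#"))
                .filter(part -> !part.isEmpty())   // split("#") yields an empty first element
                .mapToInt(Integer::parseInt)       // each remaining part is a decimal code point
                .toArray();
        // String(int[], int, int) treats the array as Unicode code points and
        // emits surrogate pairs for any code point above U+FFFF
        return new String(codePoints, 0, codePoints.length);
    }

    public static void main(String[] args) {
        String caption = "#20250#35336#24773#22577#31649#29702";
        System.out.println(decode(caption)); // should print the six kanji of the caption
    }
}
```

Running main should print 会計情報管理, provided the console can display it. A part that isn't a number makes `Integer.parseInt` throw `NumberFormatException`, and a value outside the Unicode range makes the `String` constructor throw `IllegalArgumentException`.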
 
Kim Tae
Greenhorn
Posts: 2
Thank you, Stephan van Hulst. The issue is solved.
 
Marshal
Posts: 74048
Nice solution.
I keep forgetting to use var; I presume in this instance it behaves as if char[]. Can one do that encoding with StringBuilder#appendCodePoint(int) instead of new String(...)?
 
Stephan van Hulst
Saloon Keeper
Posts: 13280

Campbell Ritchie wrote: I presume in this instance it behaves as if char[].


An int[].

Can one do that encoding with StringBuilder#appendCodePoint(int) instead of new String(...)?


Sure:
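A minimal sketch of the StringBuilder variant under the same assumptions (quotes already stripped, hypothetical class and method names), not necessarily the code from the original reply. `IntStream.collect` with `StringBuilder::appendCodePoint` builds the string directly, so no intermediate int[] is needed; a plain for loop that calls `appendCodePoint` on each parsed value would work just as well.

```java
import java.util.Arrays;

public class CaptionDecoderBuilder {

    // Hypothetical helper: same parsing as the int[] version, but each code point
    // is appended to a StringBuilder instead of being collected into an array.
    static String decode(String encoded) {
        return Arrays.stream(encoded.split("#"))
                .filter(part -> !part.isEmpty())   // skip the empty piece before the first '#'
                .mapToInt(Integer::parseInt)       // decimal text -> code point
                .collect(StringBuilder::new,              // supplier: a fresh builder
                         StringBuilder::appendCodePoint,  // append one code point (one or two chars)
                         StringBuilder::append)           // combiner, only used by parallel streams
                .toString();
    }

    public static void main(String[] args) {
        System.out.println(decode("#20250#35336#24773#22577#31649#29702"));
    }
}
```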
 