Parsing Chinese Characters by using Xerces

 
Greenhorn
Posts: 3
Does Xerces support Chinese characters?

My XML file is encoded in UTF-8 and contains Chinese characters in some tags. However, when I print the char[] data passed to the 'characters' method (which should be Chinese characters), strange characters '???' appear instead. How can I get the correct Chinese characters from the char[] data?


public void characters(char[] data, int start, int length){
.......
}

p.s. The platform is AS/400.

Thanks a lot.
 
Marshal
Posts: 28296
Xerces supports every character that XML supports, and XML certainly supports Chinese characters, as you can see by reading the XML Recommendation.

However, your question seems misguided to me. You complain about seeing question marks when you print those characters, but then you ask how to get them. It's equally probable that the encoding failure occurs when you try to print characters which were obtained correctly from Xerces.

So you're going to have to explain your process in a bit more detail, not just focusing on Xerces. That matters especially if you're printing data, which on your platform might well involve some more data conversions.
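
One way to tell the two cases apart is to print the numeric code points of the characters instead of the characters themselves, since hex digits survive any console encoding. The following is only a minimal sketch: the class name CodePointDump and the helper toCodePoints are made up for illustration, and the sample characters are my own, not taken from the poster's file.

```java
public class CodePointDump {
    // Renders each code point in data[start .. start+length) as U+XXXX.
    // The result is plain ASCII, so it prints correctly on any console.
    static String toCodePoints(char[] data, int start, int length) {
        StringBuilder sb = new StringBuilder();
        int end = start + length;
        for (int i = start; i < end; ) {
            int cp = Character.codePointAt(data, i, end);
            if (sb.length() > 0) {
                sb.append(' ');
            }
            sb.append(String.format("U+%04X", cp));
            i += Character.charCount(cp); // step over surrogate pairs correctly
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // Hypothetical sample: two Chinese characters, written as escapes
        char[] data = "\u4E2D\u6587".toCharArray();
        System.out.println(toCodePoints(data, 0, data.length)); // prints U+4E2D U+6587
    }
}
```

Calling something like this from inside the 'characters' method would show values in the U+4E00 range if the parser delivered Chinese characters intact, and U+003F if real question marks arrived. In the first case the damage happens later, when the data is printed or written.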
 
Ed Tang
Greenhorn
Posts: 3
Here is my code...
===============================================

===============================================


===============================================
When the end of the 'ChineseName' element is detected, the record is written to an AS400 physical file.




=========================================

However, the Chinese characters are not written to the file correctly; hex '3F3F3F' is written instead. When I print the value of chineseNameTagVal, '???' is shown in the console.


Could anyone help me? I found some hints on the internet suggesting that I need to use the ByteArrayOutputStream and OutputStreamWriter classes to do some conversion, but I have no idea how to use them. Thank you so much!
 
Paul Clapham
Marshal
Posts: 28296
I thought there might be something like this.

You have a string containing Chinese characters. First you convert it to bytes using ISO-8859-1, which is the Western European character set. Since that character set does not include representations for any Chinese characters, it replaces all of them by question marks before converting them to bytes.

So already you have mangled your data beyond recognition. Converting those question-mark bytes back to chars using the CP937 encoding cannot possibly bring back the original data.
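
The loss is easy to reproduce in isolation. This is a minimal sketch, not the poster's code: the class LossyConversionDemo and its helpers are invented for illustration, and it decodes with UTF-8 rather than CP937 so that it only relies on charsets every JVM is guaranteed to provide. The point is the same with any second charset, because the information is already gone after the first step.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class LossyConversionDemo {
    // Encodes s to bytes with one charset, then decodes those bytes with another.
    static String recode(String s, Charset encodeWith, Charset decodeWith) {
        return new String(s.getBytes(encodeWith), decodeWith);
    }

    // Formats bytes as upper-case hex, two digits per byte.
    static String hex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            sb.append(String.format("%02X", b & 0xFF));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String original = "\u4E2D\u6587"; // hypothetical sample: two Chinese characters

        // ISO-8859-1 cannot represent these characters, so each becomes '?' (0x3F)
        byte[] latin1 = original.getBytes(StandardCharsets.ISO_8859_1);
        System.out.println(hex(latin1)); // prints 3F3F

        // Decoding the question-mark bytes cannot recover the original text
        String damaged = recode(original, StandardCharsets.ISO_8859_1, StandardCharsets.UTF_8);
        System.out.println(damaged.equals("??")); // prints true

        // Round trip through a charset that covers the characters, for contrast
        String intact = recode(original, StandardCharsets.UTF_8, StandardCharsets.UTF_8);
        System.out.println(intact.equals(original)); // prints true
    }
}
```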

I would just get rid of this line entirely. It's the job of the AS400 and SequentialFile objects to convert from chars (in the Java program) to bytes (in the database), not yours. Just make sure that the job where this is running and the database tables both have a suitable CCSID.
 