Determine Character Set of XML file using Java

 
Ranch Hand
Posts: 3640
I have an XML file, but I don't know which character set it was written in. How can I determine the character set of the file using Java?
 
Author and all-around good cowpoke
Posts: 13078
Is there an "encoding" attribute in the <?xml declaration?
Bill
 
Bartender
Posts: 10336
Moving to our XML forum...
 
Marshal
Posts: 28193
XML has rules for determining the encoding of a document. You will find them in Appendix F of the XML Recommendation. As Bill suggests, part of the algorithm involves the "encoding" attribute in the XML declaration at the start of the document's prolog.

However, you should never need to do that yourself. Just get an InputStream (not a Reader) that reads the document, and pass that to your XML parser. A Reader has already committed to a character encoding, whereas an InputStream hands the parser the raw bytes, so the parser can apply the rules itself.
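As a rough illustration of the InputStream approach, here is a minimal sketch using the JDK's built-in DOM parser. The class name, the helper method and the sample document are made up for this example; getXmlEncoding() and getInputEncoding() are standard DOM Level 3 methods on org.w3c.dom.Document that report what the parser found.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.xml.sax.SAXException;

class ParserEncodingDemo {

    // Parse from a byte stream so that the parser, not our code,
    // decides how to turn the bytes into characters.
    static Document parse(InputStream in) {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder().parse(in);
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalStateException("could not parse document", e);
        }
    }

    public static void main(String[] args) {
        // Hypothetical sample: declared as ISO-8859-1 and really encoded that way.
        String xml = "<?xml version=\"1.0\" encoding=\"ISO-8859-1\"?><name>Jos\u00e9</name>";
        byte[] bytes = xml.getBytes(StandardCharsets.ISO_8859_1);

        Document doc = parse(new ByteArrayInputStream(bytes));
        System.out.println("declared encoding: " + doc.getXmlEncoding());
        System.out.println("encoding used:     " + doc.getInputEncoding());
        System.out.println("text content:      " + doc.getDocumentElement().getTextContent());
    }
}
```

For a real file you would pass a FileInputStream (or Files.newInputStream) instead of the in-memory stream used here.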
 
Chetan Parekh
Ranch Hand
Posts: 3640
My problem is that the current program generates its XML files without using any parser; the files are written out as flat files. They are not writing any encoding information in the generated XML file. But it can have any encoding. So is there any way I can determine the encoding of the XML file?

There is a getEncoding() method in InputStreamReader. If I use it, will it solve my problem? I am new to encodings.

http://java.sun.com/j2se/1.4.2/docs/api/java/io/InputStreamReader.html
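A small sketch may help with the getEncoding() question. InputStreamReader.getEncoding() returns the name of the encoding the reader was constructed with (or the platform default if none was given); it does not inspect the bytes. The class name, helper method and sample bytes below are invented for the illustration.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.UncheckedIOException;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

class GetEncodingDemo {

    // Returns the encoding name the reader reports when told to use `assumed`.
    static String reportedEncoding(byte[] data, Charset assumed) {
        try (InputStreamReader reader =
                 new InputStreamReader(new ByteArrayInputStream(data), assumed)) {
            return reader.getEncoding();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        // Hypothetical sample: bytes that are really UTF-16.
        byte[] utf16Bytes = "<doc>caf\u00e9</doc>".getBytes(StandardCharsets.UTF_16);

        // The reader echoes the charset we asked for, not the one the bytes are in.
        System.out.println(reportedEncoding(utf16Bytes, StandardCharsets.ISO_8859_1));
        System.out.println(reportedEncoding(utf16Bytes, StandardCharsets.UTF_8));
    }
}
```

The same UTF-16 bytes produce two different answers, one per charset passed in, so getEncoding() cannot tell you how a file was actually written.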
 
Paul Clapham
Marshal
Posts: 28193

Originally posted by Chetan Parekh:
They are not writing any encoding information in the generated XML file. But it can have any encoding. So is there any way I can determine the encoding of the XML file?

Then "they" may not be doing it correctly. If "they" don't declare an encoding in the XML document, then they must encode the document as UTF-8 or UTF-16. This is not optional; it is required by the XML Recommendation.
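To make that rule concrete, here is a minimal sketch of how the two permitted defaults can be told apart, assuming the document really does follow the Recommendation and has no encoding declaration. The class and method names are hypothetical, and the check only looks at the byte order mark; it is not the full Appendix F algorithm.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

class DefaultXmlEncoding {

    // Encoding implied by the first bytes of a conforming document
    // that has NO encoding declaration.
    static Charset impliedEncoding(byte[] head) {
        if (head.length >= 2) {
            int b0 = head[0] & 0xFF;
            int b1 = head[1] & 0xFF;
            if (b0 == 0xFE && b1 == 0xFF) return StandardCharsets.UTF_16BE; // big-endian BOM
            if (b0 == 0xFF && b1 == 0xFE) return StandardCharsets.UTF_16LE; // little-endian BOM
        }
        // No UTF-16 byte order mark: a conforming document must be UTF-8.
        return StandardCharsets.UTF_8;
    }

    public static void main(String[] args) {
        // Hypothetical samples; Java's UTF_16 encoder writes a byte order mark first.
        System.out.println(impliedEncoding("<a/>".getBytes(StandardCharsets.UTF_16)));
        System.out.println(impliedEncoding("<a/>".getBytes(StandardCharsets.UTF_8)));
    }
}
```

If the file was written in some other encoding, this still answers UTF-8, which is exactly the situation where the producer has broken the rule.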

So if they are not doing that, it is not your responsibility to fix the problem. It is their problem.

However, it is possible that they are not competent to fix the problem. In that case some human agent will have to determine the actual encoding of the file. There is no automated way of doing it.
[ November 06, 2006: Message edited by: Paul Clapham ]
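What a program can do is notice that a file breaks the rule. The sketch below, with made-up class and method names, decodes the bytes strictly and reports whether they are well-formed in a given charset. A failure against UTF-8 shows the producer did not write UTF-8; a success does not prove the encoding, so it does not replace the human judgement described above.

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.Charset;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

class StrictDecodeCheck {

    // True if `data` decodes in `charset` with no malformed or unmappable sequences.
    static boolean decodesCleanly(byte[] data, Charset charset) {
        try {
            charset.newDecoder()
                   .onMalformedInput(CodingErrorAction.REPORT)
                   .onUnmappableCharacter(CodingErrorAction.REPORT)
                   .decode(ByteBuffer.wrap(data));
            return true;
        } catch (CharacterCodingException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        // Hypothetical samples: the same text written in two different encodings.
        byte[] latin1 = "<name>Jos\u00e9</name>".getBytes(StandardCharsets.ISO_8859_1);
        byte[] utf8 = "<name>Jos\u00e9</name>".getBytes(StandardCharsets.UTF_8);

        System.out.println(decodesCleanly(latin1, StandardCharsets.UTF_8));
        System.out.println(decodesCleanly(utf8, StandardCharsets.UTF_8));
    }
}
```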