Originally posted by rathi ji:
Okay but why even UTF-8???
Why not use ASCII if the XML file contains only English characters???
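ASCII is a subset of UTF-8: every ASCII character is encoded as the same single byte in both. So if your file really contains only ASCII characters (unaccented Latin letters, digits and basic punctuation), it's going to be byte-for-byte identical whether it's encoded in ASCII or UTF-8. (Except for the prolog where you declare the encoding, of course.)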
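If you want to check this yourself, here is a minimal sketch. The class and method names (AsciiVsUtf8, sameBytes) are made up for illustration; it only uses String.getBytes and Arrays.equals from the standard library.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

class AsciiVsUtf8 {
    // Encodes the same text both ways and compares the resulting bytes.
    static boolean sameBytes(String text) {
        byte[] ascii = text.getBytes(StandardCharsets.US_ASCII);
        byte[] utf8 = text.getBytes(StandardCharsets.UTF_8);
        return Arrays.equals(ascii, utf8);
    }

    public static void main(String[] args) {
        // Pure ASCII content: both encodings give the same bytes.
        System.out.println(sameBytes("<name>Smith</name>"));      // true
        // One accented letter (\u00e9 is e with an acute accent):
        // getBytes replaces it with '?' for US-ASCII, UTF-8 uses two bytes.
        System.out.println(sameBytes("<name>Ren\u00e9</name>"));  // false
    }
}
```

Note that the second case does not fail loudly: encoding to US-ASCII this way silently replaces the accented letter, which is exactly the kind of data loss to watch out for.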
And as soon as somebody uses an accented letter in their data, the code that writes the ASCII version has to know to change it to a numeric character reference (for example &#233; for é) in the output. The standard Java classes do know this, of course, but many people don't use the built-in classes and prefer to write their own code, which may not know it.
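Here is a rough sketch of letting the built-in classes do that escaping (in XML terms, a numeric character reference) for you. It assumes the JDK's default Transformer implementation; the class and method names (AsciiXmlDemo, writeAsAscii) and the element name are made up for illustration.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

class AsciiXmlDemo {
    // Builds <name>text</name> and serializes it with a US-ASCII encoding.
    static String writeAsAscii(String text) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder().newDocument();
        Element name = doc.createElement("name"); // hypothetical element
        name.setTextContent(text);
        doc.appendChild(name);

        Transformer t = TransformerFactory.newInstance().newTransformer();
        // Ask the serializer for ASCII output; it has to escape anything else.
        t.setOutputProperty(OutputKeys.ENCODING, "US-ASCII");

        ByteArrayOutputStream out = new ByteArrayOutputStream();
        t.transform(new DOMSource(doc), new StreamResult(out));
        return new String(out.toByteArray(), StandardCharsets.US_ASCII);
    }

    public static void main(String[] args) throws Exception {
        // \u00e9 is e with an acute accent; the output should carry it
        // as a character reference rather than as a raw byte.
        System.out.println(writeAsAscii("Ren\u00e9"));
    }
}
```

Code that builds the XML by concatenating strings and then writes it out as ASCII gets none of this, which is where home-grown writers tend to go wrong.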
Basically UTF-8 can represent any Unicode character, including the ASCII characters, and it doesn't cost anything extra to use it for ASCII characters: each one still takes a single byte. So it just makes sense to use UTF-8. (Or UTF-16 if your data contains a large percentage of CJK characters, which usually take three bytes in UTF-8 but two in UTF-16.)
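To put numbers on the size trade-off, here is a small sketch that counts encoded bytes. The names (EncodingSizes, utf8Size, utf16Size) are made up for illustration, and UTF-16BE is used so that no byte order mark is added to the count.

```java
import java.nio.charset.StandardCharsets;

class EncodingSizes {
    // Number of bytes the text occupies in UTF-8.
    static int utf8Size(String text) {
        return text.getBytes(StandardCharsets.UTF_8).length;
    }

    // Number of bytes the text occupies in UTF-16 (big-endian, no BOM).
    static int utf16Size(String text) {
        return text.getBytes(StandardCharsets.UTF_16BE).length;
    }

    public static void main(String[] args) {
        String english = "hello";
        String japanese = "\u65e5\u672c\u8a9e"; // three CJK characters

        System.out.println(utf8Size(english));   // 5
        System.out.println(utf16Size(english));  // 10
        System.out.println(utf8Size(japanese));  // 9
        System.out.println(utf16Size(japanese)); // 6
    }
}
```

Keep in mind that the XML markup itself (tags, attribute names) is usually ASCII, so UTF-16 only wins when the CJK text really dominates the file.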