1. &, < and > have special meaning in XML
2. some characters such as MS Word "smart punctuation" are illegal Unicode
3. some control characters such as ctrl-z are illegal Unicode
1. org.w3c.dom.Document and the other standard XML classes take care of this. If you set text through those classes and let a standard serializer write it out, you don't have to concern yourself with escaping ampersands and less-than signs.
2. Not true. Those characters are ordinary Unicode; you'll find them between U+2018 and U+201F. If they were decoded correctly when your data was read, the standard XML classes will handle them correctly. What is true is that people often paste those characters into text files without regard to the encoding of those files, or hand-generate XML that doesn't declare its encoding properly.
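3. This one is partly true. Control characters such as ctrl-z (U+001A) are valid Unicode, but XML 1.0 does not allow them in a document. I don't know what happens if you pass a String containing one of those characters into a DOM text node.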
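
To make points 1 and 3 concrete, here is a minimal sketch, assuming the JDK's built-in JAXP implementation. The class name XmlTextSketch and the helpers serialize, isXml10Char and stripInvalidXmlChars are hypothetical names made up for this illustration. The first helper puts raw text into a DOM text node and lets the serializer do the escaping. The second drops characters that fall outside the XML 1.0 Char production before the text ever reaches the DOM. Dropping them is only one possible policy; replacing them or rejecting the input may suit your data better.

```java
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

// Hypothetical helper class, for illustration only.
final class XmlTextSketch {

    /** Builds <note>text</note> with the DOM API and serializes it to a String. */
    static String serialize(String text) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder().newDocument();
            Element note = doc.createElement("note");
            // Raw text goes in; no manual escaping of & or < here.
            note.appendChild(doc.createTextNode(text));
            doc.appendChild(note);

            Transformer t = TransformerFactory.newInstance().newTransformer();
            t.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
            StringWriter out = new StringWriter();
            t.transform(new DOMSource(doc), new StreamResult(out));
            return out.toString();
        } catch (Exception e) {
            // Wrap checked JAXP exceptions to keep the sketch short.
            throw new IllegalStateException(e);
        }
    }

    /** True if the code point matches the Char production of XML 1.0. */
    static boolean isXml10Char(int cp) {
        return cp == 0x9 || cp == 0xA || cp == 0xD
                || (cp >= 0x20 && cp <= 0xD7FF)
                || (cp >= 0xE000 && cp <= 0xFFFD)
                || (cp >= 0x10000 && cp <= 0x10FFFF);
    }

    /** Drops every code point that XML 1.0 does not allow (one possible policy). */
    static String stripInvalidXmlChars(String s) {
        StringBuilder sb = new StringBuilder(s.length());
        s.codePoints()
                .filter(XmlTextSketch::isXml10Char)
                .forEach(sb::appendCodePoint);
        return sb.toString();
    }

    public static void main(String[] args) {
        // The ampersand and less-than sign are escaped by the serializer.
        System.out.println(serialize("Tom & Jerry < \u201Cquoted\u201D"));
        // ctrl-z (U+001A) is removed before the text is used.
        System.out.println(stripInvalidXmlChars("end" + (char) 0x1A + "of file"));
    }
}
```

How the smart quotes appear in the serialized output (literally or as numeric character references) depends on the serializer and the output encoding, so the sketch makes no claim about that. The quotes do pass through stripInvalidXmlChars untouched, because U+2018 to U+201F lie inside the allowed range.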