
XML and UTF-8

 
Anand Gondhiya
Ranch Hand
Posts: 155
Hi All,

1. The following is text from an XML file that is supposed to be read as UTF-8. I opened it in IE to view it and tried to change the encoding to UTF-8 so that I would see bullets instead of weird characters like •. Am I doing the right thing, in the right way?

It is expected that the incumbent meets the following selection criteria:

• A postgraduate degree, preferably Ph.D, in a relevant field such as economics, trade, competitiveness, industrial organization, private sector development. A multi-disciplinary background is an advantage.

• At least 12 years (15 with Master’s degree) relevant experience in trade and competitiveness.


2. The above text is part of a CDATA section. As mentioned above, when read as UTF-8 these weird characters actually represent bullets. My Java code reads this, copies the CDATA section including the <![CDATA[ marker, and writes it to the output XML file. When I write to the output file, I convert the strings to UTF-8, expecting that the output file will show bullets since it has already been converted to UTF-8.

Can anyone comment on what I am doing wrong here? I don't see bullets when I open either the input file or the output file.

Thanks
-Anand
 
Ulf Dittmer
Rancher
Posts: 42968
73
Which software and which font are you using to view the XML? Does the software understand Unicode, and does it use UTF-8 when opening the file, and does the font include the character you're missing?

If you look at the output file with a hex editor, does it still have the correct character codes?
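The same check can be done without a hex editor. The sketch below is only an illustration: the class name `HexCheck` and its method names are made up here, and it assumes the bullet is U+2022, which UTF-8 encodes as the bytes E2 80 A2. Decoding those three bytes as windows-1252 instead of UTF-8 yields the three characters "•" quoted in the question.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class HexCheck {
    // Render bytes as space-separated uppercase hex, the way a hex editor shows them.
    static String toHex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            if (sb.length() > 0) {
                sb.append(' ');
            }
            sb.append(String.format("%02X", b & 0xFF));
        }
        return sb.toString();
    }

    // Decode bytes with the wrong charset (windows-1252) to reproduce the garbling.
    static String misreadAsCp1252(byte[] bytes) {
        return new String(bytes, Charset.forName("windows-1252"));
    }

    public static void main(String[] args) {
        // Assumption: the bullet in the file is U+2022.
        byte[] utf8 = "\u2022".getBytes(StandardCharsets.UTF_8);
        System.out.println(toHex(utf8));           // the bytes a correct UTF-8 file contains
        System.out.println(misreadAsCp1252(utf8)); // what a viewer using the wrong encoding displays
    }
}
```

If the output file contains E2 80 A2 at the bullet positions, the data is fine and the viewer is using the wrong encoding. If it contains longer sequences, the text was probably decoded with the wrong charset and then encoded again.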
 
Anand Gondhiya
Ranch Hand
Posts: 155
Ulf,

I figured out that if I open input.xml in Firefox, I can see the bullets. If I open the file in IE, IE won't let me change the encoding, but with View Source I can see the bullets.
In short, I now know the input file is correct, and I can see the bullets correctly.

Also, I wrote the following code in Java to convert the text of input.xml to output.xml:



If I open output.xml in Firefox, it DOESN'T show the bullets. If I open it in IE and do View Source, it doesn't show the bullets there either. So my challenge is to convert the text to UTF-8 correctly using Java.

This is really getting interesting. Let me know if you have any input. Thanks for your post!

- Anand
 
Ulf Dittmer
Rancher
Posts: 42968
73
You don't need all those calls to getBytes. Just wrap the FileOutputStream in an OutputStreamWriter with an encoding of "UTF-8" and the conversion will happen automatically.

Make sure you're also specifying UTF-8 as the encoding in the XML declaration of the file.
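A minimal sketch of that advice, not the poster's actual code: the class name `Utf8Copy` is hypothetical, and the file names input.xml and output.xml are taken from the earlier post. The key point is that both the reader and the writer name UTF-8 explicitly. Decoding the input with a different charset (for example, a platform default that is not UTF-8) would corrupt the bullets before they are ever written.

```java
import java.io.BufferedReader;
import java.io.BufferedWriter;
import java.io.File;
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.OutputStreamWriter;
import java.io.Reader;
import java.io.Writer;
import java.nio.charset.StandardCharsets;

public class Utf8Copy {
    // Copy text from one file to another, decoding and encoding as UTF-8 on both sides.
    static void copy(File in, File out) throws IOException {
        try (Reader reader = new BufferedReader(
                 new InputStreamReader(new FileInputStream(in), StandardCharsets.UTF_8));
             Writer writer = new BufferedWriter(
                 new OutputStreamWriter(new FileOutputStream(out), StandardCharsets.UTF_8))) {
            char[] buffer = new char[4096];
            int count;
            while ((count = reader.read(buffer)) != -1) {
                // Strings and chars are already Unicode in memory; no getBytes calls are needed.
                writer.write(buffer, 0, count);
            }
        }
    }

    public static void main(String[] args) throws IOException {
        copy(new File("input.xml"), new File("output.xml"));
    }
}
```

The output file should also begin with a declaration such as `<?xml version="1.0" encoding="UTF-8"?>` so that browsers and parsers decode it the same way it was written.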
 
Anand Gondhiya
Ranch Hand
Posts: 155
Hi,

I wrote the following code, but it is still showing the special characters. Any more ideas?




- Anand
 
moe Mans
Greenhorn
Posts: 1
Hello,

How do I add to an XML document instead of overwriting it?

Any help will be much appreciated. Thanks in advance.

I have managed to get my code to write an XML file with data from the input fields of a JSP page. Now I need to add newly entered details from the JSP page to the existing XML file instead of rewriting it every time. My sample code, which currently rewrites the XML file, is as follows:
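(The sample code is missing from this copy of the thread.) For readers with the same question, one common approach is to parse the existing file into a DOM tree, append a new element under the root, and write the whole tree back. The sketch below is an assumption-laden illustration, not the poster's code: the class name `XmlAppend` and the element names `entry` and `name` are hypothetical, and it assumes the file already exists with a single root element.

```java
import java.io.File;
import java.io.FileOutputStream;
import java.io.OutputStream;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

public class XmlAppend {
    // Append <entry><name>...</name></entry> under the root of an existing XML file.
    static void appendEntry(File xmlFile, String name) throws Exception {
        // Load the existing document instead of creating a new, empty one.
        Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(xmlFile);

        // Hypothetical element names; substitute the ones the real file uses.
        Element entry = doc.createElement("entry");
        Element nameElement = doc.createElement("name");
        nameElement.setTextContent(name);
        entry.appendChild(nameElement);
        doc.getDocumentElement().appendChild(entry);

        // Serialize the updated tree back to the same file as UTF-8.
        Transformer transformer = TransformerFactory.newInstance().newTransformer();
        transformer.setOutputProperty(OutputKeys.ENCODING, "UTF-8");
        try (OutputStream os = new FileOutputStream(xmlFile)) {
            transformer.transform(new DOMSource(doc), new StreamResult(os));
        }
    }
}
```

Each call keeps the earlier entries because the existing content is loaded first. In a JSP or servlet setting, concurrent requests would need some form of locking around this method, which the sketch does not attempt.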



 