How to encode French characters to word.document xml file?

 
Ranch Hand
Posts: 44
I am using a Word XML document as a template and loading data into it with Java. It works fine except for some French characters such as é, which prevent the Word file from being opened. When I open the file in Word, I get the error: "Illegal xml character, Location: line: 3, column: 18765". When I look inside the file, I can see the word "André", and the "é" is what causes the problem. So I need to do some encoding before loading these French characters into the Word XML file, but there are many such characters. Is there a simple way to add a line to the header of the Word XML document to resolve this, instead of looping over the French characters and encoding them one by one? Thanks.
 
Marshal
Posts: 28226
If your document is really an XML document, then there are two things you should do:

1. Declare an encoding in the XML prolog, preferably UTF-8.

2. Write the XML document using that encoding. That either involves telling standard XML classes that you want that encoding, or if you're using your own BufferedWriter, wrap it around an OutputStreamWriter which uses that encoding.

If you find yourself implementing a solution where your code has to look at each character you're writing out, then you went the wrong way. All of this sort of thing is built into Java somewhere, you just have to find out where.
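
A minimal sketch of the two points above, assuming a plain stream as the destination (in a real program it would typically be a file stream). The class name Utf8XmlWriterSketch, the method writeGreeting and the <greeting> element are hypothetical and only serve the illustration; the value is assumed to contain no markup characters such as & or <.

```java
import java.io.BufferedWriter;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.io.OutputStreamWriter;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;

class Utf8XmlWriterSketch {

    /** Writes a tiny XML document to 'sink'; the bytes match the encoding named in the prolog. */
    static void writeGreeting(OutputStream sink, String name) {
        // The charset is named explicitly, so the platform default is never consulted.
        try (BufferedWriter out = new BufferedWriter(
                new OutputStreamWriter(sink, StandardCharsets.UTF_8))) {
            out.write("<?xml version=\"1.0\" encoding=\"UTF-8\"?>\n");
            out.write("<greeting>" + name + "</greeting>\n");
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        ByteArrayOutputStream sink = new ByteArrayOutputStream();
        // The accented letter is written as a Unicode escape so the source file encoding does not matter.
        writeGreeting(sink, "Andr\u00e9");
        System.out.println(sink.toByteArray().length + " bytes written");
    }
}
```

Because the writer does the encoding, no code has to inspect individual characters; the accented letter simply comes out as two UTF-8 bytes.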
 
Peter Cong
Ranch Hand
Posts: 44
I am using the Word XML document type: I create a Word document and save it in XML format. It is not a plain XML document.
In fact my code does look at each character and write each one out: if it finds a French character that needs converting, it converts it. For example, if it is "é", then I need to convert it to "ê", and so on. I think I am going about this the wrong way if there is no easy way to add a line of code to the header.
 
Paul Clapham
Marshal
Posts: 28226

Peter Cong wrote:I am using the Word XML document type: I create a Word document and save it in XML format. It is not a plain XML document.



I'm sorry, I don't understand that. It doesn't seem like a beginner topic to me. But I repeat, if you find yourself having to fiddle with each character then you're doing it wrong. If this "word" thing is Microsoft Word then it should be able to handle standard encodings, so I encourage you to find out how to do that.
 
Peter Cong
Ranch Hand
Posts: 44
I do not know why you are confused. My Word file is of the Word XML type: when you create a Word document you can choose the XML type, so my Word document is an XML document. You can see this when you open the file in Notepad++ and look at its content. Some French characters need to be encoded for the document to display properly, and I want to find an easy way to do that. That is all.
 
Paul Clapham
Marshal
Posts: 28226

Peter Cong wrote:I do not know why you are confused. My Word file is of the Word XML type: when you create a Word document you can choose the XML type, so my Word document is an XML document.



I'm confused because I have never heard of a "word xml type". I just created a Word document (this is Microsoft Word we're talking about, right?) a few minutes ago and I didn't get to choose "xml type". You seem to think it's obvious what you're talking about but I've been using Java and Word for many years and I don't know what you're talking about. But then aren't we talking about Java programming here? Are you using some code package which you haven't mentioned? Since you posted in Beginning Java I'm assuming you're writing some simple code without reference to third-party APIs or whatever. So I think it would help if you explained your problem. Perhaps posting some code would help so we aren't completely in the dark.
 
Marshal
Posts: 79239

Paul Clapham wrote: . . . It doesn't seem like a beginner topic to me. . . .

No, it isn't. Let's see if I can't move it … to the wrong place
 
Peter Cong
Ranch Hand
Posts: 44
If you do not know about Word XML documents, then you may not know the answer.
If you open Word 2007, write something, and save it, you get a choice under "Save as type". Click that dropdown and you should see the options "Word XML Document" and "Word 2003 XML Document". That is how you create a Word XML document.
I posted my question here because I am using Java code to write data into this Word XML document (used as a template), and that is where the encoding problem with French characters arises.
If you think this is not a beginner question, please go ahead and move it to another forum. Thanks.
 
Paul Clapham
Marshal
Posts: 28226
I guessed that was what you were talking about. But that's not a simple format, and you still haven't explained how you're producing it. Surely not by a BufferedWriter or anything simple like that. So since you're having problems with your code, why don't you show us a code sample? Preferably an SSCCE (<-- follow that link and read it).
 
Ranch Hand
Posts: 734
@Peter Cong
As a stop-gap, have you tried re-encoding the data (text) from the system's encoding or ISO-8859-1 to UTF-8, which I guess is the encoding expected in the docx?
 
Bartender
Posts: 1166
Very, very wrong! This says: extract the bytes of the String using the platform default encoding, then pretend those bytes are UTF-8 and create a new String from them. If the platform default character encoding is UTF-8 this is a null operation, and if it is not UTF-8 it may corrupt the string. In Java ALL Strings are Unicode, encoded as UTF-16. ALWAYS ALWAYS ALWAYS.

The proper approach is to set the correct character encoding on whatever reads the input in the first place, rather than trying to frig it after it has already been corrupted by being read with the wrong character encoding.
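
To make the difference concrete, here is a small sketch. The class and method names are hypothetical, and ISO-8859-1 stands in for "some non-UTF-8 encoding". The first method is the re-encoding trick, the second decodes the raw input bytes with the charset they were actually written in.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

class ReencodingSketch {

    /** The stop-gap: take the bytes in one charset and pretend they are UTF-8. */
    static String reinterpret(String s) {
        return new String(s.getBytes(StandardCharsets.ISO_8859_1), StandardCharsets.UTF_8);
    }

    /** The fix at the source: decode the raw bytes with the charset they were written in. */
    static String decode(byte[] raw, Charset actual) {
        return new String(raw, actual);
    }

    public static void main(String[] args) {
        String name = "Andr\u00e9";
        // The lone byte E9 is not valid UTF-8, so the accented letter is lost.
        System.out.println("unchanged after reinterpret: " + reinterpret(name).equals(name));

        byte[] latin1 = {0x41, 0x6E, 0x64, 0x72, (byte) 0xE9};
        System.out.println("decoded correctly: " + decode(latin1, StandardCharsets.ISO_8859_1).equals(name));
    }
}
```

Once the bytes have been decoded with the wrong charset the damage is already in the String, which is why the charset belongs on the reader.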
 
g tsuji
Ranch Hand
Posts: 734
In this concrete case, I don't even know what is actually being represented; we are only told that the result is wrongly encoded. My suggestion was an attempt to work backward. A try is a try: the system won't explode and a trial takes two minutes to run. In this case I would also run many other tests first to establish the extent of the problem, and would expect to hit a great many compile-time and runtime errors along the way.

If that is fundamentally wrong (and yes, I know there are many situations where the mapping goes into disarray and warnings are emitted), then don't use it, even for quick debugging. Sorry for taking your time.
 
Paul Clapham
Marshal
Posts: 28226

g tsuji wrote:In this concrete case, I don't even know what is actually being represented; we are only told that the result is wrongly encoded...



Yes, this (for us) is the basic problem. We haven't seen any code so we have no idea what we are dealing with. The guess that it's the input which is being mangled by using the wrong encoding -- yes, that sounds likely to me. But until the OP starts contributing to the dialog we aren't going to get any farther than that.
 
Peter Cong
Ranch Hand
Posts: 44
Thank you for all the input, and sorry for the delayed response. Here is the code, which should help you understand my issue:

The Word XML template has many placeholders such as ##UserName##; UserName is replaced with the data from the reader. It works, except that some French characters such as "é" are not converted properly, so I created another method, XmlEncode, with this code:

The above code works, but I do not think it is a good approach, since there are many French characters, so I am looking for a better way to achieve this.

Please let me know if you have a better solution. Thanks a lot.
 
Paul Clapham
Marshal
Posts: 28226

Peter Cong wrote:The Word XML template has many placeholders such as ##UserName##; UserName is replaced with the data from the reader. It works, except that some French characters such as "é" are not converted properly, so I created another method, XmlEncode, with this code...



But this sounds like it's this "word.xml template" thing which has the problem. Perhaps you should fix that instead. Or perhaps you're writing to it with the wrong encoding -- you could try using UTF-8 as the encoding instead:



Remember that we know absolutely nothing about this template thing, either, so whatever we say about it is also going to be guesswork. It wouldn't hurt for you to ask the people who produced the template what encoding they need.
 
Paul Clapham
Marshal
Posts: 28226
It's also worth pointing out that we have no idea what "not converted properly" means. If nobody has already linked to our FAQ entry TellTheDetails (<-- link) it would be worthwhile for you to read that and then give us a better description of the problem.
 
Peter Cong
Ranch Hand
Posts: 44
If you read my first post, you should see it.
For example, "André" is converted to something like "Andr#&3" when I view the file in Notepad++. This error is what prevents the Word XML document from being opened, but I can see it in the Notepad++ editor.
 
Paul Clapham
Marshal
Posts: 28226
Your first post doesn't say that. And telling us that you see "something like this" is unhelpful. What would be helpful would be for you to provide a precise and detailed description of the problem. If you didn't read that FAQ I linked to, please do that before posting.
 
Peter Cong
Ranch Hand
Posts: 44
In this case, "something like this" should be enough if you know how to resolve the problem, since different French characters are converted to different codes.
OK, if you really need to know the exact error, I reran my code and here it is: the word "André" is converted to "AndrxE9" when viewed in Notepad++.
 
Paul Clapham
Marshal
Posts: 28226
Is there a reason you don't want to provide information about this problem? Remember that you are not in a position to judge what is sufficient information for somebody else to diagnose a problem.

Here's what we know so far:

You write some data to a file. It's processed by some unknown template-handling system. (Or perhaps it isn't -- you have told us nothing about that.) When you load the results into Notepad++ you get data which is not what you expected to see, but neither is it invalid data which would prevent it from being processed successfully as a Word document.
 
Peter Cong
Ranch Hand
Posts: 44
I did provide detailed source code in my previous post; if you read it carefully, I think it contains enough information to resolve the problem.
Since you do not know anything about Word XML files, I do not think you can help me. Anyway, thanks for your response.
It is very simple, so let me repeat it.
I created a Word XML file as a template. It holds placeholders such as ##UserName##, and UserName is replaced by the Java code I posted above. The issue is that French characters are not converted properly. My code above works around that, but I am looking for a better solution than the code I posted. Combined with my previous post, I hope this makes it clear.
 
Paul Clapham
Marshal
Posts: 28226

Peter Cong wrote:I did provide detailed source code in my previous post; if you read it carefully, I think it contains enough information to resolve the problem.



And yet you have not resolved the problem. Which means you're wrong there. However I'm perfectly happy to leave your problem as is.
 
g tsuji
Ranch Hand
Posts: 734
I would check the general settings, such as:

[1] What encoding are the source file(s) saved in?
[2] What is the value of the system property "file.encoding"?
[3] Is the -encoding switch applied when compiling with javac?
etc... and then
[4] You can also set up your ht in the code instead of drawing it from somewhere else (from the db, you said). See how it behaves.

If these all agree, I don't think you have to replace each character by its numeric entity as you do in XmlEncode().

Also, to mention in passing,
[5] docx is not really a text file, properly speaking. It is a zip file, is it not?! There are frameworks that do the job properly, such as docx4j. Have you looked into that?
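
Checks [2] and [4] can be run from a few lines of Java. This is only a sketch with hypothetical names; it prints the JVM's view of the default encoding and renders a hard-coded test value as code points, so that neither the console nor the editor encoding can blur what the String really contains.

```java
import java.nio.charset.Charset;

class EncodingDiagnosticsSketch {

    /** Renders each character of 's' as a hex code point, separated by spaces. */
    static String codePoints(String s) {
        StringBuilder sb = new StringBuilder();
        s.codePoints().forEach(cp -> sb.append(String.format("U+%04X ", cp)));
        return sb.toString().trim();
    }

    public static void main(String[] args) {
        System.out.println("file.encoding   = " + System.getProperty("file.encoding"));
        System.out.println("default charset = " + Charset.defaultCharset());
        // Hard-coded test value; the accent is a Unicode escape, so the javac -encoding setting cannot affect it.
        System.out.println(codePoints("Andr\u00e9"));
    }
}
```

If the value read from the database does not end in U+00E9 while the hard-coded one does, the damage happens on the input side rather than in the writer.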
 
Peter Cong
Ranch Hand
Posts: 44
Hi, thanks for your help again. Your comments are in the right direction.
Here is the original resource I am using: http://dev-notes.com/code.php?q=10
If you can open this URL, you should see exactly what it is.
Basically, my Java web project needs to create Word document letters with a specific format. Some data in the letters are variables populated from a database, so I think this example is very close to my requirement, which is why I want to use it with minor changes.
It uses the Word XML document type and works perfectly for populating data into the word_template.xml template.
If you copy the code to your machine and run it, it works perfectly. If you then change some data to "André", which contains the French character "é", you can no longer open the Word file because the é is encoded incorrectly. To see the error I mentioned, you have to open the file in the Notepad++ editor.
docx4j is a third-party tool, and I do not want to use it if this code works.

I hope this resource helps you understand my requirements. Thanks again.
 
g tsuji
Ranch Hand
Posts: 734
Here is what you can test yourself.

[a] With an editor such as the Notepad++ you are using, which should support UTF-8, save your source code msWordUtils.java in UTF-8.
[b] Save word_template.xml in UTF-8, in agreement with its prolog. As it is all ASCII, you will not notice anything special and you do not need to do anything special.
[c] Compile with the switch -encoding utf8.

[c.1] As there are unchecked operations, the compiler also suggests adding -Xlint:unchecked to see the warnings if needed.
[d] In the source code, you have a couple of changes to make.
[d.1] You want to test CUSTOMERNAME, so you do this, obviously. Note that the source is in UTF-8. If you look at it in a hex editor, the é shows up as C3 A9 to confirm.

[d.2] Do nothing on the reader lines (nothing is needed, as word_template.xml is all ASCII in this case). If word_template.xml is not ASCII but contains genuine non-ASCII UTF-8 text, set the reader up in the same fashion as the writer below.
[d.3] Set up the writer in a slightly more elaborate fashion. Elaborate further if you think appropriate.

[d.4] Since the writer now does not support write.newLine(), do it in an alternative way. (Make it more economical yourself by not calling System.getProperty() each time.)

[e] Since I claim you do not need XmlEncode(), it is retired.

That's about it.
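
A minimal sketch of what steps [d.2] to [d.4] could look like. The class and method names are hypothetical and this is not the code from the dev-notes example; it assumes the template really is UTF-8 and that the substituted values contain no XML markup characters.

```java
import java.io.BufferedReader;
import java.io.BufferedWriter;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.OutputStream;
import java.io.OutputStreamWriter;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.Map;

class TemplateFillSketch {

    /** Copies the template to 'out', replacing each ##KEY## placeholder with its value. */
    static void fill(InputStream template, Map<String, String> values, OutputStream out) {
        // Both sides name UTF-8 explicitly, so static template text survives as well as the inserted data.
        try (BufferedReader reader = new BufferedReader(
                     new InputStreamReader(template, StandardCharsets.UTF_8));
             BufferedWriter writer = new BufferedWriter(
                     new OutputStreamWriter(out, StandardCharsets.UTF_8))) {
            String line;
            while ((line = reader.readLine()) != null) {
                for (Map.Entry<String, String> entry : values.entrySet()) {
                    // String.replace substitutes literally, so no regex escaping is needed.
                    line = line.replace("##" + entry.getKey() + "##", entry.getValue());
                }
                writer.write(line);
                writer.newLine();
            }
        } catch (IOException ex) {
            throw new UncheckedIOException(ex);
        }
    }
}
```

Wrapping the OutputStreamWriter in a BufferedWriter keeps newLine() available, which is one alternative to looking up the line separator by hand. Reading the template with an explicit charset matters as soon as it contains non-ASCII static text.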
 
g tsuji
Ranch Hand
Posts: 734
Just a final note.
[f] If you treat XML as a text file, it is worth knowing precisely what freedom XML grants an author to vary a document while it is still considered semantically equivalent. A plain text file normally does not have that freedom. Hence it is always preferable to process it with an XML parser of some kind. In this case we may be able to get away with it, since it is hard to imagine a placeholder being broken across two or more lines. But that needs to be built into the rules for authoring a template of this kind.
 
Peter Cong
Ranch Hand
Posts: 44
Thanks a lot for the detailed code. I will verify it later and let you know the result.
 
Peter Cong
Ranch Hand
Posts: 44
Regarding your final note [f]: this was actually my first concern when I started using this approach, and it will decide whether I use this sample code or not.
Currently we use Lotus Notes to produce this letter. The letter is produced from a real Word .doc (or .docx) document, NOT a Word XML document, and it works fine. But our company wants to retire Lotus Notes, so I need to find a replacement. Since the main language of this big web project is Java, I am looking for a Java solution that does not need any other platform.
With this approach, and after some testing, the letter can be generated, but the file type is .xml, not .doc or .docx anymore. In look and feel, .doc and Word XML are the same, but I want to know whether there is any impact. For example, users are familiar with the .doc and .docx document types and mostly save their Word documents as .doc. If they receive this Word XML document, make some changes, and save it as .doc, does the content change in any way?
Or is there any other impact?
The benefit of using this example rather than the docx4j tool is that it is simpler, but if there is any big impact on my business scenarios, I may not use it. Please advise. Thanks a lot again.
 
Peter Cong
Ranch Hand
Posts: 44
Hi g tsuji, sorry for the late response, I was busy with some other projects...
It seems your approach is a correct solution, but I have a problem with your code...

I forgot to mention that the Java web project which uses this code runs on the Eclipse platform, with JBoss, Seam, JSF and Maven. It is not a simple Java project whose files can be saved in another format.
In your post, you mentioned: "[a] With editor as you use notepad++ which should support utf-8, save your source code msWordUtils.java in utf-8."

Since I cannot re-save my Java source code in UTF-8 format, I just applied your code from d.3 and d.4 to my Java code and compiled it in Eclipse. It is not working properly. This is my result:
The data fields with placeholders containing "é" or "É", and the other fields, seem to work, but not always. It also affects some static text in the ms.xml template, which gets converted to another form. For example, "Téléphone" is static text in the template, not a placeholder with ##, and it is converted to "Téléphone". Since it is not a placeholder, it should not need converting; I understand that because I used your UTF-8 code, all the text in the ms.xml template is converted. I also have some other examples: "«Property»" is converted to "«Property»)", which is also wrong, and my page footer ".../2" is converted to "…/2", etc.

My encoding solution posted above works in most cases, but since I use "&#..." to convert French characters, it does not work if there is an "&" in the ms.xml template: in that case the "&" is converted to "&", which then breaks all the other data fields that use &#..., producing a wrong conversion.


Please let me know if there are any other alternative solutions.
Thanks a lot again for your help,
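
One possible way around the "&" problem described above, offered as a sketch rather than a tested fix: escape each substitution value exactly once, in a single pass, before it is inserted, and never run the escaping over the template text itself. The class XmlValueEscapeSketch and its escape method are hypothetical and are not the XmlEncode method from this thread. Emitting numeric character references for non-ASCII characters is an optional choice here; with a correctly written UTF-8 file only the markup characters strictly need escaping.

```java
class XmlValueEscapeSketch {

    /** Escapes one substitution value for use as XML text or attribute content. */
    static String escape(String value) {
        StringBuilder sb = new StringBuilder(value.length() + 16);
        value.codePoints().forEach(cp -> {
            switch (cp) {
                case '&': sb.append("&amp;"); break;
                case '<': sb.append("&lt;"); break;
                case '>': sb.append("&gt;"); break;
                case '"': sb.append("&quot;"); break;
                default:
                    if (cp < 0x80) {
                        // Plain ASCII is copied through unchanged.
                        sb.appendCodePoint(cp);
                    } else {
                        // Any non-ASCII character becomes a decimal character reference.
                        sb.append("&#").append(cp).append(';');
                    }
            }
        });
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(escape("Andr\u00e9 & Co"));
    }
}
```

Because every character is handled in one pass, an "&" in the data can no longer collide with the "&#" of a reference that was generated a moment earlier, and no table of French characters is needed. Control characters that XML 1.0 forbids are not handled in this sketch.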
 
Paul Clapham
Marshal
Posts: 28226

Peter Cong wrote:... the Java web project which uses this code runs on the Eclipse platform... Since I cannot re-save my Java source code in UTF-8 format...



Sure you can tell Eclipse to use UTF-8 for your Java source code. Window -> Preferences -> General -> Workspace : Text file encoding.

I strongly recommend you do that.
 
Peter Cong
Ranch Hand
Posts: 44
Thanks for your quick response. This is a big Java web application, and I am wondering whether changing that setting would affect existing Java code written by other developers.
Any idea what happens if I do that? I do not want to break the existing code.

Thanks