
# Uploaded files have characters changed !!!

Derek Clarkson
Hi all, I'm not sure if this is the right forum for this. I'm trying to upload a file containing Unicode text to a Tomcat server. On my local system (Windows XP, Tomcat 4.1.29) it works just fine, but when I install it on a test server (Linux, Tomcat 4.1.19), some of the characters in the upload are changed. See below for an example. Can anyone explain this? It's got me stumped ;-(

Local system (Windows XP, Tomcat 4.1.29), from the Tomcat log (correct):
```
Cleaner.showUnicode[194]: B unicode = \u0042
Cleaner.showUnicode[194]: a unicode = \u0061
Cleaner.showUnicode[194]: r unicode = \u0072
Cleaner.showUnicode[194]: r unicode = \u0072
Cleaner.showUnicode[194]: i unicode = \u0069
Cleaner.showUnicode[194]: unicode = \u0020
Cleaner.showUnicode[194]: G unicode = \u0047
Cleaner.showUnicode[194]: ò unicode = \u00f2
Cleaner.showUnicode[194]: t unicode = \u0074
Cleaner.showUnicode[194]: i unicode = \u0069
Cleaner.showUnicode[194]: c unicode = \u0063
Cleaner.cleanText[170]: text = Barri Gòtic
```

Test server (Linux, Apache, Tomcat 4.1.29), from the Tomcat log (incorrect):
```
Cleaner.showUnicode[194]: B unicode = \u0042
Cleaner.showUnicode[194]: a unicode = \u0061
Cleaner.showUnicode[194]: r unicode = \u0072
Cleaner.showUnicode[194]: r unicode = \u0072
Cleaner.showUnicode[194]: i unicode = \u0069
Cleaner.showUnicode[194]: unicode = \u0020
Cleaner.showUnicode[194]: G unicode = \u0047
Cleaner.showUnicode[194]: � unicode = \ufffd
Cleaner.showUnicode[194]: t unicode = \u0074
Cleaner.showUnicode[194]: i unicode = \u0069
Cleaner.showUnicode[194]: c unicode = \u0063
Cleaner.cleanText[170]: text = Barri G�tic
```
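
The two logs differ at a single code unit: 0x00f2 locally, 0xfffd on Linux. The Cleaner class itself is not shown in the thread, so the helper below is only a hypothetical sketch of a dump in the same "char unicode = escape" format; `UnicodeDump`, `escape` and `showUnicode` are made-up names. Calling such a dump right after the upload is read, and again before the text goes anywhere else, helps pin down which step introduces the U+FFFD.

```java
/**
 * Hypothetical stand-in for the thread's Cleaner.showUnicode (whose source
 * is not shown): prints each UTF-16 code unit next to its Java-style escape.
 */
public class UnicodeDump {

    /** Returns a six-character escape: backslash, 'u', four lowercase hex digits. */
    public static String escape(char c) {
        // %04x zero-pads the numeric value of the char to four hex digits.
        return String.format("\\u%04x", (int) c);
    }

    /** Prints one "char unicode = escape" line per code unit, like the logs above. */
    public static void showUnicode(String text) {
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            System.out.println(c + " unicode = " + escape(c));
        }
    }

    public static void main(String[] args) {
        // The literal uses an escape so the .java file's own encoding cannot interfere.
        showUnicode("Barri G\u00f2tic");
    }
}
```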

Ulf Dittmer
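The byte 0xF2 stands for ò (U+00F2) only in ISO-8859-1 (and in its Windows superset, windows-1252). Apparently, that's the encoding your local Tomcat runs with. The Linux server probably uses something else, such as UTF-8 or plain ASCII (the JVM default under a "C"/POSIX locale). In both of those, a lone 0xF2 byte is invalid, so Java decodes it to the replacement character U+FFFD, which is exactly what your server log shows.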
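
A minimal sketch of that byte-level difference, assuming (the thread does not say) that the file was saved on Windows as ISO-8859-1/windows-1252 and that the Linux JVM decodes it as UTF-8. It decodes the single byte 0xF2 with both charsets; `CharsetDemo` and `firstCodeUnit` are made-up names for the example.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class CharsetDemo {

    /** Decodes the bytes with the given charset and returns the first UTF-16 code unit. */
    public static int firstCodeUnit(byte[] bytes, Charset cs) {
        // new String(byte[], Charset) silently replaces malformed input with
        // the charset's replacement string, which is U+FFFD for UTF-8.
        return new String(bytes, cs).charAt(0);
    }

    public static void main(String[] args) {
        byte[] upload = { (byte) 0xF2 };  // o-grave as saved by a Latin-1 / windows-1252 editor

        // ISO-8859-1 maps each byte 0x00-0xFF to the code point with the same value.
        System.out.printf("ISO-8859-1: U+%04X%n", firstCodeUnit(upload, StandardCharsets.ISO_8859_1));

        // In UTF-8, 0xF2 may only start a 4-byte sequence; on its own it is malformed.
        System.out.printf("UTF-8:      U+%04X%n", firstCodeUnit(upload, StandardCharsets.UTF_8));

        // For comparison: the same character encoded as UTF-8 takes two bytes.
        byte[] utf8 = "\u00f2".getBytes(StandardCharsets.UTF_8);
        System.out.printf("UTF-8 bytes of U+00F2: %02X %02X%n", utf8[0], utf8[1]);
    }
}
```

Run as is, it prints U+00F2 for ISO-8859-1, U+FFFD for UTF-8, and C3 B2 as the UTF-8 bytes of ò, which mirrors the two logs.
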
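1) Check which default encoding the server JVM runs with (the `file.encoding` system property; it can also be set in the shell scripts that start Tomcat, e.g. with `-Dfile.encoding=...` in JAVA_OPTS), and adjust it if it's not ISO-8859-1. UTF-8 would also work, but only if the uploaded files are saved as UTF-8 too: Unicode's first 256 code points match ISO-8859-1, but UTF-8 encodes everything above 127 as two or more bytes, so the two encodings are byte-for-byte identical only for ASCII.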
2) Make sure all JSP pages are generated with the same encoding (set with the page directive's `pageEncoding` and `contentType` attributes; if they are omitted, standard-syntax JSP pages default to ISO-8859-1).
3) If the characters go anywhere outside Tomcat on the server (file system, database), those steps need to use the same encoding as well.
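
Step 1 can be checked from inside the webapp, and the dependency on the server default can be removed altogether by decoding the upload with an explicit charset. A sketch, assuming the upload is available as an `InputStream` (how the original code obtains it is not shown) and that clients save files as ISO-8859-1; `UploadDecoding`, `printDefaults` and `readUpload` are hypothetical names. It uses Java 7+ APIs; on a Tomcat 4.1-era JVM the charset would be passed by name instead, e.g. `new InputStreamReader(in, "ISO-8859-1")`.

```java
import java.io.BufferedReader;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class UploadDecoding {

    /** Step 1: report what this JVM uses when no charset is given explicitly. */
    public static void printDefaults() {
        System.out.println("file.encoding   = " + System.getProperty("file.encoding"));
        System.out.println("default charset = " + Charset.defaultCharset());
    }

    /**
     * Reads an uploaded text stream with an explicit charset instead of the
     * platform default, so Windows and Linux decode the same bytes identically.
     * Assumption: cs must match the encoding the client saved the file in.
     */
    public static String readUpload(InputStream in, Charset cs) throws IOException {
        StringBuilder sb = new StringBuilder();
        try (BufferedReader reader = new BufferedReader(new InputStreamReader(in, cs))) {
            char[] buf = new char[4096];
            int n;
            while ((n = reader.read(buf)) != -1) {
                sb.append(buf, 0, n);
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) throws IOException {
        printDefaults();
        // Simulated upload: "Barri Gotic" with o-grave saved as ISO-8859-1 (single byte 0xF2).
        byte[] body = "Barri G\u00f2tic".getBytes(StandardCharsets.ISO_8859_1);
        String text = readUpload(new ByteArrayInputStream(body), StandardCharsets.ISO_8859_1);
        System.out.println(text.equals("Barri G\u00f2tic"));  // true regardless of the platform default
    }
}
```

Since Java 18 the default charset is UTF-8 on every platform, so code that relies on the default would decode a Latin-1 file exactly as the Linux server did here; passing the charset explicitly avoids that.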