Unable to read special characters (OS-specific)

 
Greenhorn
Posts: 2
Hi,

I am using the XMLReader and IPTCEntry classes to read XML files and images, and then storing the required data in the database. The data is extracted properly, without any issues, on Windows, but surprisingly, on Fedora Core and CentOS the special characters cannot be read.

The copyright symbol, i.e. '©', is extracted as '?' on CentOS and as a run of garbled characters on Fedora Core 4.

The code is fine, because I am deploying the same WAR on all the different machines.
Java is a platform-independent language, so it should fetch the same symbol irrespective of the operating system.

Along with ©, the curly double quotes (as copied from MS Word) are also replaced by '?' on CentOS.

Is this caused by some setting that prevents CentOS from reading those characters, or is it a Java bug?

Any help is most welcome.
 
author and iconoclast
Posts: 24207
Hi,

Welcome to JavaRanch!

There are a couple of places where things could be falling down, but none of them are Java bugs. First, an XML file should specify an encoding in the XML header which states how the characters are represented in the file. Make sure that encoding matches the actual content of the XML.
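To illustrate the first point, here is a minimal sketch; the class name, the `notice` element and the sample document are made up for illustration. It uses DOM for brevity, but a SAX `XMLReader` fed an `InputSource` follows the same rule: given a byte stream, the parser takes the encoding from the XML declaration; given a `Reader`, it ignores the declaration and trusts whatever charset the `Reader` used, so a wrong charset there mangles © before the parser ever sees it.

```java
import java.io.ByteArrayInputStream;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

public class XmlEncodingDemo {
    // Hypothetical sample document; it declares UTF-8 and is encoded as UTF-8 in main().
    static final String XML =
        "<?xml version=\"1.0\" encoding=\"UTF-8\"?><notice>\u00A9 2008</notice>";

    // Parses the source and returns the text content of the root element.
    static String parse(InputSource source) {
        try {
            DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
            Document doc = builder.parse(source);
            return doc.getDocumentElement().getTextContent();
        } catch (Exception e) {
            throw new IllegalStateException("XML parsing failed", e);
        }
    }

    // Byte stream: the parser takes the encoding from the XML declaration.
    static String parseFromBytes(byte[] data) {
        return parse(new InputSource(new ByteArrayInputStream(data)));
    }

    // Character stream: the parser ignores the declaration and trusts the Reader,
    // which here wrongly decodes the UTF-8 bytes as ISO-8859-1.
    static String parseWithWrongReader(byte[] data) {
        InputStreamReader reader =
            new InputStreamReader(new ByteArrayInputStream(data), StandardCharsets.ISO_8859_1);
        return parse(new InputSource(reader));
    }

    public static void main(String[] args) {
        byte[] utf8 = XML.getBytes(StandardCharsets.UTF_8);
        System.out.println(parseFromBytes(utf8).equals("\u00A9 2008"));             // true
        System.out.println(parseWithWrongReader(utf8).equals("\u00C2\u00A9 2008")); // true: U+00C2 U+00A9
    }
}
```

The second case is the classic "Â©" symptom: each of the two UTF-8 bytes of © has been turned into its own character.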

Second, note that even if Java has read the proper character, when you try to print out your special characters, the terminal or GUI window may not know how to represent them. Again, this can either be an encoding issue -- the terminal's default encoding doesn't contain representations for the characters -- or something more primitive, like a terminal that only displays ASCII.
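The second point can be seen without a terminal at all. This sketch (class and method names are illustrative) encodes © the way a `PrintStream` such as `System.out` would: with UTF-8 it becomes the bytes c2 a9, but with a charset that has no copyright sign, such as US-ASCII, the encoder silently substitutes '?'. On older JDKs, a Linux JVM started under the plain C/POSIX locale has often picked US-ASCII as its default, which would produce exactly this '?'; treat that as an assumption to check on your own machines.

```java
import java.io.ByteArrayOutputStream;
import java.io.PrintStream;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

public class TerminalEncodingDemo {
    // Returns the bytes a PrintStream using the given charset emits for the text
    // (the PrintStream(OutputStream, boolean, Charset) constructor needs Java 10+).
    static byte[] printedBytes(String text, Charset charset) {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        PrintStream out = new PrintStream(buffer, true, charset);
        out.print(text);
        out.flush();
        return buffer.toByteArray();
    }

    public static void main(String[] args) {
        String copyright = "\u00A9";
        // US-ASCII has no copyright sign, so the encoder substitutes '?' (byte 63).
        System.out.println(Arrays.toString(printedBytes(copyright, StandardCharsets.US_ASCII))); // [63]
        // UTF-8 encodes it as 0xC2 0xA9, shown here as signed Java bytes.
        System.out.println(Arrays.toString(printedBytes(copyright, StandardCharsets.UTF_8)));    // [-62, -87]

        // Forces UTF-8 on standard output; the terminal must also expect UTF-8 to display it.
        PrintStream utf8Out = new PrintStream(System.out, true, StandardCharsets.UTF_8);
        utf8Out.println(copyright);
    }
}
```

Forcing UTF-8 on the Java side only solves half the problem: whatever displays the bytes (terminal, log viewer, editor) has to decode them as UTF-8 too.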

You might test your Linux terminal's capabilities by just using "cat" or "less" to display the XML file itself, looking for those special characters. Alternatively, you could capture the Java output in a file, and edit that file with an editor that lets you see the actual bytes in the file, checking to see that in fact Java is emitting the right ones.
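A sketch of the capture-to-a-file approach, assuming you can call your own extraction code from a small test class: write the text through a `Writer` with an explicit charset rather than the platform default, then read the raw bytes back. The file name `copyright-test.txt` is hypothetical. If Java has the right character, the UTF-8 output contains the two bytes c2 a9 for ©.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.io.Writer;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

public class CaptureToFile {
    // Writes the text with an explicit UTF-8 encoding, then returns the raw bytes on disk.
    static byte[] writeAndReadBack(Path file, String text) {
        try {
            try (Writer writer = Files.newBufferedWriter(file, StandardCharsets.UTF_8)) {
                writer.write(text);
            }
            return Files.readAllBytes(file);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Formats bytes as space-separated two-digit lowercase hex.
    static String toHex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            if (sb.length() > 0) {
                sb.append(' ');
            }
            sb.append(String.format("%02x", b & 0xff));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        Path file = Paths.get("copyright-test.txt"); // hypothetical output file
        System.out.println(toHex(writeAndReadBack(file, "\u00A9"))); // c2 a9
    }
}
```

Replacing the literal with the string your XML code actually extracted tells you whether the character was already wrong before it reached the terminal.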
 
Ashutosh Devbrat
Hi Ernest,

Thanks for the reply.
I have already tried those things. My XML files are UTF-8 encoded.
Apart from that, the character encoding of my XML reader is also UTF-8.
I am printing the output only for testing purposes. In the application,
the extracted data goes directly into the database (MySQL), which is capable of storing and displaying special characters.

Even after all these things the issue remains.

Even if I print a special character with a simple System.out.println("©"), it appears as '?' on the Linux machine.
Now the question remains whether catalina.out (the Tomcat log) is able to display special characters at all. I tried to open it in different editors, and even tried changing the encoding, but the output remained '?'.
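One way to check what the JVM on the Linux box is actually doing is a tiny diagnostic class like this sketch (the class name is made up). It prints the default charset and the relevant system properties, and lists the code points of a string instead of relying on the console to render it. Using a Unicode escape instead of a literal © in the source also rules out `javac` having decoded the source file with a different encoding than it was saved in. Comparing the output on Windows and on the Linux machines should show whether the difference lies in the JVM's encoding configuration rather than in the parsing.

```java
import java.nio.charset.Charset;

public class EncodingDiagnostics {
    // Lists the Unicode code points of a string (e.g. "U+00A9"), so the result
    // does not depend on whether the console can render the characters.
    static String codePoints(String s) {
        StringBuilder sb = new StringBuilder();
        s.codePoints().forEach(cp -> {
            if (sb.length() > 0) {
                sb.append(' ');
            }
            sb.append(String.format("U+%04X", cp));
        });
        return sb.toString();
    }

    public static void main(String[] args) {
        // Charset used by readers, writers and getBytes() when none is given explicitly.
        System.out.println("default charset: " + Charset.defaultCharset());
        System.out.println("file.encoding:   " + System.getProperty("file.encoding"));
        // Set by recent JDKs (19+) for System.out; prints null on older ones.
        System.out.println("stdout.encoding: " + System.getProperty("stdout.encoding"));
        // A Unicode escape avoids depending on how javac decoded the source file.
        System.out.println("code points:     " + codePoints("\u00A9"));
    }
}
```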

So all I can conclude from this is that Java was not able to extract/read the © when hosted in the CentOS environment, which is surprising considering that Java is platform independent.

I hope I have made my point clear.

Once again, Thanks for showing interest.

Thanks
 
Ernest Friedman-Hill
No, not really; all you've shown is that your terminal doesn't know how to display the copyright character. Since you're on Linux, you have the 'od' program ("octal dump") which can show you the actual bytes in a file. Process your XML so that the "bad" character goes from Java directly to a file, being sure that you open the file with an encoding that can handle the character (or that your platform default can handle it.) Then examine the file with a command like

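od -t x1 output.txt | less

(substituting the name of the file your program wrote for output.txt; without a file name, od reads standard input instead)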

This will show your file as pages of hexadecimal bytes; you can look for the proper bytes for the copyright character, which in UTF-8 are c2 a9 (in ISO-8859-1, © is the single byte a9). It would help if the output file is as short as possible, of course! Each line of od output starts with a byte offset (in octal), and each number to the right of it is one byte from the file, as a hexadecimal number.
 