
Extended characters encoding in Java

 
Vinutha Harishankar
Greenhorn
Posts: 7
Hi,

I have to import a file that contains extended characters into my application.
I need to display the file contents in a JSP file.
While I am importing the file, the character set used is UTF-8.

But when the contents are displayed on a JSP page after the import, the extended characters are still corrupted.

Can anybody give some Java code showing how to achieve this,
i.e. display the extended characters properly on the JSP page without them getting corrupted?

I have attached the file that I am trying to import and display.
 
Ulf Dittmer
Rancher
Posts: 42972
Several questions/things to investigate come to mind:
  • Are you certain that the file is in UTF-8 encoding?
  • Between reading the file and streaming it to the browser, are you processing the text in any way?
  • How are you setting the content type of the JSP to UTF-8?
  • Does the browser have a font that includes the missing/corrupted characters?
  • What is the connection to XML (which this forum is about)?
    Vinutha Harishankar
    Greenhorn
    Posts: 7
    1) I am sorry, the file that I will try to import will be an external one.
    I will not have control over the file. It might be a txt, csv or Excel file.
    Therefore I am not sure whether that file will be in UTF-8 encoding.

    My concern here is to display the imported contents of that file on a JSP without the extended characters getting corrupted.
    2) Between reading the file and streaming it to the browser, the text is processed as follows:
    BufferedReader reader =
    new BufferedReader(new InputStreamReader(new FileInputStream(file), "UTF-8"));
    Hashtable data = parser.parse();
    3) The content of the page to be displayed is set as follows
    <meta content="text/html; charset=UTF-8">
    4) Yes, it shows as square boxes or question marks in the browser, as follows:
    Sw��t��g, Br���
    5) Sorry, this has nothing to do with XML. I have posted in the wrong forum.

     
    Ulf Dittmer
    Rancher
    Posts: 42972
    Therefore not sure if that file will be UTF-8 encoding

    new FileInputStream(file), "UTF-8"));

    That's the problem right there. You're telling the JVM that it is UTF-8, but at the same time you're not sure if it is. If it's not, the code is in trouble. These days, you can't meaningfully process text if you don't know its encoding. Therefore, you need to contact whoever provides those files and have them tell you which encoding they're in.
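To illustrate the mismatch described above, here is a minimal sketch. It assumes, purely for the demo, that the external file was saved as ISO-8859-1; the class name EncodingMismatchDemo and the helper decode are hypothetical. Decoding those bytes as UTF-8 produces the U+FFFD replacement character, which is what browsers typically render as boxes or question marks.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

class EncodingMismatchDemo {

    // Decodes raw bytes with the given charset; malformed input is replaced, not rejected.
    static String decode(byte[] bytes, Charset charset) {
        return new String(bytes, charset);
    }

    public static void main(String[] args) {
        // Accented sample text, written with Unicode escapes so that the
        // encoding of this source file does not matter.
        String original = "Br\u00ED\u00E1\u00F1";

        // Assumption for this demo: the file on disk is ISO-8859-1, not UTF-8.
        byte[] fileBytes = original.getBytes(StandardCharsets.ISO_8859_1);

        String wrong = decode(fileBytes, StandardCharsets.UTF_8);
        String right = decode(fileBytes, StandardCharsets.ISO_8859_1);

        System.out.println("decoded as UTF-8 contains U+FFFD: " + wrong.contains("\uFFFD"));
        System.out.println("decoded as ISO-8859-1 matches:    " + right.equals(original));
    }
}
```

Once the bytes have been decoded with the wrong charset, the original characters are lost, so no later step (the parser, the JSP, the meta tag) can repair them.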

    4) Yes, it shows as square boxes or question marks in the browser, as follows:
    Sw��t��g, Br���

    That doesn't prove anything about whether a suitable font is present. What kind of characters are we talking about - some simple accented characters in the 128-255 range that would be present in just about any font? Or something more special like Hindi (or Chinese, Japanese etc.) characters that most fonts don't have?
     
    Vinutha Harishankar
    Greenhorn
    Posts: 7
    No, it is just the extended characters that would be present in the file I am trying to import.
    The following are the characters that I want to import and display on the JSP:
    Bríáñ SwéétíÑg
     
    Ulf Dittmer
    Rancher
    Posts: 42972
    Those characters look pretty basic - any machine should have the required fonts installed. So it's most likely really just the issue of using the proper encoding when reading the file.
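One way to check that claim is to list the code points of the sample text and see whether they all fall below 256 (the Latin-1 range). This is only a sketch; the class name CodePointCheck and the method isLatin1 are made up for illustration.

```java
class CodePointCheck {

    // True if every code point is below 256, i.e. representable in ISO-8859-1 (Latin-1).
    static boolean isLatin1(String text) {
        return text.codePoints().allMatch(cp -> cp < 256);
    }

    public static void main(String[] args) {
        // The sample from the thread, written with Unicode escapes so that
        // the encoding of this source file does not matter.
        String sample = "Br\u00ED\u00E1\u00F1 Sw\u00E9\u00E9t\u00ED\u00D1g";

        // Print each non-ASCII code point in U+XXXX form.
        sample.codePoints()
              .filter(cp -> cp > 127)
              .forEach(cp -> System.out.printf("U+%04X%n", cp));

        System.out.println("all within Latin-1: " + isLatin1(sample));
    }
}
```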
     
    Vinutha Harishankar
    Greenhorn
    Posts: 7
    Hi,

    How can I detect the encoding of a file with Java?
    I need to import a file which may have UTF-8, Unicode, ANSI or some other kind of encoding.
    The code which I am using to read the file is as follows:
    BufferedReader reader =
    new BufferedReader(new InputStreamReader(new FileInputStream(file), "UTF-8"));
    Since I am using UTF-8 as the default encoding to read the file, if the file has some other encoding I am unable to display the contents properly on my JSP after the import.

    I am importing a file which has extended characters and special characters as its content.

    Can anybody please suggest some code to handle the different encodings of the file?
     
    Ulf Dittmer
    Rancher
    Posts: 42972
    Try the http://jchardet.sourceforge.net/ library for detecting the encoding of a file. Once you know the encoding, you can open the file with the code you showed, substituting the actual encoding for "UTF-8".
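If pulling in a detection library is not an option, a much cruder heuristic can be sketched with the JDK alone. Note that this is not jchardet and does not use its API: it only checks whether the bytes are strictly valid UTF-8 and otherwise falls back to a charset the caller chooses (ISO-8859-1 here is an assumption, not something the thread establishes). The class name CharsetGuess and its methods are hypothetical.

```java
import java.io.BufferedReader;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.Charset;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;

class CharsetGuess {

    // True if the bytes decode as UTF-8 with no malformed sequences (strict, no replacement).
    static boolean isValidUtf8(byte[] bytes) {
        try {
            StandardCharsets.UTF_8.newDecoder()
                    .onMalformedInput(CodingErrorAction.REPORT)
                    .onUnmappableCharacter(CodingErrorAction.REPORT)
                    .decode(ByteBuffer.wrap(bytes));
            return true;
        } catch (CharacterCodingException e) {
            return false;
        }
    }

    // Heuristic only: UTF-8 if the bytes decode cleanly, otherwise the caller's fallback.
    static Charset guess(byte[] bytes, Charset fallback) {
        return isValidUtf8(bytes) ? StandardCharsets.UTF_8 : fallback;
    }

    // Same reader chain as in the thread, with the guessed charset substituted for "UTF-8".
    static BufferedReader open(byte[] bytes, Charset fallback) {
        Charset charset = guess(bytes, fallback);
        return new BufferedReader(new InputStreamReader(new ByteArrayInputStream(bytes), charset));
    }

    public static void main(String[] args) throws IOException {
        if (args.length == 0) {
            System.err.println("usage: java CharsetGuess <file>");
            return;
        }
        byte[] bytes = Files.readAllBytes(Paths.get(args[0]));
        // Assumed fallback for this sketch; pick whatever your data sources usually send.
        try (BufferedReader reader = open(bytes, StandardCharsets.ISO_8859_1)) {
            String line;
            while ((line = reader.readLine()) != null) {
                System.out.println(line);
            }
        }
    }
}
```

The limits of this approach are worth keeping in mind: it cannot tell one single-byte encoding from another, it does not handle UTF-16 ("Unicode" in some editors), and plain ASCII passes as UTF-8. For anything beyond the UTF-8 versus one known fallback case, a detection library or asking the file's provider remains the more reliable route.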
     