
Help with reading text containing non-ASCII characters

 
Sara Ku
Greenhorn
Posts: 3
Hi guys,

I need some help with reading text containing non-ASCII characters from an Excel file.

For example, I want to be able to detect the "©" symbol in the text below, convert it into the Unicode escape sequence \u00A9, and store it in the database. The end consumer of this text is a web browser, so this conversion is needed.

Copyright © 2005-2009
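A minimal sketch of the detection step, assuming the text is already in a Java String: it lists the code point of every non-ASCII character it finds. `findNonAscii` is a hypothetical helper name. It also shows that Java already holds © in memory as the single character U+00A9, with no escaping required.

```java
import java.util.List;
import java.util.stream.Collectors;

public class NonAsciiDetector {

    // Returns the code point of every character above U+007F, formatted as "U+XXXX".
    static List<String> findNonAscii(String text) {
        return text.codePoints()                 // iterate code points, not chars
                   .filter(cp -> cp > 0x7F)      // ASCII ends at 0x7F
                   .mapToObj(cp -> String.format("U+%04X", cp))
                   .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        // In Java source, the escape \u00A9 and a literal © denote the same single char.
        String text = "Copyright \u00A9 2005-2009";
        System.out.println(findNonAscii(text)); // prints [U+00A9]
    }
}
```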

I have been trying different ideas to get this working, but I always end up with an unreadable character in place of the symbol.

Thanks in advance!

Sara

 
Paul Clapham
Sheriff
Posts: 21318
You need to read this article: Character Conversions from Browser to Database.
 
Sara Ku
Greenhorn
Posts: 3
Thanks, the article was a good read. But my question was more along the lines of which parameters to set (if any) when using I/O streams in Java to read the file. Right now my focus is on reading, storing, and retrieving the text correctly.
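If the Excel data is first exported to a text format such as CSV (an assumption: a binary .xls/.xlsx file cannot be decoded this way and needs a spreadsheet library), the "parameter" that matters when reading is the character encoding passed to the reader. A minimal sketch, assuming the export was saved as UTF-8; `CharsetAwareReader` and `readLines` are hypothetical names.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;

public class CharsetAwareReader {

    // Reads every line of a text file, decoding its bytes with the charset
    // the caller names explicitly instead of relying on a default.
    static List<String> readLines(Path file, Charset charset) throws IOException {
        List<String> lines = new ArrayList<>();
        try (BufferedReader reader = Files.newBufferedReader(file, charset)) {
            String line;
            while ((line = reader.readLine()) != null) {
                lines.add(line);
            }
        }
        return lines;
    }

    public static void main(String[] args) throws IOException {
        // Hypothetical input: a CSV exported from Excel and saved as UTF-8.
        Path file = Files.createTempFile("export", ".csv");
        Files.write(file, "Copyright \u00A9 2005-2009\n".getBytes(StandardCharsets.UTF_8));
        List<String> lines = readLines(file, StandardCharsets.UTF_8);
        System.out.println(lines.get(0).indexOf('\u00A9')); // prints 10
        Files.delete(file);
    }
}
```

The same parameter exists on the older `InputStreamReader(InputStream, Charset)` constructor. Leaving it out there makes Java use the JVM's default charset, which on older Java versions depends on the operating system, so the same file can decode differently on different machines.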
 
Paul Clapham
Sheriff
Posts: 21318
Sara Ku wrote: But my question was more along the lines of which parameters to set (if any) when using I/O streams in Java to read the file.


I don't understand what you mean by "parameters". And I thought you were reading an Excel document? Perhaps you could explain how you're doing that right now.

Also...

Sara Ku wrote: I want to be able to detect the "©" symbol in the text below, convert it into the Unicode escape sequence \u00A9, and store it in the database.


Don't do that. Java understands Unicode. Your database understands Unicode. Your web application understands Unicode and so do the browsers that use it. So just use Unicode characters as is. Converting them to something else is going to be wasteful and error-prone. Converting them to Java source code Unicode escapes is especially so.
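To illustrate why every extra conversion is a chance for error, here is a minimal sketch of two layers that disagree about the encoding. The UTF-8/ISO-8859-1 pair is an assumption chosen because it is a common mismatch; `mismatch` is a hypothetical helper. The © comes back as the two characters "Â©", a typical form of the "unreadable character" symptom.

```java
import java.nio.charset.StandardCharsets;

public class MojibakeDemo {

    // Encodes text with one charset and decodes it with another, which is what
    // happens when two layers of an application disagree about the encoding.
    static String mismatch(String text) {
        byte[] utf8 = text.getBytes(StandardCharsets.UTF_8);   // © becomes 0xC2 0xA9
        return new String(utf8, StandardCharsets.ISO_8859_1);  // each byte becomes its own char
    }

    public static void main(String[] args) {
        String original = "Copyright \u00A9 2005-2009";
        String garbled = mismatch(original);
        System.out.println(original.length()); // prints 21
        System.out.println(garbled.length());  // prints 22: one extra char, U+00C2 'Â'
    }
}
```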
 
Sara Ku
Greenhorn
Posts: 3
You are right, I was not very clear. I took a second look at my code and see that there is no issue with reading from the file. Something gets messed up in the process of storing the text in the database and retrieving it. I will look deeper.

Thanks for the help.
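One way to look deeper is to check whether the storage layer uses a charset that cannot hold ©. This minimal sketch does not talk to a real database; it simulates a hypothetical storage step by encoding and decoding with an assumed charset (`roundTrip` is a hypothetical name). `String.getBytes(Charset)` silently replaces characters the charset cannot represent, so a US-ASCII layer turns © into "?". With a real database, the text would normally go through `PreparedStatement.setString`, and the column and connection encoding would need to be a Unicode one; the exact setting is database- and driver-specific and not shown here.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class RoundTripCheck {

    // Simulates storing text in a layer that uses the given charset and reading it back.
    // Unmappable characters are replaced with the charset's replacement byte
    // ('?' for US-ASCII), so the loss happens without any exception.
    static String roundTrip(String text, Charset storageCharset) {
        return new String(text.getBytes(storageCharset), storageCharset);
    }

    public static void main(String[] args) {
        String original = "Copyright \u00A9 2005-2009";
        Charset[] candidates = {
            StandardCharsets.US_ASCII, StandardCharsets.ISO_8859_1, StandardCharsets.UTF_8
        };
        for (Charset cs : candidates) {
            String back = roundTrip(original, cs);
            System.out.println(cs + ": " + (back.equals(original) ? "intact" : "changed to " + back));
        }
    }
}
```

Running it reports the US-ASCII case as "changed to Copyright ? 2005-2009" while ISO-8859-1 and UTF-8 leave the text intact; ISO-8859-1 survives only because © happens to be in its repertoire, which is why a Unicode encoding is the safer choice end to end.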
 