
Problems Reading UTF-8 File

 
Mercurio Savedra
Greenhorn
Posts: 25
Hi guys, I need help clarifying an issue.

I am working with the SunFtpClient class in a project that involves
downloading file contents from an FTP server on a Unix machine. I created some
files in Notepad, wrote the content, and then saved them with UTF-8 encoding.

Next I transfer the files from my machine to the FTP server
in binary mode. Everything is OK up to here. But when I execute this
piece of code, kaboom, the problem appears. Let's review the code,
and then I'll describe the problem.



The content of Message.txt is:
One Two Three Four Five Six

The content of LocalMessage.txt is:
?One Two Three Four Five Six

Somebody could ask: what is the problem?

The problem is that although I pass UTF-8 to InputStreamReader as the conversion
encoding, the BOM bytes are not filtered out, and I suppose that the ? character in LocalMessage.txt is the result of those bytes. Why doesn't InputStreamReader converter = new InputStreamReader(ftp.get("Message.txt"), "UTF-8"); work properly?
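The hypothesis can be checked without an FTP server by decoding a Notepad-style file in memory. This is a minimal sketch that assumes the file starts with the UTF-8 BOM bytes EF BB BF; the names BomDemo, notepadUtf8 and decodeUtf8 are made up for illustration. Because the code that writes LocalMessage.txt is not shown, ISO-8859-1 stands in as an assumed example of an output charset that has no mapping for U+FEFF.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.Reader;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;

public class BomDemo {
    // Hypothetical helper: simulates a Notepad "UTF-8" file, i.e. BOM bytes EF BB BF followed by the text.
    static byte[] notepadUtf8(String text) {
        byte[] body = text.getBytes(StandardCharsets.UTF_8);
        byte[] out = new byte[body.length + 3];
        out[0] = (byte) 0xEF;
        out[1] = (byte) 0xBB;
        out[2] = (byte) 0xBF;
        System.arraycopy(body, 0, out, 3, body.length);
        return out;
    }

    // Decodes with a plain UTF-8 InputStreamReader, like the constructor call in the question.
    static String decodeUtf8(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        try (Reader r = new InputStreamReader(new ByteArrayInputStream(bytes), StandardCharsets.UTF_8)) {
            int c;
            while ((c = r.read()) != -1) {
                sb.append((char) c);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String decoded = decodeUtf8(notepadUtf8("One Two"));
        // The BOM is not dropped: it comes out of the reader as the character U+FEFF.
        System.out.printf("first char: U+%04X%n", (int) decoded.charAt(0)); // U+FEFF
        System.out.println(decoded.length()); // 8

        // Assumption: the local file is written in a charset that cannot represent U+FEFF.
        // String.getBytes(Charset) replaces unmappable characters with '?' for ISO-8859-1.
        byte[] written = decoded.getBytes(StandardCharsets.ISO_8859_1);
        System.out.println(new String(written, StandardCharsets.ISO_8859_1)); // ?One Two
    }
}
```

If this matches what happens on your side, the reader is decoding correctly; the extra character is real data in the file, and the ? is only how the output charset renders it.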

I would appreciate your comments.

Paul Clapham
Sheriff
Posts: 22374
You are correct. When Notepad writes a file in UTF-8 encoding, it puts the BOM (byte order mark) at the beginning of the file. This is unnecessary since byte ordering is unambiguous in an 8-bit encoding, but it does it anyway. So the BOM is there.
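To make the byte-level picture concrete: the BOM is the single character U+FEFF, and UTF-8 always encodes it as the same three bytes, EF BB BF. The minimal sketch below (class and method names are hypothetical) prints that encoding and checks whether a byte array begins with it.

```java
import java.nio.charset.StandardCharsets;

public class Utf8Bom {
    // UTF-8 encoding of the byte order mark U+FEFF.
    static final byte[] UTF8_BOM = {(byte) 0xEF, (byte) 0xBB, (byte) 0xBF};

    // Returns true if the data starts with the three UTF-8 BOM bytes.
    static boolean startsWithUtf8Bom(byte[] data) {
        if (data.length < UTF8_BOM.length) {
            return false;
        }
        for (int i = 0; i < UTF8_BOM.length; i++) {
            if (data[i] != UTF8_BOM[i]) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        byte[] encoded = String.valueOf((char) 0xFEFF).getBytes(StandardCharsets.UTF_8);
        for (byte b : encoded) {
            System.out.printf("%02X ", b & 0xFF); // prints EF BB BF
        }
        System.out.println();
        System.out.println(startsWithUtf8Bom(encoded)); // true
        System.out.println(startsWithUtf8Bom("One".getBytes(StandardCharsets.UTF_8))); // false
    }
}
```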

You would think that a Java Reader decoding from UTF-8 would notice that there's a BOM at the beginning of the file, since the UTF-8 specification says it may be there. But no, it doesn't: it passes the BOM through as the character U+FEFF. So it's up to you to read those three bytes (or that one character) and ignore them.
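One way to ignore it at the character level is sketched below, assuming the input is always UTF-8. It wraps the stream in a PushbackReader, reads the first character, and pushes it back unless it is U+FEFF. BomSkipper, utf8ReaderSkippingBom and decodeSkippingBom are hypothetical helpers, not part of any library.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.PushbackReader;
import java.io.Reader;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;

public class BomSkipper {
    // Wraps the stream in a UTF-8 reader and consumes a leading U+FEFF, if present.
    static Reader utf8ReaderSkippingBom(InputStream in) throws IOException {
        PushbackReader reader = new PushbackReader(
                new InputStreamReader(in, StandardCharsets.UTF_8), 1);
        int first = reader.read();
        if (first != -1 && first != 0xFEFF) {
            reader.unread(first); // not a BOM: give the character back to the caller
        }
        return reader;
    }

    // Convenience for in-memory data: decodes UTF-8 bytes without the BOM.
    static String decodeSkippingBom(byte[] data) {
        try (Reader r = utf8ReaderSkippingBom(new ByteArrayInputStream(data))) {
            StringBuilder sb = new StringBuilder();
            char[] buf = new char[1024];
            int n;
            while ((n = r.read(buf)) != -1) {
                sb.append(buf, 0, n);
            }
            return sb.toString();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        byte[] withBom = {(byte) 0xEF, (byte) 0xBB, (byte) 0xBF, 'O', 'n', 'e'};
        System.out.println(decodeSkippingBom(withBom)); // One
        System.out.println(decodeSkippingBom("One".getBytes(StandardCharsets.UTF_8))); // One
    }
}
```

In the code from the question this would replace the plain InputStreamReader, e.g. Reader converter = BomSkipper.utf8ReaderSkippingBom(ftp.get("Message.txt")); assuming ftp.get returns an InputStream, as the original constructor call implies. Files without a BOM pass through unchanged, so the same code handles both cases.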