
How to parse UTF-8 file with BOM

 
rajesh mohanty
Greenhorn
Posts: 2
Hello all,
Hope you can help with this. I have to parse a UTF-8 file that has BOM characters at the beginning and end of the file. It doesn't happen every time, though: sometimes I get a UTF-8 file with no BOM. Could anyone suggest a way to detect these characters?

Thanks in advance!

Rajesh
[ August 12, 2007: Message edited by: Ulf Dittmer ]
 
Ulf Dittmer
Rancher
Posts: 42970
Welcome to JavaRanch.

How are you parsing or reading the file? The easiest way to detect the BOM is to read the first 3 bytes from a file input stream and check whether they are EF, BB and BF (hex), respectively. Note that a BOM appears only at the beginning of a file, not at the end. If you have files where it's at the end too, you should work towards receiving correctly created files (i.e., without a BOM at the end).
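A minimal sketch of that check in Java, assuming the file is read through a plain InputStream. The class name BomCheck and the method startsWithUtf8Bom are hypothetical. Note that the check consumes up to three bytes, so on its own it is only good for reporting whether a BOM is there, not for handing the stream on to a parser.

```java
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;

public class BomCheck {

    // Hypothetical helper: returns true if the next three bytes are EF BB BF,
    // the UTF-8 encoding of the byte order mark (U+FEFF).
    // It consumes up to three bytes from the stream.
    public static boolean startsWithUtf8Bom(InputStream in) throws IOException {
        int b1 = in.read(); // read() returns 0..255, or -1 at end of stream
        int b2 = in.read();
        int b3 = in.read();
        return b1 == 0xEF && b2 == 0xBB && b3 == 0xBF;
    }

    public static void main(String[] args) throws IOException {
        if (args.length == 0) {
            System.err.println("usage: java BomCheck <file>");
            return;
        }
        try (InputStream in = new FileInputStream(args[0])) {
            boolean bom = startsWithUtf8Bom(in);
            System.out.println(args[0] + (bom ? " has" : " has no") + " UTF-8 BOM");
        }
    }
}
```

Because read() returns -1 at end of stream, files shorter than three bytes are simply reported as having no BOM.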

If you've read the first few bytes and there is no BOM, you could use a PushbackInputStream (created with a pushback buffer of at least 3 bytes, since the default is only 1) to push the bytes back into the stream before handing the stream to the reading/parsing code.
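A sketch of the PushbackInputStream approach, assuming Java 7 or later (for try-with-resources and StandardCharsets). BomSkipper and skipUtf8Bom are hypothetical names. The wrapper drops a leading BOM if one is present and otherwise unreads whatever bytes it looked at, so the parsing code sees the file exactly as if no check had been made.

```java
import java.io.BufferedReader;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.PushbackInputStream;
import java.nio.charset.StandardCharsets;

public class BomSkipper {

    // Hypothetical helper: wraps the stream so that a leading UTF-8 BOM
    // (EF BB BF) is skipped; any other leading bytes are pushed back unchanged.
    public static InputStream skipUtf8Bom(InputStream in) throws IOException {
        // The pushback buffer must be able to hold all three bytes we may unread.
        PushbackInputStream pin = new PushbackInputStream(in, 3);
        byte[] head = new byte[3];
        int n = 0;
        // read(byte[], int, int) may return fewer bytes than requested, so loop.
        while (n < 3) {
            int r = pin.read(head, n, 3 - n);
            if (r == -1) {
                break; // stream is shorter than three bytes
            }
            n += r;
        }
        boolean isBom = n == 3
                && (head[0] & 0xFF) == 0xEF
                && (head[1] & 0xFF) == 0xBB
                && (head[2] & 0xFF) == 0xBF;
        if (!isBom && n > 0) {
            pin.unread(head, 0, n); // not a BOM: give the bytes back
        }
        return pin;
    }

    public static void main(String[] args) throws IOException {
        if (args.length == 0) {
            System.err.println("usage: java BomSkipper <file>");
            return;
        }
        try (InputStream raw = new FileInputStream(args[0]);
             BufferedReader reader = new BufferedReader(
                     new InputStreamReader(skipUtf8Bom(raw), StandardCharsets.UTF_8))) {
            String line;
            while ((line = reader.readLine()) != null) {
                System.out.println(line); // hand each line to the real parser here
            }
        }
    }
}
```

Running java BomSkipper input.txt should print the file's lines with any leading BOM removed. Doing the check on bytes before the InputStreamReader is created keeps the decoder from ever seeing the BOM, which would otherwise show up as a U+FEFF character at the start of the first line.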
[ August 12, 2007: Message edited by: Ulf Dittmer ]
 
rajesh mohanty
Greenhorn
Posts: 2
Thanks, Ulf, for your reply.
I think this solution will help me.

Rajesh
 