How can I check whether a file is corrupted for decoding?

 
Greenhorn
Posts: 10
What happens if a file contains bytes that cannot be decoded? Does InputStream report an error when it encounters them? For example, if a UTF-8 encoded file contains the byte 0x82 and InputStreamReader tries to convert it to a character, does InputStreamReader report a decoding error? I ask because I don't see a character corresponding to the byte 0x82 in the UTF-8 character table.

Thanks.

 
Sheriff
Posts: 24380
Indeed it doesn't throw an exception. And you are correct that a lone 0x82 byte is not valid UTF-8. But your code omits the part where you inspect the value returned by inReader.read(). This value is 65533, or in hexadecimal 0xFFFD. This is the Unicode "replacement character" (U+FFFD), which is "used to replace a character whose value is unknown or unrepresentable in unicode".
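
To make the behaviour described above concrete, here is a minimal sketch, not the original poster's code. The class and method names (`DecodeCheck`, `firstCharLenient`, `isValidUtf8`) are made up for illustration. It contrasts the lenient decoding of an `InputStreamReader` built from a charset with a `CharsetDecoder` explicitly set to report malformed input, which is one way to detect undecodable bytes.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.Reader;
import java.io.UncheckedIOException;
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

// Hypothetical helper class, for illustration only.
class DecodeCheck {

    // Lenient path: an InputStreamReader created from a Charset silently
    // substitutes the replacement character for malformed input.
    static int firstCharLenient(byte[] data) {
        try (Reader reader = new InputStreamReader(
                new ByteArrayInputStream(data), StandardCharsets.UTF_8)) {
            return reader.read();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Strict path: a decoder configured with REPORT throws a
    // CharacterCodingException instead of substituting a character.
    static boolean isValidUtf8(byte[] data) {
        try {
            StandardCharsets.UTF_8.newDecoder()
                    .onMalformedInput(CodingErrorAction.REPORT)
                    .onUnmappableCharacter(CodingErrorAction.REPORT)
                    .decode(ByteBuffer.wrap(data));
            return true;
        } catch (CharacterCodingException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        byte[] loneContinuationByte = { (byte) 0x82 };
        // Prints the code of the character the lenient reader produced.
        System.out.println(Integer.toHexString(firstCharLenient(loneContinuationByte)));
        // Prints whether the strict decoder accepted the input.
        System.out.println(isValidUtf8(loneContinuationByte));
    }
}
```

For a real file, a decoder configured the same way can be passed to the `InputStreamReader(InputStream, CharsetDecoder)` constructor, so that reading fails with an exception at the first malformed byte rather than yielding U+FFFD.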
 
Saloon Keeper
Posts: 10136
What?

U+0082 is a valid Unicode character. It's the control character BREAK PERMITTED HERE, which says that a line break may occur at the position of the character.

You may want to try with 0xfe instead.
 
Leonardo Nash
Greenhorn
Posts: 10

Paul Clapham wrote:Indeed it doesn't throw an exception. And you are correct that it is not valid UTF-8. But your code omits the part where you inspect the value returned by inReader.read(). This value is 65533, or in hexadecimal 0xfffd. This is the Unicode "replacement character", which is "used to replace a character whose value is unknown or unrepresentable in unicode".



Thanks.
 
Paul Clapham
Sheriff
Posts: 24380

Stephan van Hulst wrote:U+0082 is a valid unicode character. It's a control character that says that line breaks may occur at the position of the character.



Indeed it is. But in UTF-8 it's represented as the two bytes 0xC2 0x82, not as the single byte 0x82.
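
The difference between the code point U+0082 and the byte 0x82 can be checked with a short sketch like the one below. The class name `Utf8Bytes` and the helper `utf8Hex` are hypothetical names chosen for this example.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical helper class, for illustration only.
class Utf8Bytes {

    // Returns the UTF-8 encoding of a string as upper-case hex digits.
    static String utf8Hex(String text) {
        StringBuilder hex = new StringBuilder();
        for (byte b : text.getBytes(StandardCharsets.UTF_8)) {
            // Mask to an int so negative bytes format as two hex digits.
            hex.append(String.format("%02X", b & 0xFF));
        }
        return hex.toString();
    }

    public static void main(String[] args) {
        // Encode the single character with code point 0x82.
        String controlChar = String.valueOf((char) 0x82);
        System.out.println(utf8Hex(controlChar)); // prints C282

        // Decoding the two-byte sequence gives back that one character.
        String decoded = new String(
                new byte[] { (byte) 0xC2, (byte) 0x82 }, StandardCharsets.UTF_8);
        System.out.println(decoded.length() == 1 && decoded.charAt(0) == 0x82); // prints true
    }
}
```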
 
Stephan van Hulst
Saloon Keeper
Posts: 10136
Huh, I don't know how that one slipped by me. Must have been a cloudy day.
 