validating a byte array for some encoding

 
Ranch Hand
Posts: 341
Hi all
I have a byte array that may be converted to a String with some specified encoding, like so:
String encodedChars = new String(bytes, encoding);
If the specified encoding is not supported, this throws an exception. If, however, the byte array contains bytes that are invalid for that encoding, they are silently dropped from the resulting String. I would rather get an exception.
How can I check that all characters in the byte array are valid for the specified encoding?
 
author
Posts: 11962
Do you know how long (how many characters) the resulting String should be? That would be easy to check. Other than that, I have no clue.
 
Wanderer
Posts: 18671
You need the java.nio.charset package in JDK 1.4+:
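A minimal sketch of such a check, assuming a hypothetical helper class StrictDecoder (the class and method names are illustrative, not from the original post). It asks a CharsetDecoder to report malformed or unmappable input, so decode() throws a CharacterCodingException instead of quietly producing a damaged String:

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.Charset;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.CodingErrorAction;

// Hypothetical helper; the class and method names are illustrative only.
final class StrictDecoder {

    /** Decodes the bytes strictly: any invalid byte sequence causes an exception. */
    static String decodeStrict(byte[] bytes, String encoding) throws CharacterCodingException {
        // Charset.forName throws an unchecked exception (UnsupportedCharsetException
        // or IllegalCharsetNameException) if the encoding name is unknown or illegal.
        CharsetDecoder decoder = Charset.forName(encoding).newDecoder()
                // REPORT makes the decoder signal errors rather than replace or ignore them.
                .onMalformedInput(CodingErrorAction.REPORT)
                .onUnmappableCharacter(CodingErrorAction.REPORT);
        // The one-argument decode() throws CharacterCodingException on the first error.
        return decoder.decode(ByteBuffer.wrap(bytes)).toString();
    }
}
```

Calling StrictDecoder.decodeStrict(bytes, "US-ASCII") on an array containing a byte above 0x7F should then fail with a CharacterCodingException rather than return a String.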

Unfortunately the CharacterCodingException doesn't seem to include correct info about the position at which the error occurred. I keep getting "Input length = 1" even when the error isn't at the beginning of the string. I suppose you could loop through and decode each byte individually to learn where the errors really are. But that's inelegant considering we're using nio, which is supposed to support bulk operations. It would also be more complex if the target encoding were a variable-length encoding like UTF-8 rather than US-ASCII, since we don't know in advance how many bytes make up a single char.
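One possible way to find the error position without decoding byte by byte is the three-argument CharsetDecoder.decode(ByteBuffer, CharBuffer, boolean), which returns a CoderResult instead of throwing. Its documentation says that when an error is reported, the bad bytes begin at the input buffer's position. The sketch below relies on that; DecodeErrorLocator and firstInvalidOffset are hypothetical names, and the 1024-char scratch buffer size is an arbitrary assumption:

```java
import java.nio.ByteBuffer;
import java.nio.CharBuffer;
import java.nio.charset.Charset;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.CoderResult;
import java.nio.charset.CodingErrorAction;

// Hypothetical helper; the class and method names are illustrative only.
final class DecodeErrorLocator {

    /** Returns the byte offset of the first invalid sequence, or -1 if every byte decodes. */
    static int firstInvalidOffset(byte[] bytes, String encoding) {
        CharsetDecoder decoder = Charset.forName(encoding).newDecoder()
                .onMalformedInput(CodingErrorAction.REPORT)
                .onUnmappableCharacter(CodingErrorAction.REPORT);
        ByteBuffer in = ByteBuffer.wrap(bytes);
        CharBuffer out = CharBuffer.allocate(1024); // scratch space; decoded chars are discarded

        while (true) {
            // endOfInput = true: a truncated trailing sequence is reported as malformed.
            CoderResult result = decoder.decode(in, out, true);
            if (result.isError()) {
                return in.position(); // the bad bytes start at the input buffer's position
            }
            if (result.isUnderflow()) {
                break; // all input consumed without error
            }
            out.clear(); // overflow: make room in the scratch buffer and continue
        }

        // Finish the decoding operation; an error here can only concern the end of the input.
        while (true) {
            CoderResult result = decoder.flush(out);
            if (result.isError()) {
                return in.position();
            }
            if (result.isUnderflow()) {
                return -1;
            }
            out.clear();
        }
    }
}
```

This works the same way for variable-length encodings such as UTF-8, because the decoder itself keeps track of how many bytes each char needs.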
 