
Byte -127 Question

 
Brian Mozhdehi
Ranch Hand
Posts: 81
I am struggling to get a String representation of a 4-byte array that represents an Integer (written from DataOutputStream). Whenever one of the bytes has the value -127 (hex 0x81) and I convert the byte array to a String using new String(byte[]), a subsequent call to String.getBytes() turns the value -127 into 63.

I can share the complete code if needed, but I can reproduce this with the snippet below:

************************
byte[] bb = new byte[1];
bb[0] = -127;

String aString = new String(bb);

byte[] bb2 = aString.getBytes();
**************************

Inspecting the bb2[0] byte shows a 63 instead of -127.

Kind of an obscure question, I assume, but does anyone have any ideas on how to solve this?

Thanks in advance for any assistance, much appreciated.
 
Ireneusz Kordal
Ranch Hand
Posts: 423
The String(byte[]) constructor and the getBytes() method map between byte values and Unicode characters using the default (platform) charset.
Look at the API first: http://java.sun.com/j2se/1.4.2/docs/api/java/lang/String.html#String(byte[])
For String(byte[]) it says:
The behavior of this constructor when the given bytes are not valid in the default charset is unspecified.

For getBytes() it says:
The behavior of this method when this string cannot be encoded in the default charset is unspecified.


Then run the code below to check what the default charset is on your JVM:
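
A minimal sketch of such a check, assuming Java 5 or later for Charset.defaultCharset(). The class and method names are made up for illustration; on an older JVM, only the file.encoding system property is available.

```java
import java.nio.charset.Charset;

class DefaultCharsetCheck {
    // Name of the charset that new String(byte[]) and getBytes() use implicitly.
    static String defaultCharsetName() {
        return Charset.defaultCharset().name();
    }

    public static void main(String[] args) {
        System.out.println("Default charset: " + defaultCharsetName());
        // The file.encoding system property usually reflects the same setting.
        System.out.println("file.encoding:   " + System.getProperty("file.encoding"));
    }
}
```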


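On my computer (Windows XP, code page 1252) the default charset is UTF-8.
If you look at this encoding (http://en.wikipedia.org/wiki/UTF-8), you will see that only byte values 0-127 are complete characters on their own.
Values 128-191 are continuation bytes, valid only after a lead byte; 194-244 are lead bytes of multi-byte sequences; 192, 193 and 245-255 are never valid.
Byte 0x81 (129) is a continuation byte, so a single 0x81 with no lead byte before it is invalid. The same byte is also unassigned in windows-1252, so the outcome is similar if that is your default charset.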

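If a byte value has no valid representation in your charset, these functions cannot map it to a Unicode character.
In practice the decoder substitutes a replacement character (U+FFFD), and encoding that back in a charset that cannot represent it typically produces '?', which is byte 63.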
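
The effect can be reproduced independently of the platform default by naming the charsets explicitly. This is a minimal sketch, assuming Java 7 or later for StandardCharsets. The class and method names are invented for illustration, and US-ASCII stands in for any charset that cannot encode the replacement character.

```java
import java.nio.charset.StandardCharsets;

class LossyDecodeDemo {
    // Decodes a lone 0x81 as UTF-8; the decoder substitutes U+FFFD for the invalid byte.
    static String decodeLoneByte() {
        byte[] bb = { (byte) 0x81 }; // same value as -127
        return new String(bb, StandardCharsets.UTF_8);
    }

    // Encodes the text as US-ASCII, which cannot represent U+FFFD and falls back to '?'.
    static byte[] encodeAsAscii(String s) {
        return s.getBytes(StandardCharsets.US_ASCII);
    }

    public static void main(String[] args) {
        String s = decodeLoneByte();
        System.out.println(Integer.toHexString(s.charAt(0))); // prints fffd
        System.out.println(encodeAsAscii(s)[0]);              // prints 63
    }
}
```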


This code gives you better results:
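
One way to do this is sketched below. It is not necessarily the code originally posted here; it assumes ISO-8859-1 as the explicit charset, because that charset maps every byte value 0x00-0xFF to the character with the same code point, so nothing is replaced. StandardCharsets requires Java 7 or later (on older JVMs, pass the name "ISO-8859-1" instead), and the class and method names are hypothetical.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

class ByteRoundTrip {
    // Decodes and re-encodes with ISO-8859-1, which preserves every byte value.
    static byte[] roundTrip(byte[] input) {
        String s = new String(input, StandardCharsets.ISO_8859_1);
        return s.getBytes(StandardCharsets.ISO_8859_1);
    }

    // Reads 4 bytes as a big-endian int, the order DataOutputStream.writeInt uses.
    static int toInt(byte[] fourBytes) {
        return ByteBuffer.wrap(fourBytes).getInt();
    }

    public static void main(String[] args) {
        byte[] bb = { -127 };
        System.out.println(Arrays.toString(roundTrip(bb))); // prints [-127]

        byte[] written = { 0, 0, 0, -127 };
        System.out.println(Integer.toString(toInt(written))); // prints 129
    }
}
```

If the goal is only a readable form of the integer, converting the 4 bytes back to an int and calling Integer.toString avoids treating binary data as text at all.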
 
Brian Mozhdehi
Ranch Hand
Posts: 81
Thank you for your help, very much appreciated. That makes sense.
 