
Encoding and Decoding Issues across OS

 
Thanga prakash Somasundaram
Greenhorn
Posts: 21
Hi all,

I'm using Java 5. I encode a byte array into a String using the
encoding functionality in String, and then decode the encoded String
back into a byte array. If no encoding scheme is specified, the
platform's default encoding is used.
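
For reference, the platform default can be checked directly in Java 5
(a small sketch, not part of the original post):

import java.nio.charset.Charset;

public class DefaultCharsetCheck
{
    public static void main(String[] args)
    {
        // the charset the JVM picked up from the OS/locale settings
        System.out.println(Charset.defaultCharset());

        // the legacy system property printed by the program below
        System.out.println(System.getProperty("file.encoding"));
    }
}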

I ran my application on Red Hat Linux 7.2, Fedora Core 3, and Windows
XP. My observations are summarized in the table below:

OS              Encoding                  Result
Red Hat 7.2     default (ISO-8859-1)      OK
Red Hat 7.2     UTF-8 / UTF-16 / ASCII    Not OK
Fedora Core 3   default (UTF-8)           Not OK
Fedora Core 3   ISO-8859-1                OK
Fedora Core 3   UTF-16 / ASCII            Not OK
Windows XP      default (Cp1252)          Not OK
Windows XP      ISO-8859-1                OK
Windows XP      UTF-8 / UTF-16 / ASCII    Not OK

ISO-8859-1 is the only encoding scheme that works on all three
platforms. It should be noted that the default encodings on Fedora
Core 3 (UTF-8) and Windows XP (Cp1252) both fail the round trip.

The above observations were made with the sample program listed
below. In the program I store all possible byte values (-128 to +127)
in a byte array. I encode this array into a String using an encoding
scheme of my choice, then decode the String with the same scheme to
obtain a byte array.

I compare the values stored in the original byte array with the byte
array obtained after going through the process of encoding and
decoding. In theory, these two arrays should match if the same scheme
is used to encode and decode and the encoding scheme is supported by
Java. Copy the program and run it on different platforms, specifying
different encoding schemes.


//program
//Operator.java

import java.io.UnsupportedEncodingException;

public class Operator
{
    static byte a[] = new byte[256];
    static byte b[] = null;
    static String s = null;
    static Encoder encoder = null;
    static Decoder decoder = null;

    // change this to alter the encoding/decoding scheme
    static String ENCODING_SCHEME = "UTF-8";

    // initialise the original array with every possible byte value
    private static void initialise()
    {
        int index = 0;

        for (int i = -128; i < 128; i++)
        {
            a[index] = (byte) i;
            index++;
        }
    }

    // compares the original array and the array obtained from the
    // encoding --> decoding process; prints differences if there are any
    private static void compare()
    {
        if (a.length == b.length)
        {
            for (int i = 0; i < a.length; i++)
            {
                if (a[i] != b[i])
                {
                    System.out.println("Data mismatch @ index: " + i);
                    System.out.println("a[" + i + "] = " + a[i]);
                    System.out.println("b[" + i + "] = " + b[i]);
                }
            }
        }
        else
        {
            System.out.println("array sizes don't match");
            System.out.println("a.length = " + a.length + "\nb.length = " + b.length);
        }
    }

    public static void main(String args[])
    {
        initialise();

        encoder = new Encoder();
        decoder = new Decoder();

        System.out.println("Encoding used = " + ENCODING_SCHEME);
        encoder.encode();
        decoder.decode();

        System.out.println("comparison started");
        compare();
        System.out.println("comparison complete");

        System.out.println("Default encoding = " + System.getProperty("file.encoding"));
    }

    // "encoder": turns the byte array into a String
    // (in charset terms this step is actually decoding bytes to characters)
    public static class Encoder
    {
        public void encode()
        {
            try
            {
                s = new String(a, 0, a.length, ENCODING_SCHEME);
            }
            catch (UnsupportedEncodingException e)
            {
                System.out.println(e);
            }
        }
    }

    // "decoder": turns the String back into a byte array
    // (in charset terms this step is actually encoding characters to bytes)
    public static class Decoder
    {
        public void decode()
        {
            try
            {
                b = s.getBytes(ENCODING_SCHEME);
            }
            catch (UnsupportedEncodingException e)
            {
                System.out.println(e);
            }
        }
    }
}


Why does the ambiguity observed above occur?
Is this a bug in Java 5?
If so, it is a critical one. Any answers?

regards,
stp.
 
Ulf Dittmer
Rancher
Posts: 42970
What kind of processor are Fedora and Linux running on? Could this possibly be explained by different byte-ordering schemes being used on different CPUs?
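
One way to test the byte-order hypothesis (a minimal sketch, not from
the original thread) is to print the CPU's native byte order alongside
the raw bytes that Java's UTF-16 charset produces:

import java.io.UnsupportedEncodingException;
import java.nio.ByteOrder;

public class ByteOrderCheck
{
    public static void main(String[] args) throws UnsupportedEncodingException
    {
        // the byte order of the underlying CPU
        System.out.println("Native byte order: " + ByteOrder.nativeOrder());

        // Java's "UTF-16" charset always encodes big-endian and prepends
        // a byte-order mark, so "A" encodes to FE FF 00 41 on every CPU
        byte[] encoded = "A".getBytes("UTF-16");
        for (byte x : encoded)
        {
            System.out.printf("%02X ", x);
        }
        System.out.println();
    }
}

Since the charset, not the hardware, fixes the byte order here, the
encoded output should be identical on all three machines.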
 
Paul Clapham
Sheriff
Posts: 21865
In theory, these two arrays should match if the same scheme
is used to encode and decode and the encoding scheme is supported by
Java.

This "theory" you refer to assumes that encoding and decoding are both one-to-one mappings. In many of the encodings you name, they are not. In particular, it's quite common for decoding to map all bytes that aren't specified to the "?" character. So the theory is wrong.
 