Java String has an extra space and the file starts with ÿþ

 
Sarwar Baloch
Greenhorn
Posts: 9
Hi All,

I am reading a file using the standard Java file-reading mechanism. I noticed that the in-memory String contains an extra space before each character and starts with ÿþ.

I tried replacing the extra spaces with another character, but nothing gets replaced. I cannot modify the file.

thanks
[Attachment: attach.png]
 
Dave Tolls
Ranch Hand
Posts: 2390
You probably need to find out what character set the original file uses.
 
Sarwar Baloch
Greenhorn
Posts: 9
It is UTF-8.
 
Stephan van Hulst
Bartender
Posts: 6669
It's not.

It looks like UTF-16LE. Java has a standard charset that you can use to read this format.
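
A minimal sketch of reading a file with an explicit charset, assuming the whole file fits in memory. The class and method names (`Utf16LeFileReadSketch`, `readUtf16Le`, `demo`) are made up for illustration; only `StandardCharsets.UTF_16LE` and the `java.nio.file.Files` calls are standard API.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

class Utf16LeFileReadSketch {

    // Reads the whole file and decodes its bytes explicitly as UTF-16LE,
    // instead of relying on the platform default charset.
    static String readUtf16Le(Path path) {
        try {
            return new String(Files.readAllBytes(path), StandardCharsets.UTF_16LE);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Round trip through a temporary file so the sketch runs without external input.
    static String demo() {
        try {
            Path tmp = Files.createTempFile("utf16le-sample", ".xml");
            try {
                // The UTF-16LE encoder writes no byte order mark.
                Files.write(tmp, "<P>".getBytes(StandardCharsets.UTF_16LE));
                return readUtf16Le(tmp);
            } finally {
                Files.deleteIfExists(tmp);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(demo()); // prints <P>
    }
}
```

Naming the charset in code removes any dependence on the JVM's default encoding, which can differ between machines.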
 
Sarwar Baloch
Greenhorn
Posts: 9
Hi,
You are right, it is UTF-16LE. I did the conversion using the Java standard charset. It works perfectly on Windows Server; however, it does not work on Linux.

I would really appreciate it if anyone could help me find a solution that works on both Windows and Linux.

thanks
 
Stephan van Hulst
Bartender
Posts: 6669
ItDoesntWorkIsUseless.

Please show us your code, show us what isn't working, what errors or output you're getting, the binary contents of the input file on both systems, etc.
 
Jesper de Jong
Java Cowboy
Sheriff
Posts: 15768
The characters ÿþ that you see in the beginning are a byte order mark.

If you read the file using the UTF-16 encoding, then that should be handled automatically.
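
To illustrate the point about the byte order mark, here is a small sketch that decodes the same six bytes three ways. The sample bytes are an assumption chosen to match the symptom in the first post (FF FE followed by `<P` in little-endian order), and the class and method names are made up for illustration.

```java
import java.nio.charset.StandardCharsets;

class BomHandlingSketch {

    // FF FE is the little-endian byte order mark, followed by "<P" in UTF-16LE.
    static final byte[] SAMPLE = {(byte) 0xFF, (byte) 0xFE, 0x3C, 0x00, 0x50, 0x00};

    // UTF-16 reads the BOM, picks the byte order from it, and drops it.
    static String asUtf16(byte[] bytes) {
        return new String(bytes, StandardCharsets.UTF_16);
    }

    // UTF-16LE keeps the BOM in the result as the character U+FEFF.
    static String asUtf16Le(byte[] bytes) {
        return new String(bytes, StandardCharsets.UTF_16LE);
    }

    // ISO-8859-1 maps every byte to one char, so the BOM and the zero bytes survive.
    static String asLatin1(byte[] bytes) {
        return new String(bytes, StandardCharsets.ISO_8859_1);
    }

    public static void main(String[] args) {
        System.out.println(asUtf16(SAMPLE).length());   // 2
        System.out.println(asUtf16Le(SAMPLE).length()); // 3
        System.out.println(asLatin1(SAMPLE).length());  // 6
    }
}
```

Decoded as ISO-8859-1, the result starts with ÿþ and has a NUL character (U+0000) after each letter. That may be why the "spaces" in the first post could not be replaced: they are probably not real space characters.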
 
Sarwar Baloch
Greenhorn
Posts: 9
Hi,

Please find below the relevant section of code and its output on the Windows server and the Linux server:

InputStream inputStream = new ByteArrayInputStream(fileContents.getBytes("ISO-8859-1"));
//InputStream inputStream = new ByteArrayInputStream(fileContents.getBytes("UTF-8"));
byte[] output = decoder.decodeBuffer(inputStream);
fileContent = new String(output, StandardCharsets.UTF_16);
//fileContent = new String(output, StandardCharsets.UTF_8);

System.out.printf("File Content:" + fileContent);

On Windows the output is:

File Content: <P

On Linux the output is:

File Content: �� < P

Note that the output starts with �� and has a space after each character.


thanks
 
Jesper de Jong
Java Cowboy
Sheriff
Posts: 15768
From these lines of code, it's still very unclear how you are writing and/or reading the file. In fact, this is confusing the question more - are you only reading the file, or are you also writing the file? Why are you using the ISO-8859-1 and/or UTF-8 character encodings when you know that the file is in UTF-16?

Post the real, working code from your program that causes the problem.
 
Sarwar Baloch
Greenhorn
Posts: 9
Hi,

Thanks for your prompt response.

Let me explain my service a bit.

I have a Java method that takes a String parameter, as below:

public String getXMLContentWithEncoding(String fileContents)

I have no control over the encoding of the string. It can be UTF-8, UTF-16, or anything else. The input string 'fileContents' is also Base64 encoded.

I decode the string using the statements below:

            BASE64Decoder decoder = new BASE64Decoder();
            InputStream inputStream = new ByteArrayInputStream(fileContents.getBytes("ISO-8859-1")); //Here I  also tried UTF-8, UTF-16
            byte[] output = decoder.decodeBuffer(inputStream);


Now I need to generate the string from the decoded byte[] output. The code below is used to generate it:

String file1 = new String(output, StandardCharsets.UTF_16);

Then I print file1:

System.out.print("File Content:" + file1);

On the Windows server the output is:

File Content: <Pasd

On the Linux server the output is:

File Content: �� < P a s d

The same code runs on both servers and the same String is passed to both, but on the Linux server the output contains �� and a space after each character of the String.

thanks



 
Dave Tolls
Ranch Hand
Posts: 2390
Where does the String come from?
 
Sarwar Baloch
Greenhorn
Posts: 9
Hi,

The string comes through a web service. Users of the web service send the file content in Base64-encoded format. The file can be UTF-8, UTF-16, etc.

thanks
 
Stephan van Hulst
Bartender
Posts: 6669
Please print the original base64 encoded input strings on both operating systems. I have a feeling the input will be different.
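
One way to make such a comparison concrete is to log the JVM's default charset and a hex dump of the first decoded bytes on each machine. This is a rough diagnostic sketch, not code from the thread: the class and method names are hypothetical, it assumes Java 8+ for `java.util.Base64`, and the sample input is just the first eight characters of the encoded strings posted later in the thread.

```java
import java.nio.charset.Charset;
import java.util.Base64;

class InputDiagnosticsSketch {

    // Decodes the Base64 text and renders up to maxBytes leading bytes as hex.
    static String hexPrefix(String base64, int maxBytes) {
        byte[] decoded = Base64.getDecoder().decode(base64);
        StringBuilder hex = new StringBuilder();
        for (int i = 0; i < decoded.length && i < maxBytes; i++) {
            if (i > 0) {
                hex.append(' ');
            }
            hex.append(String.format("%02X", decoded[i] & 0xFF));
        }
        return hex.toString();
    }

    public static void main(String[] args) {
        // The default charset is a common source of per-machine differences.
        System.out.println("Default charset: " + Charset.defaultCharset());
        System.out.println("First bytes: " + hexPrefix("/v8APABQ", 6)); // FE FF 00 3C 00 50
    }
}
```

A payload that starts with FE FF carries a big-endian byte order mark. If the hex dumps match on both systems but the printed text differs, the difference is more likely in how the bytes are turned into a String or displayed than in the input.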
 
Sarwar Baloch
Greenhorn
Posts: 9
Hi,

Please find below the encoded String. It is the same on both Windows and Linux.


Encoded String on Linux

/v8APABQAHIAaQBvAHIALgBSAGUAcQB1AGUAcwB0AD4APABIAGUAYQBkAGUAcgA+ADwAUwBlAG4AZABlAHIASQBEAD4ATQBGADEANQA4ADkAPAAvAFMAZQBuAGQAZQByAEkARAA+ADwAUgBlAGMAZQBpAHYAZQByAEkARAA+AEUAMAAwADEAPAAvAFIAZQBjAGUAaQB2AGUAcgBJAEQAPgA8AFQAcgBhAG4AcwBhAGMAdABpAG8AbgBEAGEAdABlAD4AMgAxAC8AMQAwAC8AMgAwADEANQAgADEANQA6ADAAMQA8AC8AVAByAGEAbgBzAGEAYwB0AGkAbwBuAEQAYQB0AGUAPgA8AFIAZQBjAG8AcgBkAEMAbwB1AG4AdAA+ADEAPAAvAFIAZQBjAG8AcgBkAEMAbwB1AG4AdAA+ADwARABpAHMAcABvAHMAaQB0AGkAbwBuAEYAbABhAGcAPgBQAFIATwBEAFUAQwBUAEkATwBOADwALwBEAGkAcwBwAG8AcwBpAHQAaQBvAG4ARgBsAGEAZwA+ADwALwBIAGUAYQBkAGUAcgA+ADwAQQB1AHQAaABvAHIAaQB6AGEAdABpAG8AbgA+ADwAVAB5AHAAZQA+AEMAYQBuAGMAZQBsAGwAYQB0AGkAbwBuADwALwBUAHkAcABlAD4APABJAEQAPgBNAEYAMQA1ADgAOQAtADIAOQA3ADEANAA1ADwALwBJAEQAPgA8AEkARABQAGEAeQBlAHIAPgAxADQAOAA5ADUAMwA5ADEAPAAvAEkARABQAGEAeQBlAHIAPgA8AE0AZQBtAGIAZQByAEkARAA+ADAAMgAzADQAMQAxADMAMQA8AC8ATQBlAG0AYgBlAHIASQBEAD4APABQAGEAeQBlAHIASQBEAD4ARQAwADAAMQA8AC8AUABhAHkAZQByAEkARAA+ADwARQBtAGkAcgBhAHQAZQBzAEkARABOAHUAbQBiAGUAcgA+ADcAOAA0AC0AMQA5ADgANgAtADIANQAzADIANwAwADgALQA5ADwALwBFAG0AaQByAGEAdABlAHMASQBEAE4AdQBtAGIAZQByAD4APABEAGEAdABlAE8AcgBkAGUAcgBlAGQAPgAwADcALwAxADAALwAyADAAMQA1ADwALwBEAGEAdABlAE8AcgBkAGUAcgBlAGQAPgA8AC8AQQB1AHQAaABvAHIAaQB6AGEAdABpAG8AbgA+ADwALwBQAHIAaQBvAHIALgBSAGUAcQB1AGUAcwB0AD4=

Encoded String on Windows

/v8APABQAHIAaQBvAHIALgBSAGUAcQB1AGUAcwB0AD4APABIAGUAYQBkAGUAcgA+ADwAUwBlAG4AZABlAHIASQBEAD4ATQBGADEANQA4ADkAPAAvAFMAZQBuAGQAZQByAEkARAA+ADwAUgBlAGMAZQBpAHYAZQByAEkARAA+AEUAMAAwADEAPAAvAFIAZQBjAGUAaQB2AGUAcgBJAEQAPgA8AFQAcgBhAG4AcwBhAGMAdABpAG8AbgBEAGEAdABlAD4AMgAxAC8AMQAwAC8AMgAwADEANQAgADEANQA6ADAAMQA8AC8AVAByAGEAbgBzAGEAYwB0AGkAbwBuAEQAYQB0AGUAPgA8AFIAZQBjAG8AcgBkAEMAbwB1AG4AdAA+ADEAPAAvAFIAZQBjAG8AcgBkAEMAbwB1AG4AdAA+ADwARABpAHMAcABvAHMAaQB0AGkAbwBuAEYAbABhAGcAPgBQAFIATwBEAFUAQwBUAEkATwBOADwALwBEAGkAcwBwAG8AcwBpAHQAaQBvAG4ARgBsAGEAZwA+ADwALwBIAGUAYQBkAGUAcgA+ADwAQQB1AHQAaABvAHIAaQB6AGEAdABpAG8AbgA+ADwAVAB5AHAAZQA+AEMAYQBuAGMAZQBsAGwAYQB0AGkAbwBuADwALwBUAHkAcABlAD4APABJAEQAPgBNAEYAMQA1ADgAOQAtADIAOQA3ADEANAA1ADwALwBJAEQAPgA8AEkARABQAGEAeQBlAHIAPgAxADQAOAA5ADUAMwA5ADEAPAAvAEkARABQAGEAeQBlAHIAPgA8AE0AZQBtAGIAZQByAEkARAA+ADAAMgAzADQAMQAxADMAMQA8AC8ATQBlAG0AYgBlAHIASQBEAD4APABQAGEAeQBlAHIASQBEAD4ARQAwADAAMQA8AC8AUABhAHkAZQByAEkARAA+ADwARQBtAGkAcgBhAHQAZQBzAEkARABOAHUAbQBiAGUAcgA+ADcAOAA0AC0AMQA5ADgANgAtADIANQAzADIANwAwADgALQA5ADwALwBFAG0AaQByAGEAdABlAHMASQBEAE4AdQBtAGIAZQByAD4APABEAGEAdABlAE8AcgBkAGUAcgBlAGQAPgAwADcALwAxADAALwAyADAAMQA1ADwALwBEAGEAdABlAE8AcgBkAGUAcgBlAGQAPgA8AC8AQQB1AHQAaABvAHIAaQB6AGEAdABpAG8AbgA+ADwALwBQAHIAaQBvAHIALgBSAGUAcQB1AGUAcwB0AD4=



 
Stephan van Hulst
Bartender
Posts: 6669
If I decode the Base64, the bytes I get can all be interpreted as readable ASCII characters:
The only thing I inserted here were line endings and indentation.

The problem lies in that you're asking a Base64 string to return bytes as if it's an ISO-8859-1 string. You should decode the Base64 BEFORE you interpret the bytes as a string in a different encoding. However, you can also parse it to an XML document directly, and then later use a transformer if you want to write it to disk.
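
The order of operations described above could be sketched roughly as follows. This is not the code the original poster ended up with (that was never shown). It assumes Java 8+ so that `java.util.Base64` can stand in for the internal `BASE64Decoder` class, and the class and method names are made up for illustration.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.Base64;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;

class Base64XmlSketch {

    // Step 1: Base64 text -> raw bytes. No charset is involved here.
    static byte[] decodeBase64(String base64) {
        return Base64.getDecoder().decode(base64);
    }

    // Step 2a: raw bytes -> String. UTF_16 honours a leading byte order mark,
    // but this only suits payloads that really are UTF-16.
    static String toText(byte[] xmlBytes) {
        return new String(xmlBytes, StandardCharsets.UTF_16);
    }

    // Step 2b: raw bytes -> DOM. The XML parser detects the encoding itself.
    static Document toDocument(byte[] xmlBytes) {
        try {
            return DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new ByteArrayInputStream(xmlBytes));
        } catch (Exception e) {
            throw new IllegalStateException("Could not parse XML payload", e);
        }
    }

    public static void main(String[] args) {
        // Build a small UTF-16 payload (the encoder prepends a BOM) as sample input.
        String base64 = Base64.getEncoder()
                .encodeToString("<P>asd</P>".getBytes(StandardCharsets.UTF_16));
        byte[] raw = decodeBase64(base64);
        System.out.println(toText(raw));
        System.out.println(toDocument(raw).getDocumentElement().getTagName());
    }
}
```

Since the payload may be UTF-8, UTF-16, or something else, handing the raw bytes to the parser (step 2b) is probably the more robust route, because the parser works out the encoding from the byte order mark or the XML declaration.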
 
Sarwar Baloch
Greenhorn
Posts: 9
Dear All,

Thanks a lot for your support. I have updated my code based on your suggestions and now it is working perfectly.


 