
byte array to string strips carriage returns

 
Doug McComber
Greenhorn
Posts: 5
I'm using Primefaces to upload a file (plain text) which I then want to assign to a string. I can get this working using either getContents() or getInputStream(), but all whitespace (except actual spaces) is stripped from the input file. For example (assume the gap in the string is a tab):



produces the output:


Why, and how can I preserve carriage returns, etc.? Oh, and I have tried various encodings, all with the same result.

Thanks,
Doug
 
Richard Tookey
Bartender
Posts: 1166
Sorry but the tabs and new lines are not stripped. Try

You should note that your code shows you have a fundamental misunderstanding. In Java, characters are always Unicode, encoded as UTF-16 code units, and your line "byte[] bytes = test.getBytes()" turns the UTF-16 encoded characters into bytes using the default character encoding for your system, which may or may not be UTF-8. You then turn these possibly-not-UTF-8 bytes back into a String by pretending they are UTF-8 bytes!
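
A minimal sketch of this point (an illustration, not the snippet originally posted): encode and decode with the same explicit charset, then print the result with tabs and line endings made visible, so the console cannot hide them. The class and method names (`WhitespaceRoundTrip`, `roundTrip`, `showWhitespace`) are hypothetical.

```java
import java.nio.charset.StandardCharsets;

class WhitespaceRoundTrip {

    /** Encodes to bytes and decodes again, using the same explicit charset both ways. */
    static String roundTrip(String text) {
        byte[] bytes = text.getBytes(StandardCharsets.UTF_8);
        return new String(bytes, StandardCharsets.UTF_8);
    }

    /** Replaces tab, carriage return, and line feed with visible escape sequences. */
    static String showWhitespace(String text) {
        return text.replace("\t", "\\t")
                   .replace("\r", "\\r")
                   .replace("\n", "\\n");
    }

    public static void main(String[] args) {
        String test = "first\tline\r\nsecond line";
        String result = roundTrip(test);

        System.out.println(showWhitespace(result)); // prints first\tline\r\nsecond line
        System.out.println(test.equals(result));    // prints true
    }
}
```

If the escaped output still shows `\t` and `\r\n`, the String holds the whitespace, and whatever removes it is happening at display time.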
 
Doug McComber
Greenhorn
Posts: 5
Richard Tookey wrote: Sorry but the tabs and new lines are not stripped. Try

You should note that your code shows you have a fundamental misunderstanding. In Java, characters are always Unicode, encoded as UTF-16 code units, and your line "byte[] bytes = test.getBytes()" turns the UTF-16 encoded characters into bytes using the default character encoding for your system, which may or may not be UTF-8. You then turn these possibly-not-UTF-8 bytes back into a String by pretending they are UTF-8 bytes!


I do understand Java chars are UTF16. The sample code is just that. The code I am writing handles file uploads via Primefaces. The files being uploaded are plain text. Whether they are created in Notepad in Windows 7 or vi in Linux does not matter. Their tabs and carriage returns are stripped. In Notepad you can explicitly save a text file as UTF-8 so I know that I need to convert the bytes to string with UTF-8 encoding. Makes no difference.



The above is the same as the previous sample code in that a byte array is converted to a string. And the results are the same, tabs and carriage returns are stripped. You can't expect me to accept that the solution is to edit the text files being uploaded and replace tabs and returns with \t and \n?
 
Richard Tookey
Bartender
Posts: 1166
Doug McComber wrote: Their tabs and carriage returns are stripped.


They are not being stripped by the Java code you posted, so any problem must be happening elsewhere.


And the results are the same, tabs and carriage returns are stripped. You can't expect me to accept that the solution is to edit the text files being uploaded and replace tabs and returns with \t and \n?


I don't expect you to replace anything. I do expect you to look for the problem elsewhere because it is most definitely not in the flawed Java you posted.
 
Paul Clapham
Sheriff
Posts: 22841
Yes, running your test code on my machine doesn't produce the output you said it did. Possibly it's your console which makes a tab look like a space in that context, but I wouldn't spend too much time on that theory. I would be looking at your actual code and tracking down the problem instead.

And if you're going to muck about changing Strings to byte arrays and back to Strings, you should really take Richard's advice and use the same encoding for both conversions. Failure to do that can lead to unexpected mangling of data -- but not mangling of tabs, which are treated the same in pretty much every encoding you're ever going to encounter. Like this:
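
As a minimal sketch of that advice (an illustration, not the snippet Paul posted), the following compares a matched and a mismatched pair of charsets. The class and method names (`CharsetMismatch`, `recode`) and the sample string are assumptions made for the example.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

class CharsetMismatch {

    /** Encodes text with one charset and decodes the resulting bytes with another. */
    static String recode(String text, Charset encodeWith, Charset decodeWith) {
        return new String(text.getBytes(encodeWith), decodeWith);
    }

    public static void main(String[] args) {
        String original = "caf\u00e9\tmenu"; // one non-ASCII character, one tab

        // Same charset in both directions: the text survives unchanged.
        String matched = recode(original, StandardCharsets.UTF_8, StandardCharsets.UTF_8);

        // Different charsets: the non-ASCII character is mangled, the tab is not.
        String mismatched = recode(original, StandardCharsets.ISO_8859_1, StandardCharsets.UTF_8);

        System.out.println(original.equals(matched));       // prints true
        System.out.println(original.equals(mismatched));    // prints false
        System.out.println(mismatched.indexOf('\t') >= 0);  // prints true
    }
}
```

Passing the charset explicitly to both `getBytes` and the `String` constructor removes the dependence on the platform default encoding.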

 
Doug McComber
Greenhorn
Posts: 5
Um, I'm not "mucking about". The test code is just for demonstration purposes; I am not byte encoding a string just to decode it. I said in my original post that in my actual project, the string comes from an uploaded text file using Primefaces. When you are uploading a UTF-8 text file and use the following in the backing bean:



then the value in the "results" variable will not have any of the carriage returns preserved. The "test" code I posted is just to demo this without people having to set up a JSF project with Primefaces (as file.getContents() returns a byte array). Perhaps you two "experts" should read the entire post before jumping on someone in the Beginning Java forum.

That being said, this behavior (carriage returns and tabs stripped, and runs of spaces collapsed to a single one) is on Win7 machines. I'll test in a Linux and Mac environment when I can to see if there is something going on there, if my test code works fine for you as you say it does.
 
Richard Tookey
Bartender
Posts: 1166
The problem has to be in either UploadedFile file = event.getFile(); or in the file.getContents(). I know nothing about Primefaces but I have spent enough time converting String to byte and back again (as part of encryption) to be certain that the flaw you are experiencing is not in the String constructor. In your position I would post this in a Primefaces forum.
 
Ivan Jozsef Balazs
Rancher
Posts: 999


At the above line you assume that the byte buffer contains a string in the UTF-8 encoding.
Are you sure it is a valid byte stream as UTF-8 encoding?

 
Doug McComber
Greenhorn
Posts: 5
So I've had the chance to try this out on a Linux workstation. No change in results when using my "test" code. I created a test input file from the console with vi and using the file command saw that it was encoded as us-ascii. I plugged that in in place of utf-8 and again no change in the results.
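
Swapping us-ascii for utf-8 is expected to change nothing for such a file, because every ASCII byte means the same thing in UTF-8. A minimal sketch of that check, with hypothetical names (`AsciiSubset`, `sameDecoding`) and made-up sample bytes standing in for the file contents:

```java
import java.nio.charset.StandardCharsets;

class AsciiSubset {

    /** Returns true if the bytes decode to the same String as US-ASCII and as UTF-8. */
    static boolean sameDecoding(byte[] bytes) {
        String asAscii = new String(bytes, StandardCharsets.US_ASCII);
        String asUtf8 = new String(bytes, StandardCharsets.UTF_8);
        return asAscii.equals(asUtf8);
    }

    public static void main(String[] args) {
        // Stand-in for a plain ASCII file containing a tab and both line-ending styles.
        byte[] fileBytes = "col1\tcol2\nrow\r\n".getBytes(StandardCharsets.US_ASCII);

        System.out.println(sameDecoding(fileBytes)); // prints true
    }
}
```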

However, I had one of those lightbulb moments with regard to the output in my actual code. As it is a web app (JSF) and <h:outputText ...> is displaying the results, I realized that it would need <pre></pre> tags surrounding it. Voila, success! Really one of those "d'oh" moments.
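
A rough sketch of why the page looked wrong, under stated assumptions: `collapse` only approximates how a browser renders whitespace in normal flow (it is not a real HTML engine), and `preformatted` is a hypothetical helper that shows the kind of markup that keeps the whitespace. All names here (`HtmlWhitespace`, `collapse`, `escapeHtml`, `preformatted`) are illustrative and not part of JSF or PrimeFaces.

```java
class HtmlWhitespace {

    /** Approximates normal HTML rendering: each run of whitespace is shown as one space. */
    static String collapse(String text) {
        return text.replaceAll("[ \\t\\r\\n\\f]+", " ");
    }

    /** Escapes the characters that would otherwise be read as markup. */
    static String escapeHtml(String text) {
        return text.replace("&", "&amp;")
                   .replace("<", "&lt;")
                   .replace(">", "&gt;");
    }

    /** Wraps escaped text in a pre element so tabs and line breaks are rendered as-is. */
    static String preformatted(String text) {
        return "<pre>" + escapeHtml(text) + "</pre>";
    }

    public static void main(String[] args) {
        String uploaded = "name\tvalue\r\nfoo   bar";

        // What the reader sees when the text is emitted without pre-formatting.
        System.out.println(collapse(uploaded)); // prints name value foo bar

        // Markup that preserves the original whitespace when rendered.
        System.out.println(preformatted(uploaded));
    }
}
```

The String itself is intact in both cases; only the rendering differs. The CSS declaration `white-space: pre` on the containing element is an alternative to the `<pre>` tags.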

Thanks,
Doug
 
Ivan Jozsef Balazs
Rancher
Posts: 999
So your problem was not realizing that HTML ignores white spaces among the text?

 
Doug McComber
Greenhorn
Posts: 5
Ivan Jozsef Balazs wrote: So your problem was not realizing that HTML ignores white spaces among the text?


No, of course not. The problem was that I assumed JSF's <h:outputText> did not ignore whitespace.
 