platform independent newline stripping

 
Cuneyt Taskiran
Greenhorn
Posts: 11
Hi,
I am processing closed caption text obtained from a decoder connected to a Linux box. My goal is to strip newline characters, among other things. Being a Java newbie (coming from C++), I wrote the code below, which does not work properly on Windows.

For the input
41 4E 44 0A 20 48 45 41 52 44

I get the output

read: 65
A
read: 78
N
read: 68
D
read: 13
read: 10
space
read: 32

read: 72
H
read: 69
E
read: 65
A
read: 82
R
read: 68
D

There is an extra carriage return (code 13) thrown in!
My questions are:
* At what point is the newline inserted, since I do not see it in the hex viewer (PsPad)?
* How can I remove these newlines in a platform independent way?

Thanks,
Cuneyt

 
Rob Spoor
Sheriff
Posts: 20820
Character 13 is carriage return (\r); see http://www.asciitable.com/.

You must understand how line breaks work in the different operating systems.

Unix and Linux use only line feed (\n, 10).
Classic Mac OS (before OS X) used only carriage return (\r, 13); modern macOS uses \n like other Unix systems.
Windows uses carriage return followed by line feed (\r\n, 13 + 10).

You are reading on Linux, which does not treat the \r of Windows text files as part of the line break, so those characters remain in the data. You can see this by opening a Windows text file in vi: every line ends with ^M (the representation of carriage return).
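
To check which convention a given piece of text actually uses, something like the following could help. It is only a rough sketch: the class name LineBreakCounter and the method count are made up for illustration, and it assumes the text has already been read into a String.

```java
class LineBreakCounter {
    // Hypothetical helper: returns {crlf, loneCr, loneLf} counts for the given text.
    static int[] count(String s) {
        int crlf = 0, cr = 0, lf = 0;
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            if (c == '\r') {
                // Bounds check before looking at the next character.
                if (i + 1 < s.length() && s.charAt(i + 1) == '\n') {
                    crlf++;
                    i++; // consume the '\n' that belongs to this pair
                } else {
                    cr++;
                }
            } else if (c == '\n') {
                lf++;
            }
        }
        return new int[] {crlf, cr, lf};
    }

    public static void main(String[] args) {
        // Same characters as the output in the question: "AND", 13, 10, " HEARD".
        int[] counts = count("AND\r\n HEARD");
        System.out.println("CRLF=" + counts[0] + " CR=" + counts[1] + " LF=" + counts[2]);
        // prints: CRLF=1 CR=0 LF=0
    }
}
```

A non-zero CRLF count on data that was supposed to come from a Linux box suggests that something between the decoder and your program converted the line endings.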

So what you will need to do:
1) run your files through dos2unix or a similar tool first to replace \r\n with \n
2) if you encounter a \r, ignore it if the next character is a \n


Of course, don't forget to check that i + 1 is still within bounds before calling s.charAt(i + 1).
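
As a minimal sketch of option 2, assuming the text is available as a String s: the class name LineBreaks and both method names are invented for this example. dropCrBeforeLf does the "ignore \r if the next character is \n" step with the bounds check, and stripLineBreaks is a blunter alternative that removes every \r and \n regardless of platform.

```java
class LineBreaks {
    // Drops a '\r' only when the next character is '\n', so "\r\n" becomes "\n".
    // A lone '\r' (including one at the very end of the string) is kept.
    static String dropCrBeforeLf(String s) {
        StringBuilder out = new StringBuilder(s.length());
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            boolean hasNext = i + 1 < s.length(); // bounds check for charAt(i + 1)
            if (c == '\r' && hasNext && s.charAt(i + 1) == '\n') {
                continue; // skip the '\r'; the '\n' is appended on the next iteration
            }
            out.append(c);
        }
        return out.toString();
    }

    // Removes every carriage return and line feed, whatever convention produced them.
    static String stripLineBreaks(String s) {
        return s.replace("\r", "").replace("\n", "");
    }

    public static void main(String[] args) {
        String windows = "AND\r\n HEARD";
        System.out.println(dropCrBeforeLf(windows).equals("AND\n HEARD")); // prints: true
        System.out.println(stripLineBreaks(windows));                      // prints: AND HEARD
    }
}
```

If the goal is only to get rid of line breaks, stripLineBreaks alone is probably enough. The character loop is useful when you want to keep \n as the single line separator.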
 
Cuneyt Taskiran
Greenhorn
Posts: 11
That was helpful, thanks a lot!

C
 