
How does finding the end of a file with -1 work?

 
Andrey Boubriak
Greenhorn
Posts: 8
Hi everyone, I'm learning Java for my electrical engineering course, and I've been going through the Java tutorials. One thing that confuses the hell out of me is I/O streams; I didn't get them in VB when I was learning to programme, and I don't now.

My question is: why does the following piece of code work?

Specifically, these lines:



The file we're reading from only contains letters and spaces, no numbers, so firstly, how does it manage to assign the data to c? And secondly, why != -1? There isn't a -1 in the file. And when I try adding a -1 midway through the text document, it carries on writing past it. How in God's name does this work?
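A minimal sketch of the kind of read loop being asked about, assuming it follows the byte-copying example in the Java tutorial (the variable c and the file names xanadu.txt and outagain.txt are assumptions). Note that c is an int, which is what leaves room for -1 as a separate "no more data" signal:

```java
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;

class CopyBytesSketch {

    // Copies every byte from in to out and returns how many bytes were copied.
    static int copy(InputStream in, OutputStream out) throws IOException {
        int count = 0;
        int c; // an int, not a byte, so that -1 can mean "no more data"
        while ((c = in.read()) != -1) { // read() returns 0..255, or -1 at end of stream
            out.write(c);               // write(int) writes only the low 8 bits
            count++;
        }
        return count;
    }

    public static void main(String[] args) throws IOException {
        // Hypothetical file names, assumed from the Java tutorial example.
        try (InputStream in = new FileInputStream("xanadu.txt");
             OutputStream out = new FileOutputStream("outagain.txt")) {
            System.out.println(copy(in, out) + " bytes copied");
        }
    }
}
```

The stream hands back numbers, not letters; each letter in the file is already stored as a number, so assigning it to c is no problem.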

 
Jeff Verdegan
Bartender
Posts: 6109
If you read the docs for InputStream.read(), you'll see that it returns the byte that was read, or else a -1 if there are no more bytes to read.

So internally, its logic will look something like:


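A rough sketch of that kind of logic, using a hypothetical in-memory class ArrayByteSource (not part of the JDK; the real FileInputStream delegates to native code). The key point is that every real byte is returned as 0..255, leaving -1 free to mean "end of data":

```java
import java.nio.charset.StandardCharsets;

// Hypothetical illustration of how a read() method can signal end of data.
class ArrayByteSource {
    private final byte[] data;
    private int pos = 0;

    ArrayByteSource(byte[] data) {
        this.data = data;
    }

    // Returns the next byte as an int in 0..255, or -1 when nothing is left.
    int read() {
        if (pos >= data.length) {
            return -1;              // end of data: a value no real byte can produce
        }
        return data[pos++] & 0xFF;  // mask so a stored byte of -1 comes back as 255
    }

    public static void main(String[] args) {
        ArrayByteSource src = new ArrayByteSource("Hi".getBytes(StandardCharsets.US_ASCII));
        int c;
        while ((c = src.read()) != -1) {
            System.out.println(c);  // prints 72, then 105
        }
    }
}
```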
No great mystery.
 
Jeff Verdegan
Bartender
Posts: 6109
Andrey Boubriak wrote: And when I try adding a -1 mid way through the text document


But you didn't add a -1. You added one or more bytes, each of which was in the range 0..255.
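To see what actually lands in the file when you type -1 into a text editor, here is a small sketch that prints the bytes of the text "-1" (assuming an ASCII-compatible encoding such as UTF-8):

```java
import java.nio.charset.StandardCharsets;

class MinusOneAsText {
    // Returns the byte values of the given text, each as 0..255.
    static int[] byteValues(String text) {
        byte[] bytes = text.getBytes(StandardCharsets.UTF_8);
        int[] values = new int[bytes.length];
        for (int i = 0; i < bytes.length; i++) {
            values[i] = bytes[i] & 0xFF;
        }
        return values;
    }

    public static void main(String[] args) {
        // The two characters '-' and '1' become two ordinary bytes.
        for (int v : byteValues("-1")) {
            System.out.println(v); // 45, then 49
        }
    }
}
```

Neither 45 nor 49 is -1, so the read loop simply copies them like any other byte.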
 
Saman Kumara
Greenhorn
Posts: 1
In a computer, all letters, spaces, and digits are represented as numbers, called ASCII codes.

You get numbers back because the read() method of InputStream returns an int.

For example, 'A' is returned as 65, 'B' as 66, and so on.

The ASCII code for a space is 32.
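Casting a char to an int shows these codes directly; a quick sketch (the helper name asciiCode is just for illustration):

```java
class CharCodes {
    // Widening a char to int gives its numeric code.
    static int asciiCode(char ch) {
        return ch;
    }

    public static void main(String[] args) {
        for (char ch : new char[] {'A', 'B', ' '}) {
            System.out.println("'" + ch + "' = " + asciiCode(ch));
        }
        // prints 'A' = 65, 'B' = 66, ' ' = 32
    }
}
```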
 
Tony Docherty
Bartender
Posts: 3268
Welcome to the Ranch, Saman.

In a computer, all letters spaces numbers represent as numbers called ascii.

Java uses Unicode, not ASCII. However, the 128 ASCII characters map directly to the first 128 Unicode characters, so, for example, the number 65 is the letter A in both ASCII and Unicode.
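A small sketch of that overlap: casting 65 to char gives 'A', just as in ASCII, but a Java char can also hold codes far above ASCII's 0..127. The sample characters (é and the euro sign) are arbitrary choices, written as escapes so the source stays plain ASCII:

```java
class UnicodeCodes {
    // Returns the Unicode code of a char (a Java char is a UTF-16 code unit).
    static int code(char ch) {
        return ch;
    }

    // Converts a code back to the char it stands for.
    static char fromCode(int code) {
        return (char) code;
    }

    public static void main(String[] args) {
        System.out.println(fromCode(65));     // A, same as in ASCII
        System.out.println(code('\u00e9'));   // 233 (e with acute accent), outside ASCII
        System.out.println(code('\u20ac'));   // 8364 (euro sign), far outside ASCII
    }
}
```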
 
Andrey Boubriak
Greenhorn
Posts: 8
Jeff Verdegan wrote: If you read the docs for InputStream.read(), you'll see that it returns the byte that was read, or else a -1 if there are no more bytes to read.

So internally, its logic will look something like


No great mystery.


I thought it was the byte stream FileInputStream that read bytes, not the character stream FileReader? What's the difference, then? Also, doesn't that mean that if you try to print c to the console, you'll get the Unicode integer for that character rather than the letter you actually want?
 
Steve Luke
Bartender
Posts: 4181
Andrey Boubriak wrote: I thought the byte FileInputStream reader read bytes, not the character stream FileReader? What's the difference then?

It is hard to tell what you mean by that, but as far as the read() method returning -1 goes, there is no meaningful difference between an InputStream and a Reader.

An InputStream reads byte by byte and returns each byte (a value from 0 to 255) as an int. If the end of the stream is reached, -1 (an int value) is returned. You never have to worry about that -1 coming from the data, because an int value of -1 is outside the possible range of the bytes actually stored in the data; the data can never, ever have an int value of -1. (A byte could have a signed value of -1, but that is not the same -1 once its bits are returned in an int.)
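A short sketch of that distinction, using ByteArrayInputStream so no file is needed: a stored byte whose signed value is -1 (all eight bits set) comes back from read() as 255, and only the end of the stream produces the int -1.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.util.ArrayList;
import java.util.List;

class SignedByteDemo {
    // Collects every value read() returns, including the final -1.
    static List<Integer> readValues(InputStream in) throws IOException {
        List<Integer> values = new ArrayList<>();
        int c;
        do {
            c = in.read();
            values.add(c);
        } while (c != -1);
        return values;
    }

    public static void main(String[] args) throws IOException {
        byte stored = -1; // bit pattern 11111111
        InputStream in = new ByteArrayInputStream(new byte[] {stored});
        System.out.println(readValues(in)); // [255, -1]
    }
}
```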

A Reader reads char by char and returns each char (a value from 0 to 65535) as an int. If the end of the stream is reached, -1 (an int value) is returned. Again, you never have to worry about that -1 coming from the data, because an int value of -1 is outside the possible range of the chars actually stored in the data. The data can never have an int value of -1. (It might have "-1" stored in it, but that would be two separate chars, '-' followed by '1', which is not the same thing as an int value of -1. Also, since chars are essentially unsigned shorts, you could never have a char whose value was -1.)
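A sketch using StringReader in place of a file (an assumption so it runs anywhere): the text "-1" arrives as two separate chars, and casting each int back to char gives you the character rather than the number, which is also what you do if you want to print c as a letter.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;

class ReaderDemo {
    // Reads all chars, printing each code and the character it stands for.
    // Returns the text rebuilt from the chars that were read.
    static String readAll(Reader reader) throws IOException {
        StringBuilder text = new StringBuilder();
        int c;
        while ((c = reader.read()) != -1) {             // -1 only at end of stream
            System.out.println(c + " -> " + (char) c);  // cast to see the character
            text.append((char) c);
        }
        return text.toString();
    }

    public static void main(String[] args) throws IOException {
        // Prints "45 -> -" and then "49 -> 1": two chars, never the int -1.
        readAll(new StringReader("-1"));
    }
}
```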
 
Andrey Boubriak
Greenhorn
Posts: 8
Okay, I think I understand the difference, but how would you print the text in a file to the console if both streams return integer values? Surely you don't have to write a method that converts each integer value to the equivalent Unicode character?

Likewise, if you want input from a user, how do you convert that (the word "orange", for example) into integers to be written? Also, what if you have the number 65 and you want to write that to a text file? If you use an int variable, it will then misinterpret that as the character "A", so does that mean you have to somehow convert the number 65 into two separate variables, one containing the Unicode integer for the character "6" and another for the character "5"? That seems ridiculously complicated for something like writing to a file, which most useful programmes actually do.
 
Tony Docherty
Bartender
Posts: 3268
but how would you print the text in a file to the console if the both streams return integer values? Surely you don't have to write a method that converts the integer value to the equivalent unicode character?

These are low-level classes; you generally don't read from or write to them directly. When dealing with character streams, you should use FileReader/FileWriter and wrap them in one of the higher-level classes, such as BufferedReader, which has a readLine() method to read a whole line of data at a time, and PrintWriter, which has methods for writing primitive types and Strings.
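A minimal sketch of that layering; the file names input.txt and output.txt are hypothetical. The copying logic takes a BufferedReader and a PrintWriter so it works with any source and destination:

```java
import java.io.BufferedReader;
import java.io.FileReader;
import java.io.FileWriter;
import java.io.IOException;
import java.io.PrintWriter;

class LineCopy {
    // Copies text line by line; returns the number of lines copied.
    static int copyLines(BufferedReader in, PrintWriter out) throws IOException {
        int lines = 0;
        String line;
        while ((line = in.readLine()) != null) { // readLine() signals the end with null, not -1
            out.println(line);                   // writes the String plus a line separator
            lines++;
        }
        return lines;
    }

    public static void main(String[] args) throws IOException {
        // FileReader/FileWriter here use the platform's default charset.
        try (BufferedReader in = new BufferedReader(new FileReader("input.txt"));
             PrintWriter out = new PrintWriter(new FileWriter("output.txt"))) {
            System.out.println(copyLines(in, out) + " lines copied");
        }
    }
}
```

No int-to-char conversion appears anywhere: the reader and writer classes handle it.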

Likewise if you want input from a user, how do you convert that (the word "orange" for example) into an integer number to be written?

You don't; the writer does that for you, although it actually converts the characters to bytes, not integers.
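A sketch of the writer doing that conversion, writing into a ByteArrayOutputStream (used here only so the bytes can be inspected) with UTF-8 chosen explicitly as the charset:

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStreamWriter;
import java.io.Writer;
import java.nio.charset.StandardCharsets;

class WriterEncodes {
    // Writes text through a Writer and returns the bytes it produced.
    static byte[] encode(String text) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (Writer writer = new OutputStreamWriter(bytes, StandardCharsets.UTF_8)) {
            writer.write(text); // the writer turns chars into bytes for us
        }
        return bytes.toByteArray();
    }

    public static void main(String[] args) throws IOException {
        for (byte b : encode("orange")) {
            System.out.print(b + " "); // 111 114 97 110 103 101
        }
        System.out.println();
    }
}
```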

Also what if you want have the number 65 and you want to write that to a text file? If you use an int variable it will then misinterpret that as the character "A" ...

No, it doesn't work like that. If you write the number 65 to a text file with a method meant for text, such as PrintWriter.print(int), it is first converted to a String, so it becomes 2 separate characters, i.e. '6' and '5'.
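Which of the two happens depends on the method you call. A sketch with PrintWriter over a StringWriter (used only so the result can be inspected): print(int) writes the digits, while write(int) treats the int as a single character code.

```java
import java.io.PrintWriter;
import java.io.StringWriter;

class NumberVsChar {
    // print(int) writes the decimal digits of the number.
    static String viaPrint(int n) {
        StringWriter sink = new StringWriter();
        PrintWriter out = new PrintWriter(sink);
        out.print(n);
        out.flush();
        return sink.toString();
    }

    // write(int) writes the single character with that code.
    static String viaWrite(int n) {
        StringWriter sink = new StringWriter();
        PrintWriter out = new PrintWriter(sink);
        out.write(n);
        out.flush();
        return sink.toString();
    }

    public static void main(String[] args) {
        System.out.println(viaPrint(65)); // 65
        System.out.println(viaWrite(65)); // A
    }
}
```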
 
Andrey Boubriak
Greenhorn
Posts: 8
Tony Docherty wrote:
You don't, the writer does that for you. Although it actually converts the characters to bytes and not integers.


Do you mind giving me the correct syntax for writing a string to a file, then? In the code above, the .write method is used with an int, and it ends up writing the binary number that represents a Unicode character. How does it know whether it should write what the number in the variable represents or the actual number itself? Does it behave differently when given different types of variables? I.e. if you give it an int, it will write the binary of that number, and when it is read back, it will be read as whatever that number means in Unicode, whereas if you give it a String containing a number, it will write the Unicode of the characters that represent that number?

Edit:

For example, I want to write a programme that takes a file, converts all the letters to lowercase using .toLowerCase(), then saves the result in a different file. How should I get the text out of the file as a String?
 
Jeff Verdegan
Bartender
Posts: 6109
Andrey Boubriak wrote: How should I get the text out of the file in a string format?


BufferedReader.readLine() is one way.
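A sketch of the lowercase programme built around readLine(). The file names are hypothetical, and UTF-8 is an assumed charset, stated explicitly on both the reading and the writing side so the two always agree:

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.PrintWriter;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.Locale;

class LowercaseCopy {
    // Reads each line, lowercases it, and writes it out; returns the line count.
    static int lowercaseLines(BufferedReader in, PrintWriter out) throws IOException {
        int count = 0;
        String line;
        while ((line = in.readLine()) != null) {
            // Locale.ROOT avoids locale-specific case rules (e.g. Turkish dotless i).
            out.println(line.toLowerCase(Locale.ROOT));
            count++;
        }
        return count;
    }

    public static void main(String[] args) throws IOException {
        try (BufferedReader in = Files.newBufferedReader(Paths.get("input.txt"), StandardCharsets.UTF_8);
             PrintWriter out = new PrintWriter(
                     Files.newBufferedWriter(Paths.get("lowercase.txt"), StandardCharsets.UTF_8))) {
            lowercaseLines(in, out);
        }
    }
}
```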
 
Andrey Boubriak
Greenhorn
Posts: 8
Okay, I vaguely have what I want going on... vaguely. I can read the text into a String variable; that's fine. However, writing the string flops massively. I read the following:

"In Xanadu did Kubla Khan
A stately pleasure-dome decree:
Where Alph, the sacred river, ran
Through caverns measureless to man
Down to a sunless sea. -2 "


and it outputs fine to the console; however, when it is written into the second text file, I get this:

In਍堀愀渀愀搀甀ഀ
did਍䬀甀戀氀愀ഀ
Khan਍ഀ
A਍猀琀愀琀攀氀礀ഀ
pleasure-dome਍搀攀挀爀攀攀㨀ഀ
਍圀栀攀爀攀ഀ
Alph,਍琀栀攀ഀ
sacred਍爀椀瘀攀爀Ⰰഀ
ran਍ഀ
Through਍挀愀瘀攀爀渀猀ഀ
measureless਍琀漀ഀ
man਍ഀ
Down਍琀漀ഀ
a਍猀甀渀氀攀猀猀ഀ
sea.਍ⴀ㈀ഀ


Help please :'( ! I will give my firstborn to the person who can tell me why the way I'm using PrintWriter is doing that.

 
Jeff Verdegan
Bartender
Posts: 6109
My guess is that what you're reading is not a text file, but a Word doc or something (something that contains more than just text), and your console is just ignoring the junk output, but your PrintWriter/FileWriter combo is trying to convert things that aren't characters into characters. Either that, or some kind of weird encoding inconsistency.

How are you creating the original file?

How are you reading the one that appears messed up that your program produces?
 
Andrey Boubriak
Greenhorn
Posts: 8
Jeff Verdegan wrote: My guess is that what you're reading is not a text file, but a Word doc or something--something that contains more than just text--and your console is just ignoring the junk output, but your PrintWriter/FileWriter combo is trying to convert things that aren't characters into characters. Either that or some kind of weird encoding inconsistency.

How are you creating the original file?

How are you reading the one that appears messed up that your program produces?


I copied text off a website, pasted it into Notepad, and saved it as a .txt file, making sure the encoding was Unicode, and I'm reading the output just by opening it in Notepad. Could there be some magical voodoo that I copied off the website?
 
Paul Clapham
Sheriff
Posts: 22505
Andrey Boubriak wrote:

I copied text of a website pasted it into notepade saved it as a .txt file making sure encoding was unicode


I'm not quite sure what Notepad means by "Unicode", but at any rate, if you saved the file using some particular charset, you should then read it in your program using that same charset. Which you didn't. Your code doesn't specify any charset at all, so it uses the system's default charset. Since you're using Windows, that would typically be windows-1252 (Java's Cp1252), which is very close to ISO-8859-1; for ordinary English text, then, it would be a good idea to save the file from Notepad using a matching charset. I don't use Notepad (and really neither should you) but when I opened it up to see what you should be doing, I don't see anywhere which allows you to specify the charset when you save a file. So I have no idea what to tell you to do instead of what you did, because I have no idea how you did that.
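A sketch of what a charset mismatch does to the same bytes. It assumes that Notepad's "Unicode" option stores text as UTF-16 little-endian (two bytes per character) and uses ISO-8859-1 to stand in for a one-byte default charset; the short string "Hi" replaces the real file:

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

class CharsetMismatch {
    // Interprets raw bytes as text using the given charset.
    static String decode(byte[] bytes, Charset charset) {
        return new String(bytes, charset);
    }

    public static void main(String[] args) {
        byte[] saved = "Hi".getBytes(StandardCharsets.UTF_16LE); // bytes 72, 0, 105, 0
        String wrong = decode(saved, StandardCharsets.ISO_8859_1);
        String right = decode(saved, StandardCharsets.UTF_16LE);
        System.out.println(wrong.length()); // 4: 'H', NUL, 'i', NUL
        System.out.println(right);          // Hi
    }
}
```

With the wrong charset every other character is a stray NUL. The console tends to hide those, but once they are written back out and the file is reopened as "Unicode", the byte pairs no longer line up and come back as unrelated characters.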
 
Tony Docherty
Bartender
Posts: 3268
Paul Clapham wrote: I don't use Notepad (and really neither should you) but when I opened it up to see what you should be doing, I don't see anywhere which allows you to specify the charset when you save a file.

In the file save dialog, there is an 'Encoding' drop-down just below the 'File type' drop-down, which lets you select from a few options. The default is ANSI.

@Andrey Boubriak: I suggest you save the file using the default Notepad encoding of ANSI and try to run your program again.
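Saving as ANSI makes the file match the default charset. The other direction also works: keep the file as it is and tell Java which charset to use. A sketch, assuming the file was saved with Notepad's "Unicode" option, which writes a byte-order mark; Java's UTF-16 charset uses that mark to pick the byte order and drops it when decoding. The file name is hypothetical:

```java
import java.io.BufferedReader;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

class ExplicitCharsetRead {
    // Reads the whole stream as text using the given charset.
    static String readText(InputStream in, Charset charset) throws IOException {
        StringBuilder text = new StringBuilder();
        try (BufferedReader reader = new BufferedReader(new InputStreamReader(in, charset))) {
            String line;
            while ((line = reader.readLine()) != null) {
                text.append(line).append('\n');
            }
        }
        return text.toString();
    }

    public static void main(String[] args) throws IOException {
        // Hypothetical file saved by Notepad with the "Unicode" encoding option.
        System.out.print(readText(new FileInputStream("xanadu.txt"), StandardCharsets.UTF_16));
    }
}
```

Whichever charset you read with, writing the output with an explicitly chosen charset (and opening it with that same one) keeps the two files consistent.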
 
Andrey Boubriak
Greenhorn
Posts: 8
Tony Docherty wrote:
Paul Clapham wrote: I don't use Notepad (and really neither should you) but when I opened it up to see what you should be doing, I don't see anywhere which allows you to specify the charset when you save a file.

In the file save dialog there is an 'Encoding' drop down just below the 'File type' drop down which allows you to select from a few options. The default is ANSI.

@Andrey Boubriak: I suggest you save the file using the default Notepad encoding of ANSI and try to run your program again.


Thank you, that fixed it.
 