IO Streams and Characters.

 
Dinakar Kas
Ranch Hand
Posts: 34
Hi all,

My question is: when we read a file using the read() method of the InputStream class, like

InputStream in = new FileInputStream("a.txt");

int b;
while ((b = in.read()) != -1) {
    System.out.println((char) b); // cast each byte value to a char and print it
}
in.close();

the read method returns an int whose value ranges from 0 to 255 (or -1 at the end of the stream).
So, can we read a text file containing only ASCII and extended ASCII characters using streams? (If I don't want to use FileReader)
What happens if I read a file containing some Unicode characters, as I mentioned above?

Also, I read that characters take two bytes of storage. Is it that characters in the ASCII range take only 1 byte and those above that range take 2 bytes?

Thanks,
Dinakar.
 
Rob Spoor
Sheriff
Posts: 22783

Dinakar Kas wrote:(If I don't want to use FileReader)


Why not? That's what it's for -- to read text files instead of binary files.

What happens if I read a file containing some Unicode characters, as I mentioned above?


Why don't you try it out?

Also, I read that characters take two bytes of storage. Is it that characters in the ASCII range take only 1 byte and those above that range take 2 bytes?


Character encoding. In Java, all characters use two bytes (well, more accurately, 16 bits); when converted to bytes using a character encoding, a character can require anywhere from one to several bytes. The ASCII encoding only supports characters from 0-127 (inclusive), and takes one byte per character. UTF-16 (what text editors often label simply "Unicode") takes two bytes per character for almost all characters.
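For example, a quick sketch (the string and the encoding names are just picked for illustration) that prints how many bytes the same three characters need in different encodings:

import java.io.UnsupportedEncodingException;

public class EncodingSizes {
    public static void main(String[] args) throws UnsupportedEncodingException {
        String s = "Aé€"; // 'A' is plain ASCII, 'é' is above 127, '€' is above 255
        System.out.println(s.getBytes("US-ASCII").length); // 3: one byte each, but 'é' and '€' become '?'
        System.out.println(s.getBytes("UTF-8").length);    // 6: 1 + 2 + 3 bytes
        System.out.println(s.getBytes("UTF-16BE").length); // 6: 2 bytes per character
    }
}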
 
Dinakar Kas
Ranch Hand
Posts: 34
Hi Rob,
Thanks for your response. I still have some questions.

I created a .txt file with ANSI encoding, read it, and wrote it back to disk. That works fine when I use streams.
Then I changed the encoding of the file to Unicode and read and wrote it to disk again, still using streams. That did not work; the output was gibberish. One thing that surprises me is that when I read the file and wrote it to disk using FileReader and a Writer, it still showed some nonsense.

My program is as follows:
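(Sketch reconstructed from the replies below; the 20-character buffer, the cbuf.length - 1 in the write call, the UTF-8 writer, the class name, and the file names are assumptions rather than the exact original.)

import java.io.*;

public class CopyFile {
    public static void main(String[] args) throws IOException {
        Reader in = new FileReader("in.txt"); // no encoding given, so the platform default is assumed
        Writer out = new OutputStreamWriter(new FileOutputStream("out.txt"), "UTF-8");
        char[] cbuf = new char[20];
        int b;
        while ((b = in.read(cbuf)) != -1) {
            out.write(cbuf, 0, cbuf.length - 1); // the write call discussed in the replies
        }
        out.close();
        in.close();
    }
}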



Any inputs are highly appreciated.
Thanks.
Dinakar.
 
Rob Spoor
Sheriff
Posts: 22783
First of all, I think you have one misunderstanding about FileReader. To be honest, this class is quite rubbish. It won't detect the real encoding of the file; it just assumes the system default is used. A better solution is to use an InputStreamReader around a FileInputStream and specify the encoding manually.
I know I mentioned using FileReader before, and it's still fine if the file actually uses the system default encoding, but only then.

There is one huge flaw in your code.
The reading is as it should be; the writing isn't, though. First of all, the third parameter is the number of characters to write. Even if you did need to write the entire array, that parameter should be cbuf.length, not cbuf.length - 1. As it is, you're missing one character most of the time.

That said, you should never assume that you'll need to write the entire array. Although that's probably true for most iterations, it's usually wrong for the last one: your file size will most likely not be a multiple of 20. If your file size is 32, your code will first write 19 characters (ignoring number 20), then write another 19, where only 12 should be written.

That's where b comes into play. It's the number of characters actually read into cbuf. Therefore, that's also the number of characters to write:
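Something along these lines, perhaps (the encoding and the file names are assumptions; the important part is passing b as the length):

Reader in = new InputStreamReader(new FileInputStream("in.txt"), "UTF-8");
Writer out = new OutputStreamWriter(new FileOutputStream("out.txt"), "UTF-8");
char[] cbuf = new char[20];
int b;
while ((b = in.read(cbuf)) != -1) {
    out.write(cbuf, 0, b); // write exactly as many characters as were read
}
out.close();
in.close();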
 
Ulf Dittmer
Rancher
Posts: 43081
If you use streams, then the actual bytes aren't altered - you can read and write them without knowing how the characters in the file are encoded.
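For instance, a plain byte-for-byte copy like the sketch below (file names are placeholders) works no matter how the file is encoded, because the bytes pass through unchanged:

InputStream in = new FileInputStream("in.txt");
OutputStream out = new FileOutputStream("out.txt");
byte[] buf = new byte[1024];
int n;
while ((n = in.read(buf)) != -1) {
    out.write(buf, 0, n); // write only the bytes actually read
}
out.close();
in.close();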

But if you use Readers/Writers, you need to tell the JVM which encoding to use every single time. If you don't, then it's going to assume the platform default encoding, which most of the time is not what you want, and -just as importantly- generally is not UTF-8.

So the problem is that you're specifying UTF-8 during writing, but not during reading. The fix is to use a FileInputStream wrapped in an InputStreamReader instead of a FileReader, so that you can specify the encoding for reading as well.

Edit: ... which is pretty much what Rob just said. Too late :-(
 
Dinakar Kas
Ranch Hand
Posts: 34
Thanks Rob and Ulf for explaining where I was going wrong.

I have written a program using InputStreamReader, and it works well.



Thanks once again for the inputs.
Dinakar
 