
IO - byte array vs buffer

Hello everyone,
I have a question about using a byte array versus BufferedInputStream.
I wrote this code:

The output I got :
File Written Successfuly to TARGETFILE1.TXT  in time  13 milisecond using byte array
File Written Successfuly to TARGETFILE1.TXT  in time  13 milisecond using buffer


Actually it fluctuates: sometimes the byte array takes more time and sometimes the buffer does. I used 8192 for the byte[] because that is the default buffer size of BufferedInputStream/BufferedOutputStream.

Now my question is:
If we can build a buffer from a byte[], then when do we need a buffered stream?
Somewhere I read that byte[] barr = new byte[1024] is also a one mb buffer. Am I correct in this?

How big is your data set? It may not be large enough for a good comparison. 13ms is not very long, and the fluctuations can be due to any number of things which are out of your control. Getting a good measurement is very hard.

In general, buffered I/O is faster than non-buffered I/O because it reduces the number of actual system I/O calls (fewer reads and writes to the media). There is no need to implement it manually; BufferedReader or BufferedInputStream is good enough.

byte[] barr = byte[1024] creates sizeof(byte) * 1024 = 8 * 1024 = 8192 bytes in memory, plus another 4 bytes for the .length member of the array.
Sorry, correction on my previous post:

new byte[1024] creates 1024 bytes plus a few more bytes for the array object itself (the exact overhead depends on the JVM; on a typical 64-bit HotSpot JVM it is about 16 bytes).

So, no: new byte[1024] is a 1 KB buffer, not a 1 MB buffer.
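To make the size arithmetic concrete, here is a small sketch. The class and constant names are hypothetical and chosen only for illustration; the point is that the number passed to `new byte[n]` is a count of bytes.

```java
// Hypothetical demo class; the names are illustrative only.
final class BufferSizeDemo {
    static final int KB = 1024;        // bytes in one kilobyte (binary convention)
    static final int MB = 1024 * KB;   // bytes in one megabyte (binary convention)

    public static void main(String[] args) {
        byte[] oneKb = new byte[1024]; // 1024 elements of one byte each
        byte[] oneMb = new byte[MB];   // 1,048,576 elements of one byte each

        System.out.println(oneKb.length + " bytes = " + oneKb.length / KB + " KB");
        System.out.println(oneMb.length + " bytes = " + oneMb.length / MB + " MB");
    }
}
```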
I think reading and writing in bytes would be a better option.

Also, 1024 bytes = 1 KB and 1 MB = 1024 * 1 KB.

As per the description given in the Javadoc:
A BufferedInputStream adds functionality to another input stream-namely, the ability to buffer the input and to support the mark and reset methods.
As bytes from the stream are read or skipped, the internal buffer is refilled as necessary from the contained input stream, many bytes at a time.
 

Puspender Tanwar wrote: Now my question is:
If we can build a buffer from a byte[], then when do we need a buffered stream?
Somewhere I read that byte[] barr = new byte[1024] is also a one mb buffer. Am I correct in this?



If you use a buffered stream, then you don't have to provide your own buffer. That way you avoid errors like the one you made in your code, because the code in the API will have been thoroughly tested.

And as for the meaning of "mb" and so on, I recommend to you the Wikipedia article Megabyte.

You may be wondering about the error you made. When you read a stream one buffer at a time, the last buffer you read may not be completely full. Your code ignores that possibility and writes out the entire byte array even though only part of it should be written out. (You are far from being the first person to make this error.) However if you read one byte at a time you don't encounter this issue.
By the way: there's a constructor for BufferedInputStream which allows you to set the size of the buffer. If you want to find out what buffer size works better (in your environment) then that's the way to do it. And like Damon said, a larger file would be better for that kind of test.
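As a rough sketch of how such a comparison could be set up, the two-argument constructors `BufferedInputStream(InputStream, int)` and `BufferedOutputStream(OutputStream, int)` take the buffer size directly. The class name, method name, temp-file setup and the three sizes tried below are assumptions made for illustration, and the timings will vary by machine and from run to run.

```java
import java.io.BufferedInputStream;
import java.io.BufferedOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical benchmark helper; class and method names are illustrative only.
final class BufferSizeTrial {

    // Copies src to dst through buffered streams of the given size and
    // returns the elapsed time in nanoseconds.
    static long timedCopy(Path src, Path dst, int bufferSize) throws IOException {
        long start = System.nanoTime();
        try (InputStream in = new BufferedInputStream(Files.newInputStream(src), bufferSize);
             OutputStream out = new BufferedOutputStream(Files.newOutputStream(dst), bufferSize)) {
            int b;
            while ((b = in.read()) != -1) { // byte-wise loop; the streams do the buffering
                out.write(b);
            }
        }
        return System.nanoTime() - start;
    }

    public static void main(String[] args) throws IOException {
        // Assumption: a generated 4 MB temp file stands in for the real test file.
        Path src = Files.createTempFile("source", ".bin");
        Path dst = Files.createTempFile("target", ".bin");
        try {
            Files.write(src, new byte[4 * 1024 * 1024]);
            for (int size : new int[] {512, 8192, 65536}) {
                long nanos = timedCopy(src, dst, size);
                System.out.println(size + " byte buffer: " + nanos / 1_000_000 + " ms");
            }
        } finally {
            Files.deleteIfExists(src);
            Files.deleteIfExists(dst);
        }
    }
}
```

Running each size several times and discarding the first run would give steadier numbers than a single pass.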
I would say that, in general, a bigger buffer is better because it defers more actual I/O calls. But of course there's a space/time tradeoff there, and there's a point of diminishing returns. I think an 8KB buffer should be quite good. A lot depends on the block transfer size of the device. If you know what this is, then you can set your buffer size based on some multiple of that.
Thanks Damon, Ravi.
Actually "mb" was a typo. Thanks for pointing it out.
My updated code is:


The source file is 4660 KB and contains 50776 records.

Target files:
withoutByteArray():   target file size 4660 KB, 50776 records (correct)
withByteArray():      target file size 4664 KB, 50824 records (why??)
withBufferAndArray(): target file size 4664 KB, 50824 records (why??)
withBufferNotArray(): target file size 4656 KB, 50732 records (why??)

The record count is exact only when I read and write byte by byte. Then what is the use of a buffer or a byte array if the data is written incorrectly?
And what is the reason for this behaviour?
I actually did that exercise a few years ago at work, where I was working on a part of an application whose main function was to move files from one place to another. What I found was that there was a reasonably good choice of buffer size where if you chose a much smaller or larger buffer then your throughput rate went down. So I used that number in the application and as far as I know, if the application is still running then it still uses that number.

However it made a difference whether the file was being moved locally, or over our LAN, or over our WAN, or to a customer's FTP site, and so on. Since the code I was optimizing didn't know anything about that, I chose what I thought was the commonest case and optimized that case. And for all I knew there might be a better choice on the machines the code actually ran on -- there's only so much optimizing you can do.
 

Puspender Tanwar wrote: The record count is exact only when I read and write byte by byte. Then what is the use of a buffer or a byte array if the data is written incorrectly?
And what is the reason for this behaviour?



I thought I already mentioned that error in my first post in this thread?
 

Paul Clapham wrote:If you use a buffered stream, then you don't have to provide your own buffer. That way you avoid errors like the one you made in your code, because the code in the API will have been thoroughly tested.
You may be wondering about the error you made. When you read a stream one buffer at a time, the last buffer you read may not be completely full. Your code ignores that possibility and writes out the entire byte array even though only part of it should be written out. (You are far from being the first person to make this error.) However if you read one byte at a time you don't encounter this issue.



Here you said that we should avoid using our own buffer and use BufferedOutputStream instead. But in my case BufferedOutputStream is also writing wrong data. Why is that?
The error is here, I think:



If fewer than 8192 bytes remain in the file, only a portion of the array is filled. The remaining portion still holds whatever was there before (zeros, or bytes left over from the previous read). Then you go ahead and write the entire byte array, including those stale elements. Have a look at the Javadoc for the InputStream read(byte[]) method. I'm going to go ahead and quote it:



From the doc: "Reads some number of bytes from the input stream and stores them into the buffer array b. The number of bytes actually read is returned as an integer."

Now, the doc for the write method:



From the doc: "Writes b.length bytes from the specified byte array to this output stream."

You need to create a variable to store the number of bytes read when you call the read() method, then pass that variable into the write() method:



"Writes len bytes from the specified byte array starting at offset off to this output stream. "
In your case, it seems the offset should always be 0, and you can store the value returned from read() in a "len" variable, which you then pass to write(barr, 0, len).
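A minimal in-memory sketch of the difference follows. It is not the original poster's code: the class and method names are hypothetical, and byte-array streams stand in for the files so the effect is easy to reproduce.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;

// Hypothetical demo class; the names are illustrative only.
final class PartialReadDemo {

    // Buggy: ignores the count returned by read() and always writes the whole array.
    static void copyIgnoringCount(InputStream in, OutputStream out) throws IOException {
        byte[] barr = new byte[8192];
        while (in.read(barr) != -1) {
            out.write(barr);             // writes barr.length bytes every time
        }
    }

    // Fixed: writes only the bytes that the last read() actually delivered.
    static void copyUsingCount(InputStream in, OutputStream out) throws IOException {
        byte[] barr = new byte[8192];
        int len;
        while ((len = in.read(barr)) != -1) {
            out.write(barr, 0, len);
        }
    }

    public static void main(String[] args) throws IOException {
        byte[] source = new byte[10_000]; // deliberately not a multiple of 8192

        ByteArrayOutputStream buggy = new ByteArrayOutputStream();
        copyIgnoringCount(new ByteArrayInputStream(source), buggy);

        ByteArrayOutputStream fixed = new ByteArrayOutputStream();
        copyUsingCount(new ByteArrayInputStream(source), fixed);

        System.out.println("source bytes: " + source.length);
        System.out.println("buggy copy:   " + buggy.size());
        System.out.println("fixed copy:   " + fixed.size());
    }
}
```

With a 10,000-byte source and an 8192-byte array, the buggy version emits two full arrays (16,384 bytes) while the fixed version emits exactly 10,000. The array-based target files reported earlier in this thread were slightly larger than the source in the same way.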

Hope that helps.
Or you can avoid the problem by simply using BufferedInputStream instead of manually allocating and handling your byte[].
Thanks Damon,
As I already mentioned, I am facing a similar problem with BufferedInputStream as well. Please see my last reply.
I can send you the .txt file I am using if you can share your email.
No, I don't see any problem with your implementation using BufferedInputStream. There must be something else going on that's affecting it.
Maybe try just writing a copyFile() method that uses BufferedInputStream and BufferedOutputStream.
Maybe you didn't flush your buffered data.

Simple example:



If I remove the flush(), b.txt is always a 0-byte file. Otherwise it works. I think this may be the issue here.
When using buffered output streams, no data is actually written until the buffer is full or you make an explicit call to flush(). What does flush() do? It writes the buffered data to the device.
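The effect can be observed directly with a small sketch. Here a `ByteArrayOutputStream` stands in for the device so its size can be inspected; the class and method names are hypothetical.

```java
import java.io.BufferedOutputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;

// Hypothetical demo class; the names are illustrative only.
final class FlushDemo {

    // Returns the number of bytes that reached the underlying stream
    // before flush() and after flush(), in that order.
    static int[] sizesAroundFlush() throws IOException {
        ByteArrayOutputStream device = new ByteArrayOutputStream(); // stands in for the file
        BufferedOutputStream out = new BufferedOutputStream(device);

        out.write(new byte[] {1, 2, 3}); // far smaller than the buffer, so it stays buffered
        int before = device.size();

        out.flush();                     // pushes the buffered bytes to the underlying stream
        int after = device.size();

        out.close();
        return new int[] {before, after};
    }

    public static void main(String[] args) throws IOException {
        int[] sizes = sizesAroundFlush();
        System.out.println("before flush: " + sizes[0] + ", after flush: " + sizes[1]);
    }
}
```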
However the close() method of BufferedWriter automatically calls flush(). Failing to close() your writer is also an error.
Indeed. I tried to post the simplest code that demonstrates the issue...
Testing showed that close() is not automatically called, so either an explicit call to flush() or close() is mandatory in order to actually write the buffered data from a into b.
 

Damon McNeill wrote:If I remove the flush(), b.txt is always a 0 bytes file. Otherwise it works. I think maybe this is the issue here.


That's not the case with me.

When I don't flush, the data goes to the target file, but again with some incorrect data, as before.
But when I uncomment the flush, the target file created is exactly the same as the source file. So yes, flush helped me here.

I corrected my code and got the correct target file:

Since I am using try-with-resources, there is no need to call flush(): close() is called automatically, and close() flushes the buffer.
Thanks for this.

A few last doubts:
1. Why is byte[] so fast? It took only 13-19 milliseconds (though with some incorrect data), but the buffer took 170-190 milliseconds.
2. If I put bufferOutput.flush() in each iteration, the code becomes very slow and writes the file in 32574 milliseconds. Why?

My understanding: Buffered input streams read data from a buffer, and the native input API is called only when the buffer is empty. Similarly, buffered output streams write data to a buffer, and the native output API is called only when the buffer is full. There is no need to flush on each iteration; if we do it anyway, it adds overhead to every iteration and so slows down the code.
Please correct me wherever I am wrong.
 

Puspender Tanwar wrote: 2. If I put bufferOutput.flush() in each iteration, the code becomes very slow and writes the file in 32574 milliseconds. Why?



That causes the contents of the buffer (i.e. one byte) to be written to disk immediately. If you don't flush then the buffer only gets written to disk when it becomes full. Since writing data to disk takes a very long time compared to doing calculations (you've got a metal thing spinning around), that's why. However if your "disk" were an SSD (solid state drive) then there isn't a metal thing spinning around, it's almost like writing to memory, so the difference between your two latest scenarios would be much smaller.
 

My understanding: Buffered input streams read data from a buffer, and the native input API is called only when the buffer is empty. Similarly, buffered output streams write data to a buffer, and the native output API is called only when the buffer is full. There is no need to flush on each iteration; if we do it anyway, it adds overhead to every iteration and so slows down the code.



Yes, that's correct. However, there's no need to flush() on EVERY iteration, just after the final one. Basically, write() flushes the buffer internally when it is full. If it's not full, no flush is done.
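One way to see how much extra work a per-iteration flush() causes is to count the calls that reach the underlying stream. In this sketch the counting sink is a hypothetical stand-in for the native output API, and all class and method names are invented for illustration.

```java
import java.io.BufferedOutputStream;
import java.io.IOException;
import java.io.OutputStream;

// Hypothetical demo class; the names are illustrative only.
final class FlushCostDemo {

    // Discards all data but counts how often the buffered stream hands it something,
    // as a rough stand-in for the number of native write calls.
    static final class CountingSink extends OutputStream {
        int writeCalls;

        @Override
        public void write(int b) {
            writeCalls++;
        }

        @Override
        public void write(byte[] b, int off, int len) {
            writeCalls++;
        }
    }

    // Writes totalBytes one byte at a time and returns the number of calls
    // that reached the sink.
    static int countSinkWrites(int totalBytes, boolean flushEachByte) throws IOException {
        CountingSink sink = new CountingSink();
        try (BufferedOutputStream out = new BufferedOutputStream(sink)) {
            for (int i = 0; i < totalBytes; i++) {
                out.write(i);
                if (flushEachByte) {
                    out.flush();
                }
            }
        } // close() flushes whatever is still buffered
        return sink.writeCalls;
    }

    public static void main(String[] args) throws IOException {
        System.out.println("flush at close only: " + countSinkWrites(20_000, false) + " writes");
        System.out.println("flush on every byte: " + countSinkWrites(20_000, true) + " writes");
    }
}
```

For 20,000 bytes, flushing on every byte sends 20,000 separate one-byte writes to the sink, against only a handful when the buffer is left to fill.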
I don't know about the version of the "try" syntax that you're using. But I posted an example of a simplistic "copy" program; maybe try running that on your test input and see if it works. If it does, then the problem is somewhere else in your code.
There's no need to call the flush as the try-with-resources (look it up if you haven't encountered it before) closes any Closeable resource declared in the try.  Since it's closed, that means it's also been flushed.
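A short sketch of that behaviour, with hypothetical class and method names and a temp file standing in for the real target:

```java
import java.io.BufferedOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical demo class; the names are illustrative only.
final class TryWithResourcesDemo {

    // Writes data through a buffered stream with no explicit flush() or close().
    static void writeAll(Path target, byte[] data) throws IOException {
        try (OutputStream out = new BufferedOutputStream(Files.newOutputStream(target))) {
            out.write(data);
        } // out.close() runs here, and closing a BufferedOutputStream flushes it first
    }

    public static void main(String[] args) throws IOException {
        Path target = Files.createTempFile("target", ".txt");
        try {
            writeAll(target, "hello".getBytes(StandardCharsets.UTF_8));
            System.out.println("bytes on disk: " + Files.size(target));
        } finally {
            Files.deleteIfExists(target);
        }
    }
}
```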
Thanks a lot, Damon & Paul.
My doubts are cleared; only one is left, as I asked in my previous reply.
Why is byte[] so fast? It took only 13-19 milliseconds (though with some incorrect data), but the buffer took 170-190 milliseconds.
My understanding: When we define our own buffer using byte[], it does not care whether it is full or not and keeps moving data in and out. But a BufferedOutputStream's buffer needs to be full or empty before the next step, and hence it is more time-consuming, but safer too.
Please let me know if I am correct or not.

@Damon: try() is try-with-resources. Objects initialised inside its parentheses need not be closed explicitly; they get closed automatically. One restriction applies: you can declare only objects of classes that implement AutoCloseable (which includes every Closeable).

You might need to do some tests to find out whether it is the read or the write (or both).
For both of them, though, the byte[] version uses the native arraycopy method, which I would bet is faster than filling the byte[] inside the BufferedInput/OutputStream one byte at a time.
The thing is... BufferedInputStream uses an array internally and calls the same read(byte[] b, int off, int len) method of its wrapped stream as you would do if you wrote the correct algorithm without using BufferedInputStream. Even IF it's slower, that's probably only to perform certain bookkeeping that makes it easy to perform other stream operations on it. It's added exactly so you don't have to work with arrays anymore. If you want to perform I/O, you either use InputStream/OutputStream, or if you want to interact with the buffer, you use a ReadableByteChannel/WritableByteChannel.

Here's a proper file copying implementation that uses an array:
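One way such an array-based copy might look is sketched below. This is an illustration rather than the poster's own listing; the class name, method name and the 8192-byte array size are assumptions.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

// Hypothetical utility; class and method names are illustrative only.
final class ArrayFileCopy {

    // Copies source to target through a caller-managed byte array and
    // returns the number of bytes copied.
    static long copy(Path source, Path target) throws IOException {
        byte[] buffer = new byte[8192];
        long total = 0;
        try (InputStream in = Files.newInputStream(source);
             OutputStream out = Files.newOutputStream(target)) {
            int len;
            while ((len = in.read(buffer, 0, buffer.length)) != -1) {
                out.write(buffer, 0, len); // only the bytes this read delivered
                total += len;
            }
        }
        return total;
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 2) {
            System.err.println("usage: ArrayFileCopy <source> <target>");
            return;
        }
        long copied = copy(Paths.get(args[0]), Paths.get(args[1]));
        System.out.println("copied " + copied + " bytes");
    }
}
```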
