
Optimal I/O operations in Java

 
Brent Boyer
Greenhorn
Posts: 2
(Guys: not sure whether to post the following in this Performance forum, or in the one on I/O since it concerns both. If you feel that it belongs there instead, let me know and I will repost it there.
The stuff below comes from something that I recently submitted to a forum at Sun, but I have so far gotten zero useful replies. Can you show that you can do better? Peter Haggar, please read!
Do NOT give me suggestions about file sources, since those are trivial.)
*****
I have a dilemma about how best to perform certain I/O operations in Java.
The issue arises when you have an InputStream which generates bytes at unpredictable times and in unpredictable amounts. For instance, the InputStream could come from a network socket. In my current case of interest, it actually comes from a general native Process that I cannot make any assumptions about. In contrast, an InputStream from a file (that you have an exclusive lock on) would NOT fall into this category because all of the bytes that will ever be read are available right now.
For the above type of InputStream, how do you optimally monitor it for data?
Consider the following Java fragment, which you may be tempted to write:
<code>
InputStream in;
byte[] readBuffer;
.
.
.
while (true) {
    int bytesRead = in.read(readBuffer);
    if (bytesRead == -1) {
        handleStreamEOF();
        break;
    }
    else {
        handleNewBytes(readBuffer, bytesRead);
    }
}
</code>
The first question I have is: what exactly happens when you call the read method on the InputStream? The read javadocs clearly state that the thread executing it will block until at least one byte becomes available or the stream hits end of file. Fine -- that is perfectly clear: to the Java programmer, I/O in this case is purely synchronous.
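To make the read contract concrete, here is a minimal sketch of the same loop in runnable form. The class name `ReadLoopDemo`, the `drain` helper and the 4-byte buffer are made up for illustration, and the in-memory stream is only a stand-in so that the run is repeatable: it never blocks, unlike the socket or Process streams this post is really about.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;

class ReadLoopDemo {
    /**
     * Copies everything from in to sink using a 4-byte buffer (hypothetical size)
     * and returns how many read calls were made, including the final one that
     * reported end of file.
     */
    static int drain(InputStream in, ByteArrayOutputStream sink) throws IOException {
        byte[] readBuffer = new byte[4];
        int calls = 0;
        while (true) {
            // Blocks until at least one byte is available, EOF, or an exception.
            int bytesRead = in.read(readBuffer);
            calls++;
            if (bytesRead == -1) {
                return calls; // -1 is the only EOF signal
            }
            // Only the first bytesRead bytes of the buffer are valid.
            sink.write(readBuffer, 0, bytesRead);
        }
    }

    public static void main(String[] args) throws IOException {
        byte[] data = "0123456789".getBytes(StandardCharsets.US_ASCII);
        ByteArrayOutputStream sink = new ByteArrayOutputStream();
        int calls = drain(new ByteArrayInputStream(data), sink);
        System.out.println(calls + " reads, " + sink.size() + " bytes");
    }
}
```

With ten bytes and a 4-byte buffer the loop sees reads of 4, 4 and 2 bytes and then -1. With a socket or Process stream the sizes of the individual reads are not predictable, which is why the handler takes `bytesRead` as well as the buffer.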
What I really want to know is whether a blocked thread is still inefficiently wasting CPU cycles, or whether the OS/JVM are intelligent enough to recognize that if data is not available, then the thread should be put to sleep and only woken when data is available. In other words, is there any asynchronous I/O going on "under the hood"?
I have reason to believe that, at least with JDK 1.3.1 running on Microsoft OSes (NT and 2K), the blocked thread is NOT put to sleep, but actually continues to waste a lot of CPU cycles. This conclusion comes from profiling a couple of applications where I had many many threads all simultaneously doing, say, read I/O. In these cases, I set it up so that there was always at least one thread that had data available to read, and so should be using the CPU to process that data. Instead, my profiler showed that a majority of the CPU time was spent just on the read method. From that, I am guessing that the blocked threads are still wasting tons of CPU time.
If anyone has any experience on other OSes (e.g. Unixes), I would be very interested in what you found.
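One way to test the guess above directly, rather than inferring it from a profile, is to ask the JVM how much CPU time a thread consumed while it sat in read. The sketch below assumes a much newer JVM than JDK 1.3.1: `ThreadMXBean`, `java.nio.channels.Pipe` and lambdas do not exist there. The class name `BlockedReadCpuCheck` and the sleep intervals are made up for illustration. It also measures a pipe, not a Process stream, so treat the result as a hint only. Keep in mind that a profiler may attribute elapsed time rather than CPU time to a thread parked in a native read, which would give the same picture without any cycles actually being burned.

```java
import java.io.IOException;
import java.io.InputStream;
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadMXBean;
import java.nio.channels.Channels;
import java.nio.channels.Pipe;

class BlockedReadCpuCheck {
    /**
     * Returns the CPU nanoseconds a reader thread used while blocked in read
     * for roughly waitMillis, or -1 if the JVM cannot report per-thread CPU time.
     */
    static long cpuNanosWhileBlocked(long waitMillis)
            throws IOException, InterruptedException {
        ThreadMXBean threads = ManagementFactory.getThreadMXBean();
        if (!threads.isThreadCpuTimeSupported() || !threads.isThreadCpuTimeEnabled()) {
            return -1;
        }
        Pipe pipe = Pipe.open(); // nothing is ever written to this pipe
        InputStream in = Channels.newInputStream(pipe.source());
        Thread reader = new Thread(() -> {
            byte[] readBuffer = new byte[1024];
            try {
                while (in.read(readBuffer) != -1) {
                    // no data is expected before EOF
                }
            } catch (IOException ignored) {
                // this sketch only cares about CPU time, not about errors
            }
        });
        reader.setDaemon(true);
        reader.start();
        Thread.sleep(200); // give the reader time to reach the blocking read
        long before = threads.getThreadCpuTime(reader.getId());
        Thread.sleep(waitMillis);
        long after = threads.getThreadCpuTime(reader.getId());
        pipe.sink().close(); // closing the write end delivers EOF to the reader
        reader.join(2000);
        pipe.source().close();
        return (before < 0 || after < 0) ? -1 : after - before;
    }

    public static void main(String[] args) throws Exception {
        System.out.println("CPU time while blocked: "
                + cpuNanosWhileBlocked(1000) + " ns");
    }
}
```

A figure close to zero would suggest the blocked thread really is asleep; a figure close to the wall-clock wait would support the busy-waiting theory.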
OK, for now let's assume that I/O blocked threads are in fact wasting CPU cycles. Also, assume that the asynchronous I/O facilities in Java 1.4 are unavailable -- both because 1.4 is still in beta, and also because, as near as I can tell, 1.4 only gives extremely limited asynchronous I/O support (e.g. it is limited to sockets, and not to, say, InputStreams from Processes).
So is there any way of being more CPU efficient? The best method that I have come up with is a pseudo-polling solution, whose code might look something like:
<code>
InputStream in;
int pollInterval;
byte[] readBuffer;
.
.
.
while (true) {
    int bytesRead = in.read(readBuffer);  // may still block indefinitely
    if (bytesRead == -1) {
        handleStreamEOF();
        break;
    }
    else {
        handleNewBytes(readBuffer, bytesRead);
    }
    // nothing more buffered right now: back off before the next blocking read
    if (in.available() == 0)
        Thread.sleep(pollInterval);
}
</code>
This solution sleeps for a fixed period of time if no data is available before re-attempting a blocking read. Of course, it has all the usual defects of a polling I/O solution (e.g. being woken up on a fixed schedule, instead of when data is actually available).
Surprisingly, it is not a true polling solution: you still have to do reads that could indefinitely block. This major defect arises because the -1 value returned by read seems to be the only way to detect stream end of file.
IF YOU KNOW ANOTHER RELIABLE WAY, PLEASE TELL ME!
This is entirely because Sun seems to have lacked the foresight to either supply an isEOF method for InputStream, or to specify in the contract for the available method what exactly happens when you call it on a stream that has reached EOF.
The correct specification would be to say that available returns -1 when the stream has reached EOF. An inferior specification would be to say that it throws an IOException. The actual javadocs say nothing, and, in fact, the implementation of available in InputStream itself is to always return 0 under all circumstances.
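The ambiguity is easy to see in a few lines. This is a minimal sketch; the class name `AvailableAtEof` and the helper `emptyDefaultStream` are made up for illustration. It checks a bare InputStream subclass that inherits the default available, and a ByteArrayInputStream before and after it is exhausted.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;

class AvailableAtEof {
    /** A stream that is already at EOF and inherits InputStream's default available(). */
    static InputStream emptyDefaultStream() {
        return new InputStream() {
            @Override
            public int read() {
                return -1; // always at end of file
            }
        };
    }

    public static void main(String[] args) throws IOException {
        InputStream bare = emptyDefaultStream();
        System.out.println(bare.available()); // 0: the default implementation
        System.out.println(bare.read());      // -1: yet the stream is at EOF

        InputStream bytes = new ByteArrayInputStream(new byte[] {42});
        System.out.println(bytes.available()); // 1: one byte can be read
        bytes.read();
        System.out.println(bytes.available()); // 0: exhausted
        System.out.println(bytes.read());      // -1: only read reveals EOF
    }
}
```

In both cases available reports 0 at end of file, the same value it reports for "no data yet", so only read can tell the two apart.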
Had Sun done what I suggest, you could at least write a true polling I/O solution like
<code>
InputStream in;
int pollInterval;
byte[] readBuffer;
.
.
.
while (true) {
    // hypothetical contract: available() does NOT actually return -1 at EOF
    switch (in.available()) {
        case -1:
            handleStreamEOF();
            return;

        case 0:
            Thread.sleep(pollInterval);
            break;

        default:
            int bytesRead = in.read(readBuffer);
            handleNewBytes(readBuffer, bytesRead);
    }
}
</code>
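For comparison, one possible workaround with the API as it actually stands is to confine the blocking read to a helper thread and let the consumer poll a queue instead of the stream. This is only a sketch under stated assumptions: the class name `StreamPump`, the `EOF` sentinel and the 4096-byte buffer are made up for illustration, and it relies on java.util.concurrent and lambdas, which are not available in JDK 1.3.1. It does not avoid the blocking read; it just moves it out of the polling thread.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.util.Arrays;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.TimeUnit;

class StreamPump {
    /** Hypothetical sentinel: this exact empty array marks end of file. */
    static final byte[] EOF = new byte[0];

    private final BlockingQueue<byte[]> chunks = new LinkedBlockingQueue<>();

    /** Starts a daemon thread that does the blocking reads on behalf of the caller. */
    static StreamPump start(InputStream in) {
        StreamPump pump = new StreamPump();
        Thread worker = new Thread(() -> {
            byte[] readBuffer = new byte[4096];
            try {
                int bytesRead;
                while ((bytesRead = in.read(readBuffer)) != -1) {
                    if (bytesRead > 0) {
                        pump.chunks.add(Arrays.copyOf(readBuffer, bytesRead));
                    }
                }
            } catch (IOException e) {
                // this sketch treats a read error like end of file
            } finally {
                pump.chunks.add(EOF); // delivered exactly once
            }
        });
        worker.setDaemon(true);
        worker.start();
        return pump;
    }

    /** Returns the next chunk, the EOF sentinel, or null if nothing arrived in time. */
    byte[] poll(long timeoutMillis) throws InterruptedException {
        return chunks.poll(timeoutMillis, TimeUnit.MILLISECONDS);
    }

    public static void main(String[] args) throws InterruptedException {
        StreamPump pump = start(new ByteArrayInputStream(new byte[] {104, 105}));
        byte[] chunk;
        while ((chunk = pump.poll(1000)) != EOF) {
            System.out.println(chunk == null ? "no data yet" : chunk.length + " bytes");
        }
        System.out.println("end of stream");
    }
}
```

The consumer gets the three outcomes the switch above wanted (data, nothing yet, EOF) and wakes up when a chunk arrives instead of on a fixed schedule. The cost is one extra thread per stream and an unbounded queue if the consumer falls behind.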

You guys have any feedback and ideas for me?
 