
? about how Java handles internal data

 
bob connolly
Ranch Hand
Posts: 204
Hello!

I'm wondering how Java handles file data internally, in particular, how much RAM will be required to process very large files!
If I input a file (args[1] in the code below) with 1 million records, will Java create a FileInputStream object containing all 1 million records, and once that object is created, hand it to the DataInputStream, and if so, would that mean that there are now 2 million records in RAM?
Or will Java fill a buffer of FileInputStream data and, once that buffer is full, send that buffer on to the DataInputStream, and so on?

public class SX3 {
    public static void main(String[] args) throws Exception {
        MyDefaultHandler dh = new MyDefaultHandler();
        RHYWords rhy = new RHYWords();
        SX1 sx = new SX1();
        SAXParser sp = sx.getParser(args[0]);
        sp.parse(rhy.reverse(new DataInputStream(new FileInputStream(args[1]))), dh);
    }
}

Thanks very much for your help!

bc
 
Vladas Razas
Ranch Hand
Posts: 385
There is such a thing as virtual memory. You can tell the Java virtual machine how much memory it may allocate: the more you give it, the less swapping. But you shouldn't get any errors unless your virtual memory is full, which usually means you're out of hard drive space.

Regarding streams... streams are not files in memory. They load data on demand: if nobody reads the stream, the stream does not read the file. In your case you want to parse the file. If the parser tries to read the data into some char[] array before actually parsing it, then yes, Java will try to read the whole file. If the parser reads a block, parses it, and repeats, then the stream will provide exactly what is asked for: blocks (better to add a buffered stream to speed up the process).
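For illustration, here is a minimal sketch of reading a file block by block; the stream only ever holds one block in memory (the 8 KB block size is just an example):

import java.io.FileInputStream;
import java.io.IOException;

public class BlockReadDemo {
    public static void main(String[] args) throws IOException {
        byte[] block = new byte[8192]; // one block of the file at a time
        FileInputStream in = new FileInputStream(args[0]);
        try {
            int bytesRead;
            // Each read() pulls at most one block from disk, on demand.
            while ((bytesRead = in.read(block)) != -1) {
                // process the first bytesRead bytes of 'block' here
            }
        } finally {
            in.close();
        }
    }
}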

Basically, from what I've read about SAX, it is the best thing for working with huge XML files, and I would fully expect that it does not read the whole stream at once.
 
Vladas Razas
Ranch Hand
Posts: 385
I want to stress this again: use buffered streams and readers. They read a whole block and then hand it out, even if the data consumer reads byte by byte. If you don't use the buffered variants, you can end up with many small read operations, greatly reducing application performance.
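For example, wrapping the FileInputStream in a BufferedInputStream is a one-line change (a self-contained sketch, not the original poster's exact code):

import java.io.BufferedInputStream;
import java.io.DataInputStream;
import java.io.FileInputStream;
import java.io.IOException;

public class BufferedDemo {
    public static void main(String[] args) throws IOException {
        // The BufferedInputStream fetches the file in large blocks, so the
        // DataInputStream's byte-by-byte reads are served from memory.
        DataInputStream in = new DataInputStream(
                new BufferedInputStream(new FileInputStream(args[0])));
        try {
            while (in.read() != -1) {
                // consume one byte per call; only ~1 disk read per block
            }
        } finally {
            in.close();
        }
    }
}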
 
bob connolly
Ranch Hand
Posts: 204
Thanks Vladas!

Appreciate your advice and will now be taking a good look at the buffered approaches; it sounds just like what I'm looking for!

Have a nice weekend Vladas!

bc
 
Vladas Razas
Ranch Hand
Posts: 385
Thanks, you too!
 
David Harkness
Ranch Hand
Posts: 1646
Yes on the use of Buffered streams/readers; they will minimize the number of disk accesses to improve performance, as Vladas said.

But I wanted to clarify another point. The use of streams to read files allows the file to be read byte-by-byte, if necessary. It's how you use the stream that determines your memory usage.

If you create a byte array as large as the file and then read the whole file into that array in a loop, you'll obviously use RAM equal to the size of the file (plus a little more for the other objects created during the process).
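Here's a sketch of that worst case, assuming the file fits in an int-sized array:

import java.io.DataInputStream;
import java.io.File;
import java.io.FileInputStream;
import java.io.IOException;

public class WholeFileDemo {
    public static void main(String[] args) throws IOException {
        File file = new File(args[0]);
        // One array element per byte of file: RAM use equals the file size.
        byte[] contents = new byte[(int) file.length()];
        DataInputStream in = new DataInputStream(new FileInputStream(file));
        try {
            in.readFully(contents); // loops internally until the array is full
        } finally {
            in.close();
        }
    }
}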

If you use an API like SAX, which was specifically designed to parse streams incrementally (small chunks of the file are read, parsed, and thrown out), you won't need as much. Keep in mind, though, that if your SAX callbacks are building a full DOM tree, that tree is going to take a lot more space than the size of the file, since it's broken up into tiny objects that each add an amount of overhead on par with the data they encapsulate. The idea of SAX is to save only what you need from the parsing; otherwise, just use the DOM API and save yourself some effort.
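Here's a minimal sketch of that idea: a SAX handler that keeps only one field per record and discards everything else ("title" is a hypothetical element name for this example):

import org.xml.sax.Attributes;
import org.xml.sax.helpers.DefaultHandler;
import javax.xml.parsers.SAXParserFactory;
import java.io.File;
import java.util.ArrayList;
import java.util.List;

public class TitleCollector extends DefaultHandler {
    private final List<String> titles = new ArrayList<String>();
    private StringBuilder current;

    public void startElement(String uri, String localName,
                             String qName, Attributes attrs) {
        if ("title".equals(qName)) {        // hypothetical element name
            current = new StringBuilder();
        }
    }

    public void characters(char[] ch, int start, int length) {
        if (current != null) {
            current.append(ch, start, length);
        }
    }

    public void endElement(String uri, String localName, String qName) {
        if ("title".equals(qName)) {
            titles.add(current.toString()); // keep only what we need
            current = null;
        }
    }

    public static void main(String[] args) throws Exception {
        TitleCollector handler = new TitleCollector();
        SAXParserFactory.newInstance().newSAXParser()
                .parse(new File(args[0]), handler);
        System.out.println(handler.titles.size() + " titles kept in memory");
    }
}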

As for buffering: a buffer of, for example, 2k in size will hold at most 2k of the file in RAM at any one point in time. The buffer is filled with data from the file as you call read(). Once a read() call empties the buffer, it is filled again, and the process repeats until you reach the end of the file or stop.

Buffering improves performance by combining your many small read calls (1, 2, 10, 200 bytes at a time, etc) into fewer, larger read calls to the file system. If you access the disk (or even the disk cache) for every byte (no buffer), you will suffer a lot of overhead since reading 1 byte of a file takes nearly the same amount of time as reading 2k bytes. The same applies to writing, too.
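You can see this for yourself with a quick (and admittedly unscientific) timing sketch comparing byte-by-byte reads with and without a buffer:

import java.io.BufferedInputStream;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;

public class ByteByByteDemo {
    // Reads a stream one byte at a time and reports the elapsed time.
    static long time(InputStream in) throws IOException {
        long start = System.currentTimeMillis();
        while (in.read() != -1) {
            // consume one byte per call
        }
        in.close();
        return System.currentTimeMillis() - start;
    }

    public static void main(String[] args) throws IOException {
        // Unbuffered: every read() is a separate request to the file system.
        long slow = time(new FileInputStream(args[0]));
        // Buffered: read() is served from a block already in memory.
        long fast = time(new BufferedInputStream(new FileInputStream(args[0])));
        System.out.println("unbuffered: " + slow + " ms, buffered: " + fast + " ms");
    }
}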

Finally, note that JDK 1.4 introduced a completely new paradigm for I/O: buffers and channels. You don't need to use it (and can't if you're not using JDK 1.4 or higher), but it is very cool and improves performance yet again. It's still a good idea to get comfortable with the original I/O classes, though, since they'll be used for a long time.
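As a taste of the new API, here is a sketch of the same kind of read loop using a FileChannel and a reusable ByteBuffer (the 8 KB capacity is just an example):

import java.io.FileInputStream;
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;

public class ChannelDemo {
    public static void main(String[] args) throws IOException {
        FileChannel channel = new FileInputStream(args[0]).getChannel();
        ByteBuffer buffer = ByteBuffer.allocate(8192); // reused for every read
        try {
            while (channel.read(buffer) != -1) {
                buffer.flip();            // switch from filling to draining
                while (buffer.hasRemaining()) {
                    buffer.get();         // process one byte; real code would do more
                }
                buffer.clear();           // ready to be filled again
            }
        } finally {
            channel.close();
        }
    }
}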
 
Vladas Razas
Ranch Hand
Posts: 385
I'll go read about channels.
 
Vladas Razas
Ranch Hand
Posts: 385
A comment on SAX: if you need to keep the whole document in memory, then JDOM or something similar is a much better choice; it will do all the dirty work for you. I wouldn't recommend the original JDK DOM, because you'll have to deal with small things like whitespace handling, etc. Currently I use the JDK DOM, but only because it's bundled with the standard JDK and I have strict space requirements... I'm writing an applet.
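For comparison, a minimal JDOM sketch (assuming JDOM 1.x is on the classpath; "title" is a hypothetical element name):

import java.io.File;
import org.jdom.Document;
import org.jdom.Element;
import org.jdom.input.SAXBuilder;

public class JdomDemo {
    public static void main(String[] args) throws Exception {
        // SAXBuilder parses the file and builds the whole tree in memory.
        Document doc = new SAXBuilder().build(new File(args[0]));
        Element root = doc.getRootElement();
        // getChildTextTrim handles the whitespace trimming for you.
        System.out.println(root.getChildTextTrim("title"));
    }
}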
 
David Harkness
Ranch Hand
Posts: 1646
I just went through this IBM tutorial that was pretty good. Here are a couple of other links I saved from my Google search:
  • Master Merlin's new I/O classes
  • Merlin brings nonblocking I/O to the Java platform
  • Turning Streams Inside Out
  • Sun: New I/O APIs
Good luck! It's great stuff and looks much easier to use than streams.
     