
Read huge file in Java

 
S Majumder
Ranch Hand
Hi all,
How do I read a 1 GB file in Java?
Thanks,
Satya
 
Paul Clapham
Sheriff
One byte at a time.

Seriously, just read the file and process it as you read it. There's no reason that the size of the file should cause you any problems unless you try to store all of it in memory.
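For example, if the file happens to be text, a line-at-a-time sketch of "process it as you read it" might look like this (the file name is just a placeholder):

    import java.io.BufferedReader;
    import java.io.FileReader;
    import java.io.IOException;

    public class ProcessAsYouRead {
        public static void main(String[] args) throws IOException {
            // "huge.txt" is a placeholder; point this at your own file
            try (BufferedReader reader = new BufferedReader(new FileReader("huge.txt"))) {
                String line;
                while ((line = reader.readLine()) != null) {
                    // process the line here, then let it go; memory use stays flat
                }
            }
        }
    }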
 
Rob Spoor
Sheriff
Paul Clapham wrote:One byte at a time.

I hope that's a joke. Unless you're using a BufferedInputStream, reading one byte at a time is going to be quite slow. It's better to use one of the read methods that takes a byte[]. The size of the byte[] needs to be tweaked a bit to get optimum results, but I usually use 4096 or 8192 (4 or 8 KB).
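A minimal sketch of that pattern (the file name is hypothetical, and error handling is kept to a minimum):

    import java.io.FileInputStream;
    import java.io.IOException;
    import java.io.InputStream;

    public class BlockRead {
        public static void main(String[] args) throws IOException {
            byte[] buffer = new byte[8192]; // 8 KB, as discussed above
            try (InputStream in = new FileInputStream("huge.dat")) { // placeholder name
                int n;
                while ((n = in.read(buffer)) != -1) {
                    // process buffer[0..n-1]; note n may be less than buffer.length
                }
            }
        }
    }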
 
Paul Clapham
Sheriff
Rob Spoor wrote:
Paul Clapham wrote:One byte at a time.

I hope that's a joke.


Which is why I said "Seriously..." after it.
 
S Majumder
Ranch Hand
Paul and Rob, thanks for the replies.
Could you give some example code?

Satya
 
Gopi Chella
Ranch Hand
As Rob said, you can use a BufferedInputStream to read a huge file; if you search Google you will find plenty of examples. However, I'd suggest reading large files with Perl, which will be faster than Java.
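For the BufferedInputStream route, a minimal sketch might look like this (the file name is a placeholder):

    import java.io.BufferedInputStream;
    import java.io.FileInputStream;
    import java.io.IOException;
    import java.io.InputStream;

    public class BufferedRead {
        public static void main(String[] args) throws IOException {
            // The buffering makes even single-byte reads tolerable, since most
            // read() calls are served from the stream's internal buffer
            try (InputStream in = new BufferedInputStream(new FileInputStream("huge.dat"))) {
                int b;
                while ((b = in.read()) != -1) {
                    // process byte b here
                }
            }
        }
    }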
 
S Majumder
Ranch Hand
Hey Gopi,
Thanks, I will try that.
Satya
 
Joe Ess
Bartender
Gopi Chella wrote: However, I'd suggest reading large files with Perl, which will be faster than Java.


I wouldn't be so sure (note the benchmarks on that page do not include large-file processing, but Perl is much slower than Java for the benchmarks tested).
 
S Majumder
Ranch Hand
Hi Joe,
Could you give some examples in Java?
Thanks,
Satya
 
Jaikiran Pai
Sheriff
S Majumder wrote:
Could you give some examples in Java?


Did you search for examples? You did not find any?
 
S Majumder
Ranch Hand
Hi Jaikiran,
Not so lucky
 
Henry Wong
author
Sheriff
S Majumder wrote:Hi Jaikiran,
Not so lucky


As already mentioned, with streams there really isn't much difference between processing large and small files (API-wise, not performance-wise, obviously)... how about the Oracle tutorial on I/O?

http://docs.oracle.com/javase/tutorial/essential/io/


Lots of examples there.

Henry
 
Nakataa Kokuyo
Ranch Hand
Hey gurus,

So the best choice of class for reading a huge file is BufferedInputStream? I'm thinking of storing the content in memory to ease data processing for reporting or charting purposes; what would be the best choice?

Thanks in advance!
 
Paul Clapham
Sheriff
Nakataa Kokuyo wrote:So the best choice of class for reading a huge file is BufferedInputStream?


As already mentioned (several times now), the size of the file is irrelevant to the class you would choose to read a file.

I'm thinking of storing the content in memory to ease data processing for reporting or charting purposes; what would be the best choice?


Your best choice is to have enough memory to store that data. As for how you should read the data to get it into memory, that's a separate issue which I believe we have covered quite thoroughly by now.
 
Mohamed Sanaulla
Bartender
S Majumder wrote:Hi all,
How do I read a 1 GB file in Java?

How about reading the file asynchronously, so you don't block until the complete file is read? You can explore AsynchronousFileChannel to see how a file can be read asynchronously.
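A small sketch of the Future-based variant (the path and buffer size are just illustrative):

    import java.nio.ByteBuffer;
    import java.nio.channels.AsynchronousFileChannel;
    import java.nio.file.Paths;
    import java.nio.file.StandardOpenOption;
    import java.util.concurrent.Future;

    public class AsyncRead {
        public static void main(String[] args) throws Exception {
            try (AsynchronousFileChannel channel = AsynchronousFileChannel.open(
                    Paths.get("huge.dat"), StandardOpenOption.READ)) { // placeholder path
                ByteBuffer buffer = ByteBuffer.allocate(8192);
                Future<Integer> result = channel.read(buffer, 0); // read starts at position 0
                // ... do other useful work while the read is in flight ...
                int bytesRead = result.get(); // block only when the data is actually needed
                System.out.println("Read " + bytesRead + " bytes");
            }
        }
    }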
 
Pat Farrell
Rancher
Paul Clapham wrote:One byte at a time.

I'm pretty sure this is the punch line to the ancient joke, "How do you eat an elephant?"

Not that long ago, you would have had to eat a 1 GB file in chunks of some size, since typical computers had 16 MB or some similar amount of RAM. The last machine I built has 32 GB of RAM, so in theory you could just read a 1 GB file into memory. I don't recommend this; normally you read some suitable amount, process it, read the next, and repeat until done. But these days it is at least theoretically possible.
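If you do have the memory, the whole-file approach can be a one-liner; a sketch using Files.readAllBytes (the path is a placeholder, and you'd need a heap at least the size of the file, e.g. -Xmx2g):

    import java.nio.file.Files;
    import java.nio.file.Path;
    import java.nio.file.Paths;

    public class WholeFileRead {
        public static void main(String[] args) throws Exception {
            Path path = Paths.get("huge.dat");      // placeholder path
            byte[] all = Files.readAllBytes(path);  // the entire file lands on the heap
            System.out.println("Loaded " + all.length + " bytes");
        }
    }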
 
Rishi Shah
Ranch Hand
Rob Spoor wrote:
Paul Clapham wrote:One byte at a time.

I hope that's a joke. Unless you're using a BufferedInputStream, reading one byte at a time is going to be quite slow. It's better to use one of the read methods that takes a byte[]. The size of the byte[] needs to be tweaked a bit to get optimum results, but I usually use 4096 or 8192 (4 or 8 KB).


That's still reading in one byte at a time. You would have to use NIO to read in blocks at a time.
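For comparison, a sketch of an NIO block read using a FileChannel (the file name is a placeholder):

    import java.nio.ByteBuffer;
    import java.nio.channels.FileChannel;
    import java.nio.file.Paths;
    import java.nio.file.StandardOpenOption;

    public class ChannelRead {
        public static void main(String[] args) throws Exception {
            try (FileChannel channel = FileChannel.open(
                    Paths.get("huge.dat"), StandardOpenOption.READ)) { // placeholder path
                ByteBuffer buffer = ByteBuffer.allocate(8192);
                while (channel.read(buffer) != -1) {
                    buffer.flip();   // switch from filling to draining
                    // consume the buffer here
                    buffer.clear();  // make it ready for the next read
                }
            }
        }
    }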
 
Rob Spoor
Sheriff
If I call read with a byte[], the read may very well read a block of bytes at once, not one at a time. It depends on the InputStream implementation of course, but FileInputStream's read() and read(byte[], int, int) methods are both native, which indicates that the second probably won't be reading the file one byte at a time. No doubt a Socket's InputStream will behave the same.

Of course it's possible that reading a byte[] will simply call read() multiple times, but it doesn't have to be the case.
 
Rishi Shah
Ranch Hand
Rob Spoor wrote:If I call read with a byte[], the read may very well read a block of bytes at once, not one at a time. It depends on the InputStream implementation of course, but FileInputStream's read() and read(byte[], int, int) methods are both native, which indicates that the second probably won't be reading the file one byte at a time. No doubt a Socket's InputStream will behave the same.

Of course it's possible that reading a byte[] will simply call read() multiple times, but it doesn't have to be the case.


No, FileInputStream reads one byte at a time (which is why NIO should be used in this case). Having said that, it's intentional and not a bad thing, as this form of reading is useful and convenient in certain situations.
 
Rob Spoor
Sheriff
Can you show me the source of your statement? Because I've checked the non-native code, and nowhere can I see that read(byte[]) calls read().
 
Mike Simmons
Ranch Hand
I have the feeling different folks here mean different things by "read one byte at a time" anyway. Surely read(byte[]) is more efficient than calling read() multiple times; that's one important point. (Though this effect is greatly lessened if we insert a BufferedInputStream.) And also NIO allows some techniques that may well be more performant than the read(byte[]), and are perhaps more likely to result in actual concurrent reads of different bytes simultaneously. Though I agree with Rob that FIS could in principle do this too, and I doubt any of us is familiar enough with all existing implementations of FIS to say this never happens.
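One such NIO technique is memory-mapping the file with FileChannel.map(); a sketch, assuming the whole file fits in a single mapping (a map() region is limited to 2 GB, so a 1 GB file is fine; the file name is a placeholder):

    import java.nio.MappedByteBuffer;
    import java.nio.channels.FileChannel;
    import java.nio.file.Paths;
    import java.nio.file.StandardOpenOption;

    public class MappedRead {
        public static void main(String[] args) throws Exception {
            try (FileChannel channel = FileChannel.open(
                    Paths.get("huge.dat"), StandardOpenOption.READ)) { // placeholder path
                MappedByteBuffer map = channel.map(
                        FileChannel.MapMode.READ_ONLY, 0, channel.size());
                while (map.hasRemaining()) {
                    byte b = map.get(); // served from the OS page cache, not the Java heap
                }
            }
        }
    }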

But do these distinctions really matter, given that Paul's "one byte at a time" was essentially a joke anyway? The basic point was that, for most applications, you can read a huge file much the same way you would read a small file. Don't freak out just because it's big; just do what you would normally do. As long as what you would normally do does not include any attempt to keep the whole contents of the file in memory at once. That would be bad.
 