
writing large amounts of data to file

 
Munaf Sheikh
Greenhorn
Posts: 10
Hi,

I'm having problems writing large data strings to a file. The scenario is this: I have at least 200 data files, each one a Gabor jet of information. Each Gabor jet is a single line, a String of approximately 200000 characters. To create the training file for a Support Vector Machine, I need to read each jet in, prepend the training class (a "+1" or "-1") to the jet, and write it into the training file.

A very simple algorithm (a while loop with BufferedReaders and BufferedWriters) took over 3 hours to create the training file, which is simply not feasible.

I'd appreciate any tips at all.

Munaf
 
Joe Ess
Bartender
Posts: 9429
File channels are supposedly faster than the "regular" IO classes in a plain copy because the data doesn't have to be loaded into the VM. Furthermore, readers and writers are bad choices for large data movements because the data is decoded from bytes to characters on reading and encoded back to bytes on writing (unless, of course, you need to process the character data).
Since the only change you are making is putting a short label at the beginning of a string, open a FileOutputStream, write your label, then use the file channel as detailed in the link above to copy the source data.
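A minimal sketch of that suggestion, not code from the thread: write the label through the FileOutputStream, then let the channels copy the jet's bytes. The class name `LabeledJetWriter`, the method `appendLabeled`, the space after the label, and the use of Java 7+ features (try-with-resources, `StandardCharsets`) are all assumptions made for illustration. It also assumes each jet file already ends with its own line terminator.

```java
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.IOException;
import java.nio.channels.FileChannel;
import java.nio.charset.StandardCharsets;

// Hypothetical helper, named here only for illustration.
class LabeledJetWriter {

    /**
     * Appends one labeled jet to the training file: the prefix bytes first,
     * then the raw bytes of the source file, copied channel to channel.
     * Assumes the source file already ends with a line terminator.
     */
    static void appendLabeled(FileOutputStream out, String prefix, String sourcePath)
            throws IOException {
        // FileOutputStream is unbuffered, so the prefix reaches the file
        // before the channel transfer below starts.
        out.write(prefix.getBytes(StandardCharsets.US_ASCII));

        // The channel shares its file position with the stream.
        FileChannel target = out.getChannel();
        try (FileInputStream in = new FileInputStream(sourcePath)) {
            FileChannel source = in.getChannel();
            long position = 0;
            long size = source.size();
            // transferTo may move fewer bytes than requested, so loop.
            while (position < size) {
                position += source.transferTo(position, size - position, target);
            }
        }
    }

    public static void main(String[] args) throws IOException {
        if (args.length < 3 || args.length % 2 == 0) {
            System.err.println(
                "usage: LabeledJetWriter <training-file> (<label> <jet-file>)...");
            return;
        }
        try (FileOutputStream out = new FileOutputStream(args[0])) {
            for (int i = 1; i < args.length; i += 2) {
                // The space after the label is an assumed separator.
                appendLabeled(out, args[i] + " ", args[i + 1]);
            }
        }
    }
}
```

Something like `java LabeledJetWriter training.txt +1 jet001.txt -1 jet002.txt` would then build the file in one pass, with no character decoding of the jet data.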
 
Munaf Sheikh
Greenhorn
Posts: 10
Hi, thanks for your reply.
I tested the file channel implementation that you linked in your post, and it is phenomenally faster than before. But I've been struggling to prepend the +1 and -1 class information. Since FileChannels only accept ByteBuffer arguments, do I have to first create a byte representation of +1, copy it into a ByteBuffer, and then write it? (The code below does not add +1 to the beginning.)

I'd appreciate any help or pointers.

 
Joe Ess
Bartender
Posts: 9429
Munaf Sheikh wrote: "it works phenomenally faster than before"

That's good to hear. I've never used it, just kept it in mind should I need a quick copy.

Don't use the FileChannel to write your "+1". Use the FileOutputStream.
[ June 25, 2008: Message edited by: Joe Ess ]
 
Munaf Sheikh
Greenhorn
Posts: 10
Hi,

I wasn't able to add the class using FileOutputStream, but I worked around it by writing it to the jet (produced in Matlab) before reading it in.

Here's an experiment to show the magic of FileChannels. I'm submitting it because I might have been reading with buffers inefficiently in the first place, and I would like suggestions if possible.


The output is commented in the code.


Munaf
 
Joe Ess
Bartender
Posts: 9429
It's always good to see some code and data.
As I indicated in my first post, Readers and Writers are a bad choice for moving data because the VM's going to be applying character encoding on the data as it's loaded and written out. The middle ground of your test would be to use some sort of Stream. Channels should still be fastest.
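As a rough illustration of that middle ground, here is a sketch of a plain byte-stream copy that never decodes characters. It is not taken from the thread's test; the class name `StreamCopy` and the 64 KiB buffer size are assumptions, and the best buffer size would have to be measured.

```java
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;

// Hypothetical helper, named here only for illustration.
class StreamCopy {

    /** Copies all remaining bytes from in to out and returns the byte count. */
    static long copy(InputStream in, OutputStream out) throws IOException {
        byte[] buffer = new byte[64 * 1024]; // assumed size, tune by measuring
        long total = 0;
        int n;
        // read returns -1 at end of stream.
        while ((n = in.read(buffer)) != -1) {
            out.write(buffer, 0, n);
            total += n;
        }
        return total;
    }

    /** Copies one file to another using raw byte streams (no Reader/Writer). */
    static long copyFile(String from, String to) throws IOException {
        try (InputStream in = new FileInputStream(from);
             OutputStream out = new FileOutputStream(to)) {
            return copy(in, out);
        }
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 2) {
            System.err.println("usage: StreamCopy <from> <to>");
            return;
        }
        System.out.println(copyFile(args[0], args[1]) + " bytes copied");
    }
}
```

Because the bytes pass through a buffer inside the VM, this would be expected to sit between the Reader/Writer loop and the channel transfer in a timing test.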
The IO chapter of Java Platform Performance (bookmark that!) has some tests using the java.io classes and various buffering schemes that may interest you.
I'd be curious to know what trouble you had with writing data to the FileOutputStream. I'd think it would be as simple as:
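A minimal sketch of that idea, not the original poster's code: the label is turned into bytes once and written straight to the stream. The second method shows the ByteBuffer route asked about earlier in the thread, for comparison. The class and method names are hypothetical, and ASCII is assumed to be a suitable encoding for the label.

```java
import java.io.FileOutputStream;
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

// Hypothetical helper, named here only for illustration.
class LabelPrefix {

    /** Writes the label directly through the stream. */
    static void writeViaStream(FileOutputStream out, String label) throws IOException {
        out.write(label.getBytes(StandardCharsets.US_ASCII));
    }

    /** Same result through the channel: wrap the bytes in a ByteBuffer. */
    static void writeViaChannel(FileOutputStream out, String label) throws IOException {
        ByteBuffer buf = ByteBuffer.wrap(label.getBytes(StandardCharsets.US_ASCII));
        // A channel write may be partial, so drain the buffer in a loop.
        while (buf.hasRemaining()) {
            out.getChannel().write(buf);
        }
    }
}
```

Both methods advance the same file position, so a channel copy of the jet data can follow either one.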
 