I'm having problems writing large data strings to a file. The scenario is this: I have at least 200 data files, each one a Gabor jet. Each Gabor jet is a single line: a String of approximately 200,000 characters. To create the training file for a Support Vector Machine, I need to read each jet in, prepend the training class ("+1" or "-1") to it, and write it into the training file.
A very simple algorithm (a while loop with BufferedReaders and BufferedWriters) took over 3 hours to create the training file, which is simply not feasible.
File channels are supposedly faster than the "regular" IO classes for a plain copy because the data doesn't have to be loaded into the VM. Furthermore, readers and writers are bad choices for large data movements because the data is decoded into characters on reading and encoded again on writing (unless, of course, you need to process the character data). Since the only change you are making is putting a few characters at the beginning of a string, open a FileOutputStream, write your label, then use the file channel as detailed in the link above to copy the source data.
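A minimal sketch of that suggestion, assuming Java 7+ (try-with-resources), ASCII labels, a space between the label and the jet, and jet files that do not already end in a newline. The class name `PrependLabel` and the method `appendLabeledJet` are hypothetical. The label is written through the `FileOutputStream`, then the jet bytes are copied with `FileChannel.transferTo`; the stream and the channel returned by `getChannel()` share one file position, so the copy lands right after the label.

```java
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.IOException;
import java.nio.channels.FileChannel;
import java.nio.charset.StandardCharsets;

public class PrependLabel {

    /**
     * Appends "label jet\n" to out. The label goes through the stream;
     * the jet bytes are copied channel-to-channel and never decoded.
     * Assumes the jet file has no trailing newline of its own.
     */
    static void appendLabeledJet(FileOutputStream out, String label, String jetFile)
            throws IOException {
        // Label plus a separating space (the space is an assumption).
        out.write((label + " ").getBytes(StandardCharsets.US_ASCII));
        // The stream and its channel share one file position,
        // so the channel copy starts right after the label.
        FileChannel dst = out.getChannel();
        try (FileInputStream in = new FileInputStream(jetFile);
             FileChannel src = in.getChannel()) {
            long pos = 0;
            long size = src.size();
            // transferTo may move fewer bytes than requested, so loop.
            while (pos < size) {
                long n = src.transferTo(pos, size - pos, dst);
                if (n <= 0) {
                    break; // defensive: stop if nothing more can be copied
                }
                pos += n;
            }
        }
        out.write('\n'); // one training example per line
    }

    public static void main(String[] args) throws IOException {
        // Hypothetical usage: java PrependLabel train.txt +1 jet001.txt -1 jet002.txt ...
        try (FileOutputStream out = new FileOutputStream(args[0])) {
            for (int i = 1; i + 1 < args.length; i += 2) {
                appendLabeledJet(out, args[i], args[i + 1]);
            }
        }
    }
}
```

Only the short label is ever turned into bytes by Java code; the 200,000-character jets are moved as raw bytes.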
Hi, thanks for your reply. I tested the file channel implementation you linked to in your post, and it is phenomenally faster than before. But I've been struggling to prepend the +1 and -1 class information. Since FileChannels only accept ByteBuffer arguments, do I have to first create a byte representation of +1, copy it into a ByteBuffer and then write it? (The code below does not add +1 to the beginning.)
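A channel only writes bytes, so the label does need to be encoded and wrapped in a `ByteBuffer` first. The sketch below does everything with channel calls; the names `ChannelPrepend`, `writeFully` and `appendLabeledJet` are hypothetical, and Java 7+, ASCII labels and a space separator are assumed. It also avoids a common pitfall: `transferFrom` writes at the offset you pass and leaves the destination channel's position unchanged, so passing `0` after writing the label would overwrite it.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.charset.StandardCharsets;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

public class ChannelPrepend {

    /** FileChannel.write may write only part of a buffer, so loop until it is drained. */
    static void writeFully(FileChannel ch, ByteBuffer buf) throws IOException {
        while (buf.hasRemaining()) {
            ch.write(buf);
        }
    }

    /** Appends "label jet\n" at dst's current position using only channel calls. */
    static void appendLabeledJet(FileChannel dst, String label, Path jet) throws IOException {
        // Encode the label; wrap() returns a buffer with position 0 and
        // limit equal to the array length, i.e. ready to be written.
        byte[] prefix = (label + " ").getBytes(StandardCharsets.US_ASCII);
        writeFully(dst, ByteBuffer.wrap(prefix));

        try (FileChannel src = FileChannel.open(jet, StandardOpenOption.READ)) {
            long start = dst.position(); // just past the label
            long size = src.size();
            long done = 0;
            while (done < size) {
                // transferFrom writes at an explicit offset and does NOT move
                // dst's position; using 0 here would overwrite the label.
                long n = dst.transferFrom(src, start + done, size - done);
                if (n <= 0) {
                    break; // defensive: nothing more could be copied
                }
                done += n;
            }
            dst.position(start + done); // move past the copied jet by hand
        }
        writeFully(dst, ByteBuffer.wrap(new byte[] {'\n'}));
    }
}
```

The destination would be opened with something like `FileChannel.open(path, CREATE, WRITE, TRUNCATE_EXISTING)` and passed to `appendLabeledJet` once per jet file.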
I wasn't able to add the class using FileOutputStream, but I worked around it by writing it into the jet (produced in Matlab) before reading it in.
Here's an experiment to show the magic of FileChannels. I'm posting it because I may have been reading with buffers inefficiently in the first place, and I'd welcome any suggestions.
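A comparison of that kind could be sketched as follows; this is a hypothetical harness, not the poster's experiment, and the class and method names are made up. It copies one file once through a buffered `Reader`/`Writer` pair and once through `FileChannel.transferTo`, timing each with `System.nanoTime()`. ISO-8859-1 is assumed for the reader/writer path because it maps every byte to a char and back, so both copies come out byte-identical.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.Writer;
import java.nio.channels.FileChannel;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.nio.file.StandardOpenOption;

public class CopyTiming {

    /** Reader/Writer copy: every byte is decoded to a char and re-encoded. */
    static void copyWithReaderWriter(Path src, Path dst) throws IOException {
        try (Reader r = Files.newBufferedReader(src, StandardCharsets.ISO_8859_1);
             Writer w = Files.newBufferedWriter(dst, StandardCharsets.ISO_8859_1)) {
            char[] buf = new char[8192];
            int n;
            while ((n = r.read(buf)) != -1) {
                w.write(buf, 0, n);
            }
        }
    }

    /** Channel-to-channel copy: no character conversion at all. */
    static void copyWithChannels(Path src, Path dst) throws IOException {
        try (FileChannel in = FileChannel.open(src, StandardOpenOption.READ);
             FileChannel out = FileChannel.open(dst, StandardOpenOption.CREATE,
                     StandardOpenOption.WRITE, StandardOpenOption.TRUNCATE_EXISTING)) {
            long pos = 0;
            long size = in.size();
            while (pos < size) {
                long n = in.transferTo(pos, size - pos, out);
                if (n <= 0) {
                    break;
                }
                pos += n;
            }
        }
    }

    public static void main(String[] args) throws IOException {
        Path src = Paths.get(args[0]); // hypothetical: path to one jet file
        Path dir = Files.createTempDirectory("copytiming");
        long t0 = System.nanoTime();
        copyWithReaderWriter(src, dir.resolve("rw.txt"));
        long t1 = System.nanoTime();
        copyWithChannels(src, dir.resolve("ch.txt"));
        long t2 = System.nanoTime();
        System.out.printf("Reader/Writer: %d ms%n", (t1 - t0) / 1_000_000);
        System.out.printf("FileChannel:   %d ms%n", (t2 - t1) / 1_000_000);
    }
}
```

A single run like this is only a rough indication: JIT warm-up and the OS file cache (the second copy reads a file the first copy just pulled into memory) can skew the numbers, so swapping the order and repeating the runs is worthwhile.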
It's always good to see some code and data. As I indicated in my first post, Readers and Writers are a bad choice for moving data because the VM decodes the data into characters as it is read and encodes it again as it is written out. The middle ground in your test would be some sort of Stream. Channels should still be fastest. The IO chapter of Java Platform Performance (bookmark that!) has some tests using the java.io classes and various buffering schemes that may interest you. I'd be curious to know what trouble you had writing data to the FileOutputStream. I'd think it would be as simple as:
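A minimal stream-only sketch along those lines, the "middle ground" between readers/writers and channels. The names `StreamPrepend` and `appendLabeledJet` are hypothetical, and the 64 KiB buffer and the space after the label are arbitrary choices. Everything stays as bytes, so no character decoding or encoding happens.

```java
import java.io.BufferedOutputStream;
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.nio.charset.StandardCharsets;

public class StreamPrepend {

    /** Writes "label jet\n" to out with plain byte streams (no Reader/Writer). */
    static void appendLabeledJet(OutputStream out, String label, String jetFile)
            throws IOException {
        // Only the short label is converted from String to bytes.
        out.write((label + " ").getBytes(StandardCharsets.US_ASCII));
        byte[] buf = new byte[64 * 1024]; // buffer size is an arbitrary choice
        try (InputStream in = new FileInputStream(jetFile)) {
            int n;
            while ((n = in.read(buf)) != -1) {
                out.write(buf, 0, n); // copy the jet bytes unchanged
            }
        }
        out.write('\n'); // assumes the jet file has no trailing newline
    }

    public static void main(String[] args) throws IOException {
        // Hypothetical usage: java StreamPrepend train.txt +1 jet001.txt -1 jet002.txt ...
        try (OutputStream out = new BufferedOutputStream(new FileOutputStream(args[0]))) {
            for (int i = 1; i + 1 < args.length; i += 2) {
                appendLabeledJet(out, args[i], args[i + 1]);
            }
        }
    }
}
```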