writing large amounts of data to file

 
Greenhorn
Posts: 10
Hi,

I'm having problems writing large data strings to a file. The scenario is this: I have (at least) 200 data files, each one a Gabor jet of information. Each Gabor jet is simply one line, a String of approximately 200000 characters. To create the training file for a Support Vector Machine, I need to read each jet in, prepend the training class (a "+1" or "-1") to the jet, and write it into the training file.

A very simple algorithm (a while loop with BufferedReaders and BufferedWriters) took over 3 hours to create the training file, which is simply not feasible.

I'd appreciate any tips at all.

Munaf
 
Bartender
Posts: 9626
File channels are supposedly faster than the "regular" IO classes for a plain copy because the data doesn't have to be loaded into the VM. Furthermore, Readers and Writers are bad choices for large data movements because the data is decoded into characters on reading and encoded again on writing (unless, of course, you need to process the character data).
Since the only change you are making is putting a short label at the beginning of a string, open a FileOutputStream, write your label, then use the file channel as detailed in the link above to copy the source data.
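
A minimal sketch of that suggestion, not code from the linked article: the label goes through the FileOutputStream and the bulk of the jet goes through FileChannel.transferTo. The class name, the appendJet helper, the space between label and jet, and the sample jet line are all assumptions made for illustration.

```java
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.IOException;
import java.nio.channels.FileChannel;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

class LabelledJetCopy {

    // Hypothetical helper: appends "<label> <contents of jet file>" to an open training file.
    static void appendJet(FileOutputStream out, String label, Path jet) throws IOException {
        // The label is only a few bytes, so write it straight to the stream.
        // The trailing space is an assumed separator.
        out.write((label + " ").getBytes(StandardCharsets.US_ASCII));

        // Hand the bulk copy to the channels; the jet is never turned into a String.
        try (FileInputStream in = new FileInputStream(jet.toFile())) {
            FileChannel source = in.getChannel();
            FileChannel target = out.getChannel();
            long size = source.size();
            long position = 0;
            // transferTo may move fewer bytes than requested, so loop until done.
            while (position < size) {
                position += source.transferTo(position, size - position, target);
            }
        }
    }

    public static void main(String[] args) throws IOException {
        // Made-up stand-in for a real jet file (one line, newline-terminated).
        Path jet = Files.createTempFile("jet", ".txt");
        Path training = Files.createTempFile("training", ".txt");
        Files.write(jet, "1:0.25 2:0.75\n".getBytes(StandardCharsets.US_ASCII));

        try (FileOutputStream out = new FileOutputStream(training.toFile())) {
            appendJet(out, "+1", jet);
            appendJet(out, "-1", jet);
        }
        System.out.print(new String(Files.readAllBytes(training), StandardCharsets.US_ASCII));
    }
}
```

This relies on the stream and its channel sharing one file position, so bytes written through either end up in the order the calls were made. If the jet files do not end with a newline, one would have to be written after each transfer.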
 
Munaf Sheikh
Greenhorn
Posts: 10
Hi, thanks for your reply.
I tested the file channel implementation that you linked in your post and it works phenomenally faster than before. But I've been struggling to prepend the +1 and -1 class information. Since FileChannels only accept ByteBuffer arguments, do I have to first create a byte representation of +1, copy it into a ByteBuffer, and then write it? (The code below does not add +1 to the beginning.)

I'd appreciate any help or pointers.

 
Joe Ess
Bartender
Posts: 9626

"it works phenomenally faster than before"


That's good to hear. I've never used it, just kept it in mind should I need a quick copy.

Don't use the FileChannel to write your "+1". Use the FileOutputStream.
[ June 25, 2008: Message edited by: Joe Ess ]
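
For comparison only, here is roughly what the ByteBuffer route asked about would look like if the label were written through the channel after all. This is a sketch under assumptions: the class and method names are made up, the space after the label is an assumed separator, and FileChannel.open needs Java 7 or later.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

class ChannelLabelWrite {

    // Hypothetical helper: writes the label through the channel by wrapping its bytes.
    static void writeLabel(FileChannel target, String label) throws IOException {
        ByteBuffer buffer = ByteBuffer.wrap((label + " ").getBytes(StandardCharsets.US_ASCII));
        // A single write call is not guaranteed to drain the buffer, so loop.
        while (buffer.hasRemaining()) {
            target.write(buffer);
        }
    }

    public static void main(String[] args) throws IOException {
        Path training = Files.createTempFile("training", ".txt");
        try (FileChannel target = FileChannel.open(training, StandardOpenOption.WRITE)) {
            writeLabel(target, "+1");
        }
        System.out.println(Files.readAllBytes(training).length + " bytes written");
    }
}
```

So yes, the channel needs the bytes wrapped first. A plain write on the FileOutputStream takes the byte array directly, which is why it is the shorter option for a label this small.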
 
Munaf Sheikh
Greenhorn
Posts: 10
Hi,

I wasn't able to add the class using FileOutputStream... but I worked around it by writing it into the jet (produced in MATLAB) before reading it in.

Here's an experiment to show the magic of FileChannels. I'm submitting it because I might have been reading with buffers inefficiently in the first place, and I'd like suggestions if possible.


The output is commented in the code.


Munaf
 
Joe Ess
Bartender
Posts: 9626
It's always good to see some code and data.
As I indicated in my first post, Readers and Writers are a bad choice for moving data because the VM's going to be applying character encoding on the data as it's loaded and written out. The middle ground of your test would be to use some sort of Stream. Channels should still be fastest.
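
A rough sketch of that stream-based middle ground, assuming plain FileInputStream and FileOutputStream with a hand-rolled byte array as the buffer. The class name, the copy helper, and the 64 KB chunk size are arbitrary choices for illustration, not taken from the test posted above.

```java
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

class StreamCopy {

    // Hypothetical helper: copies raw bytes, so no character decoding or encoding happens.
    // Returns the number of bytes copied.
    static long copy(Path source, Path target) throws IOException {
        long total = 0;
        try (FileInputStream in = new FileInputStream(source.toFile());
             FileOutputStream out = new FileOutputStream(target.toFile())) {
            byte[] chunk = new byte[64 * 1024]; // chunk size is an arbitrary choice
            int read;
            while ((read = in.read(chunk)) != -1) {
                out.write(chunk, 0, read);
                total += read;
            }
        }
        return total;
    }

    public static void main(String[] args) throws IOException {
        // Made-up sample data standing in for a jet file.
        Path source = Files.createTempFile("jet", ".txt");
        Path target = Files.createTempFile("copy", ".txt");
        Files.write(source, "+1 1:0.25 2:0.75\n".getBytes(StandardCharsets.US_ASCII));

        System.out.println(copy(source, target) + " bytes copied");
    }
}
```

Timing this against the Reader/Writer and FileChannel versions on the same jet files would show where streams actually land in between.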
The IO Chapter of Java Platform Performance (bookmark that!) has some tests using the java.io classes and various buffering schemes that may interest you.
I'd be curious to know what trouble you had with writing data to the FileOutputStream. I'd think it would be as simple as: