
Accessing a File

 
alan partridge
Greenhorn
Posts: 4
I am using a dictionary file consisting of thousands of lines, each containing one word. In my program I want to randomly access just one of these words. I am hoping someone could suggest the most efficient solution, because at the moment I use BufferedReader.readLine() to read the whole lot into an ArrayList, randomly select from that, and then ditch the object.
 
Jim Yingst
Wanderer
Sheriff
Posts: 18671
Hmmmm... I assume that these lines are variable-length, and that you don't know in advance how many lines there are. This would seem to make it necessary to read the whole file through once in order to get a line count at least - and either store some stuff in memory as you go, or be willing to re-read from the beginning to count up to a certain randomly-generated line number. Neither seems particularly fast or elegant.
What about this: use File.length() to get the number of bytes in the file. Generate a random number in this range. Open a FileInputStream and use skip() to get to the desired offset. Open a BufferedReader wrapping an InputStreamReader wrapping the FileInputStream, and read a line twice. The first line read is just to move to a line boundary, since the randomly-generated offset has most likely put you in the middle of a line. The second line read will be a normal line. If either readLine() comes back null, then go back to the beginning of the file instead. Close all streams when you're done.
This is about the fastest, lowest-memory-overhead method I can think of for this. The only problem is that the probability of selecting a given word is approximately proportional to the length of the word on the line before it, since that is the line the random offset has to land in. If that's not acceptable, you'll have to try something else. How often will this procedure be performed on a given file? If it's more than once, it's probably worthwhile to store some info about the file, to facilitate subsequent accesses. Line count is of course highly useful - also, perhaps a sort of limited index which stores the offsets of every 10th word, so that to find line 738 you look up the position of line 730, then read 8 lines forward (as opposed to reading all 738 lines). Naturally "every 10th word" could be any other number - you would probably want to make that configurable, to optimize it later.
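To make the "limited index" idea a bit more concrete, here is a rough sketch of one way it might look. Everything in it is hypothetical: the class name SparseLineIndex, its methods, and the step parameter are just names picked for illustration. It also assumes lines end in \n (or \r\n) and that the file is UTF-8. It reads the file once to count lines and record the byte offset of every step-th line, then uses skip() plus a few readLine() calls to fetch any line, so each word is picked with equal probability.

```java
import java.io.BufferedInputStream;
import java.io.BufferedReader;
import java.io.File;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import java.util.Random;

// Hypothetical helper: a sparse index over the lines of a text file.
class SparseLineIndex {
    private final File file;
    private final int step;                              // index every step-th line
    private final List<Long> offsets = new ArrayList<>(); // offsets.get(k) = byte offset of line k*step
    private int lineCount;

    SparseLineIndex(File file, int step) throws IOException {
        if (step <= 0) throw new IllegalArgumentException("step must be positive");
        this.file = file;
        this.step = step;
        // Single pass over the raw bytes; a line starts at offset 0 or right after '\n'.
        try (InputStream in = new BufferedInputStream(new FileInputStream(file))) {
            long pos = 0;
            boolean atLineStart = true;
            int b;
            while ((b = in.read()) != -1) {
                if (atLineStart) {
                    if (lineCount % step == 0) offsets.add(pos);
                    lineCount++;
                    atLineStart = false;
                }
                pos++;
                if (b == '\n') atLineStart = true;
            }
        }
    }

    int lineCount() {
        return lineCount;
    }

    // Returns line n (0-based): jump to the nearest indexed line, then read forward.
    String readLine(int n) throws IOException {
        if (n < 0 || n >= lineCount) throw new IndexOutOfBoundsException("line " + n);
        try (FileInputStream fis = new FileInputStream(file)) {
            long toSkip = offsets.get(n / step);
            while (toSkip > 0) {                         // skip() may skip fewer bytes than asked
                long skipped = fis.skip(toSkip);
                if (skipped <= 0) throw new IOException("could not skip to offset");
                toSkip -= skipped;
            }
            BufferedReader br = new BufferedReader(
                    new InputStreamReader(fis, StandardCharsets.UTF_8));
            for (int i = 0; i < n % step; i++) {
                br.readLine();                           // discard lines before the target
            }
            return br.readLine();
        }
    }

    // Picks a line uniformly at random; throws if the file has no lines.
    String randomLine(Random rnd) throws IOException {
        return readLine(rnd.nextInt(lineCount));
    }
}
```

You would build the index once, e.g. new SparseLineIndex(new File("words.txt"), 10), and then call randomLine(new Random()) as often as needed. A smaller step means more offsets held in memory but fewer lines read per lookup.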
 
alan partridge
Greenhorn
Posts: 4
Thank you Jim,
I used the first of your ideas and am happy with the way it works:

private String getRandomWord() {
    // ranGen is assumed to be a java.util.Random field of the enclosing class
    File file = new File("words.txt");
    String str = null;

    if (!file.exists()) {
        System.out.println("The words.txt file does not exist;\n"
                + "it must be in the same directory as this program");
        System.exit(1);
    }

    try {
        long randomRange = file.length();
        System.out.println(randomRange);   // debug: file size in bytes

        do {
            // Random byte offset in [0, randomRange)
            long ran = Math.abs(ranGen.nextLong() % randomRange);
            FileInputStream fis = new FileInputStream(file);
            BufferedReader br = new BufferedReader(new InputStreamReader(fis));
            try {
                fis.skip(ran);        // jump before the reader buffers anything
                br.readLine();        // discard the partial line we landed in
                str = br.readLine();  // null if we landed in the last line, so retry
            } finally {
                br.close();           // also closes the wrapped FileInputStream
            }
        } while (str == null);
    } catch (IOException io) {
        io.printStackTrace();
    }
    return str;
}
alan
[ February 14, 2002: Message edited by: alan partridge ]
 