Win a copy of Programmer's Guide to Java SE 8 Oracle Certified Associate (OCA) this week in the OCAJP forum!
  • Post Reply
  • Bookmark Topic Watch Topic
  • New Topic

Fortune: Best way to access them?

 
Antonio Rodriguez
Greenhorn
Posts: 3
  • Mark post as helpful
  • send pies
  • Quote
  • Report post to moderator
I'm learning some advanced Java by trying to recreate a fortune-like program. I plan to have a GUI interface (somewhat like the freeware SmartBee). My main dilemma is lanning how to access a fortune quickly from a set of files. One solution I've encountered is creating an index file that rides alongside the fortune file, but I was hoping that there would be a more elegant solution that doesn't involve that.

The fortunes are just text files, with "%" and a carriage return used as separators. The idea is to choose randomly from all the fortunes contained in that file, or potentially choose a random fortune from a list of files. The main dilemma is that the number of fortunes is very large. One file I'm using has more than 110,000 different fortunes, so reading them all into some sort of collection isn't feasible.

Any help is deeply appreciated. A pointer to where I can find a good solution would be excellent.
[ June 24, 2004: Message edited by: Antonio Rodriguez ]
 
David Moose
Greenhorn
Posts: 4
  • Mark post as helpful
  • send pies
  • Quote
  • Report post to moderator
I suggest download some free java database, load all the data into that database and use jdbc to select a fortune from that.
 
Tim West
Ranch Hand
Posts: 539
  • Mark post as helpful
  • send pies
  • Quote
  • Report post to moderator
Or, grab the source of the fortune that comes on most Linux distros and see how that does it. Although it's probably in C, the general algorithm may well be translatable.



--Tim
 
Darin Niard
Ranch Hand
Posts: 118
  • Mark post as helpful
  • send pies
  • Quote
  • Report post to moderator
I don't see why there would be a problem with using a collection. 110,000 strings isn't that much. If you are worried about lookup speed, use a Hashtable/Map.
 
Ernest Friedman-Hill
author and iconoclast
Marshal
Pie
Posts: 24212
35
Chrome Eclipse IDE Mac OS X
  • Mark post as helpful
  • send pies
  • Quote
  • Report post to moderator
Originally posted by Darin Niard:
110,000 strings isn't that much.


At a 100 characters or so each, two bytes per character, plus say a dozen bytes of overhead per object, that's about 23 megabytes. Funny how the definition of "not that much" has changed!
 
Darin Niard
Ranch Hand
Posts: 118
  • Mark post as helpful
  • send pies
  • Quote
  • Report post to moderator
:roll: Then what would you do?
 
Ernest Friedman-Hill
author and iconoclast
Marshal
Pie
Posts: 24212
35
Chrome Eclipse IDE Mac OS X
  • Mark post as helpful
  • send pies
  • Quote
  • Report post to moderator
Well, the classic "fortune" program did it this way:

* Read in fortune #1, store in variable F.
* Read in fortune #2. With probabillity 1/2, assign it to F.
* Read in #3, and with probability 1/3, assign it to F.
* In general, for fortune #N, assign it to F with probability 1/N.

If you add up the fractions you'll see that each fortune has an equal chance of being selected.

This runs in time proportional to the number of fortunes, but constant space -- you never have more than two fortunes in memory. Actually, if you compute the probability first, you can null out F before reading in the new fortune, so actually you never need more than one. This was great for early UNIX back when memory was really scarce.

The more modern GNU fortune program uses the precompiled "index" files that Antonio originally postulated; this runs in both constant time and space, clearly preferable!
 
Ko Ko Naing
Ranch Hand
Posts: 3178
  • Mark post as helpful
  • send pies
  • Quote
  • Report post to moderator
Originally posted by Ernest Friedman-Hill:


At a 100 characters or so each, two bytes per character, plus say a dozen bytes of overhead per object, that's about 23 megabytes. Funny how the definition of "not that much" has changed!


Nice calculation!!! Mr.Ernest is correct... How come should we put such big chunck of characters in a string object like this? I have met such kind of issue in the past before. For more info about the maximum byte that a string can contain, this thread made an interesting point for us here....
 
Darin Niard
Ranch Hand
Posts: 118
  • Mark post as helpful
  • send pies
  • Quote
  • Report post to moderator
Interesting, but... if the first fortune has a 50% probability then why would it show up any less than 1/2 the time? Or, am I not understanding you correctly? As for the index files, even if it picks the 5000th entry, does it not have to run through the first 4999 in order to find it? This is something about file i/o that's never been clear to me...
 
Ernest Friedman-Hill
author and iconoclast
Marshal
Pie
Posts: 24212
35
Chrome Eclipse IDE Mac OS X
  • Mark post as helpful
  • send pies
  • Quote
  • Report post to moderator
Originally posted by Darin Niard:
Interesting, but... if the first fortune has a 50% probability then why would it show up any less than 1/2 the time? Or, am I not understanding you correctly?


You always read the whole file, and you perform this procedure for every fortune. The first fortune thus has up to N-1 chances to be replaced -- if it's not replaced by the second one, it may be replaced by the third one, or fourth one, and so on. And once the first one is replaced, the replacement can itself be replaced, as can that replacement. So the probability of the first fortune being selected is 1/2 - 1/3 - 1/4 - 1/5 - 1/6 ... - 1/N, which is equal to, as it turns out, 1/N.


As for the index files, even if it picks the 5000th entry, does it not have to run through the first 4999 in order to find it? This is something about file i/o that's never been clear to me...


No, you don't have to. The index file would have the file offsets of beginning of each fortune. You could use java.io.RandomAccessFile, seek() to the beginning of the chosen fortune (a more or less constant-time operation) and then read the correct number of bytes (computed by subtracting two successive start indices.)
 
Darin Niard
Ranch Hand
Posts: 118
  • Mark post as helpful
  • send pies
  • Quote
  • Report post to moderator
Originally posted by Ernest Friedman-Hill:

You always read the whole file, and you perform this procedure for every fortune. The first fortune thus has up to N-1 chances to be replaced -- if it's not replaced by the second one, it may be replaced by the third one, or fourth one, and so on. And once the first one is replaced, the replacement can itself be replaced, as can that replacement. So the probability of the first fortune being selected is 1/2 - 1/3 - 1/4 - 1/5 - 1/6 ... - 1/N, which is equal to, as it turns out, 1/N.

Clever, but I think the way I've always picked a random line from a file can be slightly more efficient (n/2 average instead of n). Basically, I'd just count up to a random number between 0 and the fileLength and stop on that entry. That's what I thought of originally when he asked this question, but since it was running in a GUI, I figured he would be reusing the fortune list a lot, so speed seemed most important to him. The indexing way sounds like the best method though
[ June 24, 2004: Message edited by: Darin Niard ]
 
Ernest Friedman-Hill
author and iconoclast
Marshal
Pie
Posts: 24212
35
Chrome Eclipse IDE Mac OS X
  • Mark post as helpful
  • send pies
  • Quote
  • Report post to moderator
Originally posted by Darin Niard:

Clever, but I think the way I've always picked a random line from a file can be slightly more efficient. Basically, I'd just count up to a random number between 0 and the fileLength and stop on that entry.


This only works if you know how many fortunes are in the file. The neat thing about the original fortune algorithm is that it doesn't require this knowledge -- it works in a single pass over never-before-seen data.
 
Darin Niard
Ranch Hand
Posts: 118
  • Mark post as helpful
  • send pies
  • Quote
  • Report post to moderator
Yeah, I know. Thanks for letting me in on that algorithm. Good stuff.
 
Antonio Rodriguez
Greenhorn
Posts: 3
  • Mark post as helpful
  • send pies
  • Quote
  • Report post to moderator
Originally posted by Ernest Friedman-Hill:
[QB]

You always read the whole file, and you perform this procedure for every fortune. The first fortune thus has up to N-1 chances to be replaced -- if it's not replaced by the second one, it may be replaced by the third one, or fourth one, and so on. And once the first one is replaced, the replacement can itself be replaced, as can that replacement. So the probability of the first fortune being selected is 1/2 - 1/3 - 1/4 - 1/5 - 1/6 ... - 1/N, which is equal to, as it turns out, 1/N.QB]


I get it. The idea is that for the first fortune to survive, it would have to be outside of the probabilities that we calculate. Consider the case in which we have 5 fortunes.

1 always gets placed in F.
2 will be placed in F 1/2 of the time. Therefore, 1 survives 1/2 of the time.
3 will be placed in F 1/3 of the time. Therefore, if 1 survived the last replacement, it would survive this replacement 2/3 of the time.
4 will be placed in F 1/4 of the time. Therefore, if 1 survived the previous replacements, it would survive this replacement 3/4 of the time.
5 will be placed in F 1/5 of the time, Therefore, if 1 survived the previous replacements, it would survive 4/5 of the time.

So the probability of 1 surviving is 1/2 * 2/3 * 3/4 * 4/5 = 1/5

Very nice programming. Kudos to the original fortune programmers, and thanks to those who responded so quickly to my question. I apologize for the late reply, but I have really sporadic Internet access.

I probably will go towards the index methodology, to get the constant space/time factor, but this was a great insight. Thank you, Mr. F-H!
 
It is sorta covered in the JavaRanch Style Guide.
  • Post Reply
  • Bookmark Topic Watch Topic
  • New Topic