What is the best algorithm to search a txt file and find the shortest path? I don't want it to depend too much on the system its running on, ive read some threads that people run out of memory. So any ideas on which tree, graph, hashtables etc to use? and is there anything I should take into consideration? Thanks hope someone can shed some light on this issue.
I understand the Six Degrees of Kevin Bacon reference, but I think we need more information about this text file and exactly what you hope to do with it.
"We're kind of on the level of crossword puzzle writers... And no one ever goes to them and gives them an award." ~Joe Strummer sscce.org
posted 11 years ago
I have a txt file which is 54mb and each line in the text file is a movie with the list of actors in that film. What I want to be able to do is have the user enter "Cruise, Tom" - and then find all the movies Tom Cruise has been in and have all the actors in that films be nodes, and each actor then has a list of movies they've stared in, and this keeps going until an actor has appeared alongside Kevin Bacon, the shortest number of connection or nodes is the shortest path.
sorry for the confusion [ May 30, 2007: Message edited by: John Smith ]
It seems like that even if the whole text file would fit into memory (especially what with virtual memory), that your app would start up a whole lot quicker if you imported all of the data into a JDBC database with two database tables: Movies and Actors. This is a many-to-many relationship so you would have a third database table to link the database tables Movies and Actors together. This third database table could be called ActorsInMovies.
You would only have to read the text file the first time and once your database is built, your app could start up much more quickly.
I believe that the algorithm would be recursive and a breadth-first search.
Well, I guess it depends on how much of a beginner you are. This could be a bit complex. Basically, I would take two HashMaps and build them up gradually with multiple passes. The first Map would represent the people connected to actor A, and the second would represent people connected to actor B. First pass, add all actors who shared a movie with A or B. Second pass, add all actors who shared a movie with actors already in the maps. (Assuming the new actor isn't already in the maps with a shorter connection.) Stop when you detect the same actor is linked in both maps - that's the shortest connection.
To go into a little more detail on the maps: the key will be a String representing an actor. The value will be a custom class representing the shortest link to the target actor:
Add getters, setters, and constructors as desired.
So if I'm trying to link actor A = Kevin Bacon and actor B = Halle Berry, the first pass I might put in
mapA.put("Tom Hulce", new Link("Kevin Bacon", "Animal House", 1));
indicating that Tom Hulce is just one link away from Kevin. Another link on the same pass might be
mapB.put("Patrick Stewart", new Link("Halle Berry", "X-Men", 1));
Of course there are many more links to put in both maps. I'm just showing a few particular examples.
The next pass, I might put in
mapA.put("F. Murray Abraham", new Link("Tom Hulce", "Amadeus", 2));
mapB.put("F. Murray Abraham", new Link("Patrick Stewart", "Star Trek: Insurrection", 2));
At this point I could notice that F. Murray Abraham is in both maps. So I've found a connection from Kevin Bacon to Halle Berry in four steps: Kevin -> Tom -> F. Murray -> Patrick Stewart -> Halle. Of course in reality there's probably a shorter connection which I would have found first, but let's pretend there isn't. I'm just making up results as I go.
There are quite a few details I've skipped over as to how to do this, but that's the rough outline. Can you make use of that?