As an intro: I am working on a project for a 2nd-year data structures class, and we are not permitted to use any libraries other than the Java API.
For this part of my project, I am creating a word frequency tree for, basically, my school's whole domain, in order to build a search engine for it. I wrote a class that spiders through the HTML looking for hrefs and generates a list of all sites reachable from a seed site (the home page). It then creates a binary search tree of objects, each composed of a word from the site and how frequently that word appears. A separate class, URLContent, holds the URL as a String together with the word frequency tree that goes with it.
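
To make the structure concrete, here is a minimal sketch of a word frequency tree paired with a URL. Only URLContent is a name from my project; WordFrequencyTree, add, and frequency are placeholder names chosen for illustration, and the sketch assumes plain String.compareTo ordering with no balancing.

```java
// Hypothetical sketch: an unbalanced BST keyed by word, each node storing a count.
class WordFrequencyTree {
    private static final class Node {
        final String word;
        int count = 1;          // a node is created on the word's first occurrence
        Node left, right;
        Node(String word) { this.word = word; }
    }

    private Node root;

    // Insert a word, or increment its count if it is already in the tree.
    void add(String word) {
        root = add(root, word);
    }

    private Node add(Node node, String word) {
        if (node == null) return new Node(word);
        int cmp = word.compareTo(node.word);
        if (cmp < 0) node.left = add(node.left, word);
        else if (cmp > 0) node.right = add(node.right, word);
        else node.count++;
        return node;
    }

    // Returns how many times the word was added, or 0 if it is absent.
    int frequency(String word) {
        Node node = root;
        while (node != null) {
            int cmp = word.compareTo(node.word);
            if (cmp == 0) return node.count;
            node = cmp < 0 ? node.left : node.right;
        }
        return 0;
    }
}

// Pairs a URL with the word frequency tree built from that page.
class URLContent {
    final String url;
    final WordFrequencyTree words = new WordFrequencyTree();
    URLContent(String url) { this.url = url; }
}
```

With this shape, looking up a search word on a page is a single call such as `page.words.frequency("heap")`, which returns 0 when the word never appeared.
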
Anyway, we're required to use a min-heap of URLContent objects (generated after a search for one or more keywords) in order to return the most relevant sites. However, I cannot, for the life of me, think of a good solution for the URLContent key. Essentially, the more relevant the site is to the search, the lower the key should be.
My brute-force idea is to bake a class-level integer variable into the URLContent class, initialize it to some number (say 100), and then subtract how often each of the search words appears. However, this does not lend itself well to caching (the next part of my project).
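
A rough sketch of that brute-force key, under several assumptions: a Map stands in for the word frequency tree, java.util.PriorityQueue stands in for the min-heap we have to write ourselves, and the names RelevanceKeyDemo, KeyedPage, keyFor, and BASE_KEY, as well as the page labels and counts, are all made up for illustration.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.PriorityQueue;

class RelevanceKeyDemo {
    static final int BASE_KEY = 100; // the "initialized number" from the idea above

    // A page label paired with the key computed for one particular search.
    static final class KeyedPage implements Comparable<KeyedPage> {
        final String url;
        final int key;
        KeyedPage(String url, int key) { this.url = url; this.key = key; }

        // Lower key sorts first, so a min-heap hands back the most relevant page.
        @Override
        public int compareTo(KeyedPage other) {
            return Integer.compare(key, other.key);
        }
    }

    // Brute-force key: start at BASE_KEY and subtract each search word's count.
    static int keyFor(Map<String, Integer> wordCounts, String[] searchWords) {
        int key = BASE_KEY;
        for (String word : searchWords) {
            key -= wordCounts.getOrDefault(word, 0);
        }
        return key;
    }

    public static void main(String[] args) {
        // Made-up word counts for two hypothetical pages.
        Map<String, Integer> home = new HashMap<>();
        home.put("admissions", 2);
        Map<String, Integer> apply = new HashMap<>();
        apply.put("admissions", 7);
        apply.put("deadline", 3);

        String[] query = {"admissions", "deadline"};
        PriorityQueue<KeyedPage> heap = new PriorityQueue<>();
        heap.add(new KeyedPage("home", keyFor(home, query)));
        heap.add(new KeyedPage("apply", keyFor(apply, query)));

        // Polls in ascending key order: "90 apply", then "98 home".
        while (!heap.isEmpty()) {
            KeyedPage page = heap.poll();
            System.out.println(page.key + " " + page.url);
        }
    }
}
```

Note that the sketch holds the key in a per-search wrapper instead of a field on URLContent; that is a choice made for the example, not something the assignment specifies. The key can also go negative once the total count exceeds BASE_KEY, which still orders correctly in a min-heap.
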
1st question: Can anyone think of a good reason to use a MinHeapPriorityQueue over a MaxHeapPriorityQueue here?
2nd question: Any supplemental ideas for key generation?
Thanks!