This week's book giveaway is in the Kotlin forum.We're giving away four copies of Kotlin in Action and have Dmitry Jemerov & Svetlana Isakova on-line!See this thread for details.
Win a copy of Kotlin in Action this week in the Kotlin forum!
programming forums Java Java JSRs Mobile Certification Databases Caching Books Engineering Languages Frameworks Products This Site Careers Other all forums
this forum made possible by our volunteer staff, including ...
Marshals:
Sheriffs:
Saloon Keepers:
Bartenders:

# Search a document

Sid Scud
Ranch Hand
Posts: 32
Hi All,
the problem i am facing is that, i have to search a word/text document for certain keywords. For every keyword, I am launching a new thread to search in the docuemnt. But the time consumed for a word docuemnt of size 103 KB is around 5 or 6 sec.
-sid

Peter den Haan
author
Ranch Hand
Posts: 3252
The easiest thing to do is probably to check for inefficiencies in your implementation. For a start, since you're unlikely to be I/O bound, spawning threads will not help you. In fact, if the number of keywords to search for is large, it will slow you down. Also, examine carefully the number of intermediate objects (esp Strings) you create - your execution time could easily be dominated by the garbage collector.
Failing that, try this. For large amounts of text or large number of keywords, your algorithm is flawed. You are doing a simple linear search for each keyword. If you have N words in your text and M keywords, the time taken will be O(N*M). You can do better than that.
The idea is this. Put the keywords in a hash table. Creating the table takes O(M) time, while searching for a keyword in the hash table takes approximately a constant (O(1)) time. Then go through your text, chop it up in words, and search each word in the keyword hashtable. The total time spent is O(M) + O(N)*O(1) = O(M) + O(N), giving you far better scalability.
Beware though - it is probably best not to use a standard Hashtable or HashMap, as they would force you to create an intermediate String for every word in your text. With your own hash table, you don't have to physically extract each word from the document, but can simply determine its hash code instead. (This will probably read like gibberish unless you're familiar with hashing... sorry).
- Peter

[This message has been edited by Peter den Haan (edited May 20, 2001).]

 Consider Paul's rocket mass heater.