
searching keywords in webpage

 
Arjun Shastry
Ranch Hand
Posts: 1906
What is the best way to search for some keywords in a web page? In a database, I have a list of web pages. For each web page, I want to check whether certain keywords are present. I am storing the web page data as a String. Is the String.indexOf() method a good choice? The size of the page may vary.
 
Balaji Vankadaru
Ranch Hand
Posts: 48
Maintain a collection of the words you want to search for. Once you have a collection, you can check whether a particular word is present by using its contains method, which returns true if the word exists in the collection and false if it does not.

Sizeof would return the size of the string, which does not serve your purpose of searching.
 
Campbell Ritchie
Marshal
Posts: 56518
Welcome to the Ranch
 
Campbell Ritchie
Marshal
Posts: 56518
I am not convinced Sizeof exists in Java; it looks like a C keyword. Do you mean indexOf? You should use the String#contains method rather than indexOf if you only want to check for the existence of a substring.
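
To make the contains versus indexOf point concrete, here is a minimal sketch. The class name SubstringCheck, its method names, and the sample page text are made up for illustration; only String#contains and String#indexOf are real API.

```java
class SubstringCheck {
    // True if the page text contains the keyword as a substring.
    static boolean hasKeyword(String page, String keyword) {
        return page.contains(keyword);
    }

    // The same check written with indexOf, which returns -1 when the keyword is absent.
    static boolean hasKeywordViaIndexOf(String page, String keyword) {
        return page.indexOf(keyword) >= 0;
    }

    public static void main(String[] args) {
        // Hypothetical page text, assumed to be held as one String.
        String page = "This guide will shorten your build times.";
        System.out.println(hasKeyword(page, "build"));
        System.out.println(hasKeywordViaIndexOf(page, "deploy"));
    }
}
```

Both methods give the same answer; contains simply states the intent more directly when the position of the match is not needed.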

For a linear search, you would have to iterate over the text of the web page once for every keyword. Also, what will happen for the keyword short if the text includes shorten? A substring search would report a match. I think I shall work from Balaji Vankadaru's suggestion.
Put your keywords into a set. Split the text into a String[], maybe splitting on whitespace. Iterate over the split array and see whether the set contains each word.
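
The set-and-split approach could be sketched roughly as follows. The class name KeywordSearch, the method findKeywords, and the sample text are hypothetical. Lower-casing each word and stripping leading and trailing punctuation are extra assumptions not mentioned above, and the page is assumed to be plain text with any markup already removed.

```java
import java.util.LinkedHashSet;
import java.util.Locale;
import java.util.Set;

class KeywordSearch {
    // Returns the keywords that occur as whole words in the page text.
    // Assumption: the keywords in the set are already lower-case.
    static Set<String> findKeywords(String pageText, Set<String> keywords) {
        Set<String> found = new LinkedHashSet<>();
        // Split on runs of whitespace; trim first to avoid a leading empty token.
        for (String token : pageText.trim().split("\\s+")) {
            // Assumption: normalise case and strip punctuation at both ends of the token.
            String word = token.toLowerCase(Locale.ROOT).replaceAll("^\\W+|\\W+$", "");
            if (keywords.contains(word)) {
                found.add(word);
            }
        }
        return found;
    }

    public static void main(String[] args) {
        String page = "This guide will shorten your build times.";
        // "short" is not reported, because "shorten" is compared as a whole word.
        System.out.println(findKeywords(page, Set.of("short", "build")));
    }
}
```

The text is walked once regardless of how many keywords there are, and each lookup in a hash-based set takes roughly constant time.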
 
Arjun Shastry
Ranch Hand
Posts: 1906
Thanks.
 