• Post Reply
  • Bookmark Topic Watch Topic
  • New Topic

Search Algorithm again

 
Gennady Shapiro
Ranch Hand
Posts: 196
  • Mark post as helpful
  • send pies
  • Quote
  • Report post to moderator
I've scanned this entire board and have found no suggestion on any search algorithm other than Linear Search. (except indexing or caching that are both well beyond this assignment)
Has anyone implemented or may suggest search algorithm other than Linear Search???
Sun gives 23 points for it, this makes me think there might be something else besides linear we can do to improve search.
Thanks
P.S. The only thing I can think of is that if fields in DataInfo had been pre-sorted your search time would have been O(N) * O(log(M)), where N=# of records, M=# of fields. But since they are not sorted sorting them takes O(Mlog(M)) time so total time becomes O(N*(Mlog(M) + Clog(M))) (C=#of criteria) which in O notation is equal to O(N*Mlog(M)) which is still worse than O(M*N) that linear search gives you. Am I right?
 
Peter den Haan
author
Ranch Hand
Posts: 3252
  • Mark post as helpful
  • send pies
  • Quote
  • Report post to moderator
The points are, I think, mainly awarded for the way you executed the parsing of the search string in criteriaFind and the appropriate delegation of the actual search to other methods. You are certainly not supposed to do anything other than linear searching (or let me put it differently: you won't lose points because your search is dumb).
Your estimates don't look correct. In criteriaFind you're giving the field names along with the values they're supposed to have, so M doesn't enter the equation at all (at most, the number of criteria fields does). In a real database, the optimiser would try to do the most discriminating search (field) first that had an index. The time taken would be O(log(N)) for a Bayer tree index, O(1) for a hash index. Provided the result set from this first field is small enough the time taken for the remainder of the search would be small.
- Peter
 
Gennady Shapiro
Ranch Hand
Posts: 196
  • Mark post as helpful
  • send pies
  • Quote
  • Report post to moderator
Peter,
are you saying that Data (or subclass) should create the index of fields when it starts up?
Only then M does not enter the O-equation. That makes sense.
 
Peter den Haan
author
Ranch Hand
Posts: 3252
  • Mark post as helpful
  • send pies
  • Quote
  • Report post to moderator
What I was trying to say is that, for the assignment, please don't bother. If anything Sun might mark your scores down because of the additional complexity introduced.
The discussion of performance estimates was a sideline only, and not relevant to the SCJD assignment. M doesn't enter the equation because you don't have to search through the fields at all. It would enter the equation if you could do a search like "any database field that reads 'foobar'". But criteriaFind() doesn't allow this.
- Peter
 
Gennady Shapiro
Ranch Hand
Posts: 196
  • Mark post as helpful
  • send pies
  • Quote
  • Report post to moderator
OK, I hate to beat a dead horse, but I just dont get it.
You are saying you dont hae to go through all the fields to find the one that you need??? How are you going to find it then?
You can use Map to index fields, but in this case, you are just making the Map do the work, but you are still going through all fields to find the one you need.
If you had said that M doesnt enter equation because M doest grow I would have agreed, but I just dont understand what you mean.
[This message has been edited by Gennady Shapiro (edited November 16, 2001).]
 
Peter den Haan
author
Ranch Hand
Posts: 3252
  • Mark post as helpful
  • send pies
  • Quote
  • Report post to moderator
When you parse the criteria, you can map the field names to field numbers (O(M)) (or if you wanted to tune the hell out of it, the offset inside the record). Then you can access the fields directly during the actual search (O(N)). Of course the O(M) time taken in the first part is utterly insignificant compared to the second.
- Peter
 
Gennady Shapiro
Ranch Hand
Posts: 196
  • Mark post as helpful
  • send pies
  • Quote
  • Report post to moderator
OK, now we are on the same page.
Your index Map still makes M compare operations while locating your index. So even though your Map makes M comparing of Hash indexes instead of M String parsing you dont get any performance improvement in terms of O(). But you do get pretty good improvment in terms of String parsing overhead.
I agree that M does not grow with time and therefore can be assumed a constant.
I also agree with you that delegation of search steps is being looked at by Sun people.
 
  • Post Reply
  • Bookmark Topic Watch Topic
  • New Topic