running a large number of "counting QL" on a noSQL data store. Algorithm or a Best practice needed
posted 5 years ago
before we start let me give you some information about our environment:
* it is written fully in Java/J2EE.
* it is developed to be deployed on GAE "Google App Engine"
* its GUI is developed by GWT.
* our problem is in a core development issue.
Here is my problem,
* i am building a web application where users "online" can search for listings in this website.
* first please open the web site careerbuilder.com and search for any keyword e.g. "Accounting".
* a page will be opened , [Narrow Search] has a way to allow you go to your target job easier "lets call this a filter" ,lots of jobs down there.
* search filter includes sub-filters [Category , Company , City , State ].
* each sub-filter has many cases or options. like for "State has (California ,Iowa , Kansas , ...etc)" beside each one of them is the number of jobs that matches your current filter/sub-filter selection. you will find it between brackets i.e. (23)
Now we want to allow this filter functionality and we want to make it fast.
making a count query for each sub-filter option is going to be an effective idea.
kindly keep in mind that:
* users can add/remove listing.
* also listings can expire.
* number of sub-filters are higher for us "can reach 20".
* each sub-filter has between 2 and 200 options.
we are searching for the best practice or a suggestion of an algorithm or whatever to solve this problem.
here are 2 options we have reached so far:
1) building a statistics table to save these results in it, then update it each time listings number is changed , also keep a nightly background job to recalculate results. and we can show number of results directly from this table.
2) build a tree data structure to be loaded on memory and saved in a table each time it is updated. this tree contains the resulting numbers of listings in each option of sub-filters.
even though i still think this is not enough !!!
can anyone suggest a better idea?
all comments, questions, suggestions are very welcomed.