Win a copy of Murach's Python Programming this week in the Jython/Python forum!
  • Post Reply Bookmark Topic Watch Topic
  • New Topic

Efficient Caching  RSS feed

 
John Smith
Ranch Hand
Posts: 2937
  • Mark post as helpful
  • send pies
  • Quote
  • Report post to moderator
For caching purposes, we are storing the results of the database queries in an ArrayList, and for subsequent DB requests, we look for the results in the cache, iterating through all the elements of the list. The reason the cache is implemented that way is because one of the parameters of the query involes a range, rather than a fixed value. This is how the list elements look like:

The "minAge" and "maxAge" parameters define a range. So, assuming the above data are in cache, if we have a query "what is the value for a 45 year old male in CT", the result would be "object4". Notice that in order to retreive the value from the cache, we need to iterate through all the list elements and do a a bunch of comparisons. The cache may potentially contains tens of thousands of elements, and the search becomes expensive.
Here is my question. Ideally, I'd like to have the cache structured in such a way that a single lookup call would do the job. That is, given a query, I would like to be able to determine if the results are already in the cache in a constant time, rather than O(N) time) using something like a HashMap. Can anyone think of a data structure/container that would be efficient in this situation?
 
Jim Yingst
Wanderer
Sheriff
Posts: 18671
  • Mark post as helpful
  • send pies
  • Quote
  • Report post to moderator
Are you always looking up a specific state, gender, and age? I might define a class combining state + gender into a single key (with equals() and hashCode() methods). Then have a HashMap with state + gender as key, and a TreeMap as value. The TreeMap would use the minimum age as key, and the remaining data (including max age) as the value. Use the HashMap to grab all data for a given state and gender, then use the TreeMap to find the entry with the highest minimum age that does not exceed your target. Then check the max age from the value, and see if the entry you found is appropriate. If not, that should mean that the data you need is not in the cache, and it's time to go to the DB or whatever. I'm assuming that the age ranges in data for a given state + gender don't overlap. I.e. you won't have an entry for MA male age 20-30, and another entry for MA male age 25-28. That would make it much harder to use the TreeMap as I describe.
My gut feeling though is that there shouldn't be that many age ranges for a given state + gender, and simply iterating wouldn't be that much slower. So perhaps I'm missing something. Are there going to be "tens of thousands of elements" with the same state + gender? Or more like 1-6 elements? (How many different age ranges woud you need after all?) Or is there more data which will differentiate the different entires? If so, what sort of additional data?
 
John Smith
Ranch Hand
Posts: 2937
  • Mark post as helpful
  • send pies
  • Quote
  • Report post to moderator
My gut feeling though is that there shouldn't be that many age ranges for a given state + gender, and simply iterating wouldn't be that much slower. So perhaps I'm missing something. Are there going to be "tens of thousands of elements" with the same state + gender? Or more like 1-6 elements? (How many different age ranges woud you need after all?) Or is there more data which will differentiate the different entires? If so, what sort of additional data?
I didn't want to confuse anyone with the business specific terminology, but yes, there are some other pieces of data that make the database row. In addition to min age, max age, state, and gender, there are also product type, business type, and usage type. All the different permutations of these factors result into up to a hundred thousand rows, and potentially more. But the min/max age is the only unique pair of factors in the sense that the range of, say, min 20 to max 30 satisfies the search for age 25, if all the other factors match exactly.
 
Jim Yingst
Wanderer
Sheriff
Posts: 18671
  • Mark post as helpful
  • send pies
  • Quote
  • Report post to moderator
Hmmm... so we could make a single Key class which includes state, gender, product type, business type, and usage type. (But not age since that's special.) Do you know all these things when you perform a lookup? If so, great, a HashMap with Key class as key will work. (I doubt you really need TreeMap values - Lists will do since you have so few elements to iterate through.) But if, for example, you want to find all records with a given state + product type, but don't have the other values - then a HashMap with a composite Key based on [state + gender + product type + business type + usage type] will not work, since you wouldn't be able to use the missing values as part of the hashCode(). So - do you need flexibility for a variety of different types of searches, or not?
 
Ilja Preuss
author
Sheriff
Posts: 14112
  • Mark post as helpful
  • send pies
  • Quote
  • Report post to moderator
Are you really sure you need this caching? Modern databases typically are already quite good at caching results. You might also want to explore optimizing the database directly, by tweaking its parameters, introducing the correct indices etc. (if you haven't already done so).
Of course that doesn't help you much if network traffic is your bottle neck. In that case you could alternatively think about using a local in-memory database (such as hsql) as a cache. That way you wouldn't have to implement any complicated lookup logic.
Hope this helps...
 
John Smith
Ranch Hand
Posts: 2937
  • Mark post as helpful
  • send pies
  • Quote
  • Report post to moderator
Jim: Hmmm... so we could make a single Key class which includes state, gender, product type, business type, and usage type. (But not age since that's special.) Do you know all these things when you perform a lookup? If so, great, a HashMap with Key class as key will work. (I doubt you really need TreeMap values - Lists will do since you have so few elements to iterate through.)
Yes, all the factors are known when the query is made, so a hashmap with a composite key and the list of results as a value would work. That answers my question, thanks Jim, -- I'll go with your solution (seems natural in retrospect).
Ilja: Are you really sure you need this caching? Modern databases typically are already quite good at caching results. You might also want to explore optimizing the database directly, by tweaking its parameters, introducing the correct indices etc. (if you haven't already done so).
We use db2, all the indexes have already been optimized for the queries made, and the db stats ran (indexes are up to date). The queries are quite efficient (about 8ms or so, including the time to trasport the data forward and back over the network), but since we make tens of thousands of queries to that table, the database I/O becomes a hotspot. Caching the results of the queries speeds things quite a bit.
Ilja: In that case you could alternatively think about using a local in-memory database (such as hsql) as a cache. That way you wouldn't have to implement any complicated lookup logic.
I don't know what it is, -- some utility to map the db tables to memory? Does it work with the existing database, or is it a stand along thing?
[ November 09, 2003: Message edited by: Eugene Kononov ]
 
Ilja Preuss
author
Sheriff
Posts: 14112
  • Mark post as helpful
  • send pies
  • Quote
  • Report post to moderator
Ilja: In that case you could alternatively think about using a local in-memory database (such as hsql) as a cache. That way you wouldn't have to implement any complicated lookup logic.
I don't know what it is, -- some utility to map the db tables to memory? Does it work with the existing database, or is it a stand along thing?

It's a fully fledged SQL database which can be configured to run embedded, fully in memory - so that you don't get any disk IO overhead. Just a callow idea...
http://hsqldb.sourceforge.net/
[ November 09, 2003: Message edited by: Ilja Preuss ]
 
Kirk Pepperdine
Author
Ranch Hand
Posts: 71
  • Mark post as helpful
  • send pies
  • Quote
  • Report post to moderator
Originally posted by Jim Yingst:
[QB]Are you always looking up a specific state, gender, and age? I might define a class combining state + gender into a single key (with equals() and hashCode() methods.
One thing that you maybe able to do is design a collection class that contains mulitple indexing schemes. In this case since you are looking for a range, I might suggest that you use a N-M tree for one of the indexes. Use that structure for age.
HTH
 
  • Post Reply Bookmark Topic Watch Topic
  • New Topic
Boost this thread!