I was hoping to get some advice regarding implementing a cache to go with the proxy server I am currently working on.
I'm a student and am currently working on it as part of my final year project. The proxy itself is able to extract target host, url etc from the http request and can from there forward details to the relevant server and forward the response back ot the client. It is running at the moment with one class listening for requests on a socket in a loop, and when a request is heard it's accepted and then handled by a seperate thread - which is where the details are extracted from the headers and where I'm hoping to include the calls for caching methods from a seperate class - Cache.
Any advice to start me off would be great. From looking at some other examples online it seems the use of hashtables are a common way to store and retrieve items in the cache using the URL as the name in the table, but I have no experience of using these so some pointers here would be great as well! If anyone wants to see what I have so far, then I can post the code up, but it is very much a work in progress!
Well, that sounds like an interesting project. I'm not sure I understand though. You're in your final year ... of a computer science degree I assume, but you don't know how to use a hash table? And yet, you seem to have created your own HTTP server using just raw sockets ... not a trivial task. I'm suffering from cognitive dissonance and need to go take an aspirin.
But first ... it can be useful to think of hash tables as arrays that are indexed by Objects (usually Strings) instead of sequential integers. In fact some languages call them associative arrays, and have built in support for them. In Java, you want to check out the java.util.HashMap class. There are several other implementations of hash tables, but that's the main one.
I'm going to move this thread over to our General Java forum since it deals with a number of different parts of Java.
Yeah, my degree course hasn't had much reason to go into hash tables unfortunately, a fact I am now quite resentful of!
I looked into the suggested class and have read up on a couple of examples using hash tables, and there doesn't seem anything too alien about hash tables themselves or the way they work, so thanks for the tip.
One thing I'd appreciate some advice on: what might be the best way to construct the cache process. Currently I have one class which acts as the proxy server and then passes request information to a seperate thread class to be forwarded. My intention is to make a Cache class which will have methods providing functionality to check the hash table for matching entries and depending on the result retrieve that file or forward the request to the server. In basic terms this seems to make sense, to me anyway, any other theories or options which may provide other benefits I've overlooked could be nice.
But once getting further into it, I've seen an example which makes use of a Config class which holds details about remaining cache space and number of files saved... is it really likely to be particularly important to specify the cache size at this stage? and to provide functionality to clear cache or limit the number of entries? I do plan on incorporating some of these things later on, and more, but it should be fine to leave them out for now ... right?
Really appreciate anyone taking the time to offer their expertise, thanks a lot!
I've never actually written my own cache, but generally, yes, any cache should have a size limit. If you had enough core memory to store everything, you'd just do that and not even bother with a cache. A corollary of the size limit is the need for an algorithm for kicking things out of the cache when you need space for a new request. This algorithm is almost always "least recently used" (LRU) or some derivation of it. You'll need to timestamp objects as they are accessed, and be able to efficiently find the LRU objects to kick from the cache. The data structure you want to research for this purpose is a priority queue, also known as a heap.
Many caches also provide the ability for objects to go stale after a timeout, or be programatically marked as stale. That's a whole other level of complexity though. Do you get to set your own requirements for this project, or are they part of the assignment?
You get good luck from rubbing the belly of a tiny ad: