We've been running load tests on our machines and have noticed some baffling behavior. If we load the machines so that the server runs near CPU capacity (80%+ CPU utilization for extended periods), Tomcat will not expire old sessions.
We can run the load generators until the machine starts thrashing on full garbage collections (i.e., each full GC takes 20+ seconds and a new one starts every 22-25 seconds, so we only get 2-5 seconds of 'run time' between GCs).
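To put the thrashing in numbers, here is a quick back-of-the-envelope calculation. It assumes the 22-25 second interval is measured from the start of one full GC to the start of the next, which matches the 2-5 seconds of run time above. `GcOverhead` and `usefulFraction` are just illustrative names.

```java
// Hypothetical helper: estimates the fraction of wall-clock time left for
// application threads when each full GC pause is followed by a short run window.
public class GcOverhead {
    // pauseSeconds: duration of one full GC.
    // cycleSeconds: time from the start of one full GC to the start of the next.
    static double usefulFraction(double pauseSeconds, double cycleSeconds) {
        return (cycleSeconds - pauseSeconds) / cycleSeconds;
    }

    public static void main(String[] args) {
        // Figures from the post: ~20 s pauses, a full GC every 22-25 s.
        System.out.printf("worst case: %.0f%%%n", 100 * usefulFraction(20, 22));
        System.out.printf("best case:  %.0f%%%n", 100 * usefulFraction(20, 25));
    }
}
```

With those figures the application threads get only about 9-20% of wall-clock time.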
What we've found is that our GC activity is caused by sessions filling up the heap. If we pause the load generators for 2-3 minutes, the sessions go away as expected, and we can then restart the load generators and everything runs smoothly until sessions build up again.
Our environment is clustered, which makes the problem even worse, since each session is essentially copied to every server in the cluster. QA has 2 members in the cluster and production has 4, so the problem will be very significant once we get to production!
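A toy model of the replication effect, assuming all-to-all replication (each member keeps a copy of every session in the cluster, as Tomcat's DeltaManager does). The class and method names are hypothetical and this is not how Tomcat implements replication; it only shows why per-node session memory grows with the total number of sessions in the cluster rather than being split across members.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical toy model of all-to-all session replication: a session created on
// any member is copied into the session map of every member.
public class AllToAllReplication {
    static final class Member {
        final Map<String, byte[]> sessions = new HashMap<>();

        // Total bytes of session data this member is holding.
        long storedBytes() {
            long total = 0;
            for (byte[] data : sessions.values()) total += data.length;
            return total;
        }
    }

    final List<Member> members = new ArrayList<>();

    AllToAllReplication(int size) {
        for (int i = 0; i < size; i++) members.add(new Member());
    }

    // Creating a session on any member stores a separate copy on every member,
    // including the one that created it.
    void createSession(String id, byte[] data) {
        for (Member m : members) m.sessions.put(id, data.clone());
    }

    public static void main(String[] args) {
        AllToAllReplication prod = new AllToAllReplication(4);
        // 1,000 hypothetical logins of ~10 KB each, wherever they land.
        for (int i = 0; i < 1000; i++) prod.createSession("s" + i, new byte[10 * 1024]);
        // Every member ends up holding all 1,000 sessions, not 1,000 / 4.
        System.out.println(prod.members.get(0).sessions.size());
        System.out.println(prod.members.get(0).storedBytes());
    }
}
```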
Even going into the "/manager/html" webapp and manually clicking the "Expire" button (for sessions idle >= 1 minute) doesn't seem to expire anything while the machine is under load. But as soon as we remove the load, the machine begins to recover and expires the sessions.
FYI, for testing purposes I've set the session timeout in 'web.xml' to 1 minute, but we actually control the session duration in code, where it's set to 180 seconds. (When I examine the sessions, they show the correct expiration time of 3 minutes, yet they also show that they have been idle for 20+ minutes.)
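For clarity, here is the eligibility rule we expect to apply, as a stand-alone sketch (not Tomcat's StandardSession code). Note the unit difference: `<session-timeout>` in web.xml is in minutes, while `HttpSession.setMaxInactiveInterval` takes seconds. `SessionIdleCheck` and `eligibleForExpiry` are hypothetical names.

```java
import java.time.Duration;
import java.time.Instant;

// Hypothetical model of the timeout rule: a session becomes eligible for expiry
// once its idle time reaches its max inactive interval (in seconds).
public class SessionIdleCheck {
    static boolean eligibleForExpiry(Instant lastAccessed, int maxInactiveSeconds, Instant now) {
        // A value <= 0 is treated here as "never times out", as recent servlet specs define it.
        if (maxInactiveSeconds <= 0) {
            return false;
        }
        Duration idle = Duration.between(lastAccessed, now);
        return idle.getSeconds() >= maxInactiveSeconds;
    }

    public static void main(String[] args) {
        Instant now = Instant.now();
        int fromCode = 180;                                       // timeout set in code, in seconds
        Instant lastAccessed = now.minus(Duration.ofMinutes(20)); // idle 20+ min, as observed
        System.out.println(eligibleForExpiry(lastAccessed, fromCode, now)); // true
    }
}
```

A session idle for 20+ minutes against a 180-second limit passes this check by a wide margin, so the timeout settings themselves look right; it's the expiry that isn't happening under load.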
Is there a setting in Tomcat that changes the priority of session expiration, so that it takes precedence over page requests?
In case it matters, we're using Tapestry 5.5 for our UI. We store nothing in the session for unauthenticated users, but about 10 KB of data for authenticated ones. Our load generator runs 200 threads that request a plain page (unauthenticated) and 40 threads that perform registration and login actions, each iteration storing about 10 KB. We're running with -Xmx6g, -Xms2g, and a 1g PermSize (although we've never seen more than about 600KB of Perm space in use).
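A rough sizing sketch for the login threads, contrasting the steady state when expiry works (Little's law: live sessions = arrival rate × timeout) with what happens when expiry stalls and sessions simply pile up. The 2-second iteration time and the 4 GiB of free heap are hypothetical placeholders, not measurements; the thread count, session size, timeout, and QA cluster size come from the post.

```java
// Hypothetical sizing sketch. Assumes each login thread completes one iteration
// every ITERATION_SECONDS and each iteration leaves one ~10 KB session behind.
public class SessionHeapEstimate {
    // Steady state when expiry works (Little's law): live sessions = arrival rate x timeout,
    // multiplied by the number of members holding a replica.
    static double steadyStateBytes(double sessionsPerSecond, double timeoutSeconds,
                                   double bytesPerSession, int clusterMembers) {
        return sessionsPerSecond * timeoutSeconds * bytesPerSession * clusterMembers;
    }

    // When expiry stalls, sessions accumulate linearly: time until the free heap is used up.
    static double secondsUntilFull(double sessionsPerSecond, double bytesPerSession,
                                   int clusterMembers, double freeHeapBytes) {
        return freeHeapBytes / (sessionsPerSecond * bytesPerSession * clusterMembers);
    }

    public static void main(String[] args) {
        final double ITERATION_SECONDS = 2.0;       // hypothetical
        double rate = 40 / ITERATION_SECONDS;       // 40 login threads (from the post)
        double perSession = 10 * 1024;              // ~10 KB per session (from the post)
        int members = 2;                            // QA cluster size (from the post)
        double freeHeap = 4.0 * 1024 * 1024 * 1024; // hypothetical headroom within -Xmx6g

        System.out.printf("steady state with 180 s timeout: %.0f MiB%n",
                steadyStateBytes(rate, 180, perSession, members) / (1024 * 1024));
        System.out.printf("minutes to fill free heap if nothing expires: %.0f%n",
                secondsUntilFull(rate, perSession, members, freeHeap) / 60);
    }
}
```

With those placeholder numbers, working expiry keeps the login sessions to roughly 70 MiB per member, whereas with no expiry the free heap fills in about three hours; plugging in the measured iteration time would give a real estimate.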
Our site serves several million pages a day and handles a substantial number of registrations (more than 10K a day), which is why we're evaluating loading conditions: we want to make sure our production hardware is sized correctly.