Word of warning. I've been doing performance studies and tuning for a (get-offa-my-lawn!) LONG time. I spent a fair amount of that on IBM mainframes, which were designed when resources were expensive (I remember how excited we got when they first had an entire MEGAbyte of RAM to play with).
Even then, the measured CPU time for a typical app was roughly 50% kernel usage, 50% application usage. So just to get a given amount of application optimization, you had to work twice as hard.
I certainly believe in efficient resource usage, and frequently get yelled at for worrying about such things in an age when you can just throw more hardware at the problem. By all means, shut down useless processes and remove unused code: they're not merely a waste of space, they're also more places for a security exploit to attack. However, they're rarely going to have that great an impact on performance. Multi-tasking systems only function at all because there's a lot of "dead" time in most applications.
However, the first rule of optimization is not to optimize too early. It's far better to prototype the system, benchmark it under realistic conditions, and then measure where the real bottlenecks actually are. Hint: based on long experience, they're almost never where you "know" they are.
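To make "measure first" concrete, here's a minimal sketch using Python's built-in cProfile. The workload() function is just a made-up stand-in for whatever your app actually does; the point is that you profile it before deciding what to tune.

```python
# Minimal sketch: profile before optimizing, using Python's built-in cProfile.
import cProfile
import pstats
import io

def workload():
    # Placeholder "app": build and sort some data, do some string work.
    data = [str(i * i) for i in range(100_000)]
    data.sort(key=len)
    return sum(len(s) for s in data)

profiler = cProfile.Profile()
profiler.enable()
workload()
profiler.disable()

# Print the 10 most expensive calls by cumulative time.
out = io.StringIO()
pstats.Stats(profiler, stream=out).sort_stats("cumulative").print_stats(10)
print(out.getvalue())
```

Nine times out of ten, the top of that report is not the code you were sure was slow.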
There are two popular approaches to ensuring that a given app isn't penalized. One is to scale out processors: blades, Beowulf clusters, stuff like that. The other is the more traditional approach, which is to use a load-balancer. IBM's mainframe OS lets the systems support people establish service level agreements and define service level objectives; the OS will then internally manipulate itself to meet those agreements, pushing down the priorities of less-important processes if needed. Solaris supports this sort of thing too. In the real world outside of mainframes, though, what I've seen is a preference for tossing hardware at the problem, since nobody wanted to pay the salaries for Solaris sysadmins trained to do load-balancing.
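For what it's worth, here's a toy, Unix-only sketch of the "push down the priority of the less-important work" idea. It's nowhere near what a mainframe workload manager does; batch_report() and the nice increment are just made-up illustrations of the concept.

```python
# Crude, Unix-only sketch: lower the scheduling priority of non-urgent work
# so it yields to the important stuff. A toy analogue of workload management.
import os
import multiprocessing

def batch_report():
    os.nice(10)  # raise niceness => lower priority for this process (Unix only)
    total = sum(i * i for i in range(10_000_000))  # pretend batch work
    print("batch done:", total)

if __name__ == "__main__":
    p = multiprocessing.Process(target=batch_report)
    p.start()
    # The interactive, important work keeps running at normal priority here.
    p.join()
```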
Actually, you really shouldn't want to wring EVERY last ounce of performance out of a system. I used to work for someone who harped on that string, until an IBM representative pointed out that if you're running at 100% load all the time, the first bump in the road will train-wreck you (to mix metaphors). No good general takes the field without reserves. What you want is effective use of the system. Unless you really like having things go down just when you need them most.
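If you want a back-of-the-envelope feel for why, here's a tiny sketch using the textbook M/M/1 queueing result (my simplification, not anything IBM-specific): average response time grows like 1 / (1 - utilization), so it blows up as you approach 100% busy.

```python
# Illustrative only: M/M/1 queueing says response time, relative to an
# unloaded system, scales as 1 / (1 - utilization). Watch it explode near 100%.
for utilization in (0.50, 0.70, 0.80, 0.90, 0.95, 0.99):
    slowdown = 1.0 / (1.0 - utilization)
    print(f"{utilization:>4.0%} busy -> response time x{slowdown:5.1f}")
```

At 50% busy you pay a factor of 2; at 99% busy you pay a factor of 100. That's the reserve the general keeps.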
Oh yeah, we have a very good optimization forum here.