A couple of concrete things to think about: for horizontal scalability on a vendor framework - "just add more servers" - they had to solve some memory synchronization problems. They have a distributed cache that uses JMS to replicate updates across the servers.
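The replication pattern is roughly: every server keeps a local copy of the cache, and each local update is broadcast to the other servers. Here's a minimal sketch of that idea - the `UpdateBus` class stands in for a JMS topic (real code would publish a message and apply it in an `onMessage()` listener), and all the class names are invented for the example:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Stand-in for a JMS topic: every cache subscribes, and each local
// put is broadcast to the other nodes.
class UpdateBus {
    private final List<ReplicatedCache> subscribers = new ArrayList<>();

    void subscribe(ReplicatedCache cache) { subscribers.add(cache); }

    // Broadcast an update to every node except the one that made it.
    void publish(ReplicatedCache origin, String key, String value) {
        for (ReplicatedCache c : subscribers) {
            if (c != origin) c.applyRemoteUpdate(key, value);
        }
    }
}

class ReplicatedCache {
    private final Map<String, String> local = new ConcurrentHashMap<>();
    private final UpdateBus bus;

    ReplicatedCache(UpdateBus bus) {
        this.bus = bus;
        bus.subscribe(this);
    }

    // A local put is replicated to the other servers via the "topic".
    void put(String key, String value) {
        local.put(key, value);
        bus.publish(this, key, value);
    }

    // Called when another node's update arrives; no re-broadcast,
    // so updates don't ping-pong between servers.
    void applyRemoteUpdate(String key, String value) {
        local.put(key, value);
    }

    String get(String key) { return local.get(key); }
}

public class CacheReplicationDemo {
    public static void main(String[] args) {
        UpdateBus topic = new UpdateBus();
        ReplicatedCache serverA = new ReplicatedCache(topic);
        ReplicatedCache serverB = new ReplicatedCache(topic);

        serverA.put("user:42:prefs", "darkMode=true");
        // The update made on server A is visible on server B.
        System.out.println(serverB.get("user:42:prefs")); // prints darkMode=true
    }
}
```

With real JMS the publish is asynchronous, so there's a window where the nodes disagree - that's the "memory synchronization" problem the vendor had to solve.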
They use "affinity" to always send a given user to the same server. If that server goes down, some number of users are effectively no longer logged in and have to start the app over. Is that acceptable? If not, you'd want to add session replication, or push all the interesting stuff from the session into a database or something.
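The "push it to a database" option looks something like this sketch: every change to the interesting session state is written through to shared storage keyed by session id, so any server can pick the user up after a failover. The `Map` stands in for a database table, and the class and method names are made up for the example:

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Stand-in for a database table: SESSION_ID -> session state.
class SessionStore {
    private final Map<String, Map<String, String>> table = new ConcurrentHashMap<>();

    void save(String sessionId, Map<String, String> state) {
        table.put(sessionId, new ConcurrentHashMap<>(state));
    }

    // Any server can rebuild the session after a failover.
    Map<String, String> load(String sessionId) {
        return table.getOrDefault(sessionId, new ConcurrentHashMap<>());
    }
}

class AppServer {
    private final SessionStore store;

    AppServer(SessionStore store) { this.store = store; }

    void handleLogin(String sessionId, String user) {
        Map<String, String> state = new ConcurrentHashMap<>();
        state.put("user", user);
        store.save(sessionId, state); // write-through on every change
    }

    String whoIsLoggedIn(String sessionId) {
        return store.load(sessionId).get("user");
    }
}

public class FailoverDemo {
    public static void main(String[] args) {
        SessionStore db = new SessionStore();
        AppServer server1 = new AppServer(db);
        AppServer server2 = new AppServer(db);

        server1.handleLogin("sess-123", "fred");
        // server1 dies; the router sends the user to server2, which
        // still sees the session because the state lives in the store.
        System.out.println(server2.whoIsLoggedIn("sess-123")); // prints fred
    }
}
```

The trade-off versus in-memory session replication is a database hit on every state change, in exchange for surviving a whole-cluster restart.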
Our old system - still running - had scalability challenges. We were pegged at 90% CPU with half the users we had to support. So we tuned, optimized, cut overhead, etc., and ran at 90% with 75% of the users. Then we did it all again and ran at 90% with all of the users. Then we did it again. Whew.

What worked? Eliminating extraneous calls from the fat client to the server through caching and bundling of results. Eliminating cross-process calls on the server (not Java). Providing custom APIs for specific business functions instead of reusing generic ones. Eliminating some user requirements! All of the problems were found through extensive stress testing, instrumentation, and analysis. And one or two customer calls.

The point of all that is: be prepared to make this a life's work.
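To make the "caching and bundling" item concrete, here's a small sketch of the idea: instead of one server round trip per item, the fat client asks for everything a screen needs in a single batched call, and caches the answers so repeat visits cost nothing. The names are invented, and the call counter just makes the round-trip savings visible:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collection;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Stand-in for the server side; each call here models one round trip.
class PricingService {
    int remoteCalls = 0;

    Map<String, Double> getPrices(Collection<String> skus) {
        remoteCalls++;
        Map<String, Double> out = new HashMap<>();
        for (String sku : skus) out.put(sku, 9.99); // stub lookup
        return out;
    }
}

class PricingClient {
    private final PricingService service;
    private final Map<String, Double> cache = new HashMap<>();

    PricingClient(PricingService service) { this.service = service; }

    // One round trip for the whole screen, and only for cache misses.
    Map<String, Double> pricesFor(List<String> skus) {
        List<String> misses = new ArrayList<>();
        for (String sku : skus) {
            if (!cache.containsKey(sku)) misses.add(sku);
        }
        if (!misses.isEmpty()) cache.putAll(service.getPrices(misses));
        Map<String, Double> out = new HashMap<>();
        for (String sku : skus) out.put(sku, cache.get(sku));
        return out;
    }
}

public class BundlingDemo {
    public static void main(String[] args) {
        PricingService service = new PricingService();
        PricingClient client = new PricingClient(service);

        client.pricesFor(Arrays.asList("A", "B", "C")); // one call, not three
        client.pricesFor(Arrays.asList("A", "B"));      // all cached, no call
        System.out.println(service.remoteCalls); // prints 1
    }
}
```

Three naive calls become one, and the repeat screen becomes zero - that's where the CPU headroom came from in our case.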

[ February 22, 2004: Message edited by: Stan James ]