
Statistical distribution of hits

 
Frank Carver
Sheriff
Posts: 6920
I'm setting up some web sites soon, which may eventually get a lot of hits. What I need is at least some idea of how "clumped" or "spread" the hits to popular web sites might be.
For example, Java Ranch gets over 40000 unique visitors per month. I don't know how many hits that translates to, but if we say (figures pulled out of the air) that on average each one visits four times a month and causes 100 hits per visit, that makes 16 million hits per month. Simply dividing that down over a 30-day month gives approximately 530000 hits/day, 22000 hits/hour, 370 hits/minute, or about six hits per second. But that doesn't really help me work out how powerful the server has to be. There are obviously times of the day and days of the week with proportionally more or fewer hits than average, so I'm still in the dark.
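The "divide it down" arithmetic above is easy to get wrong by hand, so here is a small Java sketch of it. It assumes a 30-day month, and the inputs are the same guessed figures as above, not measurements; the class and method names are just for illustration.

```java
// Back-of-envelope average hit rates from guessed traffic figures.
// Assumption: a 30-day month; every input is a guess, not a measurement.
public class HitRates {

    static long hitsPerMonth(long visitors, long visitsPerVisitor, long hitsPerVisit) {
        return visitors * visitsPerVisitor * hitsPerVisit;
    }

    // Average hits per second if the monthly total were spread perfectly evenly.
    static double averageHitsPerSecond(long monthlyHits, int daysInMonth) {
        double seconds = daysInMonth * 24.0 * 60.0 * 60.0;
        return monthlyHits / seconds;
    }

    public static void main(String[] args) {
        long monthly = hitsPerMonth(40_000, 4, 100);
        double perSecond = averageHitsPerSecond(monthly, 30);
        System.out.printf("per month:  %,d%n", monthly);
        System.out.printf("per day:    %,.0f%n", perSecond * 86_400);
        System.out.printf("per hour:   %,.0f%n", perSecond * 3_600);
        System.out.printf("per minute: %,.0f%n", perSecond * 60);
        System.out.printf("per second: %.1f%n", perSecond);
    }
}
```

This only ever gives the average; it says nothing about how the hits bunch up, which is the real question.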
Does anyone know where I can find this kind of statistical analysis of hit-rates to web sites?
 
Frank Carver
Sheriff
Posts: 6920
The Java Ranch now uses "Extreme Tracking", which goes a long way toward answering this sort of question; see the little "globe" icon at the bottom of this page. It seems that (in the case of the Java Ranch, at least) the peak (roughly 9am Tuesday, US time) is about four times the trough (roughly 7pm Sunday, US time).
So to build a robust system that copes with most regular daily loads, you need to prepare for at least twice the measured average load. To give good performance on all (or almost all) hits you will probably need more headroom than that, and none of this will help if you are "slashdotted".
But it's a good starting point.
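One way to see where "at least twice the average" comes from: assume (and it is only an assumption) that the load swings fairly smoothly between trough and peak, so the average sits roughly midway between them. Then, with trough load $L_t$, peak $L_p$ and average $\bar{L}$:

```latex
L_p = 4\,L_t, \qquad
\bar{L} \approx \frac{L_p + L_t}{2} = 2.5\,L_t
\quad\Longrightarrow\quad
\frac{L_p}{\bar{L}} \approx \frac{4}{2.5} = 1.6
```

Under that assumption, capacity for twice the average covers the regular weekly peak with about 25% to spare ($2 / 1.6 = 1.25$). If the traffic spends most of its time near the trough instead, the average drops toward $L_t$ and the peak can approach four times the average, which is why measuring your own curve matters.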
 
Mike Curwen
Ranch Hand
Posts: 3695
This sort of question is floating around my office these days.

We are preparing a 2nd 'release' or 'edition' of a website for our parent company. The new site must 'meet or exceed' the current site's "Capacity". And it's funny to find out what a loaded word "capacity" can turn out to be.

WebTrends gives us a nice "hits per day" report. But what about the peaks? We can't just divide the hits by 24 and think we're ok. And 'hits' is not the same as 'page views' or even 'page requests'.
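A quick way to see how misleading "hits per day / 24" can be is to compare it with the busiest hour once you have hourly counts from your logs or reports. This is a hypothetical Java sketch; the method names are made up and are not part of WebTrends or any other tool.

```java
import java.util.Arrays;

// Compares the naive "daily total / 24" estimate with the busiest hour.
// The 24 hourly counts are assumed to come from your own logs or reports.
public class PeakHour {

    private static void check(long[] hourlyCounts) {
        if (hourlyCounts.length != 24) {
            throw new IllegalArgumentException("expected 24 hourly counts");
        }
    }

    // Naive estimate: spread the day's total evenly over 24 hours.
    static double naiveHourly(long[] hourlyCounts) {
        check(hourlyCounts);
        return Arrays.stream(hourlyCounts).sum() / 24.0;
    }

    // The busiest single hour of the day.
    static long peakHourly(long[] hourlyCounts) {
        check(hourlyCounts);
        return Arrays.stream(hourlyCounts).max().orElse(0L);
    }

    // How many times the naive hourly average the real peak hour is.
    static double peakToAverage(long[] hourlyCounts) {
        double avg = naiveHourly(hourlyCounts);
        return avg == 0 ? 0 : peakHourly(hourlyCounts) / avg;
    }
}
```

The same idea applies at finer granularity (per minute or per second); the finer the buckets, the higher the peak-to-average ratio tends to look.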

We had fancy load-testing software that gave us "page views per second" (in any given second, how many pages finished rendering), and we were comparing this with WebTrends' "page views" until we realized they were not really the same thing.

Finally, it all came down to a Perl script that parsed the web logs. We ignored image requests (because the number of images had changed between the old and new sites) and simply counted the pages requested in each minute of the day. Then we converted this into a nice Excel graph. Pictures are indeed worth a thousand words.
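For anyone wanting to do the same, here is a rough Java stand-in for that kind of script (it is not the Perl script itself). It assumes Common Log Format lines such as `1.2.3.4 - - [10/Oct/2000:13:55:36 -0700] "GET /index.html HTTP/1.0" 200 2326`, and it treats a request as an image purely by file extension; adjust both for your own logs.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;

// Counts page requests per minute from Common Log Format lines, skipping images.
// Assumptions: CLF timestamps like [10/Oct/2000:13:55:36 -0700]; images are
// recognised only by extension.
public class PagesPerMinute {

    private static final Set<String> IMAGE_EXTENSIONS =
            Set.of(".gif", ".jpg", ".jpeg", ".png");

    // Returns minute ("dd/MMM/yyyy:HH:mm") -> number of page requests.
    // TreeMap sorts the keys as strings, which matches time order within one day.
    static Map<String, Integer> countPages(Iterable<String> logLines) {
        Map<String, Integer> perMinute = new TreeMap<>();
        for (String line : logLines) {
            int open = line.indexOf('[');
            int q1 = line.indexOf('"');
            int q2 = q1 < 0 ? -1 : line.indexOf('"', q1 + 1);
            if (open < 0 || q1 < 0 || q2 < 0 || line.length() < open + 18) {
                continue; // skip malformed lines
            }
            // "10/Oct/2000:13:55:36 -0700" -> "10/Oct/2000:13:55"
            String minute = line.substring(open + 1, open + 18);
            String[] request = line.substring(q1 + 1, q2).split(" ");
            if (request.length < 2) {
                continue;
            }
            String path = request[1].toLowerCase();
            int query = path.indexOf('?');
            if (query >= 0) {
                path = path.substring(0, query);
            }
            int dot = path.lastIndexOf('.');
            String ext = dot < 0 ? "" : path.substring(dot);
            if (IMAGE_EXTENSIONS.contains(ext)) {
                continue; // ignore image requests
            }
            perMinute.merge(minute, 1, Integer::sum);
        }
        return perMinute;
    }

    public static void main(String[] args) throws IOException {
        // Usage: java PagesPerMinute access.log
        List<String> lines = Files.readAllLines(Path.of(args[0]));
        // Tab-separated output pastes straight into a spreadsheet for graphing.
        countPages(lines).forEach((minute, n) -> System.out.println(minute + "\t" + n));
    }
}
```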
 
steve souza
Ranch Hand
Posts: 862
You can use JAMon to track page hits in any time increment. It is flexible because it knows nothing about pages; it only knows about the string that is passed to it. Taking advantage of this, you can pass different strings that represent page hits by day, hour, etc. It can also monitor things within a page, such as database connections or queries.
Here is some sample code. Note that the strings could be totally dynamic:

steve - http://www.jamonapi.com
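To illustrate the "it only knows about the string" idea without depending on any library, here is a plain-Java sketch. It is NOT the JAMon API; the class, methods and label formats are all hypothetical. Each page hit is recorded under several dynamically built labels, so one call yields per-page, per-hour and per-day totals.

```java
import java.time.LocalDateTime;
import java.time.format.DateTimeFormatter;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.LongAdder;

// A thread-safe counter keyed only by label strings (hypothetical, not JAMon).
public class LabelCounter {

    private static final DateTimeFormatter HOUR = DateTimeFormatter.ofPattern("yyyy-MM-dd HH");
    private static final DateTimeFormatter DAY = DateTimeFormatter.ofPattern("yyyy-MM-dd");

    private final Map<String, LongAdder> counts = new ConcurrentHashMap<>();

    // Increments the counter for an arbitrary label.
    void hit(String label) {
        counts.computeIfAbsent(label, k -> new LongAdder()).increment();
    }

    long count(String label) {
        LongAdder adder = counts.get(label);
        return adder == null ? 0 : adder.sum();
    }

    // One page hit recorded under several labels built on the fly.
    void recordPageHit(String page, LocalDateTime when) {
        hit(page);                                // total for this page
        hit(page + " " + when.format(HOUR));      // this page, this hour
        hit("all pages " + when.format(DAY));     // whole site, this day
    }
}
```

Because the labels are just strings, adding a new breakdown (per minute, per user type) means building another label rather than changing the counter.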
 