I don't think HTTP headers are going to be useful to you. You could try parsing the META tags from the HTML source; by convention they're used to give keywords to search engines. Of course, that means they may be just a list of keywords and not necessarily meaningful. Your best bet may be to parse out the title tag, since it's meant to be human-readable. Both of these approaches rely on the page author following convention, though, and you can't count on that. For example, google.com has no meaningful META tags and its page title is simply "Google". [ March 04, 2008: Message edited by: Joe Ess ]
Is there a parser that can read the contents of the META and title tags?
You could use something like HtmlTidy to clean up the HTML and hand it to you as XML (which makes it much easier to extract the parts you're interested in). A library like jWebUnit makes this even simpler.
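If you'd rather not pull in a third-party library at all, here's a rough sketch that swaps in the parser that ships with the JDK (`javax.swing.text.html.parser.ParserDelegator` with an `HTMLEditorKit.ParserCallback`) instead of HtmlTidy or jWebUnit. It collects the text inside `<title>` and the `name`/`content` pairs of any META tags. The `HeadInfo` class and its methods are made-up names for illustration, and keep in mind that the Swing parser only understands HTML 3.2, so it may choke on modern markup.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.Locale;
import java.util.Map;
import javax.swing.text.MutableAttributeSet;
import javax.swing.text.html.HTML;
import javax.swing.text.html.HTMLEditorKit;
import javax.swing.text.html.parser.ParserDelegator;

// Hypothetical helper: pulls the <title> text and <meta name=... content=...> pairs
// out of an HTML document using the JDK's built-in (HTML 3.2 era) Swing parser.
public class HeadInfo {
    private final StringBuilder title = new StringBuilder();
    private final Map<String, String> meta = new LinkedHashMap<>();

    public String getTitle() {
        return title.toString().trim();
    }

    // Keys are lower-cased META "name" attributes, e.g. "keywords".
    public Map<String, String> getMeta() {
        return Collections.unmodifiableMap(meta);
    }

    public static HeadInfo parse(Reader in) throws IOException {
        HeadInfo info = new HeadInfo();
        HTMLEditorKit.ParserCallback callback = new HTMLEditorKit.ParserCallback() {
            private boolean inTitle = false;

            @Override
            public void handleStartTag(HTML.Tag tag, MutableAttributeSet attrs, int pos) {
                if (tag == HTML.Tag.TITLE) {
                    inTitle = true;
                }
            }

            @Override
            public void handleEndTag(HTML.Tag tag, int pos) {
                if (tag == HTML.Tag.TITLE) {
                    inTitle = false;
                }
            }

            @Override
            public void handleText(char[] data, int pos) {
                // Only keep text that appears between <title> and </title>.
                if (inTitle) {
                    info.title.append(data);
                }
            }

            @Override
            public void handleSimpleTag(HTML.Tag tag, MutableAttributeSet attrs, int pos) {
                // META is an empty element, so it arrives here rather than in handleStartTag.
                if (tag == HTML.Tag.META) {
                    Object name = attrs.getAttribute(HTML.Attribute.NAME);
                    Object content = attrs.getAttribute(HTML.Attribute.CONTENT);
                    if (name != null && content != null) {
                        info.meta.put(name.toString().toLowerCase(Locale.ROOT), content.toString());
                    }
                }
            }
        };
        // true = ignore any charset declared in a META tag instead of throwing
        // ChangedCharSetException; the Reader has already decided the encoding.
        new ParserDelegator().parse(in, callback, true);
        return info;
    }

    public static void main(String[] args) throws IOException {
        String html = "<html><head><title>Google</title>"
                + "<meta name=\"keywords\" content=\"search, web\"></head>"
                + "<body>Hello</body></html>";
        HeadInfo info = HeadInfo.parse(new StringReader(html));
        System.out.println("title: " + info.getTitle());
        System.out.println("meta:  " + info.getMeta());
    }
}
```

To try it on a live page you could wrap the connection's stream in a Reader, e.g. `new InputStreamReader(new URL(...).openStream(), StandardCharsets.UTF_8)`, but as noted above, don't be surprised if you get back an empty map and a one-word title.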
But ultimately it's probably going to be fruitless, as the information you're looking for is usually just not there.