Hello, this might sound like a strange question, but I have been asked to assess the impact of removing the Java client from workstations, perhaps with a view to replacing the Oracle client with a free version. The problem is that neither we nor the customer really know which of the sites users access rely on Java. We can tell what sites they go to by tracking the URLs through the proxy, but telling which ones use Java is impossible from the logs.
There are two ways for a website to "use Java", and the one you are actually asking about is the one most people would be less likely to assume.
The way most of us would think of it is that the site itself was written in Java, and that's not what your question implies. As for that, however, sometimes there are tell-tale indicators (URLs that end with something like ".jsf" or ".do"), and sometimes there aren't. And it would be hard to avoid. My understanding is that Amazon.com (for example) was originally written in Java and ran on WebLogic server, but I don't know what it's written in these days (and to be fair, any large site is likely to have multiple apps written in different languages).
What you actually seem to be asking is how to detect which sites are downloading Java code that runs on the client's computer, and that's easier. There are basically two ways to do that: applets and JNLP. Applets run in the browser, and JNLP is a mechanism that downloads actual Java applications, caches them, and runs them on the client.
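Since the proxy already records URLs, one rough first pass is to look for requests whose path ends in a Java-related file type. The following is only a sketch under assumptions: it takes a plain list of URLs already pulled out of the log (the log format itself is not known here), the suffix list is my own guess at what is worth flagging, and the function name `java_hosts` is made up for illustration. HTTPS requests tunnelled through the proxy may only show the host name, so they would be missed.

```python
from collections import Counter
from urllib.parse import urlsplit

# Assumed suffix list: JNLP descriptors, plus the JAR and class files
# that an applet or Web Start application would download.
JAVA_SUFFIXES = (".jnlp", ".jar", ".class")

def java_hosts(urls):
    """Count, per host, the URLs whose path ends in a Java-related suffix."""
    hits = Counter()
    for url in urls:
        parts = urlsplit(url.strip())
        # Compare the path only, so query strings do not hide the suffix.
        if parts.hostname and parts.path.lower().endswith(JAVA_SUFFIXES):
            hits[parts.hostname] += 1
    return hits

if __name__ == "__main__":
    # Hypothetical sample URLs standing in for entries from a proxy log.
    sample = [
        "http://a.example/app/launch.jnlp?user=1",
        "http://a.example/lib/Foo.JAR",
        "http://b.example/index.html",
    ]
    for host, count in java_hosts(sample).most_common():
        print(host, count)
```

A host that shows up here is only a candidate; a JAR download can also be a developer fetching a library rather than a browser running client-side Java.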
Applets are the ones that most people worry about, since they are where the most insidious security risk lies. Fortunately, thanks to the way that the Internet evolved (and, ironically, to some pettiness on Microsoft's part), applets for open Internet applications are very rare. You can detect them by examining web pages for the tell-tale applet HTML tags.
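As a rough illustration of checking a page for those tags, here is a sketch using Python's standard `html.parser`. The assumptions are mine: it treats an `<applet>` tag, or an `<object>`/`<embed>` whose `type` starts with `application/x-java-applet`, as an applet, and it also checks the `classid` that, as far as I know, Internet Explorer used for the Java Plug-in. The names `AppletFinder` and `find_applets` are made up for this example. Applets injected by JavaScript after the page loads would not be seen.

```python
from html.parser import HTMLParser

# classid believed to identify the Java Plug-in in <object> tags (assumption).
JAVA_CLASSID = "clsid:8ad9c840-044e-11d1-b3e9-00805f499d93"
JAVA_MIME_PREFIX = "application/x-java-applet"

class AppletFinder(HTMLParser):
    """Collects the names of tags that appear to embed a Java applet."""

    def __init__(self):
        super().__init__()
        self.hits = []

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag and attribute names; values are
        # lowercased here so the comparisons are case-insensitive.
        a = {name: (value or "").lower() for name, value in attrs}
        if tag == "applet":
            self.hits.append(tag)
        elif tag in ("object", "embed"):
            if (a.get("type", "").startswith(JAVA_MIME_PREFIX)
                    or a.get("classid", "") == JAVA_CLASSID):
                self.hits.append(tag)

def find_applets(html):
    """Return the list of applet-like tag names found in an HTML string."""
    finder = AppletFinder()
    finder.feed(html)
    finder.close()
    return finder.hits

if __name__ == "__main__":
    page = '<html><applet code="Clock.class" width="100"></applet></html>'
    print(find_applets(page))
```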
JNLP is technically more dangerous. Applets were supposed to be safe because they ran in a sandbox that limited their access to other webapp servers, the user's local filesystem and devices (including printers), and so forth. It turns out that that sandbox was very leaky, however. JNLP apps, on the other hand, run in the standard JRE sandbox, which by default allows almost anything.
However, it is far more obvious to the user when JNLP is in operation, so that even the dumbest drooling click-without-thinking user is probably going to have to stop and scratch before proceeding.
I should note, by the way, that there are a number of very useful and respectable applications that can be launched via JNLP: apps like the ArgoUML UML editing program, the ProjectX video editor, and, I think, the jGantt Gantt-charting application. You can usually download these manually as well (as executable JARs).
Actually, one of the best ways to keep Java from running amok on your client machines is simply to scour them for installed JVMs and delete them all. Unless you know of some really compelling reason, it's probably going to be easier to deal with the screams from outraged users when sites that they probably shouldn't be visiting anyway quit working than to do on-the-fly inspection of HTML passing through your network. OK, if you have Java developers, that's not going to work for them, but THEY are supposed to know how to practice Safe Java.
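For the "scour them for installed JVMs" step, a minimal sketch might walk a directory tree looking for files named like the Java launcher. This is an assumption-laden illustration, not an inventory tool: the set of file names and the function name `find_jvms` are my own, it only matches on names, and it would miss JVMs bundled under other names or installed outside the tree you scan.

```python
import os

# Launcher file names to look for (assumed list; compared case-insensitively).
JAVA_BINARIES = {"java", "java.exe", "javaw.exe"}

def find_jvms(root):
    """Yield paths under root to files named like a Java launcher."""
    # os.walk silently skips directories it cannot read.
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            if name.lower() in JAVA_BINARIES:
                yield os.path.join(dirpath, name)

if __name__ == "__main__":
    # Scans the current directory as a harmless default; on a workstation
    # you would point this at the program-files or /usr tree instead.
    for path in find_jvms("."):
        print(path)
```

Each hit still needs a human look before anything is deleted, since some applications ship their own private JRE and stop working without it.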
I've always been a proponent of the 90/7/3 approach to standards: 90% of the users don't need certain features, 7% have legitimate needs for them, and 3% are going to be completely weird and need special treatment. It's a far more reasonable approach than a Procrustean one-size-fits-all policy (which generally gets undermined anyway).