There are loads of topics around the net describing how to check whether a URL exists in Java. They seem to work fine when the URLs are rather simple, but fail on messier URLs.
For example, if I pass "http://google.com", the method returns true. That's good. If I pass "http://google.com/i_dont_exsits", the method returns false (404). Also good. Now if I pass a "messy" URL, like this:
The method returns false (503). That's not right. If I enter that URL in a browser, I see that the URL is perfectly valid and working.
Why does my method return false when it should return true?
Interestingly, Amazon will successfully reply to HEAD requests when the User-Agent is set to one for this old browser
Ron McLeod wrote: I think your best chance of not having the server reject your request would be to:
specify a User-Agent for a real browser (for me, running your code as-is, the User-Agent was set as Java/1.8.0_45)
use the GET method rather than HEAD
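For anyone landing here later, the two suggestions above could be sketched roughly like this with plain HttpURLConnection. This is only a minimal sketch, not the original poster's method: the class name UrlChecker, the method names, the timeout values, the User-Agent string, and the choice to treat any 2xx or 3xx status as "exists" are all assumptions made for illustration.

```java
import java.io.IOException;
import java.net.HttpURLConnection;
import java.net.URI;

public class UrlChecker {

    // Assumed value for illustration; any string that a real browser sends would do.
    public static final String BROWSER_USER_AGENT =
            "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 "
            + "(KHTML, like Gecko) Chrome/120.0.0.0 Safari/537.36";

    /** Sends a GET with the given User-Agent and returns the HTTP status code. */
    public static int statusOf(String address, String userAgent) throws IOException {
        HttpURLConnection connection =
                (HttpURLConnection) URI.create(address).toURL().openConnection();
        try {
            connection.setRequestMethod("GET"); // GET rather than HEAD
            // Replaces the default "Java/<version>" User-Agent.
            connection.setRequestProperty("User-Agent", userAgent);
            connection.setConnectTimeout(5000); // assumed timeouts, in milliseconds
            connection.setReadTimeout(5000);
            return connection.getResponseCode();
        } finally {
            connection.disconnect();
        }
    }

    /** Treats any 2xx or 3xx status as "the URL exists" (an assumption). */
    public static boolean exists(String address) {
        try {
            int status = statusOf(address, BROWSER_USER_AGENT);
            return status >= 200 && status < 400;
        } catch (IOException | IllegalArgumentException e) {
            // Unreachable host, timeout, or a string that is not a valid absolute URL.
            return false;
        }
    }
}
```

Whether a given server accepts this particular User-Agent is up to that server, so the string may need adjusting.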
That seems to solve the problem (as does doing a GET with Jersey Client without setting a user agent). However, I think doing a GET will fetch the entire page. I assume that would be a problem in an environment where performance is critical.
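On the performance worry: as far as I know, HttpURLConnection.getResponseCode() only needs the status line and headers, and the body is not consumed unless you read the input stream. One way to keep a GET cheap might be to never read the stream and, in addition, ask for just the first byte with a Range header. The sketch below assumes exactly that; the class name LightweightUrlCheck, the method names, and the timeouts are made up for illustration, and a server is free to ignore the Range header and answer with a normal 200.

```java
import java.io.IOException;
import java.net.HttpURLConnection;
import java.net.URI;

public class LightweightUrlCheck {

    /**
     * Sends a GET that asks for only the first byte of the body and returns
     * the status code. The response stream is never read by this code.
     */
    public static int probe(String address) throws IOException {
        HttpURLConnection connection =
                (HttpURLConnection) URI.create(address).toURL().openConnection();
        try {
            connection.setRequestMethod("GET");
            // Servers that support range requests answer 206 with one byte;
            // servers that do not may ignore the header and answer 200.
            connection.setRequestProperty("Range", "bytes=0-0");
            connection.setConnectTimeout(5000); // assumed timeouts, in milliseconds
            connection.setReadTimeout(5000);
            return connection.getResponseCode();
        } finally {
            // Closes the connection without reading the body.
            connection.disconnect();
        }
    }

    /** Treats any 2xx or 3xx status, including 206, as "the URL exists" (an assumption). */
    public static boolean exists(String address) {
        try {
            int status = probe(address);
            return status >= 200 && status < 400;
        } catch (IOException | IllegalArgumentException e) {
            return false;
        }
    }
}
```

Disconnecting after each check gives up connection reuse, so if many URLs on the same host are checked in a row it may be worth measuring both variants.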
Jeanne Boyarsky wrote: My guess is that Amazon is trying to keep people from scraping the site and is looking for a user agent header or the like.
This Selenium code correctly returns response code 200 for your URL.