
Checking if a URL exists

 
R Sriram
Greenhorn
Posts: 18
Hello all,
I have a set of URLs, and I need to check whether each one exists before I go on to do further processing, so I have been trying to use URLConnection. For HTTP URLs this is easy, because the response status code tells me what I need to know. For FTP URLs, however, I cannot reliably determine whether the URL is available. I see that there is a sun.net.www.protocol.ftp package, but I do not see any method in it that returns a status. I also tried checking the header map returned by the URLConnection, but it does not seem to contain anything for FTP sites. Any help would be greatly appreciated. Thanks in advance.
 
Ulf Dittmer
Rancher
Posts: 42970
URLConnection works with FTP, too (at least on some JREs). It doesn't support the full FTP functionality, but it should be sufficient for checking availability. See here for some discussion.
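As a rough illustration of the idea above, the sketch below opens a connection through URLConnection and tries to read a single byte from it. The names `StreamProbe` and `canOpen` are made up for this example, and whether `ftp:` URLs are handled at all depends on the JRE, so treat it as a starting point rather than a complete FTP check.

```java
import java.io.IOException;
import java.io.InputStream;
import java.net.URL;
import java.net.URLConnection;

// Hypothetical helper, not part of any library.
final class StreamProbe {

    /**
     * Returns true if a connection to the URL could be opened and its
     * stream read. A false result only means that this attempt failed;
     * it does not prove that the resource is gone.
     */
    static boolean canOpen(URL url, int timeoutMillis) {
        try {
            URLConnection conn = url.openConnection();
            conn.setConnectTimeout(timeoutMillis);
            conn.setReadTimeout(timeoutMillis);
            try (InputStream in = conn.getInputStream()) {
                in.read(); // one byte, or end-of-stream, is enough
                return true;
            }
        } catch (IOException e) {
            // Unknown host, refused connection, missing file, timeout, ...
            return false;
        }
    }
}
```

It could be called as `StreamProbe.canOpen(URI.create("ftp://ftp.example.com/pub/README").toURL(), 5000)`, where the host and path are placeholders.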
 
R Sriram
Greenhorn
Posts: 18
Hello Ulf,
Thank you for your response. I did look at that code before. I was just wondering whether it is a comprehensive enough procedure to confirm that an FTP site exists.
 
Ulf Dittmer
Rancher
Posts: 42970
That should be easy to test, shouldn't it?
 
Paul Clapham
Sheriff
Posts: 21892
You should be aware that if you are trying to test whether a URL "exists" by connecting to it, you can get false negatives. If the URL's host server happens to be down when you try to connect, you will think the URL doesn't "exist". But it actually does; you just can't reach it at the moment.

So don't make any permanent decisions on a single test like that.
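One way to act on this advice is to retry a failed probe a few times and, if it still fails, record the result as inconclusive instead of "does not exist". The sketch below is only an illustration: `RetryingCheck`, `Verdict`, and the attempt and pause parameters are invented for the example, and the probe itself is passed in so that any kind of check can be plugged in.

```java
import java.util.function.BooleanSupplier;

// Hypothetical helper, not part of any library.
final class RetryingCheck {

    // There is deliberately no "does not exist" value: repeated failures
    // only tell us that we could not reach the URL.
    enum Verdict { REACHABLE, INCONCLUSIVE }

    static Verdict check(BooleanSupplier probe, int maxAttempts, long pauseMillis) {
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            if (probe.getAsBoolean()) {
                return Verdict.REACHABLE;
            }
            if (attempt < maxAttempts) {
                try {
                    Thread.sleep(pauseMillis); // wait before the next attempt
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt(); // keep the interrupt flag
                    return Verdict.INCONCLUSIVE;
                }
            }
        }
        return Verdict.INCONCLUSIVE;
    }
}
```

In practice the pause would more likely be minutes or hours than milliseconds, with inconclusive URLs queued for a later run.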
 
R Sriram
Greenhorn
Posts: 18
Well, Paul, you make a good point. There are even legitimate sites that respond only to IP addresses from specific geographic locations, and things like that. So I was wondering what would conclusively establish that a URL exists. For now I have stuck to the idea of checking the status code for HTTP(S) and reading the stream for FTP. Any URL that I cannot reach a conclusion about will be retried at some later point. I also wonder what Google uses for its crawling.
And Ulf, yes, I was able to test that as well. There were some false negatives.
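For the HTTP(S) half of this approach, a minimal sketch might look like the following. The class `StatusClassifier`, its `Verdict` values, and in particular the mapping from status codes to verdicts are assumptions made for illustration (for example, treating redirects as "exists" and 403 as "retry later" because of the geo-blocking case mentioned above); only the `HttpURLConnection` calls are standard JDK API.

```java
import java.io.IOException;
import java.net.HttpURLConnection;
import java.net.URL;

// Hypothetical helper, not part of any library.
final class StatusClassifier {

    enum Verdict { EXISTS, MISSING, RETRY_LATER }

    /** Maps an HTTP status code to a verdict; the mapping is a judgment call. */
    static Verdict classify(int status) {
        if (status >= 200 && status < 400) {
            return Verdict.EXISTS; // success, or a redirect to somewhere else
        }
        if (status == HttpURLConnection.HTTP_NOT_FOUND
                || status == HttpURLConnection.HTTP_GONE) {
            return Verdict.MISSING; // 404 or 410
        }
        // 401/403 (possibly geo-blocked), 5xx, or -1 for an invalid response
        return Verdict.RETRY_LATER;
    }

    /** Sends a HEAD request and returns the status code; http/https URLs only. */
    static int fetchStatus(URL url, int timeoutMillis) throws IOException {
        HttpURLConnection conn = (HttpURLConnection) url.openConnection();
        conn.setRequestMethod("HEAD");
        conn.setConnectTimeout(timeoutMillis);
        conn.setReadTimeout(timeoutMillis);
        try {
            return conn.getResponseCode();
        } finally {
            conn.disconnect();
        }
    }
}
```

Some servers reject HEAD requests, so falling back to GET for those may be necessary.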
 