This code looks to be a test web crawler, written for a particular location and for particular sites. It doesn't look like it was meant to be a general-purpose crawler.
First of all, this code sets an HTTP proxy, which implies that it is supposed to run from a location that is behind a particular firewall.
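To show what I mean by a hard-coded proxy, here is a rough sketch. I'm using Python only because it is compact, not because your crawler is written in it. The proxy address and the make_opener name are made up for illustration, and this is not the code from your crawler.

```python
import urllib.request

# Hypothetical proxy address; the real crawler hard-codes its own.
PROXY = "http://proxy.example.com:8080"

def make_opener(proxy_url=None):
    """Build a URL opener that goes through proxy_url, or directly if None."""
    # An empty dict tells ProxyHandler to use no proxy at all,
    # overriding any proxy settings picked up from the environment.
    proxies = {"http": proxy_url} if proxy_url else {}
    handler = urllib.request.ProxyHandler(proxies)
    return urllib.request.build_opener(handler)

behind_firewall = make_opener(PROXY)   # what the crawler effectively does
direct = make_opener()                 # what you probably want elsewhere
```

If you run from somewhere that cannot reach the hard-coded proxy, every request will probably fail, so removing or changing the proxy setting is the first thing I would try.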
Second of all, this code checks for a robots.txt file to confirm whether it is allowed to crawl the site, and how to crawl it (I guess this was a growing standard years ago). Regardless, many sites, like Yahoo, don't have this file, so the crawl will abort. You should check whether this is the cause in your case.
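Here is a small sketch of a robots.txt check that does not abort when the file is missing, again in Python just for brevity. The is_allowed helper and the sample rules are my own invention. Treating a missing file as "allowed" is a design choice I am assuming here, not what your crawler does.

```python
from urllib.robotparser import RobotFileParser

def is_allowed(robots_txt, url, agent="*"):
    """Return True if url may be crawled.

    robots_txt is the body of the site's robots.txt,
    or None if the site does not have one.
    """
    if robots_txt is None:
        # Assumption: no robots.txt means no restrictions.
        return True
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(agent, url)

# Made-up rules for illustration.
sample = "User-agent: *\nDisallow: /private/\n"

print(is_allowed(sample, "http://example.com/index.html"))      # True
print(is_allowed(sample, "http://example.com/private/a.html"))  # False
print(is_allowed(None, "http://example.com/private/a.html"))    # True
```

The third call is the case I suspect you are hitting: the crawler finds no file and gives up instead of carrying on.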
My guess is that this code was probably used for training, meant to be run only from the training site and only against specific test sites.
Henry