
Web Crawler Exercise

 
Ranch Hand
Posts: 224
I just completed the Web Crawler exercise (at "127.0.0.1:3999/concurrency/10"), and with it the whole Go Tour, but I'm left wondering about something. The exercise was to write a web crawler that visits every URL on a page, then every URL on each of those pages, and so on recursively. (Not forever; there is a depth limit built in.) My code that accomplished it was:

Note, though, that to get it to work I had to put a call to "time.Sleep(time.Second)" in my main function. Without that line, main would return, terminating the program, before many of the "Crawl()" calls had run. Is there some way in Go to make main wait and stay alive until all currently running goroutines (lightweight threads) have finished?

One way I thought of implementing that would be to add an integer "Count" field to my "SafeMap" struct, increment it before each call to "go Crawl(sm, u, depth-1, fetcher)", decrement it only at the end of the "Crawl()" function, and then have my main function loop on that "Count" variable until it reached zero again. That seems kind of drastic, though. Does anybody have a better idea?
 
Sheriff
Posts: 14759
Kevin, the code tag doesn't currently recognize "go" as a language that it can prettify, so don't set that attribute for now. I'll see what I can do to add "go" as a language that the code tags recognize.
 
Junilu Lacar
Sheriff
Posts: 14759
I don't think you should be changing the function signature(s) to include your SafeMap as a parameter. Since goroutines run in the same address space, the SafeMap would be shared by functions in your program. It's the methods in your SafeMap that would use mutex.Lock() and mutex.Unlock() to serialize access to the encapsulated Map. Your implementation "reaches into" the object and manipulates the mutex. That breaks encapsulation.

That is, your implementation should have something like this:
 
Junilu Lacar
Sheriff
Posts: 14759
Also, the map they refer to is supposed to act as a cache, so you can avoid going to the Fetcher more than once for each URL. A cache usually holds the same kind of thing you get from the original source. Your SafeMap holds a map[string]int, which is not the same thing you get from the original source. I would look at the Fetch function's return values to see what kind of map the SafeMap should hold. As an object, I think the SafeMap should have Fetch() and Put() methods.
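One way to read that suggestion, as a sketch: the cache stores exactly what `Fetch` returns (body, URLs, error), and `SafeMap` exposes `Fetch` and `Put` methods that lock internally. `fetchResult`, `countingFetcher`, and `cachedFetch` are hypothetical helpers added for illustration:

```go
package main

import (
	"errors"
	"fmt"
	"sync"
)

// Fetcher has the same shape as the interface in the Go Tour exercise.
type Fetcher interface {
	Fetch(url string) (body string, urls []string, err error)
}

// fetchResult holds exactly what Fetcher.Fetch returns for one URL.
type fetchResult struct {
	body string
	urls []string
	err  error
}

// SafeMap caches fetch results by URL; all locking happens inside its methods.
type SafeMap struct {
	mu sync.Mutex
	m  map[string]fetchResult
}

func NewSafeMap() *SafeMap { return &SafeMap{m: make(map[string]fetchResult)} }

// Fetch returns the cached result for url and whether one was found.
func (s *SafeMap) Fetch(url string) (fetchResult, bool) {
	s.mu.Lock()
	defer s.mu.Unlock()
	r, ok := s.m[url]
	return r, ok
}

// Put stores the result for url.
func (s *SafeMap) Put(url string, r fetchResult) {
	s.mu.Lock()
	defer s.mu.Unlock()
	s.m[url] = r
}

// countingFetcher is a hypothetical Fetcher that records how often it is called.
type countingFetcher struct {
	mu    sync.Mutex
	calls int
	pages map[string][]string
}

func (f *countingFetcher) Fetch(url string) (string, []string, error) {
	f.mu.Lock()
	f.calls++
	f.mu.Unlock()
	if urls, ok := f.pages[url]; ok {
		return "body of " + url, urls, nil
	}
	return "", nil, errors.New("not found: " + url)
}

// cachedFetch consults the cache first and only calls the Fetcher on a miss.
// Errors are cached too, so a missing page is not requested again.
func cachedFetch(c *SafeMap, f Fetcher, url string) fetchResult {
	if r, ok := c.Fetch(url); ok {
		return r
	}
	body, urls, err := f.Fetch(url)
	r := fetchResult{body, urls, err}
	c.Put(url, r)
	return r
}

func main() {
	f := &countingFetcher{pages: map[string][]string{"/": {"/a"}}}
	c := NewSafeMap()
	for i := 0; i < 3; i++ {
		r := cachedFetch(c, f, "/")
		fmt.Println(r.body, r.urls)
	}
	fmt.Println("underlying fetches:", f.calls) // 1
}
```

The lookup and the `Put` in `cachedFetch` are separate locked sections, so two goroutines that miss at the same moment could both call the underlying Fetcher; if strict once-per-URL fetching matters, the URL would need to be reserved inside a single locked section.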
 
Junilu Lacar
Sheriff
Posts: 14759
You might also want to look at the example under the "Parallel digestion" section on this page: https://blog.golang.org/pipelines
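One pattern used in that section is to have each worker goroutine send its result on a channel and to close that channel from a separate goroutine once a `sync.WaitGroup` reports that every worker has finished, so the consumer can simply `range` over the results. A sketch of that idea applied to fetching, leaving out the article's cancellation (`done`) channel for brevity; `page`, `linksByURL`, `fakeFetch`, and `fetchAll` are hypothetical stand-ins:

```go
package main

import (
	"fmt"
	"sort"
	"sync"
)

// page is one fetched result sent back to the consumer.
type page struct {
	url  string
	urls []string
}

// linksByURL is hypothetical sample data; it is only read, never written,
// so concurrent access from the workers is safe.
var linksByURL = map[string][]string{
	"/":  {"/a", "/b"},
	"/a": {"/"},
	"/b": {"/a", "/c"},
}

// fakeFetch stands in for Fetcher.Fetch in this sketch.
func fakeFetch(url string) []string { return linksByURL[url] }

// fetchAll fetches every URL in its own goroutine and returns a channel
// that is closed once all of those goroutines have sent their result.
func fetchAll(urls []string) <-chan page {
	c := make(chan page)
	var wg sync.WaitGroup
	for _, u := range urls {
		wg.Add(1)
		go func(u string) {
			defer wg.Done()
			c <- page{url: u, urls: fakeFetch(u)}
		}(u)
	}
	// Close c only after every sender is done, so the caller's range loop
	// ends instead of blocking forever.
	go func() {
		wg.Wait()
		close(c)
	}()
	return c
}

func main() {
	var lines []string
	for p := range fetchAll([]string{"/", "/a", "/b"}) {
		lines = append(lines, fmt.Sprintf("%s -> %v", p.url, p.urls))
	}
	sort.Strings(lines) // results arrive in nondeterministic order
	for _, l := range lines {
		fmt.Println(l)
	}
}
```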
 
Junilu Lacar
Sheriff
Posts: 14759
Instead of sleeping, you can wait on a channel that gets closed when no more new URLs are found, i.e., all URLs are retrieved from the cache.
 
Rancher
Posts: 4686
Sorry to jump in with a negative comment, but the code posted looks like Java code that was accidentally written in Go. It's a long way from idiomatic Go.

For example, idiomatic Go code doesn't use sleep to wait for goroutines; it uses channels. And the mutex usage looks straight out of Henry Wong's Java Threads book.

Don't feel bad; most folks write in their old language when learning a new one. But to see Go's strengths, you have to write idiomatic Go.
 