
Help with Web Crawler

 
Angus Ferguson
Ranch Hand
Posts: 1402
Hi

Can anybody tell me what this crawler does, please?

I want to write a crawler that goes through the web, collects information from sites, and detects whether each website includes a specific script.

I have copied this code from the web:



I created a Java project locally, and I get these results:



I ran it from Eclipse. Is that data from a specific website? Where has that data been retrieved from?

Any suggestions?

Regards,
Isaac


 
Joe Ess
Bartender
Posts: 9615
Have you tried looking at the documentation? There's a step-by-step guide for getting started, FAQ and more.
 
Ulf Dittmer
Rancher
Posts: 43016
I see no indication that anything has been retrieved. That looks like configuration info.

Why are you silently swallowing the exception instead of handling it properly? That's never a good idea. At least print the message to where you will see it.
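To illustrate the point about swallowed exceptions, here is a minimal sketch. It is not the code from the original post; `riskySetup` and its message are hypothetical stand-ins for any step that can fail. It contrasts an empty catch block with one that reports the failure where you will see it.

```java
public class ExceptionHandlingSketch {

    // Hypothetical stand-in for any setup step that can fail.
    static void riskySetup() throws Exception {
        throw new IllegalStateException("storage folder is locked");
    }

    // Anti-pattern: the catch block reports nothing, so the cause is lost
    // and the program simply appears to stop.
    static boolean runSwallowing() {
        try {
            riskySetup();
            return true;
        } catch (Exception e) {
            return false;
        }
    }

    // Better: print the exception to stderr and hand its description back.
    static String runReporting() {
        try {
            riskySetup();
            return "ok";
        } catch (Exception e) {
            System.err.println("Setup failed: " + e);
            return e.toString();
        }
    }

    public static void main(String[] args) {
        System.out.println("swallowing succeeded? " + runSwallowing());
        System.out.println("reporting returned: " + runReporting());
    }
}
```

With the second form the console at least names the exception class and its message, which is usually enough to start searching for the cause.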
 
Angus Ferguson
Ranch Hand
Posts: 1402
I'm going to explain the issue in more detail.

I have followed the tutorial step by step:



I run it and get config info, and then a warning that stops the execution:



I have followed the steps in the tutorial to fix it, but nothing changed. Then I downloaded log4j.properties and, in the project classpath settings in Eclipse, added its folder using the User Entries option.

I still get that WARN message and it gets stuck there.

Any ideas?

Regards

 
Ulf Dittmer
Rancher
Posts: 43016
That looks like the same code and output you posted before. If it's somehow different, tell us how; otherwise it doesn't help. The warning is just about the logging setup (or lack thereof); it is irrelevant to the problem.

I'd guess the execution stops because the code is done. I don't see anything that would instruct it to do any actual crawling. As Joe pointed out, the documentation has step-by-step instructions; your code lacks at least one of the steps (the one that starts the actual crawling).
 
Joe Ess
Bartender
Posts: 9615

Isaac Ferguson wrote:
I have followed the tutorial step by step:



It appears you have missed the line that says:

Every webcrawler has two main pieces: the “crawler” and the “crawl controller”.



Do you have a "crawler" class? I recommend you read the entire tutorial page, as well as the source code it references, as they explain the basics of how to use this API.
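To make the two-piece idea concrete, here is a rough plain-Java sketch of the division of labor. It is deliberately not the library's API: `Crawler`, `Controller`, `addSeed`, `start`, and the tiny in-memory "web" are all hypothetical names chosen for illustration, and nothing touches the network. The point is that configuring a controller does nothing by itself; pages are only visited once the controller is started with a crawler.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

public class TwoPieceCrawlerSketch {

    // The "crawler": decides which URLs to follow and what to do with each page.
    interface Crawler {
        boolean shouldVisit(String url);
        void visit(String url);
    }

    // The "crawl controller": owns the seeds and the frontier and drives the crawler.
    static class Controller {
        private final Map<String, List<String>> web; // hypothetical fake web: page -> outgoing links
        private final Deque<String> frontier = new ArrayDeque<>();

        Controller(Map<String, List<String>> web) {
            this.web = web;
        }

        void addSeed(String url) {
            frontier.add(url);
        }

        // Nothing is visited until this method is called.
        void start(Crawler crawler) {
            Set<String> seen = new HashSet<>(frontier);
            while (!frontier.isEmpty()) {
                String url = frontier.poll();
                crawler.visit(url);
                for (String next : web.getOrDefault(url, List.of())) {
                    // Queue a link only if the crawler accepts it and it is new.
                    if (crawler.shouldVisit(next) && seen.add(next)) {
                        frontier.add(next);
                    }
                }
            }
        }
    }

    // Crawls a two-page fake site and returns the visited URLs in order.
    static List<String> crawlDemo() {
        Map<String, List<String>> web = Map.of(
                "http://example.com/", List.of("http://example.com/a", "http://other.org/"),
                "http://example.com/a", List.of("http://example.com/"));
        List<String> visited = new ArrayList<>();
        Controller controller = new Controller(web);
        controller.addSeed("http://example.com/");
        controller.start(new Crawler() {
            public boolean shouldVisit(String url) {
                return url.startsWith("http://example.com/");
            }

            public void visit(String url) {
                visited.add(url);
            }
        });
        return visited;
    }

    public static void main(String[] args) {
        System.out.println(crawlDemo());
    }
}
```

If the `controller.start(...)` call is left out, `crawlDemo()` returns an empty list, which matches the symptom of a program that prints its configuration and then ends.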
 
Angus Ferguson
Ranch Hand
Posts: 1402
OK, I'm there.

After debugging, it throws an exception at this line:



The exception looks like this:



Only shows this:



These three parameters contain values (crawlConfig, pageFetcher, robotstxtServer).

Yes, I am also creating the crawler class so I can call it later.

Regards
 
Joanne Neal
Rancher
Posts: 3742

Isaac Ferguson wrote: The exception looks like this: ... Only shows this:


Use e.printStackTrace() to see the contents of the stack trace.
 
Angus Ferguson
Ranch Hand
Posts: 1402
I'm using



And the result is



Regards
 
Ulf Dittmer
Rancher
Posts: 43016

Joanne Neal wrote: Use e.printStackTrace() to see the contents of the stack trace.

 
Joanne Neal
Rancher
Posts: 3742

Isaac Ferguson wrote: I'm using


I know. That just calls toString() on the object returned by e.getStackTrace(). That object is an array of StackTraceElement instances, and because arrays in Java do not override toString(), the call falls back to the toString() method of the Object class, which prints the type name and a hash code rather than the stack frames.
As I said, use e.printStackTrace() instead.
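A small sketch of the difference, assuming nothing beyond the standard library. The class name `StackTraceDemo`, the helper `fullTrace`, and the "demo failure" message are made up for the example. The helper captures what e.printStackTrace() writes so the two results can be compared side by side.

```java
import java.io.PrintWriter;
import java.io.StringWriter;

public class StackTraceDemo {

    // Captures the text that e.printStackTrace() would print, as a String.
    static String fullTrace(Throwable e) {
        StringWriter buffer = new StringWriter();
        e.printStackTrace(new PrintWriter(buffer, true));
        return buffer.toString();
    }

    public static void main(String[] args) {
        try {
            throw new IllegalStateException("demo failure");
        } catch (IllegalStateException e) {
            // Array toString(): a type tag plus a hash code, no frames at all.
            System.out.println(String.valueOf(e.getStackTrace()));
            // Full trace: exception class, message, and one "at ..." line per frame.
            System.out.print(fullTrace(e));
        }
    }
}
```

The first line printed starts with `[Ljava.lang.StackTraceElement;@` followed by a hash code; the second output starts with the exception class and message and then lists the frames.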
 
Angus Ferguson
Ranch Hand
Posts: 1402
Thanks, the trace looks like this now:



I went to the folder "crawler4jStorage" and gave it full permissions, but I still get the error.

Any ideas?
 
Joanne Neal
Rancher
Posts: 3742
Have you tried closing Eclipse and restarting it?
 
Angus Ferguson
Ranch Hand
Posts: 1402
Yes, I have tried, but it doesn't work.
 
Joe Ess
Bartender
Posts: 9615
Have you tried deleting the contents of the frontier folder, as the link posted by Joanne says to do?
Does the frontier folder exist before you execute the program?
 
Angus Ferguson
Ranch Hand
Posts: 1402
Yes, I have deleted the contents and also the folder. It doesn't exist before I run the program; when I run the program, it is created again.

 
Angus Ferguson
Ranch Hand
Posts: 1402
OK, the error is solved. I manually killed all the running instances of Tomcat.

And now it works, thanks.
 