
Help with Web Crawler

 
Isaac Ferguson
Ranch Hand
Hi,

Can anybody tell me what this crawler does, please?

I want to write a crawler that goes through the web, gets info from the sites it visits, and also detects whether each web site contains a specific script.
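The "detect a specific script" part can be sketched independently of any crawling library. This is only an illustration; `ScriptDetector` and its regex are my own invention, not part of any crawler API, and a real HTML parser would be more robust than a regex:

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class ScriptDetector {

    // Matches <script ... src="..."> tags and captures the src value.
    private static final Pattern SCRIPT_SRC =
            Pattern.compile("<script[^>]*\\bsrc\\s*=\\s*[\"']([^\"']+)[\"']",
                            Pattern.CASE_INSENSITIVE);

    /** Returns true if the HTML references a script whose src contains the given fragment. */
    public static boolean containsScript(String html, String srcFragment) {
        Matcher m = SCRIPT_SRC.matcher(html);
        while (m.find()) {
            if (m.group(1).contains(srcFragment)) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        String page = "<html><head>"
                + "<script src=\"https://example.com/js/analytics.js\"></script>"
                + "</head><body>Hello</body></html>";
        System.out.println(containsScript(page, "analytics.js")); // prints true
        System.out.println(containsScript(page, "tracker.js"));   // prints false
    }
}
```

A crawler would call a helper like this on the HTML of each page it visits.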

I have copied this code from the web:



I have created a Java project locally and I get these results:



I have run it from Eclipse. Is that data from any specific web site? Where has that data been retrieved from?

Any suggestion?

Regards,
Isaac


 
Joe Ess
Bartender
Have you tried looking at the documentation? There's a step-by-step guide for getting started, an FAQ, and more.
 
Ulf Dittmer
Rancher
I see no indication that anything has been retrieved. That looks like configuration info.

Why are you silently swallowing the exception instead of handling it properly? That's never a good idea. At least print the message to where you will see it.
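A minimal illustration of the point about not swallowing exceptions (the class and method names here are mine, just for demonstration):

```java
public class ExceptionHandlingDemo {

    static String parseNumber(String input) {
        try {
            return "parsed: " + Integer.parseInt(input);
        } catch (NumberFormatException e) {
            // Bad: catch (Exception e) { }  -- the failure vanishes silently.
            // At minimum, print the trace somewhere you will see it:
            e.printStackTrace();
            return "failed: " + e.getMessage();
        }
    }

    public static void main(String[] args) {
        System.out.println(parseNumber("42"));   // prints "parsed: 42"
        System.out.println(parseNumber("oops")); // prints the failure message
    }
}
```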
 
Isaac Ferguson
Ranch Hand
I'm going to explain the issue in more detail. It is as follows:

I have followed the tutorial step by step:



I run it and I get config info and then a warning which stops the execution:



I have followed the steps of the tutorial to fix it, but nothing changed... then I downloaded the log4j.properties file and added it to the project classpath in Eclipse using the User Entries option.

I still get that WARN message and it gets stuck there.

Any idea?

Regards

 
Ulf Dittmer
Rancher
That looks like the same code and output you posted before. If it's somehow different, tell us how so, otherwise it doesn't help. The warning is just about the logging setup (or lack thereof), it is irrelevant.

I'd guess the execution stops because the code is done. I don't see anything that would instruct it to do any actual crawling. As Joe pointed out, the documentation has step-by-step instructions; your code lacks at least one of the steps (the one that starts the actual crawling).
 
Joe Ess
Bartender
Isaac Ferguson wrote:
I have followed the tutorial step by step:


It appears you have missed the line that says:
Every webcrawler has two main pieces: the “crawler” and the “crawl controller”.


Do you have a "crawler" class? I recommend you read the entire tutorial page as well as the source code it references, as they explain the basics of how to use this API.
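For reference, the two pieces Joe mentions look roughly like this in the crawler4j quickstart. This is a sketch only: it needs the crawler4j jar on the classpath, exact method signatures vary between crawler4j versions, and the URLs and folder names are placeholders:

```java
import edu.uci.ics.crawler4j.crawler.CrawlConfig;
import edu.uci.ics.crawler4j.crawler.CrawlController;
import edu.uci.ics.crawler4j.crawler.Page;
import edu.uci.ics.crawler4j.crawler.WebCrawler;
import edu.uci.ics.crawler4j.fetcher.PageFetcher;
import edu.uci.ics.crawler4j.robotstxt.RobotstxtConfig;
import edu.uci.ics.crawler4j.robotstxt.RobotstxtServer;
import edu.uci.ics.crawler4j.url.WebURL;

// The "crawler": decides which pages to visit and what to do with them.
class MyCrawler extends WebCrawler {
    @Override
    public boolean shouldVisit(Page referringPage, WebURL url) {
        return url.getURL().startsWith("https://example.com/"); // stay on one site
    }

    @Override
    public void visit(Page page) {
        System.out.println("Visited: " + page.getWebURL().getURL());
    }
}

// The "crawl controller": configures the crawl and, crucially, starts it.
public class Controller {
    public static void main(String[] args) throws Exception {
        CrawlConfig config = new CrawlConfig();
        config.setCrawlStorageFolder("/tmp/crawler4jStorage");

        PageFetcher pageFetcher = new PageFetcher(config);
        RobotstxtServer robotstxtServer =
                new RobotstxtServer(new RobotstxtConfig(), pageFetcher);
        CrawlController controller =
                new CrawlController(config, pageFetcher, robotstxtServer);

        controller.addSeed("https://example.com/");
        // Without this call the program just prints its config and exits:
        controller.start(MyCrawler.class, /* numberOfCrawlers = */ 2);
    }
}
```

If only the controller half is present, the symptom is exactly what was described above: some configuration output, then the program ends.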
 
Isaac Ferguson
Ranch Hand
OK, I'm there...

After debugging, it throws an exception at this line:



The exception is like this:



Only shows this:



These three parameters contain values (crawlConfig, pageFetcher, robotstxtServer).

Yes, I am also creating the Crawler to call it later.

Regards
 
Joanne Neal
Rancher
Isaac Ferguson wrote: The exception is like this: Only shows this:

Use e.printStackTrace() to see the contents of the stack trace.
 
Isaac Ferguson
Ranch Hand
I'm using:



And the result is:



Regards
 
Ulf Dittmer
Rancher
Joanne Neal wrote:Use e.printStackTrace() to see the contents of the stack trace.
 
Joanne Neal
Rancher
Isaac Ferguson wrote:I m using


I know. And that will just call the toString() method on the object returned by e.getStackTrace(). That object is an array of StackTraceElement instances, and as arrays in Java do not override the toString() method, it will use the toString() method of the Object class.
As I said, use e.printStackTrace() instead.
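The difference is easy to see in a standalone snippet (the class name is mine):

```java
public class StackTraceDemo {
    public static void main(String[] args) {
        try {
            throw new IllegalStateException("boom");
        } catch (IllegalStateException e) {
            // Arrays don't override toString(), so this prints something like
            // [Ljava.lang.StackTraceElement;@1b6d3586 -- useless for debugging.
            System.out.println(e.getStackTrace().toString());

            // This prints the exception class, the message, and every frame:
            e.printStackTrace();
        }
    }
}
```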
 
Isaac Ferguson
Ranch Hand
Thanks, the trace looks like this now:



I have gone to the "crawler4jStorage" folder and given it full permissions, but I still get it.

Any idea?
 
Joanne Neal
Rancher
Have you tried closing Eclipse and restarting it?
 
Isaac Ferguson
Ranch Hand
Yes, I have tried that but it doesn't work.
 
Joe Ess
Bartender
Have you tried deleting the contents of the frontier folder, as the link posted by Joanne says to do?
Does the frontier folder exist before you execute the program?
 
Isaac Ferguson
Ranch Hand
Yes, I have deleted the contents and also the folder. When I run the program it is created again; it doesn't exist before I run the program.

 
Isaac Ferguson
Ranch Hand
OK, the error is solved... yes, I manually killed all the instances of Tomcat.

And now it works, thanks.
 