
Using a crawler to invoke a Google search & analyse Google results

 
Daniel Arnold
Greenhorn
Posts: 9
Eclipse IDE Fedora Java
Hi,

I am really into Java and software agents and want to focus my Java coding on that. I want to write a crawler that accepts a search topic, invokes a Google search and analyzes the results. Starting from a Java crawler template I found online, I edited the code and set up my own custom link analysis algorithms. My problem is the part where the app's interface accepts user text, passes it to the Google engine and retrieves the Google results (I am designing it to be a stand-alone app or plugin).

Thanks
 
Ulf Dittmer
Rancher
Posts: 42972
73
What, specifically, are you having a problem with? What is or is not working as expected?
 
Daniel Arnold
Greenhorn
Posts: 9
Eclipse IDE Fedora Java
I am not sure how (from a stand-alone app) input text can be passed to the Google engine and the results retrieved (the crawler will then go through the retrieved links). I am trying to avoid using a browser.
 
Ulf Dittmer
Rancher
Posts: 42972
73
You could use the HttpClient library to pass the search query to Google and retrieve the result. You'll have to spend some time reverse-engineering the format of the search URL, though; it's not as simple as (e.g.) http://www.google.com/?q=jebediah+springfield.

You might also want to check if Google has a proper REST API for doing searches; for low search volumes it would probably be free to use.
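As a rough sketch of the first suggestion: the code below URL-encodes the user's text, builds a search URL and fetches the response body. It assumes Java 11 or later and uses the JDK's built-in java.net.http.HttpClient rather than the Apache HttpClient library, so that it runs without extra jars. The class name, the helper names and the User-Agent value are made up for illustration, and the /search?q= form is only an assumption about what Google accepts; Google may reject or block automated requests like this.

```java
import java.net.URI;
import java.net.URLEncoder;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.nio.charset.StandardCharsets;

public class GoogleSearchSketch {

    // Hypothetical helper: URL-encodes the user's text and appends it as the q parameter.
    // The /search?q= URL format is an assumption, not a documented API.
    static URI buildSearchUri(String query) {
        String encoded = URLEncoder.encode(query, StandardCharsets.UTF_8);
        return URI.create("http://www.google.com/search?q=" + encoded);
    }

    // Sends a GET request and returns the raw response body (HTML) as a string.
    static String fetch(URI uri) throws Exception {
        HttpClient client = HttpClient.newBuilder()
                .followRedirects(HttpClient.Redirect.NORMAL)
                .build();
        HttpRequest request = HttpRequest.newBuilder(uri)
                .header("User-Agent", "Mozilla/5.0") // placeholder value
                .GET()
                .build();
        HttpResponse<String> response =
                client.send(request, HttpResponse.BodyHandlers.ofString());
        return response.body();
    }

    public static void main(String[] args) throws Exception {
        String query = args.length > 0 ? String.join(" ", args) : "jebediah springfield";
        URI uri = buildSearchUri(query);
        System.out.println("Requesting " + uri);
        System.out.println(fetch(uri));
    }
}
```

The returned string is plain HTML, so the crawler would still need to extract the result links from it before running its own link analysis.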
 
Campbell Ritchie
Marshal
Posts: 56600
172
… and welcome to the Ranch
 
Daniel Arnold
Greenhorn
Posts: 9
Eclipse IDE Fedora Java
Thanks
 
Daniel Arnold
Greenhorn
Posts: 9
Eclipse IDE Fedora Java
Hi,

I am using the HttpClient (4.x) library and trying to get it to return the search results, but I keep getting an error.



And the error I receive is:

Fatal transport error: www.google.com
java.net.UnknownHostException: www.google.com
    at java.net.Inet6AddressImpl.lookupAllHostAddr(Native Method)
    at java.net.InetAddress$1.lookupAllHostAddr(Unknown Source)
    at java.net.InetAddress.getAddressesFromNameService(Unknown Source)
    at java.net.InetAddress.getAllByName0(Unknown Source)
    at java.net.InetAddress.getAllByName(Unknown Source)
    at java.net.InetAddress.getAllByName(Unknown Source)
    at org.apache.http.impl.conn.SystemDefaultDnsResolver.resolve(SystemDefaultDnsResolver.java:45)
    at org.apache.http.impl.conn.DefaultClientConnectionOperator.resolveHostname(DefaultClientConnectionOperator.java:278)
    at org.apache.http.impl.conn.DefaultClientConnectionOperator.openConnection(DefaultClientConnectionOperator.java:162)
    at org.apache.http.impl.conn.ManagedClientConnectionImpl.open(ManagedClientConnectionImpl.java:294)
    at org.apache.http.impl.client.DefaultRequestDirector.tryConnect(DefaultRequestDirector.java:640)
    at org.apache.http.impl.client.DefaultRequestDirector.execute(DefaultRequestDirector.java:479)
    at org.apache.http.impl.client.AbstractHttpClient.execute(AbstractHttpClient.java:906)
    at org.apache.http.impl.client.AbstractHttpClient.execute(AbstractHttpClient.java:1066)
    at org.apache.http.impl.client.AbstractHttpClient.execute(AbstractHttpClient.java:1044)
    at org.apache.http.impl.client.AbstractHttpClient.execute(AbstractHttpClient.java:1035)
    at HttpClientTutorial.main(HttpClientTutorial.java:47)

When I try the URL (http://www.google.com/search?q=batman&btnG=Google+Search&aq=f&oq=) in a browser, it displays the results directly. I understand enough of the error to know that it is an issue with the source of the request, but I can't pin down what exactly.
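The top frames of the trace are all in InetAddress name lookup, so the failure happens while resolving www.google.com, before any HTTP request is sent. One way to confirm that independently of HttpClient might be a small check like the following; the class and method names are invented for this sketch.

```java
import java.net.InetAddress;
import java.net.UnknownHostException;

public class DnsCheck {

    // Returns true if this JVM can turn the host name into an IP address.
    static boolean canResolve(String host) {
        try {
            InetAddress.getByName(host);
            return true;
        } catch (UnknownHostException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        String host = args.length > 0 ? args[0] : "www.google.com";
        System.out.println(host + " resolvable from this JVM: " + canResolve(host));
    }
}
```

If this reports false while the browser on the same machine loads the page, the browser is probably reaching the network by a route the JVM is not using.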

Thanks

 
Rob Spoor
Sheriff
Posts: 21135
87
Chrome Eclipse IDE Java Windows
Is your browser using a proxy? If so, you must use the same proxy with HttpClient as well.
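A minimal sketch of sending requests through a proxy, assuming Java 11 or later and the JDK's java.net.http.HttpClient instead of Apache HttpClient, so it runs without extra jars. The class name, helper names, and the host and port are placeholders; the real values would come from the browser's proxy settings.

```java
import java.net.InetSocketAddress;
import java.net.Proxy;
import java.net.ProxySelector;
import java.net.URI;
import java.net.http.HttpClient;
import java.util.List;

public class ProxySketch {

    // Builds a selector that routes HTTP and HTTPS requests through one proxy.
    static ProxySelector proxySelectorFor(String host, int port) {
        return ProxySelector.of(new InetSocketAddress(host, port));
    }

    // Creates a client that consults the given selector for every request.
    static HttpClient clientVia(ProxySelector selector) {
        return HttpClient.newBuilder().proxy(selector).build();
    }

    public static void main(String[] args) {
        // Placeholder values: copy the real host and port from the browser's proxy settings.
        ProxySelector selector = proxySelectorFor("127.0.0.1", 8080);
        List<Proxy> proxies = selector.select(URI.create("http://www.google.com/"));
        System.out.println("Requests would go through: " + proxies);

        HttpClient client = clientVia(selector);
        System.out.println("Client has a proxy selector: " + client.proxy().isPresent());
    }
}
```

With Apache HttpClient 4.x of the generation shown in the stack trace, the equivalent setting is, as far as I know, the ConnRoutePNames.DEFAULT_PROXY parameter with an HttpHost holding the proxy's host and port; check the documentation for the exact version in use.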
 