using a crawler to invoke a google search & analyse google results

 
Greenhorn
Hi,

I am really into Java and software agents and want to focus my Java coding on that. I want to write a crawler that accepts a search topic, invokes a Google search and analyzes the results. Starting from a Java crawler template I found online, I edited the code and set up my own custom link analysis algorithms. My problem is the part where the app interface accepts user text, passes it to the Google engine and retrieves the Google results (I am designing it to be a stand-alone app or plugin).

Thanks
 
Rancher
What, specifically, are you having a problem with? What is or is not working as expected?
 
Daniel Arnold
Greenhorn
I am not sure how (from a stand-alone app) input text can be passed to the Google engine and the results retrieved (the crawler will then go through the retrieved links). I am trying to avoid using a browser.
 
Ulf Dittmer
Rancher
You could use the HttpClient library to pass the search query to Google and retrieve the result. You'll have to spend some time reverse-engineering the format of the search URL, though; it's not as simple as (e.g.) http://www.google.com/?q=jebediah+springfield.

You might also want to check if Google has a proper REST API for doing searches; for low search volumes it would probably be free to use.
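
As a rough illustration of the first step (turning raw user text into a query URL), here is a minimal sketch using only the standard library. The class name `SearchUrlSketch`, the method `buildSearchUrl`, and the `/search?q=` base are assumptions made for the example, not a confirmed description of what Google expects; the resulting string would then be handed to whatever HTTP client is in use.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;

// Hypothetical helper (not from this thread): builds a search URL from user text.
class SearchUrlSketch {

    // Assumed base URL; the parameters Google really requires may differ.
    static final String BASE = "http://www.google.com/search";

    // URL-encodes the user's text so spaces and reserved characters
    // (such as '&') cannot break the query string.
    static String buildSearchUrl(String userText) {
        try {
            return BASE + "?q=" + URLEncoder.encode(userText, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            // UTF-8 support is mandatory on every Java platform.
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(buildSearchUrl("jebediah springfield"));
    }
}
```

Encoding the text before appending it matters more than the exact base URL: without it, a query containing `&` or `=` would be read as extra parameters.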
 
Marshal
… and welcome to the Ranch
 
Daniel Arnold
Greenhorn
Thanks
 
Daniel Arnold
Greenhorn
Hi,

I am using the HttpClient (4.x) library and I am trying to get it to return the search results, but I keep getting an error.



The error I receive is:

Fatal transport error: www.google.com
java.net.UnknownHostException: www.google.com
at java.net.Inet6AddressImpl.lookupAllHostAddr(Native Method)
at java.net.InetAddress$1.lookupAllHostAddr(Unknown Source)
at java.net.InetAddress.getAddressesFromNameService(Unknown Source)
at java.net.InetAddress.getAllByName0(Unknown Source)
at java.net.InetAddress.getAllByName(Unknown Source)
at java.net.InetAddress.getAllByName(Unknown Source)
at org.apache.http.impl.conn.SystemDefaultDnsResolver.resolve(SystemDefaultDnsResolver.java:45)
at org.apache.http.impl.conn.DefaultClientConnectionOperator.resolveHostname(DefaultClientConnectionOperator.java:278)
at org.apache.http.impl.conn.DefaultClientConnectionOperator.openConnection(DefaultClientConnectionOperator.java:162)
at org.apache.http.impl.conn.ManagedClientConnectionImpl.open(ManagedClientConnectionImpl.java:294)
at org.apache.http.impl.client.DefaultRequestDirector.tryConnect(DefaultRequestDirector.java:640)
at org.apache.http.impl.client.DefaultRequestDirector.execute(DefaultRequestDirector.java:479)
at org.apache.http.impl.client.AbstractHttpClient.execute(AbstractHttpClient.java:906)
at org.apache.http.impl.client.AbstractHttpClient.execute(AbstractHttpClient.java:1066)
at org.apache.http.impl.client.AbstractHttpClient.execute(AbstractHttpClient.java:1044)
at org.apache.http.impl.client.AbstractHttpClient.execute(AbstractHttpClient.java:1035)
at HttpClientTutorial.main(HttpClientTutorial.java:47)

When I try the URL (http://www.google.com/search?q=batman&btnG=Google+Search&aq=f&oq=) in a browser, it displays the results directly. I understand enough of the error to know that it is an issue with the source of the request, but I can't pin down what exactly.

Thanks

 
Sheriff
Is your browser using a proxy? If so, you must use the same proxy with HttpClient as well.
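
One way to check what proxy the JVM has been told about is to read the standard `http.proxyHost` and `http.proxyPort` system properties. The sketch below uses only `java.net`; the class name `ProxySketch` and the method `fromSystemProperties` are hypothetical. How the host and port are then passed to HttpClient depends on the library version (in the 4.x releases of that period it was typically an `HttpHost` set as the `ConnRoutePNames.DEFAULT_PROXY` parameter, but verify that against the version in use).

```java
import java.net.InetSocketAddress;
import java.net.Proxy;

// Hypothetical helper (not from this thread): reports the HTTP proxy
// configured through the standard Java networking system properties.
class ProxySketch {

    // Returns Proxy.NO_PROXY when http.proxyHost is unset or empty.
    static Proxy fromSystemProperties() {
        String host = System.getProperty("http.proxyHost");
        if (host == null || host.isEmpty()) {
            return Proxy.NO_PROXY;
        }
        // The documented default for http.proxyPort is 80.
        int port = Integer.parseInt(System.getProperty("http.proxyPort", "80"));
        // createUnresolved avoids a DNS lookup for the proxy host at this point.
        return new Proxy(Proxy.Type.HTTP, InetSocketAddress.createUnresolved(host, port));
    }

    public static void main(String[] args) {
        System.out.println("Configured proxy: " + fromSystemProperties());
    }
}
```

If this reports no proxy while the browser is configured to use one, the stand-alone app is trying to resolve www.google.com directly, which would be consistent with the `UnknownHostException` above.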
 