
retrieve the HTML page of any URL without using java.net.URL

 
Greenhorn
Posts: 9
I want to develop a simple class that can fetch the HTML contents of a URL without using the java.net.URL or java.net.URLConnection classes. Any good suggestions will be highly appreciated. Thanks in advance.
 
Marshal
Posts: 22450
Try Apache's HttpClient. It may use URL in the background, though; I'm not sure about that.
 
Sheriff
Posts: 26776

salman khalid wrote: I want to develop a simple class that can fetch the HTML contents of a URL without using java.net.URL or java.net.URLConnection classes.



Why?

Anyway, the way to do that would be to write code that does this:

  • Extracts the name of the host from the URL
  • Extracts the port from the URL
  • Connects to that host on that port using a Socket
  • Using the HTTP protocol, sends a GET request to the host
  • Receives the response from the host and interprets it according to the HTTP protocol


You may find that your requirement for "a simple class" conflicts with what actually has to be done. That's why I asked why you want to do this.
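Steps 1 and 2 need nothing more than string handling. A minimal sketch, assuming plain `http://` URLs (default port 80) without user-info or IPv6 literals; the `UrlParts` class and its `parse` method are hypothetical names, not part of any library:

```java
/** Hypothetical helper: splits an http URL string into host, port and path without java.net.URL. */
final class UrlParts {
    final String host;
    final int port;
    final String path; // always starts with "/" and keeps any query string

    private UrlParts(String host, int port, String path) {
        this.host = host;
        this.port = port;
        this.path = path;
    }

    static UrlParts parse(String url) {
        String rest = url.trim();
        // Drop the scheme if present; only plain http is assumed here.
        int schemeEnd = rest.indexOf("://");
        if (schemeEnd >= 0) {
            rest = rest.substring(schemeEnd + 3);
        }
        // Drop a fragment (#...), which is never sent to the server.
        int hash = rest.indexOf('#');
        if (hash >= 0) {
            rest = rest.substring(0, hash);
        }
        // Everything before the first "/" or "?" is the authority (host[:port]).
        int authorityEnd = rest.length();
        for (int i = 0; i < rest.length(); i++) {
            char c = rest.charAt(i);
            if (c == '/' || c == '?') {
                authorityEnd = i;
                break;
            }
        }
        String authority = rest.substring(0, authorityEnd);
        String path = rest.substring(authorityEnd);
        if (path.isEmpty() || path.charAt(0) == '?') {
            path = "/" + path; // "www.google.com" means the root document "/"
        }
        // Split off an explicit port, defaulting to 80 for http.
        String host = authority;
        int port = 80;
        int colon = authority.lastIndexOf(':');
        if (colon >= 0) {
            host = authority.substring(0, colon);
            port = Integer.parseInt(authority.substring(colon + 1));
        }
        return new UrlParts(host, port, path);
    }
}
```

For `http://www.oracle.com/technetwork/java/index.html` this gives host `www.oracle.com`, port `80` and path `/technetwork/java/index.html`. The host and port feed the `Socket` (step 3), while the path belongs in the GET request (step 4).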
     
    Paul Clapham
    Sheriff
    Posts: 26776

    Rob Prime wrote: Try Apache's HttpClient. But it may use URL in the background, I'm not sure about that.


    I'm pretty sure it doesn't use URLConnection; that caused problems for me when I tried to use it in an applet.
     
    salman khalid
    Greenhorn
    Posts: 9

    Paul Clapham wrote:

    salman khalid wrote: I want to develop a simple class that can fetch the HTML contents of a URL without using java.net.URL or java.net.URLConnection classes.



    Why?

    Anyway, the way to do that would be to write code that does this:

  • Extracts the name of the host from the URL
  • Extracts the port from the URL
  • Connects to that host on that port using a Socket
  • Using the HTTP protocol, sends a GET request to the host
  • Receives the response from the host and interprets it according to the HTTP protocol


    You may find that your requirement for "a simple class" conflicts with what actually has to be done. That's why I asked why you want to do this.



    Thanks for the response. I agree that it will not be a simple class. I have implemented your suggested method, and the following code snippet shows it. The problem with this approach is that it does not retrieve the HTML contents when the URL also contains a file path.

    For example, the URL "www.google.com" returns HTML contents, but "http://www.oracle.com/technetwork/java/index.html" does not.
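Assuming the snippet always requests `/`, or hands the whole URL (path included) to the `Socket` as the host name, that would explain the symptom. The path belongs in the request line, and the host name goes into a `Host` header so that a server hosting several sites knows which one is meant. A minimal sketch; `GetRequest.buildGetRequest` is a hypothetical name, and HTTP/1.0 is chosen only because the server then will not reply with chunked transfer coding, which keeps reading the response simple:

```java
/** Hypothetical helper: builds the raw text of an HTTP GET request for one path on one host. */
final class GetRequest {
    static String buildGetRequest(String host, int port, String path) {
        // The Host header carries the port only when it is not the http default (80).
        String hostHeader = (port == 80) ? host : host + ":" + port;
        // Request-Line: Method SP Request-URI SP HTTP-Version CRLF
        // The path (e.g. /technetwork/java/index.html) goes here, not into the Socket's host name.
        return "GET " + path + " HTTP/1.0\r\n"
                + "Host: " + hostHeader + "\r\n"
                // An empty line ends the header section.
                + "\r\n";
    }
}
```

For the Oracle URL, the socket is opened with `new Socket("www.oracle.com", 80)` and the request line becomes `GET /technetwork/java/index.html HTTP/1.0`. Passing `"www.oracle.com/technetwork/java/index.html"` as the socket's host name instead would typically fail with an `UnknownHostException`.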

     
    salman khalid
    Greenhorn
    Posts: 9

    Rob Prime wrote: Try Apache's HttpClient. But it may use URL in the background, I'm not sure about that.





    I will try Apache's HttpClient and let you know.
     
    Rob Spoor
    Marshal
    Posts: 22450
    Please UseCodeTags next time. Code tags preserve indentation and add syntax highlighting. I've added them to your code, and you can see it's much easier to read now.
     
    Bartender
    Posts: 4179

    salman khalid wrote:... but there is a problem in this approach and that is that it does not retrieve HTML contents when a URL contains the file path as well.



    You will have to format the GET request properly. Section 5 ("Request") of the HTTP/1.1 specification, RFC 2616, may help:
    http://www.w3.org/Protocols/rfc2616/rfc2616-sec5.html
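Putting the remaining steps together, a minimal sketch that opens a `Socket`, writes an HTTP/1.0 GET request, reads until the server closes the connection, and splits the reply into status line, headers and body at the first empty line (CRLF CRLF). Assumptions: plain http only (https would need an `SSLSocket`), no redirect handling (a `301` or `302` status is returned as-is), and the body is decoded as ISO-8859-1 rather than honouring the charset in `Content-Type`. `RawHttpGet`, `fetch` and `parse` are hypothetical names, and the 10-second timeout is an arbitrary choice.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.net.Socket;
import java.nio.charset.StandardCharsets;

/** Hypothetical sketch: fetches one http resource over a raw Socket, without java.net.URL. */
final class RawHttpGet {
    final String statusLine; // e.g. "HTTP/1.1 200 OK"
    final String headers;    // header lines, CRLF-separated
    final String body;       // the HTML, if the request succeeded

    private RawHttpGet(String statusLine, String headers, String body) {
        this.statusLine = statusLine;
        this.headers = headers;
        this.body = body;
    }

    static RawHttpGet fetch(String host, int port, String path) throws IOException {
        String request = "GET " + path + " HTTP/1.0\r\n"
                + "Host: " + (port == 80 ? host : host + ":" + port) + "\r\n"
                + "\r\n";
        try (Socket socket = new Socket(host, port)) {
            socket.setSoTimeout(10_000); // don't block forever on a silent server
            OutputStream out = socket.getOutputStream();
            out.write(request.getBytes(StandardCharsets.US_ASCII));
            out.flush();
            // With HTTP/1.0 the server closes the connection after the response, so read to EOF.
            InputStream in = socket.getInputStream();
            ByteArrayOutputStream buf = new ByteArrayOutputStream();
            byte[] chunk = new byte[8192];
            int n;
            while ((n = in.read(chunk)) != -1) {
                buf.write(chunk, 0, n);
            }
            return parse(new String(buf.toByteArray(), StandardCharsets.ISO_8859_1));
        }
    }

    /** Splits a raw response at the first blank line into status line, headers and body. */
    static RawHttpGet parse(String raw) {
        int headerEnd = raw.indexOf("\r\n\r\n");
        String head = headerEnd >= 0 ? raw.substring(0, headerEnd) : raw;
        String body = headerEnd >= 0 ? raw.substring(headerEnd + 4) : "";
        int firstLineEnd = head.indexOf("\r\n");
        String status = firstLineEnd >= 0 ? head.substring(0, firstLineEnd) : head;
        String headers = firstLineEnd >= 0 ? head.substring(firstLineEnd + 2) : "";
        return new RawHttpGet(status, headers, body);
    }
}
```

Usage would be along the lines of `RawHttpGet.fetch("www.oracle.com", 80, "/technetwork/java/index.html").body`. Checking `statusLine` before trusting `body` matters, since an error or redirect page also arrives as HTML.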
     
    Rob Spoor
    Marshal
    Posts: 22450
    Which is why I suggested HttpClient, as it will do all the hard work for you.
     