Newbie in java.net.*

 
Ashish Hareet
Ranch Hand
Posts: 375
Just trying to learn this API. For my own practice I want to do the following:
1. Read the entire contents of some URL
2. Extract all the hyperlinks
3. Save the contents of the URL to the local file system
Here's how I did it -
1. Used new URL("whatever thingy").openStream() to read the page
2. Used an HTMLEditorKit etc. to do the parsing, roughly like the sketch below
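Roughly something like this (just a rough sketch of what I mean - the URL is only a placeholder, and the callback only picks out <a href> tags):

    import java.io.InputStreamReader;
    import java.io.Reader;
    import java.net.URL;
    import javax.swing.text.MutableAttributeSet;
    import javax.swing.text.html.HTML;
    import javax.swing.text.html.HTMLEditorKit;
    import javax.swing.text.html.parser.ParserDelegator;

    public class LinkLister {
        public static void main(String[] args) throws Exception {
            // placeholder URL - put the real page here
            URL url = new URL("http://www.example.com/");
            Reader in = new InputStreamReader(url.openStream());

            // callback fires for every start tag; we only care about <a href="...">
            HTMLEditorKit.ParserCallback callback = new HTMLEditorKit.ParserCallback() {
                public void handleStartTag(HTML.Tag tag, MutableAttributeSet attrs, int pos) {
                    if (tag == HTML.Tag.A) {
                        Object href = attrs.getAttribute(HTML.Attribute.HREF);
                        if (href != null) {
                            System.out.println(href);
                        }
                    }
                }
            };
            new ParserDelegator().parse(in, callback, true); // true = don't abort on charset hints
            in.close();
        }
    }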
Now I have the following questions:
1. Can I be sure that the URL object will always return the entire contents of a remote URL - is there a possibility of losing some packets?
2. How can I save the contents of a web page to the local file system, including pics, Flash, etc. - much like a regular browser does?
3. Is my approach fine, or is there a better way to do this (sockets, etc.)?
Any help appreciated
Ashish H.
 
Raghav Mathur
Ranch Hand
Posts: 641
Hi Ashish,
The approach is fine for making HTTP URL connections, but for other protocols I think the URL class doesn't provide much functionality.
 
Peter den Haan
author
Ranch Hand
Posts: 3252
Originally posted by Ashish Hareet:
1. Can I be sure that the URL object will always return the entire contents of a remote URL
Yes, unless there is a fatal error in the connection. The TCP protocol ensures a reliable connection by detecting and retransmitting any lost or corrupted packets.
Originally posted by Ashish Hareet:
How can I save the contents of a web page to the local file system, including pics, Flash, etc. - much like a regular browser does?
AFAIK there is no easy way to do this. You can only do this by actually parsing the HTML, recursively grabbing all resources it refers to, and updating the links so it will work on the filesystem. Worse, if some extension or plug-in comes along that you had not anticipated, it'll break.
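The single-resource part of it is straightforward enough, though - something along these lines (just a sketch; the URL and file name are placeholders, and this covers only the download, not the recursive link rewriting):

    import java.io.FileOutputStream;
    import java.io.InputStream;
    import java.io.OutputStream;
    import java.net.URL;

    public class ResourceSaver {
        // copies whatever the URL serves (HTML, image, Flash, ...) byte-for-byte to a local file
        public static void save(String address, String fileName) throws Exception {
            InputStream in = new URL(address).openStream();
            OutputStream out = new FileOutputStream(fileName);
            byte[] buffer = new byte[4096];
            int read;
            while ((read = in.read(buffer)) != -1) { // -1 means end of stream
                out.write(buffer, 0, read);
            }
            out.close();
            in.close();
        }

        public static void main(String[] args) throws Exception {
            save("http://www.example.com/", "page.html"); // placeholder URL and file name
        }
    }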
Originally posted by Ashish Hareet:
Is my approach fine, or is there a better way to do this (sockets, etc.)?
Your approach seems fine. If you want to expand upon the functionality, you could use a specialised HTML parser rather than the editor kit.
- Peter
 
Ashish Hareet
Ranch Hand
Posts: 375
Thanx Raghav & Peter
Originally posted by Peter den Haan:
Yes, unless there is a fatal error in the connection.....

What happens if the connection is closed (i.e. when the modem cuts out) while I'm parsing the data - do I get an error, or is the parsed data simply returned?
And what happens if the connection to the site times out (like it does in a regular browser)?
Excuse my ignorance, if any.
Thanx
Ashish H.
 
Peter den Haan
author
Ranch Hand
Posts: 3252
In all cases, a fatal error in the TCP connection causes an IOException to be thrown by the InputStream. I would expect any parser to propagate this exception to the calling code, either as a raw IOException or perhaps wrapped in some kind of parsing exception.
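For example (just a sketch - the URL is a placeholder, and note that the timeout setters on URLConnection only exist from J2SE 5.0 onwards):

    import java.io.IOException;
    import java.io.InputStream;
    import java.net.URL;
    import java.net.URLConnection;

    public class RobustFetch {
        public static void main(String[] args) {
            try {
                // placeholder URL
                URLConnection conn = new URL("http://www.example.com/").openConnection();
                // make a stalled connection fail with an exception instead of blocking forever
                conn.setConnectTimeout(10 * 1000); // 10 s to establish the connection
                conn.setReadTimeout(30 * 1000);    // 30 s of silence on the stream
                InputStream in = conn.getInputStream();
                int b;
                while ((b = in.read()) != -1) {
                    // ... hand each byte to the parser or write it to a file ...
                }
                in.close();
            } catch (IOException e) {
                // a dropped modem connection or a timeout ends up here -
                // you get an exception, not silently truncated data
                e.printStackTrace();
            }
        }
    }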
- Peter
 
Ashish Hareet
Ranch Hand
Posts: 375
Thanx Peter
 