
Only read a URL page if it has been modified (condition to use)?

 
Subu me
Greenhorn
Posts: 11
Hi,

I want to read certain contents from a URL page only if it has been modified.
What exact condition should I put in my if block?
So far this is the basic version I have, using ifModifiedSince:



Thanks in advance!
 
Tim Moores
Saloon Keeper
Posts: 3512
You would need to store the time when you last downloaded the page, so that you can set it as the IfModifiedSince value the next time you check the URL.
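A minimal sketch of that idea in Java, assuming the caller supplies the stored download time as epoch milliseconds. The class name `ConditionalRequest`, the method `prepare`, and the example.com address are placeholders for illustration and do not come from the thread. `URLConnection.setIfModifiedSince` is the standard call that turns the value into an If-Modified-Since request header.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.URI;
import java.net.URLConnection;

class ConditionalRequest {

    // Creates (but does not yet send) a request that asks the server to return
    // the page only if it changed after lastDownloadMillis (epoch milliseconds).
    static URLConnection prepare(String address, long lastDownloadMillis) {
        try {
            // openConnection() only builds the object; nothing goes over the network yet.
            URLConnection connection = URI.create(address).toURL().openConnection();
            connection.setIfModifiedSince(lastDownloadMillis);
            return connection;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        // Placeholder value: in real use this would be the time saved after the last download.
        long lastDownload = System.currentTimeMillis() - 24L * 60 * 60 * 1000;
        URLConnection connection = prepare("https://example.com/", lastDownload);
        System.out.println("Will send If-Modified-Since for: " + connection.getIfModifiedSince());
    }
}
```

The request is only sent once something is read from the connection, for example the response code or the input stream.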
 
fred rosenberger
lowercase baba
Bartender
Posts: 12443
That is the ultimate question: modified since when? In the past hour? The past week? The past year? Since the last time you read it?

Regardless of what the answer is, you have to either a) keep track of the previous time, or b) have a way to query for the previous time. You also need to decide what to do if you don't have a previous time. I assume it would be 'read the page', but that's really your decision.
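One way to keep track of the previous time is a small text file. This is only a sketch under assumptions: the class name `LastCheckStore` and the choice of a plain file are hypothetical, and a database or preferences store would do equally well. When nothing has been stored yet, `load()` returns 0, which `URLConnection` treats as "always fetch", so the first run falls back to reading the page.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;

class LastCheckStore {

    private final Path file;

    LastCheckStore(Path file) {
        this.file = file;
    }

    // Returns the stored time in epoch milliseconds, or 0 when no previous time exists.
    long load() {
        if (!Files.exists(file)) {
            return 0L;
        }
        try {
            return Long.parseLong(Files.readString(file).trim());
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Overwrites the stored time; call this after a successful download.
    void save(long epochMillis) {
        try {
            Files.writeString(file, Long.toString(epochMillis));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

A typical run would call `load()`, pass the result to `setIfModifiedSince`, and call `save(System.currentTimeMillis())` once the page has been read.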
 
Subu me
Greenhorn
Posts: 11
I just edited the code and have now set ifModifiedSince to 24 hours ago.
So please tell me how to write a condition that checks whether the page has changed in the last 24 hours.
I will be running the code once every 24 hours.



Thanks!
 
Tim Moores
Saloon Keeper
Posts: 3512
There would be no condition - you would open the connection in either case. But if the resource hadn't been modified, the response code (which you can get at by casting the connection to an HttpURLConnection) would indicate that, possibly using an HTTP_NOT_MODIFIED code.
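A sketch of what this could look like, assuming an http or https URL so that the cast to `HttpURLConnection` succeeds. `PageFetcher`, `isNotModified`, and `fetchIfModified` are illustrative names, not code from the thread. The check sits on the response code, after the request has been sent, rather than in front of it.

```java
import java.io.IOException;
import java.io.InputStream;
import java.net.HttpURLConnection;
import java.net.URI;

class PageFetcher {

    // True when the server reports that the page is unchanged (HTTP 304).
    static boolean isNotModified(int responseCode) {
        return responseCode == HttpURLConnection.HTTP_NOT_MODIFIED;
    }

    // Returns the page bytes, or null when the server answered 304 Not Modified.
    static byte[] fetchIfModified(String address, long sinceMillis) throws IOException {
        HttpURLConnection http =
                (HttpURLConnection) URI.create(address).toURL().openConnection();
        http.setIfModifiedSince(sinceMillis);
        try {
            int code = http.getResponseCode(); // this call sends the request
            if (isNotModified(code)) {
                return null; // nothing new to read
            }
            if (code != HttpURLConnection.HTTP_OK) {
                throw new IOException("Unexpected response code " + code);
            }
            try (InputStream in = http.getInputStream()) {
                return in.readAllBytes();
            }
        } finally {
            http.disconnect();
        }
    }
}
```

The existing page-reading code would then run only when `fetchIfModified` returns a non-null result.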
 
Subu me
Greenhorn
Posts: 11
Tim Moores wrote:But if the resource hadn't been modified, the response code (which you can get at by casting the connection to an HttpURLConnection) would indicate that, possibly using an HTTP_NOT_MODIFIED code.


I am not familiar with what you are describing.
How do I use HttpURLConnection?
Could you please explain how and where I can use HTTP_NOT_MODIFIED in the code?

I am reading certain contents from the page, so either way I need a check in place before the code that does the reading.

Appreciate any help!
 
Ron McLeod
Saloon Keeper
Posts: 1432
Here are a couple of examples to show how If-Modified-Since works.  When you send your request, include a timestamp in the If-Modified-Since header.  If the page has not been modified since then, you will get a response with a result code 304, indicating that the page has not been modified.  If the page has been modified since then, you will get a response with a result code 200, and a Last-Modified header indicating the last time it was modified (in addition to the contents for the page).
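For readers working in Java, here is a small sketch of what the timestamp in that header looks like on the wire. HTTP dates use a fixed English format in GMT; `HttpDates` and `format` are hypothetical helper names, and the sample timestamp is arbitrary. `setIfModifiedSince` does this formatting for you, so the helper is only needed when setting or logging the header by hand.

```java
import java.time.Instant;
import java.time.ZoneOffset;
import java.time.format.DateTimeFormatter;
import java.util.Locale;

class HttpDates {

    // Fixed-width HTTP date, always English and always GMT.
    private static final DateTimeFormatter HTTP_DATE =
            DateTimeFormatter.ofPattern("EEE, dd MMM yyyy HH:mm:ss 'GMT'", Locale.US)
                    .withZone(ZoneOffset.UTC);

    // Converts epoch milliseconds to the text used in If-Modified-Since and Last-Modified.
    static String format(long epochMillis) {
        return HTTP_DATE.format(Instant.ofEpochMilli(epochMillis));
    }

    public static void main(String[] args) {
        long since = 783459811000L; // arbitrary sample timestamp
        // prints "If-Modified-Since: Sat, 29 Oct 1994 19:43:31 GMT"
        System.out.println("If-Modified-Since: " + format(since));
    }
}
```

On the response side, `URLConnection.getLastModified()` returns the Last-Modified header as epoch milliseconds, or 0 when the server did not send one.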

Page has not been modified



Page has been modified


Not all sites and site pages will support this, and the page you are checking, https://get.adobe.com/air/, appears not to.

Response for get.adobe.com/air/
 
Subu me
Greenhorn
Posts: 11
Ron McLeod wrote:Not all sites and site pages will support this, and the page you are checking, https://get.adobe.com/air/, appears not to.

Response for get.adobe.com/air/


Thanks for such a detailed answer!
The AIR page was just an example. I will be dealing with multiple web pages.
So how do I check whether a page supports ifModifiedSince? Do I need to get hold of some return value and check it?
And if a page does support it, how can I get hold of the 304 or 200 response code so that I can use it in my if block and go on to read the page?

Thanks again for your detailed explanation!

 
Tim Moores
Saloon Keeper
Posts: 3512
If you go back to my 2nd post, you'll find that I mentioned casting the connection object to another class - did that make sense? The method to get the response code is called (no surprise there) getResponseCode.
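The same `getResponseCode` call can also be used to probe whether a site honors If-Modified-Since at all. The following is a sketch of one possible probe, not something proposed in the thread: request the page once, note its Last-Modified value, then immediately ask again with If-Modified-Since set to that value. A server that supports conditional requests should answer 304 the second time. `ConditionalSupportProbe`, `Support`, `interpret`, and `probe` are hypothetical names, and the default address is a placeholder.

```java
import java.io.IOException;
import java.net.HttpURLConnection;
import java.net.URI;

class ConditionalSupportProbe {

    enum Support { HONORED, IGNORED, UNKNOWN }

    // lastModifiedMillis: Last-Modified from a plain request (0 if the header was absent).
    // conditionalStatus: response code of the follow-up request that sent If-Modified-Since.
    static Support interpret(long lastModifiedMillis, int conditionalStatus) {
        if (lastModifiedMillis == 0L) {
            return Support.UNKNOWN; // the server gave no timestamp to test with
        }
        return conditionalStatus == HttpURLConnection.HTTP_NOT_MODIFIED
                ? Support.HONORED
                : Support.IGNORED;
    }

    static Support probe(String address) throws IOException {
        // Step 1: plain request, only to learn the server's Last-Modified value.
        HttpURLConnection first =
                (HttpURLConnection) URI.create(address).toURL().openConnection();
        long lastModified;
        try {
            first.getResponseCode();                // sends the request
            lastModified = first.getLastModified(); // 0 when the header is missing
        } finally {
            first.disconnect();
        }
        if (lastModified == 0L) {
            return Support.UNKNOWN;
        }
        // Step 2: ask again, claiming to hold a copy from exactly that time.
        HttpURLConnection second =
                (HttpURLConnection) URI.create(address).toURL().openConnection();
        second.setIfModifiedSince(lastModified);
        try {
            return interpret(lastModified, second.getResponseCode());
        } finally {
            second.disconnect();
        }
    }

    public static void main(String[] args) throws IOException {
        // Placeholder address; pass the page to test as the first argument.
        String address = args.length > 0 ? args[0] : "https://example.com/";
        System.out.println(address + " -> " + probe(address));
    }
}
```

The result is a heuristic: a page that happens to change between the two requests would be reported as IGNORED even on a server that supports the header.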
 
Subu me
Greenhorn
Posts: 11
With just these two lines I can get hold of the response code:

But the AIR page always returns 200, and the site doesn't support the ifModifiedSince concept.
So every time I would get a status of 200 even if the page has not been modified. Correct? (Please correct me if I am wrong here.)

Hence I am looking for a way to know for certain whether a site supports the ifModifiedSince standard, since in my code above I set ifModifiedSince to 24 hours earlier and would run the script every 24 hours.

Could anyone tweak my code above into a working example, using any site you like? Or perhaps share sample code of your own, with a working example on a web page, that gives some kind of notification or output if the page has changed in the last 24 hours, or in whatever way you propose.
I am just beginning with Java :-) so please bear with me if I learn a bit slowly.

Please let me know if I need to take a different approach to get this done.

Appreciate all your help here!!
Regards.
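Nobody in the thread proposes a fallback for sites that always answer 200, so the following is only one possible different approach, offered as an assumption rather than an answer from the posters: download the page each time, compute a hash of the bytes, and compare it with the hash saved on the previous run. `ContentFingerprint`, `sha256Hex`, and `hasChanged` are hypothetical names, and the HTML strings stand in for real downloads.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

class ContentFingerprint {

    // Returns the SHA-256 digest of the page bytes as a lowercase hex string.
    static String sha256Hex(byte[] pageBytes) {
        try {
            byte[] digest = MessageDigest.getInstance("SHA-256").digest(pageBytes);
            StringBuilder hex = new StringBuilder(digest.length * 2);
            for (byte b : digest) {
                hex.append(String.format("%02x", b));
            }
            return hex.toString();
        } catch (NoSuchAlgorithmException e) {
            // Every Java platform is required to provide SHA-256.
            throw new IllegalStateException(e);
        }
    }

    // previousHash is null on the very first run, which counts as "changed".
    static boolean hasChanged(String previousHash, byte[] currentPageBytes) {
        return !sha256Hex(currentPageBytes).equals(previousHash);
    }

    public static void main(String[] args) {
        // Stand-ins for the bytes downloaded on two consecutive runs.
        byte[] yesterday = "<html>version 1</html>".getBytes(StandardCharsets.UTF_8);
        byte[] today = "<html>version 2</html>".getBytes(StandardCharsets.UTF_8);
        String stored = sha256Hex(yesterday);
        System.out.println(hasChanged(stored, yesterday)); // false
        System.out.println(hasChanged(stored, today));     // true
    }
}
```

Pages that embed a timestamp, a session token, or rotating ads will produce a new hash on every request, so it is usually safer to hash only the part of the page that is actually being read.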
 