
Writing Java code to detect content changes across multiple URLs

 
Subu me
Greenhorn
Posts: 11
Hi Everyone,

I am writing my first program in Java.
Please guide me in writing simple code that tracks multiple URLs and retrieves only the content that has changed, if any.
I am working in Eclipse and experimenting with the JSoup library.
From my investigation, it seems the httpclient library may prove helpful, but exactly which jar file needs to be downloaded and added to the classpath?

I am confused, and I know the first step is always the hardest, but with your help I know I can overcome it.

Regards,
Subu

 
Campbell Ritchie
Marshal
Posts: 56599
Welcome to the Ranch

Forget all about which libraries you are going to use, and anything like that. Write down what you intend to do first.
 
Subu me
Greenhorn
Posts: 11
Campbell Ritchie wrote:Welcome to the Ranch

Forget all about which libraries you are going to use, and anything like that. Write down what you intend to do first.


Thanks for looking into my question. My intention is as follows:
I have some URLs that I want to track for changes, because these sites post software updates as they become available.
Some of these sites are:
http://www.adobe.com/support/downloads/product.jsp?product=10&platform=Windows [this is for Reader updates]
http://www.adobe.com/support/downloads/product.jsp?product=1&platform=Windows [this is for Acrobat updates]
https://get.adobe.com/shockwave/ [this is for shockwave player updates]
https://www.adobe.com/products/flashplayer/distribution3.html [this is for Flash Player updates]
https://get.adobe.com/air/ [this is for AIR updates; for minor releases only the download size changes, and the version number changes only for major releases]
https://en.wikipedia.org/wiki/Adobe_AIR [I may depend on this for the AIR version, as it lists all the minor versions as well, unlike the link above]

I would like to write Java code to track changes to these URLs whenever they are made.
The output file should contain only the changed content, if any is found, so I may need to run this code against the given URLs periodically, or automatically once every morning, to produce the output.
Output file example (<ScanDate>_update.log):
===============================
Reader Updates [YES]
-------------------------
Version 2015.016.20039
Version 11.0.16

Acrobat Updates [YES]
--------------------------
Version 2015.016.20039
Version 11.0.16

AIR Updates [NO]
Shockwave Player Updates [NO]

etc......
==========================

Along with the latest version available from the websites, we can also write the names of the latest download links below the versions, specifically for Reader and Acrobat.
Once we have the output in this format, we can automatically download the latest updates from those links and store them in a location.

Sorry if I am asking for too much. But please guide me, and I will have a good path to walk on instead of getting confused.

Thanks a lot in advance!!! :-)


 
Henry Wong
author
Sheriff
Posts: 23295
Subu me wrote:
From my investigation, it seems the httpclient library may prove helpful, but exactly which jar file needs to be downloaded and added to the classpath?


The httpclient library does indeed work well, but it isn't actually necessary. If you are willing to deal with a lower-level API, the URL class (along with the URLConnection class), both built into core Java, works fine. I did a few projects with these classes directly and had no issues.

Henry
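
As a rough illustration of the core-Java approach described above, here is a minimal sketch that downloads a page with URL and URLConnection and returns it as text. The class name PageFetcher, the 10-second timeouts, and the assumption that the page is UTF-8 encoded are illustrative choices, not something stated in this thread.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.Reader;
import java.net.URL;
import java.net.URLConnection;
import java.nio.charset.StandardCharsets;

// Hypothetical helper; the name and the timeout values are illustrative.
final class PageFetcher {

    // Opens the address with the core java.net classes and returns the body as text.
    static String fetch(String address) throws IOException {
        URLConnection connection = new URL(address).openConnection();
        connection.setConnectTimeout(10_000); // milliseconds
        connection.setReadTimeout(10_000);
        try (InputStream in = connection.getInputStream()) {
            // Assumption: the page is UTF-8; a real page may declare another charset.
            return readAll(new InputStreamReader(in, StandardCharsets.UTF_8));
        }
    }

    // Reads every character from the reader into a String, then closes the reader.
    static String readAll(Reader reader) throws IOException {
        StringBuilder text = new StringBuilder();
        try (BufferedReader buffered = new BufferedReader(reader)) {
            char[] chunk = new char[4096];
            int count;
            while ((count = buffered.read(chunk)) != -1) {
                text.append(chunk, 0, count);
            }
        }
        return text.toString();
    }

    public static void main(String[] args) throws IOException {
        String html = fetch("https://get.adobe.com/air/");
        System.out.println("Fetched " + html.length() + " characters");
    }
}
```

No extra jar is needed for this; everything used here ships with the JDK.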
 
Dave Tolls
Ranch Foreman
Posts: 3068
I'd consider (assuming those pages' structures don't change) using a screen scraper (you already have JSoup) to pull out the bit of data you need, and compare it with whatever value you currently have.
So you're not really talking about checking for page changes, but pulling specific value(s) out of the page.

Of course, for that you'll need to have a pretty good idea of which bits to pull out, and a way of actually doing that.
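
To make the "pull one value out and compare" idea concrete, here is a small sketch. It uses a plain java.util.regex pattern instead of JSoup so that it runs without any extra jar; with JSoup, an element selector would take the place of the regex. The class name VersionExtractor and the assumption that the page text contains something like "Version 11.0.16" (the format from the sample log earlier in the thread) are illustrative only.

```java
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper; a regex stands in for a JSoup selector here.
final class VersionExtractor {

    // Matches dotted version numbers such as 11.0.16 or 2015.016.20039
    // when they follow the word "Version" (an assumed page format).
    private static final Pattern VERSION =
            Pattern.compile("Version\\s+(\\d+(?:\\.\\d+)+)");

    // Returns the first version number found in the page text, if any.
    static Optional<String> firstVersion(String pageText) {
        Matcher matcher = VERSION.matcher(pageText);
        return matcher.find() ? Optional.of(matcher.group(1)) : Optional.empty();
    }

    // True only when a version was found and it differs from the known one.
    static boolean hasChanged(String pageText, String knownVersion) {
        return firstVersion(pageText)
                .map(found -> !found.equals(knownVersion))
                .orElse(false);
    }

    public static void main(String[] args) {
        String page = "<p>Version 2015.016.20039</p>"; // made-up snippet
        System.out.println(firstVersion(page).orElse("none"));
        System.out.println(hasChanged(page, "11.0.16"));
    }
}
```

If the page layout changes so that nothing matches, hasChanged reports false rather than a spurious update; whether that is the right behaviour depends on how you want to be told about broken scraping.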
 
Subu me
Greenhorn
Posts: 11
I tried the URL and URLConnection classes and was able to download the whole content into a local file.

For a start, I am considering just one application to keep track of, as we would need slightly different approaches for different application webpages to collect the desired data.
Let's start with Adobe AIR updates. In this case I need to depend on two links:
1) https://get.adobe.com/air/ [the download size under the Download Now button]
2) https://en.wikipedia.org/wiki/Adobe_AIR [the full version number displayed in the right pane against "Stable Release", and the release date under the version]

So, only if the download size changes on the first webpage do we go to the second webpage, get the release version and date, and put them in a file. If possible, we also download the latest installer and rename the downloaded file to AIR<full version>.exe.

I appreciate you investing your time to help!
 
Dave Tolls
Ranch Foreman
Posts: 3068
So store only the information you need.
I really don't see the point in storing the entire page for a single piece of data that you'll have to extract from the page anyway.
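
One possible way to store only the value you care about is a small properties file keyed by product, as sketched below. The class name LastSeenStore, the key "air.downloadSize", and the sample values are all made up for illustration; treating a never-seen key as a change is also just one possible choice.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Properties;

// Hypothetical store that remembers the last value seen for each key.
final class LastSeenStore {
    private final Path file;
    private final Properties values = new Properties();

    // Loads previously saved values if the file already exists.
    LastSeenStore(Path file) throws IOException {
        this.file = file;
        if (Files.exists(file)) {
            try (InputStream in = Files.newInputStream(file)) {
                values.load(in);
            }
        }
    }

    // Saves the current value and returns true when it differs from the
    // stored one. A key that was never stored also counts as a change.
    boolean update(String key, String currentValue) throws IOException {
        String previous = values.getProperty(key);
        if (currentValue.equals(previous)) {
            return false;
        }
        values.setProperty(key, currentValue);
        try (OutputStream out = Files.newOutputStream(file)) {
            values.store(out, "Last seen values");
        }
        return true;
    }

    public static void main(String[] args) throws IOException {
        LastSeenStore store = new LastSeenStore(Paths.get("last-seen.properties"));
        // "17.5 MB" is a placeholder, not a real download size.
        if (store.update("air.downloadSize", "17.5 MB")) {
            System.out.println("AIR download size changed; check the version page next");
        }
    }
}
```

This also fits the two-step AIR check described earlier: only when update returns true for the download size would the program go on to read the version from the second page.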
 
Subu me
Greenhorn
Posts: 11
OK.
For now, let me deal only with detecting whether any of the pages at the URLs I mentioned earlier has changed; later I will start another thread on using JSoup to extract only the data I need.
So, just to summarize:
I need code that runs automatically once a day against multiple URLs and sends a notification, with a customized statement for each URL, only when the page at that URL changes.
If the notification can be sent to a group of email addresses, that would be great.

Any help with this idea, please?

Appreciate all your help!!!
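
The summary above could be sketched roughly as follows: hash each page, compare the hash with the one from the previous run, and print a per-URL message on a difference. The class name ChangeWatcher and the messages are hypothetical. The sketch assumes Java 9 or later (for InputStream.readAllBytes), keeps the hashes in memory only, and prints to the console instead of sending mail, since email would need a separate mail library.

```java
import java.io.IOException;
import java.io.InputStream;
import java.net.URL;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

// Hypothetical watcher; hashes live in memory, so a restart resets the baseline.
final class ChangeWatcher {
    // URL -> message to print when that page changes
    private final Map<String, String> messages = new LinkedHashMap<>();
    // URL -> hash recorded at the previous check
    private final Map<String, String> lastHashes = new HashMap<>();

    void watch(String url, String message) {
        messages.put(url, message);
    }

    // Returns the SHA-256 digest of the content as lowercase hex.
    static String sha256(byte[] content) {
        try {
            byte[] digest = MessageDigest.getInstance("SHA-256").digest(content);
            StringBuilder hex = new StringBuilder();
            for (byte b : digest) {
                hex.append(String.format("%02x", b & 0xff));
            }
            return hex.toString();
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("SHA-256 not available", e);
        }
    }

    // True when the content differs from the previous check of the same URL.
    // The first check only records a baseline and returns false.
    boolean hasChanged(String url, byte[] content) {
        String current = sha256(content);
        String previous = lastHashes.put(url, current);
        return previous != null && !previous.equals(current);
    }

    // Downloads every watched page and prints its message if it changed.
    void checkAll() {
        for (Map.Entry<String, String> entry : messages.entrySet()) {
            try (InputStream in = new URL(entry.getKey()).openStream()) {
                if (hasChanged(entry.getKey(), in.readAllBytes())) {
                    System.out.println(entry.getValue());
                }
            } catch (IOException e) {
                System.err.println("Could not check " + entry.getKey() + ": " + e.getMessage());
            }
        }
    }

    public static void main(String[] args) {
        ChangeWatcher watcher = new ChangeWatcher();
        watcher.watch("https://get.adobe.com/air/", "AIR page changed");
        watcher.watch("https://get.adobe.com/shockwave/", "Shockwave Player page changed");
        ScheduledExecutorService scheduler = Executors.newSingleThreadScheduledExecutor();
        scheduler.scheduleAtFixedRate(watcher::checkAll, 0, 1, TimeUnit.DAYS);
    }
}
```

Be aware that hashing a whole page can report changes that are not real updates (for example a timestamp or session token embedded in the HTML), which is one reason extracting only the value of interest tends to be more reliable. Running a short program once a day from the operating system's scheduler is an alternative to keeping a Java process alive.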
 
Subu me
Greenhorn
Posts: 11
Henry Wong wrote:
Subu me wrote:
From my investigation, it seems the httpclient library may prove helpful, but exactly which jar file needs to be downloaded and added to the classpath?


The httpclient library does indeed work well, but it isn't actually necessary. If you are willing to deal with a lower-level API, the URL class (along with the URLConnection class), both built into core Java, works fine. I did a few projects with these classes directly and had no issues.

Henry


Hi Henry,
Any help with the lower-level API would be fine with me.
Please help me with this.
 