
I have to download an individual item from a web page to local storage in Java

 
Niti Kapoor
Ranch Hand
Posts: 96
I used an HTTP download utility, but it doesn't work for me: it saves the source of the page. I want to download an individual item from the web page instead.
 
Tim Moores
Saloon Keeper
Posts: 3755
78
This sounds like a previous question you asked. What, exactly, does your code do now, and where are you stuck making the required changes?
 
Niti Kapoor
Ranch Hand
Posts: 96
No, it's not like the previous question. For example, this is an epaper site: http://epaper.deccanchronicle.com/states.aspx#page1. I basically want to download the epapers from it to local storage.
 
Tim Moores
Saloon Keeper
Posts: 3755
78
Have a look at HtmlUnit - IMO it's the best Java library for programmatic web access, and it's easy to deal with images with it.
 
Niti Kapoor
Ranch Hand
Posts: 96
OK, I will check that. Thanks.
 
Niti Kapoor
Ranch Hand
Posts: 96
I used the HTML editor kit.
 
Niti Kapoor
Ranch Hand
Posts: 96
In this code I'm able to extract all the URLs that contain an image, but the images are not being saved to local storage. There is no error, so please help.
 
Tim Moores
Saloon Keeper
Posts: 3755
78
How do you know there is no error if you suppress the IOException?

How have you ascertained that the URL is correct?

I stand by my advice to use HtmlUnit.
 
Niti Kapoor
Ranch Hand
Posts: 96
I tried to use HtmlUnit, but it's not working.
 
Tim Moores
Saloon Keeper
Posts: 3755
78
Niti Kapoor wrote:I tried to use HtmlUnit, but it's not working.

So instead of trying to figure out why it's not working you just give up and look for another solution? The way to get help here on the Ranch is to tell us what you tried, post the relevant code, and explain what happened when you ran it.
 
Niti Kapoor
Ranch Hand
Posts: 96
OK, here is the code.
 
Niti Kapoor
Ranch Hand
Posts: 96
Errors:
Exception in thread "main" java.lang.NoClassDefFoundError: org/apache/commons/httpclient/auth/CredentialsProvider
    at htmlunit.HtmlUnitExampleTestBase.main(HtmlUnitExampleTestBase.java:51)
Caused by: java.lang.ClassNotFoundException: org.apache.commons.httpclient.auth.CredentialsProvider
    at java.net.URLClassLoader.findClass(URLClassLoader.java:381)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
    at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:335)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
    ... 1 more
 
Dave Tolls
Ranch Hand
Posts: 2721
30
The HtmlUnit Getting Started page has a link at the top to a list of its dependencies (as well as the dependencies of those dependencies). You'll need all the compile-time ones (not the test ones).

Now, there are a lot there.  Are you using something like Maven or Gradle to handle dependencies?
 
Tim Moores
Saloon Keeper
Posts: 3755
78
Yes, the library needs all the jar files that come in the lib directory of the download.

catch (ElementNotFoundException | FailingHttpStatusCodeException | IOException e) {
}

I cringe every time I see exceptions suppressed like that. Do people imagine that code is likely to recover from it? There should at least be a printout of the error message, even better the full stacktrace.
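
To make that concrete, here is a minimal sketch using only the JDK. The class name `ReportingDownload` and the method `save` are made up for this example, and it reads from a plain URL stream rather than going through HtmlUnit. The catch block is the part that matters: it reports what failed and prints the full stack trace instead of staying silent.

```java
import java.io.IOException;
import java.io.InputStream;
import java.net.URI;
import java.net.URL;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.nio.file.StandardCopyOption;

// Hypothetical helper for illustration; not part of HtmlUnit or of the code posted in this thread.
class ReportingDownload {

    /** Copies the bytes behind source into target; returns false if anything went wrong. */
    static boolean save(URL source, Path target) {
        try (InputStream in = source.openStream()) {
            Files.copy(in, target, StandardCopyOption.REPLACE_EXISTING);
            return true;
        } catch (IOException e) {
            // Report the failure instead of swallowing it.
            System.err.println("Could not save " + source + " to " + target);
            e.printStackTrace();
            return false;
        }
    }

    public static void main(String[] args) throws Exception {
        // Usage: java ReportingDownload <url> <target file>
        if (args.length != 2) {
            System.err.println("Usage: java ReportingDownload <url> <target file>");
            return;
        }
        URL source = URI.create(args[0]).toURL();
        boolean ok = save(source, Paths.get(args[1]));
        System.out.println(ok ? "saved" : "failed, see the stack trace above");
    }
}
```

With something like this in place, a wrong URL or an unwritable target directory shows up immediately in the console rather than looking like "no error".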
 
Niti Kapoor
Ranch Hand
Posts: 96
No, I'm not using Maven or Gradle.
 
Dave Tolls
Ranch Hand
Posts: 2721
30
Tim Moores wrote:Yes, the library needs all the jar files that come in the lib directory of the download.


Ah, well that'll be a lot simpler then.
 
Niti Kapoor
Ranch Hand
Posts: 96
Yes, it is, but I'm getting the error. Is there any way to access the cache of a web page in Java?
 
Tim Moores
Saloon Keeper
Posts: 3755
78
Niti Kapoor wrote:Yes, it is, but I'm getting the error.

The one you mentioned earlier? How are you adding all the required jar files to the classpath now? Post the exact command you're entering on the command line to compile and to run the code.
 
Niti Kapoor
Ranch Hand
Posts: 96
I'm running it in NetBeans.
 
Tim Moores
Saloon Keeper
Posts: 3755
78
You still need to add all jar files to both the compile and runtime classpaths.
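
One way to check whether a jar really made it onto the runtime classpath is to ask the class loader directly. The following is only a diagnostic sketch using the JDK; the class name `ClasspathCheck` and the method `isOnClasspath` are invented for this example. By default it looks for the class named in the NoClassDefFoundError posted earlier in this thread.

```java
// Hypothetical diagnostic helper; run it with the same classpath as the failing program.
class ClasspathCheck {

    /** Returns true if the named class can be located by this program's class loader. */
    static boolean isOnClasspath(String className) {
        try {
            // initialize=false: only locate the class, do not run its static initializers.
            Class.forName(className, false, ClasspathCheck.class.getClassLoader());
            return true;
        } catch (ClassNotFoundException | LinkageError e) {
            return false;
        }
    }

    public static void main(String[] args) {
        // Show what the JVM was actually started with.
        System.out.println("java.class.path = " + System.getProperty("java.class.path"));

        // Default to the class from the stack trace in this thread.
        String name = args.length > 0
                ? args[0]
                : "org.apache.commons.httpclient.auth.CredentialsProvider";
        System.out.println(name
                + (isOnClasspath(name) ? " was found" : " is MISSING from the runtime classpath"));
    }
}
```

If the class is reported as missing when the program is launched from the IDE, the jar is on neither the project's library list nor the run configuration, whatever the compile step says.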
 
Niti Kapoor
Ranch Hand
Posts: 96
Yes, I'm adding all the jars. This is the error:

run:
Jul 29, 2017 10:08:13 AM com.gargoylesoftware.htmlunit.IncorrectnessListenerImpl notify
WARNING: Obsolete content type encountered: 'application/x-javascript'.
Jul 29, 2017 10:08:13 AM com.gargoylesoftware.htmlunit.IncorrectnessListenerImpl notify
WARNING: Obsolete content type encountered: 'text/javascript'.
Jul 29, 2017 10:08:14 AM com.gargoylesoftware.htmlunit.IncorrectnessListenerImpl notify
WARNING: Obsolete content type encountered: 'text/javascript'.
Jul 29, 2017 10:08:14 AM com.gargoylesoftware.htmlunit.IncorrectnessListenerImpl notify
WARNING: Obsolete content type encountered: 'text/javascript'.
Jul 29, 2017 10:08:15 AM com.gargoylesoftware.htmlunit.IncorrectnessListenerImpl notify
WARNING: Obsolete content type encountered: 'text/javascript'.
Jul 29, 2017 10:08:15 AM com.gargoylesoftware.htmlunit.IncorrectnessListenerImpl notify
WARNING: Obsolete content type encountered: 'text/javascript'.
Jul 29, 2017 10:08:15 AM com.gargoylesoftware.htmlunit.IncorrectnessListenerImpl notify
WARNING: Obsolete content type encountered: 'text/javascript'.
Jul 29, 2017 10:08:16 AM com.gargoylesoftware.htmlunit.IncorrectnessListenerImpl notify
WARNING: Obsolete content type encountered: 'text/javascript'.
Jul 29, 2017 10:08:16 AM com.gargoylesoftware.htmlunit.IncorrectnessListenerImpl notify
WARNING: Obsolete content type encountered: 'text/javascript'.
Jul 29, 2017 10:08:16 AM com.gargoylesoftware.htmlunit.javascript.host.html.HTMLDocument open
WARNING: Ignoring call to open() during the parsing stage.
Jul 29, 2017 10:08:17 AM com.gargoylesoftware.htmlunit.IncorrectnessListenerImpl notify
WARNING: Obsolete content type encountered: 'text/javascript'.
Jul 29, 2017 10:08:17 AM com.gargoylesoftware.htmlunit.IncorrectnessListenerImpl notify
WARNING: Obsolete content type encountered: 'text/javascript'.
Jul 29, 2017 10:08:17 AM com.gargoylesoftware.htmlunit.javascript.StrictErrorReporter runtimeError
SEVERE: runtimeError: message=[TypeError: object is not iterable (net.sourceforge.htmlunit.corejs.javascript.NativeObject)] sourceName=[https://tpc.googlesyndication.com/pagead/js/r20170726/r20110914/client/ext/m_js_controller.js] line=[1] lineSource=[null] lineOffset=[0]
Jul 29, 2017 10:08:17 AM com.gargoylesoftware.htmlunit.IncorrectnessListenerImpl notify
WARNING: Obsolete content type encountered: 'text/javascript'.
Jul 29, 2017 10:08:17 AM com.gargoylesoftware.htmlunit.IncorrectnessListenerImpl notify
WARNING: Obsolete content type encountered: 'text/javascript'.
Jul 29, 2017 10:08:17 AM com.gargoylesoftware.htmlunit.javascript.host.html.HTMLDocument close
WARNING: close() called when document is not open.
Jul 29, 2017 10:08:18 AM com.gargoylesoftware.htmlunit.IncorrectnessListenerImpl notify
WARNING: Obsolete content type encountered: 'text/javascript'.
Jul 29, 2017 10:08:18 AM com.gargoylesoftware.htmlunit.IncorrectnessListenerImpl notify
WARNING: Obsolete content type encountered: 'text/javascript'.
Jul 29, 2017 10:08:19 AM com.gargoylesoftware.htmlunit.IncorrectnessListenerImpl notify
WARNING: Obsolete content type encountered: 'text/javascript'.
Jul 29, 2017 10:08:19 AM com.gargoylesoftware.htmlunit.javascript.StrictErrorReporter runtimeError
SEVERE: runtimeError: message=[TypeError: object is not iterable (net.sourceforge.htmlunit.corejs.javascript.NativeObject)] sourceName=[https://tpc.googlesyndication.com/pagead/js/r20170726/r20110914/client/ext/m_js_controller.js] line=[1] lineSource=[null] lineOffset=[0]
Jul 29, 2017 10:08:19 AM com.gargoylesoftware.htmlunit.javascript.host.html.HTMLDocument close
WARNING: close() called when document is not open.
Jul 29, 2017 10:08:19 AM com.gargoylesoftware.htmlunit.IncorrectnessListenerImpl notify
WARNING: Obsolete content type encountered: 'text/javascript'.
Jul 29, 2017 10:08:19 AM com.gargoylesoftware.htmlunit.IncorrectnessListenerImpl notify
WARNING: Obsolete content type encountered: 'text/javascript'.
BUILD SUCCESSFUL (total time: 9 seconds)
 
Niti Kapoor
Ranch Hand
Posts: 96
The id parameter in this URL, http://onlineepaper.asianage.com/asianage-epaper.aspx?id=LON, has a cache in it, so first I need to access this cache.
 
Tim Moores
Saloon Keeper
Posts: 3755
78
I don't understand what you're saying about a "cache" (maybe you can rephrase it). What you posted aren't errors, just warnings. Your code ran successfully. Of course, if you're still suppressing exceptions, who knows what might have happened?
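
If the warnings are just noise, they can be filtered. The output format above suggests the messages end up in java.util.logging, so under that assumption a sketch like the following raises the threshold for everything under the `com.gargoylesoftware.htmlunit` logger name seen in the output. The class name `QuietHtmlUnitLogs` is made up, and if a different logging backend is on the classpath this will have no effect.

```java
import java.util.logging.Level;
import java.util.logging.Logger;

// Hypothetical helper; assumes HtmlUnit's messages are routed through java.util.logging.
class QuietHtmlUnitLogs {

    // Keep a strong reference: java.util.logging holds loggers weakly, so the
    // level setting could be lost if this logger were garbage collected.
    static final Logger HTMLUNIT_LOGGER = Logger.getLogger("com.gargoylesoftware.htmlunit");

    /** Lets only SEVERE messages through for HtmlUnit and all loggers below it. */
    static void showOnlySevere() {
        HTMLUNIT_LOGGER.setLevel(Level.SEVERE);
    }

    public static void main(String[] args) {
        showOnlySevere();
        // A logger below the HtmlUnit package inherits the level set above.
        Logger child = Logger.getLogger("com.gargoylesoftware.htmlunit.IncorrectnessListenerImpl");
        child.warning("Obsolete content type encountered: 'text/javascript'."); // filtered out
        child.severe("A SEVERE message still gets through.");
    }
}
```

Call `showOnlySevere()` before creating the WebClient. This hides the warnings only; it does nothing about exceptions that the program's own code catches and ignores.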
 
Niti Kapoor
Ranch Hand
Posts: 96
Thanks, Tim, for HtmlUnit. The warning problem was resolved by just printing page.asXml().
 
Niti Kapoor
Ranch Hand
Posts: 96
How do I download the images that I'm getting in the HTML and JS after using HtmlUnit?
 
Niti Kapoor
Ranch Hand
Posts: 96
I also want to access the browser's image cache, because there are thousands of images.
 
Tim Moores
Saloon Keeper
Posts: 3755
78
If the image has an ID, then you have already posted the relevant code earlier. If not, you will need to concoct an XPath expression to get at it (HtmlPage has a method for that). Once you have an HtmlImage, that has a method for storing to a file.

HtmlUnit has nothing to do with any actual browser you may have been using, so you need to come up with some other way to access their respective caches. That's going to vary between browsers, and likely won't be as easy as downloading via HtmlUnit.
 
Niti Kapoor
Ranch Hand
Posts: 96
So, other than HtmlUnit, how can I access the cache of a particular website to download an image?
 
Tim Moores
Saloon Keeper
Posts: 3755
78
What do you mean by "cache of particular website"? A browser's cache? If so, which browser? What do you hope to achieve by that?

As I said, HtmlUnit would have nothing to do with that.
 
Niti Kapoor
Ranch Hand
Posts: 96
I mean that if I have downloaded an image once, I don't have to download it again.
 
Niti Kapoor
Ranch Hand
Posts: 96
I'm making software for downloading newspaper clippings.
 
Tim Moores
Saloon Keeper
Posts: 3755
78
"download" how? Are you talking about the functionality you will implement with HtmlUnit, or about something else? Please make it easy for us to help you by providing lots of details, it's a bit tiring to have to drag it out of you.
 
Niti Kapoor
Ranch Hand
Posts: 96
Download with the HtmlUnit functionality only, but if I have downloaded something once, I don't want to have to download it again.
 
Tim Moores
Saloon Keeper
Posts: 3755
78
Then you should store the images in such a way that you can quickly look up which ones are there already, and which ones are not. Typically one would use a DB for this, but I think in this case, the directory tree where you store them would work just as well. The path of the images on the web site could be mirrored by the path where they are stored in the file system, giving you a straightforward way to store them, and to check whether they've been downloaded already.
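
As a rough sketch of that mirroring idea, using only the JDK: the class `MirrorStore` and its methods are invented for illustration, the download goes through a plain URL stream rather than HtmlUnit, and the query string is ignored, so two URLs that differ only in their parameters (such as `?id=LON`) would map to the same file. Treat it as a starting point under those assumptions.

```java
import java.io.IOException;
import java.io.InputStream;
import java.net.URI;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.nio.file.StandardCopyOption;

// Hypothetical helper; mirrors the host and path of an image URL below a local root directory.
class MirrorStore {

    /** Maps e.g. http://host/a/b.jpg to <root>/host/a/b.jpg (query string ignored). */
    static Path localPathFor(Path root, URI imageUri) {
        String host = imageUri.getHost() == null ? "_nohost" : imageUri.getHost();
        String path = imageUri.getPath() == null ? "" : imageUri.getPath();
        while (path.startsWith("/")) {
            path = path.substring(1);
        }
        if (path.isEmpty() || path.endsWith("/")) {
            path = path + "index";
        }
        Path base = root.toAbsolutePath().normalize();
        Path target = base.resolve(host).resolve(path).normalize();
        // Refuse paths such as /../../x that would land outside the store.
        if (!target.startsWith(base)) {
            throw new IllegalArgumentException("Path escapes the store: " + imageUri);
        }
        return target;
    }

    /** Downloads the image unless its mirrored file already exists; returns true if it downloaded. */
    static boolean downloadIfAbsent(Path root, URI imageUri) throws IOException {
        Path target = localPathFor(root, imageUri);
        if (Files.exists(target)) {
            return false; // already stored, nothing to do
        }
        Files.createDirectories(target.getParent());
        // Write to a temporary file first so a failed download never looks like a stored image.
        Path tmp = Files.createTempFile(target.getParent(), "part-", ".tmp");
        try (InputStream in = imageUri.toURL().openStream()) {
            Files.copy(in, tmp, StandardCopyOption.REPLACE_EXISTING);
            Files.move(tmp, target);
        } finally {
            Files.deleteIfExists(tmp);
        }
        return true;
    }

    public static void main(String[] args) throws IOException {
        // Usage: java MirrorStore <root directory> <image url>
        if (args.length != 2) {
            System.err.println("Usage: java MirrorStore <root directory> <image url>");
            return;
        }
        boolean fetched = downloadIfAbsent(Paths.get(args[0]), URI.create(args[1]));
        System.out.println(fetched ? "downloaded" : "already present, skipped");
    }
}
```

The existence check on the mirrored path is the whole "cache": no database is needed as long as each image has a stable URL path.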
 
Niti Kapoor
Ranch Hand
Posts: 96
So for this, do I have to write a condition, or is there any other way?
 
Tim Moores
Saloon Keeper
Posts: 3755
78
I don't know what you mean by "condition" - it will require coding on your part, yes.
 