cannot download everything in webpage via its url

 
wei liu
Ranch Hand
Posts: 35
Dear all,

I have tried to download some webpages using their URLs.
Below is the part of my code that does the downloading:




The problem is that I cannot seem to get everything downloaded; some parts of the page, such as the navigation bar, are missing.
I suspect it has something to do with the JavaScript or Ajax technology the website uses.

The webpage I was trying to download is webpage

Any ideas are welcome!

Thanks in advance.
 
Ernest Friedman-Hill
author and iconoclast
Sheriff
Posts: 24217
It's an exceedingly rare webpage that doesn't include at least a link to a stylesheet, and yes, often links to external JavaScript files as well. Then of course there will be image files. To download a complete page with all its components, you need to parse the HTML enough to find those links, then download each one into a separate file.
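
For illustration, here is a minimal sketch of that approach using only the JDK (Java 9 or later). The class name `PageResources`, its method names, and the regular expression are assumptions made for this example, not code from the original post. The regex is a crude stand-in for a real HTML parser, and it will not see anything the page adds at runtime through JavaScript.

```java
import java.io.IOException;
import java.io.InputStream;
import java.net.URI;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.nio.file.StandardCopyOption;
import java.util.LinkedHashSet;
import java.util.Set;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper class, written for illustration only.
class PageResources {
    // Rough pattern: the href/src value of <link>, <script>, and <img> tags.
    // A real HTML parser would be more robust than this regex.
    private static final Pattern RESOURCE = Pattern.compile(
        "<(?:link|script|img)\\b[^>]*?\\b(?:href|src)\\s*=\\s*[\"']([^\"']+)[\"']",
        Pattern.CASE_INSENSITIVE);

    // Finds resource links in the HTML and resolves them against the page URL.
    // Only http/https results are kept (this skips data: URIs and the like).
    // Note: base.resolve throws IllegalArgumentException on malformed links.
    static Set<URI> findResources(String html, URI base) {
        Set<URI> found = new LinkedHashSet<>();
        Matcher m = RESOURCE.matcher(html);
        while (m.find()) {
            URI resolved = base.resolve(m.group(1).trim());
            String scheme = resolved.getScheme();
            if ("http".equals(scheme) || "https".equals(scheme)) {
                found.add(resolved);
            }
        }
        return found;
    }

    // Downloads one resource into dir, named after the last path segment.
    // Resources with the same file name overwrite each other in this sketch.
    static void download(URI resource, Path dir) throws IOException {
        String path = resource.getPath();
        String name = path.substring(path.lastIndexOf('/') + 1);
        if (name.isEmpty()) {
            name = "index";
        }
        try (InputStream in = resource.toURL().openStream()) {
            Files.copy(in, dir.resolve(name), StandardCopyOption.REPLACE_EXISTING);
        }
    }

    // Usage: java PageResources <page-url> <output-dir>
    public static void main(String[] args) throws IOException {
        URI page = URI.create(args[0]);
        Path dir = Files.createDirectories(Paths.get(args[1]));
        String html;
        try (InputStream in = page.toURL().openStream()) {
            // Assumes the page is UTF-8; a fuller version would check the charset.
            html = new String(in.readAllBytes(), StandardCharsets.UTF_8);
        }
        Files.write(dir.resolve("page.html"), html.getBytes(StandardCharsets.UTF_8));
        for (URI resource : findResources(html, page)) {
            download(resource, dir);
        }
    }
}
```

Relative links such as `css/site.css` are resolved against the page's own URL, which is why the base URI is passed in alongside the HTML.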

But of course there are libraries that will do much of this for you, so you shouldn't need to "reinvent the wheel". For example, see the Apache HttpClient project for the HTTP side.

 