
cannot download everything in webpage via its url

 
wei liu
Ranch Hand
Posts: 35
Eclipse IDE
Dear all,

I have tried to download some webpages using their URLs.
Below is part of my code that does the downloading:




The problem is that I cannot seem to get everything downloaded; some parts of the page, such as the navigation bar, were not downloaded.
I suspect it has something to do with the JavaScript or Ajax technology the website uses.

The webpage I was trying to download is webpage

Any ideas are welcome!

Thanks in advance.
 
Ernest Friedman-Hill
author and iconoclast
Sheriff
Posts: 24215
37
Chrome Eclipse IDE Mac OS X
It's an exceedingly rare webpage that doesn't include at least a link to a stylesheet, and yes, often links to external JavaScript files as well. Then of course there will be image files. To download a complete page with all of its components, you need to parse the HTML enough to find those links, then download each one into a separate file.
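
As a rough illustration of the "find the links, then download them" idea, here is a minimal sketch using only the standard Java library. The class and method names (PageAssetFinder, findAssetLinks, downloadAll) are made up for this example, and the regular expression is a deliberately crude stand-in for a real HTML parser, so treat it as a starting point rather than a finished tool. It also only sees what is in the HTML text itself: anything a script adds to the page after it loads will not show up.

```java
import java.io.IOException;
import java.io.InputStream;
import java.net.URI;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper class, not part of any library.
class PageAssetFinder {

    // Crude pattern (an assumption for this sketch): matches <link>, <script>
    // and <img> tags and captures a quoted href or src attribute value.
    private static final Pattern ASSET = Pattern.compile(
            "<(?:link|script|img)\\b[^>]*?\\b(?:href|src)\\s*=\\s*[\"']([^\"']+)[\"']",
            Pattern.CASE_INSENSITIVE);

    // Returns the stylesheet, script and image links found in the HTML,
    // resolved against the page's own URI, in document order, without duplicates.
    static List<URI> findAssetLinks(String html, URI base) {
        List<URI> links = new ArrayList<>();
        Matcher m = ASSET.matcher(html);
        while (m.find()) {
            try {
                URI resolved = base.resolve(m.group(1).trim());
                if (!links.contains(resolved)) {
                    links.add(resolved);
                }
            } catch (IllegalArgumentException e) {
                // Malformed link in the page; skip it.
            }
        }
        return links;
    }

    // Downloads each link into its own file under targetDir.
    // File names are prefixed with a counter so that equal names do not collide.
    static void downloadAll(List<URI> links, Path targetDir) throws IOException {
        Files.createDirectories(targetDir);
        int n = 0;
        for (URI link : links) {
            String path = link.getPath();
            String name = (path == null) ? "" : path.substring(path.lastIndexOf('/') + 1);
            if (name.isEmpty()) {
                name = "asset";
            }
            Path target = targetDir.resolve((n++) + "-" + name);
            try (InputStream in = link.toURL().openStream()) {
                Files.copy(in, target, StandardCopyOption.REPLACE_EXISTING);
            }
        }
    }
}
```

You would call findAssetLinks with the HTML you have already downloaded and the URL you fetched it from, then pass the result to downloadAll along with a target directory. Relative links such as css/site.css are resolved against the page URL, which is why the base URI is needed.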

But of course there are libraries that will do all of this for you, so you shouldn't need to "reinvent the wheel". For example, see the Apache HttpClient project.

 