
Program to download (song) links from a page

 
Aftab Hassan
Ranch Hand
Posts: 40
Hello,
There is a website I frequent and download songs from. Is it possible to download all the songs from a page using a program?
What concepts should I use to do this? I have heard of terms like GET and POST, but I haven't worked with any of them.

What I would do
When I view the page source, I see links like <a href="http://songs1.download.com/music/Singles/NameOfSong.mp3
So maybe I would just run wget or curl on this link.
But is there any web concept I can use for this? That is, I would like to replace wget/curl with something else. Where would a developer use GET/POST for this assignment?

Output using wget


Output using curl


Thanks
 
Winston Gutkowski
Bartender
Posts: 10575
Aftab Hassan wrote: There is this website which I frequent and download songs from. Is it possible to download all the songs from the page using a program?

I'm sure there is. Indeed, your wget output suggests that you've proved the concept in principle.

My worry is more:
1. It sounds like a "Kleenex" (or WORO - write once, run once) project because, once you've downloaded all the songs, then what do you do?
2. How do you know which links are songs and which aren't? If they're direct, you could probably use the extension; but not all download pages are that accommodating.
3. I think you could achieve something similar (if a bit less automated), by writing a program that simply takes a piece of HTML source and spits out all the links it contains. You could then pipe those to a script that runs wget on them. And even if you do end up with a "project", I suspect a module like that would be at its heart anyway.
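The module suggested in point 3 could be sketched roughly as follows. This is only a minimal illustration in Python using the standard library's `html.parser`; the class name `LinkExtractor`, the function `extract_links`, and the sample HTML are assumptions made up for the example, not anything from the site in question.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class LinkExtractor(HTMLParser):
    """Collects the href of every <a> tag seen while parsing (hypothetical helper)."""

    def __init__(self, base_url=""):
        super().__init__()
        self.base_url = base_url
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        for name, value in attrs:
            if name == "href" and value:
                # Resolve relative links against the page URL, if one was given.
                self.links.append(urljoin(self.base_url, value))


def extract_links(html, base_url=""):
    """Take a piece of HTML source and return all the links it contains."""
    parser = LinkExtractor(base_url)
    parser.feed(html)
    parser.close()
    return parser.links


if __name__ == "__main__":
    # Made-up sample page; a real run would read the saved page source instead.
    sample = (
        '<a href="http://example.com/music/One.mp3">One</a>'
        '<a href="about.html">About</a>'
    )
    # One link per line, so the output can be piped to another script.
    for link in extract_links(sample, "http://example.com/music/index.html"):
        print(link)
```

If the demo at the bottom read the page source from standard input instead of the sample string, the printed links could be piped to something like `xargs -n 1 wget`, which is the "pipe those to a script" step described above.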

HIH

Winston
 
Aftab Hassan
Ranch Hand
Posts: 40
Winston Gutkowski wrote:
How do you know which links are songs and which aren't? If they're direct, you could probably use the extension; but not all download pages are that accommodating.

Hi Winston, thanks for the reply.
As you said, I would trust the extension to tell me which links are songs. This is just a hobby project, meant to teach me some concepts beyond wget and HTML parsing that I can put to better use later. So it's okay if I miss some songs, or happen to download from a link that is not an mp3 file.
1. But can you tell me where I can use a web concept like HTTP GET/POST to achieve the same thing?
2. How would a web crawler do its job? Don't crawlers do more than just parse HTML sources and run wget? Can you give me some insight on this?


I'm sorry, I understand the two questions above sound a bit confused, but I'm looking for a start and then I can start googling.

Thanks
 
Aftab Hassan
Ranch Hand
Posts: 40
Winston Gutkowski wrote:
It sounds like a "Kleenex" (or WORO - write once, run once) project because, once you've downloaded all the songs, then what do you do?

Winston, if my previous reply was a bit ambiguous, let me change the question slightly. Keep in mind that my real aim is to use GET/POST somewhere.
Should I be using GET/POST if I want to, say, search for a particular word on Google from a program?

In this case, assume I'm doing something like this: I run the Google search "download songs of Captain America" through my program, which uses a GET/POST.
This would most probably return the site I frequent in the list of search results, along with the exact URL of a page that has all the songs of the movie I searched for.
I can then use the parsing/wget combination we discussed earlier to download all the songs from that page.

This also takes away the WORO problem you mentioned, because the argument to my program could be the movie name, which would be appended to "download songs of " in the Google search.

Thanks.

 
Winston Gutkowski
Bartender
Posts: 10575
Aftab Hassan wrote: In this case, assume I'm doing something like this: I run the Google search "download songs of Captain America" through my program, which uses a GET/POST.

I'm afraid I'm no expert on this, but there are two things I'd advise:

1. You seem to be intent on using GET/POST for this. Can you explain why? It seems to me that you should use the best tool for the job, rather than deciding how you're going to do something before you know what is required. As one of the other posters on here is fond of saying: "You don't go to a building site saying 'I'm going to use a saw' when what you need is a hammer". You might want to read the WhatNotHow page for more information.

2. Try not to do too much at once. It sounds to me like you want a fully automated program to do everything at the touch of a button, and that sounds like a lot of work (and thinking) to take on all at once.
If I was going to tackle this, I'd start out by writing modules that I know are going to be needed first - like one to take a piece of HTML and spit back all its links, maybe as Strings. Then I'd write a module that filters those links by extension (or whatever you need). Then another to take those links and launch wget (or whatever) on them. Then maybe another to place the resulting downloads in target directories...

Do you see what I mean? If you tackle this project a bit at a time, you can concentrate on one problem at a time, test it, and make sure it works before going on to the next task. And at the end, your "automation bit" will simply be a matter of tying these modules together under a single button (or GUI; or webpage...).
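To make the "one module at a time" idea concrete, here is a rough Python sketch of two of the later modules: filtering links by extension, and preparing (not running) a wget command that saves each file into a target directory. The function names, the `songs` directory, and the example URLs are all hypothetical, and the sketch assumes the list of links has already been extracted from the page.

```python
import os
from urllib.parse import urlparse


def filter_by_extension(links, extension=".mp3"):
    """Keep only links whose URL path ends with the given extension."""
    wanted = extension.lower()
    return [
        link for link in links
        if urlparse(link).path.lower().endswith(wanted)
    ]


def target_path(link, target_dir):
    """Choose a local file name from the last segment of the URL path."""
    filename = os.path.basename(urlparse(link).path)
    return os.path.join(target_dir, filename)


def wget_command(link, target_dir):
    """Build, but do not run, a wget command line for one link."""
    return ["wget", "-O", target_path(link, target_dir), link]


if __name__ == "__main__":
    # Made-up links standing in for the output of a link-extraction module.
    links = [
        "http://example.com/music/One.mp3",
        "http://example.com/about.html",
    ]
    for link in filter_by_extension(links):
        # Printing instead of executing keeps each module easy to test alone.
        print(" ".join(wget_command(link, "songs")))
```

Each function can be tested on its own before the pieces are tied together. Actually launching the download would be one more small step, for example passing the command list to `subprocess.run`.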

HIH

Winston
 
Aftab Hassan
Ranch Hand
Posts: 40
Winston Gutkowski wrote:
You seem to be intent on using GET/POST for this. Can you explain why? It seems to me that you should use the best tool for the job, rather than deciding how you're going to do something before you know what is required.


Thanks for the reply, Winston. The thing is, I have never worked with GET/POST, but I understand that it's something really powerful. In fact, this whole project (song downloads and so on) was designed to teach myself where I can use GET/POST.

Let's forget the song download for now. On this online thesaurus site, can you tell me how I can search for synonyms of a word entered by the user? Say I am looking for synonyms of the word "equivocal"; its URL is http://thesaurus.com/browse/equivocal. Is there a way to get to this page by way of a GET/POST, without appending 'equivocal' to the end of the URL?
Sorry to bring up this topic again.
 
Sheriff
Posts: 22846
Perhaps your obsession with the GET and POST methods might be cured by finding out more about them? Let's try a Wikipedia article: http://en.wikipedia.org/wiki/Hypertext_Transfer_Protocol#Request_methods
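For a feel of what those two request methods look like from a program, here is a small Python sketch that builds requests without sending them. Fetching a URL with wget, curl, or a browser is already an HTTP GET, so appending the word to the thesaurus URL is itself a GET. The search URL `http://example.com/search` and the parameter name `q` are assumptions invented for the example; which methods and parameters a real site accepts is up to that site.

```python
from urllib.parse import quote, urlencode
from urllib.request import Request


def thesaurus_request(word):
    """GET where the word is part of the URL path, as in the thread's example."""
    url = "http://thesaurus.com/browse/" + quote(word)
    return Request(url, method="GET")


def search_get_request(base_url, terms):
    """GET where the input travels in the query string after '?'."""
    # "q" is a hypothetical parameter name.
    return Request(base_url + "?" + urlencode({"q": terms}), method="GET")


def search_post_request(base_url, terms):
    """POST where the same input travels in the request body instead."""
    body = urlencode({"q": terms}).encode("ascii")
    return Request(base_url, data=body, method="POST")


if __name__ == "__main__":
    requests = [
        thesaurus_request("equivocal"),
        search_get_request("http://example.com/search", "Captain America"),
        search_post_request("http://example.com/search", "Captain America"),
    ]
    for request in requests:
        # Nothing is sent over the network; this only shows what would be sent.
        print(request.get_method(), request.full_url, request.data)
```

Passing one of these objects to `urllib.request.urlopen` would actually send it. With GET the user's input ends up somewhere in the URL, while with POST it is carried in the body.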