Hi all, I am Chaitanya. I want to know how Google finds the links for a given search. Suppose I search for "cat": all the websites related to cats are displayed. How does Google know about the related websites? One more doubt on the same topic: if I run the same search on the Yahoo search engine, I get almost the same results. How exactly is this done?
Finally I came to know how this works after reading this article and discussing it with my friend.
I will explain what I understood; please tell me if I missed anything or if I am wrong anywhere.
Suppose there is a website whose domain was bought from Yahoo. Yahoo asks the website owner whether to submit the site to the popular search engines. While submitting, you are asked to enter key words, which will be used as search keys. Yahoo is not the only one that does this; everyone who sells domains does the same. In this example it is Yahoo.
When you hit submit, a request is sent to each of the popular search engines you chose. Every search engine runs a few programs called spiders. These spiders read the requests, visit the sites, download all the static pages to their discs, and create an index entry for each key word. They also store the address of the web page each copy was downloaded from. This process is called web crawling. Don't worry, crawling does not run the entire day; it is scheduled to run at set times. Many search engines run their spider programs at night because traffic is low.
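To make the crawling step concrete, here is a minimal sketch in Python of what I think a spider does. It is only an illustration under my own assumptions: FAKE_WEB is a made-up in-memory stand-in for the real web, crawl and build_index are names I invented, and following the links found on each page is my guess at how the spider reaches all the pages of a site. Real spiders are surely far more complicated.

```python
import re
from collections import defaultdict, deque

# Hypothetical stand-in for the real web: address -> (page text, outgoing links).
FAKE_WEB = {
    "http://example.com/cats": ("Cats are small furry pets", ["http://example.com/food"]),
    "http://example.com/food": ("Food for cats and dogs", []),
}

def crawl(submitted_urls, web=FAKE_WEB):
    """Visit every submitted address, follow its links, and keep a copy of each page."""
    downloaded = {}
    queue = deque(submitted_urls)
    while queue:
        url = queue.popleft()
        if url in downloaded or url not in web:
            continue  # already stored, or the page does not exist
        text, links = web[url]
        downloaded[url] = text  # the "download to disc" step, kept in memory here
        queue.extend(links)
    return downloaded

def build_index(downloaded):
    """Map each key word to the set of addresses where it was found."""
    index = defaultdict(set)
    for url, text in downloaded.items():
        for word in re.findall(r"[a-z0-9]+", text.lower()):
            index[word].add(url)
    return index

pages = crawl(["http://example.com/cats"])
index = build_index(pages)
print(sorted(index["cats"]))
```

Here both example pages end up under the key word "cats", because the word appears on each of them.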
From the next search onwards your site is also included in the searching process.
Suppose you now search for "Why main in java is static?". The search engine's algorithms search their file systems for the downloaded pages whose key is "Why main in java is static?", extract the associated web site addresses, build a web page containing all the links, and send that page to the user. The user then clicks on whichever link interests him and is taken to the particular site and the respective page.
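A rough sketch of the lookup step, again in Python and again only under my own assumptions: INDEX is a made-up index of the kind a spider might have built, search and results_page are names I invented, and splitting the query into separate words and ranking pages by how many of those words they contain is my simplification, not something I know any real search engine does.

```python
import re

# Hypothetical index, as a crawler might have built it: key word -> page addresses.
INDEX = {
    "main": {"http://example.com/java-main", "http://example.com/c-main"},
    "java": {"http://example.com/java-main", "http://example.com/java-intro"},
    "static": {"http://example.com/java-main"},
}

def search(query, index=INDEX):
    """Return addresses ranked by how many query words each page contains."""
    words = re.findall(r"[a-z0-9]+", query.lower())
    scores = {}
    for word in words:
        for url in index.get(word, ()):
            scores[url] = scores.get(url, 0) + 1
    # Best match first; ties are broken alphabetically so the order is stable.
    return sorted(scores, key=lambda url: (-scores[url], url))

def results_page(query, index=INDEX):
    """Build a plain-text results page with one link per line."""
    return "\n".join(search(query, index))

print(results_page("Why main in java is static?"))
```

The page that matches "main", "java" and "static" comes first, and the pages that match only one of the words follow it.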
Note: The web pages, or anything else the spider programs download, are not saved in a database. All information is saved in flat files, because searching a database takes more time than searching the file system.
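To show what I mean by a flat file, here is a small Python sketch. The file layout (one line per key word, a tab, then the addresses as a JSON list) and the names save_index and lookup are entirely my own invention for illustration; I do not know what format any search engine actually uses.

```python
import json
import os
import tempfile

def save_index(index, path):
    """Write one line per key word: the word, a tab, then its addresses as JSON."""
    with open(path, "w", encoding="utf-8") as f:
        for word in sorted(index):
            f.write(word + "\t" + json.dumps(sorted(index[word])) + "\n")

def lookup(word, path):
    """Scan the flat file line by line and return the addresses for one key word."""
    with open(path, encoding="utf-8") as f:
        for line in f:
            key, _, urls = line.rstrip("\n").partition("\t")
            if key == word:
                return json.loads(urls)
    return []  # key word not present in the file

# Hypothetical example data, written to a temporary directory.
index = {"cat": {"http://example.com/cats"}, "dog": {"http://example.com/dogs"}}
path = os.path.join(tempfile.mkdtemp(), "index.tsv")
save_index(index, path)
print(lookup("cat", path))
```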
Every search engine employs its own spider programs. Google has its own disc space to store all the static files, whereas Yahoo does not. Yahoo depends on another organization (I think netlap or netapp, or maybe another) to run the searching programs. That organization does the web crawling, and Yahoo just uses its discs, searches them, and builds a web page consisting of many links.
Please tell me if I am wrong or if I miss anything. Thank you all in advance.