Saturday, January 27, 2007

Official Google Blog: Controlling how search engines access and index your website

"Search engines like Google read through all this information and create an index of it. The index allows a search engine to take a query from users and show all the pages on the web that match it.

In order to do this, Google has a set of computers that continually crawl the web. They have a list of all the websites that Google knows about and read all the pages on each of those sites. Together these machines are known as the Googlebot. In general, you want Googlebot to access your site so your web pages can be found by people searching on Google.

However, you may have a few pages on your site you don't want in Google's index. For example, you might have a directory that contains internal logs, or you may have news articles that require payment to access. You can exclude pages from Google's crawler by creating a text file called robots.txt and placing it in the root directory of your site. The robots.txt file contains a list of the pages that search engines shouldn't access. Creating a robots.txt file is straightforward, and it allows you a sophisticated level of control over how search engines can access your web site."
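To illustrate the idea, here is a minimal sketch of what such a robots.txt might look like, using hypothetical directory names (/internal-logs/ and /premium-articles/ are placeholders, not paths from the post):

```
# Rules applying to all crawlers, including Googlebot
User-agent: *
# Keep internal logs out of the index
Disallow: /internal-logs/
# Keep paid news articles out of the index
Disallow: /premium-articles/
```

Each Disallow line tells compliant crawlers not to fetch URLs under that path; an empty Disallow value would permit everything.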
