Information. Integration. Distribution: Controlling the crawlers

Sunday, January 28, 2007

Controlling the crawlers

robots.txt is a standard plain-text file that any webmaster can place in the root directory of a website. It contains instructions telling web crawlers which pages (or whole sections of the site) they should or should not crawl and index.
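To make this concrete, here is a minimal sketch of how a well-behaved crawler might consult robots.txt before fetching a page. It uses Python's standard `urllib.robotparser` module and parses the rules from a string instead of downloading them. The site `example.com`, the `/private/` and `/tmp/` paths and the crawler name `AnyBot` are all hypothetical.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt, as it might be served at https://example.com/robots.txt
ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
Disallow: /tmp/
"""

def build_parser(text: str) -> RobotFileParser:
    """Parse robots.txt content directly, without fetching it over the network."""
    parser = RobotFileParser()
    parser.parse(text.splitlines())
    return parser

parser = build_parser(ROBOTS_TXT)

# Ask whether a crawler named "AnyBot" (hypothetical) may fetch each URL.
for url in ("https://example.com/index.html",
            "https://example.com/private/report.html",
            "https://example.com/tmp/cache.html"):
    allowed = parser.can_fetch("AnyBot", url)
    print(f"{url}: {'allowed' if allowed else 'disallowed'}")
```

Keep in mind that robots.txt is purely advisory. It only works if the crawler chooses to check it, as this sketch does.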

Google has started an interesting series of posts on how to use robots.txt to control Googlebot itself.
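Rules can target one specific crawler by naming it in a `User-agent` line. The sketch below is hypothetical: the paths and the second crawler's name are made up. It gives Googlebot its own group and blocks every other crawler through the catch-all `*` group, then checks the result with Python's `urllib.robotparser`. Note that it shows how that Python module resolves groups. Google's posts describe Googlebot's own matching rules.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: Googlebot gets its own group, and every other
# crawler falls through to the catch-all "*" group, which blocks the whole site.
ROBOTS_TXT = """\
User-agent: Googlebot
Disallow: /drafts/

User-agent: *
Disallow: /
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

# robotparser uses the first named group whose token appears in the crawler's
# name, and falls back to the "*" group only when no named group applies.
checks = [
    ("Googlebot", "https://example.com/index.html"),
    ("Googlebot", "https://example.com/drafts/post.html"),
    ("SomeOtherBot", "https://example.com/index.html"),
]
for agent, url in checks:
    verdict = "allowed" if parser.can_fetch(agent, url) else "disallowed"
    print(f"{agent:>12} {url}: {verdict}")
```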

There is also a comprehensive list of the web robots out there. What's even more interesting is that the list contains almost 300 web crawlers that are crawling our sites every day.
