很多时候,我们都希望网站被大部分的搜索引擎抓取,以此来获取更多流量,实现价值,但是不少小型站点因为不可预知的原因导致大量搜索引擎蜘蛛出啊去网站,势必会暂用很大流量 […]
很多时候,我们都希望网站被大部分的搜索引擎抓取,以此来获取更多流量,实现价值,但是不少小型站点因为不可预知的原因导致大量搜索引擎蜘蛛出啊去网站,势必会暂用很大流量,如下所示:
我们一般可以在网站的访问日志里看到蜘蛛的爬行记录,如果蜘蛛爬行过多,会造成网站服务器崩溃,影响正常用户的体验。于是,我们需要对一些无用的搜索引擎蜘蛛进行封禁,禁止其爬取我们的网站,余斗一般不建议封禁国内的主流搜索引擎蜘蛛,常见的几种搜索引擎蜘蛛如下:
google蜘蛛:googlebot
百度蜘蛛:baiduspider
yahoo蜘蛛:slurp
alexa蜘蛛:ia_archiver
msn蜘蛛:msnbot
bing蜘蛛:bingbot
altavista蜘蛛:scooter
lycos蜘蛛:lycos_spider_(t-rex)
alltheweb蜘蛛:fast-webcrawler
inktomi蜘蛛:slurp
有道蜘蛛:YodaoBot和OutfoxBot
热土蜘蛛:Adminrtspider
搜狗蜘蛛:sogou spider
SOSO蜘蛛:sosospider
360搜蜘蛛:360spider
Linux下 规则文件.htaccess(手工创建.htaccess文件到站点根目录):
<IfModule mod_rewrite.c>
RewriteEngine On
#Block spider
RewriteCond %{HTTP_USER_AGENT} "Webdup|AcoonBot|AhrefsBot|Ezooms|EdisterBot|EC2LinkFinder|jikespider|Purebot|MJ12bot|WangIDSpider|WBSearchBot|Wotbox|xbfMozilla|Yottaa|YandexBot|Jorgee|SWEBot|spbot|TurnitinBot-Agent|mail.RU|curl|perl|Python|Wget|Xenu|ZmEu" [NC]
RewriteRule !(^robots\.txt$) - [F]
</IfModule>
windows2003下修改规则文件httpd.conf(在虚拟主机控制面板中用“ISAPI筛选器自定义设置 ” 开启自定义伪静态 Isapi_Rewite3.1):
#Block spider
RewriteCond %{HTTP_USER_AGENT} (Webdup|AcoonBot|AhrefsBot|Ezooms|EdisterBot|EC2LinkFinder|jikespider|Purebot|MJ12bot|WangIDSpider|WBSearchBot|Wotbox|xbfMozilla|Yottaa|YandexBot|Jorgee|SWEBot|spbot|TurnitinBot-Agent|mail.RU|curl|perl|Python|Wget|Xenu|ZmEu) [NC]
RewriteRule !(^/robots.txt$) - [F]
windows2008下修改根目录配置文件web.config:
<?xml version="1.0" encoding="UTF-8"?>
<configuration>
<system.webServer>
<rewrite>
<rules>
<rule name="Block spider">
<match url="(^robots.txt$)" ignoreCase="false" negate="true"/>
<conditions>
<add input="{HTTP_USER_AGENT}" pattern="Webdup|AcoonBot|AhrefsBot|Ezooms|EdisterBot|EC2LinkFinder|jikespider|Purebot|MJ12bot|WangIDSpider|WBSearchBot|Wotbox|xbfMozilla|Yottaa|YandexBot|Jorgee|SWEBot|spbot|TurnitinBot-Agent|curl|perl|Python|Wget|Xenu|ZmEu" ignoreCase="true"/>
</conditions>
<action type="CustomResponse" statusCode="403" statusReason="Forbidden" statusDescription="Forbidden"/>
</rule>
</rules>
</rewrite>
</system.webServer>
</configuration>
注:规则中默认屏蔽部分不明蜘蛛,要屏蔽其他蜘蛛按规则添加即可,对照修改代码中Webdup|AcoonBot|AhrefsBot|Ezooms|EdisterBot|EC2LinkFinder|jikespider|Purebot|MJ12bot|WangIDSpider|WBSearchBot|Wotbox|xbfMozilla|Yottaa|YandexBot|Jorgee|SWEBot|spbot|TurnitinBot-Agent|mail.RU|curl|perl|Python|Wget|Xenu|ZmEu部分来增删自己要封禁的蜘蛛即可。