大量的蜘蛛爬虫访问会消耗服务器性能开销,更有工具类爬虫对网站进行渗透访问,给网站安全造成威胁,本文分享这些爬虫的 User-Agent 以及阻止方法。
现在大部分网站都使用CDN进行加速,建议直接在CDN设置 User-Agent 黑名单
阿里云全站加速 DCDN 设置方法如图所示,在图中填入 User-Agent
*dotbot*|*Go-http-client*|*CensysInspect*|*okhttp*|*MegaIndex*|*MegaIndex.ru*|*BLEXBot*|*Qwantify*|*qwantify*|*semrush*|*Semrush*|*serpstatbot*|*hubspot*|*python*|*Bytespider*|*Go-http-client*|*Java*|*PhantomJS*|*SemrushBot*|*Scrapy*|*Webdup*|*AcoonBot*|*AhrefsBot*|*Ezooms*|*EdisterBot*|*EC2LinkFinder*|*jikespider*|*Purebot*|*MJ12bot*|*WangIDSpider*|*WBSearchBot*|*Wotbox*|*xbfMozilla*|*Yottaa*|*YandexBot*|*Jorgee*|*SWEBot*|*spbot*|*TurnitinBot-Agent*|*mail.RU*|*Perl*|*Python*|*Wget*|*Xenu*|*ZmEu*
data:image/s3,"s3://crabby-images/ca446/ca446bf079a87d9500709cd8998f12c517b9ad66" alt="全站加速 DCDN 设置 User-Agent黑/白名单 全站加速 DCDN 设置 User-Agent黑/白名单"
Cloudflare设置方法如图,若使用表达式生成器手动一个个添加将耗费太多时间,直接编辑表达式填入如下表达式即可
(http.user_agent contains "Go-http-client") or (http.user_agent contains "CensysInspect") or (http.user_agent contains "okhttp") or (http.user_agent contains "MegaIndex") or (http.user_agent contains "MegaIndex.ru") or (http.user_agent contains "BLEXBot") or (http.user_agent contains "Qwantify") or (http.user_agent contains "qwantify") or (http.user_agent contains "semrush") or (http.user_agent contains "Semrush") or (http.user_agent contains "serpstatbot") or (http.user_agent contains "hubspot") or (http.user_agent contains "python") or (http.user_agent contains "Bytespider") or (http.user_agent contains "Go-http-client") or (http.user_agent contains "Java") or (http.user_agent contains "PhantomJS") or (http.user_agent contains "SemrushBot") or (http.user_agent contains "Scrapy") or (http.user_agent contains "Webdup") or (http.user_agent contains "AcoonBot") or (http.user_agent contains "AhrefsBot") or (http.user_agent contains "Ezooms") or (http.user_agent contains "EdisterBot") or (http.user_agent contains "EC2LinkFinder") or (http.user_agent contains "jikespider") or (http.user_agent contains "Purebot") or (http.user_agent contains "MJ12bot") or (http.user_agent contains "WangIDSpider") or (http.user_agent contains "WBSearchBot") or (http.user_agent contains "Wotbox") or (http.user_agent contains "xbfMozilla") or (http.user_agent contains "Yottaa") or (http.user_agent contains "YandexBot") or (http.user_agent contains "Jorgee") or (http.user_agent contains "SWEBot") or (http.user_agent contains "spbot") or (http.user_agent contains "TurnitinBot-Agent") or (http.user_agent contains "mail.RU") or (http.user_agent contains "perl") or (http.user_agent contains "Python") or (http.user_agent contains "Wget") or (http.user_agent contains "Xenu") or (http.user_agent contains "ZmEu")
data:image/s3,"s3://crabby-images/7fec9/7fec9f6ff50c62f88853488c18b92550b0bfc840" alt="Cloudflare编辑表达式阻止UA Cloudflare编辑表达式阻止UA"
在本机装有WAF的情况,例如宝塔WAF,直接导入即可,格式是一行一个UA
data:image/s3,"s3://crabby-images/5509d/5509d5888aa1847cf8b08bb0852f96d9eb4bd059" alt="宝塔WAF设置UA黑名单 宝塔WAF设置UA黑名单"
可使用AI工具处理格式
data:image/s3,"s3://crabby-images/18522/18522b4a1090a0f70e2b73f9a6f2cff165b3f249" alt="AI处理文本格式 AI处理文本格式"
在Nginx配置代码如下:
if ($http_user_agent ~ "MegaIndex|MegaIndex.ru|BLEXBot|Qwantify|qwantify|semrush|Semrush|serpstatbot|hubspot|python|Bytespider|Go-http-client|Java|PhantomJS|SemrushBot|Scrapy|Webdup|AcoonBot|AhrefsBot|Ezooms|EdisterBot|EC2LinkFinder|jikespider|Purebot|MJ12bot|WangIDSpider|WBSearchBot|Wotbox|xbfMozilla|Yottaa|YandexBot|Jorgee|SWEBot|spbot|TurnitinBot-Agent|mail.RU|perl|Python|Wget|Xenu|ZmEu|^$" ) {
return 444;
}
data:image/s3,"s3://crabby-images/afb02/afb0261ae061e1170fd797b007625944c7450bf9" alt="在Nginx配置禁止UA访问 在Nginx配置禁止UA访问"