Blocking Search Engine and Malicious Spider Crawlers

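Heavy crawler traffic consumes server resources, and some tool-based crawlers go further and probe a site for weaknesses, which threatens its security. This article shares the User-Agent strings of these crawlers and the ways to block them.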

Most websites now sit behind a CDN for acceleration, so the recommended approach is to configure a User-Agent blacklist directly on the CDN.

For Alibaba Cloud DCDN, configure the blacklist as shown in the figure and enter the following User-Agent values:

*dotbot*|*Go-http-client*|*CensysInspect*|*okhttp*|*MegaIndex*|*MegaIndex.ru*|*BLEXBot*|*Qwantify*|*qwantify*|*semrush*|*Semrush*|*serpstatbot*|*hubspot*|*python*|*Bytespider*|*Go-http-client*|*Java*|*PhantomJS*|*SemrushBot*|*Scrapy*|*Webdup*|*AcoonBot*|*AhrefsBot*|*Ezooms*|*EdisterBot*|*EC2LinkFinder*|*jikespider*|*Purebot*|*MJ12bot*|*WangIDSpider*|*WBSearchBot*|*Wotbox*|*xbfMozilla*|*Yottaa*|*YandexBot*|*Jorgee*|*SWEBot*|*spbot*|*TurnitinBot-Agent*|*mail.RU*|*Perl*|*Python*|*Wget*|*Xenu*|*ZmEu*

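For Cloudflare, configure the rule as shown in the figure. Adding the entries one by one in the expression builder takes too long, so switch to editing the expression directly and paste in the following: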

(http.user_agent contains "Go-http-client") or (http.user_agent contains "CensysInspect") or (http.user_agent contains "okhttp") or (http.user_agent contains "MegaIndex") or (http.user_agent contains "MegaIndex.ru") or (http.user_agent contains "BLEXBot") or (http.user_agent contains "Qwantify") or (http.user_agent contains "qwantify") or (http.user_agent contains "semrush") or (http.user_agent contains "Semrush") or (http.user_agent contains "serpstatbot") or (http.user_agent contains "hubspot") or (http.user_agent contains "python") or (http.user_agent contains "Bytespider") or (http.user_agent contains "Go-http-client") or (http.user_agent contains "Java") or (http.user_agent contains "PhantomJS") or (http.user_agent contains "SemrushBot") or (http.user_agent contains "Scrapy") or (http.user_agent contains "Webdup") or (http.user_agent contains "AcoonBot") or (http.user_agent contains "AhrefsBot") or (http.user_agent contains "Ezooms") or (http.user_agent contains "EdisterBot") or (http.user_agent contains "EC2LinkFinder") or (http.user_agent contains "jikespider") or (http.user_agent contains "Purebot") or (http.user_agent contains "MJ12bot") or (http.user_agent contains "WangIDSpider") or (http.user_agent contains "WBSearchBot") or (http.user_agent contains "Wotbox") or (http.user_agent contains "xbfMozilla") or (http.user_agent contains "Yottaa") or (http.user_agent contains "YandexBot") or (http.user_agent contains "Jorgee") or (http.user_agent contains "SWEBot") or (http.user_agent contains "spbot") or (http.user_agent contains "TurnitinBot-Agent") or (http.user_agent contains "mail.RU") or (http.user_agent contains "perl") or (http.user_agent contains "Python") or (http.user_agent contains "Wget") or (http.user_agent contains "Xenu") or (http.user_agent contains "ZmEu")

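If a WAF is installed on the server itself, for example the BT Panel (宝塔) WAF, import the list directly. The expected format is one UA per line.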

An AI tool can be used to convert the list into the required format.
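The conversion can also be scripted. The following is a minimal Python sketch, not part of the original setup: the function names (dedupe, to_dcdn, to_cloudflare, to_lines, to_nginx_regex) and the short sample list are hypothetical, and the output shapes simply mirror the formats shown in this article.

```python
import re

def dedupe(agents):
    """Drop exact duplicates while keeping the original order."""
    seen = set()
    result = []
    for ua in agents:
        if ua not in seen:
            seen.add(ua)
            result.append(ua)
    return result

def to_dcdn(agents):
    """Wildcard form used for the DCDN blacklist: *a*|*b*"""
    return "|".join(f"*{ua}*" for ua in agents)

def to_cloudflare(agents):
    """Expression form: (http.user_agent contains "a") or (...)"""
    return " or ".join(f'(http.user_agent contains "{ua}")' for ua in agents)

def to_lines(agents):
    """One UA per line, the format described for WAF import."""
    return "\n".join(agents)

def to_nginx_regex(agents):
    """Regex alternation; re.escape makes characters such as '.' match literally."""
    return "|".join(re.escape(ua) for ua in agents)

if __name__ == "__main__":
    # Illustrative sample only; substitute the full list from this article.
    sample = ["MJ12bot", "AhrefsBot", "mail.RU", "MJ12bot"]
    agents = dedupe(sample)
    print(to_dcdn(agents))
    print(to_cloudflare(agents))
    print(to_lines(agents))
    print(to_nginx_regex(agents))
```

Keeping a single source list and generating each format from it avoids the lists drifting apart between the CDN, the WAF, and Nginx.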

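For Nginx, use the following configuration. The ~ operator performs a case-sensitive match, and the trailing ^$ alternative also blocks requests with an empty User-Agent: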

    if ($http_user_agent ~ "MegaIndex|MegaIndex.ru|BLEXBot|Qwantify|qwantify|semrush|Semrush|serpstatbot|hubspot|python|Bytespider|Go-http-client|Java|PhantomJS|SemrushBot|Scrapy|Webdup|AcoonBot|AhrefsBot|Ezooms|EdisterBot|EC2LinkFinder|jikespider|Purebot|MJ12bot|WangIDSpider|WBSearchBot|Wotbox|xbfMozilla|Yottaa|YandexBot|Jorgee|SWEBot|spbot|TurnitinBot-Agent|mail.RU|perl|Python|Wget|Xenu|ZmEu|^$" ) {
        # 444 is Nginx-specific: close the connection without sending a response
        return 444;
    }
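
Before deploying, it can help to check which User-Agent strings a rule like this would catch. The snippet below is a rough sketch under stated assumptions: PATTERN is a shortened, illustrative subset of the list above, is_blocked is a hypothetical helper, and Python's re module stands in for Nginx's regex engine, which behaves the same way for a plain alternation like this one.

```python
import re

# Shortened, illustrative subset of the pattern in the Nginx rule above.
PATTERN = re.compile(r"python|Python|Java|Wget|MJ12bot|^$")

def is_blocked(user_agent):
    """True if the pattern matches anywhere in the UA (case-sensitive, unanchored)."""
    return PATTERN.search(user_agent) is not None

if __name__ == "__main__":
    samples = [
        "Wget/1.21",
        "",  # empty User-Agent, caught by ^$
        "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36",
        "python-requests/2.31.0",
        "Python-urllib/3.11",
    ]
    for ua in samples:
        print(repr(ua), is_blocked(ua))
```

Because the match is a plain substring search, short entries such as Java or python block every client whose User-Agent contains them, so it is worth running real access-log samples through a check like this first.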