Wednesday, January 10, 2018

Proxy Blocking Abot Spider on Windows

I like the Abot web crawler. It's written in C# and easy to use. Abot crawls a website so you can get a site map for analysis. Our company's proxy blocks Abot from working. The fix on Windows is simple: set the HTTP_PROXY environment variable in the same command prompt you will run Abot from, like this:
set HTTP_PROXY=http://proxy_userid:proxy_password@proxy_addr:proxy_port
For example:
set HTTP_PROXY=http://katniss.everdeen:iMissRue@proxy.district9.gov:80
You also need to set the proxy for HTTPS_PROXY. Then you can run the spider:
cd c:\Git\Abot\abot-master\Abot.Demo\bin\Debug
Abot.Demo.exe https://www.cnn.com > cnn.txt
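
One thing to watch with the proxy URL format above: a password containing characters like @ or : would make the URL ambiguous. Below is a minimal Python sketch of one way to handle that, by percent-encoding the credentials before building the value for both variables. The function names build_proxy_url and proxy_env are made up for this illustration, and it assumes the same proxy handles both HTTP and HTTPS. I have not verified that Abot decodes percent-encoded credentials, so treat this as something to try rather than a confirmed fix.

```python
import os
from urllib.parse import quote


def build_proxy_url(user, password, host, port):
    """Return http://user:password@host:port with the credentials percent-encoded."""
    # safe="" makes quote() also encode reserved characters such as '@', ':' and '/'.
    return "http://{}:{}@{}:{}".format(
        quote(user, safe=""), quote(password, safe=""), host, port
    )


def proxy_env(user, password, host, port):
    """Copy the current environment and add both proxy variables (hypothetical helper)."""
    url = build_proxy_url(user, password, host, port)
    env = dict(os.environ)
    env["HTTP_PROXY"] = url
    env["HTTPS_PROXY"] = url  # assumption: the same proxy is used for HTTPS traffic
    return env
```

The dictionary returned by proxy_env can be passed as the env argument of subprocess.run when launching Abot.Demo.exe from Python, which sidesteps typing the encoded value into the command prompt by hand.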
