I like Abot Web Crawler. It's written in C# and easy to use.
Abot will crawl a website so you can get a site map for analysis.
Our company's proxy blocks Abot from working. The fix is simple on Windows: set the HTTP_PROXY environment variable like this:
set HTTP_PROXY=http://proxy_userid:proxy_password@proxy_addr:proxy_port
For example:
set HTTP_PROXY=http://katniss.everdeen:iMissRue@proxy.district9.gov:80
You also need to set the proxy in the HTTPS_PROXY variable.
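Here is a rough sketch of the HTTPS_PROXY step, reusing the made-up credentials from the example above. It assumes the same proxy host and port also handle HTTPS traffic, which may not be true on your network, so check with whoever runs your proxy.

```batch
rem Assumption: the same proxy host and port also handle HTTPS traffic.
set HTTPS_PROXY=http://katniss.everdeen:iMissRue@proxy.district9.gov:80

rem Print both variables to confirm they are set in this session.
echo %HTTP_PROXY%
echo %HTTPS_PROXY%
```

Variables created with set only last for the current Command Prompt window, so run Abot from the same window where you set them.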
Then you can run the spider:
cd c:\Git\Abot\abot-master\Abot.Demo\bin\Debug
Abot.Demo.exe https://www.cnn.com > cnn.txt