Web-spider
A web spider, also called a web crawler or bot, is a program that automatically browses the internet and collects information from websites.
A simple spider follows a straightforward process:
Visits a webpage
Reads its content
Follows links to other pages
Repeats the process over and over
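The four steps above can be sketched in a few lines of Python. This is a minimal illustration using only the standard library, not a production crawler: the names `LinkExtractor`, `fetch`, `crawl`, `fetch_page` and `max_pages` are hypothetical and chosen for this example, and the page limit is an assumption added so the loop always terminates.

```python
from collections import deque
from html.parser import HTMLParser
from urllib.parse import urljoin, urldefrag
from urllib.request import urlopen


class LinkExtractor(HTMLParser):
    """Collects the href value of every <a> tag in a page."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def fetch(url):
    """Download a page and return its HTML as text (needs network access)."""
    with urlopen(url, timeout=10) as response:
        return response.read().decode("utf-8", errors="replace")


def crawl(start_url, fetch_page=fetch, max_pages=10):
    """Breadth-first crawl from start_url; returns {url: html} for visited pages."""
    queue = deque([start_url])
    seen = {start_url}
    pages = {}
    while queue and len(pages) < max_pages:
        url = queue.popleft()          # 1. visit a webpage
        try:
            html = fetch_page(url)     # 2. read its content
        except Exception:
            continue                   # skip pages that cannot be fetched
        pages[url] = html
        parser = LinkExtractor()
        parser.feed(html)
        for href in parser.links:      # 3. follow links to other pages
            # Resolve relative links and drop "#fragment" parts.
            link, _ = urldefrag(urljoin(url, href))
            if link.startswith(("http://", "https://")) and link not in seen:
                seen.add(link)
                queue.append(link)     # 4. repeat for each newly found page
    return pages
```

Passing the download function as `fetch_page` keeps the loop independent of the network, so it can be tried against an in-memory dictionary of pages before pointing it at a real site. A real spider would also restrict itself to allowed hosts, wait between requests, and honour robots.txt.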
Web spiders collect different kinds of information depending on their purpose: page content (for search engines or AI), links between pages (for SEO ranking), or data used for security purposes.
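Web spiders should limit their visits according to the robots.txt file, which is served from the root of the website (for example https://example.com/robots.txt). Some spiders ignore it, or even exploit it to their own advantage, for instance by treating its disallowed paths as hints about what the site owner wants kept out of view.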
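As a rough sketch of how a well-behaved spider might consult these rules, Python's standard `urllib.robotparser` module can parse a robots.txt file and answer whether a given URL may be fetched. The robots.txt content, the `ExampleBot` user agent, the `build_rules` helper and the example.com URLs below are all made up for illustration.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content, for illustration only.
ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
Crawl-delay: 5
"""


def build_rules(robots_txt):
    """Parse robots.txt text and return a parser that can be queried."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


rules = build_rules(ROBOTS_TXT)

# A public page is allowed, a page under /private/ is not.
print(rules.can_fetch("ExampleBot", "https://example.com/index.html"))
print(rules.can_fetch("ExampleBot", "https://example.com/private/data.html"))

# Requested pause between requests, in seconds (None if not specified).
print(rules.crawl_delay("ExampleBot"))
```

Against a live site, the same parser can load the file itself with `set_url()` followed by `read()` instead of `parse()`. Nothing enforces the answer: obeying robots.txt is a convention that each spider chooses to follow.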
See also What is a web crawler? | How web spiders work.
Related: Webscraping, robots.txt, sitemap, Webserver, CAPTCHA