Web-spider

A web spider, also called a web crawler or bot, is a program that automatically browses the internet and collects information from websites.

A simple web spider follows a straightforward process:

  • Visits a webpage

  • Reads its content

  • Follows links to other pages

  • Repeats the process over and over
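The steps above can be sketched as a short breadth-first loop. This is only a minimal illustration using the Python standard library, not a production crawler: the names `LinkParser`, `fetch`, `crawl`, and the `max_pages` limit are all assumptions made for this example, and a real spider would also need politeness delays, robots.txt checks, and better error handling.

```python
from collections import deque
from html.parser import HTMLParser
from urllib.parse import urljoin, urldefrag
from urllib.request import urlopen


class LinkParser(HTMLParser):
    """Collects the href value of every <a> tag on a page."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def fetch(url):
    """Download a page and return its HTML as text (needs network access)."""
    with urlopen(url, timeout=10) as response:
        return response.read().decode("utf-8", errors="replace")


def crawl(start_url, fetch_page=fetch, max_pages=10):
    """Visit pages breadth-first from start_url and return {url: html}."""
    queue = deque([start_url])
    seen = {start_url}
    pages = {}
    while queue and len(pages) < max_pages:   # repeat until done or limit hit
        url = queue.popleft()
        try:
            html = fetch_page(url)            # visit the page
        except Exception:
            continue                          # skip pages that fail to load
        pages[url] = html                     # read (here: store) its content
        parser = LinkParser()
        parser.feed(html)
        for href in parser.links:             # follow links to other pages
            link, _fragment = urldefrag(urljoin(url, href))
            if link.startswith(("http://", "https://")) and link not in seen:
                seen.add(link)
                queue.append(link)
    return pages
```

Passing the page-fetching function as a parameter is a design choice for this sketch: it lets the loop be tried against an in-memory dictionary of pages before it is pointed at a real site.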

Web spiders collect different kinds of information depending on their purpose: page content for search engines or AI, links between pages for SEO ranking, or information used for security purposes.

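Web spiders should limit their visits according to the robots.txt file, located at the root of the website. Some spiders don’t respect it, or even exploit it to their own advantage, for example by reading its list of disallowed paths to find pages the site owner wanted to keep out of view.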
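As an illustration of how a well-behaved spider might consult robots.txt before visiting a page, here is a small sketch using Python's standard `urllib.robotparser` module. The sample rules, the `ExampleSpider` user-agent name, and the `example.com` URLs are invented for this example.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content, as a site might serve it at /robots.txt.
SAMPLE_ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
"""


def load_rules(robots_text):
    """Parse robots.txt text into a rule set that can be queried."""
    rules = RobotFileParser()
    rules.parse(robots_text.splitlines())
    return rules


def may_visit(rules, url, user_agent="ExampleSpider"):
    """Return True if the rules allow this user agent to fetch the URL."""
    return rules.can_fetch(user_agent, url)


rules = load_rules(SAMPLE_ROBOTS_TXT)
public_ok = may_visit(rules, "https://example.com/index.html")
private_ok = may_visit(rules, "https://example.com/private/report.html")
```

With these sample rules, `public_ok` is `True` and `private_ok` is `False`. For a live site, the same parser can download the file itself through its `set_url()` and `read()` methods instead of being given text.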

Documentation

See also What is a web crawler? | How web spiders work.

Related: Webscraping, robots.txt, sitemap, Webserver, CAPTCHA