robots.txt
The robots.txt file is a simple text file placed in a website's root directory. It tells web-spiders, such as search engine bots, which pages or sections of the site they may or may not access.
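robots.txt helps manage bot traffic, so that crawlers do not overload the server or index private or irrelevant pages. It is advisory: well-behaved bots follow it, but nothing forces a bot to comply.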
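As an illustration of how a well-behaved bot might consult these rules, here is a minimal sketch using Python's standard `urllib.robotparser` module. The file content, the `ExampleBot` user agent and the `example.com` URLs are assumptions made up for this example, not taken from a real site.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content, for illustration only.
ROBOTS_TXT = """\
User-agent: *
Disallow: /admin/
Disallow: /private/
Allow: /
"""


def build_parser(robots_txt: str) -> RobotFileParser:
    """Parse robots.txt text that is already in memory (no network access)."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


parser = build_parser(ROBOTS_TXT)

# A polite bot asks before fetching each URL.
print(parser.can_fetch("ExampleBot", "https://example.com/index.html"))   # True
print(parser.can_fetch("ExampleBot", "https://example.com/admin/login"))  # False
```

Against a live site, the same parser can load the file with `set_url()` followed by `read()` instead of `parse()`.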
robots.txt tends to leak the paths that robots should not index, which are often the places where attackers try to log in.
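To see what a site's own robots.txt discloses, one option is to list every path named in a `Disallow` rule and review it. The sketch below is a simplified line-based reader, not a full parser, and the sample content and the `disallowed_paths` helper are hypothetical.

```python
# Hypothetical robots.txt content, for illustration only.
SAMPLE = """\
User-agent: *
Disallow: /admin/   # back office
Disallow: /backup/
Disallow:
Allow: /
"""


def disallowed_paths(robots_txt: str) -> list[str]:
    """Return every non-empty Disallow value, in file order."""
    paths = []
    for line in robots_txt.splitlines():
        # Drop trailing comments and surrounding whitespace.
        line = line.split("#", 1)[0].strip()
        if ":" not in line:
            continue
        field, value = line.split(":", 1)
        # Field names are case-insensitive; an empty Disallow blocks nothing.
        if field.strip().lower() == "disallow" and value.strip():
            paths.append(value.strip())
    return paths


print(disallowed_paths(SAMPLE))  # ['/admin/', '/backup/']
```

Anything in that list is visible to every visitor, so sensitive areas still need real access control rather than a robots.txt entry alone.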
Related : Webscraping, Web-spider, Search Engine
Related packages : fkrzski/robots-txt