howtosumo.com
robots.txt

Robots Exclusion Standard data for howtosumo.com

Resource Scan

Scan Details

Site Domain howtosumo.com
Base Domain howtosumo.com
Scan Status Failed
Failure Stage Fetching resource.
Failure Reason Couldn't connect to server.
Last Scan 2024-08-28T17:17:10+00:00
Next Scan 2024-11-26T17:17:10+00:00

Last Successful Scan

Scanned 2024-02-01T16:25:32+00:00
URL https://howtosumo.com/robots.txt
Domain IPs 104.21.94.105, 172.67.222.92, 2606:4700:3031::ac43:de5c, 2606:4700:3032::6815:5e69
Response IP 104.21.94.105
Found Yes
Hash 5b0e0b30f9018e6677ffa85bf09284a25ff2fc1a8bd8822ec83c2c738f2b9f15
SimHash 6c5041c9c5f7
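
The Hash above is a 64-character hexadecimal string, which is consistent with a SHA-256 digest of the robots.txt body. Below is a minimal Python sketch of recomputing such a fingerprint and comparing it against the recorded value; treating the digest as SHA-256 over the raw response bytes is our assumption, since the report does not state exactly what is hashed.

import hashlib
import urllib.request

# Value recorded by the last successful scan above. Interpreting it as a
# SHA-256 digest of the raw response body is an assumption.
RECORDED_HASH = "5b0e0b30f9018e6677ffa85bf09284a25ff2fc1a8bd8822ec83c2c738f2b9f15"

with urllib.request.urlopen("https://howtosumo.com/robots.txt") as resp:
    body = resp.read()

digest = hashlib.sha256(body).hexdigest()
print("fetched digest:", digest)
print("matches recorded value:", digest == RECORDED_HASH)

Note that the file may have changed since 2024-02-01, and the most recent scan could not connect to the server at all, so the digests may no longer match.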

Groups

anthropic-ai

Rule Path
Disallow /

archive.org

Rule Path
Disallow /api.php
Disallow /index.php
Disallow /Special%3A

ccbot

Rule Path
Disallow /

doc

Rule Path
Disallow /

download ninja

Rule Path
Disallow /

fetch

Rule Path
Disallow /

gptbot

Rule Path
Disallow /

hmse_robot

Rule Path
Disallow /

httrack

Rule Path
Disallow /

k2spider

Rule Path
Disallow /

larbin

Rule Path
Disallow /

libwww

Rule Path
Disallow /

linko

Rule Path
Disallow /

microsoft.url.control

Rule Path
Disallow /

msiecrawler

Rule Path
Disallow /

npbot

Rule Path
Disallow /

offline explorer

Rule Path
Disallow /

sitecheck.internetseer.com

Rule Path
Disallow /

sitesnagger

Rule Path
Disallow /

teleport

Rule Path
Disallow /

teleportpro

Rule Path
Disallow /

ubicrawler

Rule Path
Disallow /

webcopier

Rule Path
Disallow /

webreaper

Rule Path
Disallow /

webstripper

Rule Path
Disallow /

webzip

Rule Path
Disallow /

wget

Rule Path
Disallow /

xenu

Rule Path
Disallow /

zao

Rule Path
Disallow /

zealbot

Rule Path
Disallow /

zyborg

Rule Path
Disallow /

adsbot-google

Rule Path
Allow /

mediapartners-google

Rule Path
Allow /

googlebot

Rule Path
Allow /Special%3ANewPages
Allow /Special%3ASitemap
Allow /Special%3ACategoryListing
Allow /

*

Rule Path
Allow /Special%3ABlock
Allow /Special%3ABlockList
Allow /Special%3ACategorylisting
Allow /Special%3ACategoryListing
Allow /Special%3ACharity
Allow /Special%3AEmailUser
Allow /Special%3ALSearch
Allow /Special%3ANewPages
Allow /Special%3AQABox
Allow /Special%3ASearchAd
Allow /Special%3ASitemap
Allow /Special%3AThankAuthors
Allow /Special%3AUserLogin
Allow /index.php?*action=credits
Allow /index.php?*MathShowImage
Allow /index.php?*printable
Disallow /index.php
Disallow /*feed%3Drss
Disallow /*action%3Ddelete
Disallow /*action%3Dhistory
Disallow /Special%3A
Disallow /*platform%3D
Disallow /*variant%3D
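
Taken together, the groups above shut out AI-training and site-copying crawlers entirely, give Google's bots blanket access, and let everyone else read articles while keeping them off dynamically generated pages. A minimal sketch of how a few of these rules behave, using Python's standard urllib.robotparser on a small subset of the groups reassembled as robots.txt text (this parser matches paths by literal prefix and does not expand the * wildcards used inside some of the catch-all group's paths):

import urllib.robotparser

# A few of the groups listed above, rebuilt as robots.txt directives.
ROBOTS_TXT = """\
User-agent: GPTBot
Disallow: /

User-agent: CCBot
Disallow: /

User-agent: *
Disallow: /index.php
"""

parser = urllib.robotparser.RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

# AI crawlers are barred from the whole site.
print(parser.can_fetch("GPTBot", "https://howtosumo.com/Main-Page"))     # False
# Other bots fall through to the catch-all group: the dynamic entry point
# is disallowed, while unmatched paths are allowed by default.
print(parser.can_fetch("MyBot/1.0", "https://howtosumo.com/index.php"))  # False
print(parser.can_fetch("MyBot/1.0", "https://howtosumo.com/Main-Page"))  # True

The agent name "MyBot/1.0" and the path "/Main-Page" are placeholders; any agent not named in a group, and any path not matched by a rule, behave the same way.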

Comments

  • robots.txt for https://www.wikihow.com
  • based on wikipedia.org's robots.txt
  • Crawlers that are kind enough to obey, but which we'd rather not have unless they're feeding search engines.
  • Sitemap: https://www.wikihow.com/sitemap_index.xml
  • If your bot supports such a thing using the 'Crawl-delay' or another instruction, please let us know. We can add it to our robots.txt.
  • Friendly, low-speed bots are welcome viewing article pages, but not dynamically-generated pages please. Article pages contain our site's real content.
  • Requests many pages per second
  • http://www.nameprotect.com/botinfo.html
  • Some bots are known to be trouble, particularly those designed to copy entire sites. Please obey robots.txt.
  • wget in recursive mode uses too many resources for us. Please read the man page and use it properly; there is a --wait option you can use to set the delay between hits, for instance. Please wait 3 seconds between each request.
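
The last comment asks recursive downloaders such as wget to leave 3 seconds between hits. A minimal Python sketch of the same courtesy in a script-based crawler, combining the live robots.txt rules with a fixed delay (the bot name and article paths below are placeholders):

import time
import urllib.request
import urllib.robotparser

BASE = "https://howtosumo.com"

# Fetch and parse the live robots.txt so the groups above are honoured.
robots = urllib.robotparser.RobotFileParser(BASE + "/robots.txt")
robots.read()

# Placeholder paths; a real crawl would take article URLs from the sitemap.
paths = ["/Main-Page", "/Special:Sitemap"]

for path in paths:
    url = BASE + path
    if not robots.can_fetch("FriendlyLowSpeedBot/1.0", url):
        continue  # skip anything the rules disallow for unnamed bots
    with urllib.request.urlopen(url) as resp:
        page = resp.read()
    # Per the site's request: 3 seconds between requests.
    time.sleep(3)

For wget itself, the --wait option mentioned in the comment covers the same ground (for example, --wait=3 in recursive mode).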