patch.com
robots.txt

Robots Exclusion Standard data for patch.com

Resource Scan

Scan Details

Site Domain patch.com
Base Domain patch.com
Scan Status Ok
Last Scan 2024-05-17T19:28:52+00:00
Next Scan 2024-05-24T19:28:52+00:00

Last Scan

Scanned 2024-05-17T19:28:52+00:00
URL https://patch.com/robots.txt
Domain IPs 151.101.130.133, 151.101.194.133, 151.101.2.133, 151.101.66.133
Response IP 151.101.194.133
Found Yes
Hash 5da3585b441b1b1344ca7effbb9ef084f39361ced1fab3f15b3d8fd2ffcd957e
SimHash 113c7b98e672
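The Hash value above is 64 hexadecimal characters, which is the length of a SHA-256 digest. The report does not name the algorithm or say which bytes were hashed, so the following is only a sketch under the assumption that it is SHA-256 over the raw response body. The function names `body_digest` and `has_changed` are hypothetical and are not part of the scanner.

```python
import hashlib


def body_digest(body: bytes) -> str:
    """Return the hex SHA-256 digest of a fetched robots.txt body.

    Assumption: the report's Hash field is SHA-256 over the raw bytes.
    """
    return hashlib.sha256(body).hexdigest()


def has_changed(body: bytes, previous_digest: str) -> bool:
    """Compare a freshly fetched body against a digest stored from an earlier scan."""
    return body_digest(body) != previous_digest.lower()
```

A digest like this only detects that the file changed at all. A similarity hash such as the SimHash field is normally used to judge how much it changed.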

Groups

ccbot

Rule Path
Disallow /

ccbot/2.0

Rule Path
Disallow /

ccbot/2.0 (http://commoncrawl.org/faq/)

Rule Path
Disallow /

gptbot

Rule Path
Disallow /

chatgpt-user

Rule Path
Disallow /

anthropic-ai

Rule Path
Disallow /

cohere-ai

Rule Path
Disallow /

wikido

Rule Path
Disallow /

fr_crawler

Rule Path
Disallow /

yandex

Rule Path
Disallow /

baiduspider

Rule Path
Disallow /

baiduspider-image

Rule Path
Disallow /

baiduspider-video

Rule Path
Disallow /

baiduspider-favo

Rule Path
Disallow /

baiduspider-news

Rule Path
Disallow /

baiduspider-cpro

Rule Path
Disallow /

baiduspider-ads

Rule Path
Disallow /

trendictionbot

Rule Path
Disallow /

bitvorebot

Rule Path
Disallow /

blp_bbot

Rule Path
Disallow /

heritrix

Rule Path
Disallow /

magpie-crawler

Rule Path
Disallow /

kraken

Rule Path
Disallow /

moatbot

Rule Path
Disallow /

bhcbot

Rule Path
Disallow /

semrushbot

Rule Path
Disallow /

synthesio

Rule Path
Disallow /

ahrefsbot

Rule Path
Disallow /

brandonbot

Rule Path
Disallow /

germcrawler

Rule Path
Disallow /

sogou

Rule Path
Disallow /

exabot

Rule Path
Disallow /

maxpointcrawler

Rule Path
Disallow /

admantx

Rule Path
Disallow /

*

Rule Path
Allow /misc/*.css$
Allow /misc/*.css?
Allow /misc/*.js$
Allow /misc/*.js?
Allow /misc/*.gif
Allow /misc/*.jpg
Allow /misc/*.jpeg
Allow /misc/*.png
Allow /modules/*.css$
Allow /modules/*.css?
Allow /modules/*.js$
Allow /modules/*.js?
Allow /modules/*.gif
Allow /modules/*.jpg
Allow /modules/*.jpeg
Allow /modules/*.png
Allow /profiles/*.css$
Allow /profiles/*.css?
Allow /profiles/*.js$
Allow /profiles/*.js?
Allow /profiles/*.gif
Allow /profiles/*.jpg
Allow /profiles/*.jpeg
Allow /profiles/*.png
Allow /themes/*.css$
Allow /themes/*.css?
Allow /themes/*.js$
Allow /themes/*.js?
Allow /themes/*.gif
Allow /themes/*.jpg
Allow /themes/*.jpeg
Allow /themes/*.png
Disallow /includes/
Disallow /misc/
Disallow /modules/
Disallow /profiles/
Disallow /scripts/
Disallow /themes/
Disallow /CHANGELOG.txt
Disallow /cron.php
Disallow /INSTALL.mysql.txt
Disallow /INSTALL.pgsql.txt
Disallow /INSTALL.sqlite.txt
Disallow /install.php
Disallow /INSTALL.txt
Disallow /LICENSE.txt
Disallow /MAINTAINERS.txt
Disallow /update.php
Disallow /UPGRADE.txt
Disallow /xmlrpc.php
Disallow /admin/
Disallow /comment/reply/
Disallow /filter/tips/
Disallow /node/add/
Disallow /search/
Disallow /user/register/
Disallow /user/password/
Disallow /user/login/
Disallow /user/logout/
Disallow /?q=admin%2F
Disallow /?q=comment%2Freply%2F
Disallow /?q=filter%2Ftips%2F
Disallow /?q=node%2Fadd%2F
Disallow /?q=search%2F
Disallow /?q=user%2Fpassword%2F
Disallow /?q=user%2Fregister%2F
Disallow /?q=user%2Flogin%2F
Disallow /?q=user%2Flogout%2F
Disallow */compose$
Disallow */compose/
Disallow */classified/*/edit
Disallow */event/*/edit
Disallow */post/*/edit
Disallow */login$
Disallow */login/
Disallow */classified/*/promote
Disallow */event/*/promote
Disallow */register$
Disallow */register/
Disallow */generate_ical_event/*
Disallow *?vce=*
Disallow */ep/*
Disallow */reply/*
Disallow /am/*
Disallow /users/*?page*
Disallow /users/*/*
Disallow */page_view_timing/*
Disallow */page_view_event/*
Disallow */metrics/*
Disallow */page_action/*
Disallow */session_trace/*
Disallow */spa/*
Disallow */jserrors/*
Disallow */aggregate
Disallow /Detected
Disallow /called
Disallow /download-app
Disallow /api_v1/*
Disallow */info/*
Disallow */nodx*
Disallow *-nodx-*
Disallow *-nodx$
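The rules in this group rely on two special characters: `*` matches any run of characters and a trailing `$` anchors the pattern to the end of the path. The sketch below shows one way to evaluate such rules, assuming the precedence described in RFC 9309 (the longest matching pattern wins, and Allow wins a tie). Individual crawlers may behave differently, and the helper names `pattern_to_regex` and `is_allowed` are hypothetical.

```python
import re


def pattern_to_regex(pattern: str) -> re.Pattern:
    """Translate a robots.txt path pattern into a regex anchored at the start."""
    anchored = pattern.endswith("$")
    if anchored:
        pattern = pattern[:-1]
    # Escape each literal piece, then join the pieces with ".*" for each "*".
    body = ".*".join(re.escape(part) for part in pattern.split("*"))
    return re.compile("^" + body + (r"\Z" if anchored else ""))


def is_allowed(path: str, rules: list[tuple[str, str]]) -> bool:
    """Evaluate (directive, pattern) rules against a path.

    Assumption: longest pattern wins; Allow beats Disallow on equal length;
    a path matched by no rule is allowed.
    """
    best = None
    for directive, pattern in rules:
        if pattern_to_regex(pattern).match(path):
            key = (len(pattern), directive.lower() == "allow")
            if best is None or key > best:
                best = key
    return True if best is None else best[1]


# A few rules copied from the group above.
sample_rules = [
    ("Allow", "/misc/*.css$"),
    ("Disallow", "/misc/"),
    ("Disallow", "*/login$"),
    ("Disallow", "/users/*/*"),
]
```

With these sample rules, `/misc/print.css` is allowed because the Allow pattern is longer than `/misc/`, while `/misc/drupal.txt` matches only the Disallow rule and is blocked.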

twitterbot

Rule Path
Allow */s/*
Allow */nodx*
Allow *-nodx-*
Allow *-nodx$
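Under RFC 9309 a crawler obeys only the group that matches its own product token and falls back to the `*` group when none matches. Groups are not merged, so a crawler matching the twitterbot group would not inherit the Disallow rules listed under `*`. A minimal sketch of that selection step follows. The function name `select_group` and the dictionary layout are assumptions for illustration, and real crawlers may match tokens more loosely.

```python
Rules = list[tuple[str, str]]


def select_group(product_token: str, groups: dict[str, Rules]) -> Rules:
    """Pick the rule group for a crawler.

    Assumption: group names are stored lowercased, matching is an exact
    case-insensitive comparison, and "*" is the fallback group.
    """
    token = product_token.lower()
    if token in groups:
        return groups[token]
    return groups.get("*", [])


# A reduced version of the groups in this report.
sample_groups = {
    "gptbot": [("Disallow", "/")],
    "twitterbot": [("Allow", "*/nodx*")],
    "*": [("Disallow", "/admin/")],
}
```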

Other Records

Field Value
sitemap https://patch.com/sm/sm-h-48.xml
sitemap https://patch.com/nsm/news-sitemap-index.xml
sitemap https://patch.com/sitemap.xml
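The sitemap records can be read back out of a robots.txt body with Python's standard `urllib.robotparser` module (`site_maps()` requires Python 3.8 or later). The sketch below parses text in memory rather than fetching anything. The sample text is reconstructed from the table above and is not the original file, and `extract_sitemaps` is a hypothetical helper name.

```python
from urllib.robotparser import RobotFileParser


def extract_sitemaps(robots_txt: str) -> list[str]:
    """Return the Sitemap URLs declared in a robots.txt body, in file order."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    # site_maps() returns None when no Sitemap lines are present.
    return parser.site_maps() or []


# Reconstructed sample, assuming the usual "Sitemap: <url>" line syntax.
sample_text = """\
User-agent: *
Disallow: /admin/

Sitemap: https://patch.com/sm/sm-h-48.xml
Sitemap: https://patch.com/nsm/news-sitemap-index.xml
Sitemap: https://patch.com/sitemap.xml
"""
```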

Comments

  • New crawlers to block 2016
  • CSS, JS, Images
  • Directories
  • Files
  • Paths (clean URLs)
  • Paths (no clean URLs)
  • INTERNAL
  • User Profile Pages
  • Disallow newrelic stuff
  • redirect url to native app stores
  • API Endpoints