hrdive.com
robots.txt

Robots Exclusion Standard data for hrdive.com

Resource Scan

Scan Details

Site Domain hrdive.com
Base Domain hrdive.com
Scan Status Ok
Last Scan 2024-11-15T15:54:47+00:00
Next Scan 2024-11-22T15:54:47+00:00

Last Scan

Scanned 2024-11-15T15:54:47+00:00
URL https://hrdive.com/robots.txt
Redirect https://www.hrdive.com/robots.txt
Redirect Domain www.hrdive.com
Redirect Base hrdive.com
Domain IPs 104.18.33.32, 172.64.154.224, 2606:4700:4400::6812:2120, 2606:4700:4400::ac40:9ae0
Redirect IPs 104.18.33.32, 172.64.154.224, 2606:4700:4400::6812:2120, 2606:4700:4400::ac40:9ae0
Response IP 104.18.33.32
Found Yes
Hash e7080d702e9e9c29e1873841cf48b02421af44799fa73009f405b3840370cd40
SimHash fb7683504752
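The Hash field is 64 hexadecimal characters, which is consistent with a SHA-256 digest of the fetched file (an assumption; the report does not name its hash function). A change-detection sketch under that assumption:

```python
import hashlib
from urllib.request import urlopen

def fingerprint(body: bytes) -> str:
    """Hex SHA-256 digest of a robots.txt body (64 hex characters)."""
    return hashlib.sha256(body).hexdigest()

def fetch_fingerprint(url: str) -> str:
    # Re-fetch between scans; a changed digest means the file was edited.
    with urlopen(url) as resp:
        return fingerprint(resp.read())

# Works on any byte string, no network required:
print(fingerprint(b"User-agent: *\nDisallow: /admin/\n"))
```

Comparing successive digests is how a scanner like this one can tell a file changed without diffing its contents.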

Groups

*

Rule Path
Disallow /admin/
Disallow /newsletter/
Disallow /healthcheck/
Disallow /subpage/
Disallow /ckeditor/
Disallow /api/
Disallow /static/images/
Disallow /subscriber/
Disallow /search/*
Allow /search/$
Disallow /user_media/
Disallow /imgproxy/
Disallow /topic/?page=*

Other Records

Field Value
crawl-delay 5
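The default group above can be exercised locally with Python's standard-library `urllib.robotparser`. Note that this parser does not implement the `*`/`$` wildcard extensions used elsewhere in this file (it treats them as literal characters), so the sketch below includes only plain prefix rules; `SomeBot` is a hypothetical crawler name.

```python
from urllib import robotparser

# A subset of the scanned rules (wildcard rules omitted: the stdlib
# parser matches paths by literal prefix only).
rules = """\
User-agent: *
Disallow: /admin/
Disallow: /newsletter/
Crawl-delay: 5

User-agent: petalbot
Disallow: /
""".splitlines()

rp = robotparser.RobotFileParser()
rp.parse(rules)

print(rp.can_fetch("SomeBot", "https://www.hrdive.com/admin/users"))  # blocked by /admin/
print(rp.can_fetch("SomeBot", "https://www.hrdive.com/news/"))        # no rule matches, allowed
print(rp.can_fetch("petalbot", "https://www.hrdive.com/news/"))       # blocked: Disallow /
print(rp.crawl_delay("SomeBot"))                                      # default group's delay
```

A crawler that doesn't match any named group falls back to the `*` group, which is why `SomeBot` is governed by the default rules and delay here.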

twitterbot

Rule Path
Disallow

Other Records

Field Value
crawl-delay 5

googlebot-news

Rule Path
Disallow /admin/
Disallow /newsletter/
Disallow /healthcheck/
Disallow /subpage/
Disallow /ckeditor/
Disallow /api/
Disallow /static/images/
Disallow /subscriber/
Disallow /search/*
Allow /search/$
Allow /google_news_sitemap.xml
Allow /user_media/
Allow /imgproxy/
Allow /static/images/favicons

googlebot-image

Rule Path
Disallow /
Allow /imgproxy/
Allow /static/images/favicons
Allow /favicon.ico
Allow /apple-touch-icon.png
Allow /favicon-32x32.png
Allow /favicon-16x16.png
Allow /site.webmanifest
Allow /safari-pinned-tab.svg
Allow /browserconfig.xml
Allow /android-chrome-144x144.png
Allow /mstile-150x150.png

petalbot

Rule Path
Disallow /

gptbot

Rule Path
Disallow /

sentibot

Rule Path
Disallow /

claritybot

Rule Path
Disallow /

omgilibot

Rule Path
Disallow /

omgili

Rule Path
Disallow /

facebookexternalhit/1.1

Rule Path
Disallow

ahrefsbot

Rule Path
Disallow /admin/
Disallow /newsletter/
Disallow /healthcheck/
Disallow /subpage/
Disallow /ckeditor/
Disallow /api/
Disallow /static/images/
Disallow /subscriber/
Disallow /search/*
Allow /search/$
Disallow /user_media/
Disallow /imgproxy/
Disallow /topic/
Disallow /editors/

Other Records

Field Value
crawl-delay 30

amazonbot

Rule Path
Disallow /admin/
Disallow /newsletter/
Disallow /healthcheck/
Disallow /subpage/
Disallow /ckeditor/
Disallow /api/
Disallow /static/images/
Disallow /subscriber/
Disallow /search/*
Allow /search/$
Disallow /user_media/
Disallow /imgproxy/
Disallow /topic/
Disallow /editors/
Disallow /signup/*
Allow /signup/$

Other Records

Field Value
crawl-delay 10

amazon-qbusiness

Product Comment
amazon-qbusiness Amazon Q Web Crawler
Rule Path
Disallow /

amazon-kendra

Product Comment
amazon-kendra Amazon Kendra Web Crawler
Rule Path
Disallow /

Comments

  • ..;coxkOOOOOOkxoc;'.
  • .:d0NWMMMMMMMMMMMMMMWN0xc'
  • .:kXMMMMMMMMMMMMMMMMMMMMMMMXl.
  • .c0WMMMMMMMMMMMMMMMMMMMMMMMXd'
  • ,OWMMMMMMMMMMMMMMMMMMMMMMMXo' ..
  • cXMMMMMMXo::::::::::::::col. .lKXl.
  • lNMMMMMMM0' .lKWMMNo
  • :XMMMMMMMM0' .l0WMMMMMNc
  • .OMMMMMMMMM0' .ccccccc;. ,KMMMMMMMMO.
  • :NMMMMMMMMM0' oWMMMMMMWKc. oWMMMMMMMN:
  • lWMMMMMMMMM0' oWMMMMMMMMX: ,KMMMMMMMMo
  • oMMMMMMMMMM0' oWMMMMMMMMNc ,KMMMMMMMMd
  • cNMMMMMMMMM0' oWMMMMMMMNd. lWMMMMMMMWl
  • '0MMMMMMMMWk. ,oooooooc' ,0MMMMMMMMK,
  • oWMMMMMMXo. ,0MMMMMMMMWo
  • .xWMMMXd' ,dXMMMMMMMMWk.
  • .xWNx' .',''''''',,;coONMMMMMMMMMWk.
  • .:, .l0WWWWWWWWWWWMMMMMMMMMMMMMNd.
  • .lKWMMMMMMMMMMMMMMMMMMMMMMMWk;
  • .lKWMMMMMMMMMMMMMMMMMMMMMMMNk;.
  • .ckXWMMMMMMMMMMMMMMMMMMWXkl'
  • .;ldO0XNWWWWWWNXKOxl;.
  • ..'',,,,''..
  • NOTE: Allow is a non-standard directive for robots.txt. It is honored by Google's crawlers. See https://developers.google.com/search/reference/robots_txt#allow
  • Crawl-delay asks bots to wait this many seconds between requests. Ignored by Google.
  • no deep queries to search
  • don't index our dynamic images
  • don't deep index topic pages
  • Rules for specific crawlers below. Note that these don't stack. If you create a specific user-agent rule
  • you should copy the rules over from '*' above by hand.
  • Allow Twitter to see all links
  • Allow Googlebot-News to see header images and favicons, BUT make it follow all the directives from our * group
  • See below link for why we have to repeat these directives
  • https://developers.google.com/search/reference/robots_txt#order-of-precedence-for-user-agents
  • no deep queries to search
  • Allow Google News to see header images and favicons
  • Googlebot-Image is now used for favicons. Allow it to see favicon-related files but nothing else
  • Don't let PetalBot crawl at all
  • Block ChatGPT bot https://platform.openai.com/docs/gptbot
  • Block sentione.com
  • Block seoclarity.net/bot.html
  • Block omgili.com/crawler.html
  • Allow Facebook crawler user-agent to see all
  • We want this bot to crawl way slower http://ahrefs.com/robot/
  • no deep queries to search
  • don't index our dynamic images
  • Restrict what Amazonbot (Alexa) can see, and ask it to crawl slower
  • no deep queries to search
  • don't index our dynamic images
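Several comments above point at Google's order-of-precedence rules, which is how `Allow /search/$` can coexist with `Disallow /search/*`: the matching rule with the longest path wins, and on a length tie the less restrictive (`Allow`) rule is used. A minimal sketch of that resolution, not Google's actual implementation:

```python
import re

def rule_matches(rule_path: str, path: str) -> bool:
    # robots.txt wildcards: '*' matches any run of characters,
    # a trailing '$' anchors the match at the end of the path.
    pattern = re.escape(rule_path).replace(r"\*", ".*")
    if pattern.endswith(r"\$"):
        pattern = pattern[:-2] + "$"
    return re.match(pattern, path) is not None

def allowed(path: str, rules: list[tuple[str, str]]) -> bool:
    # Longest matching rule wins; Allow beats Disallow on a length tie.
    matches = [(len(p), d == "allow", p) for d, p in rules if rule_matches(p, path)]
    if not matches:
        return True  # no rule applies
    matches.sort()
    return matches[-1][1]

search_rules = [("disallow", "/search/*"), ("allow", "/search/$")]
print(allowed("/search/", search_rules))       # bare search page: Allow wins the tie
print(allowed("/search/?q=hr", search_rules))  # deep query: only Disallow matches
```

This is why the file's `/search/` pair blocks "deep queries to search" while leaving the bare search page crawlable.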