techguroh.com
robots.txt

Robots Exclusion Standard data for techguroh.com

Resource Scan

Scan Details

Site Domain techguroh.com
Base Domain techguroh.com
Scan Status Ok
Last Scan 2024-11-14T06:47:49+00:00
Next Scan 2024-11-21T06:47:49+00:00

Last Scan

Scanned 2024-11-14T06:47:49+00:00
URL https://techguroh.com/robots.txt
Domain IPs 162.241.244.76
Response IP 162.241.244.76
Found Yes
Hash 1a56a2b0face7720abc697d3867651eee1d378a5d165ed563275e68172b92f47
SimHash 5980cb60a9f2

Groups

*

Rule Path
Disallow /wp-admin/
Allow /wp-admin/admin-ajax.php

gptbot

Rule Path
Allow /

google-extended

Rule Path
Disallow /

amazonbot

Rule Path
Disallow /

applebot-extended

Rule Path
Disallow /

applebot

Rule Path
Allow /

anthropic-ai

Rule Path
Disallow /

bytespider

Rule Path
Disallow /

ccbot

Rule Path
Disallow /

chatgpt-user

Rule Path
Disallow /

claudebot

Rule Path
Disallow /

claude-web

Rule Path
Disallow /

diffbot

Rule Path
Disallow /

facebookbot

Rule Path
Disallow /

imagesiftbot

Rule Path
Disallow /

omgilibot

Rule Path
Disallow /

omgili

Rule Path
Disallow /

perplexitybot

Rule Path
Disallow /

youbot

Rule Path
Disallow /

googlebot-news

Rule Path
Disallow /ad
Disallow /sponsored

*

Rule Path
Disallow /admin
Disallow /newfanshot
Disallow /users/*/replies
Disallow /users/*/comments
Disallow /account
Disallow /login
Disallow /chorus_auth
Disallow /sso
Disallow /search
Disallow /the-highlight$

*

Rule Path
Disallow /share$
Disallow /share/*
Disallow /share?*
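
How the `*` rules listed above combine can be sketched in Python. This is a minimal illustration, not the scanner's or any crawler's actual implementation. It assumes the longest-match precedence described in RFC 9309 (the longest matching pattern wins, and Allow wins a tie) and assumes the three `*` groups are merged into one. The names `pattern_to_regex`, `is_allowed`, and `STAR_RULES` are hypothetical, and only a subset of the listed rules is included.

```python
import re

# A subset of the rules from the three "*" groups in the scan,
# merged into one list (an assumption based on RFC 9309 group merging).
STAR_RULES = [
    ("Disallow", "/wp-admin/"),
    ("Allow", "/wp-admin/admin-ajax.php"),
    ("Disallow", "/users/*/replies"),
    ("Disallow", "/the-highlight$"),
    ("Disallow", "/share$"),
    ("Disallow", "/share/*"),
    ("Disallow", "/share?*"),
]


def pattern_to_regex(path_pattern):
    """Translate a robots.txt path pattern into a compiled regex.

    '*' matches any run of characters; a trailing '$' anchors the end.
    Everything else is matched literally, as a prefix of the path.
    """
    anchored = path_pattern.endswith("$")
    core = path_pattern[:-1] if anchored else path_pattern
    body = ".*".join(re.escape(part) for part in core.split("*"))
    return re.compile(body + ("$" if anchored else ""))


def is_allowed(rules, path):
    """Return True if `path` may be crawled under `rules`.

    The longest matching pattern decides; Allow wins a tie.
    A path that matches no rule is allowed.
    """
    best_len, allowed = -1, True
    for directive, pattern in rules:
        if pattern_to_regex(pattern).match(path):
            is_allow = directive.lower() == "allow"
            length = len(pattern)
            if length > best_len or (length == best_len and is_allow):
                best_len, allowed = length, is_allow
    return allowed


if __name__ == "__main__":
    for path in ("/wp-admin/admin-ajax.php", "/wp-admin/options.php",
                 "/the-highlight", "/the-highlight/archive"):
        print(path, is_allowed(STAR_RULES, path))
```

Under these assumptions, `/wp-admin/admin-ajax.php` stays crawlable because the Allow pattern is longer than the Disallow pattern that also matches it, while `/the-highlight$` blocks only the exact path and not anything beneath it.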

Other Records

Field Value
sitemap /sitemaps/google_news
sitemap /sitemaps
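
Both sitemap values above are relative paths rather than full URLs. A minimal sketch of how a consumer might turn them into absolute URLs follows, assuming they are resolved against the robots.txt URL from the scan details. Whether a given crawler accepts relative sitemap values at all is not stated in the scan, and `resolve_sitemaps` is a hypothetical helper name.

```python
from urllib.parse import urljoin

# URL of the scanned file, taken from the scan details.
ROBOTS_URL = "https://techguroh.com/robots.txt"


def resolve_sitemaps(robots_url, values):
    """Resolve sitemap field values against the robots.txt URL.

    Assumption: relative values are interpreted relative to the
    location of the robots.txt file itself.
    """
    return [urljoin(robots_url, value) for value in values]


sitemaps = resolve_sitemaps(ROBOTS_URL, ["/sitemaps/google_news", "/sitemaps"])
print(sitemaps)
```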

Comments

  • Google news sitemap
  • Sitemap archive

Warnings

  • 1 invalid line.
  • `<div style='display` is not a known field.