doglegs.com
robots.txt

Robots Exclusion Standard data for doglegs.com

Resource Scan

Scan Details

Site Domain doglegs.com
Base Domain doglegs.com
Scan Status Ok
Last Scan 2025-03-10T20:40:29+00:00
Next Scan 2025-04-09T20:40:29+00:00

Last Scan

Scanned 2025-03-10T20:40:29+00:00
URL https://doglegs.com/robots.txt
Domain IPs 70.32.23.31
Response IP 70.32.23.31
Found Yes
Hash fbab0303f26507165e8aae00b1b81ee2b9154b1245a10ef1b50767edcdd395c3
SimHash e20e145a4765

Groups

googlebot

Rule Path
Allow *.js
Allow *.css
Allow /ads/preferences/
Allow /dtt/k
Allow /pagead/show_ads.js
Allow /pagead/js/adsbygoogle.js
Allow /pagead/js/*/show_ads_impl.js
Allow /static/glade.js
Allow /static/glade/
Allow /tag/js/

ccbot

Rule Path
Disallow /

chatgpt-user

Rule Path
Disallow /

gptbot

Rule Path
Disallow /

google-extended

Rule Path
Disallow /

anthropic-ai

Rule Path
Disallow /

omgilibot

Rule Path
Disallow /

omgili

Rule Path
Disallow /

facebookbot

Rule Path
Disallow /

bytespider

Rule Path
Disallow /

*

Rule Path
Disallow /administrator/
Disallow /api/
Disallow /bin/
Disallow /cache/
Disallow /cli/
Disallow /components/
Disallow /includes/
Disallow /installation/
Disallow /language/
Disallow /layouts/
Disallow /libraries/
Disallow /logs/
Disallow /modules/
Disallow /plugins/
Disallow /tmp/
Disallow /billing/
Disallow /entertainment-sports-health-a-news.html
Disallow /check-email.html
Disallow /faq-s.html
Disallow /adjust-spam-settings.html
Disallow /terms-conditions.html
Disallow /privacy-policy.html
Disallow /contact-support/remote-support-assistance.html
Disallow /contact-support/webmail-interface.html
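
The groups above can be checked programmatically. The following is a minimal sketch, assuming Python's standard urllib.robotparser and a hand-copied excerpt of the rules listed above (the gptbot group and two rules from the * group). The names ROBOTS_EXCERPT, build_parser and is_allowed are hypothetical helpers introduced for illustration, and the probed paths such as /billing/index.html are made-up examples, not URLs taken from the scan. The googlebot group is left out because its wildcard paths (*.js, *.css) are not interpreted as wildcards by this parser.

```python
from urllib.robotparser import RobotFileParser

# Excerpt rebuilt by hand from the groups listed above (not the full file).
ROBOTS_EXCERPT = """\
User-agent: gptbot
Disallow: /

User-agent: *
Disallow: /administrator/
Disallow: /billing/
"""


def build_parser(robots_text):
    """Parse robots.txt text held in memory, without any network access."""
    parser = RobotFileParser()
    parser.parse(robots_text.splitlines())
    return parser


def is_allowed(parser, agent, path):
    """Return True if `agent` may fetch `path` under the parsed rules."""
    return parser.can_fetch(agent, "https://doglegs.com" + path)


parser = build_parser(ROBOTS_EXCERPT)

# gptbot has its own group that disallows the whole site.
gptbot_home = is_allowed(parser, "GPTBot", "/")

# A crawler without its own group falls back to the * group.
other_home = is_allowed(parser, "ExampleBot", "/")
other_billing = is_allowed(parser, "ExampleBot", "/billing/index.html")

print(gptbot_home, other_home, other_billing)
```

Other crawlers may resolve Allow/Disallow precedence and wildcards differently from this parser, so the result is only an approximation of how any particular bot treats the file.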

Comments

  • If the Joomla site is installed within a folder, e.g. www.example.com/joomla/, the robots.txt file MUST be moved to the site root, e.g. www.example.com/robots.txt, AND the joomla folder name MUST be prefixed to the disallowed path; e.g. the Disallow rule for the /administrator/ folder MUST be changed to read Disallow: /joomla/administrator/
  • For more information about the robots.txt standard, see: http://www.robotstxt.org/orig.html
  • For syntax checking, see: http://tool.motoricerca.info/robots-checker.phtml
  • Googlebot
  • Common Crawl used for Bot Training
  • Blocks ChatGPT-User instructions to reference your website
  • ChatGPTBot
  • Google Bard's ChatBot for Training
  • Blocks Facebook's Speech Training Bots
  • Rumored to be ByteDance or TikTok
  • Also Added Pages to Block
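
The Joomla comment above describes prefixing the install folder name to each rule path when the site lives in a subfolder. A rough sketch of that rewrite follows, assuming rules are written in the usual "Field: /path" form; prefix_rules is a hypothetical helper, not part of Joomla or of any library, and it prefixes every Allow and Disallow path it finds, including a bare "/".

```python
def prefix_rules(robots_text, folder):
    """Prefix `folder` to every Allow/Disallow path in robots.txt text.

    Lines that are not Allow/Disallow rules (User-agent, comments,
    blank lines, Sitemap) are passed through unchanged.
    """
    prefix = "/" + folder.strip("/")
    rewritten = []
    for line in robots_text.splitlines():
        field, sep, value = line.partition(":")
        name = field.strip()
        path = value.strip()
        is_rule = sep and name.lower() in ("allow", "disallow")
        if is_rule and path.startswith("/"):
            rewritten.append(f"{name}: {prefix}{path}")
        else:
            rewritten.append(line)
    return "\n".join(rewritten)


original = "User-agent: *\nDisallow: /administrator/"
moved = prefix_rules(original, "joomla")
print(moved)
```

The rewritten file would still need to be placed at the site root, as the comment states; the sketch only covers the path change.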