clgw.net
robots.txt

Robots Exclusion Standard data for clgw.net

Resource Scan

Scan Details

Site Domain clgw.net
Base Domain clgw.net
Scan Status Ok
Last Scan 2024-09-23T15:56:48+00:00
Next Scan 2024-10-23T15:56:48+00:00

Last Scan

Scanned 2024-09-23T15:56:48+00:00
URL https://clgw.net/robots.txt
Domain IPs 64.192.69.145
Response IP 64.192.69.145
Found Yes
Hash 8f9882e6b5613e3f86a960483075488213775c86ac72c763759e2a89c76e61ee
SimHash 2c219b905676
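
The Hash value above is 64 hexadecimal characters, which is the length of a SHA-256 digest. The report does not name the algorithm or say which bytes are hashed, so the following is only a sketch that assumes SHA-256 over the raw response body. The function name `robots_fingerprint` and the sample body are hypothetical and are not taken from the scan.

```python
import hashlib


def robots_fingerprint(body: bytes) -> str:
    """Return a hex digest of a robots.txt body (assumption: SHA-256)."""
    return hashlib.sha256(body).hexdigest()


# Hypothetical sample body, NOT the actual clgw.net robots.txt.
sample = b"User-agent: *\nDisallow: /\n"
digest = robots_fingerprint(sample)
print(len(digest))  # 64 hex characters, the same length as the Hash field
```

Comparing such a digest between two scans would show whether the file changed at all. The shorter SimHash field is a similarity fingerprint and is not reproduced here.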

Groups

*

Rule Path
Disallow /contact.php
Disallow /cgi-bin
Disallow /wp-admin
Disallow /wp-includes
Disallow /wp-content
Disallow /wp-login.php

*

Rule Path
Disallow /disallowed_page.php

*

Rule Path
Disallow /address
Disallow /blackhole

adsbot-google
adsbot-google-mobile
adsbot-google-mobile-apps
adidxbot
applebot
applenewsbot
bingbot
bingpreview
bublupbot
ccbot
duckduckbot
duckduckgo-favicons-bot
googlebot
googlebot-image
googlebot-mobile
googlebot-news
googlebot-video
mediapartners-google
mojeekbot
msnbot
msnbot-media
orangebot
pinterest
twitterbot

Rule Path
Allow /

*

Rule Path
Disallow /
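
The groups above can be evaluated with a small matcher. This is a minimal sketch, not the scanner's own logic. It assumes RFC 9309 style behaviour: groups that name the same user agent are merged, a crawler falls back to the `*` groups only when no group names it, the longest matching path prefix wins, and Allow wins a tie. The names `GROUPS`, `rules_for`, and `is_allowed` are hypothetical, and wildcard characters inside paths are not handled.

```python
# Groups as listed in this report: (user agents, [(rule, path), ...]).
NAMED_BOTS = [
    "adsbot-google", "adsbot-google-mobile", "adsbot-google-mobile-apps",
    "adidxbot", "applebot", "applenewsbot", "bingbot", "bingpreview",
    "bublupbot", "ccbot", "duckduckbot", "duckduckgo-favicons-bot",
    "googlebot", "googlebot-image", "googlebot-mobile", "googlebot-news",
    "googlebot-video", "mediapartners-google", "mojeekbot", "msnbot",
    "msnbot-media", "orangebot", "pinterest", "twitterbot",
]

GROUPS = [
    (["*"], [("disallow", "/contact.php"), ("disallow", "/cgi-bin"),
             ("disallow", "/wp-admin"), ("disallow", "/wp-includes"),
             ("disallow", "/wp-content"), ("disallow", "/wp-login.php")]),
    (["*"], [("disallow", "/disallowed_page.php")]),
    (["*"], [("disallow", "/address"), ("disallow", "/blackhole")]),
    (NAMED_BOTS, [("allow", "/")]),
    (["*"], [("disallow", "/")]),
]


def rules_for(product_token):
    """Merge rules from groups naming the crawler, else from all '*' groups."""
    token = product_token.lower()
    named = [r for agents, rules in GROUPS if token in agents for r in rules]
    if named:
        return named
    return [r for agents, rules in GROUPS if "*" in agents for r in rules]


def is_allowed(product_token, path):
    """Longest matching prefix wins; Allow wins ties; no match means allowed."""
    best_len, allowed = -1, True
    for kind, prefix in rules_for(product_token):
        if not path.startswith(prefix):
            continue
        longer = len(prefix) > best_len
        tie_for_allow = len(prefix) == best_len and kind == "allow"
        if longer or tie_for_allow:
            best_len, allowed = len(prefix), kind == "allow"
    return allowed


print(is_allowed("Googlebot", "/wp-admin/"))    # True: named group allows /
print(is_allowed("SomeOtherBot", "/index.html"))  # False: '*' disallows /
```

Under these assumptions the listed crawlers may fetch everything, including the paths disallowed in the earlier `*` groups, while every other crawler is blocked from the whole site. Parsers differ in how they treat repeated `*` groups, so results from a particular library may not match this sketch.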

Comments

  • google.com landing page quality checks
  • google.com app resource fetcher
  • bing ads bot
  • apple.com search engine
  • bing.com international search engine
  • bublup.com suggestion/search engine
  • commoncrawl.org open repository of web crawl data
  • duckduckgo.com international privacy search engine
  • google.com international search engine
  • google.com adsense bot
  • mojeek.com search engine
  • bing.com international search engine
  • orange.com international search engine
  • pinterest.com social network
  • twitter.com bot
  • crawling rule(s) for above bots
  • disallow all other bots