
Robots Exclusion Standard data for thepost.co.nz

Resource Scan

Scan Details

Site Domain thepost.co.nz
Base Domain thepost.co.nz
Scan Status Ok
Last Scan 2024-10-23T14:49:24+00:00
Next Scan 2024-11-22T14:49:24+00:00
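The two timestamps above are exactly 30 days apart. The page does not say whether the scanner always rescans on a fixed 30-day interval; the snippet below only checks the arithmetic for these two values with Python's `datetime`:

```python
from datetime import datetime

# Timestamps copied from the scan details above (ISO 8601 with a UTC offset).
last_scan = datetime.fromisoformat("2024-10-23T14:49:24+00:00")
next_scan = datetime.fromisoformat("2024-11-22T14:49:24+00:00")

interval = next_scan - last_scan
print(interval.days)  # 30
```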

Last Scan

Scanned 2024-10-23T14:49:24+00:00
URL https://thepost.co.nz/robots.txt
Redirect https://www.thepost.co.nz/robots.txt
Redirect Domain www.thepost.co.nz
Redirect Base thepost.co.nz
Domain IPs 108.157.254.15, 108.157.254.78, 108.157.254.81, 108.157.254.91
Redirect IPs 151.101.130.227, 151.101.194.227, 151.101.2.227, 151.101.66.227, 2a04:4e42:200::739, 2a04:4e42:400::739, 2a04:4e42:600::739, 2a04:4e42::739
Response IP 199.232.46.227
Found Yes
Hash 5b215310098ecb65832fd6c85aafa63455ec1130ad414125ff7d42f14252d623
SimHash 305e11d38df1
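The `Hash` value has 64 hex digits, the length of a SHA-256 digest, and the `SimHash` has 12 hex digits (48 bits). The scan does not document how either is computed, so the sketch below is an assumption throughout: SHA-256 over the raw response body for exact change detection, and a toy 48-bit SimHash over whitespace-separated tokens for near-duplicate detection. The helpers `content_hash`, `simhash48` and `hamming` are hypothetical names, and the sample robots.txt texts are illustrative only.

```python
import hashlib

# Hypothetical helpers: the scanner's real hashing scheme is not documented.


def content_hash(body: bytes) -> str:
    """SHA-256 of the raw robots.txt bytes; any one-byte change alters it."""
    return hashlib.sha256(body).hexdigest()


def simhash48(text: str) -> str:
    """Toy 48-bit SimHash: similar texts tend to differ in only a few bits.

    Assumption: tokens are whitespace-separated words, each hashed with the
    first 6 bytes (48 bits) of its MD5 digest.
    """
    bits = 48
    weights = [0] * bits
    for token in text.split():
        h = int.from_bytes(hashlib.md5(token.encode("utf-8")).digest()[:6], "big")
        for i in range(bits):
            # Each token votes +1 (bit set) or -1 (bit clear) on every position.
            weights[i] += 1 if (h >> i) & 1 else -1
    value = 0
    for i in range(bits):
        if weights[i] > 0:
            value |= 1 << i
    return f"{value:012x}"


def hamming(a: str, b: str) -> int:
    """Number of differing bits between two hex-encoded hashes."""
    return bin(int(a, 16) ^ int(b, 16)).count("1")


old = "User-agent: GPTBot\nDisallow: /\n"
new = "User-agent: GPTBot\nDisallow: /\nUser-agent: CCBot\nDisallow: /\n"
print(content_hash(old.encode()) == content_hash(new.encode()))  # False
print(hamming(simhash48(old), simhash48(new)))
```

The exact hash answers "did the file change at all?", while the SimHash distance gives a rough sense of how much it changed.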

Groups

grapeshot

Rule Path
Disallow (empty path)

*

Rule Path
Disallow /essentialmums/
Disallow /email_a_friend/
Disallow /entertainment/bravo
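The first two groups work together: `grapeshot` gets an empty `Disallow`, which under the Robots Exclusion Protocol blocks nothing, while any crawler without a group of its own falls back to `*` and is kept out of the three listed paths. The sketch below rebuilds just these two groups as robots.txt text (a reconstruction from the scan, not the verbatim file) and checks them with Python's standard `urllib.robotparser`. The agent `ExampleBot` and the sample paths are hypothetical.

```python
from urllib.robotparser import RobotFileParser

# Reconstructed from the grapeshot and * groups listed above (not the verbatim file).
ROBOTS_TXT = """\
User-agent: grapeshot
Disallow:

User-agent: *
Disallow: /essentialmums/
Disallow: /email_a_friend/
Disallow: /entertainment/bravo
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

base = "https://www.thepost.co.nz"
for agent in ("grapeshot", "ExampleBot"):
    for path in ("/essentialmums/recipes", "/entertainment/bravo-tv", "/news/"):
        # grapeshot: always True; ExampleBot: False, False, True
        print(agent, path, parser.can_fetch(agent, base + path))
```

Because `/entertainment/bravo` has no trailing slash, it is a plain prefix rule and also blocks paths such as `/entertainment/bravo-tv`.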

amazonbot

Rule Path
Disallow /

applebot

Rule Path
Disallow /

applebot-extended

Rule Path
Disallow /

bytespider

Rule Path
Disallow /

ccbot

Rule Path
Disallow /

chatgpt-user

Rule Path
Disallow /

claudebot

Rule Path
Disallow /

diffbot

Rule Path
Disallow /

facebookbot

Rule Path
Disallow /

google-extended

Rule Path
Disallow /

gptbot

Rule Path
Disallow /

meta-externalagent

Rule Path
Disallow /

meta-externalfetcher

Rule Path
Disallow /

oai-searchbot

Rule Path
Disallow /

omgili

Rule Path
Disallow /

perplexitybot

Rule Path
Disallow /

timpibot

Rule Path
Disallow /

webzio-extended

Rule Path
Disallow /

youbot

Rule Path
Disallow /
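Every named crawler from `amazonbot` through `youbot` gets the same single rule, `Disallow: /`, which blocks the whole site for that agent regardless of what the `*` group says. The sketch below regenerates those groups from a list and uses `urllib.robotparser` to confirm that each listed agent is refused, while an agent with no group of its own (`Googlebot`, used here only as an example) falls through to `*`. It is a reconstruction that assumes one group per agent; the scan does not show whether the file writes them as separate groups or as one multi-agent group, which would behave the same way.

```python
from urllib.robotparser import RobotFileParser

# Agents that the scan lists with "Disallow: /" (lower-cased as in the scan).
BLOCKED_AGENTS = [
    "amazonbot", "applebot", "applebot-extended", "bytespider", "ccbot",
    "chatgpt-user", "claudebot", "diffbot", "facebookbot", "google-extended",
    "gptbot", "meta-externalagent", "meta-externalfetcher", "oai-searchbot",
    "omgili", "perplexitybot", "timpibot", "webzio-extended", "youbot",
]

# Rebuild one group per blocked agent, plus the catch-all * group from the scan.
groups = [f"User-agent: {agent}\nDisallow: /\n" for agent in BLOCKED_AGENTS]
groups.append(
    "User-agent: *\n"
    "Disallow: /essentialmums/\n"
    "Disallow: /email_a_friend/\n"
    "Disallow: /entertainment/bravo\n"
)
robots_txt = "\n".join(groups)  # blank line between groups

parser = RobotFileParser()
parser.parse(robots_txt.splitlines())

url = "https://www.thepost.co.nz/news/"
refused = [a for a in BLOCKED_AGENTS if not parser.can_fetch(a, url)]
print(len(refused), "of", len(BLOCKED_AGENTS), "agents refused")  # 19 of 19 agents refused
print("Googlebot allowed:", parser.can_fetch("Googlebot", url))  # Googlebot allowed: True
```

Note that `urllib.robotparser` matches agent names by substring rather than exact product token, so `applebot-extended` is also caught by the `applebot` group; here both groups say `Disallow: /`, so the result is the same.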

Other Records

Field Value
sitemap https://thepost.co.nz/sitemap.xml
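The `sitemap` record is not tied to any user-agent group, so it applies to every crawler. Python's `RobotFileParser.site_maps()` (Python 3.8 and later) collects such records; a minimal sketch against a one-group reconstruction (not the verbatim file):

```python
from urllib.robotparser import RobotFileParser

# Minimal reconstruction: one rule group plus the sitemap record from the scan.
robots_txt = """\
User-agent: *
Disallow: /essentialmums/

Sitemap: https://thepost.co.nz/sitemap.xml
"""

parser = RobotFileParser()
parser.parse(robots_txt.splitlines())
print(parser.site_maps())  # ['https://thepost.co.nz/sitemap.xml']
```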

Comments

  • robots for https://thepost.co.nz
  • allowing grapeshot to access to content
  • Disallowed paths
  • Site Scrapers and bots that are not desirable:
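These entries are `#` comment lines from the file, listed without the `#` marker. The sketch below shows one way such comments could be pulled out of robots.txt text; the helper `extract_comments` is hypothetical, the scanner's own extraction method is not documented, and the sample only places two of the comment texts above in plausible positions.

```python
def extract_comments(robots_txt: str) -> list[str]:
    """Return the text of every '#' comment, on its own line or trailing a rule."""
    comments = []
    for line in robots_txt.splitlines():
        marker = line.find("#")
        if marker != -1:
            text = line[marker + 1:].strip()
            if text:  # skip bare '#' lines
                comments.append(text)
    return comments


sample = """\
# robots for https://thepost.co.nz
# allowing grapeshot to access to content
User-agent: grapeshot
Disallow:
"""
print(extract_comments(sample))
# ['robots for https://thepost.co.nz', 'allowing grapeshot to access to content']
```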