postcrossing.com
robots.txt

Robots Exclusion Standard data for postcrossing.com

Resource Scan

Scan Details

Site Domain postcrossing.com
Base Domain postcrossing.com
Scan Status Ok
Last Scan 2024-06-15T01:07:20+00:00
Next Scan 2024-06-22T01:07:20+00:00

Last Scan

Scanned 2024-06-15T01:07:20+00:00
URL https://postcrossing.com/robots.txt
Redirect https://www.postcrossing.com/robots.txt
Redirect Domain www.postcrossing.com
Redirect Base postcrossing.com
Domain IPs 3.67.120.29
Redirect IPs 3.67.120.29
Response IP 3.67.120.29
Found Yes
Hash 82d87f1c1192d66c684b05316b02caa27332f1bb7633ea6139c2b4f0855ffcb3
SimHash d6107159a6d7
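
The scan follows the redirect to www.postcrossing.com and fingerprints the response body, which is how a change between the Last Scan and Next Scan dates can be detected. A minimal sketch of that fetch-and-hash step, assuming the 64-hex-character hash above is SHA-256 (the SimHash near-duplicate fingerprint is not reproduced here):

    import hashlib
    import urllib.request

    # Fetch robots.txt; urllib follows the redirect from postcrossing.com
    # to www.postcrossing.com automatically, matching the scan record above.
    url = "https://postcrossing.com/robots.txt"
    with urllib.request.urlopen(url, timeout=10) as resp:
        final_url = resp.geturl()  # redirect target actually served
        body = resp.read()

    # Content hash for change detection between scans. Assumption: the
    # 64-hex-digit value in the report is a SHA-256 digest of the body.
    digest = hashlib.sha256(body).hexdigest()
    print(final_url, digest)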

Groups

*

Rule Path
Disallow /travelingpostcard/*
Disallow /user/*/traveling
Disallow /user/*/gallery/popular
Disallow /user/*/map
Allow /
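
This default group mixes wildcard Disallow rules with a blanket Allow /, so rule precedence decides the outcome. Below is a minimal sketch of how a generic crawler might evaluate these five rules; it is not a full RFC 9309 parser (no $ anchors, no percent-encoding handling), and rule length is used as a simple proxy for specificity.

    import re

    # The '*' group above, as (directive, path) pairs.
    RULES = [
        ("Disallow", "/travelingpostcard/*"),
        ("Disallow", "/user/*/traveling"),
        ("Disallow", "/user/*/gallery/popular"),
        ("Disallow", "/user/*/map"),
        ("Allow", "/"),
    ]

    def _pattern(path):
        # '*' matches any run of characters; the rule is anchored to the
        # start of the URL path (prefix match).
        return re.compile("^" + ".*".join(re.escape(part) for part in path.split("*")))

    def allowed(url_path):
        """True if url_path may be crawled under the '*' group.

        The longest matching rule wins; on a tie between Allow and
        Disallow, Allow wins. Paths matching no rule are allowed.
        """
        best_len, best_allow = -1, True
        for directive, path in RULES:
            if _pattern(path).match(url_path):
                is_allow = directive == "Allow"
                if len(path) > best_len or (len(path) == best_len and is_allow):
                    best_len, best_allow = len(path), is_allow
        return best_allow

    # Hypothetical example paths:
    print(allowed("/user/alice/map"))      # False: Disallow /user/*/map
    print(allowed("/user/alice/gallery"))  # True: only 'Allow /' matches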

googlebot-image

Rule Path
Disallow /travelingpostcard/*
Disallow /user/*/traveling
Disallow /user/*/gallery/popular
Disallow /user/*/map
Disallow /postcards/*
Disallow /user/*/gallery
Disallow /gallery
Disallow /country/*
Allow /

mediapartners-google

Rule Path
Allow /

archive.org_bot

Rule Path
Disallow /user/*
Disallow /postcards/*
Disallow /gallery
Allow /

fasterfox

Rule Path
Disallow /

bhc.collectionbot

Rule Path
Disallow /

screaming frog seo spider

Rule Path
Disallow /

scrapy

Rule Path
Disallow /

scrapybot

Rule Path
Disallow /

amazonbot

Product amazonbot
Comment Amazon's user agent
Rule Path
Disallow /travelingpostcard/*
Disallow /user/*/traveling
Disallow /user/*/gallery/popular
Disallow /user/*/map
Disallow /postcards/*
Allow /

sitecheck.internetseer.com

Rule Path
Disallow /

zealbot

Rule Path
Disallow /

msiecrawler

Rule Path
Disallow /

sitesnagger

Rule Path
Disallow /

webstripper

Rule Path
Disallow /

webcopier

Rule Path
Disallow /

fetch

Rule Path
Disallow /

offline explorer

Rule Path
Disallow /

teleport

Rule Path
Disallow /

teleportpro

Rule Path
Disallow /

webzip

Rule Path
Disallow /

linko

Rule Path
Disallow /

httrack

Rule Path
Disallow /

microsoft.url.control

Rule Path
Disallow /

xenu

Rule Path
Disallow /

larbin

Rule Path
Disallow /

libwww

Rule Path
Disallow /

zyborg

Rule Path
Disallow /

download ninja

Rule Path
Disallow /

fast

Rule Path
Disallow /

wget

Rule Path
Disallow /

grub-client

Rule Path
Disallow /

k2spider

Rule Path
Disallow /

npbot

Rule Path
Disallow /

webreaper

Rule Path
Disallow /

claudebot

Rule Path
Disallow /travelingpostcard/*
Disallow /user/*/traveling
Disallow /user/*/gallery/popular
Disallow /user/*/map
Disallow /postcards/*
Disallow /user/

anthropic-ai

Rule Path
Disallow /travelingpostcard/*
Disallow /user/*/traveling
Disallow /user/*/gallery/popular
Disallow /user/*/map
Disallow /postcards/*
Disallow /user/

anthropicbot

Rule Path
Disallow /travelingpostcard/*
Disallow /user/*/traveling
Disallow /user/*/gallery/popular
Disallow /user/*/map
Disallow /postcards/*
Disallow /user/

claude-web

Rule Path
Disallow /travelingpostcard/*
Disallow /user/*/traveling
Disallow /user/*/gallery/popular
Disallow /user/*/map
Disallow /postcards/*
Disallow /user/

gptbot

Rule Path
Disallow /travelingpostcard/*
Disallow /user/*/traveling
Disallow /user/*/gallery/popular
Disallow /user/*/map
Disallow /postcards/*
Disallow /user/
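
Every bot above gets its own complete group because, as the site's own comment below notes, groups do not inherit from '*': a crawler is expected to obey exactly one group, the one matching its product token, and fall back to '*' only if none matches. A minimal sketch of that selection step, using an abridged subset of the groups listed above:

    # Abridged subset of the groups listed above (full rule sets omitted).
    GROUPS = {
        "*": ["Allow: /"],
        "googlebot-image": ["Disallow: /gallery", "Allow: /"],
        "gptbot": ["Disallow: /user/"],
        "wget": ["Disallow: /"],
    }

    def select_group(user_agent):
        """Pick the group whose token matches the crawler's user agent.

        Matching is case-insensitive; the longest matching token wins,
        and anything unmatched falls back to the '*' group.
        """
        ua = user_agent.lower()
        best, best_len = "*", 0
        for token in GROUPS:
            if token != "*" and token in ua and len(token) > best_len:
                best, best_len = token, len(token)
        return GROUPS[best]

    print(select_group("Mozilla/5.0 (compatible; GPTBot/1.0)"))  # gptbot rules
    print(select_group("SomeOtherCrawler/2.0"))                  # falls back to '*'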

Comments

  • postcrossing.com robots.txt file
  • NOTE: Entries in robots.txt don't seem to inherit from '*', or at least not all bots handle that, hence the repetition
  • only the right user can open it, so stop doing 403's
  • Don't need the extra load
  • only the right user can open it, so stop doing 403's
  • extra
  • AdSense crawler
  • Wayback machine: don't overdo it
  • Browser pipelining/pre-fetching is not always a good idea
  • Unidentified misbehaving bot
  • If you don't know how to behave, you are not welcome
  • Please respect our Terms of Service: spiders/scrapers are only allowed with explicit permission
  • below here is from Wikipedia's robots.txt
  • Some bots are known to be trouble, particularly those designed to copy entire sites. Please obey robots.txt.
  • Misbehaving: requests much too fast:
  • Sorry, wget in its recursive mode is a frequent problem. Please read the man page and use it properly; there is a --wait option you can use to set the delay between hits, for instance.
  • The 'grub' distributed client has been *very* poorly behaved.
  • Doesn't follow robots.txt anyway, but...
  • Hits many times per second, not acceptable
  • http://www.nameprotect.com/botinfo.html
  • A capture bot, downloads gazillions of pages with no public benefit
  • http://www.webreaper.net/