workinstartups.com
robots.txt

Robots Exclusion Standard data for workinstartups.com

Resource Scan

Scan Details

Site Domain workinstartups.com
Base Domain workinstartups.com
Scan Status Ok
Last Scan 2026-02-16T13:52:10+00:00
Next Scan 2026-02-23T13:52:10+00:00

Last Scan

Scanned 2026-02-16T13:52:10+00:00
URL https://workinstartups.com/robots.txt
Domain IPs 2a05:d018:1ac1:1500:5ffb:9e55:7a4:b868, 2a05:d018:1ac1:1501:44de:f183:522d:d688, 2a05:d018:1ac1:1502:f43:20af:e363:7985, 54.195.133.207, 54.216.122.80, 54.246.86.149
Response IP 54.195.133.207
Found Yes
Hash 0f380c80d05846c452ebf48095cc21b281679ac96b7768d4200aaa08d036a2e9
SimHash 211089e58d74
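
The Hash value above is 64 hexadecimal digits, which is consistent with a SHA-256 digest of the fetched file, although the scan does not name the algorithm. The sketch below assumes SHA-256 and shows how such a fingerprint could be used to detect a changed robots.txt between scans. The function names `content_fingerprint` and `has_changed` are hypothetical and not part of the scanner.

```python
import hashlib


def content_fingerprint(body: bytes) -> str:
    """Return the hex SHA-256 digest of a robots.txt response body.

    Assumption: the scanner hashes the raw bytes; the report does not say so.
    """
    return hashlib.sha256(body).hexdigest()


def has_changed(previous_hash: str, body: bytes) -> bool:
    """Compare a stored fingerprint with a freshly fetched body."""
    return content_fingerprint(body) != previous_hash.lower()
```

If the assumption holds, passing the bytes served at the scanned URL to `content_fingerprint` would reproduce the Hash field. A different normalization step (for example, stripping line endings before hashing) would give a different digest.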

Groups

amazonbot

Rule Path
Disallow /

*

Rule Path
Allow /
Disallow /search?
Disallow /jobs/search?
Disallow /goto/ad/
Disallow /jobs/goto/ad/
Disallow /land/ad/
Disallow /jobs/land/ad/
Disallow /advanced-search?
Disallow /jobs/advanced-search?
Disallow /jobs/my-alerts?
Disallow /my-alerts?
Disallow /jobiak/
Disallow /get_avg?
Disallow /get_stats?
Disallow /_app_count*
Disallow /app_complete*
Disallow /_create*
Disallow /*?error*
Disallow /authenticate*
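
The `*` group mixes a blanket `Allow /` with narrower `Disallow` patterns, so the outcome depends on how a crawler resolves overlapping rules. The sketch below assumes the longest-match rule from RFC 9309, where the most specific matching pattern wins and a tie goes to allow. It uses a subset of the rules listed above. `STAR_RULES`, `pattern_to_regex` and `is_allowed` are illustrative names, and pattern length is used as a simple stand-in for specificity.

```python
import re

# Subset of the "*" group, transcribed from the rule table above.
STAR_RULES = [
    ("allow", "/"),
    ("disallow", "/search?"),
    ("disallow", "/goto/ad/"),
    ("disallow", "/_app_count*"),
    ("disallow", "/*?error*"),
    ("disallow", "/authenticate*"),
]


def pattern_to_regex(pattern):
    """Translate a robots.txt path pattern into a compiled regex.

    '*' matches any run of characters; a trailing '$' anchors the end.
    Everything else, including '?', is matched literally.
    """
    anchored = pattern.endswith("$")
    if anchored:
        pattern = pattern[:-1]
    body = ".*".join(re.escape(part) for part in pattern.split("*"))
    return re.compile(body + ("$" if anchored else ""))


def is_allowed(rules, path):
    """Longest matching pattern wins; ties go to allow; no match means allowed."""
    best_len = -1
    verdict = True
    for kind, pattern in rules:
        if pattern_to_regex(pattern).match(path):
            allow = kind == "allow"
            if len(pattern) > best_len or (len(pattern) == best_len and allow):
                best_len = len(pattern)
                verdict = allow
    return verdict


print(is_allowed(STAR_RULES, "/jobs/123"))         # matched only by "Allow /"
print(is_allowed(STAR_RULES, "/search?q=python"))  # "/search?" is longer than "/"
print(is_allowed(STAR_RULES, "/jobs?error=1"))     # caught by "/*?error*"
```

A parser that applies rules in file order and stops at the first match would behave differently here, because `Allow /` is listed first and matches every path.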

adsbot-google
adsbot-google-mobile

Rule Path
Disallow /create_notification
Disallow /jobs/create_notification

ccbot
gptbot
chatgpt-user
google-extended
bytespider
diffbot
facebookbot
omgili
applebot-extended
perplexitybot
amazonbot
claudebot
omgilibot
anthropic-ai
claude-web
imagesiftbot
youbot

Rule Path
Disallow /
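
Which group applies to a crawler can be sketched as follows, assuming the RFC 9309 behaviour: a crawler uses the groups that name its product token (case-insensitively, merging rules if several groups match) and falls back to `*` only when none does. The user-agent lists and rules are abbreviated from the groups above. `GROUPS` and `rules_for` are hypothetical names, and exact token comparison is a simplification.

```python
# Groups transcribed from the scan above; agent and rule lists are abbreviated.
GROUPS = [
    (["amazonbot"], [("disallow", "/")]),
    (["*"], [("allow", "/"), ("disallow", "/search?")]),
    (
        ["adsbot-google", "adsbot-google-mobile"],
        [("disallow", "/create_notification"),
         ("disallow", "/jobs/create_notification")],
    ),
    (["ccbot", "gptbot", "claudebot", "amazonbot"], [("disallow", "/")]),
]


def rules_for(product_token):
    """Return the merged rules of every group naming the token, else the '*' rules."""
    token = product_token.lower()
    matched = [rule for agents, rules in GROUPS if token in agents for rule in rules]
    if matched:
        return matched
    return [rule for agents, rules in GROUPS if "*" in agents for rule in rules]


print(rules_for("GPTBot"))         # only the blanket disallow
print(rules_for("Amazonbot"))      # named in two groups, rules merged
print(rules_for("AdsBot-Google"))  # does not inherit the "*" rules
print(rules_for("Googlebot"))      # falls back to "*"
```

Under this reading, `amazonbot` is fully disallowed by both groups that name it, and the AdsBot crawlers are restricted only from the two `create_notification` paths.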

Other Records

Field Value
sitemap https://workinstartups.com/sitemap_index.jobs_WIS.xml

Comments

  • Disallow /create_notification endpoint from being accessed by the AdsBot
  • https://developers.google.com/search/docs/advanced/crawling/overview-google-crawlers
  • Sitemap links for core sitemaps (Also, details were included, but removed in JOB-2857)
  • JOB-2438: disallow ChatGPT crawlers (see https://darkvisitors.com/agents)