jacket2.org
robots.txt

Robots Exclusion Standard data for jacket2.org

Resource Scan

Scan Details

Site Domain jacket2.org
Base Domain jacket2.org
Scan Status Ok
Last Scan 2025-08-21T15:47:34+00:00
Next Scan 2025-09-20T15:47:34+00:00

Last Scan

Scanned 2025-08-21T15:47:34+00:00
URL https://jacket2.org/robots.txt
Domain IPs 23.185.0.4, 2620:12a:8000::4, 2620:12a:8001::4
Response IP 23.185.0.4
Found Yes
Hash 4c7a3cd22ca885624855188b2637c174fffe5779eb9faf9b0563f718c37126a3
SimHash 3d9439134768
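
The Hash value above is 64 hexadecimal characters, which is consistent with a SHA-256 digest of the fetched file. The report does not name the algorithm, so treating it as SHA-256 is an assumption. Under that assumption, a minimal sketch for checking whether a locally saved copy of robots.txt still matches the recorded value could look like this (the function names are hypothetical, and the SimHash field is not covered):

```python
import hashlib


def sha256_hex(body: bytes) -> str:
    """Return the lowercase hex SHA-256 digest of a raw robots.txt body."""
    return hashlib.sha256(body).hexdigest()


def matches_recorded_hash(body: bytes, recorded: str) -> bool:
    """Compare a body against a recorded digest.

    Assumption: the scanner hashes the exact response bytes with SHA-256.
    If it normalizes the body first, this comparison will not match.
    """
    return sha256_hex(body) == recorded.strip().lower()
```

A mismatch only shows that the bytes differ from what was scanned, or that the assumption about the algorithm is wrong. It does not say which rules changed.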

Groups

*

Rule Path
Allow /core/*.css$
Allow /core/*.css?
Allow /core/*.js$
Allow /core/*.js?
Allow /core/*.gif
Allow /core/*.jpg
Allow /core/*.jpeg
Allow /core/*.png
Allow /core/*.svg
Allow /profiles/*.css$
Allow /profiles/*.css?
Allow /profiles/*.js$
Allow /profiles/*.js?
Allow /profiles/*.gif
Allow /profiles/*.jpg
Allow /profiles/*.jpeg
Allow /profiles/*.png
Allow /profiles/*.svg
Disallow /core/
Disallow /profiles/
Disallow /README.md
Disallow /composer/Metapackage/README.txt
Disallow /composer/Plugin/ProjectMessage/README.md
Disallow /composer/Plugin/Scaffold/README.md
Disallow /composer/Plugin/VendorHardening/README.txt
Disallow /composer/Template/README.txt
Disallow /modules/README.txt
Disallow /sites/README.txt
Disallow /themes/README.txt
Disallow /web.config
Disallow /admin/
Disallow /comment/reply/
Disallow /filter/tips
Disallow /node/add/
Disallow /search/
Disallow /user/register
Disallow /user/password
Disallow /user/login
Disallow /user/logout
Disallow /media/oembed
Disallow /*/media/oembed
Disallow /index.php/admin/
Disallow /index.php/comment/reply/
Disallow /index.php/filter/tips
Disallow /index.php/node/add/
Disallow /index.php/search/
Disallow /index.php/user/password
Disallow /index.php/user/register
Disallow /index.php/user/login
Disallow /index.php/user/logout
Disallow /index.php/media/oembed
Disallow /index.php/*/media/oembed
Disallow /search
Disallow /system/
Disallow /administrator/
Disallow /wp-content/
Disallow /wp-admin/
Disallow /cgi-bin/
Disallow /core/
Disallow /wp-includes/
Disallow /wp/
Disallow /pantheon_healthcheck
Disallow /pantheon_healthcheck/
Disallow /node/add/
Disallow /events/past-events
Disallow /sites/www.math.upenn.edu/themes/bootstrap/
Disallow /?q=node%2Fadd
Disallow /calendar/day/2023*
Disallow /calendar/day/2024*
Disallow /calendar/day/2022*
Disallow /sites/default/files/*.pdf
Disallow /application/core/
Disallow /*.pdf$
Disallow /*.xml$
Disallow /*.php
Disallow /node?*
Disallow /node/?*
Disallow /ALF_DATA/
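
Several rules in this group overlap. For example, Allow /core/*.js$ and Disallow /core/ both match a script under /core/. The sketch below shows one way a crawler could resolve such overlaps, assuming the longest-match rule described in RFC 9309: `*` matches any run of characters, a trailing `$` anchors the end of the path, the longest matching pattern wins, and Allow wins a tie. The function names and the abbreviated RULES list are illustrative, and percent-encoding normalization is left out.

```python
import re
from typing import List, Tuple

Rule = Tuple[str, str]  # (directive, path pattern), e.g. ("Disallow", "/core/")


def pattern_to_regex(pattern: str) -> "re.Pattern[str]":
    """Translate a robots.txt path pattern into an anchored regex."""
    anchored = pattern.endswith("$")
    if anchored:
        pattern = pattern[:-1]
    # Escape literal pieces; each "*" becomes "match anything".
    body = ".*".join(re.escape(piece) for piece in pattern.split("*"))
    return re.compile("^" + body + ("$" if anchored else ""))


def is_allowed(path: str, rules: List[Rule]) -> bool:
    """Return True if `path` may be fetched under `rules`.

    The longest matching pattern wins; on equal length, Allow wins.
    A path that matches no rule is allowed.
    """
    best_len = -1
    best_allow = True
    for directive, pattern in rules:
        if pattern_to_regex(pattern).match(path):
            allow = directive.lower() == "allow"
            if len(pattern) > best_len or (len(pattern) == best_len and allow):
                best_len, best_allow = len(pattern), allow
    return best_allow


# A small subset of the rules listed above, for illustration only.
RULES: List[Rule] = [
    ("Allow", "/core/*.css$"),
    ("Allow", "/core/*.js$"),
    ("Disallow", "/core/"),
    ("Disallow", "/search"),
    ("Disallow", "/*.pdf$"),
    ("Disallow", "/*.php"),
    ("Disallow", "/node?*"),
]
```

With this subset, `is_allowed("/core/misc/drupal.js", RULES)` is true because the Allow pattern is longer than Disallow /core/, while `is_allowed("/core/install.php", RULES)` is false because only Disallow patterns match.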

brightbot

No rules defined. All paths allowed.

Other Records

Field Value
crawl-delay 900000

petalbot
semrushbot
pingdombot
mauibot
dotbot
ahrefsbot
aspiegelbot
mj12bot

Rule Path
Disallow /

openai-gpt
claudebot
gptbot
chatgpt-user
claude-web
semrushbot
brightbot
pingdombot
petalbot
barkrowler
go-http-client/1.1
pingdom.com_bot_version_1.4_(http://www.pingdom.com/)
yandexbot
brightbot 1.0
ping*
bright*
chat*
pingdom.com_bot_version_1.4_(http://www.pingdom.com/)
apache-httpclient/4.5.2 (java/1.8.0_161)
claude-user
claude-searchbot
ccbot
diffbot
perplexitybot
perplexity‑user
omgili
omgilibot
webzio-extended
imagesiftbot
bytespider
tiktokspider
youbot
semrushbot-ocob
petalbot
velenpublicwebcrawler
turnitinbot
timpibot
oai-searchbot
icc-crawler
ai2bot
ai2bot-dolma
dataforseobot
awariobot
awariosmartbot
awariorssbot
pangubot
kangaroo bot
sentibot
img2dataset
meltwater
seekr
peer39_crawler
cohere-ai
cohere-training-data-crawler
duckassistbot
scrapy
cotoyogi
aihitbot
factset_spyderbot
firecrawlagent
velenpublicwebcrawler

Rule Path
Disallow /
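
The groups above name many user agents, and some names, such as brightbot, appear in more than one group. The sketch below illustrates one way to pick the rules for a given crawler, assuming the RFC 9309 behavior: the product token is compared case-insensitively, rules from every group that names the crawler are merged, and the `*` group applies only when no group names it. Exact-match comparison is a simplification, and the function name and abbreviated GROUPS data are hypothetical. Entries such as ping* or go-http-client/1.1 are not valid product tokens under RFC 9309, so how a given crawler treats them is implementation-specific and is not modeled here.

```python
from typing import List, Tuple

Rule = Tuple[str, str]            # (directive, path pattern)
Group = Tuple[List[str], List[Rule]]  # (user-agent names, rules)


def rules_for(product_token: str, groups: List[Group]) -> List[Rule]:
    """Return the merged rules that apply to `product_token`.

    Groups naming the token take priority over the "*" group, even
    when the named group is empty (which means everything is allowed).
    """
    token = product_token.lower()
    matched = False
    specific: List[Rule] = []
    fallback: List[Rule] = []
    for agents, rules in groups:
        names = {agent.lower() for agent in agents}
        if token in names:
            matched = True
            specific.extend(rules)
        elif "*" in names:
            fallback.extend(rules)
    return specific if matched else fallback


# Abbreviated version of the groups in this report, for illustration only.
GROUPS: List[Group] = [
    (["*"], [("Disallow", "/admin/"), ("Disallow", "/search/")]),
    (["brightbot"], []),
    (["petalbot", "semrushbot", "mj12bot"], [("Disallow", "/")]),
    (["gptbot", "claudebot", "ccbot", "brightbot"], [("Disallow", "/")]),
]
```

In this sketch, brightbot ends up with Disallow / because its empty group is merged with the later group that also names it, and a crawler not named anywhere falls back to the `*` rules.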

Comments

  • robots.txt
  • This file is to prevent the crawling and indexing of certain parts of your site by web crawlers and spiders run by sites like Yahoo! and Google. By telling these "robots" where not to go on your site, you save bandwidth and server resources.
  • This file will be ignored unless it is at the root of your host:
  • Used: http://example.com/robots.txt
  • Ignored: http://example.com/site/robots.txt
  • For more information about the robots.txt standard, see: http://www.robotstxt.org/robotstxt.html
  • CSS, JS, Images
  • Directories
  • Files
  • Paths (clean URLs)
  • Paths (no clean URLs)
  • Paths - Disallow over-crawling search pages
  • disallow file overcrawling
  • crawl-delay if they ignore the blocks
  • Crawl-delay: 900
  • User-Agent: *
  • DisallowAITraining: /
  • User-Agent: *
  • DisallowAITraining: /
  • Content-Usage: ai=n
  • Allow: /

Warnings

  • 1 invalid line.