
Robots Exclusion Standard data for oreilly.com

Resource Scan

Scan Details

Site Domain oreilly.com
Base Domain oreilly.com
Scan Status Ok
Last Scan 2025-03-10T17:11:57+00:00
Next Scan 2025-03-24T17:11:57+00:00

Last Scan

Scanned 2025-03-10T17:11:57+00:00
URL https://oreilly.com/robots.txt
Redirect https://www.oreilly.com/robots.txt
Redirect Domain www.oreilly.com
Redirect Base oreilly.com
Domain IPs 34.169.83.167
Redirect IPs 23.210.99.48
Response IP 23.210.99.48
Found Yes
Hash 3087102520efa5d82cba1e445f010689ae8cd9cae984828c7227c0ba58287294
SimHash 2aae0b01e684

Groups

*

Rule Path
Disallow /images/
Disallow /graphics/
Disallow /admin/
Disallow /promos/
Disallow /ddp/
Disallow /dpp/
Disallow /programming/free/files/
Disallow /design/free/files/
Disallow /iot/free/files/
Disallow /data/free/files/
Disallow /webops-perf/free/files/
Disallow /web-platform/free/files/
Disallow /cs/
Disallow /test/
Disallow /*/?ar
Disallow /*/?orpq
Disallow /*/?discount=learn
Disallow /self-registration/*
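
Several of the paths in this group contain `*`, so a plain prefix comparison is not enough to evaluate them. The following is a minimal sketch of how the group could be checked, assuming the wildcard semantics described in RFC 9309 (`*` matches any run of characters, a trailing `$` anchors the end of the URL). The helper names `rule_to_regex` and `is_disallowed` are hypothetical, and individual crawlers may interpret these rules differently.

```python
import re

# Disallow paths copied from the "*" group above.
DISALLOW_RULES = [
    "/images/", "/graphics/", "/admin/", "/promos/", "/ddp/", "/dpp/",
    "/programming/free/files/", "/design/free/files/", "/iot/free/files/",
    "/data/free/files/", "/webops-perf/free/files/",
    "/web-platform/free/files/", "/cs/", "/test/",
    "/*/?ar", "/*/?orpq", "/*/?discount=learn", "/self-registration/*",
]


def rule_to_regex(rule: str) -> re.Pattern:
    """Translate one robots.txt path pattern into an anchored regex.

    Assumption: RFC 9309 semantics, where '*' matches any sequence of
    characters and a trailing '$' means "end of URL".
    """
    anchored = rule.endswith("$")
    body = rule[:-1] if anchored else rule
    # Escape the literal pieces and join them with '.*' for each wildcard.
    pattern = ".*".join(re.escape(part) for part in body.split("*"))
    return re.compile("^" + pattern + ("$" if anchored else ""))


COMPILED_RULES = [rule_to_regex(rule) for rule in DISALLOW_RULES]


def is_disallowed(path_and_query: str) -> bool:
    """Return True if any Disallow rule matches the path (plus query)."""
    return any(rx.match(path_and_query) for rx in COMPILED_RULES)


print(is_disallowed("/images/logo.png"))  # matched by /images/
print(is_disallowed("/topics/?ar=1"))     # matched by /*/?ar
print(is_disallowed("/radar/"))           # no rule applies
```

Because the group holds only Disallow rules, the sketch skips the longest-match tie-breaking that would be needed if Allow rules were also present.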

ai2bot
ai2bot-dolma
amazonbot
applebot
applebot-extended
bytespider
ccbot
cohere-ai
cohere-training-data-crawler
diffbot
duckassistbot
facebookbot
friendlycrawler
iaskspider/2.0
icc-crawler
imagesiftbot
img2dataset
isscyberriskcrawler
kangaroo bot
meta-externalagent
meta-externalfetcher
oai-searchbot
omgili
omgilibot
pangubot
perplexitybot
petalbot
scrapy
semrushbot
sidetrade indexer bot
timpibot
velenpublicwebcrawler
webzio-extended
youbot

Rule Path
Disallow /
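
The effect of the two groups can be illustrated with Python's standard `urllib.robotparser`. This is only a sketch: `ROBOTS_EXCERPT` is a hand-built excerpt reconstructed from the groups listed above (two of the named agents and two of the `*` rules), not the file as served, and `Googlebot` is used purely as an example of an agent that is not named in the blocked group. The standard-library parser does prefix matching only, so the wildcard rules are left out here.

```python
from urllib.robotparser import RobotFileParser

# Hand-built excerpt based on the groups in this report (an assumption,
# not the literal robots.txt content).
ROBOTS_EXCERPT = """\
User-agent: ccbot
User-agent: perplexitybot
Disallow: /

User-agent: *
Disallow: /images/
Disallow: /admin/
"""

parser = RobotFileParser()
parser.parse(ROBOTS_EXCERPT.splitlines())

url = "https://www.oreilly.com/radar/"

# Agents named in the second group are blocked from the whole site.
print(parser.can_fetch("CCBot", url))          # False
print(parser.can_fetch("PerplexityBot", url))  # False

# Any other agent falls through to the "*" group.
print(parser.can_fetch("Googlebot", url))      # True
print(parser.can_fetch("Googlebot", "https://www.oreilly.com/images/a.png"))  # False
```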

Other Records

Field Value
sitemap https://www.oreilly.com/book-sitemap.xml
sitemap https://www.oreilly.com/radar/sitemap.xml
sitemap https://www.oreilly.com/content/sitemap.xml
sitemap https://www.oreilly.com/video-sitemap.xml
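
The sitemap records can be read back with the same standard-library parser. A minimal sketch, assuming Python 3.8 or later (where `RobotFileParser.site_maps()` is available) and using a hand-built `SITEMAP_LINES` string that mirrors the four records above:

```python
from urllib.robotparser import RobotFileParser

# The four sitemap records from this report, written as robots.txt lines.
SITEMAP_LINES = """\
Sitemap: https://www.oreilly.com/book-sitemap.xml
Sitemap: https://www.oreilly.com/radar/sitemap.xml
Sitemap: https://www.oreilly.com/content/sitemap.xml
Sitemap: https://www.oreilly.com/video-sitemap.xml
"""

sitemap_parser = RobotFileParser()
sitemap_parser.parse(SITEMAP_LINES.splitlines())

# site_maps() returns None when no Sitemap lines were found.
sitemaps = sitemap_parser.site_maps() or []
for sitemap_url in sitemaps:
    print(sitemap_url)
```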

Comments

  • ITOPS-36974
  • ITOPS-37801
  • ITOPS-10158
  • ITOPS-8392
  • ITOPS-10157

Warnings

  • 2 invalid lines.