in-jpn.com
robots.txt

Robots Exclusion Standard data for in-jpn.com

Resource Scan

Scan Details

Site Domain in-jpn.com
Base Domain in-jpn.com
Scan Status Failed
Failure Stage Fetching resource.
Failure Reason Server returned a client error.
Last Scan 2024-09-12T20:02:07+00:00
Next Scan 2024-11-11T20:02:07+00:00

Last Successful Scan

Scanned 2024-07-15T17:45:17+00:00
URL https://in-jpn.com/robots.txt
Domain IPs 162.43.94.16
Response IP 162.43.94.16
Found Yes
Hash 7aa3b3463a4e44efdc53cd7d180e4b906476387a3a9176cd76a7c4dddc60be69
SimHash 41100865ac82

Groups

claudebot
go-http-client
chatgpt-user
gptbot
wpbot
colly
verity
anthropic-ai
bytespider
claude-web
cohere-ai
ccbot
ia_archiver
imagesiftbot
pinterestbot
the knowledge ai
megaindex
petalbot
scrapy
seekportbot
sogou web spider
sogou inst spider
friendly_crawler
coccocbot-web
yandex
yandexbot
yandexfavicons
mail.ru_bot
mail.ru

Rule Path
Disallow /
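As a rough illustration of how a group like the one above behaves, the sketch below feeds an abbreviated reconstruction of it to Python's standard `urllib.robotparser`. The exact layout of the original file is an assumption (only three of the listed agents are included), and `ExampleBot` is a hypothetical agent used purely for contrast.

```python
from urllib.robotparser import RobotFileParser

# Abbreviated, assumed reconstruction of the blocked group;
# the scanned file lists many more user agents than shown here.
ROBOTS_TXT = """\
User-agent: claudebot
User-agent: gptbot
User-agent: ccbot
Disallow: /
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

# Agent matching in urllib.robotparser is case-insensitive.
gptbot_allowed = parser.can_fetch("GPTBot", "https://in-jpn.com/")

# "ExampleBot" is hypothetical; this excerpt has no "*" group, so it is not restricted here.
other_allowed = parser.can_fetch("ExampleBot", "https://in-jpn.com/")

print(gptbot_allowed, other_allowed)
```

In the full file an unlisted agent would fall through to the `*` group rather than being unrestricted.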

ahrefsbot
blexbot
cincraw
coccocbot-web
dataforseobot
facebookbot
pixalate.com
semrushbot
wellknownbot
seznambot

No rules defined. All paths allowed.

Other Records

Field Value
crawl-delay 10

timpibot
mj12bot

No rules defined. All paths allowed.

Other Records

Field Value
crawl-delay 20
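The two crawl-delay groups carry no path rules, only a delay value. A minimal sketch of reading those values with `urllib.robotparser` follows; the robots.txt text is an assumed, shortened reconstruction of the groups above, and `ExampleBot` is a hypothetical agent that appears in neither group.

```python
from urllib.robotparser import RobotFileParser

# Assumed, shortened reconstruction of the two crawl-delay groups.
ROBOTS_TXT = """\
User-agent: ahrefsbot
User-agent: semrushbot
Crawl-delay: 10

User-agent: timpibot
User-agent: mj12bot
Crawl-delay: 20
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

# crawl_delay() returns the delay as an int, or None when no group applies.
delays = {
    agent: parser.crawl_delay(agent)
    for agent in ("AhrefsBot", "MJ12bot", "ExampleBot")
}
print(delays)
```

Crawl-delay is a non-standard field, so whether a given crawler honors it depends on that crawler.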

*

Rule Path
Disallow /anime-tokusatsu/
Disallow /jishin/
Disallow /tweet/
Disallow /authorize_cmn/
Disallow /common/
Disallow /ppdphp/
Disallow /tmp/
Disallow /wp-admin/
Disallow /wp-login.php
Disallow /*.css$
Disallow /*.js$
Allow /wp-admin/admin-ajax.php
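The `*` group mixes plain prefixes, wildcard patterns (`/*.css$`, `/*.js$`), and one Allow exception inside a disallowed directory. The sketch below shows one way such rules can be evaluated, assuming the longest-match precedence described in RFC 9309 (the most specific matching pattern wins, and Allow wins a tie). The helper names `pattern_to_regex` and `is_allowed` are made up for this example, and the function is a simplified illustration rather than how any particular crawler implements it. Python's `urllib.robotparser` is not used here because, as far as I know, it does only prefix matching and does not interpret `*` or `$`.

```python
import re

# Rules of the "*" group, in the order shown in the scan report.
RULES = [
    ("disallow", "/anime-tokusatsu/"),
    ("disallow", "/jishin/"),
    ("disallow", "/tweet/"),
    ("disallow", "/authorize_cmn/"),
    ("disallow", "/common/"),
    ("disallow", "/ppdphp/"),
    ("disallow", "/tmp/"),
    ("disallow", "/wp-admin/"),
    ("disallow", "/wp-login.php"),
    ("disallow", "/*.css$"),
    ("disallow", "/*.js$"),
    ("allow", "/wp-admin/admin-ajax.php"),
]


def pattern_to_regex(pattern):
    """Translate a robots.txt path pattern into a compiled regex.

    '*' matches any run of characters; a trailing '$' anchors the end of the path.
    """
    anchored = pattern.endswith("$")
    body = pattern[:-1] if anchored else pattern
    regex = ".*".join(re.escape(part) for part in body.split("*"))
    return re.compile(regex + ("$" if anchored else ""))


def is_allowed(path, rules=RULES):
    """Return True if `path` may be fetched under longest-match precedence."""
    best_len, allowed = -1, True  # no matching rule means the path is allowed
    for kind, pattern in rules:
        if pattern_to_regex(pattern).match(path):
            length = len(pattern)
            if length > best_len or (length == best_len and kind == "allow"):
                best_len, allowed = length, kind == "allow"
    return allowed


for path in ("/wp-admin/admin-ajax.php", "/wp-admin/options.php",
             "/style.css", "/style.css?ver=1", "/game-holmes/"):
    print(path, is_allowed(path))
```

Note that under this reading `/style.css?ver=1` is not blocked, because the `$` anchor requires the path to end in `.css`.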

Other Records

Field Value
sitemap https://in-jpn.com/sitemap.xml
sitemap https://in-jpn.com/img-sitemap.xml
sitemap https://in-jpn.com/game-holmes/sitemap.xml
sitemap https://in-jpn.com/game-holmes/img-sitemap.xml
sitemap https://in-jpn.com/jishin/sitemap.xml
sitemap https://in-jpn.com/jishin/img-sitemap.xml

Comments

  • Lines are on hold (wait and see), as of 2024.6.5
  • Go-http-client/1.1 is uBlock
  • User-agent: omgili
  • User-agent: omgilibot
  • User-agent: PiplBot
  • Below: Russia
  • User-agent: Python
  • Disallow: /
  • User-agent: Go-http-client
  • Disallow: /
  • User-agent: Amazonbot
  • Disallow: /
  • bytedance.com
  • Common Crawl
  • User-agent: https://github.com/gocolly/colly/v2
  • Disallow: /
  • -------------------------------------
  • Flagged with a warning by PageSpeed, 2023.10.24
  • User-agent: archive.org_bot
  • Slack is a communication tool
  • https://zbnr-hp.com/denybot-robots-txt/
  • https://jp.reuters.com/robots.txt
  • https://www.theverge.com/2023/8/21/23840705/new-york-times-openai-web-crawler-ai-gpt
  • https://github.com/mastodon/mastodon/issues/28383
  • --- robots-end ---
  • For Other Search Engine