foreign.mingluji.com
robots.txt

Robots Exclusion Standard data for foreign.mingluji.com

Resource Scan

Scan Details

Site Domain foreign.mingluji.com
Base Domain mingluji.com
Scan Status Ok
Last Scan 2026-01-02T04:36:17+00:00
Next Scan 2026-02-01T04:36:17+00:00

Last Scan

Scanned 2026-01-02T04:36:17+00:00
URL https://foreign.mingluji.com/robots.txt
Domain IPs 175.12.90.35
Response IP 175.12.90.35
Found Yes
Hash cc6d2a444059f30c4e0ffef2faf8eb330dfd488595262cfeda13a45d0efa4666
SimHash fe145d536dc3
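
The 64-character Hash is consistent with a SHA-256 digest of the fetched robots.txt body (an assumption, since the scan does not name the algorithm). A minimal sketch for recomputing it against the live file:

```python
# Hedged sketch: recompute the content hash of the live robots.txt.
# Assumption: the "Hash" field above is the SHA-256 hex digest of the raw body.
import hashlib
import urllib.request

URL = "https://foreign.mingluji.com/robots.txt"

with urllib.request.urlopen(URL) as resp:
    body = resp.read()

print(hashlib.sha256(body).hexdigest())
# A match with the value recorded above means the file is unchanged since the scan.
```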

Groups

mediapartners-google

Rule Path
Disallow

baiduspider
yisouspider
sogou web spider
sogoubot
bytespider
haosouspider
yodaobot
bingbot
googlebot
msnbot
*

Rule Path
Disallow /*Special%3AUserLogin
Disallow /%E8%AE%A8%E8%AE%BA
Disallow /thumb.php
Disallow /index.php
Disallow /skins/
Disallow /Special
Disallow /%E7%89%B9%E6%AE%8A
Disallow /*action%3D
Disallow /*oldid%3D
Disallow /*diff%3D
Disallow /*printable%3D
Disallow /1027280/
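
To see how this group's directives behave in practice, here is an illustrative check with Python's standard urllib.robotparser; the test paths are made-up examples, and the standard parser matches rules as plain prefixes, so the /*action%3D-style wildcard patterns above are not expanded.

```python
# Illustrative sketch: query the live robots.txt for the wildcard ("*") group.
# The paths below are hypothetical examples, not taken from the scan data.
import urllib.robotparser

rp = urllib.robotparser.RobotFileParser()
rp.set_url("https://foreign.mingluji.com/robots.txt")
rp.read()

for path in ("/index.php", "/skins/common.css", "/Some_Article"):
    allowed = rp.can_fetch("*", "https://foreign.mingluji.com" + path)
    print(path, "->", "allowed" if allowed else "disallowed")
# Expected under the rules above: /index.php and anything under /skins/ are
# disallowed, while ordinary article paths remain crawlable.
```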

amazonbot

Rule Path
Disallow /

semrushbot

Rule Path
Disallow /

claudebot

Rule Path
Disallow /

gptbot

Rule Path
Disallow /

chatgpt

Rule Path
Disallow /
Allow /index.php?title=%E7%89%B9%E6%AE%8A%3A%E6%9C%80%E8%BF%91%E6%9B%B4%E6%94%B9
Allow /index.php?title=%E7%89%B9%E6%AE%8A%3A%E6%9C%80%E6%96%B0%E9%A1%B5%E9%9D%A2
Allow /index.php?title=Special%3A%E6%9C%80%E8%BF%91%E6%9B%B4%E6%94%B9
Allow /index.php?title=Special%3A%E6%9C%80%E6%96%B0%E9%A1%B5%E9%9D%A2
Allow /index.php?title=Special%3ARecentchanges
Allow /index.php?title=Special%3ANewpages
Allow /index.php?title=Category%3A
Allow /index.php?title=%E5%88%86%E7%B1%BB%3A

Other Records

Field Value
crawl-delay 10
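
The crawl-delay of 10 seconds recorded for this group can also be read programmatically; a brief sketch (urllib.robotparser returns None for groups that declare no delay):

```python
# Sketch: read per-agent crawl delays from the live robots.txt.
import urllib.robotparser

rp = urllib.robotparser.RobotFileParser()
rp.set_url("https://foreign.mingluji.com/robots.txt")
rp.read()

print(rp.crawl_delay("chatgpt"))  # expected 10, per the record above
print(rp.crawl_delay("gptbot"))   # no Crawl-delay declared for that group -> None
```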

ubicrawler

Rule Path
Disallow /

doc

Rule Path
Disallow /

zao

Rule Path
Disallow /

sitecheck.internetseer.com

Rule Path
Disallow /

zealbot

Rule Path
Disallow /

msiecrawler

Rule Path
Disallow /

sitesnagger

Rule Path
Disallow /

webstripper

Rule Path
Disallow /

webcopier

Rule Path
Disallow /

fetch

Rule Path
Disallow /

offline explorer

Rule Path
Disallow /

teleport

Rule Path
Disallow /

teleportpro

Rule Path
Disallow /

webzip

Rule Path
Disallow /

linko

Rule Path
Disallow /

httrack

Rule Path
Disallow /

microsoft.url.control

Rule Path
Disallow /

xenu

Rule Path
Disallow /

larbin

Rule Path
Disallow /

libwww

Rule Path
Disallow /

zyborg

Rule Path
Disallow /

download ninja

Rule Path
Disallow /

wget

Rule Path
Disallow /

grub-client

Rule Path
Disallow /

k2spider

Rule Path
Disallow /

npbot

Rule Path
Disallow /

webreaper

Rule Path
Disallow /

Other Records

Field Value
sitemap https://foreign.mingluji.com/sitemap.xml
sitemap https://foreign.mingluji.com/rss.xml
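
The Sitemap records point at XML files that can be enumerated directly; a hedged sketch assuming the standard sitemap <urlset>/<loc> layout (the rss.xml feed would need a separate RSS parser):

```python
# Sketch: list the URLs declared in the sitemap referenced by this robots.txt.
# Assumption: the file uses the standard sitemap.org namespace; the same query
# also works for a <sitemapindex> file, since it only looks for <loc> elements.
import urllib.request
import xml.etree.ElementTree as ET

NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}

with urllib.request.urlopen("https://foreign.mingluji.com/sitemap.xml") as resp:
    tree = ET.parse(resp)

for loc in tree.findall(".//sm:loc", NS):
    print(loc.text)
```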

Comments

  • jamesqi 2014-11-27 14:27
  • foreign.mingluji.com
  • Add Start
  • NOTICE: The collection of content and other data on this site through automated means, including any device, tool, or process designed to data mine or scrape content, is prohibited except (1) for the purpose of search engine indexing or artificial intelligence retrieval augmented generation or (2) with express written permission from this site’s operator.
  • To request permission to license our intellectual property and/or other materials, please contact this site’s operator directly.
  • BEGIN Cloudflare Managed content
  • Disallow: /Talk
  • 2018-12-18: commented out the line below because Google Webmaster Tools cannot fetch the load.php resource
  • Disallow: /load.php
  • Disallow: /images/
  • Directory we refuse to have crawled.
  • END Cloudflare Managed Content
  • sitemap start
  • sitemap end
  • Crawl-delay: 300 # wait 300 seconds between successive requests to the same server (for Yahoo Slurp)
  • Request-rate: 1/10 # maximum rate is one page every 10 seconds
  • Visit-time: 0000-0800
  • Request-rate: 1/20s 1020-1200 # between 10:20 and 12:00, one visit every 20 seconds
  • Add End
  • Crawlers that are kind enough to obey, but which we'd rather not have
  • unless they're feeding search engines.
  • Some bots are known to be trouble, particularly those designed to copy
  • entire sites. Please obey robots.txt.
  • Sorry, wget in its recursive mode is a frequent problem.
  • Please read the man page and use it properly; there is a
  • --wait option you can use to set the delay between hits,
  • for instance.
  • The 'grub' distributed client has been *very* poorly behaved.
  • Doesn't follow robots.txt anyway, but...
  • Hits many times per second, not acceptable
  • http://www.nameprotect.com/botinfo.html
  • A capture bot, downloads gazillions of pages with no public benefit
  • http://www.webreaper.net/
  • Don't allow the Wayback Machine to index user pages
  • User-agent: ia_archiver
  • Disallow: /wiki/User
  • Disallow: /wiki/Benutzer
  • Friendly, low-speed bots are welcome viewing article pages, but not
  • dynamically-generated pages please.
  • Inktomi's "Slurp" can read a minimum delay between hits; if your
  • bot supports such a thing using the 'Crawl-delay' or another
  • instruction, please let us know.

Warnings

  • 1 invalid line.