int-ent.de
robots.txt

Robots Exclusion Standard data for int-ent.de

Resource Scan

Scan Details

Site Domain int-ent.de
Base Domain int-ent.de
Scan Status Ok
Last Scan 2025-10-18T21:43:58+00:00
Next Scan 2025-10-25T21:43:58+00:00

Last Scan

Scanned 2025-10-18T21:43:58+00:00
URL https://int-ent.de/robots.txt
Domain IPs 78.46.146.190
Response IP 78.46.146.190
Found Yes
Hash 03bfd411563d01afbba01445289607eb81b88500baa3024187c850c1e9a3a39d
SimHash b21a5349c6f6
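The Hash field makes it cheap to tell whether robots.txt changed between the last and next scan. The scan does not say which algorithm produced it; its 64 hex characters are consistent with a SHA-256 digest of the fetched body, so the sketch below assumes exactly that. The helper names `body_digest` and `has_changed` are hypothetical.

```python
import hashlib

# Value reported under "Hash" for the 2025-10-18 scan.
LAST_SCAN_HASH = "03bfd411563d01afbba01445289607eb81b88500baa3024187c850c1e9a3a39d"


def body_digest(body: bytes) -> str:
    """Hex SHA-256 digest of a robots.txt body.

    Assumption: the scanner's Hash field is computed this way over the raw bytes.
    """
    return hashlib.sha256(body).hexdigest()


def has_changed(previous_digest: str, body: bytes) -> bool:
    """True if a newly fetched body differs from the one behind previous_digest."""
    return body_digest(body) != previous_digest.lower()
```

A rescan would fetch https://int-ent.de/robots.txt again and call `has_changed(LAST_SCAN_HASH, new_body)`; only when it returns True is it worth re-parsing the rules.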

Groups

*

Rule Path
Allow /wp-admin/admin-ajax.php
Disallow /wp-admin
Disallow /cgi-bin
Disallow /wp-includes
Disallow /wp-content/plugins
Disallow /wp-content/cache
Disallow /wp-content/themes
Disallow /trackback
Disallow /feed
Disallow /comments
Disallow /category/*/*
Disallow */trackback/
Disallow */feed/
Disallow */comments/
Allow /wp-content/cache/*.css
Allow /wp-content/cache/*.js
Allow /wp-content/plugins/simple-share-buttons-adder/css/*.css
Allow /wp-content/plugins/simple-share-buttons-adder/buttons/simple/*.png
Allow /wp-includes/js/jquery/*.js
Allow /wp-content/uploads/wordpress-popular-posts/*.jpg
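The `*` group mixes broad Disallow prefixes with narrower Allow exceptions, so the outcome depends on precedence rather than on line order. RFC 9309 uses the most specific match (the rule with the longest path), with Allow winning a tie; `*` matches any run of characters and a trailing `$` anchors the end. Python's `urllib.robotparser` uses first-match semantics and no wildcards, so the minimal sketch below implements the matching by hand. The function names are hypothetical and the rule list is copied from this scan.

```python
import re
from typing import List, Tuple

# Rules of the "*" group as listed in this scan: (directive, path pattern).
RULES: List[Tuple[str, str]] = [
    ("allow", "/wp-admin/admin-ajax.php"),
    ("disallow", "/wp-admin"),
    ("disallow", "/cgi-bin"),
    ("disallow", "/wp-includes"),
    ("disallow", "/wp-content/plugins"),
    ("disallow", "/wp-content/cache"),
    ("disallow", "/wp-content/themes"),
    ("disallow", "/trackback"),
    ("disallow", "/feed"),
    ("disallow", "/comments"),
    ("disallow", "/category/*/*"),
    ("disallow", "*/trackback/"),
    ("disallow", "*/feed/"),
    ("disallow", "*/comments/"),
    ("allow", "/wp-content/cache/*.css"),
    ("allow", "/wp-content/cache/*.js"),
    ("allow", "/wp-content/plugins/simple-share-buttons-adder/css/*.css"),
    ("allow", "/wp-content/plugins/simple-share-buttons-adder/buttons/simple/*.png"),
    ("allow", "/wp-includes/js/jquery/*.js"),
    ("allow", "/wp-content/uploads/wordpress-popular-posts/*.jpg"),
]


def pattern_to_regex(pattern: str):
    """Translate a robots.txt path pattern into a regex matched from the start.

    '*' becomes '.*'; a trailing '$' anchors the end; everything else is literal.
    """
    anchored_end = pattern.endswith("$")
    if anchored_end:
        pattern = pattern[:-1]
    body = ".*".join(re.escape(part) for part in pattern.split("*"))
    return re.compile(body + ("$" if anchored_end else ""))


def is_allowed(path: str, rules: List[Tuple[str, str]]) -> bool:
    """Longest matching pattern wins; Allow wins ties; no match means allowed."""
    best_len, best_allow = -1, True
    for directive, pattern in rules:
        if not pattern:
            continue  # an empty path matches nothing
        if pattern_to_regex(pattern).match(path):
            allow = directive == "allow"
            if len(pattern) > best_len or (len(pattern) == best_len and allow):
                best_len, best_allow = len(pattern), allow
    return best_allow
```

For example, `/wp-admin/admin-ajax.php` stays crawlable because its Allow rule is longer than `Disallow /wp-admin`, and `/category/news` is allowed while `/category/news/sub` is not. Because paths match as prefixes, `Disallow /feed` also blocks a page such as `/feedback`.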

sitecheck.internetseer.com

Rule Path
Disallow /

zealbot

Rule Path
Disallow /

msiecrawler

Rule Path
Disallow /

sitesnagger

Rule Path
Disallow /

webstripper

Rule Path
Disallow /

webcopier

Rule Path
Disallow /

fetch

Rule Path
Disallow /

offline explorer

Rule Path
Disallow /

teleport

Rule Path
Disallow /

teleportpro

Rule Path
Disallow /

webzip

Rule Path
Disallow /

linko

Rule Path
Disallow /

httrack

Rule Path
Disallow /

microsoft.url.control

Rule Path
Disallow /

xenu

Rule Path
Disallow /

larbin

Rule Path
Disallow /

libwww

Rule Path
Disallow /

zyborg

Rule Path
Disallow /

download ninja

Rule Path
Disallow /

fast

Rule Path
Disallow /

wget

Rule Path
Disallow /

grub-client

Rule Path
Disallow /

k2spider

Rule Path
Disallow /

npbot

Rule Path
Disallow /

webreaper

Rule Path
Disallow /
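Each named group above carries a single `Disallow /`, so a crawler that selects one of them may fetch nothing at all; every other crawler falls back to the `*` group. Under RFC 9309 the crawler compares its own product token case-insensitively against the `User-agent` values. The sketch below assumes exact, case-insensitive matching on the token; `select_group` and the trimmed `*` rules are illustrative only. Note that several tokens here (`offline explorer`, `download ninja`, `sitecheck.internetseer.com`, `microsoft.url.control`) contain spaces or dots, which fall outside RFC 9309's product-token grammar, so whether they ever match depends on each crawler's own matching logic.

```python
from typing import Dict, List, Tuple

Rule = Tuple[str, str]  # (directive, path pattern)

# User-agent values this file sends to a "Disallow: /" group.
BLOCKED_AGENTS = [
    "sitecheck.internetseer.com", "zealbot", "msiecrawler", "sitesnagger",
    "webstripper", "webcopier", "fetch", "offline explorer", "teleport",
    "teleportpro", "webzip", "linko", "httrack", "microsoft.url.control",
    "xenu", "larbin", "libwww", "zyborg", "download ninja", "fast", "wget",
    "grub-client", "k2spider", "npbot", "webreaper",
]

# The "*" group is trimmed to two rules for brevity.
GROUPS: Dict[str, List[Rule]] = {
    "*": [("allow", "/wp-admin/admin-ajax.php"), ("disallow", "/wp-admin")],
}
GROUPS.update({agent: [("disallow", "/")] for agent in BLOCKED_AGENTS})


def select_group(product_token: str, groups: Dict[str, List[Rule]]) -> str:
    """Return the group a crawler should obey: an exact case-insensitive
    match on its product token if one exists, otherwise the '*' group."""
    token = product_token.lower()
    for name in groups:
        if name != "*" and name.lower() == token:
            return name
    return "*"
```

With exact matching, a crawler identifying as `Teleport` lands in the `teleport` group, not `teleportpro`. A substring matcher would behave differently: the token `fast`, for example, would then match any agent string containing those four letters.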

Other Records

Field Value
sitemap https://int-ent.de/sitemap-news.xml
sitemap https://int-ent.de/sitemap.xml
sitemap https://int-ent.de/news-sitemap.xml
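The three sitemap records are independent of any user-agent group. Python's standard `urllib.robotparser.RobotFileParser` collects them through `site_maps()` (Python 3.8+), which returns `None` when the file lists none. The raw file was not captured by this scan, so `ROBOTS_EXCERPT` below is a hypothetical reconstruction containing only a few of the reported lines.

```python
from typing import List
from urllib.robotparser import RobotFileParser

# Hypothetical excerpt rebuilt from the scan results, not the original file.
ROBOTS_EXCERPT = """\
User-agent: *
Disallow: /wp-admin

Sitemap: https://int-ent.de/sitemap-news.xml
Sitemap: https://int-ent.de/sitemap.xml
Sitemap: https://int-ent.de/news-sitemap.xml
"""


def sitemap_urls(robots_text: str) -> List[str]:
    """Return the Sitemap URLs declared in a robots.txt body, in file order."""
    parser = RobotFileParser()
    parser.parse(robots_text.splitlines())
    return parser.site_maps() or []
```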

Comments

  • XML Sitemap & Google News Feeds version 4.6.3 - http://status301.net/wordpress-plugins/xml-sitemap-feed/
  • Allow styles and js for rendering
  • Some bots are known to be trouble, particularly those designed to copy entire sites. Please obey robots.txt.
  • Misbehaving: requests much too fast:
  • Sorry, wget in its recursive mode is a frequent problem. Please read the man page and use it properly; there is a --wait option you can use to set the delay between hits, for instance.
  • The 'grub' distributed client has been *very* poorly behaved.
  • Doesn't follow robots.txt anyway, but...
  • Hits many times per second, not acceptable
  • http://www.nameprotect.com/botinfo.html
  • A capture bot, downloads gazillions of pages with no public benefit
  • http://www.webreaper.net/
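Several comments object to request rate ("requests much too fast", "hits many times per second") and point to wget's `--wait` option as the remedy. The file itself declares no crawl delay, so any spacing is up to the client. The sketch below shows the same idea for a custom crawler: enforce a minimum interval between successive requests to one host. `PoliteThrottle` is a hypothetical name, and the clock and sleep functions are injectable only so the behaviour can be checked without real waiting.

```python
import time
from typing import Callable, Optional


class PoliteThrottle:
    """Keep at least wait_seconds between successive requests to one host,
    similar in spirit to wget's --wait option."""

    def __init__(self, wait_seconds: float,
                 clock: Callable[[], float] = time.monotonic,
                 sleep: Callable[[float], None] = time.sleep) -> None:
        self.wait_seconds = wait_seconds
        self._clock = clock
        self._sleep = sleep
        self._last: Optional[float] = None  # start time of the previous request

    def before_request(self) -> float:
        """Sleep if the previous request was too recent; return the delay used."""
        now = self._clock()
        delay = 0.0
        if self._last is not None:
            delay = max(0.0, self._last + self.wait_seconds - now)
            if delay:
                self._sleep(delay)
        self._last = now + delay
        return delay
```

A crawler would call `before_request()` immediately before each fetch from int-ent.de; the first call never waits, and later calls wait only for the remainder of the interval.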