inetspec.com
robots.txt

Robots Exclusion Standard data for inetspec.com

Resource Scan

Scan Details

Site Domain	inetspec.com
Base Domain	inetspec.com
Scan Status	Failed
Failure Reason	Scan timed out.
Last Scan	2024-09-27T06:14:04+00:00
Next Scan	2024-10-27T06:14:04+00:00

Last Successful Scan

Scanned	2024-08-06T03:06:37+00:00
URL	https://inetspec.com/robots.txt
Domain IPs	66.228.138.150
Response IP	66.228.138.150
Found	Yes
Hash	8cfd95f889ccbb6c3a66115c018020b305805f94a7e28cb85e9653dc38be8a2f
SimHash	241679c747f4

Groups

*

Rule	Path
Disallow	/

ahrefsbot

Rule	Path
Disallow	/

baiduspider

Rule	Path
Disallow	/

copyright sheriff

Rule	Path
Disallow	/

dealgates bot

Rule	Path
Disallow	/

gaisbot

Rule	Path
Disallow	/

gingercrawler

Rule	Path
Disallow	/

nutch

Rule	Path
Disallow	/

renlifangbot

Rule	Path
Disallow	/

surveybot

Rule	Path
Disallow	/

voilabot

Rule	Path
Disallow	/

yandex

Rule	Path
Disallow	/

yeti

Rule	Path
Disallow	/

youdaobot

Rule	Path
Disallow	/

butterfly
charlotte
exabot
envolk
gigabot
scoutjet
speedy
teoma
turnitinbot
twiceler
yowedobot
mj12bot

Rule	Path
Disallow
Disallow	/adm
Disallow	/php
Disallow	/tmp
Disallow	/upl
Disallow	/index.php

Other Records

Field	Value
crawl-delay	20

ia_archiver

Rule	Path
Disallow
Disallow	/adm
Disallow	/php
Disallow	/tmp
Disallow	/upl
Disallow	/index.php

adsbot-google
googlebot

Rule	Path
Disallow	/files
Disallow	/adm
Disallow	/php
Disallow	/tmp
Disallow	/uploads
Disallow	/index.php
Allow	/
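
The googlebot group above can be checked with Python's standard `urllib.robotparser`. This is a minimal sketch, not the site's own tooling: the `GOOGLEBOT_GROUP` text is reconstructed from the table (the live file's exact layout and directive order are assumptions), and the helper name and the paths passed to it are hypothetical.

```python
from urllib.robotparser import RobotFileParser

# Reconstructed from the adsbot-google / googlebot table above.
# The exact layout of the live robots.txt is an assumption.
GOOGLEBOT_GROUP = """\
User-agent: adsbot-google
User-agent: googlebot
Disallow: /files
Disallow: /adm
Disallow: /php
Disallow: /tmp
Disallow: /uploads
Disallow: /index.php
Allow: /
"""

parser = RobotFileParser()
parser.parse(GOOGLEBOT_GROUP.splitlines())


def googlebot_may_fetch(path: str) -> bool:
    """Return True if the reconstructed group lets googlebot fetch `path`."""
    return parser.can_fetch("googlebot", "https://inetspec.com" + path)


# Hypothetical paths, used only to exercise the rules.
for path in ("/about", "/files/report.pdf", "/admissions"):
    print(path, googlebot_may_fetch(path))
```

Rules are matched as path prefixes, so `Disallow /adm` also covers any path that merely begins with `/adm`, such as the hypothetical `/admissions`. Python's parser applies the first matching rule in file order, while Google documents a longest-match rule; for this group both approaches should give the same answers, because every `Disallow` path is longer than the final `Allow /`.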

googlebot-image
mediapartners-google

Rule	Path
Disallow	/

slurp

Rule	Path
Disallow	/

bingbot
msnbot
msnbot
msnbot-products
msnbot-newsblogs

Rule	Path
Disallow
Disallow	/adm
Disallow	/php
Disallow	/tmp
Disallow	/uploads
Disallow	/index.php

Other Records

Field	Value
crawl-delay	30
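
A crawler that honors the `crawl-delay` record could read it as sketched below with Python's standard `urllib.robotparser` (the `crawl_delay` method needs Python 3.6 or later). The `BING_GROUP` text is reconstructed from the bingbot/msnbot table and its crawl-delay record; that the record sits inside this group in the live file is an assumption, and `"examplebot"` is a hypothetical agent name.

```python
from urllib.robotparser import RobotFileParser

# Reconstructed from the bingbot / msnbot table and its crawl-delay record.
# Placement of Crawl-delay inside this group is an assumption.
BING_GROUP = """\
User-agent: bingbot
User-agent: msnbot
User-agent: msnbot
User-agent: msnbot-products
User-agent: msnbot-newsblogs
Disallow:
Disallow: /adm
Disallow: /php
Disallow: /tmp
Disallow: /uploads
Disallow: /index.php
Crawl-delay: 30
"""

parser = RobotFileParser()
parser.parse(BING_GROUP.splitlines())

# Seconds to wait between requests; None when no group matches the agent.
bing_delay = parser.crawl_delay("bingbot")
other_delay = parser.crawl_delay("examplebot")  # hypothetical agent

print(bing_delay, other_delay)
```

The sketch only reads the delay. It does not test paths for this group, because the group's leading empty `Disallow` is treated differently by different parsers.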

bingbot-media
msnbot-media

Rule	Path
Disallow	/

Other Records

Field	Value
sitemap	http://www.inetspec.com/sitemap
sitemap	http://www.inetspec.com/sitemap.xml

Comments

iNet Specialists
Robots TXT File
Good reference pages:
http://antezeta.com/news/avoid-search-engine-indexing
http://www.youtube.com/user/GoogleWebmasterHelp
http://support.google.com/webmasters/bin/answer.py?hl=en&answer=182072
http://www.bing.com/community/site_blogs/b/webmaster/archive/2012/05/03/to-crawl-or-not-to-crawl-that-is-bingbot-s-question.aspx
Base directive for unknown and mis-behaving spiders
More specific general directives
added to prevent indexing of certain pages
Alexa web archiver directives
added to prevent indexing of certain pages
Googlebot(s) specific directives
Files that SHOULD NOT be crawled (and have been)
Disallow: /pathname.ext
Directories that SHOULD NOT be crawled (and have been)
added Dec 2012 to prevent indexing of certain pages
Allow everything else (according to Google robots.txt translation)
NO Image/AdSense Bots
Yahoo! Slurp(s) specific directives
The Slurp! is DEAD, Long Live the Bing
MSNbot(s) specific directives for BING
Files that SHOULD NOT be crawled (and have been)
Disallow: /pathname.ext
Directories that SHOULD NOT be crawled (and have been)
added Dec 2012 to prevent indexing of certain pages
NO Media
html version
xml version
