airccse.org
robots.txt

Robots Exclusion Standard data for airccse.org

Resource Scan

Scan Details

Site Domain airccse.org
Base Domain airccse.org
Scan Status Ok
Last Scan 2024-05-02T02:42:38+00:00
Next Scan 2024-06-01T02:42:38+00:00

Last Scan

Scanned 2024-05-02T02:42:38+00:00
URL https://airccse.org/robots.txt
Domain IPs 69.167.168.245
Response IP 69.167.168.245
Found Yes
Hash e71863c3a790c4cd304bd73669b53b33ea1c7ba299251b0975b85de11c10406c
SimHash 01319051c6b6

Groups

googlebot

Rule Path
Allow /

mediapartners-google

Rule Path
Allow /

adsbot-google

Rule Path
Allow /

slurp

Rule Path
Allow /

openfind

Rule Path
Allow /

scooter

Rule Path
Allow /

bingbot

Rule Path
Allow /

twiceler

Rule Path
Allow /

rogerbot

Rule Path
Allow /

teoma

Rule Path
Allow /

mantraagent

Rule Path
Allow /

semanticscholarbot

Rule Path
Allow /

lycos_spider_(t-rex)

Rule Path
Allow /

robozilla

Rule Path
Allow /

zyborg

Rule Path
Allow /

ia_archiver

Rule Path
Allow /

gulliver

Rule Path
Allow /

echo2

Rule Path
Allow /

scoutjet

Rule Path
Allow /

yahoofeedseeker

Rule Path
Allow /

bloglines

Rule Path
Allow /

blogstreetbot

Rule Path
Allow /

fastbuzz.com

Rule Path
Allow /

syndic8

Rule Path
Allow /

nif/1.1

Rule Path
Allow /

newsgatoronline

Rule Path
Allow /

mywireservicebot

Rule Path
Allow /

feedster

Rule Path
Allow /

feedfetcher

Rule Path
Allow /
Disallow /sgw/
Disallow /covers/
Disallow /*checkval
Disallow /*wicket%3Ainterface
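The feedfetcher group is the only one that mixes Allow and Disallow rules and uses the * wildcard, so the outcome depends on how rules are ranked. Below is a minimal Python sketch, assuming the longest-match precedence described in RFC 9309: the longest matching pattern wins, and Allow wins a tie. It treats * as "any run of characters" and a trailing $ as an end anchor, and it does not normalize percent-encoding (so %3A is matched literally), which real crawlers would do. The function names and sample paths are hypothetical.

```python
import re

# Rules from the feedfetcher group, in the order the report lists them.
FEEDFETCHER_RULES = [
    ("allow", "/"),
    ("disallow", "/sgw/"),
    ("disallow", "/covers/"),
    ("disallow", "/*checkval"),
    ("disallow", "/*wicket%3Ainterface"),
]

def pattern_to_regex(pattern: str) -> re.Pattern:
    """Translate a robots.txt path pattern into a regex anchored at the start.

    '*' matches any run of characters; a trailing '$' anchors the end.
    Everything else is matched literally, so a pattern acts as a prefix.
    """
    anchored = pattern.endswith("$")
    if anchored:
        pattern = pattern[:-1]
    body = ".*".join(re.escape(part) for part in pattern.split("*"))
    return re.compile(body + ("$" if anchored else ""))

def is_allowed(path: str, rules) -> bool:
    """Longest matching pattern wins; on equal length, allow wins."""
    best_len, allowed = -1, True  # a path no rule matches is allowed
    for kind, pattern in rules:
        if pattern_to_regex(pattern).match(path):
            length = len(pattern)
            if length > best_len or (length == best_len and kind == "allow"):
                best_len, allowed = length, (kind == "allow")
    return allowed

# Hypothetical paths on the scanned site.
for path in ("/journal/ijcsit.html", "/sgw/index.html", "/view?checkval=1"):
    print(path, is_allowed(path, FEEDFETCHER_RULES))
# /journal/ijcsit.html True
# /sgw/index.html False
# /view?checkval=1 False
```

The ranking matters here: Python's urllib.robotparser applies the first matching rule in file order, so for this group the leading Allow / would match every path and the four Disallow lines would never take effect.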

ahrefsbot

Rule Path
Disallow /

baiduspider

Rule Path
Disallow /

ezooms

Rule Path
Disallow /

mj12bot

Rule Path
Disallow /

yandexbot

Rule Path
Disallow /

*

Rule Path
Allow /
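How a crawler falls into one of these groups, and what "Disallow /" versus the catch-all "*" group means for it, can be checked with Python's standard urllib.robotparser. The sketch below parses a hand-reconstructed fragment of the file (only a few of the groups above, with assumed formatting) and asks whether several user agents may fetch a hypothetical page. Note that urllib.robotparser matches user agents by case-insensitive substring of the name before any "/", which is looser than the product-token matching in RFC 9309.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical reconstruction of a few groups from the report above.
ROBOTS_TXT = """\
User-agent: googlebot
Allow: /

User-agent: ahrefsbot
Disallow: /

User-agent: yandexbot
Disallow: /

# all others
User-agent: *
Allow: /
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

# Hypothetical page URL on the scanned domain.
PAGE = "https://airccse.org/journal/example.html"

for agent in ("Googlebot", "AhrefsBot", "YandexBot/3.0", "ExampleBot"):
    print(agent, parser.can_fetch(agent, PAGE))
# Googlebot True
# AhrefsBot False
# YandexBot/3.0 False
# ExampleBot True   (no named group matches, so the * group applies)
```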

Other Records

Field Value
crawl-delay 5
sitemap http://airccse.org/sitemap.xml
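Both records can be read back with urllib.robotparser (Python 3.8 or later for site_maps()). The report lists them outside any group, so placing Crawl-delay inside the catch-all "*" group below is an assumption; Sitemap applies to the whole file regardless of where it appears. Crawl-delay is a non-standard extension that RFC 9309 does not define, so crawlers may honor or ignore it. "ExampleBot" is a hypothetical user agent.

```python
from urllib.robotparser import RobotFileParser

# Assumed placement: Crawl-delay inside the catch-all group.
ROBOTS_TXT = """\
User-agent: *
Allow: /
Crawl-delay: 5

Sitemap: http://airccse.org/sitemap.xml
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

print(parser.crawl_delay("ExampleBot"))  # 5 (seconds, from the * group)
print(parser.site_maps())                # ['http://airccse.org/sitemap.xml']
```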

Comments

  • all others