Robots Exclusion Standard data for twq.com

Resource Scan

Scan Details

Site Domain twq.com
Base Domain twq.com
Scan Status Failed
Failure Reason Scan timed out.
Last Scan 2024-04-23T17:39:38+00:00
Next Scan 2024-06-22T17:39:38+00:00

Last Successful Scan

Scanned 2021-10-15T07:36:13+00:00
URL http://twq.com/robots.txt
Redirect https://twq.elliott.gwu.edu/robots.txt
Redirect Domain twq.elliott.gwu.edu
Redirect Base gwu.edu
Found Yes
Hash 00af86d94122311dbf763088ba40dbd59f76c9d07b1340c67beb5c0333320b1f
SimHash e2c75ec088a3

Groups

*

Rule Path
Disallow /wp-admin/
Allow /wp-admin/admin-ajax.php

Other Records

Field Value
crawl-delay 30
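
The two rules in the `*` group overlap: the Allow path sits inside the Disallow path. A minimal sketch of how a crawler might resolve that, assuming longest-match precedence (the most specific matching path wins, and Allow wins a tie). The names `STAR_GROUP_RULES` and `is_allowed` are illustrative and not part of the scan data, and the sketch ignores the `*` and `$` wildcards as well as the crawl-delay record.

```python
# Rules of the "*" group as listed above: (directive, path prefix).
STAR_GROUP_RULES = [
    ("disallow", "/wp-admin/"),
    ("allow", "/wp-admin/admin-ajax.php"),
]


def is_allowed(path, rules):
    """Return True if `path` may be fetched, assuming longest-match precedence."""
    # With no matching rule at all, the path is treated as allowed.
    best_directive, best_length = "allow", -1
    for directive, prefix in rules:
        if not path.startswith(prefix):
            continue
        longer = len(prefix) > best_length
        tie_prefers_allow = len(prefix) == best_length and directive == "allow"
        if longer or tie_prefers_allow:
            best_directive, best_length = directive, len(prefix)
    return best_directive == "allow"


if __name__ == "__main__":
    for candidate in ("/wp-admin/admin-ajax.php", "/wp-admin/options.php", "/"):
        print(candidate, is_allowed(candidate, STAR_GROUP_RULES))
```

Parsers that apply rules in file order instead (first match wins) would reach a different answer for `/wp-admin/admin-ajax.php`, because the Disallow line comes first here.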

yandex

Rule Path
Disallow /

moget

Rule Path
Disallow /

ichiro

Rule Path
Disallow /

naverbot

Rule Path
Disallow /

yeti

Rule Path
Disallow /

baiduspider

Rule Path
Disallow /

baiduspider-video

Rule Path
Disallow /

baiduspider-image

Rule Path
Disallow /

baiduspider+

Rule Path
Disallow /

baiduspider+(+http://www.baidu.com/search/spider.htm)

Rule Path
Disallow /

baiduspider/2.0;+http://www.baidu.com/search/spider.html

Rule Path
Disallow /

baiduspider/2.0

Rule Path
Disallow /

mozilla/5.0(compatible; baiduspider/2.0; +http://www.baidu.com/search/spider.html)

Rule Path
Disallow /

sogou spider

Rule Path
Disallow /

sogou web spider

Rule Path
Disallow /

youdaobot

Rule Path
Disallow /

sosospider

Rule Path
Disallow /

sosospider/2.0

Rule Path
Disallow /

sosospider+

Rule Path
Disallow /

yisouspider

Rule Path
Disallow /

Other Records

Field Value
sitemap https://twq.elliott.gwu.edu/wp-sitemap.xml
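
The groups above can be read as a lookup from crawler name to rule set, with `*` as the fallback. A rough sketch of that lookup, assuming a case-insensitive exact match on the crawler's product token; real crawlers may match differently. Only a few of the listed groups are included, and `GROUPS`, `select_group`, and the `ExampleBot` token are hypothetical names used for illustration.

```python
# A subset of the groups listed above, keyed by lowercase user-agent name.
GROUPS = {
    "*": [("disallow", "/wp-admin/"), ("allow", "/wp-admin/admin-ajax.php")],
    "yandex": [("disallow", "/")],
    "baiduspider": [("disallow", "/")],
    "yisouspider": [("disallow", "/")],
}


def select_group(product_token, groups):
    """Return the name of the group that applies to `product_token`.

    Assumption: exact, case-insensitive match on the token, falling back
    to the "*" group when no named group matches.
    """
    key = product_token.strip().lower()
    return key if key in groups else "*"


if __name__ == "__main__":
    for token in ("Yandex", "Baiduspider", "ExampleBot"):
        name = select_group(token, GROUPS)
        print(token, "->", name, GROUPS[name])
```

Under this reading, the long entries such as full user-agent strings would only match a crawler that presents exactly that string as its token, which may explain why the file lists so many baiduspider variants.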

Warnings

  • 6 invalid lines.