webagre.com
robots.txt

Robots Exclusion Standard data for webagre.com

Resource Scan

Scan Details

Site Domain webagre.com
Base Domain webagre.com
Scan Status Ok
Last Scan 2024-11-06T17:44:06+00:00
Next Scan 2024-11-20T17:44:06+00:00

Last Scan

Scanned 2024-11-06T17:44:06+00:00
URL https://webagre.com/robots.txt
Domain IPs 219.111.240.121
Response IP 219.111.240.121
Found Yes
Hash 25e9f9a93662c77ca9fc28064295c4aa1a9eb990c61698880c4cb980e9a3089c
SimHash c11cd978e793

Groups

dotbot
ahrefsbot
mappy
semrushbot
baiduspider
baiduimagespider
baiduspider

Rule Path
Disallow /

bingbot
adidxbot

Rule Path
Allow /
Disallow /job/login
Disallow /career/login
Disallow /job/apply/
Disallow /career/apply/
Disallow /job/preview/
Disallow /career/preview/

Other Records

Field Value
crawl-delay 10

*

Rule Path
Allow /
Disallow /job/login
Disallow /career/login
Disallow /job/preview/
Disallow /job/apply/
Disallow /career/apply/
Disallow /career/preview/

Other Records

Field Value
crawl-delay 5

Other Records

Field Value
sitemap https://webagre.com/sitemap.xml
sitemap https://webagre.com/storage/sitemap_job_list.xml
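
To illustrate how the groups above might be applied to a request, here is a minimal Python sketch. It assumes longest-match precedence with Allow winning ties, as described in RFC 9309; individual crawlers may evaluate rules differently. The names GROUPS, rules_for, crawl_delay, and is_allowed are hypothetical helpers written for this example, and user agents are compared by exact lowercase name, which is a simplification of real product-token matching. The rule paths and crawl-delay values are transcribed from the report.

```python
# Sketch only: rule data is copied from the scan report above; the
# evaluation logic is an assumed longest-match scheme (RFC 9309 style).

SHARED_RULES = [
    ("allow", "/"),
    ("disallow", "/job/login"),
    ("disallow", "/career/login"),
    ("disallow", "/job/apply/"),
    ("disallow", "/career/apply/"),
    ("disallow", "/job/preview/"),
    ("disallow", "/career/preview/"),
]

# (user agents, rules, crawl-delay in seconds or None)
GROUPS = [
    ({"dotbot", "ahrefsbot", "mappy", "semrushbot",
      "baiduspider", "baiduimagespider"}, [("disallow", "/")], None),
    ({"bingbot", "adidxbot"}, SHARED_RULES, 10),
    ({"*"}, SHARED_RULES, 5),
]


def _group_for(agent):
    """Return the group naming this agent, else the '*' group."""
    agent = agent.lower()
    for group in GROUPS:
        if agent in group[0]:
            return group
    return GROUPS[-1]


def rules_for(agent):
    return _group_for(agent)[1]


def crawl_delay(agent):
    return _group_for(agent)[2]


def is_allowed(agent, path):
    """Longest matching prefix decides; Allow wins a tie in length."""
    best_len, allowed = -1, True  # no matching rule means allowed
    for kind, prefix in rules_for(agent):
        if path.startswith(prefix):
            n = len(prefix)
            if n > best_len or (n == best_len and kind == "allow"):
                best_len, allowed = n, (kind == "allow")
    return allowed


if __name__ == "__main__":
    for agent, path in [("bingbot", "/job/login"),
                        ("bingbot", "/job/list"),
                        ("AhrefsBot", "/"),
                        ("googlebot", "/career/apply/123")]:
        print(agent, path, is_allowed(agent, path))
```

Under these assumptions, a crawler that is not named in any group falls back to the * group, so /job/login stays blocked for it while the rest of the site remains crawlable. A parser that stops at the first matching rule instead would see "Allow /" first and could reach a different result, so the outcome depends on the evaluation scheme a crawler uses.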