centurylink.net
robots.txt

Robots Exclusion Standard data for centurylink.net

Resource Scan

Scan Details

Site Domain centurylink.net
Base Domain centurylink.net
Scan Status Ok
Last Scan 2025-12-19T07:27:12+00:00
Next Scan 2025-12-26T07:27:12+00:00

Last Scan

Scanned 2025-12-19T07:27:12+00:00
URL https://centurylink.net/robots.txt
Domain IPs 129.159.71.219
Response IP 129.159.71.219
Found Yes
Hash 747ba5e62e3e943b7b991cd04184402a14ab61f27d409236abad1434f7735d0b
SimHash d4157b52c481

Groups

*

Rule Path
Disallow /google/
Disallow /search/
Disallow /provisioning/
Disallow /library/
Disallow /files/
Disallow /*?*u_d=
Disallow /*?*email=
Disallow /*?*e-mail=
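
The last three rules in this group use the `*` wildcard, so they apply to any URL in which `u_d=`, `email=`, or `e-mail=` appears somewhere after the `?`. The following is a minimal sketch of how such patterns could be matched, assuming the wildcard semantics described in RFC 9309 (`*` matches any sequence of characters, a trailing `$` anchors the end). The function names are hypothetical, and the sketch ignores Allow rules and longest-match precedence because this group lists only Disallow rules.

```python
import re

# Disallow paths copied from the "*" group above.
WILDCARD_GROUP_DISALLOWS = [
    "/google/", "/search/", "/provisioning/", "/library/", "/files/",
    "/*?*u_d=", "/*?*email=", "/*?*e-mail=",
]

def robots_pattern_to_regex(pattern: str) -> re.Pattern:
    """Translate a robots.txt path pattern into a compiled regex (hypothetical helper)."""
    # Assumption: a trailing '$' anchors the end of the URL path.
    anchored = pattern.endswith("$")
    if anchored:
        pattern = pattern[:-1]
    # Escape the literal pieces and join them with '.*' where '*' appeared.
    body = ".*".join(re.escape(part) for part in pattern.split("*"))
    return re.compile(body + ("$" if anchored else ""))

def is_disallowed(path: str, patterns: list[str]) -> bool:
    """Return True if any pattern matches from the start of the path."""
    return any(robots_pattern_to_regex(p).match(path) for p in patterns)
```

For example, `is_disallowed("/account?email=a@b.c", WILDCARD_GROUP_DISALLOWS)` is true because of the `/*?*email=` rule, while `/about` matches none of the patterns.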

admantx
alphabot
anthropic-ai
awariorssbot
awariosmartbot
blexbot
buzzbot
bytespider
ccbot
chatgpt-user
claritybot
claude-web
claudebot
cohere-ai
diffbot
facebookbot
friendlycrawler
google-extended
gptbot
huggingface
imagesiftbot
img2dataset
magpie-crawler
meltwater
neevabot
news-please
newsnow
nutch
omgili
omgilibot
panscient.com
perplexity-ai
perplexitybot
petalbot
piplbot
scoop.it
scrapy
seekr
sentibot
seznambot
turnitinbot
youbot
zumbot

Rule Path
Disallow /
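
The second group blocks every listed crawler from the whole site, while unlisted crawlers fall back to the `*` group. A rough sketch of checking this with Python's standard `urllib.robotparser` follows. The robots.txt text below is a reconstruction from the groups listed here, not the file as served: it includes only three of the listed user agents and one plain-path rule from the `*` group, and `ExampleBot` is a hypothetical unlisted crawler. Wildcard rules are left out because `urllib.robotparser` matches paths by simple prefix.

```python
from urllib.robotparser import RobotFileParser

# Assumption: abbreviated reconstruction of the two groups shown above.
RECONSTRUCTED_ROBOTS_TXT = """\
User-agent: *
Disallow: /search/

User-agent: gptbot
User-agent: claudebot
User-agent: ccbot
Disallow: /
"""

parser = RobotFileParser()
# parse() takes an iterable of lines, so no network request is made.
parser.parse(RECONSTRUCTED_ROBOTS_TXT.splitlines())

# A listed crawler is blocked everywhere.
listed_allowed = parser.can_fetch("GPTBot", "https://centurylink.net/")

# An unlisted crawler is governed by the "*" group only.
unlisted_home = parser.can_fetch("ExampleBot", "https://centurylink.net/about")
unlisted_search = parser.can_fetch("ExampleBot", "https://centurylink.net/search/q")
```

Under these assumptions, `listed_allowed` and `unlisted_search` are false and `unlisted_home` is true.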