Robots Exclusion Standard data for centurylink.net
Resource Scan
Scan Details
| Field | Value |
|---|---|
| Site Domain | centurylink.net |
| Base Domain | centurylink.net |
| Scan Status | Ok |
| Last Scan | 2025-12-19T07:27:12+00:00 |
| Next Scan | 2025-12-26T07:27:12+00:00 |
Last Scan
| Field | Value |
|---|---|
| Scanned | 2025-12-19T07:27:12+00:00 |
| URL | https://centurylink.net/robots.txt |
| Domain IPs | 129.159.71.219 |
| Response IP | 129.159.71.219 |
| Found | Yes |
| Hash | 747ba5e62e3e943b7b991cd04184402a14ab61f27d409236abad1434f7735d0b |
| SimHash | d4157b52c481 |
Groups
*
| Rule | Path |
|---|---|
| Disallow | /google/ |
| Disallow | /search/ |
| Disallow | /provisioning/ |
| Disallow | /library/ |
| Disallow | /files/ |
| Disallow | /*?*u_d= |
| Disallow | /*?*email= |
| Disallow | /*?*e-mail= |
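The last three rules in this group use the `*` wildcard, so they depend on pattern matching rather than a plain path prefix. The following is a minimal sketch of how a crawler might evaluate the rules listed above, assuming the common convention that `*` matches any run of characters and a trailing `$` anchors the end of the URL. The names `pattern_to_regex` and `is_disallowed` are hypothetical helpers, not part of any library, and the sketch ignores `Allow` precedence and percent-encoding normalization.

```python
import re

# Disallow paths copied from the "*" group table above.
DISALLOW = [
    "/google/",
    "/search/",
    "/provisioning/",
    "/library/",
    "/files/",
    "/*?*u_d=",
    "/*?*email=",
    "/*?*e-mail=",
]


def pattern_to_regex(pattern: str) -> "re.Pattern[str]":
    """Translate a robots.txt path pattern into a regex anchored at the start.

    Assumption: '*' matches any sequence of characters and a trailing '$'
    anchors the end of the URL; every other character is matched literally.
    """
    anchored_end = pattern.endswith("$")
    if anchored_end:
        pattern = pattern[:-1]
    # Escape the literal pieces and join them with '.*' where '*' appeared.
    body = ".*".join(re.escape(part) for part in pattern.split("*"))
    return re.compile("^" + body + ("$" if anchored_end else ""))


def is_disallowed(path_and_query: str, patterns=DISALLOW) -> bool:
    """Return True if any Disallow pattern matches the path plus query string."""
    return any(pattern_to_regex(p).match(path_and_query) for p in patterns)


if __name__ == "__main__":
    for candidate in ["/search/results", "/news/story?id=1&email=a@b.c", "/news/story"]:
        print(candidate, is_disallowed(candidate))
```

Under these assumptions, a rule such as `/*?*email=` blocks any URL whose query string contains `email=`, regardless of the path that precedes the `?`.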
admantx
alphabot
anthropic-ai
awariorssbot
awariosmartbot
blexbot
buzzbot
bytespider
ccbot
chatgpt-user
claritybot
claude-web
claudebot
cohere-ai
diffbot
facebookbot
friendlycrawler
google-extended
gptbot
huggingface
imagesiftbot
img2dataset
magpie-crawler
meltwater
neevabot
news-please
newsnow
nutch
omgili
omgilibot
panscient.com
perplexity-ai
perplexitybot
petalbot
piplbot
scoop.it
scrapy
seekr
sentibot
seznambot
turnitinbot
youbot
zumbot
| Rule | Path |
|---|---|
| Disallow | / |
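The two groups can be checked with Python's standard `urllib.robotparser`. The sketch below is an illustration only: the robots.txt text is a reconstruction from the tables above, not the verbatim file, it includes just a few of the listed user agents, and it keeps only a plain prefix rule for the `*` group because wildcard handling varies between parsers. `ExampleAgent` is a hypothetical crawler name.

```python
from urllib.robotparser import RobotFileParser

# Reconstructed excerpt (assumption: not the verbatim file served by the site).
ROBOTS_EXCERPT = """\
User-agent: *
Disallow: /search/

User-agent: gptbot
User-agent: claudebot
User-agent: ccbot
Disallow: /
"""

parser = RobotFileParser()
parser.parse(ROBOTS_EXCERPT.splitlines())

# A named agent from the second group is blocked from the whole site.
gptbot_home = parser.can_fetch("GPTBot", "https://centurylink.net/")

# Any other agent falls back to the "*" group.
other_home = parser.can_fetch("ExampleAgent", "https://centurylink.net/index.html")
other_search = parser.can_fetch("ExampleAgent", "https://centurylink.net/search/q")

print(gptbot_home, other_home, other_search)
```

Agents named in the second group match that group instead of `*`, so the single `Disallow: /` rule is the only one that applies to them.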