thenextweb.com
robots.txt

Robots Exclusion Standard data for thenextweb.com

Resource Scan

Scan Details

Site Domain thenextweb.com
Base Domain thenextweb.com
Scan Status Ok
Last Scan 2025-07-05T14:39:57+00:00
Next Scan 2025-07-12T14:39:57+00:00

Last Scan

Scanned 2025-07-05T14:39:57+00:00
URL https://thenextweb.com/robots.txt
Domain IPs 151.101.130.46, 151.101.194.46, 151.101.2.46, 151.101.66.46, 2a04:4e42:200::558, 2a04:4e42:400::558, 2a04:4e42:600::558, 2a04:4e42::558
Response IP 151.101.66.46
Found Yes
Hash 78739701e56ead562375449c50b71a916d056f09b65cadad89a304ab1de6dce0
SimHash 80a91d506491
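
The Hash value above is 64 hexadecimal characters long, which is consistent with a SHA-256 digest. The scan does not state which algorithm or input was used, so the following is only a minimal sketch, assuming the digest is SHA-256 over the raw response body. The function names `sha256_hex` and `has_changed` are hypothetical and exist only for illustration.

```python
import hashlib


def sha256_hex(body: bytes) -> str:
    """Return the SHA-256 digest of a robots.txt body as lowercase hex."""
    return hashlib.sha256(body).hexdigest()


def has_changed(body: bytes, previous_hex: str) -> bool:
    """Compare a freshly fetched body against a previously recorded digest.

    Assumption: the recorded digest was computed over the same raw bytes.
    """
    return sha256_hex(body) != previous_hex.lower()
```

A digest comparison like this only detects that the file changed at all. A similarity hash such as the SimHash field is the kind of value typically used to judge how much it changed.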

Groups

*

Rule Path
Disallow /admin/*
Disallow /cpresources/
Disallow /vendor/
Disallow /.env
Disallow /docs/3.x/fr/
Disallow /index.php/actions/
Disallow /cdn-cgi/

googlebot-news

Rule Path
Disallow /shareables/
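
The two groups above can be evaluated with a small matcher. This is a sketch, not the scanner's own logic. It assumes the wildcard semantics of RFC 9309 (`*` matches any run of characters, a trailing `$` anchors the end) and that a crawler named in its own group uses that group instead of the `*` group. `GROUPS`, `pattern_to_regex`, `select_group`, and `is_allowed` are hypothetical names. Because the listing contains only Disallow rules, longest-match precedence between Allow and Disallow is omitted.

```python
import re

# Groups as listed in the scan above (user agent -> Disallow paths).
GROUPS = {
    "*": [
        "/admin/*", "/cpresources/", "/vendor/", "/.env",
        "/docs/3.x/fr/", "/index.php/actions/", "/cdn-cgi/",
    ],
    "googlebot-news": ["/shareables/"],
}


def pattern_to_regex(pattern):
    """Translate a robots.txt path pattern into a compiled regex.

    '*' matches any run of characters; a trailing '$' anchors the end.
    Without '$' the pattern is a prefix match.
    """
    anchored = pattern.endswith("$")
    if anchored:
        pattern = pattern[:-1]
    body = ".*".join(re.escape(part) for part in pattern.split("*"))
    return re.compile(body + ("$" if anchored else ""))


def select_group(user_agent):
    """Pick the group whose name appears in the agent string, else '*'."""
    ua = user_agent.lower()
    for name, rules in GROUPS.items():
        if name != "*" and name in ua:
            return rules
    return GROUPS["*"]


def is_allowed(user_agent, path):
    """True if no Disallow pattern in the selected group matches the path."""
    rules = select_group(user_agent)
    return not any(pattern_to_regex(rule).match(path) for rule in rules)
```

Under these assumptions, `is_allowed("examplebot", "/admin/login")` is false, while `is_allowed("Googlebot-News", "/admin/login")` is true because that crawler is governed only by its own group, which disallows `/shareables/` alone. Python's standard `urllib.robotparser` treats paths as literal prefixes, so it would not expand the `*` in `/admin/*`.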

Other Records

Field Value
sitemap https://thenextweb.com/sitemap-googlenews.xml
sitemap https://thenextweb.com/sitemap-tnwConf.xml
sitemap https://thenextweb.com/sitemap-x.xml
sitemap https://thenextweb.com/sitemap-articles-index.xml
sitemap https://thenextweb.com/sitemap-spaces.xml
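
The sitemap records can be read back with Python's standard `urllib.robotparser`, whose `site_maps()` method is available from Python 3.8. The sketch below rebuilds the `Sitemap:` lines from the values listed above rather than from the live file, so `SITEMAP_LINES` is a reconstruction and not the exact file contents.

```python
from urllib.robotparser import RobotFileParser

# Sitemap lines reconstructed from the "Other Records" table above.
SITEMAP_LINES = [
    "Sitemap: https://thenextweb.com/sitemap-googlenews.xml",
    "Sitemap: https://thenextweb.com/sitemap-tnwConf.xml",
    "Sitemap: https://thenextweb.com/sitemap-x.xml",
    "Sitemap: https://thenextweb.com/sitemap-articles-index.xml",
    "Sitemap: https://thenextweb.com/sitemap-spaces.xml",
]

parser = RobotFileParser()
parser.parse(SITEMAP_LINES)

# site_maps() returns None when no Sitemap lines are present,
# so fall back to an empty list.
sitemaps = parser.site_maps() or []

for url in sitemaps:
    print(url)
```

Sitemap records are independent of the user-agent groups, which is why they are listed separately from the Disallow rules.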