weather.com
robots.txt

Robots Exclusion Standard data for weather.com

Resource Scan

Scan Details

Site Domain weather.com
Base Domain weather.com
Scan Status Ok
Last Scan 2024-10-28T20:07:01+00:00
Next Scan 2024-11-04T20:07:01+00:00

Last Scan

Scanned 2024-10-28T20:07:01+00:00
URL https://weather.com/robots.txt
Domain IPs 173.222.146.176, 2600:1413:b000:887::2e03, 2600:1413:b000:892::2e03
Response IP 173.222.146.176
Found Yes
Hash 4cdc38c9293c9ecf0e61f4bbe1c6b6eaf95d4d129eada75730b3d6463466d96e
SimHash 55c85840e6d8

Groups

*
criteobot/0.1

Rule Path
Disallow
Disallow /includes/
Disallow /life/
Disallow /misc/
Disallow /modules/
Disallow /profiles/
Disallow /scripts/
Disallow /themes/
Disallow /appspromo
Disallow /CHANGELOG.txt
Disallow /cron.php
Disallow /INSTALL.mysql.txt
Disallow /INSTALL.pgsql.txt
Disallow /INSTALL.sqlite.txt
Disallow /install.php
Disallow /INSTALL.txt
Disallow /LICENSE.txt
Disallow /MAINTAINERS.txt
Disallow /update.php
Disallow /UPGRADE.txt
Disallow /xmlrpc.php
Disallow /migration/
Disallow /admin/
Disallow /comment/reply/
Disallow /filter/tips/
Disallow /node/add/
Disallow /search/
Disallow /user/register/
Disallow /user/password/
Disallow /user/login/
Disallow /user/logout/
Disallow /g00/
Disallow /g01/
Disallow /g02/
Disallow /g03/
Disallow /g04/
Disallow /g05/
Disallow /g06/
Disallow /g07/
Disallow /g08/
Disallow /g09/
Disallow /g10/
Disallow /g11/
Disallow /g12/
Disallow /g13/
Disallow /g14/
Disallow /g15/
Disallow /g16/
Disallow /g17/
Disallow /g18/
Disallow /g19/
Disallow /g20/
Disallow /g21/
Disallow /g22/
Disallow /g23/
Disallow /g24/
Disallow /g25/
Disallow /g26/
Disallow /g27/
Disallow /g28/
Disallow /g29/
Disallow /g30/
Disallow /g31/
Disallow /g32/
Disallow /g33/
Disallow /g34/
Disallow /g35/
Disallow /g36/
Disallow /g37/
Disallow /g38/
Disallow /g39/
Disallow /g40/
Disallow /g41/
Disallow /g42/
Disallow /g43/
Disallow /g44/
Disallow /g45/
Disallow /g46/
Disallow /g47/
Disallow /g48/
Disallow /g49/
Disallow /g50/
Disallow /g51/
Disallow /g52/
Disallow /g53/
Disallow /g54/
Disallow /g55/
Disallow /g56/
Disallow /g57/
Disallow /g58/
Disallow /g59/
Disallow /g60/
Disallow /g61/
Disallow /g62/
Disallow /g63/
Disallow /g64/
Disallow /g65/
Disallow /g66/
Disallow /g67/
Disallow /g68/
Disallow /g69/
Disallow /g70/
Disallow /g71/
Disallow /g72/
Disallow /g73/
Disallow /g74/
Disallow /g75/
Disallow /g76/
Disallow /g77/
Disallow /g78/
Disallow /g79/
Disallow /g80/
Disallow /g81/
Disallow /g82/
Disallow /g83/
Disallow /g84/
Disallow /g85/
Disallow /g86/
Disallow /g87/
Disallow /g88/
Disallow /g89/
Disallow /g90/
Disallow /g91/
Disallow /g92/
Disallow /g93/
Disallow /g94/
Disallow /g95/
Disallow /g96/
Disallow /g97/
Disallow /g98/
Disallow /g99/
Disallow /?q=admin%2F
Disallow /?q=comment%2Freply%2F
Disallow /?q=filter%2Ftips%2F
Disallow /?q=node%2Fadd%2F
Disallow /?q=search%2F
Disallow /?q=user%2Fpassword%2F
Disallow /?q=user%2Fregister%2F
Disallow /?q=user%2Flogin%2F
Disallow /?q=user%2Flogout%2F
Disallow /sponsored
Disallow /ugc
Disallow /sponsored-content

Other Records

Field Value
crawl-delay 0.02
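
As a rough illustration of how the group rules above might be evaluated, the sketch below feeds a short excerpt to Python's standard-library urllib.robotparser. The excerpt is rebuilt from this scan for illustration only and is not the full file; the helper name is_allowed and the agent name examplebot are hypothetical. This parser applies the first rule whose path is a prefix of the request path, so other crawlers may interpret the same rules differently.

```python
from urllib.robotparser import RobotFileParser

# Short excerpt rebuilt from the scan above (assumption: the real file
# uses the usual "Field: value" syntax and contains many more rules).
ROBOTS_EXCERPT = """\
User-agent: *
User-agent: criteobot/0.1
Disallow: /admin/
Disallow: /search/
Disallow: /sponsored

User-agent: gptbot
Disallow: /
"""

parser = RobotFileParser()
parser.parse(ROBOTS_EXCERPT.splitlines())


def is_allowed(agent: str, path: str) -> bool:
    """Return True if `agent` may fetch `path` under the excerpt's rules."""
    return parser.can_fetch(agent, "https://weather.com" + path)


# "examplebot" is a made-up agent that falls through to the "*" group.
print(is_allowed("examplebot", "/admin/users"))    # blocked by /admin/
print(is_allowed("examplebot", "/weather/today"))  # no matching rule
print(is_allowed("gptbot", "/weather/today"))      # blocked by "Disallow: /"
```

Because matching is by prefix, the rule for /sponsored in this sketch also covers /sponsored-content.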

amazonbot

Rule Path
Disallow /

anthropic-ai

Rule Path
Disallow /

applebot-extended

Rule Path
Disallow /

awariorssbot
awariosmartbot

Rule Path
Disallow /

bytespider

Rule Path
Disallow /

ccbot

Rule Path
Disallow /

chatgpt-user

Rule Path
Disallow /

claudebot

Rule Path
Disallow /

claude-web

Rule Path
Disallow /

cohere-ai

Rule Path
Disallow /

dataforseobot

Rule Path
Disallow /

diffbot

Rule Path
Disallow /

facebookbot

Rule Path
Disallow /

friendlycrawler

Rule Path
Disallow /

google-extended

Rule Path
Disallow /

gptbot

Rule Path
Disallow /

imagesiftbot

Rule Path
Disallow /

magpie-crawler

Rule Path
Disallow /

meta-externalagent

Rule Path
Disallow /

newsnow

Rule Path
Disallow /

news-please

Rule Path
Disallow /

oai-searchbot

Rule Path
Disallow /

omgili

Rule Path
Disallow /

omgilibot

Rule Path
Disallow /

peer39_crawler
peer39_crawler/1.0

Rule Path
Disallow /

perplexitybot

Rule Path
Disallow /

quora-bot

Rule Path
Disallow /

scrapy

Rule Path
Disallow /

turnitinbot

Rule Path
Disallow /

Other Records

Field Value
sitemap https://weather.com/en-US/sitemaps/sitemap.xml
sitemap https://weather.com/pt-PT/sitemaps/sitemap.xml
sitemap https://weather.com/de-DE/sitemaps/sitemap.xml
sitemap https://weather.com/fr-FR/sitemaps/sitemap.xml
sitemap https://weather.com/es-US/sitemaps/sitemap.xml
sitemap https://weather.com/es-ES/sitemaps/sitemap.xml
sitemap https://weather.com/en-IN/sitemaps/sitemap.xml
sitemap https://weather.com/en-GB/sitemaps/sitemap.xml
sitemap https://weather.com/en-CA/sitemaps/sitemap.xml
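
The sitemap records above can be read back with the same standard-library parser. The sketch below is a minimal example assuming Python 3.8 or later (for site_maps()) and the usual "Sitemap: URL" line syntax; it uses only two of the listed URLs, and the helper name locale_of is hypothetical.

```python
from urllib.parse import urlparse
from urllib.robotparser import RobotFileParser

# Two of the sitemap records from the scan, written in robots.txt syntax.
SITEMAP_LINES = [
    "Sitemap: https://weather.com/en-US/sitemaps/sitemap.xml",
    "Sitemap: https://weather.com/de-DE/sitemaps/sitemap.xml",
]

parser = RobotFileParser()
parser.parse(SITEMAP_LINES)

# site_maps() returns None when no sitemap records were found.
sitemaps = parser.site_maps() or []


def locale_of(url: str) -> str:
    """Return the first path segment, e.g. 'en-US' for the URLs above."""
    return urlparse(url).path.split("/")[1]


# Index the sitemap URLs by their locale segment.
by_locale = {locale_of(url): url for url in sitemaps}
print(sorted(by_locale))
```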

Comments

  • /robots.txt
  • Last updated by arjun.lather 08/29/2023
  • Disallowed for PhantomJS
  • Crawl-delay: 10
  • Directories
  • Files
  • Paths (clean URLs)
  • Paths (no clean URLs)
  • Block Bots
  • Sitemaps