Robots Exclusion Standard data for twitter.com

Resource Scan

Scan Details

Site Domain twitter.com
Base Domain twitter.com
Scan Status Ok
Last Scan 2024-11-05T22:10:04+00:00
Next Scan 2024-11-12T22:10:04+00:00

Last Scan

Scanned 2024-11-05T22:10:04+00:00
URL https://twitter.com/robots.txt
Domain IPs 104.244.42.129
Response IP 104.244.42.129
Found Yes
Hash da5b6efc5b11e34574d4359aa27114d473ead4cd527f73c8a2ebfd74b47189c2
SimHash 223efa19c4f5
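The Hash field can serve as a change detector between scans. Its 64 hex digits match the length of a SHA-256 digest, but the scan does not name the algorithm or say whether the body is normalized first, so the sketch below assumes SHA-256 over the raw response bytes. The `robots_changed` helper is hypothetical, and the SimHash (a similarity fingerprint) is not covered.

```python
import hashlib

# Value copied from the "Hash" field of the last scan.
LAST_SCAN_HASH = "da5b6efc5b11e34574d4359aa27114d473ead4cd527f73c8a2ebfd74b47189c2"


def robots_changed(body: bytes, previous_hex: str = LAST_SCAN_HASH) -> bool:
    """Report whether a fetched robots.txt body differs from a stored digest.

    ASSUMPTION: the scanner hashes the raw response body with SHA-256.
    """
    return hashlib.sha256(body).hexdigest() != previous_hex.lower()
```

A crawler would pass the bytes of a fresh fetch of https://twitter.com/robots.txt; a True result means the rule groups below may be stale.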

Groups

googlebot

Rule Path
Allow /*?lang=
Allow /hashtag/*?src=
Allow /search?q=%23
Allow /i/api/
Disallow /search/realtime
Disallow /search/users
Disallow /search/*/grid
Disallow /*?
Disallow /*/followers
Disallow /*/following
Disallow /account/deactivated
Disallow /settings/deactivated
Disallow /%5B_0-9a-zA-Z%5D%2B/status/%5B0-9%5D%2B/likes
Disallow /%5B_0-9a-zA-Z%5D%2B/status/%5B0-9%5D%2B/retweets
Disallow /%5B_0-9a-zA-Z%5D%2B/likes
Disallow /%5B_0-9a-zA-Z%5D%2B/media
Disallow /%5B_0-9a-zA-Z%5D%2B/photo
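The Googlebot rules overlap: `/search?q=%23` is allowed while `/*?` disallows every URL with a query string. Under RFC 9309 the longest matching pattern wins, and an Allow beats a Disallow of equal length. The sketch below applies that precedence to the rules above. The names `GOOGLEBOT_RULES`, `pattern_to_regex` and `is_allowed` are hypothetical; it compares paths as written (skipping the percent-encoding normalization RFC 9309 describes) and treats only `*` and a trailing `$` as special. One consequence is that the percent-encoded `[_0-9a-zA-Z]+` rules are matched literally, so they do not block a path like `/example_user/likes`.

```python
import re

# The googlebot group from the scan, as (directive, pattern) pairs.
GOOGLEBOT_RULES = [
    ("allow", "/*?lang="),
    ("allow", "/hashtag/*?src="),
    ("allow", "/search?q=%23"),
    ("allow", "/i/api/"),
    ("disallow", "/search/realtime"),
    ("disallow", "/search/users"),
    ("disallow", "/search/*/grid"),
    ("disallow", "/*?"),
    ("disallow", "/*/followers"),
    ("disallow", "/*/following"),
    ("disallow", "/account/deactivated"),
    ("disallow", "/settings/deactivated"),
    ("disallow", "/%5B_0-9a-zA-Z%5D%2B/status/%5B0-9%5D%2B/likes"),
    ("disallow", "/%5B_0-9a-zA-Z%5D%2B/status/%5B0-9%5D%2B/retweets"),
    ("disallow", "/%5B_0-9a-zA-Z%5D%2B/likes"),
    ("disallow", "/%5B_0-9a-zA-Z%5D%2B/media"),
    ("disallow", "/%5B_0-9a-zA-Z%5D%2B/photo"),
]


def pattern_to_regex(pattern):
    """Compile a robots.txt path pattern.

    Only '*' (any run of characters) and a trailing '$' (end of path) are
    special; every other character, including '[', ']' and '+', is literal.
    """
    anchored = pattern.endswith("$")
    if anchored:
        pattern = pattern[:-1]
    body = ".*".join(re.escape(part) for part in pattern.split("*"))
    return re.compile(body + ("$" if anchored else ""))


def is_allowed(rules, path):
    """Apply the longest matching rule; Allow wins a tie; no match means allowed."""
    best_len, allowed = -1, True
    for directive, pattern in rules:
        if pattern_to_regex(pattern).match(path):  # match() anchors at the start
            if len(pattern) > best_len or (len(pattern) == best_len and directive == "allow"):
                best_len, allowed = len(pattern), directive == "allow"
    return allowed


print(is_allowed(GOOGLEBOT_RULES, "/search?q=%23python"))  # True: Allow /search?q=%23 outranks Disallow /*?
print(is_allowed(GOOGLEBOT_RULES, "/search?q=python"))     # False: only Disallow /*? matches
print(is_allowed(GOOGLEBOT_RULES, "/example_user/likes"))  # True: the encoded [...]+ rules match literally
```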

google-extended

Rule Path
Disallow *

facebookbot

Rule Path
Disallow *

facebookexternalhit

Rule Path
Disallow *

discordbot

Rule Path
Disallow *

bingbot

Rule Path
Disallow *

*

Rule Path
Disallow /
Disallow /i/u
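Only one group applies to a given crawler. It is chosen by matching the crawler's product token case-insensitively against the user-agent lines, and the `*` group is used only when no named group matches. So Googlebot follows its own group rather than the `*` group's `Disallow /`, while a crawler with no named group (DuckDuckBot, for example) is shut out of the whole site. A minimal sketch with a hypothetical `select_group` helper; the googlebot entry is abbreviated.

```python
# Rule groups from the scan, keyed by lower-cased user-agent token.
# The googlebot entry is cut down to two of its rules for brevity.
GROUPS = {
    "googlebot": [("allow", "/*?lang="), ("disallow", "/*?")],
    "google-extended": [("disallow", "*")],
    "facebookbot": [("disallow", "*")],
    "facebookexternalhit": [("disallow", "*")],
    "discordbot": [("disallow", "*")],
    "bingbot": [("disallow", "*")],
    "*": [("disallow", "/"), ("disallow", "/i/u")],
}


def select_group(groups, product_token):
    """Return the rules for a crawler's product token.

    Matching is case-insensitive. A named group replaces the '*' group
    entirely rather than being merged with it.
    """
    return groups.get(product_token.lower(), groups["*"])


print(select_group(GROUPS, "Bingbot"))      # [('disallow', '*')]
print(select_group(GROUPS, "DuckDuckBot"))  # [('disallow', '/'), ('disallow', '/i/u')]
```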

Other Records

Field Value
crawl-delay 1
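The crawl-delay record asks for one second between successive requests (see the ONBOARD-2698 comment below). A crawler can honor it by never starting a request before the previous start plus the delay. The sketch below is a hypothetical scheduler; it takes the current time as an argument (for instance from `time.monotonic()`) so it can be checked without sleeping. Support varies by crawler: Google documents that Googlebot ignores crawl-delay.

```python
from typing import Optional


class CrawlDelayScheduler:
    """Space successive requests to one host at least `delay` seconds apart."""

    def __init__(self, delay: float):
        self.delay = delay
        self.next_ok: Optional[float] = None  # earliest start time for the next request

    def schedule(self, now: float) -> float:
        """Return the time at which a request wanted at `now` should be sent."""
        start = now if self.next_ok is None else max(now, self.next_ok)
        self.next_ok = start + self.delay
        return start


scheduler = CrawlDelayScheduler(delay=1.0)  # crawl-delay 1, as in the record above
print([scheduler.schedule(t) for t in (0.0, 0.2, 0.3, 5.0)])  # [0.0, 1.0, 2.0, 5.0]
```

Requests that arrive in a burst are pushed one second apart, while a request after a long idle gap goes out immediately.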

Other Records

Field Value
sitemap https://twitter.com/sitemap.xml

Comments

  • Google Search Engine Robot
  • Every bot that might possibly read and respect this file
  • WHAT-4882 - Block indexing of links in notification emails. This applies to all bots.
  • Wait 1 second between successive requests. See ONBOARD-2698 for details.
  • Independent of user agent. Links in the sitemap are full URLs using https:// and need to match the protocol of the sitemap.
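The last comment says every URL listed in the sitemap must use the same protocol as the sitemap itself (https://twitter.com/sitemap.xml). A minimal check, assuming the standard sitemaps.org XML namespace; the `mismatched_locs` helper and the sample child sitemap names are hypothetical.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlsplit

SITEMAP_NS = "{http://www.sitemaps.org/schemas/sitemap/0.9}"


def mismatched_locs(sitemap_url, sitemap_xml):
    """Return <loc> URLs whose scheme differs from the sitemap's own URL."""
    expected = urlsplit(sitemap_url).scheme
    root = ET.fromstring(sitemap_xml)
    bad = []
    for loc in root.iter(SITEMAP_NS + "loc"):  # covers both <urlset> and <sitemapindex>
        url = (loc.text or "").strip()
        if url and urlsplit(url).scheme != expected:
            bad.append(url)
    return bad


# Hypothetical sitemap index; the child file names are invented for illustration.
SAMPLE = """<sitemapindex xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <sitemap><loc>https://twitter.com/sitemap-example-1.xml</loc></sitemap>
  <sitemap><loc>http://twitter.com/sitemap-example-2.xml</loc></sitemap>
</sitemapindex>"""

print(mismatched_locs("https://twitter.com/sitemap.xml", SAMPLE))
# ['http://twitter.com/sitemap-example-2.xml']
```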

Warnings

  • `noindex` is not a known field.
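The warning means the file contains a `noindex` line, which the standard does not define (Google stopped honoring `noindex` in robots.txt in 2019). A scanner can produce such warnings by comparing each field name against a list of recognized ones. The sketch below assumes the five fields that appear in this scan are the recognized set; the scanner's actual list is not documented, and the sample input is a hypothetical excerpt, not the full file.

```python
KNOWN_FIELDS = {"user-agent", "allow", "disallow", "crawl-delay", "sitemap"}


def unknown_field_warnings(robots_txt):
    """Return one warning per distinct field name not in KNOWN_FIELDS."""
    seen = []
    for raw in robots_txt.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments and surrounding whitespace
        if ":" not in line:
            continue
        field = line.split(":", 1)[0].strip().lower()
        if field not in KNOWN_FIELDS and field not in seen:
            seen.append(field)
    return [f"`{field}` is not a known field." for field in seen]


# Hypothetical excerpt for illustration only.
SAMPLE = """User-agent: *
Disallow: /
Disallow: /i/u
Noindex: /i/u
Crawl-delay: 1
"""

print(unknown_field_warnings(SAMPLE))  # ['`noindex` is not a known field.']
```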