
Robots Exclusion Standard data for deja.com

Resource Scan

Scan Details

Site Domain deja.com
Base Domain deja.com
Scan Status Ok
Last Scan 2024-04-28T08:03:05+00:00
Next Scan 2024-05-28T08:03:05+00:00

Last Scan

Scanned 2024-04-28T08:03:05+00:00
URL http://www.deja.com/robots.txt
Redirect https://groups.google.com/robots.txt
Redirect Domain groups.google.com
Redirect Base google.com
Domain IPs 2404:6800:4003:c1c::64, 2404:6800:4003:c1c::66, 2404:6800:4003:c1c::71, 2404:6800:4003:c1c::8b, 74.125.200.100, 74.125.200.101, 74.125.200.102, 74.125.200.113, 74.125.200.138, 74.125.200.139
Redirect IPs 2001:4860:4802:32::177, 2001:4860:4802:34::177, 2001:4860:4802:36::177, 2001:4860:4802:38::177, 216.239.32.177, 216.239.34.177, 216.239.36.177, 216.239.38.177
Response IP 142.251.175.102
Found Yes
Hash 5f5d0a93f52058dfb06cfbcbb14a061d4d075db492963fafe6e43b81224ebcd1
SimHash 1ffefc676056

Groups

*

Rule Path
Disallow /groups/search
Disallow /groups/dir?*q=
Disallow /a/*.*/groups/search
Disallow /a/*.*/groups/dir?*q=
Disallow /d/search*
Disallow /d/topicsearch*
Disallow /a/*.*/d/search*
Disallow /a/*.*/d/topicsearch*
Disallow /*_escaped_fragment_%3Daboutgroup
Disallow /*_escaped_fragment_%3Dforumsearch
Disallow /*_escaped_fragment_%3Dmyforums
Disallow /*_escaped_fragment_%3Dnewtopic
Disallow /*_escaped_fragment_%3Dsearch
Disallow /*_escaped_fragment_%3Dsearchin
Disallow /*_escaped_fragment_%3Dstarred
Allow /$
Allow /a/
Allow /a/*.*/about
Allow /a/*.*/browse_
Allow /a/*.*/g/
Allow /a/*.*/group
Allow /a/*.*/groups
Allow /a/*.*/images
Allow /a/*.*/index
Allow /a/*.*/messages
Allow /a/*.*/msg/
Allow /a/*.*/threads
Allow /a/*.*/topics
Allow /a/*.*/tree
Allow /about
Allow /browse_
Allow /finance
Allow /g/
Allow /group
Allow /groups
Allow /images
Allow /index
Allow /messages
Allow /msg/
Allow /support
Allow /threads
Allow /topics
Allow /tree
Allow /googlegroups/
Allow /a/*.*/d/
Allow /a/*.*/forum$
Allow /a/*.*/forum/
Allow /d/
Allow /forum$
Allow /forum/
Allow /?hl=
Disallow /?hl=*&
Disallow /

Comments

  • robots.txt for Google Groups. See this URL for documentation on robots.txt: https://developers.google.com/webmasters/control-crawl-index/docs/robots_txt
  • Note in particular that "the most specific rule based on the length of the [path] entry will trump the less specific (shorter) rule."
  • Explicitly disallow indexing of pages that do not have valuable crawlable views (see b/21331185).