njsams.rutgers.edu
robots.txt

Robots Exclusion Standard data for njsams.rutgers.edu

Resource Scan

Scan Details

Site Domain njsams.rutgers.edu
Base Domain rutgers.edu
Scan Status Ok
Last Scan 2025-07-01T15:36:19+00:00
Next Scan 2025-07-31T15:36:19+00:00

Last Scan

Scanned 2025-07-01T15:36:19+00:00
URL https://njsams.rutgers.edu/robots.txt
Domain IPs 128.6.46.90
Response IP 128.6.46.90
Found Yes
Hash fade71a5052259886a2dd9ecc50eea54b7698f0059f3d39c089003d9b8347efa
SimHash 0d557a0223d8

Groups

googlebot

Rule Path
Disallow /

googlebot
adsbot-google

Rule Path
Disallow /

*

Rule Path
Disallow /

*

Rule Path
Disallow /*.xls$

*

Rule Path
Disallow /*.xlsx$

*

Rule Path
Disallow /*.pdf$

*

Rule Path
Disallow /*.doc$

*

Rule Path
Disallow /*.docx$

*

Rule Path
Disallow /*.mdb$

*

Rule Path
Disallow /*.jpg$

*

Rule Path
Disallow /*.jpeg$

*

Rule Path
Disallow /*.png$

*

Rule Path
Disallow
Disallow *.axd
Disallow /cgi-bin/
Disallow /member

bingbot

Rule Path
Disallow /

ia_archiver

Rule Path
Disallow /
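
Several of the paths listed above use the `*` wildcard and the `$` end-of-path anchor (for example `/*.xls$`). The following is a minimal sketch of how such a pattern could be tested against a URL path. It assumes the common interpretation in which `*` matches any run of characters, a trailing `$` anchors the end of the path, and every pattern is matched from the start of the path. The helper names `pattern_to_regex` and `path_matches` are hypothetical and are not part of the scanned site or of any library.

```python
import re


def pattern_to_regex(pattern: str) -> re.Pattern:
    """Translate a robots.txt path pattern into a compiled regex.

    Assumption: '*' matches any sequence of characters and a trailing
    '$' anchors the end of the path. All other characters are literal.
    """
    anchored = pattern.endswith("$")
    if anchored:
        pattern = pattern[:-1]
    # Escape each literal piece, then rejoin the pieces with '.*'.
    body = ".*".join(re.escape(part) for part in pattern.split("*"))
    return re.compile(body + (r"\Z" if anchored else ""))


def path_matches(pattern: str, path: str) -> bool:
    """Return True if the rule pattern applies to the given URL path."""
    if not pattern:
        # A rule with an empty path matches nothing.
        return False
    # re.match anchors at the start of the path, like a prefix match.
    return pattern_to_regex(pattern).match(path) is not None


if __name__ == "__main__":
    for rule, path in [
        ("/*.xls$", "/files/report.xls"),
        ("/*.xls$", "/files/report.xlsx"),
        ("*.axd", "/WebResource.axd?d=1"),
        ("/member", "/members/list"),
    ]:
        print(rule, path, path_matches(rule, path))
```

Under these assumptions, `/*.xls$` applies to `/files/report.xls` but not to `/files/report.xlsx`, which is why the scan lists `.xls` and `.xlsx` as separate rules.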

Comments

  • Example 1: Block only Googlebot
  • Example 2: Block Googlebot and Adsbot
  • Example 3: Block all crawlers except AdsBot (AdsBot crawlers must be named explicitly)
  • For example, disallow all .xls files.
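
The comments above describe rules aimed at specific crawlers. As a rough illustration, the sketch below shows how a crawler might pick the rules that apply to it from the groups recorded in this scan. It assumes case-insensitive matching of the crawler's product token, merging of all groups that name the same token, and a fallback to the `*` groups when no group names the crawler. The names `GROUPS`, `STAR_RULES`, and `rules_for` are hypothetical, and real crawlers may select groups differently.

```python
# Disallow paths recorded under the '*' groups in this scan, merged.
STAR_RULES = [
    "/", "/*.xls$", "/*.xlsx$", "/*.pdf$", "/*.doc$", "/*.docx$",
    "/*.mdb$", "/*.jpg$", "/*.jpeg$", "/*.png$",
    "",  # the scan lists one Disallow rule with no path
    "*.axd", "/cgi-bin/", "/member",
]

# (user-agent tokens, Disallow paths) for each group in the scan.
GROUPS = [
    (["googlebot"], ["/"]),
    (["googlebot", "adsbot-google"], ["/"]),
    (["*"], STAR_RULES),
    (["bingbot"], ["/"]),
    (["ia_archiver"], ["/"]),
]


def rules_for(token: str, groups=GROUPS) -> list:
    """Return the Disallow paths that apply to a crawler token.

    Assumption: groups naming the token are merged; if none name it,
    the '*' groups are used instead.
    """
    token = token.lower()
    named = [p for agents, paths in groups if token in agents for p in paths]
    if named:
        return named
    return [p for agents, paths in groups if "*" in agents for p in paths]


if __name__ == "__main__":
    for bot in ["Googlebot", "AdsBot-Google", "bingbot", "examplebot"]:
        print(bot, rules_for(bot))
```

In this scan every group, named or wildcard, includes `Disallow /`, so under these assumptions each crawler ends up with a rule covering the whole site.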