ricoh-usa.com
robots.txt

Robots Exclusion Standard data for ricoh-usa.com

Resource Scan

Scan Details

Site Domain ricoh-usa.com
Base Domain ricoh-usa.com
Scan Status Ok
Last Scan 2024-09-26T15:28:10+00:00
Next Scan 2024-10-26T15:28:10+00:00

Last Scan

Scanned 2024-09-26T15:28:10+00:00
URL https://ricoh-usa.com/robots.txt
Redirect https://www.ricoh-usa.com/robots.txt
Redirect Domain www.ricoh-usa.com
Redirect Base ricoh-usa.com
Domain IPs 76.76.21.21
Redirect IPs 76.76.21.21
Response IP 76.76.21.21
Found Yes
Hash 3395a703187f66ae4ebdb2db126e3158564e78201d09edd7dc53eeff0c7b8941
SimHash 6f0614cfcf97
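
The Hash value above is 64 hexadecimal characters long, which is consistent with a SHA-256 digest. The scan does not state which algorithm or which bytes were hashed, so the following is only a sketch under the assumption that the digest is SHA-256 over the raw response body; the function name `fingerprint` is hypothetical.

```python
import hashlib


def fingerprint(body: bytes) -> str:
    """Return a hex digest for a robots.txt body.

    Assumption: SHA-256 over the raw bytes. The scan report does not
    confirm the algorithm or any normalization applied before hashing.
    """
    return hashlib.sha256(body).hexdigest()


if __name__ == "__main__":
    # Illustrative body only, not the real file contents.
    sample = b"User-agent: *\nAllow: /\n"
    print(fingerprint(sample))
```

Comparing such a digest between two scans would show whether the file changed at all, while the shorter SimHash field suggests a similarity measure for near-duplicate detection.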

Groups

*

Rule Path
Allow /
Disallow */App_Config*
Disallow /itchannel/
Disallow /en/about-us/terms-of-use
Disallow /en/about-us/privacy-policy
Disallow */downloads/*
Disallow */en/search/*
Disallow /en/About-Us/Safe-Harbor-Privacy-Statement
Disallow */about/awards/*
Disallow /about/docs/pdf/NECS/
Disallow /technology/
Disallow /cloud-hosting-managed-it/
Disallow /en/products/supplies/search/
Disallow */test/*
Disallow */Test/*

googlebot-image

Rule Path
Allow /_next/image?*

gsa-crawler

Rule Path
Allow /

ninjabot

Rule Path
Allow /
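
To illustrate how the rules in the `*` group above might be evaluated, here is a minimal sketch in Python. It assumes the longest-match precedence described in RFC 9309 (the most specific matching pattern wins, and Allow wins a tie), with `*` matching any run of characters and paths compared case-sensitively. The names `STAR_GROUP`, `pattern_to_regex`, and `is_allowed` are hypothetical, and real crawlers may differ in details.

```python
import re

# Rules copied from the "*" group listed above, as (directive, pattern) pairs.
STAR_GROUP = [
    ("allow", "/"),
    ("disallow", "*/App_Config*"),
    ("disallow", "/itchannel/"),
    ("disallow", "/en/about-us/terms-of-use"),
    ("disallow", "/en/about-us/privacy-policy"),
    ("disallow", "*/downloads/*"),
    ("disallow", "*/en/search/*"),
    ("disallow", "/en/About-Us/Safe-Harbor-Privacy-Statement"),
    ("disallow", "*/about/awards/*"),
    ("disallow", "/about/docs/pdf/NECS/"),
    ("disallow", "/technology/"),
    ("disallow", "/cloud-hosting-managed-it/"),
    ("disallow", "/en/products/supplies/search/"),
    ("disallow", "*/test/*"),
    ("disallow", "*/Test/*"),
]


def pattern_to_regex(pattern: str) -> "re.Pattern[str]":
    """Translate a robots.txt path pattern into a compiled regex.

    '*' matches any run of characters; a trailing '$' anchors the end.
    Everything else is matched literally, as a prefix of the path.
    """
    anchored = pattern.endswith("$")
    if anchored:
        pattern = pattern[:-1]
    body = ".*".join(re.escape(part) for part in pattern.split("*"))
    return re.compile(body + ("$" if anchored else ""))


def is_allowed(path: str, rules) -> bool:
    """Apply longest-match precedence; with no matching rule, allow."""
    best_len = -1
    allowed = True
    for directive, pattern in rules:
        # re.match anchors at the start of the path, giving prefix semantics.
        if pattern_to_regex(pattern).match(path):
            length = len(pattern)
            if length > best_len or (length == best_len and directive == "allow"):
                best_len = length
                allowed = directive == "allow"
    return allowed


if __name__ == "__main__":
    for p in ["/en/products/printers", "/itchannel/partners", "/en/search/results"]:
        print(p, is_allowed(p, STAR_GROUP))
```

Under these assumptions, the two separate `*/test/*` and `*/Test/*` entries are needed because matching is case-sensitive; a path containing `/TEST/` would not be covered by either.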

Other Records

Field Value
sitemap https://www.ricoh-usa.com/sitemap.xml

Comments

  • Ricoh Americas corporation
  • Disallow wellbehaved webcrawlers from indexing
  • Note to auditors: If your webscanning tool reports this robots.txt file as a potential vulnerability, and suggests removing it, please ignore it, and log a bug against the webscanning tool.