integrateditgroup.com
robots.txt

Robots Exclusion Standard data for integrateditgroup.com

Resource Scan

Scan Details

Site Domain integrateditgroup.com
Base Domain integrateditgroup.com
Scan Status Ok
Last Scan 2025-06-04T19:28:36+00:00
Next Scan 2025-07-04T19:28:36+00:00

Last Scan

Scanned 2025-06-04T19:28:36+00:00
URL https://integrateditgroup.com/robots.txt
Domain IPs 104.26.12.122, 104.26.13.122, 172.67.72.62
Response IP 104.26.13.122
Found Yes
Hash d0d19005662a7d4f961c65089ebc32c294310f59e63d4f2a41d2024569e9d86d
SimHash 251c7e03a7d1

Groups

*

Rule Path
Allow /
Allow /index.html
Allow /about.html
Allow /careers.html
Allow /contact.html
Disallow /admin/
Disallow /private/
Disallow /cgi-bin/
Disallow /tmp/
Disallow /includes/
Disallow /backup/
Disallow /search/
Disallow /*?q=
Disallow /*.pdf$
Disallow /*.doc$
Disallow /*.docx$
Disallow /*.xls$
Disallow /*.xlsx$
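
The "*" group mixes a blanket Allow / with more specific Disallow patterns, so the outcome for a given path depends on how a crawler resolves overlapping rules. The following is a minimal sketch, assuming the longest-match precedence described in RFC 9309 (the most specific matching pattern wins, and Allow wins a tie). The names STAR_GROUP, pattern_to_regex, and is_allowed are hypothetical helpers written for illustration, not part of any crawler or of this scan tool. Individual crawlers may resolve rules differently.

```python
import re

# Rules copied from the "*" group listed above, in the order reported.
STAR_GROUP = [
    ("allow", "/"),
    ("allow", "/index.html"),
    ("allow", "/about.html"),
    ("allow", "/careers.html"),
    ("allow", "/contact.html"),
    ("disallow", "/admin/"),
    ("disallow", "/private/"),
    ("disallow", "/cgi-bin/"),
    ("disallow", "/tmp/"),
    ("disallow", "/includes/"),
    ("disallow", "/backup/"),
    ("disallow", "/search/"),
    ("disallow", "/*?q="),
    ("disallow", "/*.pdf$"),
    ("disallow", "/*.doc$"),
    ("disallow", "/*.docx$"),
    ("disallow", "/*.xls$"),
    ("disallow", "/*.xlsx$"),
]


def pattern_to_regex(path_pattern):
    """Translate a robots.txt path pattern into a regex anchored at the start.

    '*' matches any run of characters; a trailing '$' anchors the end of the path.
    """
    anchored = path_pattern.endswith("$")
    body = path_pattern[:-1] if anchored else path_pattern
    regex = ".*".join(re.escape(part) for part in body.split("*"))
    return re.compile(regex + ("$" if anchored else ""))


def is_allowed(rules, url_path):
    """Return True if url_path may be fetched under the given rules.

    Assumed precedence: the longest matching pattern wins, Allow wins a tie,
    and a path that matches no rule is allowed.
    """
    best = None  # (pattern length, is_allow)
    for kind, pattern in rules:
        if pattern_to_regex(pattern).match(url_path):
            candidate = (len(pattern), kind == "allow")
            if best is None or candidate > best:
                best = candidate
    return True if best is None else best[1]


if __name__ == "__main__":
    for path in ("/about.html", "/admin/login", "/docs/report.pdf", "/products?q=test"):
        print(path, is_allowed(STAR_GROUP, path))
```

Under these assumptions, /about.html is allowed, while /admin/login, /docs/report.pdf, and /products?q=test are disallowed because each matches a Disallow pattern longer than the one-character Allow / pattern.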

googlebot

Rule Path
Allow /

bingbot

Rule Path
Allow /

slurp

Rule Path
Allow /

googlebot-image

Rule Path
Allow /*.jpg$
Allow /*.jpeg$
Allow /*.gif$
Allow /*.png$
Allow /*.webp$
Disallow /
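
Because googlebot, bingbot, slurp, and googlebot-image each have their own group, a crawler that follows RFC 9309 group selection would obey only its own group and ignore the "*" group, so the "*" Disallow rules would not apply to those four crawlers. The sketch below illustrates that selection step under a simplifying assumption: the crawler's product token is compared case-insensitively against the group names, with "*" as the fallback. GROUPS and select_group are hypothetical names introduced here, and real crawlers may apply additional matching logic.

```python
# Groups copied from the scan above; each rule is (kind, path pattern).
GROUPS = {
    "*": [
        ("allow", "/"),
        ("allow", "/index.html"),
        ("allow", "/about.html"),
        ("allow", "/careers.html"),
        ("allow", "/contact.html"),
        ("disallow", "/admin/"),
        ("disallow", "/private/"),
        ("disallow", "/cgi-bin/"),
        ("disallow", "/tmp/"),
        ("disallow", "/includes/"),
        ("disallow", "/backup/"),
        ("disallow", "/search/"),
        ("disallow", "/*?q="),
        ("disallow", "/*.pdf$"),
        ("disallow", "/*.doc$"),
        ("disallow", "/*.docx$"),
        ("disallow", "/*.xls$"),
        ("disallow", "/*.xlsx$"),
    ],
    "googlebot": [("allow", "/")],
    "bingbot": [("allow", "/")],
    "slurp": [("allow", "/")],
    "googlebot-image": [
        ("allow", "/*.jpg$"),
        ("allow", "/*.jpeg$"),
        ("allow", "/*.gif$"),
        ("allow", "/*.png$"),
        ("allow", "/*.webp$"),
        ("disallow", "/"),
    ],
}


def select_group(groups, product_token):
    """Pick the rule group for a crawler.

    Assumption: an exact, case-insensitive match on the product token selects
    that group; otherwise the "*" group applies; with neither, no rules apply.
    """
    return groups.get(product_token.lower(), groups.get("*", []))


def disallowed_patterns(rules):
    """List only the Disallow patterns in a group, for quick inspection."""
    return [pattern for kind, pattern in rules if kind == "disallow"]


if __name__ == "__main__":
    for token in ("Googlebot", "Googlebot-Image", "ExampleBot"):
        print(token, disallowed_patterns(select_group(GROUPS, token)))
```

Here ExampleBot is a made-up crawler name used only to show the fallback to the "*" group. If the intent is for the sensitive-path rules to bind the named crawlers as well, those Disallow lines would need to be repeated in each named group.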

Other Records

Field Value
sitemap https://integrateditgroup.com/sitemap.xml

Comments

  • Disallow potentially sensitive areas
  • Prevent indexing of search results, if any
  • Prevent indexing of file types that shouldn't be indexed
  • Add XML Sitemap location
  • Specific rules for major bot crawlers
  • Prevent image indexing bots from accessing non-image content