integrate.io
robots.txt

Robots Exclusion Standard data for integrate.io

Resource Scan

Scan Details

Site Domain integrate.io
Base Domain integrate.io
Scan Status Ok
Last Scan 2024-11-07T03:46:06+00:00
Next Scan 2024-11-21T03:46:06+00:00

Last Scan

Scanned 2024-11-07T03:46:06+00:00
URL https://integrate.io/robots.txt
Redirect https://www.integrate.io/robots.txt
Redirect Domain www.integrate.io
Redirect Base integrate.io
Domain IPs 13.227.254.104, 13.227.254.107, 13.227.254.2, 13.227.254.59
Redirect IPs 13.225.4.102, 13.225.4.36, 13.225.4.49, 13.225.4.78
Response IP 13.225.4.49
Found Yes
Hash f7d75a87b24a069331edfa71d2dddf87943be78d27ab0ed3034160862a70fff1
SimHash ba956d0524c0
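
The Hash value above is 64 hexadecimal characters, which matches the length of a SHA-256 digest. The report does not name the algorithm, so treating it as SHA-256 is an assumption. Under that assumption, a change check between two scans could be sketched as follows; `digest_body` and `has_changed` are hypothetical helper names, not part of any scanner API.

```python
import hashlib

def digest_body(body: bytes) -> str:
    """Return the hex SHA-256 digest of a fetched robots.txt body.

    Assumption: the report's Hash field is SHA-256 over the raw bytes.
    """
    return hashlib.sha256(body).hexdigest()

def has_changed(previous_hash: str, body: bytes) -> bool:
    """Compare a stored hash against the digest of a freshly fetched body."""
    return digest_body(body) != previous_hash.lower()
```

A digest like this only signals that the bytes differ. The shorter SimHash field is presumably meant for detecting near-duplicates, which this sketch does not attempt.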

Groups

*

Rule Path
Disallow /signup/welcome
Disallow /signup/thanks
Disallow /contact/thanks
Disallow /login
Disallow /*q%3D
Disallow /*.atom
Disallow /blog/tag/
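
As a rough illustration of how a crawler might apply the rules above, the Python sketch below treats `*` as a wildcard and a trailing `$` as an end anchor, following the convention described in RFC 9309. The names `rule_to_regex` and `is_disallowed` are hypothetical. The matcher ignores Allow rules and percent-encoding normalization, so it is a simplification rather than a complete implementation.

```python
import re

# Disallow paths copied from the "*" group listed above.
DISALLOW = [
    "/signup/welcome",
    "/signup/thanks",
    "/contact/thanks",
    "/login",
    "/*q%3D",
    "/*.atom",
    "/blog/tag/",
]

def rule_to_regex(rule: str) -> "re.Pattern[str]":
    """Translate a robots.txt path pattern into a compiled regex.

    Rules match as prefixes unless they end in '$'.
    """
    anchored = rule.endswith("$")
    body = rule[:-1] if anchored else rule
    # Escape each literal piece, then join the pieces with '.*' for each '*'.
    pattern = ".*".join(re.escape(part) for part in body.split("*"))
    return re.compile("^" + pattern + ("$" if anchored else ""))

def is_disallowed(path: str, rules=DISALLOW) -> bool:
    """Return True if any Disallow rule matches the start of the path."""
    return any(rule_to_regex(rule).match(path) for rule in rules)
```

Because matching is by prefix, `/login/reset` is blocked along with `/login`, while `/blog/tag` (no trailing slash) is not covered by the `/blog/tag/` rule. Python's standard `urllib.robotparser` does not interpret `*` inside paths, which is why the sketch builds its own matcher.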

Other Records

Field Value
sitemap https://www.integrate.io/sitemap.xml
sitemap https://www.integrate.io/blog-sitemap.xml
sitemap https://www.integrate.io/glossary-sitemap.xml
sitemap https://www.integrate.io/xplenty-docs-sitemap.xml
sitemap https://www.integrate.io/flydata-docs-sitemap.xml
sitemap https://www.integrate.io/webinars-sitemap.xml
sitemap https://www.integrate.io/webinars-japanese-sitemap.xml
sitemap https://www.integrate.io/customers-sitemap.xml
sitemap https://www.integrate.io/books-and-guides-sitemap.xml
sitemap https://www.integrate.io/careers-sitemap.xml
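
The sitemap records above can be recovered from a raw robots.txt body with a few lines of parsing. The sketch below assumes the body is already available as a string; `extract_sitemaps` is a hypothetical name and `SAMPLE` is an abbreviated stand-in for the real file, not a copy of it.

```python
from typing import List

def extract_sitemaps(robots_txt: str) -> List[str]:
    """Collect the values of all Sitemap fields, in file order.

    Field names are matched case-insensitively. Text after '#' is
    dropped as a comment, which would also truncate a URL containing '#'.
    """
    sitemaps = []
    for line in robots_txt.splitlines():
        line = line.split("#", 1)[0].strip()
        # Split on the first colon only, so the URL's own colon survives.
        field, sep, value = line.partition(":")
        if sep and field.strip().lower() == "sitemap":
            sitemaps.append(value.strip())
    return sitemaps

# Abbreviated, illustrative input (assumed layout of the real file).
SAMPLE = """\
User-agent: *
Disallow: /login
Sitemap: https://www.integrate.io/sitemap.xml
sitemap: https://www.integrate.io/blog-sitemap.xml
"""

if __name__ == "__main__":
    for url in extract_sitemaps(SAMPLE):
        print(url)
```

Sitemap records are independent of any user-agent group, which is why the report lists them under Other Records rather than under the `*` group.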

Comments

  • See http://www.robotstxt.org/wc/norobots.html for documentation on how to use the robots.txt file
  • https://developers.google.com/webmasters/control-crawl-index/
  • To ban all spiders from the entire site uncomment the next two lines: