opencorporates.com
robots.txt

Robots Exclusion Standard data for opencorporates.com

Resource Scan

Scan Details

Site Domain	opencorporates.com
Base Domain	opencorporates.com
Scan Status	Ok
Last Scan	2024-10-04T22:11:30+00:00
Next Scan	2024-10-11T22:11:30+00:00
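
The two timestamps above are exactly seven days apart, which suggests a weekly rescan. The interval is inferred from the values shown, not stated by the scanner. A minimal Python sketch of that arithmetic:

```python
from datetime import datetime, timedelta

# Timestamps copied from the Scan Details table.
last_scan = datetime.fromisoformat("2024-10-04T22:11:30+00:00")
next_scan = datetime.fromisoformat("2024-10-11T22:11:30+00:00")

# Interval between scans, inferred from the two values.
interval = next_scan - last_scan
print(interval)  # 7 days, 0:00:00
```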

Last Scan

Scanned	2024-10-04T22:11:30+00:00
URL	https://opencorporates.com/robots.txt
Domain IPs	209.126.35.14
Response IP	209.126.35.14
Found	Yes
Hash	ad18359ba750d55a416ebb75a3284b5a714cb49553560bf10e544ba3a957d992
SimHash	a4cdb9ad6442
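
The Hash value is 64 hexadecimal characters long, which is consistent with a SHA-256 digest, although the scan report does not name the algorithm. Assuming SHA-256 over the raw response body, a fingerprint of this kind could be computed as in the sketch below. The sample body is a placeholder, not the real file, so its digest will not equal the value in the table; the function name is hypothetical.

```python
import hashlib

def fingerprint(body: bytes) -> str:
    """Return the hex SHA-256 digest of a robots.txt response body."""
    return hashlib.sha256(body).hexdigest()

# Placeholder body for illustration only (assumption, not the real file).
sample = b"User-agent: gptbot\nDisallow: /\n"
digest = fingerprint(sample)
print(len(digest))  # 64
```

Comparing the digest from one scan with the digest from the next is a cheap way to tell whether the file changed at all.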

Groups

rogerbot

Rule	Path
Disallow

gptbot

Rule	Path
Disallow	/
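
The rogerbot and gptbot groups differ by one character but mean opposite things: an empty Disallow excludes nothing, while Disallow / excludes the whole site. The sketch below feeds a hand-reconstructed version of the two groups (not the verbatim file) to Python's urllib.robotparser to show the difference. The example URL path is made up.

```python
from urllib.robotparser import RobotFileParser

# Reconstructed from the two group tables; not the verbatim file.
lines = [
    "User-agent: rogerbot",
    "Disallow:",
    "",
    "User-agent: gptbot",
    "Disallow: /",
]

parser = RobotFileParser()
parser.parse(lines)

url = "https://opencorporates.com/companies"  # hypothetical path
print(parser.can_fetch("rogerbot", url))  # True
print(parser.can_fetch("gptbot", url))    # False
```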

*

Rule	Path
Disallow	/assets
Disallow	/data
Disallow	/events
Disallow	/filings
Disallow	/networks
Disallow	/officers
Disallow	/placeholders
Disallow	/search
Disallow	/statements
Disallow	/users
Disallow	/*?page=
Disallow	/*%26page%3D
Disallow	/*/network.json
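
Three of the rules in the * group use the * wildcard, which matches any run of characters under the commonly implemented robots.txt pattern syntax. The sketch below translates each path pattern into a regular expression and tests a few made-up paths against the group. It is a simplified illustration, not a full parser: it handles only Disallow rules, compares percent-encoded text literally, and the helper names are hypothetical.

```python
import re

# Disallow patterns copied from the * group table.
DISALLOW = [
    "/assets", "/data", "/events", "/filings", "/networks",
    "/officers", "/placeholders", "/search", "/statements", "/users",
    "/*?page=", "/*%26page%3D", "/*/network.json",
]

def to_regex(pattern: str) -> re.Pattern:
    """Compile a robots.txt path pattern: '*' matches any characters,
    a trailing '$' anchors the end, everything else is literal."""
    anchored = pattern.endswith("$")
    if anchored:
        pattern = pattern[:-1]
    body = ".*".join(re.escape(part) for part in pattern.split("*"))
    return re.compile("^" + body + ("$" if anchored else ""))

RULES = [to_regex(p) for p in DISALLOW]

def is_disallowed(path: str) -> bool:
    """True if any Disallow pattern matches from the start of the path."""
    return any(rule.match(path) for rule in RULES)

# Example paths are invented for illustration.
print(is_disallowed("/search?q=acme"))                # True
print(is_disallowed("/companies/gb?page=2"))          # True
print(is_disallowed("/companies/gb/1/network.json"))  # True
print(is_disallowed("/companies/gb/1"))               # False
```

Python's urllib.robotparser matches rule paths as plain prefixes, so a custom matcher along these lines is one way to evaluate the wildcard rules.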

Other Records

Field	Value
sitemap	https://opencorporates.com/sitemap.xml.gz

Comments

See http://www.robotstxt.org/wc/norobots.html for documentation on how to use the robots.txt file
To ban all spiders from the entire site uncomment the next two lines:
User-Agent: *
Disallow: /
