pgcc.edu
robots.txt

Robots Exclusion Standard data for pgcc.edu

Resource Scan

Scan Details

Site Domain	pgcc.edu
Base Domain	pgcc.edu
Scan Status	Ok
Last Scan	2025-07-16T01:05:58+00:00
Next Scan	2025-08-15T01:05:58+00:00

Last Scan

Scanned	2025-07-16T01:05:58+00:00
URL	https://www.pgcc.edu/robots.txt
Domain IPs	54.163.56.119, 75.101.153.12
Response IP	54.163.56.119
Found	Yes
Hash	34987aee64aab0da3b626f21e5d2c33cba7cb210bc773e978718a0a39b726294
SimHash	6505d8114791

Groups

googlebot

Rule	Path
Allow	/

Other Records

Field	Value
crawl-delay	8

googlebot-image

Rule	Path
Allow	/

Other Records

Field	Value
crawl-delay	8

duckduckbot

Rule	Path
Allow	/

Other Records

Field	Value
crawl-delay	8

bingbot

Rule	Path
Allow	/

Other Records

Field	Value
crawl-delay	8

msnbot

Rule	Path
Allow	/

Other Records

Field	Value
crawl-delay	8

gptbot

Rule	Path
Allow	/

Other Records

Field	Value
crawl-delay	8

scrapy

Rule	Path
Disallow	/

*

Rule	Path
Disallow	/

Other Records

Field	Value
sitemap	https://www.pgcc.edu/sitemap-en.xml
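
The groups above can be checked with Python's standard-library `urllib.robotparser`. The sketch below is only an illustration: the `ROBOTS_TXT` text is a reconstruction from the scanned groups, not the literal file served at https://www.pgcc.edu/robots.txt, and the directive order, capitalization, and the `examplebot` user agent are assumptions. Note also that the standard-library parser matches user-agent tokens by substring, which real crawlers may not do.

```python
from urllib.robotparser import RobotFileParser

# Agents that the scan lists with "Allow /" and "crawl-delay 8".
ALLOWED_AGENTS = [
    "googlebot", "googlebot-image", "duckduckbot",
    "bingbot", "msnbot", "gptbot",
]

# Assumed reconstruction of the file from the scanned groups;
# the real file's exact layout is not shown in the scan.
ROBOTS_TXT = "".join(
    f"User-agent: {agent}\nAllow: /\nCrawl-delay: 8\n\n"
    for agent in ALLOWED_AGENTS
) + (
    "User-agent: scrapy\nDisallow: /\n\n"
    "User-agent: *\nDisallow: /\n\n"
    "Sitemap: https://www.pgcc.edu/sitemap-en.xml\n"
)


def build_parser(text: str) -> RobotFileParser:
    """Parse robots.txt text locally, without any network request."""
    parser = RobotFileParser()
    parser.parse(text.splitlines())
    return parser


parser = build_parser(ROBOTS_TXT)
home = "https://www.pgcc.edu/"

for agent in ALLOWED_AGENTS + ["scrapy", "examplebot"]:
    # "examplebot" is a hypothetical agent that falls through to the "*" group.
    print(agent, parser.can_fetch(agent, home), parser.crawl_delay(agent))

print(parser.site_maps())  # site_maps() requires Python 3.8 or later
```

Under these assumptions, the six named crawlers are allowed with a delay of 8 seconds between requests, while `scrapy` and any unlisted agent are denied by their `Disallow: /` rules.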
