constructiondive.com
robots.txt

Robots Exclusion Standard data for constructiondive.com

Resource Scan

Scan Details

Site Domain	constructiondive.com
Base Domain	constructiondive.com
Scan Status	Ok
Last Scan	2024-11-13T15:16:59+00:00
Next Scan	2024-11-20T15:16:59+00:00

Last Scan

Scanned	2024-11-13T15:16:59+00:00
URL	https://constructiondive.com/robots.txt
Redirect	https://www.constructiondive.com/robots.txt
Redirect Domain	www.constructiondive.com
Redirect Base	constructiondive.com
Domain IPs	104.18.35.96, 172.64.152.160, 2606:4700:4400::6812:2360, 2606:4700:4400::ac40:98a0
Redirect IPs	104.18.35.96, 172.64.152.160, 2606:4700:4400::6812:2360, 2606:4700:4400::ac40:98a0
Response IP	104.18.35.96
Found	Yes
Hash	e7080d702e9e9c29e1873841cf48b02421af44799fa73009f405b3840370cd40
SimHash	fb7683504752

Groups

*

Rule	Path
Disallow	/admin/
Disallow	/newsletter/
Disallow	/healthcheck/
Disallow	/subpage/
Disallow	/ckeditor/
Disallow	/api/
Disallow	/static/images/
Disallow	/subscriber/
Disallow	/search/*
Allow	/search/$
Disallow	/user_media/
Disallow	/imgproxy/
Disallow	/topic/?page=*
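The `/search/*` and `/search/$` pair above only makes sense under a precedence rule. The sketch below assumes the longest-match convention described in Google's documentation and RFC 9309 (the most specific pattern wins, and a tie goes to Allow); other crawlers may evaluate rules differently. The names `STAR_GROUP`, `pattern_to_regex`, and `is_allowed` are hypothetical helpers, not part of the scan.

```python
import re

# Rules of the '*' group, copied from the table above as (directive, pattern).
STAR_GROUP = [
    ("disallow", "/admin/"),
    ("disallow", "/newsletter/"),
    ("disallow", "/healthcheck/"),
    ("disallow", "/subpage/"),
    ("disallow", "/ckeditor/"),
    ("disallow", "/api/"),
    ("disallow", "/static/images/"),
    ("disallow", "/subscriber/"),
    ("disallow", "/search/*"),
    ("allow", "/search/$"),
    ("disallow", "/user_media/"),
    ("disallow", "/imgproxy/"),
    ("disallow", "/topic/?page=*"),
]


def pattern_to_regex(pattern):
    """Translate a path pattern: '*' is a wildcard, a trailing '$' anchors the end."""
    anchored = pattern.endswith("$")
    body = pattern[:-1] if anchored else pattern
    regex = ".*".join(re.escape(part) for part in body.split("*"))
    return re.compile(regex + (r"\Z" if anchored else ""))


def is_allowed(rules, path):
    """Assumed precedence: longest matching pattern wins, Allow wins a tie."""
    best = None  # (pattern length, is_allow)
    for directive, pattern in rules:
        if not pattern:
            continue  # an empty pattern matches nothing
        if pattern_to_regex(pattern).match(path):
            candidate = (len(pattern), directive == "allow")
            if best is None or candidate > best:
                best = candidate
    # A path that no rule matches is allowed by default.
    return True if best is None else best[1]


for path in ("/search/", "/search/?q=cranes", "/topic/", "/topic/?page=2"):
    print(path, is_allowed(STAR_GROUP, path))
```

Under these assumptions the bare `/search/` page stays crawlable while any longer `/search/...` URL is blocked, which matches the "no deep queries to search" intent stated in the file's comments.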

Other Records

Field	Value
crawl-delay	5
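As a rough illustration of how a crawler might read this value, the sketch below feeds a shortened excerpt to Python's standard `urllib.robotparser` and asks for the delay per user agent. The excerpt keeps only one path rule per group for brevity, `ExampleBot` is a made-up agent name, and `seconds_to_wait` is a hypothetical helper.

```python
from urllib.robotparser import RobotFileParser

# Shortened excerpt reconstructed from the scan tables; the crawl-delay
# values (5, 30, 10) are the ones reported for these groups.
ROBOTS_EXCERPT = """\
User-agent: *
Disallow: /admin/
Crawl-delay: 5

User-agent: ahrefsbot
Disallow: /admin/
Crawl-delay: 30

User-agent: amazonbot
Disallow: /admin/
Crawl-delay: 10
"""


def seconds_to_wait(delay, elapsed):
    """Remaining pause before the next request, given seconds already elapsed."""
    if delay is None:
        return 0.0  # no crawl-delay record applies
    return max(0.0, float(delay) - elapsed)


parser = RobotFileParser()
parser.parse(ROBOTS_EXCERPT.splitlines())

# Agents without a group of their own fall back to the '*' group.
delays = {
    agent: parser.crawl_delay(agent)
    for agent in ("ExampleBot", "AhrefsBot", "Amazonbot")
}
print(delays)
```

Whether a given crawler honors the record is up to that crawler; the file's own comments note that Google ignores it.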

twitterbot

Rule	Path
Disallow

Other Records

Field	Value
crawl-delay	5

googlebot-news

Rule	Path
Disallow	/admin/
Disallow	/newsletter/
Disallow	/healthcheck/
Disallow	/subpage/
Disallow	/ckeditor/
Disallow	/api/
Disallow	/static/images/
Disallow	/subscriber/
Disallow	/search/*
Allow	/search/$
Allow	/google_news_sitemap.xml
Allow	/user_media/
Allow	/imgproxy/
Allow	/static/images/favicons
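Because this group repeats most of the `*` group by hand, a quick set comparison can show where the two actually differ. The sketch below is only an illustration built from the two tables in this scan; the variable names are hypothetical.

```python
# Rules shared by both groups, as (directive, pattern) pairs from the tables.
SHARED = {
    ("disallow", "/admin/"),
    ("disallow", "/newsletter/"),
    ("disallow", "/healthcheck/"),
    ("disallow", "/subpage/"),
    ("disallow", "/ckeditor/"),
    ("disallow", "/api/"),
    ("disallow", "/static/images/"),
    ("disallow", "/subscriber/"),
    ("disallow", "/search/*"),
    ("allow", "/search/$"),
}

STAR_GROUP = SHARED | {
    ("disallow", "/user_media/"),
    ("disallow", "/imgproxy/"),
    ("disallow", "/topic/?page=*"),
}

GOOGLEBOT_NEWS_GROUP = SHARED | {
    ("allow", "/google_news_sitemap.xml"),
    ("allow", "/user_media/"),
    ("allow", "/imgproxy/"),
    ("allow", "/static/images/favicons"),
}

# Rules present in one group but not the other, sorted for stable output.
only_in_star = sorted(STAR_GROUP - GOOGLEBOT_NEWS_GROUP)
only_in_news = sorted(GOOGLEBOT_NEWS_GROUP - STAR_GROUP)

print("only in *:", only_in_star)
print("only in googlebot-news:", only_in_news)
```

Run against these tables, the comparison shows that `/user_media/` and `/imgproxy/` flip from Disallow to Allow, and that the `/topic/?page=*` rule from the `*` group has no counterpart in the googlebot-news group.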

googlebot-image

Rule	Path
Disallow	/
Allow	/imgproxy/
Allow	/static/images/favicons
Allow	/favicon.ico
Allow	/apple-touch-icon.png
Allow	/favicon-32x32.png
Allow	/favicon-16x16.png
Allow	/site.webmanifest
Allow	/safari-pinned-tab.svg
Allow	/browserconfig.xml
Allow	/android-chrome-144x144.png
Allow	/mstile-150x150.png

petalbot

Rule	Path
Disallow	/

gptbot

Rule	Path
Disallow	/

sentibot

Rule	Path
Disallow	/

claritybot

Rule	Path
Disallow	/

omgilibot

Rule	Path
Disallow	/

omgili

Rule	Path
Disallow	/

facebookexternalhit/1.1

Rule	Path
Disallow

ahrefsbot

Rule	Path
Disallow	/admin/
Disallow	/newsletter/
Disallow	/healthcheck/
Disallow	/subpage/
Disallow	/ckeditor/
Disallow	/api/
Disallow	/static/images/
Disallow	/subscriber/
Disallow	/search/*
Allow	/search/$
Disallow	/user_media/
Disallow	/imgproxy/
Disallow	/topic/
Disallow	/editors/

Other Records

Field	Value
crawl-delay	30

amazonbot

Rule	Path
Disallow	/admin/
Disallow	/newsletter/
Disallow	/healthcheck/
Disallow	/subpage/
Disallow	/ckeditor/
Disallow	/api/
Disallow	/static/images/
Disallow	/subscriber/
Disallow	/search/*
Allow	/search/$
Disallow	/user_media/
Disallow	/imgproxy/
Disallow	/topic/
Disallow	/editors/
Disallow	/signup/*
Allow	/signup/$

Other Records

Field	Value
crawl-delay	10

amazon-qbusiness

Product	Comment
amazon-qbusiness	Amazon Q Web Crawler

Rule	Path
Disallow	/

amazon-kendra

Product	Comment
amazon-kendra	Amazon Kendra Web Crawler

Rule	Path
Disallow	/

Comments

(ASCII-art logo)
NOTE: Allow is a non-standard directive for robots.txt. It is allowed by Google bots. See https://developers.google.com/search/reference/robots_txt#allow
Crawl delay asks bots to wait this many seconds between requests. Ignored by Google.
no deep queries to search
don't index our dynamic images
don't deep index topic pages
Rules for specific crawlers below. Note that these don't stack. If you create a specific user-agent rule
you should copy the rules over from '*' above by hand.
Allow Twitter to see all links
Allow Googlebot-News to see header images and favicons, BUT make it follow all the directives from our * group
See below link for why we have to repeat these directives
https://developers.google.com/search/reference/robots_txt#order-of-precedence-for-user-agents
no deep queries to search
Allow Google News to see header images and favicons
Googlebot-Image is now used for favicons. Allow it to see favicon-related files but nothing else
Don't let PetalBot crawl at all
Block ChatGPT bot https://platform.openai.com/docs/gptbot
Block sentione.com
block seoclarity.net/bot.html
block omgili.com/crawler.html
Allow Facebook crawler user-agent to see all
We want this bot to crawl way slower http://ahrefs.com/robot/
no deep queries to search
don't index our dynamic images
Restrict what Amazonbot (Alexa) can see, and a
no deep queries to search
don't index our dynamic images
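The note in these comments that group rules "don't stack" can be illustrated with Python's standard `urllib.robotparser`. This is a minimal sketch on a shortened excerpt: only two `*` rules are kept, `ExampleBot` is a made-up agent name, and `allowed` is a hypothetical helper. The standard-library parser does simple prefix matching, so the excerpt avoids wildcard patterns.

```python
from urllib.robotparser import RobotFileParser

# Shortened excerpt reconstructed from the groups listed in this scan.
ROBOTS_EXCERPT = """\
User-agent: *
Disallow: /admin/
Disallow: /api/

User-agent: twitterbot
Disallow:

User-agent: gptbot
Disallow: /
"""

BASE = "https://www.constructiondive.com"

parser = RobotFileParser()
parser.parse(ROBOTS_EXCERPT.splitlines())


def allowed(agent, path):
    """Ask the parser whether `agent` may fetch `path` on the assumed base URL."""
    return parser.can_fetch(agent, BASE + path)


# A crawler with its own group uses only that group; the '*' rules are not
# merged in. An empty Disallow therefore opens every path to twitterbot.
for agent, path in [
    ("ExampleBot", "/admin/"),
    ("Twitterbot", "/admin/"),
    ("GPTBot", "/news/"),
]:
    print(agent, path, allowed(agent, path))
```

This is why the groups for googlebot-news, ahrefsbot, and amazonbot repeat the `*` rules instead of relying on inheritance.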
