ruinart.com
robots.txt

Robots Exclusion Standard data for ruinart.com

Resource Scan

Scan Details

Site Domain	ruinart.com
Base Domain	ruinart.com
Scan Status	Ok
Last Scan	2024-11-16T05:57:18+00:00
Next Scan	2024-12-16T05:57:18+00:00

Last Scan

Scanned	2024-11-16T05:57:18+00:00
URL	https://www.ruinart.com/robots.txt
Domain IPs	125.56.219.3, 23.32.29.89, 2600:1413:b000:1d::17d1:2e8b, 2600:1413:b000:1d::17d1:2e94
Response IP	23.32.29.89
Found	Yes
Hash	f441e193b0a547e6a5149c882167ef70e774888b672aa4b224ea1512df966b63
SimHash	2850d7f2cdf8
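
The Hash above is a 64-character hex string, which fits a SHA-256 digest. The scan does not say which algorithm or which bytes were hashed, so the sketch below assumes a SHA-256 digest of the raw robots.txt response body. It could be used to check whether a freshly downloaded copy still matches this snapshot. The function name `matches_recorded_hash` is hypothetical.

```python
import hashlib

# Value copied from the "Hash" field of the last scan.
RECORDED_HASH = "f441e193b0a547e6a5149c882167ef70e774888b672aa4b224ea1512df966b63"


def matches_recorded_hash(body: bytes, recorded: str = RECORDED_HASH) -> bool:
    """Compare a robots.txt body with a recorded digest.

    Assumption: the scan's Hash is SHA-256 over the raw response bytes.
    The scan itself does not confirm this.
    """
    return hashlib.sha256(body).hexdigest() == recorded.lower()
```

The body could come from, for example, `urllib.request.urlopen("https://www.ruinart.com/robots.txt").read()`. A `False` result means either the file has changed since the scan or the assumption about the hashing scheme is wrong.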

Groups

*

Rule	Path
Disallow	*cgid%3D*
Disallow	*prefn*
Disallow	*prefv*
Disallow	*?famille=*
Disallow	*?oag=*
Disallow	*?crgp=*
Disallow	/fr-fr/cart*
Disallow	/fr-fr/checkoutlogin*
Disallow	/fr-fr/checkout*
Disallow	/fr-fr/orderconfirmation*
Disallow	/fr-fr/account*
Disallow	/fr-fr/orders*
Disallow	/fr-fr/addaddress*
Disallow	/fr-fr/profile*
Disallow	/fr-fr/editpassword*
Disallow	/fr-fr/addressbook*
Disallow	/fr-fr/search*
Disallow	/de-de/
Disallow	/de-de/*
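
Several of these paths use `*` as a wildcard, and the standard library's `urllib.robotparser` compares rule paths as plain prefixes, so it does not expand `*` inside a path. The following is a minimal sketch of wildcard matching in the style of RFC 9309: `*` matches any run of characters, a trailing `$` anchors the end, and matching starts at the beginning of the path plus query. It simplifies by comparing raw strings without percent-encoding normalization. The sample paths are hypothetical.

```python
import re

# Disallow patterns of the "*" group, copied from the table above.
STAR_GROUP_DISALLOW = [
    "*cgid%3D*", "*prefn*", "*prefv*", "*?famille=*", "*?oag=*", "*?crgp=*",
    "/fr-fr/cart*", "/fr-fr/checkoutlogin*", "/fr-fr/checkout*",
    "/fr-fr/orderconfirmation*", "/fr-fr/account*", "/fr-fr/orders*",
    "/fr-fr/addaddress*", "/fr-fr/profile*", "/fr-fr/editpassword*",
    "/fr-fr/addressbook*", "/fr-fr/search*", "/de-de/", "/de-de/*",
]


def pattern_to_regex(pattern: str) -> re.Pattern:
    """Compile a robots.txt path pattern: '*' -> any characters, trailing '$' -> end."""
    anchored = pattern.endswith("$")
    body = pattern[:-1] if anchored else pattern
    regex = ".*".join(re.escape(part) for part in body.split("*"))
    return re.compile(regex + (r"\Z" if anchored else ""))


def is_disallowed(path_and_query: str, patterns=STAR_GROUP_DISALLOW) -> bool:
    """True if any pattern matches from the start of the path (case-sensitive).

    The group has no Allow rules, so longest-match precedence reduces to 'any match'.
    """
    return any(pattern_to_regex(p).match(path_and_query) for p in patterns)


if __name__ == "__main__":
    # Hypothetical paths, for illustration only.
    for path in ["/fr-fr/cart", "/fr-fr/champagnes?famille=blanc", "/fr-fr/", "/de-de/", "/fr-int/"]:
        print(path, "->", "blocked" if is_disallowed(path) else "allowed")
```

Because matching is by prefix, a trailing `*` adds nothing. `/fr-fr/checkout*` already covers everything `/fr-fr/checkoutlogin*` covers, and `/de-de/` already covers `/de-de/*`.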

cazoodlebot

Rule	Path
Disallow	/

mj12bot

Rule	Path
Disallow	/

dotbot/1.0

Rule	Path
Disallow	/

gigabot

Rule	Path
Disallow	/
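
Each of the four named groups contains a single `Disallow: /`. Under RFC 9309, a crawler that finds a group naming it obeys only that group, and falls back to `*` only when no group names it. Since `/` is a prefix of every path, these crawlers are shut out of the whole site. The sketch below illustrates that selection. It assumes a case-insensitive comparison on the name before any `/version` suffix (so `dotbot/1.0` matches a crawler called `dotbot`); individual crawlers may compare differently. The helper names are hypothetical.

```python
# User-agent lines and their Disallow patterns, from the groups above.
# The "*" list is abbreviated here; the full list is in the "*" table.
GROUPS = {
    "*": ["/fr-fr/cart*", "/de-de/"],
    "cazoodlebot": ["/"],
    "mj12bot": ["/"],
    "dotbot/1.0": ["/"],
    "gigabot": ["/"],
}


def product_token(agent: str) -> str:
    """Lower-cased name with any '/version' suffix dropped (assumed matching rule)."""
    return agent.split("/", 1)[0].strip().lower()


def select_group(crawler: str, groups: dict = GROUPS) -> str:
    """Return the user-agent line whose rules the crawler follows.

    A group naming the crawler wins outright; '*' applies only when none does.
    Rules from '*' are not merged into a named group.
    """
    wanted = product_token(crawler)
    for agent in groups:
        if agent != "*" and product_token(agent) == wanted:
            return agent
    return "*"


def blocks_whole_site(crawler: str, groups: dict = GROUPS) -> bool:
    # "Disallow: /" is a prefix of every path, so it blocks the entire site.
    return "/" in groups[select_group(crawler, groups)]


if __name__ == "__main__":
    for name in ["MJ12bot", "dotbot", "Gigabot/2.0", "Googlebot"]:  # sample crawler names
        print(name, "->", select_group(name), blocks_whole_site(name))
```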

Other Records

Field	Value
sitemap	https://www.ruinart.com/fr-fr/sitemap_index.xml
sitemap	https://www.ruinart.com/fr-int/sitemap_index.xml
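
Sitemap records are independent of any user-agent group, so their position in the file does not affect which crawlers see them. The sketch below reads them with the standard library's `RobotFileParser.site_maps()` (Python 3.8+). The input is a short reconstructed fragment: only the two Sitemap URLs come from the scan. The surrounding lines and their order are assumptions, because the scan does not record line placement.

```python
from urllib.robotparser import RobotFileParser

# Reconstructed fragment; the Sitemap URLs are from the scan, the rest is illustrative.
lines = [
    "User-agent: *",
    "Disallow: /de-de/",
    "",
    "Sitemap: https://www.ruinart.com/fr-fr/sitemap_index.xml",
    "Sitemap: https://www.ruinart.com/fr-int/sitemap_index.xml",
]

parser = RobotFileParser()
parser.parse(lines)
# Returns a list of URLs, or None when the file has no Sitemap lines.
sitemaps = parser.site_maps()
print(sitemaps)
```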

Comments

UPLOAD 20032023 TO BE PUBLISH AS https://www.ruinart.com/robots.txt
CHANGE DONE 20032023 : SWITCH TO SFCC
For all robots
Block access to specific groups of pages
FRENCH RUINART crawl
DE-DE RUINART crawl
Allow search crawlers to discover the sitemap
Block CazoodleBot as it does not present correct accept content headers
Block MJ12bot as it is just noise
Block dotbot as it cannot parse base urls properly
Block Gigabot
