criteo.com
robots.txt

Robots Exclusion Standard data for criteo.com

Resource Scan

Scan Details

Site Domain criteo.com
Base Domain criteo.com
Scan Status Ok
Last Scan 2024-05-02T16:09:25+00:00
Next Scan 2024-05-16T16:09:25+00:00

Last Scan

Scanned 2024-05-02T16:09:25+00:00
URL https://criteo.com/robots.txt
Domain IPs 23.185.0.4, 2620:12a:8000::4, 2620:12a:8001::4
Response IP 23.185.0.4
Found Yes
Hash 2ab5b3faef6896fdf92323dbd0af816eb7acf9decdeb4fe97cfb3dc8233539d1
SimHash c844c820bde9
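
The report does not name the algorithm behind the Hash value. Its 64 hexadecimal characters are consistent with a SHA-256 digest, so the sketch below assumes SHA-256 over the raw response body. The function name `content_hash` is hypothetical, and the result is not guaranteed to reproduce the value listed above.

```python
import hashlib


def content_hash(body: bytes) -> str:
    """Return a hex digest of a robots.txt body.

    Assumption: the scanner hashes the raw bytes with SHA-256.
    The report itself does not confirm this.
    """
    return hashlib.sha256(body).hexdigest()


if __name__ == "__main__":
    sample = b"User-agent: *\nDisallow: /wp-json/\n"
    # A changed digest between two scans would indicate the file changed.
    print(content_hash(sample))
```

Comparing digests from successive scans is a cheap way to detect that the file changed at all; a similarity hash such as the SimHash above would be needed to judge how much it changed.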

Groups

*

Rule Path
Disallow /*?industry=
Disallow /*?region=
Disallow /*?business=
Disallow /*?solution=
Disallow /*sortby%3D
Disallow /*?action=
Disallow /*?year=
Disallow /privacy/embed/
Disallow */media_category
Disallow */email-preference-center/
Disallow */process/*
Disallow /wp-json/
Disallow */criteo2017/fonts/*
Allow /*.css$
Allow /*.js$
Disallow /trackback
Disallow /*trackback
Disallow /*trackback*
Disallow /*/trackback
Allow /feed/$
Disallow /feed/
Disallow /comments/feed/
Disallow /*/feed/$
Disallow /*/feed/rss/$
Disallow /*/trackback/$
Disallow /*/*/feed/$
Disallow /*/*/feed/rss/$
Disallow /*/*/trackback/$
Disallow /*/*/*/feed/$
Disallow /*/*/*/feed/rss/$
Disallow /*/*/*/trackback/$
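
The rules above rely on two special characters: `*` matches any run of characters and a trailing `$` anchors the pattern to the end of the path. The sketch below shows one way to evaluate such rules, assuming the longest-match precedence described in RFC 9309 (the longest matching pattern wins, and Allow wins a tie). The helper names `pattern_to_regex` and `is_allowed` are hypothetical, and percent-encoding normalization is deliberately left out.

```python
import re


def pattern_to_regex(pattern: str) -> "re.Pattern[str]":
    """Translate a robots.txt path pattern into a compiled regex."""
    anchored = pattern.endswith("$")
    if anchored:
        pattern = pattern[:-1]
    # Escape literal pieces and join them with ".*" for each "*" wildcard.
    body = ".*".join(re.escape(part) for part in pattern.split("*"))
    return re.compile(body + (r"\Z" if anchored else ""))


def is_allowed(rules, path: str) -> bool:
    """Apply longest-match precedence; a path with no matching rule is allowed."""
    best = None  # (pattern length, is_allow); Allow wins ties because True > False
    for verb, pattern in rules:
        if pattern_to_regex(pattern).match(path):
            candidate = (len(pattern), verb == "Allow")
            if best is None or candidate > best:
                best = candidate
    return True if best is None else best[1]


# A few rules copied from the "*" group above.
RULES = [
    ("Allow", "/feed/$"),
    ("Disallow", "/feed/"),
    ("Disallow", "/*/feed/$"),
    ("Disallow", "/*?region="),
    ("Disallow", "/wp-json/"),
    ("Allow", "/*.css$"),
]

if __name__ == "__main__":
    for path in ["/feed/", "/feed/atom/", "/blog/feed/", "/blog/?region=emea", "/about/"]:
        print(path, is_allowed(RULES, path))
```

Under these assumptions the main feed at `/feed/` stays crawlable because `Allow /feed/$` is one character longer than `Disallow /feed/`, while anything beneath `/feed/` is blocked. Real crawlers may differ in tie-breaking and encoding details.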

msiecrawler
webcopier
httrack
microsoft.url.control
libwww
nuclei
wikido
riddler
petalbot
zoominfobot
go-http-client
node/simplecrawler
cazoodlebot
dotbot/1.0
gigabot
barkrowler
blexbot
magpie-crawler

Rule Path
Disallow /
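
The second group applies a blanket `Disallow /` to the crawlers listed by name, while every other crawler falls through to the `*` group. A minimal sketch of that group selection using Python's standard `urllib.robotparser` follows. The excerpt is a hand-written reduction of the groups above (two of the listed agents, one rule from the `*` group), not the full file, and the helper `may_fetch` is hypothetical. Note that `urllib.robotparser` does not interpret `*` or `$` inside paths, so only plain prefix rules are used here.

```python
from urllib.robotparser import RobotFileParser

# Reduced excerpt reconstructed from the groups shown above (assumption:
# the original file uses standard "User-agent:" / "Disallow:" lines).
ROBOTS_EXCERPT = """\
User-agent: *
Disallow: /wp-json/

User-agent: petalbot
User-agent: blexbot
Disallow: /
"""

parser = RobotFileParser()
parser.parse(ROBOTS_EXCERPT.splitlines())


def may_fetch(agent: str, path: str) -> bool:
    """Return True if the named crawler may fetch the path under the excerpt."""
    return parser.can_fetch(agent, path)


if __name__ == "__main__":
    for agent in ("PetalBot", "BLEXBot", "ExampleBot"):
        print(agent, may_fetch(agent, "/about/"))
```

A named crawler is matched against its own group first, so the rules in the `*` group never apply to it; that is why a single `Disallow /` is enough to exclude the listed bots entirely.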

Other Records

Field Value
sitemap https://www.criteo.com/sitemap_index.xml
sitemap https://www.criteo.com/de/sitemap_index.xml
sitemap https://www.criteo.com/fr/sitemap_index.xml
sitemap https://www.criteo.com/br/sitemap_index.xml
sitemap https://www.criteo.com/jp/sitemap_index.xml
sitemap https://www.criteo.com/kr/sitemap_index.xml
sitemap https://www.criteo.com/es/sitemap_index.xml
sitemap https://www.criteo.com/it/sitemap_index.xml
sitemap https://www.criteo.com/ru/sitemap_index.xml
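
Sitemap records are independent of any user-agent group, which is why the report lists them separately. A small sketch of how such records could be pulled out of raw robots.txt text is shown below; the function name `extract_sitemaps` is hypothetical, and the sample text is a shortened stand-in for the real file.

```python
def extract_sitemaps(robots_text: str) -> list:
    """Collect the values of all Sitemap records, in file order."""
    urls = []
    for line in robots_text.splitlines():
        # Drop trailing comments, then split the field name from its value.
        line = line.split("#", 1)[0].strip()
        key, sep, value = line.partition(":")
        if sep and key.strip().lower() == "sitemap":
            urls.append(value.strip())
    return urls


if __name__ == "__main__":
    sample = (
        "User-agent: *\n"
        "Disallow: /wp-json/\n"
        "# Sitemap\n"
        "Sitemap: https://www.criteo.com/sitemap_index.xml\n"
        "Sitemap: https://www.criteo.com/de/sitemap_index.xml\n"
    )
    for url in extract_sitemaps(sample):
        print(url)
```

Splitting on the first colon only keeps the `https://` part of each URL intact. On Python 3.8 and later, `urllib.robotparser.RobotFileParser.site_maps()` offers a comparable result from the standard library.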

Comments

  • Dynamic URLs
  • Paths & URLs
  • -----------------
  • Prevents issues with GWT
  • -----------------
  • Trackbacks
  • -----------------
  • Block feeds for crawlers
  • -----------------
  • Sitemap
  • -----------------
  • Bots list to exclude
  • --------------------------------