Robots Exclusion Standard data for gousa.cn

Resource Scan

Scan Details

Site Domain gousa.cn
Base Domain gousa.cn
Scan Status Ok
Last Scan 2024-10-02T17:11:42+00:00
Next Scan 2024-10-09T17:11:42+00:00
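The two timestamps above are exactly seven days apart, which suggests (but does not confirm) a weekly rescan interval. A minimal Python sketch of that arithmetic follows. `RESCAN_INTERVAL` and `next_scan` are hypothetical names, and the fixed 7-day interval is an assumption inferred only from these two values.

```python
from datetime import datetime, timedelta

# Assumption: a fixed weekly rescan, inferred only from the two timestamps above.
RESCAN_INTERVAL = timedelta(days=7)

def next_scan(last_scan_iso: str) -> str:
    """Return the next scan time as an ISO 8601 string, preserving the UTC offset."""
    last = datetime.fromisoformat(last_scan_iso)
    return (last + RESCAN_INTERVAL).isoformat()

print(next_scan("2024-10-02T17:11:42+00:00"))  # 2024-10-09T17:11:42+00:00
```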

Last Scan

Scanned 2024-10-02T17:11:42+00:00
URL http://gousa.cn/robots.txt
Redirect http://www.gousa.cn/robots.txt
Redirect Domain www.gousa.cn
Redirect Base gousa.cn
Domain IPs 47.100.106.209
Redirect IPs 180.101.203.202, 180.101.203.203, 180.101.203.220, 180.101.203.221, 180.101.203.234, 58.218.215.160, 58.218.215.161, 58.218.215.162, 58.218.215.163, 58.218.215.164, 61.160.192.115, 61.160.192.116, 61.160.192.117, 61.160.192.118, 61.160.192.119
Response IP 180.101.203.230
Found Yes
Hash 2c8ac22c8eaa574069d61ff2825847d6c6217d2cf2bfbbfdcf0a1e8980658253
SimHash 3c16bd5bc740
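The Hash value is 64 hexadecimal digits, which is consistent with a SHA-256 digest. The scanner does not say which algorithm it uses, or which bytes (raw body or normalized text) it hashes. The sketch below assumes SHA-256 over the raw response body and shows how such a value could detect changes between scans. `content_hash` and `has_changed` are hypothetical helpers, and the sketch does not attempt to reproduce the value above.

```python
import hashlib

def content_hash(body: bytes) -> str:
    """Hex SHA-256 digest of a robots.txt body (assumed algorithm, 64 hex digits)."""
    return hashlib.sha256(body).hexdigest()

def has_changed(body: bytes, previous_hash: str) -> bool:
    """True if the newly fetched body differs from the one that produced previous_hash."""
    # Compare against lower-case hex in case the stored digest was upper-cased.
    return content_hash(body) != previous_hash.lower()

old = content_hash(b"User-agent: *\nDisallow: /admin/\n")
print(has_changed(b"User-agent: *\nDisallow: /admin/\n", old))   # False
print(has_changed(b"User-agent: *\nDisallow: /search/\n", old))  # True
```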

Groups

User-agent *

Rule Path
Allow /core/*.css$
Allow /core/*.css?
Allow /core/*.js$
Allow /core/*.js?
Allow /core/*.gif
Allow /core/*.jpg
Allow /core/*.jpeg
Allow /core/*.png
Allow /core/*.svg
Allow /profiles/*.css$
Allow /profiles/*.css?
Allow /profiles/*.js$
Allow /profiles/*.js?
Allow /profiles/*.gif
Allow /profiles/*.jpg
Allow /profiles/*.jpeg
Allow /profiles/*.png
Allow /profiles/*.svg
Disallow /core/
Disallow /profiles/
Disallow /README.txt
Disallow /web.config
Disallow /admin/
Disallow /comment/reply/
Disallow /filter/tips/
Disallow /node/add/*
Disallow /search/
Disallow /taxonomy/
Disallow /user/register/
Disallow /user/password/
Disallow /user/login/
Disallow /user/logout/
Disallow /index.php/admin/
Disallow /index.php/comment/reply/
Disallow /index.php/filter/tips/
Disallow /index.php/node/add/
Disallow /index.php/search/
Disallow /index.php/user/password/
Disallow /index.php/user/register/
Disallow /index.php/user/login/
Disallow /index.php/user/logout/
Disallow /api/tripadvisor-attractions/*
Disallow /taxonomy/*
Disallow /search*
Disallow /cdn-cgi/
Disallow /subtopic/
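The Allow entries for `/core/` and `/profiles/` are more specific than the broad Disallow entries listed after them. For crawlers that follow RFC 9309, the order of lines therefore does not decide the outcome: the most specific matching rule wins, and Allow wins a tie. The sketch below applies that rule to a subset of this group's entries. It measures specificity by the length of the rule's path pattern, as Google documents it. It treats `*` as "any run of characters" and a trailing `$` as end-of-path. `RULES`, `pattern_to_regex` and `is_allowed` are hypothetical names, not part of any crawler's API.

```python
import re

# A subset of the "*" group's rules from the table above (not the full list).
RULES = [
    ("allow", "/core/*.css$"),
    ("allow", "/core/*.css?"),
    ("allow", "/core/*.js$"),
    ("allow", "/core/*.js?"),
    ("allow", "/core/*.png"),
    ("disallow", "/core/"),
    ("disallow", "/admin/"),
    ("disallow", "/search*"),
    ("disallow", "/index.php/admin/"),
]

def pattern_to_regex(pattern: str) -> "re.Pattern[str]":
    """Translate a robots.txt path pattern into a regex matched from the path start.

    '*' matches any run of characters; a trailing '$' anchors the end of the path.
    Everything else, including '?', is matched literally.
    """
    anchored = pattern.endswith("$")
    if anchored:
        pattern = pattern[:-1]
    body = ".*".join(re.escape(part) for part in pattern.split("*"))
    return re.compile(body + ("$" if anchored else ""))

def is_allowed(path: str, rules=RULES) -> bool:
    """Longest matching pattern wins; on equal length, Allow wins."""
    best_len, allowed = -1, True  # no matching rule means the path is allowed
    for kind, pattern in rules:
        if pattern_to_regex(pattern).match(path):  # re.match anchors at the start
            length = len(pattern)
            if length > best_len or (length == best_len and kind == "allow"):
                best_len, allowed = length, kind == "allow"
    return allowed

print(is_allowed("/core/misc/drupal.js?v=9"))  # True: "/core/*.js?" beats "/core/"
print(is_allowed("/core/install.php"))         # False: only "/core/" matches
```

Because `?` is not special in robots.txt, `/core/*.js?` only matches paths where `.js` is followed by a literal `?`. This is why the file pairs each `$`-anchored pattern with a `?` variant: together they cover asset URLs with and without a query string.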

Other Records

Field Value
sitemap https://www.gousa.cn/sitemap.xml
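Sitemap records are independent of any user-agent group. Python's standard-library `urllib.robotparser` (3.8+) collects them through `site_maps()`. The excerpt parsed here is an abbreviated reconstruction containing one rule and the Sitemap line, not the full original file. The same module only does prefix matching and does not understand the `*` and `$` wildcards used in the rules above, so the sketch uses it solely for the Sitemap record.

```python
from urllib.robotparser import RobotFileParser

# Abbreviated, reconstructed excerpt: one rule plus the Sitemap record.
ROBOTS_EXCERPT = """\
User-agent: *
Disallow: /admin/
Sitemap: https://www.gousa.cn/sitemap.xml
"""

parser = RobotFileParser()
parser.parse(ROBOTS_EXCERPT.splitlines())
# site_maps() returns a list of sitemap URLs, or None if the file declares none.
print(parser.site_maps())  # ['https://www.gousa.cn/sitemap.xml']
```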

Comments

  • robots.txt
  • This file is to prevent the crawling and indexing of certain parts
  • of your site by web crawlers and spiders run by sites like Yahoo!
  • and Google. By telling these "robots" where not to go on your site,
  • you save bandwidth and server resources.
  • This file will be ignored unless it is at the root of your host:
  • Used: http://example.com/robots.txt
  • Ignored: http://example.com/site/robots.txt
  • For more information about the robots.txt standard, see:
  • http://www.robotstxt.org/robotstxt.html
  • CSS, JS, Images
  • Directories
  • Files
  • Paths (clean URLs)
  • Paths (no clean URLs)
  • Custom
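The placement rule in the comments can be expressed directly. The governing robots.txt always sits at the path `/robots.txt`, on the same scheme and host (including any port) as the page. Here is a minimal sketch with a hypothetical `robots_url` helper:

```python
from urllib.parse import urlsplit, urlunsplit

def robots_url(page_url: str) -> str:
    """Return the robots.txt URL that governs page_url: same scheme and host, root path."""
    parts = urlsplit(page_url)
    # Drop the page's path, query and fragment; keep the scheme and netloc
    # (normally the host plus an optional port).
    return urlunsplit((parts.scheme, parts.netloc, "/robots.txt", "", ""))

print(robots_url("http://example.com/site/page.html"))  # http://example.com/robots.txt
print(robots_url("http://gousa.cn/"))                    # http://gousa.cn/robots.txt
print(robots_url("http://www.gousa.cn/"))                # http://www.gousa.cn/robots.txt
```

`gousa.cn` and `www.gousa.cn` are different hosts, so each has its own robots.txt location. In the scan above, the bare-domain file redirects to the `www` one.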