ipcinfo.org
robots.txt

Robots Exclusion Standard data for ipcinfo.org

Resource Scan

Scan Details

Site Domain ipcinfo.org
Base Domain ipcinfo.org
Scan Status Failed
Failure Stage Fetching resource.
Failure Reason Server returned a client error.
Last Scan 2024-08-17T17:59:46+00:00
Next Scan 2024-10-16T17:59:46+00:00

Last Successful Scan

Scanned 2022-08-16T10:57:52+00:00
URL https://www.ipcinfo.org/robots.txt
Response IP 104.18.29.90
Found Yes
Hash 20f13c9944719c8fc9db5d7459b0cfe7e50d08d1bc50a63de6f89ee761e4ac15
SimHash b3021d5abcf8
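
The Hash value above is 64 hexadecimal digits, which is the length of a SHA-256 digest. The report does not state the algorithm or exactly which bytes were hashed, so the sketch below assumes SHA-256 over the raw response body. The helper names `sha256_hex` and `is_unchanged` are hypothetical and only illustrate how a stored digest could be used to detect a changed robots.txt.

```python
import hashlib


def sha256_hex(body: bytes) -> str:
    """Return the SHA-256 digest of a response body as lowercase hex."""
    return hashlib.sha256(body).hexdigest()


def is_unchanged(body: bytes, recorded_hash: str) -> bool:
    """Compare a freshly fetched body against a previously recorded digest.

    Assumption: the recorded digest is SHA-256 over the raw bytes.
    """
    return sha256_hex(body) == recorded_hash.lower()


# Value copied from the "Hash" row of the last successful scan.
RECORDED_HASH = "20f13c9944719c8fc9db5d7459b0cfe7e50d08d1bc50a63de6f89ee761e4ac15"

# A placeholder body; it is not the real file, so it is reported as changed.
sample_body = b"User-agent: *\nDisallow: /typo3/\n"
print(is_unchanged(sample_body, sha256_hex(sample_body)))  # True
print(len(RECORDED_HASH))  # 64
```

An exact digest only tells whether the file changed at all; the SimHash row is a separate fingerprint and is not reproduced here.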

Groups

blp_bbot/0.1

Rule Path
Disallow /

blp_bbot

Rule Path
Disallow /

*

Rule      Path                          Comment
Disallow  /index.php                    -
Disallow  /t3lib/                       Nothing to see here
Disallow  /typo3/                       Nothing to see here
Disallow  /typo3conf/                   Nothing to see here
Disallow  /typo3temp/                   Nothing to see here
Disallow  /appses/esc/escb/user/        Obsolete site 4/12/2012
Disallow  /tc/tca/                      Obsolete site 4/12/2012
Disallow  /act-network/                 Obsolete site 4/12/2012
Disallow  /countryProfiles/             Invalid Alias 4/12/2012
Disallow  /landandwater/                Obsolete site 4/12/2012
Disallow  /sd/researchinstitutions/     Large source of errors 5/12/2012 - doesnt crawl
Disallow  /services/                    IT Service Page
Disallow  /figis/flod/worms/            issues with Linked Open Data Pages 6/1/2013
Disallow  /alc/                         duplicating content at www.rlc.fao.org
Disallow  /pwb/2000/                    application currently givin errors (25/04/2014 - nw)
Disallow  /pwb/2002/                    application currently givin errors (28/03/2014 - nw)
Disallow  /pwb/2004/                    application currently givin errors (28/03/2014 - nw)
Disallow  /pwb/2005/                    application currently givin errors (28/03/2014 - nw)
Disallow  /pwb/2006/                    application currently givin errors (28/03/2014 - nw)
Disallow  /Participation/               old site generating a lot of crawl errors 18/06/2013
Disallow  /geonetwork/                  Requested by KV-CIO 10/Oct/2013
Disallow  /*?id=*                       Disable non-realurl - re-instated 10/Oct/2013
Disallow  /*%26type%3D98                - specified in Google webmaster tools for the Google exclusion - re-instated 10/Oct/2013
Disallow  /fileadmin/user_upload/hlpe/  -
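
As an illustration of how the groups above would be applied, the sketch below feeds a short excerpt of the rules to Python's standard `urllib.robotparser` and checks a few URLs. The excerpt is rebuilt from the tables in this report, not fetched from the live site, and `ExampleBot`, `ROBOTS_EXCERPT`, `build_parser` and `rule_matches` are hypothetical names. The standard-library parser only does prefix matching, so wildcard rules such as `/*?id=*` are handled here by a small separate helper that assumes the common convention of `*` matching any run of characters and a trailing `$` anchoring the end of the path.

```python
import re
from urllib.robotparser import RobotFileParser

# Excerpt reconstructed from the Groups tables; not the complete file.
ROBOTS_EXCERPT = """\
User-agent: blp_bbot
Disallow: /

User-agent: *
Disallow: /index.php
Disallow: /typo3/
Disallow: /services/
"""


def build_parser(text: str) -> RobotFileParser:
    """Parse robots.txt text without making a network request."""
    parser = RobotFileParser()
    parser.parse(text.splitlines())
    return parser


def rule_matches(pattern: str, path: str) -> bool:
    """Match a path against a rule that may contain '*' and a trailing '$'.

    Assumption: '*' matches any sequence of characters and '$' anchors the
    end of the path; everything else is compared literally from the start.
    """
    anchored = pattern.endswith("$")
    if anchored:
        pattern = pattern[:-1]
    regex = ".*".join(re.escape(part) for part in pattern.split("*"))
    if anchored:
        regex += "$"
    return re.match(regex, path) is not None


parser = build_parser(ROBOTS_EXCERPT)

# The blp_bbot group disallows everything.
print(parser.can_fetch("blp_bbot", "https://www.ipcinfo.org/"))          # False
# Other agents fall through to the '*' group.
print(parser.can_fetch("ExampleBot", "https://www.ipcinfo.org/typo3/"))  # False
print(parser.can_fetch("ExampleBot", "https://www.ipcinfo.org/about/"))  # True

# Wildcard rule from the '*' group, checked with the helper.
print(rule_matches("/*?id=*", "/page?id=5"))  # True
print(rule_matches("/*?id=*", "/page"))       # False
```

Crawlers differ in how they resolve wildcards and rule precedence, so the helper is a rough guide to intent and not a statement of how any particular bot reads this file.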

Comments

  • robots.txt for other domains than http://www.fao.org/
  • This file is not for hiding content from people. It is no substitue for security
  • If you are editing the robots.txt file - please COMMENT and DATE reason for every inclusion/exclusion ---nw-OCC-2013
  • User-agent: 008 # No longer relevant 25/10/2013 nw
  • Disallow: /
  • User-Agent: cdlwas_bot # No longer relevant 25/10/2013 nw
  • Disallow: # No longer relevant 25/10/2013 nw
  • Some bots are known to be trouble.