ldtalentwork.com
robots.txt

Robots Exclusion Standard data for ldtalentwork.com

Resource Scan

Scan Details

Site Domain ldtalentwork.com
Base Domain ldtalentwork.com
Scan Status Ok
Last Scan 2024-06-14T13:36:25+00:00
Next Scan 2024-07-14T13:36:25+00:00

Last Scan

Scanned 2024-06-14T13:36:25+00:00
URL https://www.ldtalentwork.com/robots.txt
Domain IPs 52.11.62.193, 54.68.157.221
Response IP 54.68.157.221
Found Yes
Hash b97e19a8ad47b76e2177c1326c0c0ac203fa6a61a45890a625c608e2912ed3e8
SimHash 8e7644e3569a
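
The report does not say which algorithm produced the Hash value. Its length (64 hexadecimal digits) is consistent with SHA-256, so the sketch below assumes SHA-256 over the raw response body. The function names and the sample body are hypothetical and only illustrate how a fetched robots.txt could be compared against a recorded digest to detect changes between scans.

```python
import hashlib


def robots_digest(body: bytes) -> str:
    """Return the SHA-256 hex digest of a robots.txt body (assumed algorithm)."""
    return hashlib.sha256(body).hexdigest()


def is_unchanged(body: bytes, recorded_hash: str) -> bool:
    """Compare a freshly fetched body against a previously recorded digest."""
    return robots_digest(body) == recorded_hash.lower()


# Hypothetical sample body, not the real file from the scan.
sample_body = b"User-agent: *\nDisallow: /private/\n"
recorded = robots_digest(sample_body)

print(len(recorded))                        # 64 hex digits
print(is_unchanged(sample_body, recorded))  # True
print(is_unchanged(sample_body + b"\n", recorded))  # False
```

Any byte-level difference (including trailing whitespace) changes the digest, which is why a scanner might also keep a similarity hash such as the SimHash field next to the exact hash.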

Groups

*

Rule Path
Disallow /private/
Disallow /junk/
Disallow *?*crt*
Disallow /admin*
Disallow /add_skills*
Disallow /dashboard*
Disallow /reference*
Disallow /unsubscribe*
Disallow /subscribe*
Disallow /events*
Disallow /history*
Disallow /client/download_worksessions*
Disallow /client/matches*
Disallow /client/hiring*
Disallow /client/reject*
Disallow /client/worksessions*
Disallow /client/workhistory*
Disallow /client/checkinhrs*
Disallow /client/engineer_request*
Disallow /client/checkinhrs*
Disallow /client/disapprove_worksession*
Disallow /client/approve_worksession*
Disallow /client/testimonial*
Disallow /client/hirestats*
Disallow /freelancer/resume*
Disallow /freelancer/acceptance*
Disallow /freelancer/search_projects*
Disallow /freelancer/matches*
Disallow /freelancer/worksessions*
Disallow /freelancer/updateskills*
Disallow /freelancer/requests*
Disallow /subscribe-email
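
Several of the rules above use the `*` wildcard, which Python's standard `urllib.robotparser` does not interpret. The following is a minimal sketch of wildcard matching in the style described by RFC 9309, assuming a group that contains only Disallow rules (as this one does), so Allow precedence and longest-match ordering are not handled. The function names and the test paths are hypothetical.

```python
import re


def rule_to_regex(rule_path: str) -> re.Pattern:
    """Translate a robots.txt path pattern into a compiled regex.

    '*' matches any run of characters, a trailing '$' anchors the end,
    and everything else is treated as a literal prefix.
    """
    anchored = rule_path.endswith("$")
    if anchored:
        rule_path = rule_path[:-1]
    # Escape the literal pieces and rejoin them with '.*' for each wildcard.
    pattern = ".*".join(re.escape(piece) for piece in rule_path.split("*"))
    if anchored:
        pattern += "$"
    return re.compile(pattern)


def is_disallowed(url_path: str, disallow_rules: list[str]) -> bool:
    """Return True if any Disallow pattern matches from the start of the path."""
    return any(rule_to_regex(rule).match(url_path) for rule in disallow_rules)


# A subset of the rules listed for the '*' group.
RULES = ["/private/", "/admin*", "*?*crt*", "/client/matches*", "/subscribe-email"]

print(is_disallowed("/admin/users", RULES))        # True
print(is_disallowed("/jobs?crt=1", RULES))         # True
print(is_disallowed("/client/matches/42", RULES))  # True
print(is_disallowed("/private", RULES))            # False (rule requires the trailing slash)
print(is_disallowed("/jobs", RULES))               # False
```

Because robots.txt rules are prefix matches, a trailing `*` as in `/admin*` matches the same paths as `/admin` would.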

a6-indexer

Rule Path
Disallow /

alphaseobot

Rule Path
Disallow /

alphaseobot-sa

Rule Path
Disallow /

aspiegelbot

Rule Path
Disallow /

petalbot

Rule Path
Disallow /

barkrowler

Rule Path
Disallow /

blackboard safeassign

Rule Path
Disallow /

blexbot

Rule Path
Disallow /

bytespider

Rule Path
Disallow /

crawler4j

Rule Path
Disallow /

dataforseobot

Rule Path
Disallow /

gigabot

Rule Path
Disallow /

liebaofast

Rule Path
Disallow /

mauibot

Rule Path
Disallow /

mauibot (crawler.feedback+wc@gmail.com)

Rule Path
Disallow /

megaindex.ru/2.0

Rule Path
Disallow /

mqqbrowser

Rule Path
Disallow /

nimbostratus-bot/v1.3.2

Rule Path
Disallow /

qwant-news

Rule Path
Disallow /

qwantify

Rule Path
Disallow /

seekport crawler

Rule Path
Disallow /

seznambot

Rule Path
Disallow /

sputnikbot/2.3

Rule Path
Disallow /

the knowledge ai

Rule Path
Disallow /

timpibot/0.8

Rule Path
Disallow /

tinytestbot

Rule Path
Disallow /

ucbrowser

Rule Path
Disallow /

yacybot

Rule Path
Disallow /

yandexbot

Rule Path
Disallow /

yandexbot/3.0

Rule Path
Disallow /

yeti

Rule Path
Disallow /

yisouspider

Rule Path
Disallow /

zoominfobot

Rule Path
Disallow /

slackbot

Rule Path
Disallow /

youbot

Rule Path
Disallow /

ahrefsbot

Rule Path
Disallow /

applewebkit

Rule Path
Disallow /

dotbot

Rule Path
Disallow /

semrushbot

Rule Path
Disallow /
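
The groups above can be read as a lookup: a crawler uses the group named for its own product token and falls back to the `*` group otherwise. The sketch below illustrates that selection with case-insensitive exact matching on the token. It assumes the input is the crawler's product token with an optional version suffix (not a full browser-style User-Agent header), and the helper names and the shortened `GROUPS` table are hypothetical.

```python
def product_token(user_agent: str) -> str:
    """Reduce a string such as 'AhrefsBot/7.0' to the lowercase token 'ahrefsbot'."""
    return user_agent.split("/", 1)[0].strip().lower()


def select_rules(user_agent: str, groups: dict[str, list[str]]) -> list[str]:
    """Return the Disallow paths of the matching group, else those of '*'."""
    token = product_token(user_agent)
    for name, rules in groups.items():
        if name != "*" and name.lower() == token:
            return rules
    return groups.get("*", [])


# A shortened version of the groups listed in this report.
GROUPS = {
    "*": ["/private/", "/junk/", "/admin*"],
    "ahrefsbot": ["/"],
    "semrushbot": ["/"],
    "petalbot": ["/"],
}

print(select_rules("AhrefsBot/7.0", GROUPS))  # ['/']
print(select_rules("Googlebot/2.1", GROUPS))  # ['/private/', '/junk/', '/admin*']
```

Real crawlers differ in how loosely they match group names (some use substring matching), so this exact-match behavior is one possible interpretation rather than a description of any particular bot.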

Comments

  • Block bots - taken from https://www.sil.org/robots.txt, https://user-agents.net/bots
  • Keep Applebot
  • RDH, 03/11/22:
  • Comment this out for JOT, who applied for a Crossref Similarity Check account with Turnitin;
  • User-agent: TurnitinBot
  • Disallow: /
  • User-agent: Mozilla/5.0
  • Disallow: /
  • commenting out since some online sources suggest this bot is owned by or partnered with Google
  • basically we blocked all the bots in /var/log/nginx/access.log except the google bot to improve site speed