104.com.tw
robots.txt

Robots Exclusion Standard data for 104.com.tw

Resource Scan

Scan Details

Site Domain 104.com.tw
Base Domain 104.com.tw
Scan Status Ok
Last Scan 2024-11-02T20:05:37+00:00
Next Scan 2024-11-16T20:05:37+00:00

Last Scan

Scanned 2024-11-02T20:05:37+00:00
URL https://104.com.tw/robots.txt
Redirect https://www.104.com.tw/robots.txt
Redirect Domain www.104.com.tw
Redirect Base 104.com.tw
Domain IPs 104.18.9.226
Redirect IPs 104.18.8.226, 104.18.9.226
Response IP 104.18.9.226
Found Yes
Hash da56bb67f1ee7c90c98678848db769e53afbe2715096b1e2900d68d7fd44c169
SimHash 9a425512cb93

Groups

*

Rule Path
Allow /$
Allow /job/
Allow /company/
Allow /jobs/
Allow /expats/
Allow /mobile/
Allow /faq/
Allow /feedback/
Allow /csr/
Allow /favicon.ico
Allow /topic/
Disallow *keyword%3D*xyz*
Disallow *keyword%3D*telegram*
Disallow *keyword%3D*tg%3A*
Disallow *keyword%3D*.*
Disallow *keyword%3D*%3A*
Disallow *keyword%3D*%3B*
Disallow *keyword%3D*%28*
Disallow *keyword%3D*%7C*
Disallow *keyword%3D*/*
Disallow *keyword%3D*~*
Disallow *keyword%3D*%40*
Disallow *keyword%3D*%C3%AF%C2%BC%C2%BB*
Disallow *excludeCompanyByCustno%3D*
Disallow *excludeIndustryCat%3D*
Disallow *expansionType%3D*
Disallow *remoteWork%3D*
Disallow *recommendJob%3D*
Disallow *langStatus%3D*
Disallow *langFlag%3D*
Disallow *hotJob%3D*
Disallow *kwop%3D*
Disallow *page%3D*
Disallow *mode%3D*
Disallow *asc%3D*
Disallow *irsTag%3D*
Disallow *ro%3D*
Disallow *utm_*
Disallow *%26/students/%3D*null%26*
Disallow /job/*apply%3Dform*
Disallow /job/apply/done/
Disallow *keyword%3D*%C3%A5
Disallow /

gptbot

Rule Path
Allow /$
Allow /jobs/
Allow /job/
Allow /faq/
Allow /expats/
Allow /feedback/
Allow /csr/
Disallow /

Other Records

Field Value
crawl-delay 10
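
How a crawler might evaluate the gptbot rules listed above can be sketched as follows. This is a minimal illustration, not the scanner's or any crawler's actual implementation. It assumes the common matching convention in which `*` matches any run of characters, a trailing `$` anchors the end of the path, the longest matching pattern wins, and Allow wins a tie. The names `pattern_to_regex`, `is_allowed`, and `GPTBOT_RULES` are hypothetical.

```python
import re

# Rules copied from the gptbot group above, as (kind, pattern) pairs.
GPTBOT_RULES = [
    ("allow", "/$"),
    ("allow", "/jobs/"),
    ("allow", "/job/"),
    ("allow", "/faq/"),
    ("allow", "/expats/"),
    ("allow", "/feedback/"),
    ("allow", "/csr/"),
    ("disallow", "/"),
]


def pattern_to_regex(pattern: str) -> re.Pattern:
    """Translate a robots.txt path pattern into a compiled regex.

    Assumption: '*' is a wildcard and a trailing '$' anchors the end.
    """
    anchored = pattern.endswith("$")
    if anchored:
        pattern = pattern[:-1]
    # Escape literal pieces and join them with '.*' for each wildcard.
    body = ".*".join(re.escape(part) for part in pattern.split("*"))
    return re.compile(body + (r"\Z" if anchored else ""))


def is_allowed(rules, path: str) -> bool:
    """Return True if `path` may be fetched under `rules`."""
    best = None  # (pattern length, is_allow) of the best match so far
    for kind, pattern in rules:
        # re.match anchors at the start of the path, as a prefix match.
        if pattern_to_regex(pattern).match(path):
            candidate = (len(pattern), kind == "allow")
            # Longer pattern wins; on equal length, allow (True) wins.
            if best is None or candidate > best:
                best = candidate
    # A path that matches no rule is allowed by default.
    return True if best is None else best[1]


if __name__ == "__main__":
    for path in ["/", "/jobs/search", "/job/123", "/company/abc"]:
        print(path, is_allowed(GPTBOT_RULES, path))
```

Under these assumptions the site root and paths beneath /jobs/ and /job/ are permitted, while /company/ falls through to the final Disallow / rule. The crawl-delay record is not modelled here.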

jobdiggerspider
cliqzbot
trovitbot

Rule Path
Disallow /

linkedinbot

Rule Path
Disallow /job/
Disallow /company/
Disallow /jobs/
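
Which of the groups above applies to a given crawler can be sketched as a lookup on the crawler's product token. This is a hedged illustration that assumes case-insensitive matching on the token with a fallback to the `*` group. The names `GROUPS` and `select_group` are hypothetical, and the `*` and gptbot rule lists are abbreviated for brevity.

```python
# Each entry pairs the user-agent names of a group with its rules.
GROUPS = [
    # Abbreviated: the full '*' group above has many more rules.
    (("*",), [("allow", "/$"), ("disallow", "/")]),
    # Abbreviated: see the gptbot group above for the full list.
    (("gptbot",), [("allow", "/$"), ("disallow", "/")]),
    (("jobdiggerspider", "cliqzbot", "trovitbot"), [("disallow", "/")]),
    (
        ("linkedinbot",),
        [("disallow", "/job/"), ("disallow", "/company/"), ("disallow", "/jobs/")],
    ),
]


def select_group(groups, product_token: str):
    """Return the rules for `product_token`, or the '*' rules as fallback.

    Assumption: the caller passes the bare product token (for example
    "LinkedInBot"), not a full User-Agent header string.
    """
    token = product_token.lower()
    fallback = None
    for agents, rules in groups:
        if token in agents:
            return rules
        if "*" in agents:
            fallback = rules
    # None is returned only when there is no '*' group at all.
    return fallback


if __name__ == "__main__":
    print(select_group(GROUPS, "LinkedInBot"))
    print(select_group(GROUPS, "SomeOtherBot"))
```

With this lookup, linkedinbot receives only its three Disallow rules, so paths outside /job/, /company/, and /jobs/ are not restricted for it, while an unlisted crawler falls back to the `*` group.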

Comments

  • Welcome to 104.com.tw's robots.txt!
  • We're flattered by your interest in our data, but we have some ground rules:
  • 1. No automated scraping of job listings for commercial purposes.
  • 2. Respect our users' privacy and our intellectual property.
  • 3. If you're a bot with good intentions, let's talk!
  • Email alliance@104.com.tw to apply for crawling permissions.
  • Remember: 104.com.tw is here to help people find great jobs and companies
  • find great talent. Let's work together to make that happen!
  • P.S. If you're a human reading this,
  • check out our career opportunities at https://www.104.com.tw/company/12v3o7uw#info06.
  • We're always looking for talented individuals to join our team.

Warnings

  • 23 invalid lines.