jump.mingpao.com
robots.txt

Robots Exclusion Standard data for jump.mingpao.com

Resource Scan

Scan Details

Site Domain jump.mingpao.com
Base Domain mingpao.com
Scan Status Ok
Last Scan 2024-06-04T06:00:20+00:00
Next Scan 2024-06-18T06:00:20+00:00

Last Scan

Scanned 2024-06-04T06:00:20+00:00
URL https://jump.mingpao.com/robots.txt
Domain IPs 202.80.6.28
Response IP 202.80.6.28
Found Yes
Hash 063379bb1fb2ad97c42423792483702d65c317e453b25f6743262cf2f4bf94a1
SimHash 245cf651c543

Groups

googlebot

Rule Path
Allow /

googlebot-image

Rule Path
Allow

googlebot-mobile

Rule Path
Allow /

mediapartners-google

Rule Path
Allow

bingbot

Rule Path
Allow /

msnbot

Rule Path
Allow /

alexa

Rule Path
Allow /

indeedbot

Rule Path
Disallow /

linkedinbot

Rule Path
Disallow /

jobdiggerspider

Rule Path
Disallow /

cliqzbot

Rule Path
Disallow /

slurp

Rule Path
Disallow /

baiduspider

Rule Path
Disallow /

yandexbot

Rule Path
Disallow /

teoma

Rule Path
Disallow /

fast-webcrawler

Rule Path
Disallow /

gurujibot

Rule Path
Disallow /

ia_archiver

Rule Path
Disallow /

exabot

Rule Path
Disallow /

soso spider

Rule Path
Disallow /

dotbot

Rule Path
Disallow

facebookexternalhit

Rule Path
Disallow /

duckduckbot

Rule Path
Disallow /

siteliner

Rule Path
Disallow /

curious george

Rule Path
Disallow /

*

Rule Path
Disallow /cgi-bin/
Disallow /tmp/
Disallow /htm/dummy/
Disallow /m/
Disallow */api/
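
The groups above can be checked against a user agent with a short sketch using Python's standard urllib.robotparser. The robots.txt text below is a hypothetical reconstruction of three of the listed groups, not the file as served; its casing and ordering are assumptions, and the path /example-page and the agent name examplebot are made up for illustration. The */api/ rule is left out because the standard-library parser matches rule paths by plain prefix and does not appear to interpret the * wildcard.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical reconstruction of three groups from the report;
# the real robots.txt may differ in casing, ordering, and comments.
ROBOTS_TXT = """\
User-agent: googlebot
Allow: /

User-agent: baiduspider
Disallow: /

User-agent: *
Disallow: /cgi-bin/
Disallow: /tmp/
Disallow: /htm/dummy/
Disallow: /m/
"""

BASE = "https://jump.mingpao.com"

# Parse the text directly instead of fetching it over the network.
parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())


def allowed(agent: str, path: str) -> bool:
    """Return True if `agent` may fetch `path` under the rules above."""
    return parser.can_fetch(agent, BASE + path)


if __name__ == "__main__":
    # "examplebot" and "/example-page" are illustrative names only.
    for agent, path in [
        ("googlebot", "/m/"),
        ("baiduspider", "/"),
        ("examplebot", "/tmp/file"),
        ("examplebot", "/example-page"),
    ]:
        print(agent, path, allowed(agent, path))
```

An agent with its own group is judged only by that group, which is why googlebot may fetch /m/ even though the * group disallows it.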

Other Records

Field Value
sitemap https://jump.mingpao.com/sitemap2/static.xml
sitemap https://jump.mingpao.com/career-news/sitemap_index.xml
sitemap https://jump.mingpao.com/career-news/post-sitemap.xml
sitemap https://jump.mingpao.com/sitemap2/courses/sitemap.xml
sitemap https://jump.mingpao.com/sitemap2/job/sitemap.xml
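
As a rough illustration, the sitemap records above can be read back with the same standard-library parser. This sketch assumes Python 3.8 or later (where RobotFileParser.site_maps() is available) and assumes the records appear in the file as ordinary "Sitemap:" lines; the exact casing in the served file is not shown in this report.

```python
from urllib.robotparser import RobotFileParser

# Sitemap records copied from the report, written as "Sitemap:" lines
# (the casing of the field name in the real file is an assumption).
SITEMAP_RECORDS = """\
Sitemap: https://jump.mingpao.com/sitemap2/static.xml
Sitemap: https://jump.mingpao.com/career-news/sitemap_index.xml
Sitemap: https://jump.mingpao.com/career-news/post-sitemap.xml
Sitemap: https://jump.mingpao.com/sitemap2/courses/sitemap.xml
Sitemap: https://jump.mingpao.com/sitemap2/job/sitemap.xml
"""

sitemap_parser = RobotFileParser()
sitemap_parser.parse(SITEMAP_RECORDS.splitlines())

# site_maps() returns None when no sitemap records were found.
sitemaps = sitemap_parser.site_maps() or []

if __name__ == "__main__":
    for url in sitemaps:
        print(url)
```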

Warnings

  • 2 invalid lines.