attendanceguru.com
robots.txt

Robots Exclusion Standard data for attendanceguru.com

Resource Scan

Scan Details

Site Domain attendanceguru.com
Base Domain attendanceguru.com
Scan Status Ok
Last Scan 2025-09-23T22:03:46+00:00
Next Scan 2025-10-23T22:03:46+00:00

Last Scan

Scanned 2025-09-23T22:03:46+00:00
URL https://attendanceguru.com/robots.txt
Domain IPs 3.165.102.115, 3.165.102.122, 3.165.102.123, 3.165.102.96
Response IP 3.165.102.122
Found Yes
Hash 5ce0bb647793433a4e115108c2bd7c32fdb528f65954ad02e5200cba3f03b3d9
SimHash a9a21559a456
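
The 64-hex-character Hash has the shape of a SHA-256 digest of the fetched file (an assumption about this scanner; the SimHash is a separate similarity fingerprint). A quick Python sketch to reproduce that kind of digest:

    import hashlib
    import urllib.request

    # Fetch the same resource the scanner retrieved.
    with urllib.request.urlopen("https://attendanceguru.com/robots.txt") as resp:
        body = resp.read()

    # Assumption: the report's Hash field is SHA-256 over the raw response body.
    # The value only matches the report while the file is unchanged since the scan.
    print(hashlib.sha256(body).hexdigest())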

Groups

*

Rule      Path                           Comment
Allow     /                              -
Disallow  /login/                        -
Disallow  /                              /edzon/
Disallow  /signinupextended              -
Allow     /*.css                         -
Allow     /*.js                          -
Allow     /*.png                         -
Allow     /*.jpg                         -
Allow     /*.gif                         -
Allow     /*.svg                         -
Allow     /ads.txt                       -
Allow     /ads/preferences/              -
Allow     /gpt/                          -
Allow     /pagead/show_ads.js            -
Allow     /pagead/js/adsbygoogle.js      -
Allow     /pagead/js/*/show_ads_impl.js  -
Allow     /static/glade.js               -
Allow     /static/glade/                 -
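
How these rules interact depends on precedence. Under Google's documented matching, the rule with the longest matching path wins, and a tie goes to Allow. Below is a minimal Python sketch of that precedence for this group; it models path matching only, not user-agent selection, and transcribes just a subset of the rules above:

    import re

    # (verdict, path) pairs transcribed from the * group above (abbreviated).
    RULES = [
        ("allow", "/"),
        ("disallow", "/login/"),
        ("disallow", "/signinupextended"),
        ("allow", "/*.css"),
        ("allow", "/*.js"),
        ("allow", "/*.png"),
        ("allow", "/ads.txt"),
    ]

    def _pattern(path):
        # robots.txt paths are prefixes; '*' matches any run of characters
        # and a trailing '$' anchors the end of the URL path.
        regex = re.escape(path).replace(r"\*", ".*")
        if regex.endswith(r"\$"):
            regex = regex[:-2] + "$"
        return re.compile("^" + regex)

    def is_allowed(url_path, rules=RULES):
        # Longest matching path wins; on a tie, allow beats disallow
        # (True sorts above False in the tuple comparison).
        best = None
        for verdict, path in rules:
            if _pattern(path).match(url_path):
                candidate = (len(path), verdict == "allow")
                if best is None or candidate > best:
                    best = candidate
        return True if best is None else best[1]

    for p in ["/", "/login/student", "/signinupextended", "/static/app.js"]:
        print(p, "->", "allowed" if is_allowed(p) else "blocked")

Note that Python's bundled urllib.robotparser applies rules in file order (first match wins) and does not expand wildcards, so for a file that opens with Allow: / it can return different verdicts than the longest-match model sketched here.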

Other Records

Field Value
sitemap https://attendanceguru.com/subdomain.xml
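
If you only need the sitemap declarations, Python's urllib.robotparser can pull them straight from the live file (site_maps() is available from Python 3.8 and returns None when no Sitemap records exist):

    from urllib.robotparser import RobotFileParser

    rp = RobotFileParser()
    rp.set_url("https://attendanceguru.com/robots.txt")
    rp.read()  # fetches and parses the live file

    # A list of the declared Sitemap URLs, or None if the file declares none.
    print(rp.site_maps())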

Comments

  • Best Practices robots.txt Example
  • 1. Sitemap Declaration(s)
  • Always declare your sitemap(s) to help search engines discover your important pages.
  • Use the full URL to your sitemap(s). If you have multiple, list them all.
  • Sitemap: https://website-sitemap.s3.ap-south-1.amazonaws.com/subdomain.xml
  • Sitemap: https://website-sitemap.s3.ap-south-1.amazonaws.com/conversationseo.xml
  • Sitemap: https://website-sitemap.s3.ap-south-1.amazonaws.com/sitemap.xml
  • 2. User-Agent Directives
  • Apply directives to all crawlers unless a specific crawler needs different rules.
  • 3. General Allowance (often implicit or good for clarity)
  • Allow crawling of the entire site by default. More specific Disallow rules will override this for specific paths.
  • 4. Disallow Directives (Commonly Blocked Areas)
  • Block areas that are not intended for public search results or are purely functional.
  • - Administrative areas (e.g., login, admin dashboards)
  • - User-specific pages (e.g., user profiles, settings) that are not public
  • - Internal search result pages (can create infinite crawl loops and low-value content)
  • - Shopping cart/checkout processes (once the user starts them)
  • - Development/staging environments
  • Specific disallows from your original list (adjust as needed based on intent)
  • Disallow: /#/edzon/attendanceguru/ # Only if this path is truly not meant for indexing
  • 5. Handling of CSS, JavaScript, and Images (CRITICAL FOR RENDERING)
  • Google explicitly recommends *not* blocking CSS, JavaScript, or images that are essential for rendering the page's content or understanding its layout.
  • Blocking them can lead to "degraded" or "incomplete" rendering by Googlebot.
  • If you have non-essential JS/CSS (e.g., very large analytics files that don't affect content), you *could* disallow them, but it's often not necessary.
  • ALLOW all CSS and JS for proper rendering.
  • 6. Specific Allowances for Third-Party Scripts (like AdSense, Google Analytics)
  • These are often allowed even if there's a broader disallow that might accidentally catch them.
  • Your original file had good examples of these. (A consolidated sketch combining points 1-6 follows this list.)
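
Putting points 1-6 together, a consolidated robots.txt along the lines these comments describe might look like the sketch below. The sitemap URLs and paths are taken from this report; treat it as an illustration, not a drop-in replacement.

    # 1. Sitemap declarations
    Sitemap: https://website-sitemap.s3.ap-south-1.amazonaws.com/subdomain.xml
    Sitemap: https://website-sitemap.s3.ap-south-1.amazonaws.com/conversationseo.xml
    Sitemap: https://website-sitemap.s3.ap-south-1.amazonaws.com/sitemap.xml

    # 2. One group covering all crawlers
    User-agent: *

    # 3. Crawl everything by default; longer Disallow rules below override it
    Allow: /

    # 4. Block functional, non-public areas
    Disallow: /login/
    Disallow: /signinupextended

    # 5. Keep render-critical assets crawlable
    Allow: /*.css
    Allow: /*.js
    Allow: /*.png
    Allow: /*.jpg
    Allow: /*.gif
    Allow: /*.svg

    # 6. Third-party ad scripts stay explicitly allowed
    Allow: /ads.txt
    Allow: /pagead/show_ads.js
    Allow: /pagead/js/adsbygoogle.js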