angeo/module-robots-txt-aeo

Magento 2 module for AI Engine Optimization (AEO). Injects AI crawler rules (OAI-SearchBot, GPTBot, ChatGPT-User, PerplexityBot, Perplexity-User, Google-Extended, ClaudeBot, anthropic-ai, Claude-User, Applebot, cohere-ai, Amazonbot, Meta-ExternalAgent) into robots.txt — without overwriting your existing configuration. Supports per-bot Allow/Disallow lists, Crawl-delay, Sitemap directives, multi-store, and a public Api\RobotsStatusInterface for cross-module integration with angeo/module-aeo-audit.

magento2-module 2.4.6-2.4.9 Compatible Based on composer requirements only QA: failed MIT

Angeo Robots.txt AEO — AI Crawler Rules for Magento 2

Injects AI crawler rules into your Magento 2 robots.txt without overwriting your existing configuration.

Bots managed out-of-the-box: OAI-SearchBot, GPTBot, ChatGPT-User, PerplexityBot, Perplexity-User, Google-Extended, ClaudeBot, anthropic-ai, Claude-User, Applebot, cohere-ai, Amazonbot, Meta-ExternalAgent.

Fixes the "robots.txt — AI Bot Access" signal in angeo/module-aeo-audit.


What's new in 2.0

  • 5 new built-in bots aligned with the AEO Audit v3 catalogue: Claude-User, Applebot, cohere-ai, Amazonbot, Meta-ExternalAgent. An out-of-the-box install now passes the AEO Audit's robots_txt check.
  • Audit-clean output — emitted robots.txt no longer triggers syntax warnings:
    • Crawl-delay suppressed on bots that ignore it (GPTBot, ClaudeBot, Google-Extended).
    • No Allow: / + Disallow: / conflict on the same agent.
    • Versioned UAs sanitised at the catalogue layer.
    • Sitemap URLs upgraded to https:// when the store base URL is HTTPS.
  • Api\RobotsStatusInterface — public read-only API for cross-module integration. Consumers like angeo/module-aeo-audit can wire to it and skip the HTTP round-trip.
  • Dedicated cache type angeo_robots_txt_aeo — flush in isolation from System → Cache Management.
  • Backend validation: PathList and CrawlDelay backend models normalise admin input on save.
  • CSP-clean admin UI — no inline styles, no inline scripts.
  • i18n/en_US.csv — admin labels are translatable.
  • Removed runtime remote-registry feature: the bot catalogue is now release-managed only. Dynamic catalogue injection from an external endpoint was a security trade-off (anyone controlling the endpoint could inject UA strings into every install's robots.txt), and its UX was only half implemented (bots added this way had no admin checkbox). New bots ship via module releases.
  • Removed orphan code — the unused RemoteRegistryUpdater triplet from 1.x is gone.

See CHANGELOG.md for the full list.
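The audit-clean output rules listed above can be illustrated with a small, self-contained PHP sketch. The function names (sanitiseUserAgent, upgradeSitemapUrl, renderBotGroup) and the CRAWL_DELAY_IGNORED constant are hypothetical and are not the module's real API; only the behaviour is taken from the list, and the module's actual implementation may differ.

```php
<?php
// Bots the README lists as ignoring Crawl-delay.
const CRAWL_DELAY_IGNORED = ['GPTBot', 'ClaudeBot', 'Google-Extended'];

/** Strip a "/version" suffix from a user-agent token ("GPTBot/1.0" gives "GPTBot"). */
function sanitiseUserAgent(string $userAgent): string
{
    return trim(explode('/', trim($userAgent), 2)[0]);
}

/** Upgrade an http:// sitemap URL to https:// when the store base URL is HTTPS. */
function upgradeSitemapUrl(string $sitemapUrl, string $baseUrl): string
{
    if (stripos($baseUrl, 'https://') === 0 && stripos($sitemapUrl, 'http://') === 0) {
        return 'https://' . substr($sitemapUrl, strlen('http://'));
    }
    return $sitemapUrl;
}

/**
 * Render one User-agent group (hypothetical helper).
 *
 * @param string[] $allow
 * @param string[] $disallow
 */
function renderBotGroup(string $userAgent, array $allow, array $disallow, ?int $crawlDelay): string
{
    $agent = sanitiseUserAgent($userAgent);
    $lines = ['User-agent: ' . $agent];

    $blocksEverything = in_array('/', $disallow, true);
    if ($allow === [] && !$blocksEverything) {
        $allow = ['/']; // implicit fallback when nothing else is configured
    }
    if ($blocksEverything) {
        // Never emit "Allow: /" next to "Disallow: /" on the same agent.
        $allow = array_values(array_diff($allow, ['/']));
    }
    foreach ($allow as $path) {
        $lines[] = 'Allow: ' . $path;
    }
    foreach ($disallow as $path) {
        $lines[] = 'Disallow: ' . $path;
    }
    // Crawl-delay is dropped for bots known to ignore it.
    if ($crawlDelay !== null && !in_array($agent, CRAWL_DELAY_IGNORED, true)) {
        $lines[] = 'Crawl-delay: ' . $crawlDelay;
    }
    return implode("\n", $lines) . "\n";
}
```

For example, renderBotGroup('ClaudeBot', [], ['/admin/'], 5) yields the ClaudeBot group shown in the Inject mode sample, with no Crawl-delay line.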


How it works

The module intercepts the robots.txt response at render time via a plugin on
Magento\Robots\Model\Robots::getData() and prepends a managed block of AI bot rules.
No database writes. No filesystem changes. Your existing admin config is untouched.

Inject mode (default — recommended)

# Angeo AEO — AI Crawler Rules
# https://angeo.dev | module-robots-txt-aeo
# Do not edit this block manually — manage via Stores > Config > Angeo > Robots.txt AEO

User-agent: OAI-SearchBot
Allow: /

User-agent: GPTBot
Allow: /

User-agent: ClaudeBot
Allow: /
Disallow: /admin/

User-agent: Claude-User
Allow: /

User-agent: Applebot
Allow: /

# End Angeo AEO block

User-agent: *
Disallow: /checkout/
... (your existing rules follow unchanged)

# Angeo AEO — Sitemaps
Sitemap: https://example-store.com/sitemap.xml
# End Angeo AEO sitemaps

Replace mode

Regenerates the full robots.txt. Preserves your custom Disallow rules from the existing wildcard block. Use only if you want this module to own the entire file.
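As a rough illustration of what "preserves your custom Disallow rules from the existing wildcard block" could involve, the sketch below collects the Disallow paths under User-agent: * from an existing robots.txt body. The function name extractWildcardDisallows is hypothetical, and this is not the module's RobotsTxtParser; it is a simplified stand-in that assumes groups are delimited by User-agent lines.

```php
<?php
/**
 * Collect Disallow paths from the "User-agent: *" group (hypothetical helper).
 *
 * @return string[]
 */
function extractWildcardDisallows(string $robotsTxt): array
{
    $paths = [];
    $inWildcardGroup = false;
    $lastWasUserAgent = false;

    foreach (preg_split('/\R/', $robotsTxt) as $line) {
        $line = trim(preg_replace('/#.*$/', '', $line)); // drop trailing comments
        if ($line === '' || strpos($line, ':') === false) {
            continue;
        }
        [$field, $value] = array_map('trim', explode(':', $line, 2));
        $field = strtolower($field);

        if ($field === 'user-agent') {
            // Consecutive User-agent lines belong to the same group.
            $inWildcardGroup = ($lastWasUserAgent && $inWildcardGroup) || $value === '*';
            $lastWasUserAgent = true;
            continue;
        }
        $lastWasUserAgent = false;
        if ($inWildcardGroup && $field === 'disallow' && $value !== '') {
            $paths[] = $value;
        }
    }
    return array_values(array_unique($paths));
}
```

A regenerated file could then re-emit these paths under its own User-agent: * group.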


Installation

composer require angeo/module-robots-txt-aeo
bin/magento module:enable Angeo_RobotsTxtAeo
bin/magento setup:upgrade
bin/magento setup:di:compile
bin/magento cache:flush

That's it. The module is enabled with sensible defaults — all 10 mainstream AI bots are allowed; the 3 lower-traffic bots (cohere-ai, Amazonbot, Meta-ExternalAgent) are catalogued but disabled by default.


Configuration

Stores → Configuration → Angeo → Robots.txt AEO

Section                    Purpose
General                    Enable/disable, choose Inject or Replace mode
AI Crawlers                Tick which bots to allow. Bots marked ★ are critical for AEO Audit pass
AI Crawler Path Overrides  Per-bot Allow:, Disallow:, Crawl-delay:
Sitemap Directive          Auto-detect from Magento_Sitemap, manual list, or none
Live Preview               Renders the AEO block that will be injected

All settings respect store scope — multi-store installs can configure each store independently.


CLI

# Render what would be emitted, without applying it
bin/magento angeo:robots:preview [--store=N]

# Fetch the live robots.txt and check enabled bot rules are present
bin/magento angeo:robots:validate [--store=N] [--insecure]

validate exits non-zero when expected bot rules are missing from the live
file — useful in post-deploy smoke tests:

# .github/workflows/post-deploy.yml
- run: bin/magento angeo:robots:validate
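The core of such a presence check can be sketched in a few lines of PHP. This is only an assumption about the kind of comparison involved; findMissingAgents is a hypothetical name, and the real validate command may apply stricter checks (for example on the Allow and Disallow lines of each group).

```php
<?php
/**
 * Return the expected user agents that have no "User-agent:" line in the
 * given robots.txt body (hypothetical helper).
 *
 * @param string[] $expectedAgents
 * @return string[]
 */
function findMissingAgents(string $robotsTxt, array $expectedAgents): array
{
    $declared = [];
    foreach (preg_split('/\R/', $robotsTxt) as $line) {
        // Capture the agent token, ignoring case and trailing comments.
        if (preg_match('/^\s*User-agent\s*:\s*([^#\s]+)/i', $line, $m)) {
            $declared[strtolower($m[1])] = true;
        }
    }
    return array_values(array_filter(
        $expectedAgents,
        static fn (string $agent): bool => !isset($declared[strtolower($agent)])
    ));
}

// Sample body standing in for a fetched live robots.txt.
$body = "User-agent: GPTBot\nAllow: /\n\nUser-agent: *\nDisallow: /checkout/\n";
$missing = findMissingAgents($body, ['GPTBot', 'ClaudeBot']);
```

A smoke test would treat a non-empty $missing as a failure and exit non-zero, mirroring the behaviour described above.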

For a full AEO scoring of robots.txt (critical-bot checks, syntax warnings,
sitemap quality) install angeo/module-aeo-audit.
It reads the effective output of this module via Api\RobotsStatusInterface,
so no HTTP round-trip is needed when both modules are installed.


Cross-module integration (Api\RobotsStatusInterface)

The module exposes a public read-only API that consumer modules can wire to via
DI. This is a soft-coupling pattern: consumers check interface_exists() before
declaring the dependency, so they keep working when this module is not installed.

use Angeo\RobotsTxtAeo\Api\RobotsStatusInterface;

class MyChecker
{
    public function __construct(
        private readonly ?RobotsStatusInterface $robotsStatus = null,
    ) {}

    public function check(int $storeId): void
    {
        if ($this->robotsStatus !== null) {
            // Zero-overhead — pure in-process call
            $effective = $this->robotsStatus->getEffectiveRobotsTxt($storeId);
            $bots      = $this->robotsStatus->getEnabledBotUserAgents($storeId);
            // ...
        } else {
            // Fall back to HTTP fetch
        }
    }
}

Used by angeo/module-aeo-audit v3+ when both modules are installed.
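One way a consumer might perform the interface_exists() guard is sketched below. The resolveRobotsStatus function and its $locator callable are hypothetical; the callable stands in for whatever container or service locator the consumer already uses, and nothing here is taken from the audit module's source.

```php
<?php
/**
 * Resolve the optional integration only when the interface can be loaded.
 *
 * $locator is a hypothetical callable: it receives the interface name and
 * returns an implementation from the consumer's own container.
 */
function resolveRobotsStatus(callable $locator): ?object
{
    $interface = 'Angeo\\RobotsTxtAeo\\Api\\RobotsStatusInterface';

    // interface_exists() triggers autoloading by default, so it returns
    // false whenever the module is not installed.
    if (!interface_exists($interface)) {
        return null;
    }
    return $locator($interface);
}
```

The result can be passed straight to a constructor such as MyChecker's, which already accepts null and falls back to an HTTP fetch.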


How robots.txt manual content interacts

The module's admin form (Inject mode) does not modify the existing Magento admin robots.txt textarea (Content → Design → Configuration → Edit Custom instruction of robots.txt). Both sources coexist:

  • Your custom block is preserved untouched.
  • The AEO block is prepended at render time.
  • Re-running the plugin is idempotent — the AEO block is replaced, not stacked.
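The "replaced, not stacked" behaviour can be sketched as a strip-then-prepend step keyed on the block's marker comments. This is a minimal illustration, not the module's RobotsInjector: injectAeoBlock and the two constants are hypothetical names, and the marker text is copied from the sample output earlier in this README.

```php
<?php
// Marker lines from the README's sample output. "\u{2014}" is the dash
// character used in the opening marker, written as an escape sequence.
const AEO_BLOCK_START = "# Angeo AEO \u{2014} AI Crawler Rules";
const AEO_BLOCK_END   = '# End Angeo AEO block';

/**
 * Prepend $aeoBlock to $existing, first removing any previously injected
 * block so that repeated calls produce the same output (hypothetical helper).
 */
function injectAeoBlock(string $existing, string $aeoBlock): string
{
    $pattern = '/^' . preg_quote(AEO_BLOCK_START, '/')
        . '.*?^' . preg_quote(AEO_BLOCK_END, '/') . '\R*/msu';
    $withoutOldBlock = (string) preg_replace($pattern, '', $existing);

    return rtrim($aeoBlock) . "\n\n" . ltrim($withoutOldBlock);
}
```

Calling injectAeoBlock() on its own output returns the same string, so the custom rules that follow the block are never duplicated or displaced.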

If you'd rather manage AI bot rules yourself, either disable the module (bin/magento module:disable Angeo_RobotsTxtAeo) or untick individual bots in admin.


Compatibility

Status
Magento 2.4.6 (PHP 8.1)
Magento 2.4.7 (PHP 8.2 / 8.3)
Magento 2.4.8 (PHP 8.3 / 8.4)
Magento Open Source / Commerce / Cloud
Hyvä / PWA Studio ✅ (robots.txt is server-side)
Multi-store / multi-website
Magento_Sitemap not installed ✅ (soft dependency, no-op resolver)
Varnish / Fastly ⚠️ purge CDN cache after config changes

License

MIT. See LICENSE.

Security

See SECURITY.md for the disclosure policy.

Contributing

See CONTRIBUTING.md.

Changelog

All notable changes to this project will be documented in this file.

The format is based on Keep a Changelog,
and this project adheres to Semantic Versioning.

[2.0.0] — 2026-05-29

Added

  • 5 new built-in bots aligned with the angeo/module-aeo-audit v3 catalogue:
    Claude-User, Applebot, cohere-ai, Amazonbot, Meta-ExternalAgent.
    Out-of-the-box, an install now produces a robots.txt that passes the audit
    module's robots_txt check.
  • Angeo\RobotsTxtAeo\Api\RobotsStatusInterface — public read-only API
    exposing the effective robots.txt, enabled bot UAs, sitemaps, and mode.
    Cross-module integration with angeo/module-aeo-audit (and any third-party
    consumer) is now zero-overhead — no HTTP round-trip required.
  • Dedicated cache type angeo_robots_txt_aeo — surfaces in
    System → Cache Management and can be flushed in isolation.
  • Backend models for config validation: PathList and CrawlDelay
    normalise input on save (admin form, config:set, direct DB writes).
  • i18n/en_US.csv — admin labels are now translatable.
  • criticalForAudit metadata on BotDefinition — flags bots whose
    blocking causes the AEO Audit to FAIL (currently OAI-SearchBot, GPTBot,
    Google-Extended).

Changed

  • Audit-clean output sanitisation — emitted robots.txt no longer triggers
    syntax warnings from the AEO Audit:
    • Crawl-delay directives are suppressed on bots that are documented to
      ignore them (GPTBot, ClaudeBot, Google-Extended).
    • When a bot has Disallow: /, the implicit Allow: / fallback is dropped
      so we never emit both directives on the same agent.
    • User-agent strings are sanitised at the BotDefinition layer: any
      /version suffix is stripped (e.g. GPTBot/1.0 → GPTBot).
    • Sitemap URLs are upgraded to https:// when the store base URL is HTTPS.
  • RobotsInjector::stripStandaloneBotEntries rewritten to use
    RobotsTxtParser instead of a hand-rolled regex state machine.
    Cleaner, ~50 lines smaller, and correct for previously edge-case input.
  • Plugin\RobotsModelPlugin — short-circuits before building the
    injector graph when the module is disabled for the current store.
  • composer.json — PHP requirement loosened to ~8.1.0||~8.2.0||~8.3.0||~8.4.0,
    added hard dependency on magento/module-robots, pinned magento/framework
    to ^103.0.
  • Admin Preview block and Dashboard template are now CSP-friendly — all
    inline <style> and <script> removed in favour of dedicated CSS/JS
    assets loaded via the layout.

Removed

  • Remote bot registry feature — the runtime overlay from https://angeo.dev/registry/bots.json
    is gone. Bot catalogue is now release-managed only. Removed:

    • BotRegistry::refresh(), all signature-verification, and the cache layer
      for the overlay.
    • Cron\RefreshRemoteRegistry and etc/crontab.xml.
    • Console\Command\RegistryUpdateCommand (bin/magento angeo:robots:registry:update).
    • <remote_registry> group in etc/adminhtml/system.xml and etc/config.xml.
    • Config::isRemoteRegistryEnabled() and Config::getRemoteRegistryUrl().
    • HMAC-SHA256 signature verification and X-Angeo-Signature header support.
    • Response headers carried by FetchResult (no consumer remained).
    • BotRegistry constructor parameters: ScopeConfigInterface, UrlFetcher,
      DeploymentConfig — now takes cache, serializer, logger only.

    Rationale: the overlay was a security trade-off (anyone holding the endpoint
    could inject UA strings into every install's robots.txt) and a half-implemented
    UX (registry-added bots had no admin checkbox so admins couldn't enable them).
    New bots ship via module releases — the cadence is already adequate
    (the bot landscape changes every 2–3 months; module releases are faster).

  • Model\Bot\RemoteRegistryUpdater and Model\Bot\UpdateResult: an
    orphan duplicate of the now-also-removed BotRegistry::refresh().

  • Test\Unit\Model\Bot\RemoteRegistryUpdaterTest — tests for the
    removed classes.

  • Commented-out half-finished DI block in etc/di.xml.

Migration notes

  • Run bin/magento setup:upgrade && bin/magento setup:di:compile && bin/magento cache:flush.
  • The new dedicated cache type angeo_robots_txt_aeo appears in System →
    Cache Management — leave it enabled.
  • Existing per-store config under angeo_robots_txt_aeo/general/*,
    angeo_robots_txt_aeo/bots/*, angeo_robots_txt_aeo/bot_overrides/*,
    and angeo_robots_txt_aeo/sitemap/* is preserved verbatim.
  • New bots (claude_user, applebot, etc.) inherit their default_enabled
    state from config.xml on first read.
  • Sites that had angeo_robots_txt_aeo/remote_registry/* set in DB or
    app/etc/config.php will see those values become inert — no harm, but you
    may run bin/magento config:set angeo_robots_txt_aeo/remote_registry/enabled 0
    before upgrade if you want a clean DB.
  • Consumers of RemoteRegistryUpdater or BotRegistry::refresh() (none known)
    should migrate to release-tracking — new bots appear in BotRegistry::BUILTIN_BOTS
    with each release.

[1.1.0] — 2026-04-25

Added

  • Per-bot path overrides — each bot can now have its own Allow:, Disallow:,
    and Crawl-delay: directives, configurable from admin under "AI Crawler Path Overrides".
  • Sitemap: directive support — the module now emits Sitemap: lines into
    robots.txt. Three modes: Auto (read from Magento Sitemap module or fall back to
    /sitemap.xml), Custom (admin textarea), or None.
  • Remote bot registry (https://angeo.dev/registry/bots.json) — optional opt-in
    source for newly emerged AI crawlers. New entries are added as suggestions with
    default_enabled = false; the admin must explicitly opt in. Daily cron refresh.
  • Multi-store / multi-website support: system.xml now declares store-scope
    fields and Config reads through ScopeInterface::SCOPE_STORE. Each store can
    have its own bot configuration.
  • RobotsTxtParser — proper line-by-line state machine for parsing robots.txt.
  • BotRegistry — central registry of bot definitions with caching and remote overlay.
  • bin/magento angeo:robots:registry:update — CLI command to refresh the remote registry.
  • CLI --store and --insecure flags for preview and validate commands.
  • Cron job angeo_robots_txt_aeo_registry_refresh (daily at 03:17).

Changed

  • UrlFetcher now uses Magento\Framework\HTTP\Client\Curl instead of
    file_get_contents. TLS verification is enabled by default; --insecure
    available as explicit opt-in. Adds proper timeouts, retries with exponential
    backoff, and follow-redirects.
  • All HTTP responses are now wrapped in FetchResult (immutable value object).
  • Config::BOTS constant removed — bot definitions live in BotRegistry.
  • Plugin now resolves the store ID via StoreManagerInterface so multi-store
    installations get correct per-store output.

[1.0.0] — 2026-03-12

Added

  • Initial release. Plugin on Magento\Robots\Model\Robots::getData().
  • Default catalog of 8 AI crawler bots.
  • Admin configuration UI under Stores → Configuration → Angeo → Robots.txt AEO.
  • ACL, sequence on Magento_Robots, MIT licensed.

Versions

Version  Stability  QA Status   Released
2.0.0    stable     Fail        2026-05-29 21:04:50
1.0.1    stable     Not tested  2026-05-28 19:00:39
1.0.0    stable     Not tested  2026-04-25 20:57:57

Requires 6

Package Constraint
magento/framework ^103.0
magento/module-backend ^102.0
magento/module-config ^101.2
magento/module-robots ^101.0
magento/module-store ^101.1
php ~8.1.0||~8.2.0||~8.3.0||~8.4.0

Requires-dev 2

Package Constraint
magento/magento-coding-standard ^33.0
phpunit/phpunit ^10.0

Suggests 2

Package Reason
angeo/module-aeo-audit Verify your robots.txt AEO signal after installation. v3+ integrates via Angeo\RobotsTxtAeo\Api\RobotsStatusInterface for zero-overhead validation.
magento/module-sitemap Enables auto-detection of Sitemap URLs from the Magento Sitemap module.

QA results

Tool      Status  Findings  Summary
PHPCS     Fail    4         4 errors (gating threshold: error-severity=10, ruleset: Magento2)
PHPStan   Fail    2         2 errors (level 4, ruleset: phpstan + bitexpert/phpstan-magento)
Cpd       Pass    0
Security  Pass    0

License: MIT
Homepage: https://angeo.dev