angeo/module-robots-txt-aeo
Magento 2 module for AI Engine Optimization (AEO). Injects AI crawler rules (OAI-SearchBot, GPTBot, ChatGPT-User, PerplexityBot, Perplexity-User, Google-Extended, ClaudeBot, anthropic-ai, Claude-User, Applebot, cohere-ai, Amazonbot, Meta-ExternalAgent) into robots.txt — without overwriting your existing configuration. Supports per-bot Allow/Disallow lists, Crawl-delay, Sitemap directives, multi-store, and a public Api\RobotsStatusInterface for cross-module integration with angeo/module-aeo-audit.
Angeo Robots.txt AEO — AI Crawler Rules for Magento 2
Injects AI crawler rules into your Magento 2 robots.txt — without overwriting your existing configuration.
Bots managed out-of-the-box: OAI-SearchBot, GPTBot, ChatGPT-User, PerplexityBot, Perplexity-User, Google-Extended, ClaudeBot, anthropic-ai, Claude-User, Applebot, cohere-ai, Amazonbot, Meta-ExternalAgent.
Fixes the "robots.txt — AI Bot Access" signal in angeo/module-aeo-audit.
What's new in 2.0
- 5 new built-in bots aligned with the AEO Audit v3 catalogue: `Claude-User`, `Applebot`, `cohere-ai`, `Amazonbot`, `Meta-ExternalAgent`. An out-of-the-box install now passes the AEO Audit's `robots_txt` check.
- Audit-clean output. The emitted robots.txt no longer triggers syntax warnings:
  - `Crawl-delay` suppressed on bots that ignore it (GPTBot, ClaudeBot, Google-Extended).
  - No `Allow: /` + `Disallow: /` conflict on the same agent.
  - Versioned UAs sanitised at the catalogue layer.
  - Sitemap URLs upgraded to `https://` when the store base URL is HTTPS.
- `Api\RobotsStatusInterface`: public read-only API for cross-module integration. Consumers like `angeo/module-aeo-audit` can wire to it and skip the HTTP round-trip.
- Dedicated cache type `angeo_robots_txt_aeo`: flush in isolation from System → Cache Management.
- Backend validation: `PathList` and `CrawlDelay` backend models normalise admin input on save.
- CSP-clean admin UI: no inline styles, no inline scripts.
- `i18n/en_US.csv`: admin labels are translatable.
- Removed runtime remote-registry feature: the bot catalogue is now release-managed only. Dynamic catalogue injection from an external endpoint was a security trade-off (anyone with the endpoint could inject UA strings into every install's robots.txt) and a half-implemented UX one (added bots had no admin checkbox). New bots ship via module releases.
- Removed orphan code: the unused `RemoteRegistryUpdater` triplet from 1.x is gone.
See CHANGELOG.md for the full list.
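The audit-clean rules listed above can be pictured with a small standalone sketch. This is not the module's code: the function names `stripUaVersion`, `upgradeSitemapScheme`, and `renderAgentGroup` are hypothetical, and the logic only mirrors the behaviour the list describes (version suffix stripped, `Crawl-delay` suppressed for bots that ignore it, no implicit `Allow: /` next to `Disallow: /`, sitemap scheme upgrade).

```php
<?php
declare(strict_types=1);

// Illustrative only: these function names are hypothetical, not the module's API.

/** Strip any "/version" suffix from a user-agent token, e.g. "GPTBot/1.0" -> "GPTBot". */
function stripUaVersion(string $userAgent): string
{
    $slash = strpos($userAgent, '/');
    return trim($slash === false ? $userAgent : substr($userAgent, 0, $slash));
}

/** Upgrade an http:// sitemap URL to https:// when the store base URL is HTTPS. */
function upgradeSitemapScheme(string $sitemapUrl, string $baseUrl): string
{
    if (str_starts_with($baseUrl, 'https://') && str_starts_with($sitemapUrl, 'http://')) {
        return 'https://' . substr($sitemapUrl, strlen('http://'));
    }
    return $sitemapUrl;
}

/**
 * Render one User-agent group.
 *
 * @param string[] $allow    explicit Allow paths (may be empty)
 * @param string[] $disallow explicit Disallow paths (may be empty)
 */
function renderAgentGroup(string $userAgent, array $allow, array $disallow, ?int $crawlDelay): string
{
    // Bots the README lists as ignoring Crawl-delay.
    $ignoresCrawlDelay = ['GPTBot', 'ClaudeBot', 'Google-Extended'];

    $ua = stripUaVersion($userAgent);
    $lines = ['User-agent: ' . $ua];

    // Implicit "Allow: /" fallback, dropped when the bot is fully disallowed.
    if ($allow === [] && !in_array('/', $disallow, true)) {
        $allow = ['/'];
    }
    foreach ($allow as $path) {
        $lines[] = 'Allow: ' . $path;
    }
    foreach ($disallow as $path) {
        $lines[] = 'Disallow: ' . $path;
    }
    if ($crawlDelay !== null && !in_array($ua, $ignoresCrawlDelay, true)) {
        $lines[] = 'Crawl-delay: ' . $crawlDelay;
    }

    return implode("\n", $lines);
}

// A versioned UA with a requested crawl delay that the bot would ignore anyway.
echo renderAgentGroup('ClaudeBot/1.0', [], ['/admin/'], 10), "\n";
```

The sketch only drops the implicit `Allow: /`; how the module treats an explicit `Allow: /` combined with `Disallow: /` is not stated here, so that case is left untouched.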
How it works
The module intercepts the robots.txt response at render time via a plugin on
Magento\Robots\Model\Robots::getData() and prepends a managed block of AI bot rules.
No database writes. No filesystem changes. Your existing admin config is untouched.
Inject mode (default — recommended)
# Angeo AEO — AI Crawler Rules
# https://angeo.dev | module-robots-txt-aeo
# Do not edit this block manually — manage via Stores > Config > Angeo > Robots.txt AEO
User-agent: OAI-SearchBot
Allow: /
User-agent: GPTBot
Allow: /
User-agent: ClaudeBot
Allow: /
Disallow: /admin/
User-agent: Claude-User
Allow: /
User-agent: Applebot
Allow: /
# End Angeo AEO block
User-agent: *
Disallow: /checkout/
... (your existing rules follow unchanged)
# Angeo AEO — Sitemaps
Sitemap: https://example-store.com/sitemap.xml
# End Angeo AEO sitemaps
Replace mode
Regenerates the full robots.txt. Preserves your custom Disallow rules from the existing wildcard block. Use only if you want this module to own the entire file.
Installation
composer require angeo/module-robots-txt-aeo
bin/magento module:enable Angeo_RobotsTxtAeo
bin/magento setup:upgrade
bin/magento setup:di:compile
bin/magento cache:flush
That's it. The module is enabled with sensible defaults — all 10 mainstream AI bots are allowed; the 3 lower-traffic bots (cohere-ai, Amazonbot, Meta-ExternalAgent) are catalogued but disabled by default.
Configuration
Stores → Configuration → Angeo → Robots.txt AEO
| Section | Purpose |
|---|---|
| General | Enable/disable, choose Inject or Replace mode |
| AI Crawlers | Tick which bots to allow. Bots marked ★ are critical for AEO Audit pass |
| AI Crawler Path Overrides | Per-bot Allow:, Disallow:, Crawl-delay: |
| Sitemap Directive | Auto-detect from Magento_Sitemap, manual list, or none |
| Live Preview | Renders the AEO block that will be injected |
All settings respect store scope — multi-store installs can configure each store independently.
CLI
# Render what would be emitted, without applying it
bin/magento angeo:robots:preview [--store=N]
# Fetch the live robots.txt and check enabled bot rules are present
bin/magento angeo:robots:validate [--store=N] [--insecure]
validate exits non-zero when expected bot rules are missing from the live
file — useful in post-deploy smoke tests:
# .github/workflows/post-deploy.yml
- run: bin/magento angeo:robots:validate
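The command's implementation is not shown in this README. As a rough idea of the kind of check it describes, the sketch below scans a robots.txt body for `User-agent:` lines and reports which expected bots are absent. The function name `missingBotRules` is hypothetical, and the matching rule (case-insensitive comparison of the user-agent token) is an assumption.

```php
<?php
declare(strict_types=1);

/**
 * Return the expected bots that have no "User-agent:" line in the given robots.txt.
 * Hypothetical helper for illustration; not part of the module.
 *
 * @param string[] $expectedBots
 * @return string[]
 */
function missingBotRules(string $robotsTxt, array $expectedBots): array
{
    $present = [];
    foreach (preg_split('/\R/', $robotsTxt) ?: [] as $line) {
        // Capture the first token after "User-agent:", ignoring case and padding.
        if (preg_match('/^\s*User-agent\s*:\s*(\S+)/i', $line, $m) === 1) {
            $present[strtolower($m[1])] = true;
        }
    }

    return array_values(array_filter(
        $expectedBots,
        static fn (string $bot): bool => !isset($present[strtolower($bot)])
    ));
}

$live = "User-agent: GPTBot\nAllow: /\n\nUser-agent: *\nDisallow: /checkout/\n";
$missing = missingBotRules($live, ['GPTBot', 'ClaudeBot']);

// A smoke test would exit non-zero when anything is missing.
echo $missing === [] ? "ok\n" : 'missing: ' . implode(', ', $missing) . "\n";
```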
For a full AEO scoring of robots.txt (critical-bot checks, syntax warnings,
sitemap quality) install angeo/module-aeo-audit.
It reads the effective output of this module via Api\RobotsStatusInterface —
no HTTP round-trip when both modules are installed.
Cross-module integration (Api\RobotsStatusInterface)
The module exposes a public read-only API that consumer modules can wire to via
DI. Soft-coupling pattern — consumers interface_exists()-check before
declaring the dependency, so they keep working when this module is not installed.
use Angeo\RobotsTxtAeo\Api\RobotsStatusInterface;

class MyChecker
{
    public function __construct(
        private readonly ?RobotsStatusInterface $robotsStatus = null,
    ) {}

    public function check(int $storeId): void
    {
        if ($this->robotsStatus !== null) {
            // Zero-overhead: pure in-process call
            $effective = $this->robotsStatus->getEffectiveRobotsTxt($storeId);
            $bots = $this->robotsStatus->getEnabledBotUserAgents($storeId);
            // ...
        } else {
            // Fall back to HTTP fetch
        }
    }
}
Used by angeo/module-aeo-audit v3+ when both modules are installed.
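The `interface_exists()` guard mentioned above might look like the following in a consumer. This is a sketch under assumptions: `resolveRobotsStatus` and the `$locator` callable are hypothetical stand-ins for whatever wiring the consumer actually uses (in Magento this would normally go through DI configuration rather than a callable).

```php
<?php
declare(strict_types=1);

/**
 * Return the robots-status service when the providing module is installed, else null.
 * $locator is a hypothetical callable standing in for the consumer's service locator.
 */
function resolveRobotsStatus(callable $locator): ?object
{
    $interface = 'Angeo\\RobotsTxtAeo\\Api\\RobotsStatusInterface';

    // Guard first: without the module, the interface cannot be autoloaded.
    if (!interface_exists($interface)) {
        return null;
    }

    return $locator($interface);
}

// Dummy locator for demonstration; it is only called when the interface exists.
$status = resolveRobotsStatus(static fn (string $type): object => new stdClass());

echo $status === null ? "fallback: HTTP fetch\n" : "in-process\n";
```

Returning `null` keeps the consumer's constructor signature compatible with the nullable parameter shown in `MyChecker`.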
How manually managed robots.txt content interacts
The module's admin form (Inject mode) does not modify the existing Magento admin robots.txt textarea (Content → Design → Configuration → Edit Custom instruction of robots.txt). Both sources coexist:
- Your custom block is preserved untouched.
- The AEO block is prepended at render time.
- Re-running the plugin is idempotent — the AEO block is replaced, not stacked.
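
The "replaced, not stacked" behaviour can be sketched as a pure string transformation: strip any previously injected block between the begin and end marker lines, then prepend a fresh one. The function name `injectManagedBlock` and the demo markers are hypothetical; in the sample output earlier in this README the markers are the `# Angeo AEO ... AI Crawler Rules` and `# End Angeo AEO block` comment lines. The module's real injector may work differently.

```php
<?php
declare(strict_types=1);

/**
 * Prepend a managed block to robots.txt content, replacing any earlier copy.
 * Hypothetical helper for illustration; not the module's injector.
 */
function injectManagedBlock(string $existing, string $rules, string $begin, string $end): string
{
    // Remove a previously injected block: begin marker line through end marker line.
    $pattern = '/^' . preg_quote($begin, '/') . '\R.*?^' . preg_quote($end, '/') . '\R*/ms';
    $stripped = preg_replace($pattern, '', $existing) ?? $existing;

    return $begin . "\n" . $rules . "\n" . $end . "\n\n" . ltrim($stripped);
}

$begin = '# BEGIN managed';   // demo marker, assumption
$end   = '# END managed';     // demo marker, assumption
$rules = "User-agent: GPTBot\nAllow: /";
$custom = "User-agent: *\nDisallow: /checkout/\n";

$once  = injectManagedBlock($custom, $rules, $begin, $end);
$twice = injectManagedBlock($once, $rules, $begin, $end);

// Running the injection twice yields the same content as running it once.
var_dump($once === $twice);
```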
If you'd rather manage AI bot rules yourself, either disable the module (bin/magento module:disable Angeo_RobotsTxtAeo) or untick individual bots in admin.
Compatibility
| Environment | Status |
|---|---|
| Magento 2.4.6 (PHP 8.1) | ✅ |
| Magento 2.4.7 (PHP 8.2 / 8.3) | ✅ |
| Magento 2.4.8 (PHP 8.3 / 8.4) | ✅ |
| Magento Open Source / Commerce / Cloud | ✅ |
| Hyvä / PWA Studio | ✅ (robots.txt is server-side) |
| Multi-store / multi-website | ✅ |
| `Magento_Sitemap` not installed | ✅ (soft dependency, no-op resolver) |
| Varnish / Fastly | ⚠️ purge CDN cache after config changes |
License
MIT. See LICENSE.
Security
See SECURITY.md for the disclosure policy.
Contributing
See CONTRIBUTING.md.
Changelog
All notable changes to this project will be documented in this file.
The format is based on Keep a Changelog,
and this project adheres to Semantic Versioning.
[2.0.0] — 2026-05-29
Added
- 5 new built-in bots aligned with the `angeo/module-aeo-audit` v3 catalogue: `Claude-User`, `Applebot`, `cohere-ai`, `Amazonbot`, `Meta-ExternalAgent`. Out-of-the-box, an install now produces a robots.txt that passes the audit module's `robots_txt` check.
- `Angeo\RobotsTxtAeo\Api\RobotsStatusInterface`: public read-only API exposing the effective robots.txt, enabled bot UAs, sitemaps, and mode. Cross-module integration with `angeo/module-aeo-audit` (and any third-party consumer) is now zero-overhead, with no HTTP round-trip required.
- Dedicated cache type `angeo_robots_txt_aeo`: surfaces in System → Cache Management and can be flushed in isolation.
- Backend models for config validation: `PathList` and `CrawlDelay` normalise input on save (admin form, `config:set`, direct DB writes).
- `i18n/en_US.csv`: admin labels are now translatable.
- `criticalForAudit` metadata on `BotDefinition`: flags bots whose blocking causes the AEO Audit to FAIL (currently OAI-SearchBot, GPTBot, Google-Extended).
Changed
- Audit-clean output sanitisation. The emitted robots.txt no longer triggers syntax warnings from the AEO Audit:
  - `Crawl-delay` directives are suppressed on bots that are documented to ignore them (GPTBot, ClaudeBot, Google-Extended).
  - When a bot has `Disallow: /`, the implicit `Allow: /` fallback is dropped so we never emit both directives on the same agent.
  - User-agent strings are sanitised at the `BotDefinition` layer: any `/version` suffix is stripped (e.g. `GPTBot/1.0` → `GPTBot`).
  - Sitemap URLs are upgraded to `https://` when the store base URL is HTTPS.
- `RobotsInjector::stripStandaloneBotEntries` rewritten to use `RobotsTxtParser` instead of a hand-rolled regex state machine. Cleaner, ~50 lines smaller, and correct for input that previously hit edge cases.
- `Plugin\RobotsModelPlugin`: short-circuits before building the injector graph when the module is disabled for the current store.
- `composer.json`: PHP requirement loosened to `~8.1.0||~8.2.0||~8.3.0||~8.4.0`, added hard dependency on `magento/module-robots`, pinned `magento/framework` to `^103.0`.
- Admin Preview block and Dashboard template are now CSP-friendly: all inline `<style>` and `<script>` removed in favour of dedicated CSS/JS assets loaded via the layout.
Removed
- Remote bot registry feature: the runtime overlay from https://angeo.dev/registry/bots.json is gone. Bot catalogue is now release-managed only. Removed:
  - `BotRegistry::refresh()`, all signature verification, and the cache layer for the overlay.
  - `Cron\RefreshRemoteRegistry` and `etc/crontab.xml`.
  - `Console\Command\RegistryUpdateCommand` (`bin/magento angeo:robots:registry:update`).
  - `<remote_registry>` group in `etc/adminhtml/system.xml` and `etc/config.xml`.
  - `Config::isRemoteRegistryEnabled()` and `Config::getRemoteRegistryUrl()`.
  - HMAC-SHA256 signature verification and `X-Angeo-Signature` header support.
  - Response headers carried by `FetchResult` (no consumer remained).
  - `BotRegistry` constructor parameters: `ScopeConfigInterface`, `UrlFetcher`, `DeploymentConfig`. It now takes cache, serializer, logger only.

  Rationale: the overlay was a security trade-off (anyone holding the endpoint could inject UA strings into every install's robots.txt) and a half-implemented UX (registry-added bots had no admin checkbox so admins couldn't enable them). New bots ship via module releases; the cadence is already adequate (the bot landscape changes every 2–3 months; module releases are faster).
- `Model\Bot\RemoteRegistryUpdater` and `Model\Bot\UpdateResult`: orphan duplicate of the now-also-removed `BotRegistry::refresh()`.
- `Test\Unit\Model\Bot\RemoteRegistryUpdaterTest`: tests for the removed classes.
- Commented-out half-finished DI block in `etc/di.xml`.
Migration notes
- Run `bin/magento setup:upgrade && bin/magento setup:di:compile && bin/magento cache:flush`.
- The new dedicated cache type `angeo_robots_txt_aeo` appears in System → Cache Management; leave it enabled.
- Existing per-store config under `angeo_robots_txt_aeo/general/*`, `angeo_robots_txt_aeo/bots/*`, `angeo_robots_txt_aeo/bot_overrides/*`, and `angeo_robots_txt_aeo/sitemap/*` is preserved verbatim.
- New bots (`claude_user`, `applebot`, etc.) inherit their `default_enabled` state from `config.xml` on first read.
- Sites that had `angeo_robots_txt_aeo/remote_registry/*` set in DB or `app/etc/config.php` will see those values become inert. No harm, but you may run `bin/magento config:set angeo_robots_txt_aeo/remote_registry/enabled 0` before upgrade if you want a clean DB.
- Consumers of `RemoteRegistryUpdater` or `BotRegistry::refresh()` (none known) should migrate to release-tracking: new bots appear in `BotRegistry::BUILTIN_BOTS` with each release.
[1.1.0] — 2026-04-25
Added
- Per-bot path overrides: each bot can now have its own `Allow:`, `Disallow:`, and `Crawl-delay:` directives, configurable from admin under "AI Crawler Path Overrides".
- `Sitemap:` directive support: the module now emits `Sitemap:` lines into robots.txt. Three modes: Auto (read from Magento Sitemap module or fall back to `/sitemap.xml`), Custom (admin textarea), or None.
- Remote bot registry (https://angeo.dev/registry/bots.json): optional opt-in source for newly emerged AI crawlers. New entries are added as suggestions with `default_enabled = false`; the admin must explicitly opt in. Daily cron refresh.
- Multi-store / multi-website support: `system.xml` now declares store-scope fields and `Config` reads through `ScopeInterface::SCOPE_STORE`. Each store can have its own bot configuration.
- `RobotsTxtParser`: proper line-by-line state machine for parsing robots.txt.
- `BotRegistry`: central registry of bot definitions with caching and remote overlay.
- `bin/magento angeo:robots:registry:update`: CLI command to refresh the remote registry.
- CLI `--store` and `--insecure` flags for `preview` and `validate` commands.
- Cron job `angeo_robots_txt_aeo_registry_refresh` (daily at 03:17).
Changed
- `UrlFetcher` now uses `Magento\Framework\HTTP\Client\Curl` instead of `file_get_contents`. TLS verification is enabled by default; `--insecure` is available as an explicit opt-in. Adds proper timeouts, retries with exponential backoff, and follow-redirects.
- All HTTP responses are now wrapped in `FetchResult` (immutable value object).
- `Config::BOTS` constant removed; bot definitions live in `BotRegistry`.
- Plugin now resolves the store ID via `StoreManagerInterface` so multi-store installations get correct per-store output.
[1.0.0] — 2026-03-12
Added
- Initial release. Plugin on `Magento\Robots\Model\Robots::getData()`.
- Default catalog of 8 AI crawler bots.
- Admin configuration UI under Stores → Configuration → Angeo → Robots.txt AEO.
- ACL, sequence on `Magento_Robots`, MIT licensed.
Requires 6
| Package | Constraint |
|---|---|
| magento/framework | ^103.0 |
| magento/module-backend | ^102.0 |
| magento/module-config | ^101.2 |
| magento/module-robots | ^101.0 |
| magento/module-store | ^101.1 |
| php | ~8.1.0||~8.2.0||~8.3.0||~8.4.0 |
Requires-dev 2
| Package | Constraint |
|---|---|
| magento/magento-coding-standard | ^33.0 |
| phpunit/phpunit | ^10.0 |
Suggests 2
| Package | Reason |
|---|---|
| angeo/module-aeo-audit | Verify your robots.txt AEO signal after installation. v3+ integrates via Angeo\RobotsTxtAeo\Api\RobotsStatusInterface for zero-overhead validation. |
| magento/module-sitemap | Enables auto-detection of Sitemap URLs from the Magento Sitemap module. |
| Tool | Status | Findings | Summary |
|---|---|---|---|
| PHPCS | Fail | 4 | 4 errors (gating threshold: error-severity=10, ruleset: Magento2) |
| PHPStan | Fail | 2 | 2 errors (level 4, ruleset: phpstan + bitexpert/phpstan-magento) |
| Cpd | Pass | 0 | |
| Security | Pass | 0 | |