angeo/module-robots-txt-aeo
Magento 2 module for AI Engine Optimization (AEO). Injects AI crawler rules (OAI-SearchBot, GPTBot, ChatGPT-User, PerplexityBot, Perplexity-User, Google-Extended, ClaudeBot, anthropic-ai, Claude-User, Applebot, cohere-ai, Amazonbot, Meta-ExternalAgent) into robots.txt — without overwriting your existing configuration. Supports per-bot Allow/Disallow lists, Crawl-delay, Sitemap directives, multi-store, and a public Api\RobotsStatusInterface for cross-module integration with angeo/module-aeo-audit.
Angeo Robots.txt AEO — AI Crawler Rules for Magento 2
Injects AI crawler rules into your Magento 2 robots.txt — without overwriting your existing configuration.
Bots managed out-of-the-box: OAI-SearchBot, GPTBot, ChatGPT-User, PerplexityBot, Perplexity-User, Google-Extended, ClaudeBot, anthropic-ai, Claude-User, Applebot, cohere-ai, Amazonbot, Meta-ExternalAgent.
Fixes the "robots.txt — AI Bot Access" signal in angeo/module-aeo-audit.
What's new in 2.0
- 5 new built-in bots aligned with the AEO Audit v3 catalogue: Claude-User, Applebot, cohere-ai, Amazonbot, Meta-ExternalAgent. An out-of-the-box install now passes the AEO Audit's robots_txt check.
- Audit-clean output: the emitted robots.txt no longer triggers syntax warnings:
  - Crawl-delay is suppressed on bots that ignore it (GPTBot, ClaudeBot, Google-Extended).
  - No Allow: / + Disallow: / conflict on the same agent.
  - Versioned UAs are sanitised at the catalogue layer.
- Sitemap URLs are upgraded to https:// when the store base URL is HTTPS.
- Api\RobotsStatusInterface: public read-only API for cross-module integration. Consumers like angeo/module-aeo-audit can wire to it and skip the HTTP round-trip.
- Dedicated cache type angeo_robots_txt_aeo: flush it in isolation from System → Cache Management.
- Backend validation: the PathList and CrawlDelay backend models normalise admin input on save.
- CSP-clean admin UI: no inline styles, no inline scripts.
- i18n/en_US.csv: admin labels are translatable.
- Removed the runtime remote-registry feature; the bot catalogue is now release-managed only. Dynamic catalogue injection from an external endpoint was a security trade-off (anyone with the endpoint could inject UA strings into every install's robots.txt) and a half-implemented UX (added bots had no admin checkbox). New bots ship via module releases.
- Removed orphan code: the unused RemoteRegistryUpdater triplet from 1.x is gone.
See CHANGELOG.md for the full list.
How it works
The module intercepts the robots.txt response at render time via a plugin on
Magento\Robots\Model\Robots::getData() and prepends a managed block of AI bot rules.
No database writes. No filesystem changes. Your existing admin config is untouched.
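The render-time injection can be pictured as a pure string function. This is a minimal regex-based sketch, not the module's actual parse→render pipeline: injectManagedBlock() and both marker constants are hypothetical (ASCII stand-ins for the comment lines in the Inject-mode example). It removes any block a previous run left behind before prepending a fresh one, which is what keeps re-runs idempotent. In Magento, a function like this would presumably be called from an after-plugin on Magento\Robots\Model\Robots::getData().

```php
<?php

// Hypothetical ASCII marker lines; the module's real markers may differ.
const AEO_BLOCK_START = '# Angeo AEO: AI Crawler Rules';
const AEO_BLOCK_END = '# End Angeo AEO block';

/**
 * Prepend a freshly rendered AEO block to $robotsTxt, first removing any
 * block a previous run left behind, so repeated calls never stack blocks.
 */
function injectManagedBlock(string $robotsTxt, string $renderedBlock): string
{
    // Match start marker through end marker (non-greedy, across lines),
    // plus the line breaks that follow the end marker.
    $pattern = '/' . preg_quote(AEO_BLOCK_START, '/') . '.*?'
        . preg_quote(AEO_BLOCK_END, '/') . '\R*/su';
    $stripped = preg_replace($pattern, '', $robotsTxt) ?? $robotsTxt;

    $block = AEO_BLOCK_START . "\n" . rtrim($renderedBlock, "\n") . "\n" . AEO_BLOCK_END . "\n";

    // Managed block first, then the store's own rules, untouched.
    return $block . "\n" . ltrim($stripped, "\n");
}
```

Stripping by marker before prepending is the simplest way to make the operation idempotent without having to understand the rest of the file.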
Inject mode (default — recommended)
```text
# Angeo AEO — AI Crawler Rules
# https://angeo.dev | module-robots-txt-aeo
# Do not edit this block manually — manage via Stores > Config > Angeo > Robots.txt AEO

User-agent: OAI-SearchBot
Allow: /

User-agent: GPTBot
Allow: /

User-agent: ClaudeBot
Allow: /
Disallow: /admin/

User-agent: Claude-User
Allow: /

User-agent: Applebot
Allow: /

# End Angeo AEO block

User-agent: *
Disallow: /checkout/
... (your existing rules follow unchanged)

# Angeo AEO — Sitemaps
Sitemap: https://example-store.com/sitemap.xml
# End Angeo AEO sitemaps
```
Replace mode
Regenerates the full robots.txt. Preserves your custom Disallow rules from the existing wildcard block. Use only if you want this module to own the entire file.
Installation
```shell
composer require angeo/module-robots-txt-aeo
bin/magento module:enable Angeo_RobotsTxtAeo
bin/magento setup:upgrade
bin/magento setup:di:compile
bin/magento cache:flush
```
That's it. The module is enabled with sensible defaults — all 10 mainstream AI bots are allowed; the 3 lower-traffic bots (cohere-ai, Amazonbot, Meta-ExternalAgent) are catalogued but disabled by default.
Configuration
Stores → Configuration → Angeo → Robots.txt AEO
| Section | Purpose |
|---|---|
| General | Enable/disable, choose Inject or Replace mode |
| AI Crawlers | Tick which bots to allow. Bots marked ★ are critical for AEO Audit pass |
| AI Crawler Path Overrides | Per-bot Allow:, Disallow:, Crawl-delay: |
| Sitemap Directive | Auto-detect from Magento_Sitemap, manual list, or none |
| Live Preview | Renders the AEO block that will be injected |
All settings respect store scope — multi-store installs can configure each store independently.
CLI
```shell
# Render what would be emitted, without applying it
bin/magento angeo:robots:preview [--store=N]

# Fetch the live robots.txt and check enabled bot rules are present
bin/magento angeo:robots:validate [--store=N] [--insecure]
```
validate exits non-zero when expected bot rules are missing from the live
file — useful in post-deploy smoke tests:
```yaml
# .github/workflows/post-deploy.yml
- run: bin/magento angeo:robots:validate
```
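The presence check behind validate can be approximated as follows. This hypothetical findMissingBots() only inspects an already-fetched body for User-agent lines, matching product tokens case-insensitively as RFC 9309 specifies; the real command also fetches the live file over HTTP and, per the 3.0.0 changelog, evaluates effective access.

```php
<?php

/**
 * Return the enabled bot tokens that have no "User-agent:" line in $robotsTxt.
 * Hypothetical helper, not the module's implementation.
 *
 * @param string[] $enabledBots
 * @return string[]
 */
function findMissingBots(string $robotsTxt, array $enabledBots): array
{
    $present = [];
    foreach (preg_split('/\R/', $robotsTxt) as $line) {
        // Drop comments, then look for "User-agent: <token>" (field name is case-insensitive).
        $line = trim(explode('#', $line, 2)[0]);
        if (preg_match('/^user-agent\s*:\s*(\S+)/i', $line, $m)) {
            $present[strtolower($m[1])] = true;
        }
    }

    return array_values(array_filter(
        $enabledBots,
        static fn (string $bot): bool => !isset($present[strtolower($bot)])
    ));
}
```

A post-deploy script built on this would exit with status 1 when the returned list is non-empty, which is what lets a CI step fail.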
For a full AEO scoring of robots.txt (critical-bot checks, syntax warnings,
sitemap quality) install angeo/module-aeo-audit.
It reads the effective output of this module via Api\RobotsStatusInterface —
no HTTP round-trip when both modules are installed.
Cross-module integration (Api\RobotsStatusInterface)
The module exposes a public read-only API that consumer modules can wire to via DI. It follows a soft-coupling pattern: consumers check interface_exists() before declaring the dependency, so they keep working when this module is not installed.
```php
<?php

use Angeo\RobotsTxtAeo\Api\RobotsStatusInterface;

class MyChecker
{
    // Optional dependency: null unless your di.xml injects the interface.
    public function __construct(
        private readonly ?RobotsStatusInterface $robotsStatus = null,
    ) {}

    public function check(int $storeId): void
    {
        if ($this->robotsStatus !== null) {
            // Zero-overhead: pure in-process call, no HTTP round-trip
            $effective = $this->robotsStatus->getEffectiveRobotsTxt($storeId);
            $bots = $this->robotsStatus->getEnabledBotUserAgents($storeId);
            // ...
        } else {
            // Module absent or not wired: fall back to an HTTP fetch of /robots.txt
        }
    }
}
```
Used by angeo/module-aeo-audit v3+ when both modules are installed.
How manual robots.txt content interacts with the AEO block
The module's admin form (Inject mode) does not modify the existing Magento admin robots.txt textarea (Content → Design → Configuration → Edit Custom instruction of robots.txt). Both sources coexist:
- Your custom block is preserved untouched.
- The AEO block is prepended at render time.
- Re-running the plugin is idempotent — the AEO block is replaced, not stacked.
If you'd rather manage AI bot rules yourself, either disable the module (bin/magento module:disable Angeo_RobotsTxtAeo) or untick individual bots in admin.
Compatibility
| Environment | Status |
|---|---|
| Magento 2.4.6 (PHP 8.1) | ✅ |
| Magento 2.4.7 (PHP 8.2 / 8.3) | ✅ |
| Magento 2.4.8 (PHP 8.3 / 8.4) | ✅ |
| Magento Open Source / Commerce / Cloud | ✅ |
| Hyvä / PWA Studio | ✅ (robots.txt is server-side) |
| Multi-store / multi-website | ✅ |
| Magento_Sitemap not installed | ✅ (soft dependency, no-op resolver) |
| Varnish / Fastly | ⚠️ purge CDN cache after config changes |
License
MIT. See LICENSE.
Security
See SECURITY.md for the disclosure policy.
Contributing
See CONTRIBUTING.md.
Changelog
All notable changes to this project will be documented in this file.
The format is based on Keep a Changelog,
and this project adheres to Semantic Versioning.
[3.0.0] — 2026-06-11
Major release. Every feature is backed by primary-source verification
(vendor crawler docs fetched directly, IETF draft-ietf-aipref-attach
rev. 2026-04-28, RSL 1.0, RFC 9309). Full design in
docs/SPECIFICATION-3.0.0.md. Upgrade note: default behaviour is unchanged —
all new emission features ship disabled; the only output difference on
upgrade is that previously-destroyed third-party directives are now
preserved (a fix).
Fixed
- Data loss of third-party robots.txt directives (Tier 1). INJECT mode rebuilt the file via parse→render, but the renderer never re-emitted unrecognised directives, silently deleting Content-Signal:, Content-Usage: and License: lines (Cloudflare manages Content Signals on 3.8M+ domains). The parser now captures top-level License: lines and group-scoped extra directives, and the renderer re-emits all of them; injection remains idempotent.
- Crawl-delay metadata contradiction. Anthropic's 2026-02 docs state Crawl-delay IS supported; the hardcoded ignore-list said otherwise. Replaced by a per-bot tri-state supports_crawl_delay (emit only on documented support; unknown = suppress). BOTS_IGNORING_CRAWL_DELAY is retained but @deprecated.
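The tri-state rule reduces to one guard. A minimal sketch, assuming a hypothetical crawlDelayLine() helper and a nullable bool standing in for supports_crawl_delay (true = documented support, false = documented as ignored, null = unknown); treating a non-positive delay as "not configured" is also an assumption.

```php
<?php

/**
 * Return the Crawl-delay line to emit for a bot, or null to suppress it.
 * Only documented support (true) emits; false and unknown (null) suppress.
 */
function crawlDelayLine(?bool $supportsCrawlDelay, ?int $configuredDelay): ?string
{
    if ($supportsCrawlDelay !== true || $configuredDelay === null || $configuredDelay <= 0) {
        return null;
    }
    return 'Crawl-delay: ' . $configuredDelay;
}
```

Defaulting unknown to "suppress" keeps output audit-clean when vendor documentation is silent, at the cost of dropping a delay the bot might in fact honour.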
Added
- RFC 9309 evaluation engine (Model\Rep\RepMatcher): group selection with same-token merging and * fallback, longest-match-wins, Allow tie-break, * / $ patterns, case-sensitive paths. Validate (admin + CLI) now reports per-bot effective access to /: a bot that is present but blocked at the root is a failure, with the blocking rule shown.
- Catalogue (vendor-verified): Claude-SearchBot (Anthropic, on) and OAI-AdsBot (OpenAI ads validation, off); anthropic-ai marked deprecated by Anthropic (default off); per-bot category, token_only (Google-Extended never appears in logs), ip_ranges_url, docs_url; Unicode-dash normalisation in UA sanitisation; registry cache key bumped.
- IETF Content-Usage emission (draft-ietf-aipref-attach), off by default: a configurable aipref preference (default train-ai=n) appended to every managed bot group (and the wildcard group in REPLACE mode).
- Cloudflare Content-Signal emission, off by default: tri-state search / ai-train / ai-input with defaults mirroring Cloudflare's managed rollout; unset signals are omitted.
- RSL 1.0 License: directive, off by default: a global directive with an https-validated URL, deduplicated against existing License lines.
- angeo:robots:verify-bot-ip <ip> CLI: checks an address against the vendor-published IP range endpoints (OpenAI, Perplexity) with IPv4/IPv6 CIDR matching, to detect UA spoofing.
- Public API: RobotsStatusInterface::getEffectiveAccess() and ::getContentSignalLines() (@since 3.0.0).
- Tests: RepMatcherTest (RFC 9309 normative cases), CidrMatcherTest, parser round-trip tests, BotDefinition metadata tests.
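The rule-evaluation half of RFC 9309 (longest match wins, Allow wins ties, * and trailing $ patterns, case-sensitive paths) can be sketched as below. It assumes a group has already been selected and its rules collected as [type, pattern] pairs; isAllowed() and patternToRegex() are hypothetical names, not the API of Model\Rep\RepMatcher. Specificity is measured by pattern length in bytes.

```php
<?php

/** Translate an RFC 9309 path pattern into an anchored PCRE (case-sensitive). */
function patternToRegex(string $pattern): string
{
    $anchored = str_ends_with($pattern, '$');
    if ($anchored) {
        $pattern = substr($pattern, 0, -1);
    }
    $parts = array_map(
        static fn (string $part): string => preg_quote($part, '#'),
        explode('*', $pattern)
    );
    // "*" matches any sequence; a trailing "$" pins the match to the end of the path.
    return '#^' . implode('.*', $parts) . ($anchored ? '\z' : '') . '#s';
}

/**
 * @param list<array{0: string, 1: string}> $rules [type, pattern]; type is 'allow' or 'disallow'
 */
function isAllowed(array $rules, string $path): bool
{
    $bestLength = -1;
    $allowed = true; // no matching rule means access is allowed
    foreach ($rules as [$type, $pattern]) {
        if ($pattern === '') {
            continue; // an empty "Disallow:" matches nothing
        }
        if (preg_match(patternToRegex($pattern), $path) !== 1) {
            continue;
        }
        $length = strlen($pattern);
        $isAllow = strtolower($type) === 'allow';
        // Longest match wins; on equal length, Allow beats Disallow.
        if ($length > $bestLength || ($length === $bestLength && $isAllow)) {
            $bestLength = $length;
            $allowed = $isAllow;
        }
    }
    return $allowed;
}
```

For example, with Disallow: / and Allow: /products/ in one group, /products/manual.pdf is allowed because the 10-byte Allow pattern outranks the 1-byte Disallow.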
Changed
- BotDefinition constructor gains optional metadata parameters; Config::resolveBotOverrides() carries all metadata through (2.x dropped criticalForAudit on override resolution).
- Crawl-delay is now emitted only for bots with documented support. Stores that configured a delay for e.g. Applebot will no longer see it in output (previously emitted; vendor support undocumented).
[2.0.1] — 2026-06-11
Security-hardening release. No functional or configuration changes — drop-in
upgrade from 2.0.0.
Security
- UrlFetcher SSRF hardening. libcurl's CURLOPT_FOLLOWLOCATION is now disabled; redirects are followed manually (max 3 hops) and every hop is validated: the target scheme must be http/https, the target host must equal the original host (a leading www. is the only tolerated difference), and https:// → http:// downgrades are refused. A redirect violating the policy fails the fetch immediately, is logged, and is never retried. Previously an open redirect (or compromised upstream) on the store's own robots.txt could steer the admin Validate/Preview fetch, and its response body, to an arbitrary internal host.
- Outbound URLs restricted to http/https at both the validation layer (scheme allow-list before any network activity) and the transport layer (CURLOPT_PROTOCOLS = CURLPROTO_HTTP | CURLPROTO_HTTPS).
- The store request parameter is now validated. The new Model\Adminhtml\StoreIdResolver accepts only digit-strings and verifies the store exists via StoreRepositoryInterface before use. Previously the Preview/Validate controllers blind-cast the raw parameter to int, allowing probing of arbitrary store IDs.
- No raw exception leakage from admin AJAX endpoints. Unexpected exceptions in the Preview/Validate controllers are now logged server-side and a generic message is returned; only intentional LocalizedException messages reach the client.
- dashboard.js no longer concatenates server-provided strings into innerHTML. showAlert() builds DOM nodes via textContent / createTextNode, removing the HTML-injection sink entirely (previously reachable only with admin-controlled payloads, i.e. self-XSS class; fixed on defense-in-depth grounds).
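A per-hop check for the redirect policy above might look like this. isRedirectHopAllowed() is a hypothetical helper; it assumes the Location header has already been resolved to an absolute URL, and it treats a leading www. as interchangeable in either direction.

```php
<?php

/**
 * Validate one redirect hop: http/https only, no https -> http downgrade,
 * and the same host (ignoring a leading "www.").
 */
function isRedirectHopAllowed(string $fromUrl, string $toUrl): bool
{
    $from = parse_url($fromUrl);
    $to = parse_url($toUrl);
    if ($from === false || $to === false
        || !isset($from['scheme'], $from['host'], $to['scheme'], $to['host'])) {
        return false; // unparseable or host-less target (e.g. file:///...)
    }

    $fromScheme = strtolower($from['scheme']);
    $toScheme = strtolower($to['scheme']);
    if (!in_array($toScheme, ['http', 'https'], true)) {
        return false; // scheme allow-list
    }
    if ($fromScheme === 'https' && $toScheme === 'http') {
        return false; // downgrade refused
    }

    $normalise = static function (string $host): string {
        $host = strtolower($host);
        return str_starts_with($host, 'www.') ? substr($host, 4) : $host;
    };
    return $normalise($from['host']) === $normalise($to['host']);
}
```

A manual redirect loop would call this for each of at most 3 hops and abort, without retrying, on the first false.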
Added
- Unit tests for the redirect policy, scheme allow-list, and retry semantics (UrlFetcherTest) and for the new StoreIdResolver (Test/Unit/Model/Adminhtml/StoreIdResolverTest).
Changed
- Retry semantics clarified: HTTP 5xx and network errors are retried with
backoff; HTTP 4xx, redirect-policy violations, and redirect loops fail
deterministically without retries (4xx behaviour unchanged from 2.0.0).
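These retry semantics reduce to a small classifier. A sketch with hypothetical names; the 200 ms base for the exponential backoff is an assumption, not the module's configured value.

```php
<?php

/**
 * Decide whether a failed fetch should be retried.
 * $httpStatus is null when no HTTP response arrived (network error);
 * $redirectFailure covers redirect-policy violations and redirect loops.
 */
function shouldRetry(?int $httpStatus, bool $redirectFailure): bool
{
    if ($redirectFailure) {
        return false; // deterministic failure, never retried
    }
    if ($httpStatus === null) {
        return true; // network error
    }
    return $httpStatus >= 500 && $httpStatus <= 599; // 5xx retried; 4xx and others are not
}

/** Exponential backoff in milliseconds for attempt 1, 2, 3, ... */
function backoffDelayMs(int $attempt, int $baseMs = 200): int
{
    return $baseMs * (2 ** ($attempt - 1));
}
```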
[2.0.0] — 2026-05-29
Added
- 5 new built-in bots aligned with the angeo/module-aeo-audit v3 catalogue: Claude-User, Applebot, cohere-ai, Amazonbot, Meta-ExternalAgent. Out-of-the-box, an install now produces a robots.txt that passes the audit module's robots_txt check.
- Angeo\RobotsTxtAeo\Api\RobotsStatusInterface: public read-only API exposing the effective robots.txt, enabled bot UAs, sitemaps, and mode. Cross-module integration with angeo/module-aeo-audit (and any third-party consumer) is now zero-overhead; no HTTP round-trip is required.
- Dedicated cache type angeo_robots_txt_aeo: surfaces in System → Cache Management and can be flushed in isolation.
- Backend models for config validation: PathList and CrawlDelay normalise input on save (admin form, config:set, direct DB writes).
- i18n/en_US.csv: admin labels are now translatable.
- criticalForAudit metadata on BotDefinition: flags bots whose blocking causes the AEO Audit to FAIL (currently OAI-SearchBot, GPTBot, Google-Extended).
Changed
- Audit-clean output sanitisation: the emitted robots.txt no longer triggers syntax warnings from the AEO Audit:
  - Crawl-delay directives are suppressed on bots documented to ignore them (GPTBot, ClaudeBot, Google-Extended).
  - When a bot has Disallow: /, the implicit Allow: / fallback is dropped, so both directives are never emitted on the same agent.
  - User-agent strings are sanitised at the BotDefinition layer: any /version suffix is stripped (e.g. GPTBot/1.0 → GPTBot).
- Sitemap URLs are upgraded to https:// when the store base URL is HTTPS.
- RobotsInjector::stripStandaloneBotEntries rewritten to use RobotsTxtParser instead of a hand-rolled regex state machine. It is cleaner, about 50 lines smaller, and now handles edge-case input the regex version got wrong.
- Plugin\RobotsModelPlugin: short-circuits before building the injector graph when the module is disabled for the current store.
- composer.json: PHP requirement loosened to ~8.1.0||~8.2.0||~8.3.0||~8.4.0, added a hard dependency on magento/module-robots, pinned magento/framework to ^103.0.
- Admin Preview block and Dashboard template are now CSP-friendly: all inline <style> and <script> removed in favour of dedicated CSS/JS assets loaded via the layout.
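The sanitisation rules above, plus the Unicode-dash normalisation added in 3.0.0, can be sketched as two small functions. Both names are hypothetical, and the exact set of dash code points normalised here (U+2010 to U+2015 and U+2212) is an assumption.

```php
<?php

/** Reduce a catalogue UA to its bare product token. */
function sanitiseUserAgent(string $ua): string
{
    // Normalise Unicode dashes (U+2010..U+2015, U+2212) to ASCII "-".
    $ua = preg_replace('/[\x{2010}-\x{2015}\x{2212}]/u', '-', $ua) ?? $ua;
    // Strip any "/version" suffix, e.g. "GPTBot/1.0" -> "GPTBot".
    return trim(explode('/', $ua, 2)[0]);
}

/**
 * Render one bot group. The implicit "Allow: /" is only added when the bot
 * is not disallowed at the root, so both never appear on the same agent.
 *
 * @param string[] $allow    explicit Allow paths; empty means "use the implicit Allow: /"
 * @param string[] $disallow Disallow paths
 */
function renderBotGroup(string $ua, array $allow, array $disallow): string
{
    $lines = ['User-agent: ' . sanitiseUserAgent($ua)];
    if ($allow === [] && !in_array('/', $disallow, true)) {
        $allow = ['/'];
    }
    foreach ($allow as $path) {
        $lines[] = 'Allow: ' . $path;
    }
    foreach ($disallow as $path) {
        $lines[] = 'Disallow: ' . $path;
    }
    return implode("\n", $lines) . "\n";
}
```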
Removed
- Remote bot registry feature: the runtime overlay from https://angeo.dev/registry/bots.json is gone, and the bot catalogue is now release-managed only. Removed:
  - BotRegistry::refresh(), all signature verification, and the cache layer for the overlay.
  - Cron\RefreshRemoteRegistry and etc/crontab.xml.
  - Console\Command\RegistryUpdateCommand (bin/magento angeo:robots:registry:update).
  - <remote_registry> group in etc/adminhtml/system.xml and etc/config.xml.
  - Config::isRemoteRegistryEnabled() and Config::getRemoteRegistryUrl().
  - HMAC-SHA256 signature verification and X-Angeo-Signature header support.
  - Response headers carried by FetchResult (no consumer remained).
  - BotRegistry constructor parameters ScopeConfigInterface, UrlFetcher, and DeploymentConfig; it now takes cache, serializer, and logger only.

  Rationale: the overlay was a security trade-off (anyone holding the endpoint could inject UA strings into every install's robots.txt) and a half-implemented UX (registry-added bots had no admin checkbox, so admins couldn't enable them). New bots ship via module releases; that cadence is already adequate (the bot landscape changes every 2–3 months; module releases are faster).
- Model\Bot\RemoteRegistryUpdater and Model\Bot\UpdateResult: orphan duplicate of the now-also-removed BotRegistry::refresh().
- Test\Unit\Model\Bot\RemoteRegistryUpdaterTest: tests for the removed classes.
- Commented-out half-finished DI block in etc/di.xml.
Migration notes
- Run bin/magento setup:upgrade && bin/magento setup:di:compile && bin/magento cache:flush.
- The new dedicated cache type angeo_robots_txt_aeo appears in System → Cache Management; leave it enabled.
- Existing per-store config under angeo_robots_txt_aeo/general/*, angeo_robots_txt_aeo/bots/*, angeo_robots_txt_aeo/bot_overrides/*, and angeo_robots_txt_aeo/sitemap/* is preserved verbatim.
- New bots (claude_user, applebot, etc.) inherit their default_enabled state from config.xml on first read.
- Sites that had angeo_robots_txt_aeo/remote_registry/* set in the DB or app/etc/config.php will see those values become inert. This is harmless, but you may run bin/magento config:set angeo_robots_txt_aeo/remote_registry/enabled 0 before upgrading if you want a clean DB.
- Consumers of RemoteRegistryUpdater or BotRegistry::refresh() (none known) should migrate to release tracking: new bots appear in BotRegistry::BUILTIN_BOTS with each release.
[1.1.0] — 2026-04-25
Added
- Per-bot path overrides: each bot can now have its own Allow:, Disallow:, and Crawl-delay: directives, configurable from admin under "AI Crawler Path Overrides".
- Sitemap: directive support: the module now emits Sitemap: lines into robots.txt. Three modes: Auto (read from the Magento Sitemap module or fall back to /sitemap.xml), Custom (admin textarea), or None.
- Remote bot registry (https://angeo.dev/registry/bots.json): optional opt-in source for newly emerged AI crawlers. New entries are added as suggestions with default_enabled = false; the admin must explicitly opt in. Daily cron refresh.
- Multi-store / multi-website support: system.xml now declares store-scope fields and Config reads through ScopeInterface::SCOPE_STORE. Each store can have its own bot configuration.
- RobotsTxtParser: proper line-by-line state machine for parsing robots.txt.
- BotRegistry: central registry of bot definitions with caching and remote overlay.
- bin/magento angeo:robots:registry:update: CLI command to refresh the remote registry.
- CLI --store and --insecure flags for the preview and validate commands.
- Cron job angeo_robots_txt_aeo_registry_refresh (daily at 03:17).
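A line-by-line state machine of the kind RobotsTxtParser introduced can be sketched as follows. parseRobotsTxt() and its return shape are hypothetical. Grouping follows the RFC 9309 grammar: consecutive User-agent lines share one group, a User-agent line after a rule opens a new group, and blank lines are skipped. Sitemap and License lines are kept as top-level directives, as the 3.0.0 entry describes.

```php
<?php

/**
 * @return array{
 *   groups: list<array{agents: list<string>, rules: list<array{0: string, 1: string}>}>,
 *   global: list<array{0: string, 1: string}>
 * }
 */
function parseRobotsTxt(string $text): array
{
    $groups = [];
    $global = [];
    $current = null;       // index of the open group, null before the first one
    $lastWasAgent = false; // state: was the previous directive a User-agent line?

    foreach (preg_split('/\R/', $text) as $rawLine) {
        $line = trim(explode('#', $rawLine, 2)[0]); // drop comments
        if ($line === '' || !str_contains($line, ':')) {
            continue;
        }
        [$field, $value] = array_map('trim', explode(':', $line, 2));
        $field = strtolower($field);

        if ($field === 'user-agent') {
            if (!$lastWasAgent) {
                $groups[] = ['agents' => [], 'rules' => []];
                $current = count($groups) - 1;
            }
            $groups[$current]['agents'][] = $value;
            $lastWasAgent = true;
            continue;
        }

        $lastWasAgent = false;
        if ($field === 'sitemap' || $field === 'license' || $current === null) {
            $global[] = [$field, $value];                    // top-level directive
        } else {
            $groups[$current]['rules'][] = [$field, $value]; // allow, disallow, crawl-delay, ...
        }
    }
    return ['groups' => $groups, 'global' => $global];
}
```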
Changed
- UrlFetcher now uses Magento\Framework\HTTP\Client\Curl instead of file_get_contents. TLS verification is enabled by default; --insecure is available as an explicit opt-in. Adds proper timeouts, retries with exponential backoff, and follow-redirects.
- All HTTP responses are now wrapped in FetchResult (immutable value object).
- Config::BOTS constant removed; bot definitions live in BotRegistry.
- Plugin now resolves the store ID via StoreManagerInterface so multi-store installations get correct per-store output.
[1.0.0] — 2026-03-12
Added
- Initial release. Plugin on Magento\Robots\Model\Robots::getData().
- Default catalog of 8 AI crawler bots.
- Admin configuration UI under Stores → Configuration → Angeo → Robots.txt AEO.
- ACL, sequence on Magento_Robots, MIT licensed.
| Version | Stability | QA Status | Compatibility | Released |
|---|---|---|---|---|
| 3.0.0 | stable | Fail | Magento 2.4.7-2.4.8 | 2026-06-14 18:56:23 |
| 2.0.0 | stable | Fail | Magento 2.4.7-2.4.8 | 2026-05-29 21:04:50 |
| 1.0.1 | stable | Not tested | Not yet tested | 2026-05-28 19:00:39 |
| 1.0.0 | stable | Not tested | Not yet tested | 2026-04-25 20:57:57 |
Requires 6
| Package | Constraint |
|---|---|
| magento/framework | ^103.0 |
| magento/module-backend | ^102.0 |
| magento/module-config | ^101.2 |
| magento/module-robots | ^101.0 |
| magento/module-store | ^101.1 |
| php | ~8.1.0||~8.2.0||~8.3.0||~8.4.0 |
Requires-dev 2
| Package | Constraint |
|---|---|
| magento/magento-coding-standard | ^33.0 |
| phpunit/phpunit | ^10.0 |
Suggests 2
| Package | Reason |
|---|---|
| angeo/module-aeo-audit | Verify your robots.txt AEO signal after installation. v3+ integrates via Angeo\RobotsTxtAeo\Api\RobotsStatusInterface for zero-overhead validation. |
| magento/module-sitemap | Enables auto-detection of Sitemap URLs from the Magento Sitemap module. |
Compatibility
Each Magento release line is installed on its supported PHP versions, then the module is built (DI compilation + static-content deploy) and its unit and integration suites are run. The matrix shows the lines and PHP versions the module is confirmed to install and run on. Code-quality results further down (phpstan, phpcs, …) are reported separately and never affect compatibility.
Code Quality
Advisory checks against the module's source. Static analysis runs once across the whole module; PHPStan re-runs per Magento + PHP version because resolvable symbols differ between releases. These NEVER affect the Compatibility badge. A phpcs finding can't make a module incompatible.
Static analysis
Coding standards (phpcs), mess detection (phpmd), copy-pasted code (cpd), PHP cross-version compatibility, composer.json validity. Each runs once for the whole module.
| Tool | Status | Findings | Summary |
|---|---|---|---|
| PHPCS | Fail | 254 | 9 errors, 245 warnings (ruleset: Magento2) — 171 auto-fixable with phpcbf |
| PHPMD | Warning | 33 | 33 rule violations (CyclomaticComplexity:7, NPathComplexity:7, TooManyPublicMethods:5, ErrorControlOperator:4, MissingImport:4) |
| Cpd | Pass | 0 | |
| Composer validate | Info | 1 | valid; 1 advisory note (composer validate --strict) |
PHPStan
Type-checks the module's PHP against a real Magento install at the configured gate level. Re-runs per Magento and PHP version because resolvable symbols differ between releases.
Tests
Unit and integration suites, run for each applicable Magento and PHP version. A test failure speaks to the module's behaviour, not its compatibility with a Magento line, so it is reported here separately and never reddens the compatibility matrix.
Unit tests
Integration tests
| Magento | PHP 8.2 | PHP 8.3 | PHP 8.4 | PHP 8.5 |
|---|---|---|---|---|
| 2.4.7 | N/A | N/A | | |
| 2.4.8 | N/A | N/A | | |
| 2.4.9 | N/A | N/A | | |
Security
Security checks run directly against the module: an audit of its declared dependencies for known vulnerabilities (composer audit) and a scan of its source for malware and web-shell signatures. Each runs once. A malware detection fails the version outright.
More from angeo
Magento 2 module for AI Engine Optimization (AEO). Generates spec-compliant llms.txt and llms-full.txt per llmstxt.org standard, plus streaming JSONL for vector indexing. Multi-store, multi-website, CLI, cron, async admin UI, Page Builder-aware sanitization, customer-group pricing, atomic writes, ETag/Cache-Control, .md mirrors.
Live AI brand visibility audit for Magento 2. Queries ChatGPT, Claude, Perplexity, Gemini and Groq with brand-probing prompts and scores real-world AI recall, citation rate and recommendation presence. Extends angeo/module-aeo-audit v3 via CheckerInterface as the 16th signal, alongside the 15 built-in technical checks.
Magento 2 AEO (AI Engine Optimization) Audit. v3 covers 15 signals — robots.txt AI bots, llms.txt + llms.jsonl, Product / Organization / FAQ schema, merchant return + shipping policies, sitemap.xml, UCP profile, AI product feed, OG tags, canonical + hreflang, JSON-LD quality, well-known endpoint matrix, Core Web Vitals via CrUX. Score Trend dashboard, Admin UI, cron, dynamic fix commands, dependency-injected extension point for custom checkers.
Spec-compliant Universal Commerce Protocol (UCP) profile generator for Magento 2. Generates /.well-known/ucp at protocol version 2026-04-08 with ECDSA P-256 signing keys, declared capabilities, and proper cache headers. v0.1.x is profile-only — catalog, cart, checkout endpoints land in later releases.