# angeo/module-robots-txt-aeo

> Magento 2 module for AI Engine Optimization (AEO). Injects AI crawler rules (OAI-SearchBot, GPTBot, ChatGPT-User, PerplexityBot, Perplexity-User, Google-Extended, ClaudeBot, anthropic-ai, Claude-User, Applebot, cohere-ai, Amazonbot, Meta-ExternalAgent) into robots.txt — without overwriting your existing configuration. Supports per-bot Allow/Disallow lists, Crawl-delay, Sitemap directives, multi-store, and a public Api\RobotsStatusInterface for cross-module integration with angeo/module-aeo-audit.

`composer require angeo/module-robots-txt-aeo`

Canonical URL: https://packagento.com/angeo/module-robots-txt-aeo

## At a glance

- **Vendor**: angeo (https://packagento.com/angeo.md)
- **Latest version**: 3.0.0 — released 2026-06-14
- **Pricing**: Free
- **Package type**: Magento 2 module
- **Status**: active, accepting new buyers

## Installation

Packagento is licence-gated, so even free packages need a licence on a project before Composer can resolve them.

1. **Sign in or create an account** at https://packagento.com/customer/account/.

2. **Add the package to your account.** Open https://packagento.com/angeo/module-robots-txt-aeo and complete the free checkout. A licence is minted automatically.

3. **Create or pick a project, then activate the licence on it.**
   - Projects represent the Magento installs you deploy to. Manage them at https://packagento.com/projects/.
   - Activate the new licence on the project you'll deploy this package to. Activation is what generates the Composer credentials scoped to that project.

4. **Add the project credentials to your Magento codebase.**

   Copy the project's public and private keys from https://packagento.com/projects/ (open the project, then its Credentials tab), and add them to `auth.json`:

   ```json
   {
     "http-basic": {
       "packagento.com": {
         "username": "ppk_live_...",
         "password": "psk_live_..."
       }
     }
   }
   ```

   Add the Packagento Composer repository to `composer.json`:

   ```json
   {
     "repositories": [
       { "type": "composer", "url": "https://packagento.com" }
     ]
   }
   ```

5. **Install and apply.**

   ```bash
   composer require "angeo/module-robots-txt-aeo:*"
   bin/magento setup:upgrade
   bin/magento setup:di:compile
   bin/magento cache:flush
   ```

## What it does

Magento 2 module for AI Engine Optimization (AEO). It injects AI crawler rules into `robots.txt` without overwriting your existing configuration.

- **Bots covered**: OAI-SearchBot, GPTBot, ChatGPT-User, PerplexityBot, Perplexity-User, Google-Extended, ClaudeBot, anthropic-ai, Claude-User, Applebot, cohere-ai, Amazonbot, Meta-ExternalAgent
- **Features**: per-bot Allow/Disallow lists, Crawl-delay, Sitemap directives, multi-store support
- **Integration**: a public `Api\RobotsStatusInterface` for cross-module integration with `angeo/module-aeo-audit`

## README

[![Packagist](https://img.shields.io/packagist/v/angeo/module-robots-txt-aeo.svg)](https://packagist.org/packages/angeo/module-robots-txt-aeo)
[![License](https://img.shields.io/badge/license-MIT-blue.svg)](LICENSE)
[![PHP](https://img.shields.io/badge/php-8.1%20|%208.2%20|%208.3%20|%208.4-8892BF.svg)](https://php.net)
[![Magento](https://img.shields.io/badge/magento-2.4.6%20|%202.4.7%20|%202.4.8-EE672F.svg)](https://magento.com)

Injects AI crawler rules into your Magento 2 `robots.txt` — **without overwriting your existing configuration**.

Bots managed out-of-the-box: `OAI-SearchBot`, `GPTBot`, `ChatGPT-User`, `PerplexityBot`, `Perplexity-User`, `Google-Extended`, `ClaudeBot`, `anthropic-ai`, `Claude-User`, `Applebot`, `cohere-ai`, `Amazonbot`, `Meta-ExternalAgent`.

Fixes the **"robots.txt — AI Bot Access"** signal in [`angeo/module-aeo-audit`](https://packagist.org/packages/angeo/module-aeo-audit).

---

### What's new in 2.0

- **5 new built-in bots** aligned with the AEO Audit v3 catalogue: `Claude-User`, `Applebot`, `cohere-ai`, `Amazonbot`, `Meta-ExternalAgent`. An out-of-the-box install now passes the AEO Audit's `robots_txt` check.
- **Audit-clean output** — emitted robots.txt no longer triggers syntax warnings:
  - `Crawl-delay` suppressed on bots that ignore it (GPTBot, ClaudeBot, Google-Extended).
  - No `Allow: /` + `Disallow: /` conflict on the same agent.
  - Versioned UAs sanitised at the catalogue layer.
  - Sitemap URLs upgraded to `https://` when the store base URL is HTTPS.
- **`Api\RobotsStatusInterface`** — public read-only API for cross-module integration. Consumers like `angeo/module-aeo-audit` can wire to it and skip the HTTP round-trip.
- **Dedicated cache type** `angeo_robots_txt_aeo` — flush in isolation from System → Cache Management.
- **Backend validation** — `PathList` and `CrawlDelay` backend models normalise admin input on save.
- **CSP-clean admin UI** — no inline styles, no inline scripts.
- **i18n/en_US.csv** — admin labels are translatable.
- **Removed runtime remote-registry feature.** The bot catalogue is now release-managed only. Dynamic catalogue injection from an external endpoint was a security risk (anyone who controlled the endpoint could inject UA strings into every install's robots.txt) and was only half-implemented in the admin UI (bots added this way had no admin checkbox). New bots ship via module releases.
- **Removed orphan code** — the unused `RemoteRegistryUpdater` triplet from 1.x is gone.
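
The sitemap rule in the list above (upgrade to `https://` when the store base URL is HTTPS) can be sketched in a few lines. This is a Python illustration, not the module's PHP; `upgrade_sitemap_url` is a hypothetical name, and the same-host restriction is an assumption of this sketch rather than documented behaviour.

```python
from urllib.parse import urlsplit, urlunsplit


def upgrade_sitemap_url(sitemap_url: str, base_url: str) -> str:
    """Return sitemap_url with an https scheme when base_url is HTTPS.

    Assumption: only http:// URLs on the same host as the store are
    upgraded; anything else is returned unchanged.
    """
    base = urlsplit(base_url)
    sitemap = urlsplit(sitemap_url)
    same_host = sitemap.netloc.lower() == base.netloc.lower()
    if base.scheme == "https" and sitemap.scheme == "http" and same_host:
        return urlunsplit(sitemap._replace(scheme="https"))
    return sitemap_url
```

For example, `upgrade_sitemap_url("http://example-store.com/sitemap.xml", "https://example-store.com/")` returns the `https://` form, while a sitemap on a different host is left alone.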

See [CHANGELOG.md](CHANGELOG.md) for the full list.

---

### How it works

The module intercepts the robots.txt response at render time via a plugin on
`Magento\Robots\Model\Robots::getData()` and prepends a managed block of AI bot rules.
**No database writes. No filesystem changes.** Your existing admin config is untouched.

#### Inject mode (default — recommended)

```
## Angeo AEO — AI Crawler Rules
## https://angeo.dev | module-robots-txt-aeo
## Do not edit this block manually — manage via Stores > Config > Angeo > Robots.txt AEO

User-agent: OAI-SearchBot
Allow: /

User-agent: GPTBot
Allow: /

User-agent: ClaudeBot
Allow: /
Disallow: /admin/

User-agent: Claude-User
Allow: /

User-agent: Applebot
Allow: /

## End Angeo AEO block

User-agent: *
Disallow: /checkout/
... (your existing rules follow unchanged)

## Angeo AEO — Sitemaps
Sitemap: https://example-store.com/sitemap.xml
## End Angeo AEO sitemaps
```
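
To make the inject behaviour concrete, here is a minimal Python sketch of an idempotent "prepend a managed block" step, using the start and end marker lines from the example output above. It is an illustration only: the module itself is a PHP plugin, `render_block` and `inject` are hypothetical names, and the sketch ignores the extra header comments, per-bot overrides, and the sitemap block.

```python
# Marker text copied from the example output (\u2014 is the dash in that header).
START = "## Angeo AEO \u2014 AI Crawler Rules"
END = "## End Angeo AEO block"


def render_block(bots: list[str]) -> str:
    """Render one Allow-all group per bot between the two markers."""
    lines = [START]
    for bot in bots:
        lines += ["", f"User-agent: {bot}", "Allow: /"]
    lines += ["", END]
    return "\n".join(lines)


def inject(existing: str, bots: list[str]) -> str:
    """Prepend the managed block, dropping any earlier copy of it."""
    kept, inside = [], False
    for line in existing.splitlines():
        if line.strip() == START:
            inside = True            # skip everything up to the end marker
        elif inside and line.strip() == END:
            inside = False
        elif not inside:
            kept.append(line)        # the store's own rules pass through as-is
    rest = "\n".join(kept).strip("\n")
    return render_block(bots) + "\n\n" + rest + "\n"
```

Running `inject` on its own output yields the same text, which is the property that lets the block be regenerated on every render without accumulating copies.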

#### Replace mode

Regenerates the full robots.txt. Preserves your custom `Disallow` rules from the existing wildcard block. Use only if you want this module to own the entire file.
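
As a rough illustration of what "preserves your custom `Disallow` rules from the existing wildcard block" involves, the Python sketch below collects the `Disallow` paths that belong to `User-agent: *` groups. The function name and the grouping rules are assumptions made for this example; the module's PHP parser may treat edge cases differently.

```python
def wildcard_disallows(robots_txt: str) -> list[str]:
    """Collect Disallow paths from groups that include `User-agent: *`."""
    paths: list[str] = []
    agents: list[str] = []   # user-agents of the group currently being read
    in_rules = False         # True once the current group has seen a rule
    for raw in robots_txt.splitlines():
        line = raw.split("#", 1)[0].strip()   # drop comments
        if ":" not in line:
            continue
        field, value = (part.strip() for part in line.split(":", 1))
        field = field.lower()
        if field == "user-agent":
            if in_rules:     # a User-agent line after rules starts a new group
                agents, in_rules = [], False
            agents.append(value)
        elif field in ("allow", "disallow"):
            in_rules = True
            if field == "disallow" and "*" in agents and value and value not in paths:
                paths.append(value)
    return paths
```

Rules that belong to named bots (for example a `Disallow` under `User-agent: GPTBot`) are deliberately left out, since only the wildcard group is carried over.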

---

### Installation

```bash
composer require angeo/module-robots-txt-aeo
bin/magento module:enable Angeo_RobotsTxtAeo
bin/magento setup:upgrade
bin/magento setup:di:compile
bin/magento cache:flush
```

That's it. The module is enabled with sensible defaults — all 10 mainstream AI bots are allowed; the 3 lower-traffic bots (cohere-ai, Amazonbot, Meta-ExternalAgent) are catalogued but disabled by default.

---

### Configuration

`Stores → Configuration → Angeo → Robots.txt AEO`

| Section | Purpose |
|---|---|
| **General** | Enable/disable, choose Inject or Replace mode |
| **AI Crawlers** | Tick which bots to allow. Bots marked ★ are critical for AEO Audit pass |
| **AI Crawler Path Overrides** | Per-bot `Allow:`, `Disallow:`, `Crawl-delay:` |
| **Sitemap Directive** | Auto-detect from `Magento_Sitemap`, manual list, or none |
| **Live Preview** | Renders the AEO block that will be injected |

All settings respect store scope — multi-store installs can configure each store independently.

---

### CLI

```bash
# Render what would be emitted, without applying it
bin/magento angeo:robots:preview [--store=N]

# Fetch the live robots.txt and check enabled bot rules are present
bin/magento angeo:robots:validate [--store=N] [--insecure]
```

`validate` exits non-zero when expected bot rules are missing from the live
file — useful in post-deploy smoke tests:

```yaml
# .github/workflows/post-deploy.yml
- run: bin/magento angeo:robots:validate
```
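
The presence check that `validate` is described as performing can be approximated as follows. This Python sketch is not the command's implementation: `missing_bots` and `main` are hypothetical names, the robots.txt text is passed in rather than fetched, and only the "is there a `User-agent` line for each enabled bot" part is covered.

```python
import sys


def missing_bots(robots_txt: str, expected: list[str]) -> list[str]:
    """Return the expected bot tokens that have no User-agent line."""
    present = set()
    for raw in robots_txt.splitlines():
        line = raw.split("#", 1)[0].strip()      # drop comments
        field, sep, value = line.partition(":")
        if sep and field.strip().lower() == "user-agent":
            present.add(value.strip().lower())   # tokens compare case-insensitively
    return [bot for bot in expected if bot.lower() not in present]


def main(robots_txt: str, expected: list[str]) -> int:
    """Report missing bots on stderr and return a shell-style exit code."""
    missing = missing_bots(robots_txt, expected)
    for bot in missing:
        print(f"missing rule for {bot}", file=sys.stderr)
    return 1 if missing else 0
```

In a smoke test, the return value of `main` would be passed to `sys.exit`, so a missing bot fails the pipeline step in the same way the non-zero exit of `validate` does.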

For a full AEO scoring of robots.txt (critical-bot checks, syntax warnings,
sitemap quality) install [`angeo/module-aeo-audit`](https://packagist.org/packages/angeo/module-aeo-audit).
It reads the effective output of this module via `Api\RobotsStatusInterface` —
no HTTP round-trip when both modules are installed.

---

### Cross-module integration (Api\RobotsStatusInterface)

The module exposes a public read-only API that consumer modules can wire to via
DI. The coupling is soft: consumers check `interface_exists()` before
declaring the dependency, so they keep working when this module is not installed.

```php
use Angeo\RobotsTxtAeo\Api\RobotsStatusInterface;

class MyChecker
{
    public function __construct(
        private readonly ?RobotsStatusInterface $robotsStatus = null,
    ) {}

    // The rest of this example is truncated in this copy of the README.
}
```

_(README truncated for .md surface. Full README on https://packagento.com/angeo/module-robots-txt-aeo.)_

## Changelog

All notable changes to this project will be documented in this file.

The format is based on [Keep a Changelog](https://keepachangelog.com/en/1.1.0/),
and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0.html).

### [3.0.0] — 2026-06-11

Major release. Every feature is backed by primary-source verification
(vendor crawler docs fetched directly, IETF draft-ietf-aipref-attach
rev. 2026-04-28, RSL 1.0, RFC 9309). Full design in
docs/SPECIFICATION-3.0.0.md. Upgrade note: default behaviour is unchanged —
all new emission features ship disabled; the only output difference on
upgrade is that previously-destroyed third-party directives are now
preserved (a fix).

#### Fixed
- **Data loss of third-party robots.txt directives (Tier 1).** INJECT mode
  rebuilt the file via parse→render but the renderer never re-emitted
  unrecognised directives — silently deleting `Content-Signal:`,
  `Content-Usage:` and `License:` lines (Cloudflare manages Content Signals
  on 3.8M+ domains). The parser now captures top-level `License:` lines and
  group-scoped extra directives, and the renderer re-emits all of them;
  injection remains idempotent.
- **Crawl-delay metadata contradiction.** Anthropic's 2026-02 docs state
  Crawl-delay IS supported; the hardcoded ignore-list said otherwise.
  Replaced by per-bot tri-state `supports_crawl_delay` (emit only on
  documented support; unknown = suppress). `BOTS_IGNORING_CRAWL_DELAY`
  retained but @deprecated.
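
The data-loss fix above comes down to a parser that keeps what it does not recognise and a renderer that writes it back. The Python sketch below shows that shape under simplifying assumptions: `Group`, `parse`, and `render` are hypothetical names, `License` and `Sitemap` are treated as top-level lines, and every other non-rule line inside a group is stored as an "extra". It is not the module's PHP parser.

```python
from dataclasses import dataclass, field


@dataclass
class Group:
    agents: list[str] = field(default_factory=list)
    rules: list[tuple[str, str]] = field(default_factory=list)    # Allow / Disallow
    extras: list[tuple[str, str]] = field(default_factory=list)   # unrecognised lines


def parse(text: str) -> tuple[list[tuple[str, str]], list[Group]]:
    top_level: list[tuple[str, str]] = []
    groups: list[Group] = []
    current = None
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()
        name, sep, value = line.partition(":")
        if not sep:
            continue
        name, value = name.strip(), value.strip()
        key = name.lower()
        if key == "user-agent":
            # Consecutive User-agent lines share one group.
            if current is None or current.rules or current.extras:
                current = Group()
                groups.append(current)
            current.agents.append(value)
        elif current is None or key in ("license", "sitemap"):
            top_level.append((name, value))
        elif key in ("allow", "disallow"):
            current.rules.append((name, value))
        else:
            current.extras.append((name, value))   # e.g. Content-Signal
    return top_level, groups


def render(top_level, groups) -> str:
    out = []
    for group in groups:
        out += [f"User-agent: {agent}" for agent in group.agents]
        out += [f"{n}: {v}" for n, v in group.rules + group.extras]
        out.append("")
    out += [f"{n}: {v}" for n, v in top_level]
    return "\n".join(out).rstrip("\n") + "\n"
```

In this sketch the first render may reorder lines within a group (rules first, then extras), but rendering the parsed output again produces identical text, so nothing is lost and repeated passes are stable.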

#### Added
- **RFC 9309 evaluation engine** (`Model\Rep\RepMatcher`): group selection
  with same-token merging and `*` fallback, longest-match-wins, Allow
  tie-break, `*`/`$` patterns, case-sensitive paths. Validate (admin + CLI)
  now reports per-bot *effective* access to `/` — a bot that is present but
  blocked at the root is a failure, with the blocking rule shown.
- **Catalogue (vendor-verified):** `Claude-SearchBot` (Anthropic, on) and
  `OAI-AdsBot` (OpenAI ads validation, off); `anthropic-ai` marked
  deprecated by Anthropic (default off); per-bot `category`, `token_only`
  (Google-Extended never appears in logs), `ip_ranges_url`, `docs_url`;
  Unicode-dash normalisation in UA sanitisation; registry cache key bumped.
- **IETF `Content-Usage` emission** (draft-ietf-aipref-attach), off by
  default: configurable aipref preference (default `train-ai=n`) appended to
  every managed bot group (and the wildcard group in REPLACE mode).
- **Cloudflare `Content-Signal` emission**, off by default: tri-state
  search / ai-train / ai-input with defaults mirroring Cloudflare's managed
  rollout; unset signals are omitted.
- **RSL 1.0 `License:` directive**, off by default: global directive with an
  https-validated URL, deduplicated against existing License lines.
- **`angeo:robots:verify-bot-ip <ip>` CLI:** checks an address against the
  vendor-published IP range endpoints (OpenAI, Perplexity) with IPv4/IPv6
  CIDR matching, to detect UA spoofing.
- Public API: `RobotsStatusInterface::getEffectiveAccess()` and
  `::getContentSignalLines()` (@since 3.0.0).
- Tests: RepMatcherTest (RFC 9309 normative cases), CidrMatcherTest, parser
  round-trip tests, BotDefinition metadata tests.
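
The evaluation rules listed for the RFC 9309 engine (group selection with merging and `*` fallback, longest match wins, Allow on ties, `*`/`$` patterns, case-sensitive paths) fit in a short sketch. The Python below is an illustration, not `Model\Rep\RepMatcher` itself: the names are hypothetical, pattern length is counted in characters rather than octets, and percent-encoding normalisation is left out.

```python
import re

Rule = tuple[str, str]                  # ("allow" | "disallow", path pattern)
Group = tuple[list[str], list[Rule]]    # (user-agent tokens, rules)


def select_rules(groups: list[Group], bot: str) -> list[Rule]:
    """Merge all groups naming the bot; use the `*` groups if none do."""
    token = bot.lower()
    if not any(token in (a.lower() for a in agents) for agents, _ in groups):
        token = "*"
    return [rule for agents, rules in groups
            if token in (a.lower() for a in agents)
            for rule in rules]


def to_regex(pattern: str) -> re.Pattern:
    """Translate `*` (any characters) and a trailing `$` (end of path)."""
    anchored = pattern.endswith("$")
    body = pattern[:-1] if anchored else pattern
    regex = ".*".join(re.escape(part) for part in body.split("*"))
    return re.compile(regex + (r"\Z" if anchored else ""))


def is_allowed(groups: list[Group], bot: str, path: str) -> bool:
    best_len, allowed = -1, True        # no matching rule means allowed
    for kind, pattern in select_rules(groups, bot):
        if not pattern or not to_regex(pattern).match(path):
            continue                    # an empty pattern matches nothing
        length = len(pattern)
        # The longest pattern wins; on equal length, Allow wins.
        if length > best_len or (length == best_len and kind == "allow"):
            best_len, allowed = length, kind == "allow"
    return allowed
```

Calling `is_allowed(groups, "GPTBot", "/")` answers the "effective access to `/`" question that validation reports: a bot can be present in the file and still be blocked at the root by a longer or equally specific `Disallow`.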

#### Changed
- `BotDefinition` constructor gains optional metadata parameters;
  `Config::resolveBotOverrides()` carries all metadata through (2.x dropped
  `criticalForAudit` on override resolution).
- Crawl-delay is now emitted **only** for bots with documented support —
  stores that configured a delay for e.g. Applebot will no longer see it in
  output (previously emitted; vendor support undocumented).
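
The tri-state rule behind this change is small enough to show directly. In the Python sketch below, `crawl_delay_line` is a hypothetical helper; `True` stands for documented support, `False` for a bot known to ignore the directive, and `None` for unknown. The two example values follow the notes above (Anthropic documents support; Applebot's support is undocumented); values for other bots are not stated in this changelog.

```python
from typing import Optional

# Example metadata only, taken from the changelog notes above.
SUPPORTS_CRAWL_DELAY: dict[str, Optional[bool]] = {
    "ClaudeBot": True,    # vendor docs state Crawl-delay is supported
    "Applebot": None,     # support undocumented, so treated as unknown
}


def crawl_delay_line(delay: Optional[int], supported: Optional[bool]) -> Optional[str]:
    """Return a Crawl-delay line only when support is documented."""
    if delay is None or delay <= 0:
        return None
    if supported is not True:     # False (ignored) and None (unknown) both suppress
        return None
    return f"Crawl-delay: {delay}"
```

With a configured delay of 5, ClaudeBot gets `Crawl-delay: 5` and Applebot gets no line, which is the output difference stores will notice after upgrading.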

### [2.0.1] — 2026-06-11

Security-hardening release. No functional or configuration changes — drop-in
upgrade from 2.0.0.

_(Changelog truncated for .md surface. Full history on https://packagento.com/angeo/module-robots-txt-aeo.)_

## Recent Versions

| Version | Released |
|---|---|
| 3.0.0 | 2026-06-14 |
| 2.0.0 | 2026-05-29 |
| 1.0.1 | 2026-05-28 |
| 1.0.0 | 2026-04-25 |

## Dependencies

### Require

| Package | Constraint |
|---|---|
| magento/framework | ^103.0 |
| magento/module-backend | ^102.0 |
| magento/module-config | ^101.2 |
| magento/module-robots | ^101.0 |
| magento/module-store | ^101.1 |
| php | ~8.1.0\|\|~8.2.0\|\|~8.3.0\|\|~8.4.0 |

### Require (dev)

| Package | Constraint |
|---|---|
| magento/magento-coding-standard | ^33.0 |
| phpunit/phpunit | ^10.0 |

### Suggest

| Package | Constraint |
|---|---|
| angeo/module-aeo-audit | Verify your robots.txt AEO signal after installation. v3+ integrates via Angeo\RobotsTxtAeo\Api\RobotsStatusInterface for zero-overhead validation. |
| magento/module-sitemap | Enables auto-detection of Sitemap URLs from the Magento Sitemap module. |

## Quality

Latest release (3.0.0) fails the Packagento QA pipeline. Verdicts below are per-cell (Magento line × PHP version) for the matrixed tools, and run-once for the static / security tiers.


### Compatibility

Each Magento line is installed on its supported PHP versions, then the module is built (DI compile + static-content deploy). Cells show passed / failed / untested; staircase gaps render as `–`.

| Magento | PHP 8.2 | PHP 8.3 | PHP 8.4 | PHP 8.5 |
|---|---|---|---|---|
| 2.4.7 | Pass | Pass | – | – |
| 2.4.8 | – | Pass | Pass | – |
| 2.4.9 | – | – | Pass | not tested |


### Code Quality

Advisory checks against the module's source. Never affect the Compatibility verdict — a phpcs finding can't make a module incompatible.

#### Static Analysis

Coding standards (phpcs), mess detection (phpmd), copy-pasted code (cpd), PHP cross-version compatibility, composer.json validity. Each runs once for the whole module.

| Tool | Status | Findings | Summary |
|---|---|---|---|
| PHPCS | Fail | 254 | 9 errors, 245 warnings (ruleset: Magento2) — 171 auto-fixable with phpcbf |
| PHPMD | Warning | 33 | 33 rule violations (CyclomaticComplexity:7, NPathComplexity:7, TooManyPublicMethods:5, ErrorControlOperator:4, MissingImport:4) |
| Cpd | Pass | 0 |  |
| Composer validate | Info | 1 | valid; 1 advisory note (composer validate --strict) |

#### PHPStan

Type-checks the module against a real Magento install. Re-runs per Magento + PHP version because resolvable symbols differ between releases.

| Magento | PHP 8.2 | PHP 8.3 | PHP 8.4 | PHP 8.5 |
|---|---|---|---|---|
| 2.4.7 | 4 | 4 | – | – |
| 2.4.8 | – | 4 | 4 | – |
| 2.4.9 | – | – | 4 | N/A |


### Tests

Unit and integration suites run per Magento + PHP cell. Test failures speak to the module's behaviour, not its compatibility with a line, so they're reported here separately.

#### Unit Tests

| Magento | PHP 8.2 | PHP 8.3 | PHP 8.4 | PHP 8.5 |
|---|---|---|---|---|
| 2.4.7 | 4 | 4 | – | – |
| 2.4.8 | – | 4 | not tested | – |
| 2.4.9 | – | – | 4 | N/A |

#### Integration Tests

| Magento | PHP 8.2 | PHP 8.3 | PHP 8.4 | PHP 8.5 |
|---|---|---|---|---|
| 2.4.7 | N/A | N/A | – | – |
| 2.4.8 | – | N/A | N/A | – |
| 2.4.9 | – | – | N/A | N/A |


### Security

Dependency-advisory audit (composer audit) plus a source malware scan. A malware detection fails the version outright.

| Tool | Status | Findings | Summary |
|---|---|---|---|
| Composer audit | Pass | 0 |  |
| Malware scan | Pass | 0 |  |

## Licence and pricing

Free. A licence is still minted on checkout and bound to your project for Composer access — no payment step.

Refundable within 14 days of first purchase via https://packagento.com/account/refunds/.

## Install via Claude Code or any MCP client

The Packagento MCP server can run the licence + project + Composer steps above in one tool call:

```
purchase_and_install_packages(
  composer_names=["angeo/module-robots-txt-aeo"],
  project_id="proj_xxx"
)
```

This handles cart, checkout, licence minting, and project activation, then writes the `auth.json` credentials. Connect a client with `claude mcp add packagento https://mcp.packagento.com`. Full setup at https://packagento.com/docs/mcp-setup.

## Vendor

angeo is a Magento 2 vendor on Packagento. See https://packagento.com/angeo.md for their full catalogue.

