angeo / module-llms-txt

Magento 2 module for AI Engine Optimization (AEO). Generates spec-compliant llms.txt and llms-full.txt per llmstxt.org standard, plus streaming JSONL for vector indexing. Multi-store, multi-website, CLI, cron, async admin UI, Page Builder-aware sanitization, customer-group pricing, atomic writes, ETag/Cache-Control, .md mirrors.

magento2-module 2.4.6-2.4.9 Compatible Based on composer requirements only QA: failed MIT

Angeo LLMs.txt — Magento 2 Module

AI Engine Optimization (AEO) for Magento 2 / Adobe Commerce. Generates
spec-compliant llms.txt, llms-full.txt, and JSONL files so ChatGPT,
Claude, Gemini, Perplexity, and other LLM-powered crawlers can ingest your
catalog efficiently.


What this module does

After install, your storefront serves:

| URL | What it is |
| --- | --- |
| https://shop/llms.txt | Spec-compliant llmstxt.org file (compact markdown) |
| https://shop/llms-full.txt | Same structure, full sanitized descriptions inline |
| https://shop/llms.jsonl | One JSON record per line, for vector indexing |
| https://shop/{url-key}.md | On-the-fly Markdown mirror of any product/category/CMS page |

Generation happens via cron (daily by default), CLI, or the admin "Generate
Now" button. The output is streamed to disk with bounded memory, atomically
renamed on completion, and served with proper ETag / Cache-Control headers.
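
The write path described above (stream to a temporary file, rename on completion, guard with a separate lock file) can be sketched roughly as follows. This is a minimal illustration, not the module's actual code: the function name `writeAtomically` and the `.tmp` / `.lock` suffix handling are assumptions made for the example.

```php
<?php
declare(strict_types=1);

/**
 * Hypothetical helper: stream chunks to "$target.tmp", then rename it over
 * $target. A separate "$target.lock" file prevents concurrent runs.
 *
 * @param iterable<string> $chunks
 * @return int Number of bytes written.
 */
function writeAtomically(string $target, iterable $chunks): int
{
    // 'c' opens for writing without truncating, so taking the lock is harmless.
    $lock = fopen($target . '.lock', 'c');
    if ($lock === false) {
        throw new RuntimeException("Cannot open lock file for {$target}");
    }
    if (!flock($lock, LOCK_EX | LOCK_NB)) {
        fclose($lock);
        throw new RuntimeException("Generation already running for {$target}");
    }

    $tmp = $target . '.tmp';
    try {
        $handle = fopen($tmp, 'w');
        if ($handle === false) {
            throw new RuntimeException("Cannot open {$tmp} for writing");
        }
        $bytes = 0;
        foreach ($chunks as $chunk) {
            $written = fwrite($handle, $chunk);
            if ($written === false) {
                fclose($handle);
                throw new RuntimeException("Write to {$tmp} failed");
            }
            $bytes += $written;
        }
        fclose($handle);

        // Readers see either the old file or the complete new one.
        if (!rename($tmp, $target)) {
            throw new RuntimeException("Cannot rename {$tmp} to {$target}");
        }
        return $bytes;
    } finally {
        flock($lock, LOCK_UN);
        fclose($lock);
    }
}
```

Because the chunks arrive as an iterable, a generator can feed this function one piece at a time, which keeps memory use independent of the final file size.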


Why this module exists

LLM crawlers can ingest a typical Magento storefront — full theme, JS, image
sprites, navigation chrome — but that's wasteful for everyone. The
llmstxt.org standard defines a clean text format
optimized for AI ingestion: stable links, structured headings, descriptions
in their natural prose form rather than buried in product cards.

This module produces that format for Magento, with care taken for the things
Magento makes hard: multi-store layout, Page Builder content, CMS directive
resolution, customer-group pricing, and very large catalogs.
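
For orientation, the llmstxt.org layout (an H1 title, a single blockquote summary, then H2 sections containing link lists) can be sketched as a small PHP generator. The function name `renderLlmsTxt` and the array shape are hypothetical and only illustrate the target format; they are not part of this module's API.

```php
<?php
declare(strict_types=1);

/**
 * Hypothetical renderer for the llmstxt.org layout.
 *
 * @param array<string, array<int, array{title: string, url: string, note?: string}>> $sections
 * @return Generator<int, string>
 */
function renderLlmsTxt(string $title, string $summary, array $sections): Generator
{
    yield "# {$title}\n\n";
    yield "> {$summary}\n\n";

    foreach ($sections as $heading => $links) {
        yield "## {$heading}\n\n";
        foreach ($links as $link) {
            $line = "- [{$link['title']}]({$link['url']})";
            // The note after the colon is optional.
            if (($link['note'] ?? '') !== '') {
                $line .= ": {$link['note']}";
            }
            yield $line . "\n";
        }
        yield "\n";
    }
}

// Usage: products placed under "Optional" can be dropped by clients with a
// limited context budget.
$output = implode('', iterator_to_array(renderLlmsTxt(
    'Example Shop',
    'Handmade ceramics shipped across the EU.',
    [
        'Categories' => [['title' => 'Mugs', 'url' => 'https://shop/mugs.html']],
        'Optional' => [['title' => 'Blue Mug', 'url' => 'https://shop/blue-mug.html', 'note' => 'Stoneware, 300 ml']],
    ]
), false));
```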


Installation

composer require angeo/module-llms-txt:^3.0
bin/magento module:enable Angeo_LlmsTxt
bin/magento setup:upgrade
bin/magento setup:di:compile      # only in production mode
bin/magento setup:static-content:deploy adminhtml   # only in production mode
bin/magento cache:flush

Then generate your first batch:

bin/magento angeo:llms:generate

Visit https://your-store.tld/llms.txt.


Configuration reference

All settings live at Stores → Configuration → Angeo → LLMs.txt.

General

| Field | Default | Notes |
| --- | --- | --- |
| Enable | Yes | Master switch. |
| Exclude This Scope | No | Available at website + store scope. Skips generation for this scope. |
| Store Summary | | One-line summary used as the spec-compliant blockquote. If empty, falls back to Design → HTML Head → Default Description. |

Content

| Field | Default | Notes |
| --- | --- | --- |
| Include Categories | Yes | |
| Include CMS Pages | Yes | |
| Include Products | Yes | |
| Products under ## Optional | Yes | Recommended. Lets context-budget-constrained AI clients drop products without losing categories / pages. |
| Product Limit | 5000 | 0 = unlimited. |
| Exclude Out-of-Stock Products | No | |
| CMS Identifiers to Exclude | no-route, enable-cookies, privacy-policy-cookie-restriction-mode | Comma- or newline-separated. |
| Customer Group for Pricing | NOT LOGGED IN | Which group's final price (with special / group prices) is exposed. |
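
As an illustration of the "comma- or newline-separated" input accepted by CMS Identifiers to Exclude, a value like that could be normalized as in the sketch below. The function name `parseIdentifierList` is hypothetical; the module's own parsing may differ.

```php
<?php
declare(strict_types=1);

/**
 * Hypothetical helper: split a comma- or newline-separated config value into
 * a clean, de-duplicated list of identifiers.
 *
 * @return array<int, string>
 */
function parseIdentifierList(string $raw): array
{
    // Treat any run of commas / line breaks as one separator.
    $parts = preg_split('/[,\r\n]+/', $raw) ?: [];
    $parts = array_map('trim', $parts);
    $parts = array_filter($parts, static fn (string $part): bool => $part !== '');

    return array_values(array_unique($parts));
}
```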

Output formats

| Field | Default | Notes |
| --- | --- | --- |
| Generate llms.txt | Yes | |
| Generate llms-full.txt | No | 5–50× larger; enable only if you actually want it. |
| Generate JSONL | Yes | One record per line; embeds-ready. |
| Serve /url-key.md Mirrors | No | Per-entity Markdown rendering; on-the-fly, no disk. |
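
Since the JSONL output holds one record per line, a downstream indexing job can read it without loading the whole file. The sketch below shows one way to do that; `readJsonl` is a hypothetical name, and the record fields used in the usage comment (`type`, `title`) are illustrative assumptions, so check etc/jsonl-schema.json for the real field names.

```php
<?php
declare(strict_types=1);

/**
 * Hypothetical reader: yields one decoded record per non-empty line.
 *
 * @return Generator<int, array<string, mixed>>
 */
function readJsonl(string $path): Generator
{
    $handle = fopen($path, 'r');
    if ($handle === false) {
        throw new RuntimeException("Cannot open {$path}");
    }
    try {
        while (($line = fgets($handle)) !== false) {
            $line = trim($line);
            if ($line === '') {
                continue; // tolerate blank lines
            }
            // Throws JsonException on a malformed line instead of returning null.
            yield json_decode($line, true, 512, JSON_THROW_ON_ERROR);
        }
    } finally {
        fclose($handle);
    }
}

// Usage (field names are assumptions):
// foreach (readJsonl('/path/to/llms.jsonl') as $record) {
//     echo $record['type'] ?? '?', ': ', $record['title'] ?? '', "\n";
// }
```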

Content sanitization

| Field | Default | Notes |
| --- | --- | --- |
| Resolve CMS Directives | Yes | Renders {{widget}}, {{block}}, {{var}} via Magento's frontend filter. |
| Page Builder Strategy | Exclude | See below. |
| Excluded Content-Types | products, banner, slider, slide, video, map, buttons, button-item, block, dynamic-block, divider, spacer | Used under Exclude strategy. |
| Allowed Content-Types | text, heading, html, tabs, tab-item, row, column, column-group | Used under Allow strategy. |

Page Builder strategies

| Strategy | Effect |
| --- | --- |
| Preserve | Keep all Page Builder content; only strip wrapper attributes. |
| Exclude | Drop elements whose data-content-type is in the excluded list. Default. |
| Allow | Drop everything EXCEPT data-content-type in the allowed list. |
| Strip | Drop ALL elements that carry a data-content-type attribute. |

The filter parses content with DOMDocument (not regex), so nested Page
Builder containers are handled correctly. Known content-types include:
row, column-group, column, tabs, tab-item, text, heading,
html, image, video, map, divider, spacer, buttons, button-item,
banner, slider, slide, products, block, dynamic-block.
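
To make the Exclude strategy concrete, here is a rough DOMDocument-based sketch that drops every element whose data-content-type is in an excluded list. It is not the module's PageBuilderFilter: the function name `dropPageBuilderTypes` and the wrapper element with id `llms-root` are assumptions made for this example.

```php
<?php
declare(strict_types=1);

/**
 * Hypothetical sketch of the Exclude strategy.
 *
 * @param array<int, string> $excludedTypes
 */
function dropPageBuilderTypes(string $html, array $excludedTypes): string
{
    if (trim($html) === '') {
        return '';
    }

    $doc = new DOMDocument();
    $previous = libxml_use_internal_errors(true); // silence HTML5 tag warnings
    $doc->loadHTML(
        '<html><head><meta http-equiv="Content-Type" content="text/html; charset=utf-8"></head>'
        . '<body><div id="llms-root">' . $html . '</div></body></html>'
    );
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $xpath = new DOMXPath($doc);

    // Copy the node list first: removing nodes while iterating a live list is unsafe.
    $candidates = iterator_to_array($xpath->query('//*[@data-content-type]'), false);
    foreach ($candidates as $node) {
        $type = $node->getAttribute('data-content-type');
        if (in_array($type, $excludedTypes, true) && $node->parentNode !== null) {
            $node->parentNode->removeChild($node); // nested children go with it
        }
    }

    // Serialize only the children of the wrapper, not the scaffolding around it.
    $root = $xpath->query('//div[@id="llms-root"]')->item(0);
    $out = '';
    foreach ($root->childNodes as $child) {
        $out .= $doc->saveHTML($child);
    }
    return $out;
}
```

Working on a parsed tree rather than with regular expressions means a slider nested inside a row is removed together with its slides, while the surrounding row and text elements stay intact.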

Performance

| Field | Default | Notes |
| --- | --- | --- |
| Collection Page Size | 1000 | Lower if hitting memory limits on shared hosting. |
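
The page size bounds how many rows are held in memory at once. A minimal sketch of page-by-page iteration with an entity_id cursor (rather than an offset) might look like this; `iterateByCursor` and the `$fetchPage` callable are hypothetical stand-ins for the module's collection loading.

```php
<?php
declare(strict_types=1);

/**
 * Hypothetical cursor iterator. $fetchPage receives the last seen entity_id
 * and a limit, and must return rows with entity_id greater than that value,
 * ordered by entity_id ascending.
 *
 * @param callable(int, int): array<int, array<string, mixed>> $fetchPage
 * @return Generator<int, array<string, mixed>>
 */
function iterateByCursor(callable $fetchPage, int $pageSize): Generator
{
    $lastId = 0;
    do {
        $rows = $fetchPage($lastId, $pageSize);
        foreach ($rows as $row) {
            $lastId = (int) $row['entity_id'];
            yield $row;
        }
        // A short page means there is nothing left to fetch.
    } while (count($rows) === $pageSize);
}

// Usage with an in-memory stand-in for the database:
$table = [];
foreach ([1, 2, 3, 4, 5] as $id) {
    $table[] = ['entity_id' => $id, 'name' => "Product {$id}"];
}
$fetchPage = static function (int $afterId, int $limit) use ($table): array {
    $matching = array_filter($table, static fn (array $row): bool => $row['entity_id'] > $afterId);
    return array_slice(array_values($matching), 0, $limit);
};
```

Only one page of rows exists in memory at any time, so a smaller page size trades more queries for a lower peak footprint.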

HTTP caching

| Field | Default | Notes |
| --- | --- | --- |
| Cache-Control TTL (s) | 3600 | Sent as public, max-age=… on the served files. |
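
The caching behaviour (an ETag, a Cache-Control header built from the TTL, and a 304 on a matching conditional GET) can be sketched as below. This is an assumption-laden illustration: `buildCacheResponse` is a hypothetical name, and hashing path, mtime, and size is just one plausible ETag input; the module's actual input to hash('sha256', …) is not documented here.

```php
<?php
declare(strict_types=1);

/**
 * Hypothetical helper: decide between 200 and 304 for a served file.
 *
 * @return array{status: int, headers: array<string, string>}
 */
function buildCacheResponse(string $path, int $ttlSeconds, ?string $ifNoneMatch): array
{
    clearstatcache(true, $path);
    $mtime = filemtime($path);
    $size = filesize($path);
    if ($mtime === false || $size === false) {
        throw new RuntimeException("Cannot stat {$path}");
    }

    // Assumed ETag input: changes whenever the file is regenerated.
    $etag = '"' . hash('sha256', $path . '|' . $mtime . '|' . $size) . '"';

    $headers = [
        'ETag' => $etag,
        'Last-Modified' => gmdate('D, d M Y H:i:s', $mtime) . ' GMT',
        'Cache-Control' => 'public, max-age=' . $ttlSeconds,
    ];

    // A matching If-None-Match lets the client reuse its cached copy.
    $status = ($ifNoneMatch !== null && trim($ifNoneMatch) === $etag) ? 304 : 200;

    return ['status' => $status, 'headers' => $headers];
}
```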

Cron

| Field | Default | Notes |
| --- | --- | --- |
| Cron Expression | 0 2 * * * | Daily at 02:00 server time. |

CLI commands

# Generate everything for all eligible stores
bin/magento angeo:llms:generate

# Single store, skip JSONL
bin/magento angeo:llms:generate --store=default --no-jsonl

# Per-store/per-format last-run status
bin/magento angeo:llms:status

# Lint generated files for spec compliance
bin/magento angeo:llms:validate

Extending — custom providers

Drop a new section into llms.txt (e.g. a "Brands" list, a "Recent Posts"
section, etc.) by implementing Angeo\LlmsTxt\Api\ProviderInterface and
registering it via di.xml.

namespace Vendor\Module\Provider\Llms;

use Angeo\LlmsTxt\Api\OutputContextInterface;
use Angeo\LlmsTxt\Model\Provider\AbstractProvider;

class BrandsProvider extends AbstractProvider
{
    // $this->brandRepo is assumed to be a brand repository injected through
    // the constructor (not shown here).
    public function provide(OutputContextInterface $context): iterable
    {
        yield "## Brands\n\n";
        foreach ($this->brandRepo->getList($context->getStore()->getId()) as $brand) {
            $label = $this->escapeMarkdown($brand->getName());
            yield "- [{$label}]({$brand->getUrl()})\n";
        }
        yield "\n";
    }
}

<!-- etc/di.xml -->
<type name="Angeo\LlmsTxt\Model\Generator\LlmsTxtGenerator">
    <arguments>
        <argument name="providers" xsi:type="array">
            <item name="brands" xsi:type="object">Vendor\Module\Provider\Llms\BrandsProvider</item>
        </argument>
    </arguments>
</type>

The base class gives you escapeMarkdown(), encodeJsonl(), isJsonl(), and
isFullTxt(), plus isApplicable(), which you can override to opt out per format.


Extending — custom sanitizer filters

Insert your own filter between Page Builder and HTML stripping (e.g. to
remove <script> data attributes, redact phone numbers, etc.) by implementing
Angeo\LlmsTxt\Api\SanitizerFilterInterface and re-declaring the pipeline
in di.xml.

<type name="Angeo\LlmsTxt\Model\Sanitizer\Sanitizer">
    <arguments>
        <argument name="filters" xsi:type="array">
            <item name="cms_directive" xsi:type="object">Angeo\LlmsTxt\Model\Sanitizer\Filter\CmsDirectiveFilter</item>
            <item name="page_builder"  xsi:type="object">Angeo\LlmsTxt\Model\Sanitizer\Filter\PageBuilderFilter</item>
            <item name="redact_pii"    xsi:type="object">Vendor\Module\Sanitizer\Filter\PiiRedactionFilter</item>
            <item name="html"          xsi:type="object">Angeo\LlmsTxt\Model\Sanitizer\Filter\HtmlFilter</item>
            <item name="whitespace"    xsi:type="object">Angeo\LlmsTxt\Model\Sanitizer\Filter\WhitespaceFilter</item>
        </argument>
    </arguments>
</type>
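
The redaction logic inside a filter such as the PiiRedactionFilter named above could look roughly like the sketch below. The method exposed by SanitizerFilterInterface is not shown in this document, so the class here is a standalone stand-in: the class name `PiiRedactionSketch`, the `filter()` method, and the phone-number pattern are all assumptions.

```php
<?php
declare(strict_types=1);

/**
 * Hypothetical stand-in for a custom sanitizer filter. Adapt the method name
 * and signature to whatever SanitizerFilterInterface actually requires.
 */
final class PiiRedactionSketch
{
    // Loose pattern: optional "+", then at least 9 digits mixed with spaces,
    // dots, hyphens, or parentheses. Short numbers such as SKUs are left alone.
    private const PHONE_PATTERN = '/\+?\d[\d\s().-]{7,}\d/';

    public function filter(string $content): string
    {
        $result = preg_replace(self::PHONE_PATTERN, '[redacted phone]', $content);

        // preg_replace returns null on a regex engine error; keep the input then.
        return $result ?? $content;
    }
}
```

Placing such a filter before the HTML filter, as in the pipeline above, means it still sees the markup; placing it afterwards would let it work on plain text only.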

Events

Hook in via observers — three events are dispatched per store/format pass:

| Event | Data |
| --- | --- |
| angeo_llms_generation_before | store, format, context |
| angeo_llms_generation_after | store, format, file, bytes, items, duration |
| angeo_llms_generation_failed | store, format, error |

Migrating from 2.x

  • Old files in media/llms/ can be deleted (output now lives in media/angeo/llms/).
  • Any custom ProviderInterface implementations must change from returning a string to yielding iterable<string>. See Extending — custom providers.
  • Drop any reverse-proxy / Nginx rewrites pointing at the old paths.
  • Re-run Stores → Configuration → Angeo → LLMs.txt to set the new fields (Page Builder strategy, customer group, etc.).
  • External tooling that called the GET /admin/angeo_llms/generate/index URL must switch to the CLI command (the admin endpoint is now POST + CSRF).

License

MIT — see LICENSE.

Support

Changelog

All notable changes to Angeo_LlmsTxt are documented in this file.

The format follows Keep a Changelog,
and this project adheres to Semantic Versioning.


[3.0.5] — 2026-06-04

Admin-config bugfix. Safe drop-in upgrade from 3.0.x.

Fixed

  • System Config "Save Config" no longer throws Cannot read properties of undefined (reading 'settings'). The Generate button frontend_model
    template (generate_button.phtml) rendered two <form> elements inside
    the admin system-config form (#config-edit-form). Nested forms are invalid
    HTML: the browser re-parents the inner inputs/buttons onto the outer form, so
    on Save the jQuery validator (jquery.validate.js metadataRules) iterated an
    orphaned submit button that has no rule metadata and crashed, aborting the
    whole submit. The buttons are now plain type="button" elements that POST via
    a JS-built form appended to <body> (outside the config form). CSRF
    protection is unchanged — the form key is still submitted.

Install-blocking bugfix plus PHP 8.5 support. Safe drop-in upgrade from 3.0.x.

Fixed

  • setup:upgrade no longer fails XSD validation on etc/adminhtml/system.xml.
    Two <comment> elements (cache_ttl_seconds and schedule) contained raw
    <code> HTML without a CDATA wrapper. system_file.xsd only allows a model
    child inside <comment>, so the literal markup tripped
    Element 'code': This element is not expected. Expected is ( model ) and
    aborted module loading. Both comments are now wrapped in <![CDATA[ … ]]>,
    matching every other HTML-bearing comment in the file.

Changed

  • Added PHP 8.5 to the supported range (…||~8.5.0). Intended for Magento
    2.4.9+, which is the first line to support PHP 8.5; on 2.4.8 and earlier,
    PHP 8.4 remains the recommended runtime.

Admin-config bugfix. No functional or API changes — safe drop-in upgrade
from 3.0.x.

Fixed

  • System Config "Save Config" no longer throws a JS TypeError. Three
    numeric fields in etc/adminhtml/system.xml declared validation classes
    that are not registered in Magento's mage/validation ruleset
    (validate-greater-than-zero and integer). On 2.4.8-p4 the admin form
    validator (jquery.validate.js metadataRules) looks up
    settings on each rule object; the missing rules resolved to undefined,
    producing Cannot read properties of undefined (reading 'settings') and
    aborting the entire form submit. Replaced with registered rules:
    • collection_page_size: → validate-digits validate-digits-range digits-range-0-1000000
    • product_limit: → validate-digits
    • cache_ttl_seconds: → validate-digits

[3.0.4] — 2026-06-03

Compatibility patch. No functional or API changes — safe drop-in upgrade
from 3.0.x.

Changed

  • Lowered the minimum PHP to 8.1 (~8.1.0||~8.2.0||~8.3.0||~8.4.0).
    The module uses no PHP 8.2+ only syntax, so it runs on 2.4.5 / 2.4.6 stores
    that are still on PHP 8.1 as well as on 2.4.7 / 2.4.8 (PHP 8.3 / 8.4).
  • Broadened dependency constraints to cover 2.4.5 through 2.4.8. Every
    Magento dependency in require now uses an open lower-bound (>=) pinned to
    the major line that shipped with 2.4.5 — e.g. magento/framework: >=102.0
    and magento/module-url-rewrite: >=102.0. Because these major lines do not
    change between 2.4.5 and 2.4.8, the module installs cleanly across all of
    those minors. This replaces the earlier exact carets (such as the ^101.2
    on module-url-rewrite) that failed on 2.4.8, where that module ships as
    102.x.

[3.0.2] — 2026-06-03

Marketplace-readiness patch. No functional or API changes — safe drop-in
upgrade from 3.0.0.

Fixed

  • Replaced md5() with hash('sha256', …) for ETag generation in the
    file-serving controller. The Magento Coding Standard forbids md5(); the
    ETag only needs to be stable and unique, so the switch is behaviour-neutral.
  • Removed error-silencing @ operators from filesystem calls
    (fopen / flock / fclose) in the atomic-write lock helper and in the
    validate command. Return values were already checked explicitly, so
    dropping @ changes no behaviour while clearing the coding-standard errors.

Changed

  • Dependency constraints pinned to real 2.4.x major lines. require now
    uses caret ranges matching the actual published modules — notably
    magento/module-url-rewrite: ^102.0 (the 101.2 line never existed). This
    resolves a composer require failure on clean 2.4.8 installs.
  • Added an explicit version field (3.0.1) to composer.json so the
    package version matches the Marketplace submission form.

[3.0.0] — 2026-05-23

A full rebuild against the architectural review of 2.1.4. This release is
not drop-in compatible — see the Breaking Changes section below for
migration steps.

Breaking changes

  • ProviderInterface::provide() signature changed from string to
    iterable<string>. Custom providers contributed by third-party modules
    must now yield chunks rather than return one concatenated string. This is
    the change that lets the generator stream to disk with bounded memory.
  • /llms-full.txt now serves a genuinely-different file (full sanitized
    descriptions inline). Previously, this URL silently aliased to /llms.txt,
    which was misleading.
  • llms.txt header is now spec-compliant. A single blockquote summary line,
    with currency / locale / base-URL moved to a plain markdown paragraph below.
    The 2.x output used four blockquote lines, which broke llmstxt.org-spec
    parsers.
  • Status tracking moved out of core_config_data and into
    var/angeo_llms/status.json. Old status rows under angeo_llms/status/*
    are no longer read. Drop them via bin/magento config:set --lock-env angeo_llms/status/... "" if you want a clean state, but it's harmless to leave them.
  • media/llms/ is no longer used as the file output directory; output now
    lives under media/angeo/llms/. Old files can be deleted; remove any reverse-proxy rewrites pointing at the old path.
  • Admin "Generate" action moved to POST + CSRF. If you have any external
    tooling that hit the old GET URL, switch to the CLI command instead.
  • Module namespace unchanged: still Angeo\LlmsTxt. Composer package
    name unchanged.

Added

  • Page Builder element filter with four strategies — preserve, exclude,
    allow, strip — driven by the element's data-content-type attribute.
    Default list of excluded types drops common visual-only elements
    (products carousel, banner, slider, video, map, buttons, block,
    dynamic-block, divider, spacer) so the output focuses on semantic text.
    Configurable per-store at Stores → Configuration → Angeo → LLMs.txt →
    Content Sanitization.
  • Streaming generation via PHP generators. Memory stays bounded at one
    collection page (default 1000 products) regardless of catalog size.
  • Atomic writes: each file is written to .tmp, then renamed. Readers
    never see a half-written file. Generation locks via a separate .lock file
    with flock(LOCK_EX | LOCK_NB), so concurrent runs cannot corrupt output.
  • Cursor pagination (entity_id > $lastId, ordered by entity_id ASC) instead
    of offset/limit, so rows inserted or deleted mid-run cannot shift page
    boundaries and cause products to be duplicated or skipped.
  • Batch URL resolver loads every URL rewrite for a store in one query
    (vs. the per-product getProductUrl() query that 2.x triggered N times).
  • Real llms-full.txt with full sanitized descriptions inline.
  • /{url_key}.md mirrors — every product, category, and CMS page exposes
    a clean Markdown rendering at its URL with .md appended. Generated on the
    fly; no extra disk storage.
  • CMS directive resolution: {{widget}}, {{block}}, {{var}}, and
    {{store}} directives are now rendered via Magento's standard frontend
    filter before being stripped, instead of leaking as literal text.
  • Customer-group-aware pricing — admin can choose which customer group's
    final price (with special-price and group-price applied) gets exposed.
  • HTTP caching: ETag, Last-Modified, Cache-Control: public, max-age=…,
    X-Robots-Tag: noindex, follow, and 304 responses on conditional GETs.
  • Async admin action: Schedule (Async) inserts a cron_schedule row for
    the next tick so admins don't have to wait through a synchronous generation.
  • Live admin status panel polling /angeo_llms/status/index every 60s.
  • Three CLI commands:
    • bin/magento angeo:llms:generate [--store=…] [--no-jsonl] [--no-llms] [--no-full]
    • bin/magento angeo:llms:status
    • bin/magento angeo:llms:validate [--store=…]
  • JSONL JSON-Schema at etc/jsonl-schema.json for downstream pipelines.
  • Events: angeo_llms_generation_before, angeo_llms_generation_after,
    angeo_llms_generation_failed — for custom hooks.
  • PHPUnit test suite under Test/Unit/.

Changed

  • frontend_default_meta_description is now the fallback for the store
    summary, before falling back to the generic stub.
  • Multi-store store-code routing handles the last URL path segment, so
    /de/llms.txt works on path-based stores.
  • Spec compliance: products go under ## Optional by default (admin
    toggleable) so context-budget-constrained clients can drop them.
  • Out-of-stock products excluded by an explicit StockRegistry lookup
    (configurable).
  • Logger context is now structured: every log line is prefixed
    [Angeo LlmsTxt] and includes store/format keys.

Fixed

  • Pseudo-locking in 2.x: a 'w' open truncates the file before the
    flock() call, so two concurrent generations both saw an empty file and
    the last writer won unpredictably. 3.0 uses a separate .lock file.
  • CSRF-exposed admin generate: 2.x used a GET URL; 3.0 requires POST with
    the form key.
  • Synchronous admin "Generate" timing out on large catalogs (now async option).
  • N+1 URL rewrite queries: now batched.
  • Literal {{widget}} text appearing in 2.x output: now resolved.
  • Stale files for stores that became inactive or excluded: now cleaned up
    on every generation run.

Removed

  • media/llms/ legacy directory (see breaking-changes notes).
  • GET endpoint for admin generation.
  • Documented-but-non-existent config fields from 2.x README.

[2.1.4] — Pre-rebuild baseline

Last release in the 2.x line. See the architectural review document for
the issues that motivated 3.0.0.

Versions
Version Stability QA Status Released
3.0.5 stable Fail 2026-06-04 19:39:51
3.0.4 stable Not tested 2026-06-03 18:23:19
3.0.3 stable Not tested 2026-06-03 18:04:49
3.0.2 stable Not tested 2026-06-03 17:46:25
3.0.1 stable Not tested 2026-06-03 16:17:59
3.0.0 stable Not tested 2026-05-29 20:31:58
2.1.4 stable Not tested 2026-05-06 04:36:21
2.1.3 stable Not tested 2026-04-30 07:41:35
2.1.2 stable Not tested 2026-04-30 05:05:25
2.1.1 stable Not tested 2026-04-29 20:38:27
2.1.0 stable Not tested 2026-04-29 20:07:40
2.0.0 stable Not tested 2026-04-16 18:52:27
1.1.2 stable Not tested 2026-03-20 18:38:35
1.1.1 stable Not tested 2026-03-18 18:33:44

Requires 12

Package Constraint
ext-json *
ext-mbstring *
magento/framework >=102.0
magento/module-backend >=102.0
magento/module-catalog >=104.0
magento/module-catalog-inventory >=100.4
magento/module-catalog-url-rewrite >=100.4
magento/module-cms >=104.0
magento/module-config >=101.2
magento/module-store >=101.0
magento/module-url-rewrite >=102.0
php ~8.1.0||~8.2.0||~8.3.0||~8.4.0||~8.5.0

Requires-dev 3

Package Constraint
magento/magento-coding-standard ^32.0
phpstan/phpstan ^1.10
phpunit/phpunit ^10.5

Suggests 2

Package Reason
magento/module-page-builder Enable to opt-in or opt-out of Page Builder content elements per content-type during sanitization
magento/module-shared-catalog Adobe Commerce: integrate B2B shared catalogs so llms.txt only exposes the allowed catalog
QA results
Tool Status Findings Summary
PHPCS Pass 0
PHPStan Fail 27 27 errors (level 4, ruleset: phpstan + bitexpert/phpstan-magento)
Cpd Fail 3 3 duplicated chunks spanning 81 total lines (min-lines=5, min-tokens=70)
Security Pass 0
License
MIT
Homepage
https://angeo.dev/
Authors