angeo/module-llms-txt
Magento 2 module for AI Engine Optimization (AEO). Generates spec-compliant llms.txt and llms-full.txt per the llmstxt.org standard, plus streaming JSONL for vector indexing. Multi-store, multi-website, CLI, cron, async admin UI, Page Builder-aware sanitization, customer-group pricing, atomic writes, ETag/Cache-Control, .md mirrors.
Angeo LLMs.txt — Magento 2 Module
AI Engine Optimization (AEO) for Magento 2 / Adobe Commerce. Generates
spec-compliant llms.txt, llms-full.txt, and JSONL files so ChatGPT,
Claude, Gemini, Perplexity, and other LLM-powered crawlers can ingest your
catalog efficiently.
What this module does
After install, your storefront serves:
| URL | What it is |
|---|---|
| https://shop/llms.txt | Spec-compliant llmstxt.org file (compact markdown) |
| https://shop/llms-full.txt | Same structure, full sanitized descriptions inline |
| https://shop/llms.jsonl | One JSON record per line, for vector indexing |
| https://shop/{url-key}.md | On-the-fly Markdown mirror of any product/category/CMS page |
Generation happens via cron (daily by default), CLI, or the admin "Generate
Now" button. The output is streamed to disk with bounded memory, atomically
renamed on completion, and served with proper ETag / Cache-Control headers.
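For orientation, here is a minimal sketch of the llmstxt.org layout these files follow, written as a PHP generator so each chunk can be written out as soon as it is produced. The function name, the array shapes, and the sample shop data are hypothetical, not the module's generator; the structure (one H1, a single blockquote summary, a plain paragraph with store details, H2 link sections, products under `## Optional`) follows the spec and this module's changelog.

```php
<?php
declare(strict_types=1);

/**
 * Hypothetical sketch (not the module's generator): stream an llms.txt
 * document as string chunks. Link titles are not escaped here.
 *
 * @param array<string, list<array{title: string, url: string}>> $sections H2 heading => links
 * @param list<array{title: string, url: string}> $optional links listed under "## Optional"
 * @return Generator<int, string>
 */
function renderLlmsTxt(string $name, string $summary, string $meta, array $sections, array $optional): Generator
{
    yield "# {$name}\n\n";
    // Exactly one blockquote line carries the summary.
    yield "> {$summary}\n\n";
    // Further store details (currency, locale, base URL) go in a plain paragraph.
    yield "{$meta}\n\n";
    foreach ($sections as $heading => $links) {
        yield "## {$heading}\n\n";
        foreach ($links as $link) {
            yield "- [{$link['title']}]({$link['url']})\n";
        }
        yield "\n";
    }
    if ($optional !== []) {
        // Clients short on context budget may skip this whole section.
        yield "## Optional\n\n";
        foreach ($optional as $link) {
            yield "- [{$link['title']}]({$link['url']})\n";
        }
    }
}

// Write each chunk as soon as it is produced instead of building one big string.
$out = fopen('php://output', 'wb');
foreach (renderLlmsTxt(
    'Example Shop',
    'Outdoor gear and apparel.',
    'Currency: EUR. Locale: de_DE. Base URL: https://shop.example/',
    ['Categories' => [['title' => 'Tents', 'url' => 'https://shop.example/tents.html']]],
    [['title' => 'Dome Tent 2P', 'url' => 'https://shop.example/dome-tent-2p.html']]
) as $chunk) {
    fwrite($out, $chunk);
}
fclose($out);
```

Because only the current chunk lives in memory, the same pattern scales to catalogs of any size.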
Why this module exists
LLM crawlers can ingest a typical Magento storefront — full theme, JS, image
sprites, navigation chrome — but that's wasteful for everyone. The
llmstxt.org standard defines a clean text format
optimized for AI ingestion: stable links, structured headings, descriptions
in their natural prose form rather than buried in product cards.
This module produces that format for Magento, with care taken for the things
Magento makes hard: multi-store layout, Page Builder content, CMS directive
resolution, customer-group pricing, and very large catalogs.
Installation
```bash
composer require angeo/module-llms-txt:^3.0
bin/magento module:enable Angeo_LlmsTxt
bin/magento setup:upgrade
bin/magento setup:di:compile # only in production mode
bin/magento setup:static-content:deploy adminhtml # only in production mode
bin/magento cache:flush
```
Then generate your first batch:
```bash
bin/magento angeo:llms:generate
```
Visit https://your-store.tld/llms.txt.
Configuration reference
All settings live at Stores → Configuration → Angeo → LLMs.txt.
General
| Field | Default | Notes |
|---|---|---|
| Enable | Yes | Master switch. |
| Exclude This Scope | No | Available at website + store scope. Skips generation for this scope. |
| Store Summary | — | One-line summary used as the spec-compliant blockquote. If empty, falls back to Design → HTML Head → Default Description. |
Content
| Field | Default | Notes |
|---|---|---|
| Include Categories | Yes | |
| Include CMS Pages | Yes | |
| Include Products | Yes | |
| Products under `## Optional` | Yes | Recommended. Lets context-budget-constrained AI clients drop products without losing categories / pages. |
| Product Limit | 5000 | 0 = unlimited. |
| Exclude Out-of-Stock Products | No | |
| CMS Identifiers to Exclude | no-route, enable-cookies, privacy-policy-cookie-restriction-mode | Comma- or newline-separated. |
| Customer Group for Pricing | NOT LOGGED IN | Which group's final price (with special / group prices) is exposed. |
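Since the CMS exclusion field accepts either separator, the raw value has to be normalized before it can be used for lookups. A minimal sketch, assuming a hypothetical `parseIdentifierList()` helper; the module's own parsing is not documented here and may differ:

```php
<?php
declare(strict_types=1);

/**
 * Hypothetical helper: split a comma- or newline-separated config value
 * into a set of identifiers (array keys), ignoring blanks and duplicates.
 *
 * @return array<string, true>
 */
function parseIdentifierList(string $raw): array
{
    $set = [];
    // Commas, LF and CRLF all count as separators.
    foreach (preg_split('/[,\r\n]+/', $raw) ?: [] as $part) {
        $id = trim($part);
        if ($id !== '') {
            $set[$id] = true;
        }
    }
    return $set;
}

$excluded = parseIdentifierList("no-route, enable-cookies\nprivacy-policy-cookie-restriction-mode\n");
var_dump(isset($excluded['enable-cookies'])); // bool(true)
var_dump(isset($excluded['about-us']));       // bool(false)
```

Keying the result by identifier makes each "is this page excluded?" check a constant-time `isset()`.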
Output formats
| Field | Default | Notes |
|---|---|---|
| Generate llms.txt | Yes | |
| Generate llms-full.txt | No | 5–50× larger; enable only if you actually want it. |
| Generate JSONL | Yes | One record per line, ready for embedding pipelines. |
| Serve `/url-key.md` Mirrors | No | Per-entity Markdown rendering, generated on the fly; nothing is written to disk. |
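Because mirrors are rendered per request, the router first has to turn the request path back into a URL key. A minimal sketch of that step, assuming the path arrives without a query string; `mirrorUrlKey()` is hypothetical, and resolving the key to a product, category, or CMS page is left out:

```php
<?php
declare(strict_types=1);

/**
 * Hypothetical sketch: "/dome-tent-2p.md" -> "dome-tent-2p".
 * Returns null when the path is not a Markdown mirror request.
 */
function mirrorUrlKey(string $path): ?string
{
    $path = trim($path, '/');
    if (!str_ends_with($path, '.md')) {
        return null;
    }
    $key = substr($path, 0, -3);
    // Reject empty keys and path-traversal attempts.
    if ($key === '' || str_contains($key, '..')) {
        return null;
    }
    return $key;
}

var_dump(mirrorUrlKey('/dome-tent-2p.md')); // string(12) "dome-tent-2p"
var_dump(mirrorUrlKey('/llms.txt'));        // NULL
```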
Content sanitization
| Field | Default | Notes |
|---|---|---|
| Resolve CMS Directives | Yes | Renders {{widget}}, {{block}}, {{var}} via Magento's frontend filter. |
| Page Builder Strategy | Exclude | See below. |
| Excluded Content-Types | products, banner, slider, slide, video, map, buttons, button-item, block, dynamic-block, divider, spacer | Used under the Exclude strategy. |
| Allowed Content-Types | text, heading, html, tabs, tab-item, row, column, column-group | Used under the Allow strategy. |
Page Builder strategies
| Strategy | Effect |
|---|---|
| Preserve | Keep all Page Builder content; only strip wrapper attributes. |
| Exclude | Drop elements whose data-content-type is in the excluded list. Default. |
| Allow | Drop everything EXCEPT data-content-type in the allowed list. |
| Strip | Drop ALL elements that carry a data-content-type attribute. |
The filter parses content with DOMDocument (not regex), so nested Page
Builder containers are handled correctly. Known content-types include:
row, column-group, column, tabs, tab-item, text, heading,
html, image, video, map, divider, spacer, buttons, button-item,
banner, slider, slide, products, block, dynamic-block.
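A minimal sketch of the four strategies on top of `DOMDocument` and `DOMXPath`. It is not the module's `PageBuilderFilter`: the function name, the strategy strings, and the reading of "strip wrapper attributes" as "remove `data-*` attributes" are assumptions. Under Allow, this sketch only drops elements that carry a non-allowed `data-content-type`; plain HTML without the attribute is left alone.

```php
<?php
declare(strict_types=1);

/**
 * Hypothetical sketch of the Page Builder strategies. $types is the
 * excluded list (exclude) or the allowed list (allow). Unknown strategies
 * behave like preserve.
 *
 * @param list<string> $types
 */
function filterPageBuilder(string $html, string $strategy, array $types = []): string
{
    $doc = new DOMDocument();
    // The XML hint makes libxml read the input as UTF-8; the wrapper div
    // lets us return only the fragment afterwards.
    $doc->loadHTML(
        '<?xml encoding="UTF-8"><div id="pb-root">' . $html . '</div>',
        LIBXML_NOERROR | LIBXML_NOWARNING
    );
    $xpath = new DOMXPath($doc);
    $root = $xpath->query('//div[@id="pb-root"]')->item(0);

    // Snapshot in document order, so outer containers come before their children.
    foreach (iterator_to_array($xpath->query('//*[@data-content-type]')) as $node) {
        $type = $node->getAttribute('data-content-type');
        $drop = match ($strategy) {
            'strip' => true,
            'exclude' => in_array($type, $types, true),
            'allow' => !in_array($type, $types, true),
            default => false, // preserve
        };
        if ($drop) {
            // Removes the element together with everything nested inside it.
            $node->parentNode->removeChild($node);
            continue;
        }
        // Keep the content, strip Page Builder's data-* wrapper attributes.
        $names = [];
        foreach ($node->attributes as $attr) {
            $names[] = $attr->nodeName;
        }
        foreach ($names as $name) {
            if (str_starts_with($name, 'data-')) {
                $node->removeAttribute($name);
            }
        }
    }

    $result = '';
    foreach ($root->childNodes as $child) {
        $result .= (string) $doc->saveHTML($child);
    }
    return $result;
}

$html = '<div data-content-type="row"><h2 data-content-type="heading">Care guide</h2>'
    . '<div data-content-type="slider"><img src="a.jpg" alt=""></div></div>';
echo filterPageBuilder($html, 'exclude', ['slider']), "\n"; // row and heading kept, slider dropped
```

Working on the parsed tree is what makes nesting safe: dropping a container removes its whole subtree, which a regex over the raw markup cannot guarantee.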
Performance
| Field | Default | Notes |
|---|---|---|
| Collection Page Size | 1000 | Lower if hitting memory limits on shared hosting. |
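Page size bounds memory because only one page of rows is held at a time. A minimal sketch of keyset (cursor) pagination with a PHP generator, assuming a hypothetical `$fetchPage` callback that runs the equivalent of "`entity_id` greater than the last seen ID, ascending, limited to one page". In the module this would be a Magento product collection; here an in-memory array stands in for the table.

```php
<?php
declare(strict_types=1);

/**
 * Hypothetical sketch: keyset ("cursor") pagination. Each page asks for
 * rows with entity_id greater than the last one seen, so rows inserted or
 * deleted mid-run cannot shift the window the way an OFFSET does.
 *
 * @param callable(int, int): list<array{entity_id: int}> $fetchPage ($lastId, $pageSize)
 * @return Generator<int, array{entity_id: int}>
 */
function iterateByCursor(callable $fetchPage, int $pageSize = 1000): Generator
{
    $lastId = 0;
    do {
        $rows = $fetchPage($lastId, $pageSize); // only this page is held in memory
        foreach ($rows as $row) {
            $lastId = $row['entity_id'];
            yield $row;
        }
    } while (count($rows) === $pageSize); // a short page means we reached the end
}

// In-memory stand-in for the product table.
$table = array_map(fn (int $id): array => ['entity_id' => $id], range(1, 2500));
$fetch = function (int $lastId, int $size) use ($table): array {
    $rows = array_filter($table, fn (array $r): bool => $r['entity_id'] > $lastId);
    return array_slice(array_values($rows), 0, $size);
};

$count = 0;
foreach (iterateByCursor($fetch, 1000) as $row) {
    $count++;
}
echo $count, "\n"; // 2500
```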
HTTP caching
| Field | Default | Notes |
|---|---|---|
| Cache-Control TTL (s) | 3600 | Sent as public, max-age=… on the served files. |
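A minimal sketch of the conditional-GET decision behind these headers, using `hash('sha256', …)` for the ETag as the 3.0.2 changelog entry describes. What gets hashed (here, file mtime and size) and the `cacheDecision()` name are assumptions; the module's controller works through Magento's response object rather than returning an array.

```php
<?php
declare(strict_types=1);

/**
 * Hypothetical sketch: build caching headers for a generated file and
 * decide whether a 304 Not Modified response is enough.
 *
 * @return array{status: int, headers: array<string, string>}
 */
function cacheDecision(int $mtime, int $size, int $ttl, ?string $ifNoneMatch): array
{
    // Any value that changes whenever the file is regenerated works;
    // this one hashes mtime and size.
    $etag = '"' . hash('sha256', $mtime . ':' . $size) . '"';
    $headers = [
        'ETag' => $etag,
        'Last-Modified' => gmdate('D, d M Y H:i:s', $mtime) . ' GMT',
        'Cache-Control' => 'public, max-age=' . $ttl,
    ];
    // If-None-Match may carry several comma-separated tags, or "*".
    if ($ifNoneMatch !== null) {
        $tags = array_map('trim', explode(',', $ifNoneMatch));
        if (in_array($etag, $tags, true) || in_array('*', $tags, true)) {
            return ['status' => 304, 'headers' => $headers];
        }
    }
    return ['status' => 200, 'headers' => $headers];
}

$first = cacheDecision(1700000000, 2048, 3600, null);
$again = cacheDecision(1700000000, 2048, 3600, $first['headers']['ETag']);
echo $first['status'], ' ', $again['status'], "\n"; // 200 304
```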
Cron
| Field | Default | Notes |
|---|---|---|
| Cron Expression | `0 2 * * *` | Daily at 02:00 server time. |
CLI commands
```bash
# Generate everything for all eligible stores
bin/magento angeo:llms:generate
# Single store, skip JSONL
bin/magento angeo:llms:generate --store=default --no-jsonl
# Per-store/per-format last-run status
bin/magento angeo:llms:status
# Lint generated files for spec compliance
bin/magento angeo:llms:validate
```
Extending — custom providers
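Add a new section to llms.txt (for example a "Brands" list or a "Recent Posts"
section) by implementing `Angeo\LlmsTxt\Api\ProviderInterface` and
registering it via `di.xml`. The example below extends the bundled
`Angeo\LlmsTxt\Model\Provider\AbstractProvider` base class.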
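```php
<?php
declare(strict_types=1);

namespace Vendor\Module\Provider\Llms;

use Angeo\LlmsTxt\Api\OutputContextInterface;
use Angeo\LlmsTxt\Model\Provider\AbstractProvider;

class BrandsProvider extends AbstractProvider
{
    public function provide(OutputContextInterface $context): iterable
    {
        yield "## Brands\n\n";
        // $this->brandRepo is your own brand repository, injected through the
        // constructor (not shown); getList() and the brand getters are placeholders.
        foreach ($this->brandRepo->getList($context->getStore()->getId()) as $brand) {
            // Escape the label so brand names cannot break the Markdown link.
            $label = $this->escapeMarkdown($brand->getName());
            yield "- [{$label}]({$brand->getUrl()})\n";
        }
        yield "\n";
    }
}
```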
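```xml
<!-- etc/di.xml -->
<config xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
        xsi:noNamespaceSchemaLocation="urn:magento:framework:ObjectManager/etc/config.xsd">
    <type name="Angeo\LlmsTxt\Model\Generator\LlmsTxtGenerator">
        <arguments>
            <argument name="providers" xsi:type="array">
                <item name="brands" xsi:type="object">Vendor\Module\Provider\Llms\BrandsProvider</item>
            </argument>
        </arguments>
    </type>
</config>
```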
The base class provides the `escapeMarkdown()`, `encodeJsonl()`, `isJsonl()`,
and `isFullTxt()` helpers, plus an `isApplicable()` method you can override
to opt out of individual formats.
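The helper names above come from the base class, but their exact behaviour is not documented here. As a rough illustration only, this is what a Markdown link-label escaper and a JSONL line encoder commonly look like; `escapeMarkdownLabel()` and `encodeJsonlLine()` are hypothetical stand-ins, not `AbstractProvider`'s code.

```php
<?php
declare(strict_types=1);

/** Hypothetical: make arbitrary text safe to use as a Markdown link label. */
function escapeMarkdownLabel(string $text): string
{
    // Collapse whitespace, including line breaks that would end the list item early.
    $text = preg_replace('/\s+/u', ' ', trim($text)) ?? $text;
    // Backslash-escape characters with special meaning inside [label].
    return addcslashes($text, '\\[]*_`');
}

/** Hypothetical: encode one record as a single JSONL line. */
function encodeJsonlLine(array $record): string
{
    // JSON_THROW_ON_ERROR surfaces invalid UTF-8 instead of silently returning false.
    return json_encode(
        $record,
        JSON_UNESCAPED_SLASHES | JSON_UNESCAPED_UNICODE | JSON_THROW_ON_ERROR
    ) . "\n";
}

echo '- [' . escapeMarkdownLabel("Tent [2P]\nEdition") . "](https://shop.example/tent.html)\n";
// - [Tent \[2P\] Edition](https://shop.example/tent.html)
echo encodeJsonlLine(['sku' => 'TENT-2P', 'url' => 'https://shop.example/tent.html']);
// {"sku":"TENT-2P","url":"https://shop.example/tent.html"}
```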
Extending — custom sanitizer filters
Insert your own filter between the Page Builder and HTML-stripping stages (for
example, to remove `<script>` data attributes or redact phone numbers) by
implementing `Angeo\LlmsTxt\Api\SanitizerFilterInterface` and re-declaring
the pipeline in `di.xml`.
```xml
<!-- etc/di.xml -->
<config xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
        xsi:noNamespaceSchemaLocation="urn:magento:framework:ObjectManager/etc/config.xsd">
    <type name="Angeo\LlmsTxt\Model\Sanitizer\Sanitizer">
        <arguments>
            <argument name="filters" xsi:type="array">
                <item name="cms_directive" xsi:type="object">Angeo\LlmsTxt\Model\Sanitizer\Filter\CmsDirectiveFilter</item>
                <item name="page_builder" xsi:type="object">Angeo\LlmsTxt\Model\Sanitizer\Filter\PageBuilderFilter</item>
                <item name="redact_pii" xsi:type="object">Vendor\Module\Sanitizer\Filter\PiiRedactionFilter</item>
                <item name="html" xsi:type="object">Angeo\LlmsTxt\Model\Sanitizer\Filter\HtmlFilter</item>
                <item name="whitespace" xsi:type="object">Angeo\LlmsTxt\Model\Sanitizer\Filter\WhitespaceFilter</item>
            </argument>
        </arguments>
    </type>
</config>
```
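The method signature of `SanitizerFilterInterface` is not documented here, so the sketch below is a plain class with an assumed `apply(string $html): string` method; adapt it to whatever the interface actually declares. It shows the kind of work such a filter does: regex redaction of e-mail-like and phone-like strings. The patterns are deliberately simple, will miss some formats, and will also match other long digit runs such as ISO dates.

```php
<?php
declare(strict_types=1);

/**
 * Hypothetical standalone version of Vendor\Module\Sanitizer\Filter\PiiRedactionFilter.
 * The real class would implement Angeo\LlmsTxt\Api\SanitizerFilterInterface;
 * the method name and signature used here are assumptions.
 */
final class PiiRedactionFilter
{
    private const EMAIL = '/[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}/';
    // Optional "+", a digit, 5+ digits/spaces/dots/dashes/parentheses, a digit.
    private const PHONE = '/\+?\d[\d\s().-]{5,}\d/';

    public function apply(string $html): string
    {
        $html = preg_replace(self::EMAIL, '[email redacted]', $html) ?? $html;
        return preg_replace(self::PHONE, '[phone redacted]', $html) ?? $html;
    }
}

$filter = new PiiRedactionFilter();
echo $filter->apply('<p>Call +49 30 1234567 or write to sales@shop.example.</p>'), "\n";
// <p>Call [phone redacted] or write to [email redacted].</p>
```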
Events
Hook in via observers — three events are dispatched per store/format pass:
| Event | Data |
|---|---|
| `angeo_llms_generation_before` | store, format, context |
| `angeo_llms_generation_after` | store, format, file, bytes, items, duration |
| `angeo_llms_generation_failed` | store, format, error |
Migrating from 2.x
- Old files in `media/llms/` can be deleted (output now lives in `media/angeo/llms/`).
- Any custom `ProviderInterface` implementations must change from returning a `string` to yielding `iterable<string>`. See the custom providers section above.
- Drop any reverse-proxy / Nginx rewrites pointing at the old paths.
- Revisit Stores → Configuration → Angeo → LLMs.txt to set the new fields (Page Builder strategy, customer group, etc.).
- External tooling that called the GET `/admin/angeo_llms/generate/index` URL must switch to the CLI command (the admin endpoint is now POST + CSRF).
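The provider change is mechanical: every place that appended to a string becomes a `yield`. A before/after sketch with hypothetical free functions standing in for `provide()`; the real methods take an `OutputContextInterface`, omitted here to keep the example standalone.

```php
<?php
declare(strict_types=1);

// 2.x style (hypothetical): build and return one string.
function provideBrandsV2(array $brands): string
{
    $out = "## Brands\n\n";
    foreach ($brands as $name => $url) {
        $out .= "- [{$name}]({$url})\n";
    }
    return $out . "\n";
}

// 3.x style (hypothetical): yield chunks so the generator can stream them to disk.
function provideBrandsV3(array $brands): iterable
{
    yield "## Brands\n\n";
    foreach ($brands as $name => $url) {
        yield "- [{$name}]({$url})\n";
    }
    yield "\n";
}

$brands = ['Acme' => 'https://shop.example/brand/acme'];
$streamed = implode('', iterator_to_array(provideBrandsV3($brands), false));
var_dump($streamed === provideBrandsV2($brands)); // bool(true)
```

Joining the yielded chunks reproduces the old output byte for byte, which makes a simple regression test during migration.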
License
MIT — see LICENSE.
Support
- GitHub Issues: https://github.com/angeo-dev/module-llms-txt/issues
- Email: [email protected]
Changelog
All notable changes to Angeo_LlmsTxt are documented in this file.
The format follows Keep a Changelog,
and this project adheres to Semantic Versioning.
[3.0.4] — 2026-06-03
Compatibility patch. No functional or API changes — safe drop-in upgrade
from 3.0.x.
Changed
- Lowered the minimum PHP to 8.1 (`~8.1.0||~8.2.0||~8.3.0||~8.4.0`). The module uses no PHP 8.2+-only syntax, so it runs on 2.4.5 / 2.4.6 stores that are still on PHP 8.1 as well as on 2.4.7 / 2.4.8 (PHP 8.3 / 8.4).
- Broadened dependency constraints to cover 2.4.5 through 2.4.8. Every Magento dependency in `require` now uses an open lower bound (`>=`) set at the major line that shipped with 2.4.5, e.g. `magento/framework: >=102.0` and `magento/module-url-rewrite: >=102.0`. Because these major lines do not change between 2.4.5 and 2.4.8, the module installs cleanly across all of those minors. This replaces the earlier caret constraints (such as the `^101.2` on `module-url-rewrite`) that failed on 2.4.8, where that module ships as 102.x.
[3.0.2] — 2026-06-03
Marketplace-readiness patch. No functional or API changes — safe drop-in
upgrade from 3.0.0.
Fixed
- Replaced `md5()` with `hash('sha256', …)` for ETag generation in the file-serving controller. The Magento Coding Standard forbids `md5()`; the ETag only needs to be stable and unique, so the switch is behaviour-neutral.
- Removed error-silencing `@` operators from filesystem calls (`fopen`/`flock`/`fclose`) in the atomic-write lock helper and in the validate command. Return values were already checked explicitly, so dropping `@` changes no behaviour while clearing the coding-standard errors.
Changed
- Dependency constraints pinned to real 2.4.x major lines. `require` now uses caret ranges matching the actual published modules, notably `magento/module-url-rewrite: ^102.0` (the 101.2 line never existed). This resolves a `composer require` failure on clean 2.4.8 installs.
- Added an explicit `version` field (3.0.1) to `composer.json` so the package version matches the Marketplace submission form.
[3.0.0] — 2026-05-23
A full rebuild against the architectural review of 2.1.4. This release is
not drop-in compatible — see the Breaking Changes section below for
migration steps.
Breaking changes
- `ProviderInterface::provide()` signature changed from `string` to `iterable<string>`. Custom providers contributed by third-party modules must now yield chunks rather than return one concatenated string. This is the change that lets the generator stream to disk with bounded memory.
- `/llms-full.txt` now serves a genuinely different file (full sanitized descriptions inline). Previously, this URL silently aliased to `/llms.txt`, which was misleading.
- llms.txt header is now spec-compliant: a single blockquote summary line, with currency / locale / base URL moved to a plain markdown paragraph below. The 2.x output used four blockquote lines, which broke llmstxt.org-spec parsers.
- Status tracking moved out of `core_config_data` and into `var/angeo_llms/status.json`. Old status rows under `angeo_llms/status/*` are no longer read. Drop them via `bin/magento config:set --lock-env angeo_llms/status/... ""` if you want a clean state, but it's harmless to leave them.
- `media/llms/` is no longer used as the file output directory; output now lives under `media/angeo/llms/`. Old files can be deleted; remove any reverse-proxy rewrites pointing at the old path.
- Admin "Generate" action moved to POST + CSRF. If you have any external tooling that hit the old GET URL, switch to the CLI command instead.
- Module namespace unchanged: still `Angeo\LlmsTxt`. Composer package name unchanged.
Added
- Page Builder element filter with four strategies (preserve, exclude, allow, strip), driven by the element's `data-content-type` attribute. The default list of excluded types drops common visual-only elements (products carousel, banner, slider, video, map, buttons, block, dynamic-block, divider, spacer) so the output focuses on semantic text. Configurable per store at Stores → Configuration → Angeo → LLMs.txt → Content Sanitization.
- Streaming generation via PHP generators. Memory stays bounded at one collection page (default 1000 products) regardless of catalog size.
- Atomic writes: each file is written to `.tmp`, then renamed. Readers never see a half-written file. Generation locks via a separate `.lock` file with `flock(LOCK_EX | LOCK_NB)`, so concurrent runs cannot corrupt output.
- Cursor pagination on `entity_id` (ascending, `> $lastId`) instead of skip/limit, so products inserted mid-run can neither be duplicated nor skipped.
- Batch URL resolver loads every URL rewrite for a store in one query (vs. the per-product `getProductUrl()` query that 2.x triggered N times).
- Real `llms-full.txt` with full sanitized descriptions inline.
- `/{url_key}.md` mirrors: every product, category, and CMS page exposes a clean Markdown rendering at its URL with `.md` appended. Generated on the fly; no extra disk storage.
- CMS directive resolution: `{{widget}}`, `{{block}}`, `{{var}}`, and `{{store}}` directives are now rendered via Magento's standard frontend filter before being stripped, instead of leaking as literal text.
- Customer-group-aware pricing: admin can choose which customer group's final price (with special-price and group-price applied) gets exposed.
- HTTP caching: `ETag`, `Last-Modified`, `Cache-Control: public, max-age=`, `X-Robots-Tag: noindex, follow`, and 304 responses on conditional GETs.
- Async admin action: Schedule (Async) inserts a `cron_schedule` row for the next tick so admins don't have to wait through a synchronous generation.
- Live admin status panel polling `/angeo_llms/status/index` every 60s.
- Three CLI commands:
  - `bin/magento angeo:llms:generate [--store=…] [--no-jsonl] [--no-llms] [--no-full]`
  - `bin/magento angeo:llms:status`
  - `bin/magento angeo:llms:validate [--store=…]`
- JSONL JSON-Schema at `etc/jsonl-schema.json` for downstream pipelines.
- Events `angeo_llms_generation_before`, `angeo_llms_generation_after`, and `angeo_llms_generation_failed` for custom hooks.
- PHPUnit test suite under `Test/Unit/`.
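A minimal sketch of the write-to-`.tmp`, rename, and separate `.lock` file pattern listed above. The `writeAtomically()` name and the file naming are hypothetical, and the module's helper may differ in detail. The point is that `flock()` guards a file that is never truncated, and `rename()` within one filesystem replaces the target atomically on POSIX systems.

```php
<?php
declare(strict_types=1);

/**
 * Hypothetical sketch: write $chunks to $target via a .tmp file and rename(),
 * guarded by a separate .lock file. Returns false if another run holds the lock.
 *
 * @param iterable<string> $chunks
 */
function writeAtomically(string $target, iterable $chunks): bool
{
    // 'c' creates the lock file if needed but never truncates it.
    $lock = fopen($target . '.lock', 'c');
    if ($lock === false) {
        throw new RuntimeException("Cannot open lock file for {$target}");
    }
    if (!flock($lock, LOCK_EX | LOCK_NB)) {
        fclose($lock);
        return false; // another generation is already running
    }
    $tmp = $target . '.tmp';
    try {
        $fh = fopen($tmp, 'wb');
        if ($fh === false) {
            throw new RuntimeException("Cannot open {$tmp}");
        }
        try {
            foreach ($chunks as $chunk) {
                if (fwrite($fh, $chunk) === false) {
                    throw new RuntimeException("Write to {$tmp} failed");
                }
            }
        } finally {
            fclose($fh);
        }
        // Readers see either the previous file or the complete new one.
        if (!rename($tmp, $target)) {
            throw new RuntimeException("Cannot rename {$tmp} to {$target}");
        }
        return true;
    } finally {
        if (is_file($tmp)) {
            unlink($tmp); // partial output from a failed run
        }
        flock($lock, LOCK_UN);
        fclose($lock);
    }
}

$file = sys_get_temp_dir() . '/llms-demo.txt';
var_dump(writeAtomically($file, ["# Example Shop\n", "\n> Demo store.\n"]));
```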
Changed
- `frontend_default_meta_description` is now the fallback for the store summary, before falling back to the generic stub.
- Multi-store store-code routing handles the last URL path segment, so `/de/llms.txt` works on path-based stores.
- Spec compliance: products go under `## Optional` by default (admin-toggleable) so context-budget-constrained clients can drop them.
- Out-of-stock products excluded by an explicit `StockRegistry` lookup (configurable).
- Logger context is now structured: every log line is prefixed `[Angeo LlmsTxt]` and includes store/format keys.
Fixed
- Pseudo-locking in 2.x: a `'w'` open truncates the file before the `flock()` call, so two concurrent generations both saw an empty file and the last writer won unpredictably. 3.0 uses a separate `.lock` file.
- CSRF-exposed admin generate: 2.x used a GET URL; 3.0 requires POST with the form key.
- Synchronous admin "Generate" timing out on large catalogs (now async option).
- N+1 URL rewrite queries: now batched.
- Literal `{{widget}}` text appearing in 2.x output: now resolved.
- Stale files for stores that became inactive or excluded: now cleaned up on every generation run.
Removed
- `media/llms/` legacy directory (see breaking-changes notes).
- GET endpoint for admin generation.
- Documented-but-non-existent config fields from 2.x README.
[2.1.4] — Pre-rebuild baseline
Last release in the 2.x line. See the architectural review document for
the issues that motivated 3.0.0.
| Version | Stability | QA Status | Released |
|---|---|---|---|
| 3.0.5 | stable | Fail | 2026-06-04 19:39:51 |
| 3.0.4 | stable | Not tested | 2026-06-03 18:23:19 |
| 3.0.3 | stable | Not tested | 2026-06-03 18:04:49 |
| 3.0.2 | stable | Not tested | 2026-06-03 17:46:25 |
| 3.0.1 | stable | Not tested | 2026-06-03 16:17:59 |
| 3.0.0 | stable | Not tested | 2026-05-29 20:31:58 |
| 2.1.4 | stable | Not tested | 2026-05-06 04:36:21 |
| 2.1.3 | stable | Not tested | 2026-04-30 07:41:35 |
| 2.1.2 | stable | Not tested | 2026-04-30 05:05:25 |
| 2.1.1 | stable | Not tested | 2026-04-29 20:38:27 |
| 2.1.0 | stable | Not tested | 2026-04-29 20:07:40 |
| 2.0.0 | stable | Not tested | 2026-04-16 18:52:27 |
| 1.1.2 | stable | Not tested | 2026-03-20 18:38:35 |
| 1.1.1 | stable | Not tested | 2026-03-18 18:33:44 |
Requires 12
| Package | Constraint |
|---|---|
| ext-json | * |
| ext-mbstring | * |
| magento/framework | >=102.0 |
| magento/module-backend | >=102.0 |
| magento/module-catalog | >=104.0 |
| magento/module-catalog-inventory | >=100.4 |
| magento/module-catalog-url-rewrite | >=100.4 |
| magento/module-cms | >=104.0 |
| magento/module-config | >=101.2 |
| magento/module-store | >=101.0 |
| magento/module-url-rewrite | >=102.0 |
| php | ~8.1.0||~8.2.0||~8.3.0||~8.4.0||~8.5.0 |
Requires-dev 3
| Package | Constraint |
|---|---|
| magento/magento-coding-standard | ^32.0 |
| phpstan/phpstan | ^1.10 |
| phpunit/phpunit | ^10.5 |
Suggests 2
| Package | Reason |
|---|---|
| magento/module-page-builder | Enable to opt-in or opt-out of Page Builder content elements per content-type during sanitization |
| magento/module-shared-catalog | Adobe Commerce: integrate B2B shared catalogs so llms.txt only exposes the allowed catalog |