ho-nl / magento2-reachdigital_categoryattributededuplication
ReachDigital_CategoryAttributeDeduplication
Removes duplicate store-specific category attribute values that are identical to global
(store_id=0) values.
Background
Q: Do you know of any known issue or bug in Magento 2.4.x where category attribute values are
written to store scope (without this intentionally being done by an admin user)?
A: Yes, this is a well-known and long-standing issue in Magento 2.
The main culprit: When an admin user edits a category while viewing a specific store scope (not
"All Store Views"), Magento tends to save all attribute values to that store scope, even if:
- The values weren't actually changed
- The "Use Default Value" checkbox is checked
This happens because the category save controller/processor doesn't properly filter out unchanged
values or respect the `use_default` flags in certain scenarios. Over time, this leads to a massive
accumulation of store-scoped rows that are identical to the global values.
Common triggers:
- Simply opening a category in a store view and clicking Save
- Mass actions on categories
- Programmatic saves that don't explicitly set `store_id = 0`
- Import processes that don't handle scope correctly
Related GitHub issues:
This module provides automated cleanup of these duplicate values.
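The rule behind this cleanup can be sketched in Python (the module itself is PHP and works on
Magento's category EAV tables). This is a minimal sketch, assuming a store-scoped row counts as a
duplicate when a `store_id=0` row exists for the same category and attribute with a strictly equal
value; `AttributeValue` and `find_duplicates` are hypothetical names, and rows are modeled as
in-memory tuples rather than database rows.

```python
from typing import NamedTuple, Optional

GLOBAL_STORE_ID = 0  # the global ("All Store Views") scope


class AttributeValue(NamedTuple):
    """A simplified, hypothetical model of one category EAV value row."""
    value_id: int
    entity_id: int
    attribute_code: str
    store_id: int
    value: Optional[str]


def find_duplicates(rows, excluded_codes):
    """Return store-scoped rows whose value equals the global value."""
    # Index the global values by (entity_id, attribute_code).
    global_values = {
        (r.entity_id, r.attribute_code): r.value
        for r in rows
        if r.store_id == GLOBAL_STORE_ID
    }
    duplicates = []
    for r in rows:
        if r.store_id == GLOBAL_STORE_ID or r.attribute_code in excluded_codes:
            continue
        key = (r.entity_id, r.attribute_code)
        # A store row is only redundant if a global row exists with the same value.
        if key in global_values and global_values[key] == r.value:
            duplicates.append(r)
    return duplicates


rows = [
    AttributeValue(1, 10, "name", 0, "Shoes"),
    AttributeValue(2, 10, "name", 1, "Shoes"),      # identical to global: duplicate
    AttributeValue(3, 10, "name", 2, "Schoenen"),   # real store-specific value: kept
    AttributeValue(4, 10, "url_path", 0, "shoes"),
    AttributeValue(5, 10, "url_path", 1, "shoes"),  # excluded attribute: kept
]
print([r.value_id for r in find_duplicates(rows, {"url_path"})])  # [2]
```

Deleting only the store-scoped row is safe under this rule: Magento falls back to the global value
when no store-specific value exists, so the effective value seen by the store view is unchanged.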
Features
- Deduplicates store-specific category attribute values that match global values
- Configurable triggers: after category save (observer) and/or via cron job
- Console command for manual execution
- Dry-run mode to preview changes without deleting data
- Verbose logging option for detailed audit trail
- Configurable attribute exclusion list
- Separate log file: `var/log/category_attribute_deduplication.log`
Configuration
Navigate to Stores > Configuration > Reach Digital > Category Attribute Deduplication
| Setting | Description | Default |
|---|---|---|
| Enable After Category Save | Run deduplication automatically after each category save | No |
| Enable Cron Job | Run deduplication for all categories daily at 3:00 AM | No |
| Dry Run Mode | Only log what would be deleted, without actually deleting | Yes |
| Verbose Logging | Log detailed information about each duplicate found | No |
| Excluded Attribute Codes | Additional attributes to exclude (comma-separated) | (empty) |
Note: url_path is always excluded from deduplication regardless of configuration.
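The effective exclusion list combines the always-excluded `url_path` with the comma-separated
config value. A sketch of that merge, assuming entries are trimmed and empty entries are ignored
(the README does not say how the module parses this field); `excluded_attribute_codes` is a
hypothetical name.

```python
ALWAYS_EXCLUDED = frozenset({"url_path"})


def excluded_attribute_codes(config_value):
    """Merge the Excluded Attribute Codes setting with the built-in exclusions."""
    extra = {
        code.strip()
        for code in (config_value or "").split(",")
        if code.strip()  # skip empty entries such as trailing commas
    }
    return ALWAYS_EXCLUDED | extra


print(sorted(excluded_attribute_codes(" meta_title, ,description ")))
# ['description', 'meta_title', 'url_path']
```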
Console Command
The console command ignores the Dry Run Mode config setting and defaults to dry-run mode; pass
`--apply` to actually delete duplicates.
```shell
# Dry-run for all categories (default)
bin/magento reach:category:deduplicate-attributes

# Actually delete duplicates
bin/magento reach:category:deduplicate-attributes --apply

# Process specific category (dry-run)
bin/magento reach:category:deduplicate-attributes --category-id=123

# Process specific category and apply changes
bin/magento reach:category:deduplicate-attributes --category-id=123 --apply
```
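Whether a run only logs or actually deletes depends on the trigger. A sketch of that decision,
assuming the observer and cron triggers follow the Dry Run Mode setting (the README only states
that the console command ignores it); `Trigger` and `is_dry_run` are hypothetical names.

```python
from enum import Enum


class Trigger(Enum):
    OBSERVER = "observer"  # after a single category save
    CRON = "cron"          # daily run over all categories
    CONSOLE = "console"    # bin/magento reach:category:deduplicate-attributes


def is_dry_run(trigger, config_dry_run, apply_flag=False):
    """Return True if the run should only log what it would delete."""
    if trigger is Trigger.CONSOLE:
        # The console command ignores the config: dry run unless --apply is passed.
        return not apply_flag
    # Observer and cron runs (assumed) follow the Dry Run Mode config (default: Yes).
    return config_dry_run
```

Keeping the console default independent of the config means a manual run never deletes data by
accident, even on an installation where Dry Run Mode has been switched off for the cron job.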
Logging
All actions are logged to var/log/category_attribute_deduplication.log.
Non-verbose mode: Logs entity IDs and attribute codes with counts.
```text
[DRY-RUN] name: 5 duplicates in entity_ids: 10, 15, 20, 25, 30
```
Verbose mode: Logs detailed information about each duplicate.
```text
[DRY-RUN] Duplicate: entity_id=10, store_id=1, attribute=name, value_id=123, ...
```
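The non-verbose line groups duplicates per attribute code. A sketch that reproduces the
non-verbose example line, assuming the count is the number of duplicate rows and each entity ID is
listed once in ascending order. The README only shows the `[DRY-RUN]` form, so the unprefixed
apply-mode line is an assumption, and `summary_lines` is a hypothetical name.

```python
from collections import defaultdict


def summary_lines(duplicates, dry_run):
    """Build one non-verbose log line per attribute code.

    `duplicates` is an iterable of (entity_id, attribute_code) pairs.
    """
    by_attribute = defaultdict(list)
    for entity_id, attribute_code in duplicates:
        by_attribute[attribute_code].append(entity_id)
    prefix = "[DRY-RUN] " if dry_run else ""  # apply-mode prefix is an assumption
    lines = []
    for attribute_code in sorted(by_attribute):
        count = len(by_attribute[attribute_code])
        ids = ", ".join(str(i) for i in sorted(set(by_attribute[attribute_code])))
        lines.append(f"{prefix}{attribute_code}: {count} duplicates in entity_ids: {ids}")
    return lines


print(summary_lines([(10, "name"), (15, "name"), (20, "name"), (25, "name"), (30, "name")], True))
# ['[DRY-RUN] name: 5 duplicates in entity_ids: 10, 15, 20, 25, 30']
```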
Installation
```shell
bin/magento module:enable ReachDigital_CategoryAttributeDeduplication
bin/magento setup:upgrade
bin/magento cache:flush
```
Prompt
Implement a Magento extension under app/code that can deduplicate these store-specific values for
categories.
Some requirements:
- Needs to have a logger that writes everything it does to its own separate log file. Maybe a
verbose mode (behind config) that logs literally everything, and a shorter mode where only
entity_ids along with attribute codes are logged?
- Should exclude url_path attribute. If you know of any other attributes that should be excluded,
let me know.
- Needs to have a dry-run mode where it only logs what it /would/ do
- Needs to be triggered for a single category after saving (use the right event though, do we need
the commit after event?)
- Needs to be triggered for all categories from a cronjob and console command
- Add configs to enable or disable either cronjob and observer trigger (i.e. separate configs).
Leave both disabled by default.
Please include this prompt in the module README and keep it up to date as I provide more
instructions
---
Can you remove the exclusion of url_key? I think it should be okay to deduplicate this - if it's
the same at global and store level, then it must still remain consistent with the url_rewrite
table. If it's not, then this was already an issue, so I don't think deduplicating makes it worse
(or better).
---
Can you change the console command name so that it is prefixed with reach-digital?
---
Adjust line-lengths in the README to break at 100 chars
---
Can you move the adminhtml configs to their own section and put them under the Reach Digital tab?
Don't forget to update default value paths and the constants in the code etc.
---
Can you change the console command prefix to `reach`? This is the one we most commonly use
---
Make the console command completely ignore the dry-run config, and make it use dry-run by default.
Invert the existing --dry-run option so I can apply deduplication after checking with the default
dry-run mode.
No changelog yet
The vendor hasn't published a changelog.
Requirements (2)
| Package | Constraint |
|---|---|
| php | >=8.1 |
| magento/module-catalog | * |
Compatibility
Each Magento release line is installed on its supported PHP versions, then the module is built (DI compilation + static-content deploy) and its unit and integration suites are run. The matrix shows the lines and PHP versions the module is confirmed to install and run on. Code-quality results further down (phpstan, phpcs, …) are reported separately and never affect compatibility.
Code Quality
Advisory checks against the module's source. Static analysis runs once across the whole module; PHPStan re-runs per Magento + PHP version because resolvable symbols differ between releases. These NEVER affect the Compatibility badge — a phpcs finding can't make a module incompatible.
Static analysis
Coding standards (phpcs), mess detection (phpmd), copy-pasted code (cpd), PHP cross-version compatibility, composer.json validity. Each runs once for the whole module.
PHPStan
Type-checks the module's PHP against a real Magento install at the configured gate level. Re-runs per Magento and PHP version because resolvable symbols differ between releases.
Tests
Unit and integration suites, run for each applicable Magento and PHP version. A test failure speaks to the module's behaviour, not its compatibility with a Magento line, so it is reported here separately and never reddens the compatibility matrix.
Unit tests
| Magento | PHP 8.2 | PHP 8.3 | PHP 8.4 | PHP 8.5 |
|---|---|---|---|---|
| 2.4.7 | N/A | N/A | | |
| 2.4.8 | N/A | N/A | | |
| 2.4.9 | N/A | N/A | | |
Integration tests
| Magento | PHP 8.2 | PHP 8.3 | PHP 8.4 | PHP 8.5 |
|---|---|---|---|---|
| 2.4.7 | N/A | N/A | | |
| 2.4.8 | N/A | N/A | | |
| 2.4.9 | N/A | N/A | | |
Security
Security checks run directly against the module: an audit of its declared dependencies for known vulnerabilities (composer audit) and a scan of its source for malware and web-shell signatures. Each runs once. A malware detection fails the version outright.