=== Static Cache Wrangler - Headless Assistant ===
Contributors: derickschaefer
Tags: headless, cms, sanity, converter, exporter
Requires at least: 6.0
Tested up to: 6.9
Requires PHP: 7.4
Stable tag: 2.1.0
License: GPLv2 or later
License URI: https://www.gnu.org/licenses/gpl-2.0.html
Requires Plugins: static-cache-wrangler

Convert Static Cache Wrangler HTML output to headless CMS import formats using a pluggable architecture.

== Description ==

**Static Cache Wrangler - Headless Assistant** is a companion plugin for [Static Cache Wrangler](https://wordpress.org/plugins/static-cache-wrangler/) that converts cached HTML files into formats that headless CMS platforms can import. It requires WP-CLI and is intended for developers and administrators who have a good working knowledge of shell environments and traditional Linux stream-processing commands (e.g. sed, grep, awk, sort), and who are willing to explore the WordPress command-line interface. The plugin provides composable command-line tooling; it IS NOT a point-and-click solution.

= Testing Results =

Tested on cachewrangler.com (15-page WordPress site):

* ☑ **74% semantic conversion rate** - 564 blocks converted to structured content
* ☑ **15 pages converted** successfully to Sanity format
* ☑ **763 links preserved** with proper structure and references
* ☑ **36 images tracked** with migration metadata
* ☑ **14 accordions converted** semantically
* ☑ **13 tables converted** semantically

**Top Conversion Rates Achieved:**
* Simple pages: 86% semantic conversion
* Complex pages with mixed content: 74-82%
* Ultimate test page (141 blocks, 23 different Kadence block types): 52%

**What Falls Back to HTML (requires custom hooks):**
* Navigation menus (by design - preserves styling)
* Advanced Kadence blocks (countdown, forms, testimonials, maps, etc.)
* Premium block libraries (Otter, Spectra - requires additional detectors)

= Extensible Architecture =

Unlike hardcoded solutions, this plugin uses a **pluggable engine system** in which CMS targets are registered via hooks. It ships with Sanity® CMS support (unofficial) out of the box. Targeting Contentful, Strapi, and others is technically feasible via extensions.

*Sanity® is a registered trademark of Sanity.io. This project is not affiliated with or endorsed by Sanity.io.*

= Features =

**Direct Sanity CMS Conversion:**
* WordPress → Sanity NDJSON export  
* Pattern detection for Gutenberg blocks  
* Schema generation for Sanity Studio  
* Asset tracking and manifest  
* Command: `wp scw-headless convert --cms=sanity`

**Smart Pattern Detection:**
* 12 core Gutenberg patterns  
* 28 Kadence Blocks patterns  
* XPath-based detection with confidence scoring  
* Priority-based matching for nested structures  
* Pattern inheritance system

**CLI-First Experience:**
* `wp scw-headless scan` - View cached files  
* `wp scw-headless analyze <file>` - Detect patterns  
* `wp scw-headless convert --cms=sanity` - Export to Sanity  
* `wp scw-headless patterns` - List registered patterns  
* `wp scw-headless detectors` - Show detector modules  
* `wp scw-headless targets` - List available CMS platforms  
* `wp scw-headless info` - Show plugin statistics

= Future Roadmap Considerations =

**Generic Portable Text Output:**
* CMS-agnostic JSON format  
* Support for any Portable Text consumer  
* Command: `wp scw-headless normalize`  
* Non-WP analyzer and converter tooling

**Advanced Pattern Detection:**
* 40+ patterns including Kadence Blocks  
* ACF field support (roadmap)  
* Page builder compatibility (roadmap)  
* Custom pattern registration

**Multi-CMS Support:**
* Sanity (today)  
* Strapi (horizon)  
* Contentful (horizon)  
* Payload CMS (horizon)  
* Any Portable Text consumer (roadmap)

[Learn more about planned features](https://wp2headless.com/)

**Funding Model**
* This plugin is 100% free (true WordPress style)
* Want to make a donation? Consider purchasing a copy of the author's book on command-line interfaces for yourself or as a gift.

[Modern CLI Book](https://moderncli.dev)

= Pattern Detection System =

**Built-in Detector Modules:**

**Gutenberg Core** - 12 patterns:
* heading, paragraph, image, gallery, video
* list (ordered/unordered), quote, code
* button, buttons, separator, table

**Kadence Blocks** - 28 patterns:
* accordion, tabs, advanced_button, progress_bar
* icon_list, infobox, countdown, rowlayout
* column, advanced_heading, form, testimonials
* posts, table_of_contents, google_maps, lottie
* image, video_popup, advanced_gallery, navigation
* icon, spacer, show_more, search, identity
* table, vector, countup

**Extensible via Hooks:**
```php
// Register custom patterns
add_action('stcw_headless_patterns_loaded', function() {
    \STCW\Headless\Engine\Detector\PatternRegistry::register('custom_block', [
        'selectors' => ['.my-custom-block'],
        'extractor' => [MyExtractor::class, 'extract'],
        'priority' => 8,
        'confidence' => 0.95,
    ]);
});
```

= How It Works =

1. **Cache your WordPress site** with Static Cache Wrangler  
2. **Scan cached files:** `wp scw-headless scan`  
3. **Analyze patterns:** `wp scw-headless analyze /page/`  
4. **Convert:**
   - Free: `wp scw-headless convert --cms=sanity`

= Supported CMS Targets =

**Included:**
* **Sanity CMS** - Full Portable Text conversion with schema generation

= Perfect For =

* Migrating WordPress content to headless CMS platforms  
* JAMstack architecture with WordPress as authoring tool  
* SEO component analysis
* UI pattern analysis

= Requirements =

* WordPress 6.0 or higher  
* PHP 7.4+ (PHP 8.x fully supported)  
* Static Cache Wrangler 2.0.5+ (must be installed and active)  
* WP-CLI recommended for best experience  
* Pattern Library Pro for enterprise features (optional)

== Installation ==

= Automatic Installation =

1. Install and activate [Static Cache Wrangler](https://wordpress.org/plugins/static-cache-wrangler/)
2. Install (or confirm installation of) [WP-CLI](https://wp-cli.org/)
3. Search for "STCW Headless Assistant" in the WordPress plugin directory
4. Click "Install Now" and activate
5. Generate cached pages by browsing your site
6. Use WP-CLI: `wp scw-headless info` to verify installation

= Manual Installation =

1. Install and activate Static Cache Wrangler first
2. Upload `stcw-headless-assistant` folder to `/wp-content/plugins/`
3. Activate via WordPress admin
4. Navigate to **Static Cache > Headless Assistant**
5. Run: `wp scw-headless scan` to verify setup

= Recommended Setup =

1. Install WP-CLI (for CLI support)
2. Install Static Cache Wrangler (for HTML caching)
3. Install STCW Headless Assistant (this plugin)
4. Cache your site: `wp scw enable`
5. Test conversion: `wp scw-headless analyze /`

== Frequently Asked Questions ==

= Does this work without Static Cache Wrangler? =

No, this is a companion plugin that requires Static Cache Wrangler to be installed and active. It converts the HTML files that Static Cache Wrangler generates.

= What WP-CLI commands are available? =

**Commands:**
* `wp scw-headless info` - Show system status and statistics
* `wp scw-headless scan` - List all cached HTML files ready for conversion
* `wp scw-headless analyze <file>` - Detect patterns in a specific file
* `wp scw-headless patterns` - List all registered patterns with confidence scores
* `wp scw-headless detectors` - Show registered detector modules
* `wp scw-headless convert --cms=sanity` - Convert all files to Sanity format
* `wp scw-headless targets` - List available CMS targets

All commands support `--format=json` for automation.

= What gets exported? =

**Current (Sanity conversion):**
The plugin creates a complete export package containing:

* `data.ndjson` - Sanity import data in newline-delimited JSON format
* `asset-manifest.json` - Asset references with URLs and metadata
* `schemas/` - Sanity Studio schema definitions
* `README.md` - Import instructions for Sanity
* `.zip` archive - Complete package for download

Export packages are saved in `wp-content/cache/stcw-headless-exports/`.
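
As a quick check before importing, the NDJSON file can be inspected with a few lines of PHP. The sketch below assumes each line of `data.ndjson` is one JSON document carrying Sanity's `_type` field. The helper name `stcw_example_count_ndjson_types` and the sample documents are hypothetical and not part of the plugin.

```php
<?php
/**
 * Count documents per `_type` in a newline-delimited JSON string.
 * Hypothetical helper for illustration; not part of the plugin.
 */
function stcw_example_count_ndjson_types(string $ndjson): array
{
    $counts = [];
    foreach (preg_split('/\r\n|\n|\r/', $ndjson) as $line) {
        $line = trim($line);
        if ($line === '') {
            continue; // Skip blank lines (e.g. a trailing newline).
        }
        $doc = json_decode($line, true);
        if (!is_array($doc)) {
            continue; // Skip lines that do not decode to an object or array.
        }
        $type = $doc['_type'] ?? 'unknown';
        $counts[$type] = ($counts[$type] ?? 0) + 1;
    }
    ksort($counts);
    return $counts;
}

// Sample data standing in for the contents of data.ndjson.
$sample = implode("\n", [
    '{"_id":"page-home","_type":"page","title":"Home"}',
    '{"_id":"page-about","_type":"page","title":"About"}',
    '{"_id":"post-hello","_type":"post","title":"Hello"}',
]);

print_r(stcw_example_count_ndjson_types($sample));
```

To run it against a real export, pass `file_get_contents()` of the `data.ndjson` path instead of `$sample`.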

= How does pattern detection work? =

The plugin uses a six-phase pipeline:

1. **HTML Normalization** - Strips WordPress-specific classes, IDs, and attributes while preserving semantic HTML
2. **Pattern Detection** - Uses XPath queries to find registered patterns (CSS selectors converted to XPath)
3. **Priority Sorting** - Processes patterns in priority order (10 = highest) so that nested blocks are handled correctly
4. **Confidence Scoring** - Each match includes a confidence score (0.0-1.0) based on selector specificity
5. **Content Extraction** - Registered extractor functions parse matched DOM nodes into structured data
6. **Conversion** - Transform to target CMS format
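
The detection phases above (2 through 4) can be sketched with PHP's built-in `DOMDocument` and `DOMXPath`. This is a minimal illustration, not the plugin's implementation: the function names are hypothetical, the selector conversion handles only `.class` and `tag` forms, the `.example-accordion` class is made up, and the pattern array simply mirrors the `selectors`, `priority`, and `confidence` keys used when registering patterns.

```php
<?php
/**
 * Convert a simple CSS selector (".class" or "tag") to an XPath query.
 * Hypothetical helper; handles only these two selector forms.
 */
function stcw_example_css_to_xpath(string $selector): string
{
    if ($selector !== '' && $selector[0] === '.') {
        $class = substr($selector, 1);
        // Standard idiom for matching one class within a class list.
        return "//*[contains(concat(' ', normalize-space(@class), ' '), ' {$class} ')]";
    }
    return '//' . $selector;
}

/**
 * Detect patterns in an HTML string, highest priority first.
 * Returns a list of [pattern name, match count, confidence] rows.
 */
function stcw_example_detect(string $html, array $patterns): array
{
    $dom = new DOMDocument();
    // Collect libxml warnings internally instead of emitting them.
    $previous = libxml_use_internal_errors(true);
    $dom->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $xpath = new DOMXPath($dom);

    // Higher priority values are processed first.
    uasort($patterns, fn(array $a, array $b) => $b['priority'] <=> $a['priority']);

    $results = [];
    foreach ($patterns as $name => $pattern) {
        $count = 0;
        foreach ($pattern['selectors'] as $selector) {
            $count += $xpath->query(stcw_example_css_to_xpath($selector))->length;
        }
        if ($count > 0) {
            $results[] = [$name, $count, $pattern['confidence']];
        }
    }
    return $results;
}

$html = '<div class="example-accordion item"><h2>Title</h2><p>One</p><p>Two</p></div>';
$patterns = [
    'paragraph' => ['selectors' => ['p'], 'priority' => 5, 'confidence' => 1.0],
    'accordion' => ['selectors' => ['.example-accordion'], 'priority' => 9, 'confidence' => 0.95],
];

foreach (stcw_example_detect($html, $patterns) as [$name, $count, $confidence]) {
    printf("%-10s %d  confidence %.2f\n", $name, $count, $confidence);
}
```

Sorting before matching means the accordion is reported ahead of the paragraphs nested inside it, which is the behavior the priority phase is meant to provide.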

= How accurate is pattern detection? =

**Production Results (cachewrangler.com):**
- Overall semantic conversion: 74%
- Core Gutenberg blocks: 100% accuracy (confidence: 1.00)
- Kadence Blocks: 90-95% accuracy (confidence: 0.90-0.98)
- 40 unique patterns registered
- 763 blocks processed across 15 pages
- 174 blocks fell back to raw HTML (navigation menus, advanced widgets)

**Per-Page Results:**
- Simple pages: 82-86% conversion
- Mixed content pages: 74-82% conversion  
- Complex pages (23 block types): 52% conversion

= What if a pattern isn't detected? =

Undetected content falls back to the `rawHtml` block type, with pattern metadata attached. You can:
- Add custom pattern definitions via the `stcw_headless_patterns_loaded` hook
- Report missing patterns on GitHub
- Wire up existing extractors (many already exist)

= Can I add support for more blocks? =

Yes! Three ways:

**1. Register patterns via the `stcw_headless_patterns_loaded` action:**
```php
add_action('stcw_headless_patterns_loaded', function() {
    \STCW\Headless\Engine\Detector\PatternRegistry::register('my_block', [
        'selectors' => ['.my-block-class'],
        'extractor' => [MyExtractor::class, 'extract_my_block'],
        'priority' => 8,
        'confidence' => 0.95,
    ]);
});
```

**2. Create detector modules** (for larger block libraries)

**3. Use pattern inheritance:**
```php
// Extend existing patterns
PatternRegistry::register('custom_button', [
    'extends' => 'button',  // Inherits base button selectors
    'selectors' => ['.my-custom-button'],  // Adds custom selectors
]);
```

= Can I add support for other CMS platforms? =

Yes! The plugin is designed with a pluggable architecture:
```php
// Register custom CMS target
add_action('stcw_headless_register_targets', function() {
    $my_target = new My_CMS_Target();
    \STCW\Headless\Engine\Target\TargetRegistry::register($my_target);
});
```

Your target class must implement `TargetInterface` with methods for `convert()`, `generate_schemas()`, and `export()`.
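
A skeleton target might look like the following. Only the three method names come from the requirement above; every parameter, return type, and data shape here is an assumption made for illustration, and the class omits `implements TargetInterface` so that it runs standalone. Check the interface in the plugin source for the actual signatures.

```php
<?php
/**
 * Skeleton for a custom CMS target. The method names come from the
 * requirement above; all parameters and return types are assumptions.
 * In a real plugin the class would declare `implements TargetInterface`.
 */
class My_CMS_Target
{
    /** Convert parsed blocks into target-specific documents (assumed shape). */
    public function convert(array $blocks): array
    {
        $documents = [];
        foreach ($blocks as $index => $block) {
            $documents[] = [
                'id'   => 'block-' . $index,
                'type' => $block['type'] ?? 'rawHtml', // Fall back when untyped.
                'data' => $block['data'] ?? [],
            ];
        }
        return $documents;
    }

    /** Describe the content types the target expects (assumed shape). */
    public function generate_schemas(): array
    {
        return [
            ['name' => 'block', 'fields' => ['id', 'type', 'data']],
        ];
    }

    /** Serialize documents as newline-delimited JSON (assumed format). */
    public function export(array $documents): string
    {
        return implode("\n", array_map('json_encode', $documents));
    }
}

$target = new My_CMS_Target();
$docs   = $target->convert([
    ['type' => 'heading', 'data' => ['text' => 'Hello', 'level' => 2]],
    ['data' => ['html' => '<nav>...</nav>']],
]);
echo $target->export($docs), "\n";
```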

= What paths are excluded from scans? =

By default, these paths are excluded:
* `assets/` - Static assets (CSS, JS, images)
* `author/` - Author archives
* `category/`, `tag/` - Taxonomy archives
* `index.php/` - WordPress quirks
* `feed/`, `wp-json/` - API endpoints
* `sitemap/`, `404/` - Utility pages
* Blog index pages (Posts Page in Settings → Reading)

Filter via `stcw_headless_excluded_paths` to customize.
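
A callback for that filter could look like the sketch below. It assumes the filter passes a flat array of path prefixes such as `feed/`; the callback name and the two added paths are hypothetical.

```php
<?php
/**
 * Add extra paths to the scan exclusion list and re-allow one default.
 * Assumes the filter passes a flat array of path prefixes such as "feed/".
 */
function my_site_headless_excluded_paths(array $paths): array
{
    // Exclude two additional (hypothetical) sections of the site.
    $paths[] = 'landing-pages/';
    $paths[] = 'staging/';

    // Allow author archives to be scanned again.
    $paths = array_diff($paths, ['author/']);

    // Remove duplicates and reindex so the list stays a clean array.
    return array_values(array_unique($paths));
}

// Register only when WordPress is loaded, so the file also runs standalone.
if (function_exists('add_filter')) {
    add_filter('stcw_headless_excluded_paths', 'my_site_headless_excluded_paths');
}
```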

= Why is the blog page not converting? =

WordPress "Posts Page" archives (the page set as your blog index in Settings → Reading) are intentionally skipped because they contain dynamic post loops, not static content. Individual blog posts are converted successfully.

To recreate your blog index in Sanity:
1. Import individual posts (automatically converted)
2. Use this GROQ query to fetch posts:
```groq
*[_type == "post"] | order(publishedAt desc) {
  title, slug, excerpt, publishedAt
}
```
3. Build your blog index view in your frontend

= Does this work with page builders? =

Kadence Blocks is currently supported.  Support is being considered for Elementor, Otter Blocks, Divi, and more.

= What's the performance? =

**v2.1.0 Benchmarks (cachewrangler.com test site):**
- 15 pages converted in ~6 seconds
- Small page (10 KB): ~0.2 seconds
- Medium page (50 KB): ~0.5 seconds  
- Large page (100 KB): ~1.0 seconds
- Batch (100 pages): ~45 seconds
- Pattern detection: XPath-based (efficient)
- Memory: ~20 MB per page

= Can I preview before converting? =

Yes! Use `wp scw-headless analyze <file>` to see:
- Patterns detected  
- Confidence scores  
- Asset references  
- Potential issues  
- Extraction preview

**Example:**
```bash
wp scw-headless analyze /plugins/kadence-blocks/ --verbose

=== Pattern Analysis ===
File: index.html (104 KB)
Patterns Found: 71

paragraph           20  Confidence: 1.00
heading             15  Confidence: 1.00
separator           14  Confidence: 1.00
kadence_button       6  Confidence: 0.90
kadence_accordion    2  Confidence: 0.95
...

Confidence Distribution:
  High (≥0.95):   60
  Medium (0.85+): 11
  Low (<0.85):     0
```
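
The confidence bands in that output can be reproduced with a small bucketing function. This is a sketch under assumptions: the function name and the input shape (pattern name mapped to match count and confidence) are hypothetical, and only the thresholds are taken from the sample output.

```php
<?php
/**
 * Bucket pattern matches into the High / Medium / Low bands shown above.
 * Input maps a pattern name to [match count, confidence]; shape is assumed.
 */
function stcw_example_confidence_distribution(array $matches): array
{
    $bands = ['high' => 0, 'medium' => 0, 'low' => 0];
    foreach ($matches as [$count, $confidence]) {
        if ($confidence >= 0.95) {
            $bands['high'] += $count;
        } elseif ($confidence >= 0.85) {
            $bands['medium'] += $count;
        } else {
            $bands['low'] += $count;
        }
    }
    return $bands;
}

// The five rows visible in the sample output (the elided rows are omitted).
$matches = [
    'paragraph'         => [20, 1.00],
    'heading'           => [15, 1.00],
    'separator'         => [14, 1.00],
    'kadence_button'    => [6, 0.90],
    'kadence_accordion' => [2, 0.95],
];

print_r(stcw_example_confidence_distribution($matches));
```

The five visible rows account for 51 high-confidence and 6 medium-confidence matches; the rows elided in the sample output make up the rest of the 60 and 11 totals.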

= How do I get support? =

* Free users: [GitHub Issues](https://github.com/derickschaefer/stcw-headless-assistant/issues)
* Documentation: [wp2headless.com/documentation](https://wp2headless.com/documentation/)

= Why are some files showing as 140 B? =

Static Cache Wrangler may create gzipped files or use compression. The plugin handles this automatically by reading the actual `index.html` files within cached directories.

== Screenshots ==

1. WP-ADMIN View
2. WP-CLI Command and Example Output
3. Results from testing on https://cachewrangler.com

== Changelog ==

= 2.1.0 - January 15, 2026 =
**Major Update: Production-ready after testing on cachewrangler.com**

* **Tested:** Full production testing on cachewrangler.com (15 pages, 763 blocks)
* **Confirmed:** 74% semantic conversion rate across mixed content
* **Confirmed:** 100% link preservation (763 links maintained)
* **Confirmed:** Zero data loss - all content captured
* **Added:** Generic Portable Text converter for CMS-agnostic output
* **Added:** Enterprise feature gating via `stcw_headless_is_enterprise` filter
* **Added:** `normalize` command for generic Portable Text export
* **Added:** Support for Pattern Library Pro integration
* **Added:** Asset tracking in generic format with IDs and metadata
* **Added:** Block type statistics in verbose mode
* **Added:** `--verbose` flag support for detailed output
* **Added:** `--output=<path>` flag to save JSON to file
* **Enhanced:** CLI commands with better error messages
* **Enhanced:** Pattern detection tested with 40 unique patterns
* **Enhanced:** JSON output structure with version, format, generator
* **Fixed:** List item handling (array vs string support)
* **Fixed:** Image deduplication per page
* **Fixed:** Parser API consistency (`parse_file()`)
* **Improved:** Export now generates Sanity-native NDJSON format
* **Improved:** Asset manifest with usage tracking and priority levels
* **Improved:** Accordion and table semantic conversion

= 2.0.9 - January 5, 2026 =
* Enhanced: Pattern detection confidence scoring
* Fixed: Slug deduplication prevents duplicate homepage exports
* Fixed: Parser file path resolution edge cases
* Improved: Normalizer statistics output formatting

= 2.0.8 - January 2, 2026 =
* Fixed: Improved file path resolution for all URL formats (`/contact/`, `contact`, `/`)
* Fixed: Scanner now properly excludes junk paths (`index.php/`, `author/admin/`)
* Enhanced: Better error messages showing attempted path resolutions
* Enhanced: Homepage `/` now resolves correctly to `index.html`
* Added: Filterable path exclusion list via `stcw_headless_excluded_paths`

= 2.0.7 - December 30, 2025 =
* Enhanced: CLI `info` command shows full parity with admin dashboard
* Added: Plugin version, directory paths, detector count, CMS targets to info output
* Added: Trademark symbol (®) for Sanity CMS throughout UI and CLI
* Enhanced: Better Scanner statistics with formatted file sizes
* Fixed: Admin dashboard cache size label clarity

= 2.0.6 - December 27, 2025 =
* Added: Complete Kadence Blocks support - 28 block patterns registered
* Added: Advanced Kadence extractors (accordion, tabs, progress_bar, icon_list, and 24 more)
* Enhanced: Pattern registry now supports pattern inheritance
* Enhanced: Confidence scoring system for better pattern matching
* Added: `wp scw-headless detectors` command to list detector modules

= 2.0.5 - December 26, 2025 =
* Added: Pattern detection system with XPath-based queries
* Added: HTML Normalizer with configurable cleanup strategies
* Added: `wp scw-headless analyze` command for pattern debugging
* Enhanced: CLI commands now support `--verbose` flag for detailed output

= 2.0.4 - December 22, 2025 =
* Added: Pluggable CMS target architecture with TargetRegistry
* Added: `wp scw-headless targets` command
* Enhanced: Convert command now uses `--cms=<target>` flag
* Refactored: Sanity-specific code moved to Target/Sanity/ namespace

= 2.0.3 - December 16, 2025 =
* Added: Admin dashboard with pattern statistics
* Added: Real-time cache file scanning
* Enhanced: WP-CLI output formatting with color codes

= 2.0.2 - December 1, 2025 =
* Enhanced: Pattern priority system for nested block handling
* Fixed: Pattern detection order respects priority values
* Added: Detection statistics with confidence distribution

= 2.0.1 - November 15, 2025 =
* Refactored: Complete namespace migration from `STCWSC_*` to `STCW\Headless\*`
* Added: PSR-4 autoloader for WordPress naming conventions
* Fixed: CLI namespace changed from `wp scw-sanity` to `wp scw-headless`
* Enhanced: Plugin renamed to "Static Cache Wrangler - Headless Assistant"

= 2.0.0 - November 1, 2025 =
* Major: Complete architectural refactor to pluggable system
* Breaking: CLI commands changed (backward compatibility via aliases)
* Breaking: Namespace changed to trademark-safe naming
* Added: Detector module system
* Added: Pattern registry with 12 Gutenberg patterns
* Added: HTML normalizer engine
* Enhanced: Sanity export generates complete schemas

= 1.0.0 - October 1, 2025 =
* Initial proof-of-concept release
* Basic Sanity conversion support
* Simple block detection
* CLI commands: info, scan, convert

== Upgrade Notice ==

= 2.1.0 =
Production-ready release with real-world testing on cachewrangler.com. 74% semantic conversion rate confirmed. Generic Portable Text converter added for multi-CMS workflows. Requires Pattern Library Pro for normalize command. Free tier (Sanity conversion) unchanged and fully functional.

= 2.0.8 =
Bug fixes for file path resolution and scanner filtering. Recommended update for all users.

= 2.0.0 =
Major architectural rewrite. Plugin renamed and CLI namespace changed. Old `wp scw-sanity` commands deprecated but work via aliases. Update to `wp scw-headless` commands.

== Additional Information ==

= Architecture =

**Engine Components:**
* **Scanner** - Finds cached HTML files  
* **Normalizer** - Cleans HTML while preserving structure  
* **Pattern Registry** - Centralized pattern definitions with inheritance  
* **Pattern Detector** - XPath-based pattern matching engine  
* **Extractors** - DOM-to-data conversion functions  
* **Parser** - Orchestrates normalization → detection → extraction  
* **Converter** - Transforms to target CMS format  
* **Target Registry** - Pluggable CMS target management

**Data Flow:**
```
Cached HTML → Normalizer → Pattern Detector → Extractors → Converter → Export
```

= Support =

* Documentation: https://wp2headless.com/documentation/
* GitHub: https://github.com/derickschaefer/stcw-headless-assistant
* Issues: https://github.com/derickschaefer/stcw-headless-assistant/issues

= Roadmap =

**v2.2.0 (Q1 2026) - Quality Focus**
* Improve semantic conversion to 85%+ (currently 74%)
* Wire up remaining Kadence extractors (button, icon list, maps)
* Enhance navigation menu handling
* Add pattern detection validators
* 80%+ test coverage

**v2.3.0 (Q2 2026) - More Patterns**
* Elementor widgets detection
* Beaver Builder modules
* ACF field mapping
* Custom post type support

**v2.5.0 (Q3 2026) - Multi-CMS**
* TBD based on input, feedback, and demand

= Contributing =

Contributions welcome!
* Additional CMS target implementations
* Page builder detector modules
* Pattern extraction improvements
* Documentation and examples
* Test coverage

[Contribution Guidelines](https://github.com/derickschaefer/stcw-headless-assistant/blob/main/CONTRIBUTING.md)

= Privacy Policy =

This plugin does not collect, store, or transmit any user data. All conversion happens locally on your WordPress installation.

**Data Storage:**
* No external API calls
* No analytics or tracking
* No cookies used
* Export files stored locally in WordPress uploads directory
* License validation (Pattern Library Pro) stored in wp_options

== License ==

GPL v2 or later - Copyright © 2024-2025 Derick Schaefer

This program is free software; you can redistribute it and/or modify it under the terms of the GNU General Public License as published by the Free Software Foundation; either version 2 of the License, or (at your option) any later version.

== Third-Party Trademark Notices ==

**Content Management Systems:**
Sanity® is a registered trademark of Sanity.io  
Contentful® is a registered trademark of Contentful GmbH  
Strapi® is a registered trademark of Strapi Solutions SAS  
Builder.io® is a registered trademark of Builder.io, Inc.  
DatoCMS® is a registered trademark of Dato srl  
Payload CMS® is a registered trademark of Payload CMS, Inc.

**WordPress Ecosystem:**
WordPress® and Gutenberg® are registered trademarks of the WordPress Foundation  
WP-CLI® is a registered trademark of the WordPress Foundation  
Kadence® and Kadence Blocks™ are trademarks of Kadence WP LLC  
Elementor® is a registered trademark of Elementor Ltd.  
Divi® is a registered trademark of Elegant Themes, Inc.  
Advanced Custom Fields® (ACF) is a registered trademark of WP Engine, Inc.  
WooCommerce® is a registered trademark of Automattic Inc.  
Beaver Builder® is a registered trademark of Beaver Builder

**Development Tools:**
GitHub® is a registered trademark of GitHub, Inc.  
JSON™ is a trademark of JSON.org  

**Content & Media:**
YouTube® is a registered trademark of Google LLC  
Lottie™ is a trademark of Airbnb, Inc.  
Google Maps™ is a trademark of Google LLC  

**Project Attribution:**
WP2Headless.com is owned by Derick Schaefer  
Static Cache Wrangler is developed by Derick Schaefer  

**Disclaimer:**
This plugin is not affiliated with, endorsed by, or sponsored by any of the trademark owners listed above. These names are referenced solely to describe compatibility, integration capabilities, or as examples of headless CMS platforms that may be used with exported content. The plugin has not been tested, approved, or certified by any of these companies or organizations.
