# External Scanner Examples These are **optional example implementations** of language-specific scanners that demonstrate the Lex scanner plugin architecture. They are provided as reference implementations for users who need multi-language code scanning. ## ⚠️ Security Notice **THESE ARE EXAMPLES, NOT PRODUCTION-HARDENED TOOLS** - **Not audited for security vulnerabilities** - Use at your own risk - **Execute code analysis on your filesystem** - Ensure you trust the scanned directories - **No sandboxing** - Scanners run with your full user permissions - **No input validation** - Do not run on untrusted codebases without thorough review - **Not part of core Lex security audits** - External scanner security is user's responsibility ### Safe Usage Guidelines 1. **Only scan code you control or explicitly trust** 2. **Review scanner source before running** (both scanners are < 400 lines - readable in 15 minutes) 3. **Run in isolated environments for untrusted code** (Docker containers, VMs, or sandboxed CI/CD) 4. **Validate scanner output** before feeding to `lex check` or policy enforcement 5. **Keep Python runtime updated** to avoid vulnerabilities in the interpreter itself ### For Production Scanning Pipelines If you plan to use these scanners in production: - **Audit the scanner code thoroughly** for your security requirements - **Run in sandboxed CI/CD environments** with minimal permissions - **Implement output validation and sanitization** before policy checks - **Consider commercial AST analysis tools** with security SLAs and vendor support - **Monitor scanner execution** for unexpected behavior or resource usage ## ⚠️ Important Notice - **These scanners are NOT part of the core Lex library** - **They require external runtimes:** Python 3.7+ and/or PHP 7.4+ must be installed on your PATH - **They are not executed by `lex` CLI commands** - the `lex check` command consumes pre-generated scanner JSON output, not the scanners themselves - **The TypeScript scanner (`src/policy/scanners/ts_scanner.ts`) is the primary, officially supported scanner** ## Available Scanners ### Python Scanner (`python_scanner.py`) Scans Python codebases and extracts architectural facts. **Requirements:** Python 3.7+ (no external dependencies) ```bash # Without policy (basic scanning) python3 python_scanner.py > output.json # With policy (includes module_scope and module_edges) python3 python_scanner.py policy.json > output.json ``` ### PHP Scanner (`php_scanner.py`) Scans PHP codebases and extracts architectural facts. **Requirements:** Python 3.7+ (scanner is written in Python, scans PHP code) ```bash # Without policy python3 php_scanner.py > output.json # With policy python3 php_scanner.py policy.json > output.json ``` ## Scanner Plugin Architecture All scanners follow the same design philosophy: **"dumb by design"**. ### What Scanners Do - Observe code and report facts - Extract: classes, functions, imports, declarations - Detect: feature flags, permission checks - Map: files to modules (when policy provided) - Identify: cross-module boundaries ### What Scanners DON'T Do - Make architectural decisions - Enforce policies - Decide if imports are allowed - Determine if boundaries are violated **Policy enforcement happens separately** in the `policy/check/` subsystem, which compares scanner output against `lexmap.policy.json`. ## Output Format All scanners produce JSON conforming to the same schema: ```json { "language": "python", "files": [ { "path": "services/auth/service.py", "module_scope": "services/auth", "declarations": [ { "type": "class", "name": "AuthService" } ], "imports": [ { "from": "services.user.models", "type": "import_statement", "imported": ["User"] } ], "feature_flags": ["new_auth_flow"], "permissions": ["can_authenticate"], "warnings": [] } ], "module_edges": [ { "from_module": "services/auth", "to_module": "services/user", "from_file": "services/auth/service.py", "import_statement": "services.user.models" } ] } ``` ## Integration Workflow These external scanners fit into the Lex policy pipeline as **optional first-stage processors**: ```bash # 1. Scan each language (only if you need multi-language support) npx tsx src/policy/scanners/ts_scanner.ts src/ policy.json > ts.json python3 examples/scanners/python_scanner.py backend/ policy.json > py.json python3 examples/scanners/php_scanner.py api/ policy.json > php.json # 2. Merge scanner outputs node src/policy/merge/lexmap-merge.ts ts.json py.json php.json > merged.json # 3. Check against policy using lex CLI lex check merged.json policy.json ``` **For TypeScript-only projects:** You only need the first-party TypeScript scanner from `src/policy/scanners/ts_scanner.ts`. ## Feature Detection Patterns ### Feature Flags **Python:** - `feature_flags.is_enabled('flag_name')` - `FeatureFlags.enabled('flag_name')` - `settings.FEATURES['flag_name']` **PHP:** - `FeatureFlags::enabled('flag_name')` - `$featureFlags->isEnabled('flag_name')` - `config('features.flag_name')` ### Permission Checks **Python:** - `user.has_perm('permission_name')` - `check_permission('permission_name')` - `@permission_required('permission_name')` **PHP:** - `$user->can('permission_name')` - `Gate::allows('permission_name')` - `$this->authorize('permission_name')` ## Testing These scanners have integration tests in `src/policy/scanners/test_scanners.ts` that are **conditionally executed** based on environment flags. To run the external scanner integration tests: ```bash # Enable external scanner tests (requires Python 3.7+) LEX_ENABLE_EXTERNAL_SCANNER_TESTS=1 npm test ``` Without the flag, these tests are skipped and `npm test` passes with Node.js alone. ## Module Resolution When you provide a `policy.json` file, scanners: 1. Load the policy's `modules` definitions 2. Match file paths against `owns_paths` glob patterns 3. Set `module_scope` for each file 4. Detect cross-module imports and populate `module_edges` Example policy excerpt: ```json { "modules": { "services/auth": { "owns_paths": ["services/auth/**"], "owns_namespaces": ["App\\Services\\Auth"] }, "services/user": { "owns_paths": ["services/user/**"] } } } ``` ## Creating Your Own Scanner To implement a scanner for another language: 1. **Choose your implementation language** (Python, TypeScript, Ruby, etc.) 2. **Follow the output schema** (see Output Format above) 3. **Reuse pattern detection logic** from existing scanners 4. **Keep it simple** - scanners observe, they don't enforce 5. **Test** with the integration test framework (optional) 6. **Document** runtime requirements clearly The scanner architecture is intentionally minimal to support community contributions. ## Philosophy > **Scanners are dumb by design.** This architectural principle keeps language-specific logic isolated and simple. Complex policy decisions happen in the language-agnostic `policy/check/` subsystem, ensuring consistent enforcement across all scanned languages. ## Support Status | Scanner | Status | Maintenance | Runtime Required | |---------|--------|-------------|------------------| | TypeScript (`src/policy/scanners/ts_scanner.ts`) | ✅ **Officially Supported** | Active | Node.js only | | Python (`examples/scanners/python_scanner.py`) | 📦 **Example** | Community | Python 3.7+ | | PHP (`examples/scanners/php_scanner.py`) | 📦 **Example** | Community | Python 3.7+ | **For production use:** We recommend the TypeScript scanner, which is actively maintained and requires only Node.js.