=== ScraperGuard - AI Scraper Blocker ===
Contributors: kneet
Tags: ai, scraper, bots, user-agent, htaccess, security
Requires at least: 5.8
Tested up to: 6.9
Requires PHP: 7.4
Stable tag: 1.0.0
License: GPLv2 or later
License URI: https://www.gnu.org/licenses/gpl-2.0.html

Block “good bots” (AI scrapers) by User-Agent. Optional Apache .htaccess rules and WordPress-level blocking with basic stats.

== Description ==

ScraperGuard helps you block known AI scrapers (often called “good bots”) by matching their User-Agent string.

You can:

* Select specific bots to block, or block all known bots.
* Add your own custom User-Agent substrings (one per line).
* Block via Apache `.htaccess` (fast, before WordPress loads) **or** via WordPress-level blocking (can show basic stats).

Important notes:

* This plugin can block “good bots” that identify themselves. It cannot stop “bad bots” that ignore rules and/or spoof User-Agents. For that you may need additional security measures (WAF, rate limiting, bot protection).
* `.htaccess` blocking works on Apache hosting only, and requires a writable `.htaccess` file.
* WordPress-level blocking only affects requests that reach WordPress (it won’t block direct hits to static files unless they route through WordPress).
* Country blocking (geo blocking) can use a country header (fast) or an IP lookup (works without Cloudflare but is slower).

The settings page is under **Tools → ScraperGuard**.

== External Services ==

This plugin can optionally connect to third-party IP geolocation services to determine the visitor's country for country-based blocking. This feature is **disabled by default** and only activates when you explicitly enable "Country blocking" in the settings.

**When country blocking is enabled and the "Country detection method" is set to "Auto" or "IP lookup":**

* **Service used**: The plugin uses either ipwho.is or ipapi.co (configurable in settings).
* **Data sent**: The visitor's IP address is sent to the selected service.
* **When data is sent**: Only when a request is received and no country header is available from your server/proxy.
* **Purpose**: To determine the visitor's two-letter country code (ISO 3166-1 alpha-2) for geo-blocking.
* **Caching**: Results are cached locally for 24 hours by default (configurable from 1 to 168 hours) to minimize requests.
* **Privacy**: IP addresses are sent to external services. Ensure compliance with your privacy policy and local regulations.

**ipwho.is (default provider):**

* Service provider: ipwho.is
* Privacy policy: https://ipwho.is/
* Terms of service: https://ipwho.is/
* No API key required

**ipapi.co (alternative provider):**

* Service provider: ipapi.co
* Privacy policy: https://ipapi.co/privacy/
* Terms of service: https://ipapi.co/terms/
* No API key required for basic usage

**Important**: If you keep the "Country detection method" set to "Header only" (the default), or if you don't enable country blocking at all, no data is sent to external services.

== Installation ==

1. Upload the folder `scraperguard` to your `/wp-content/plugins/` directory.
2. Activate the plugin in WordPress.
3. Go to **Tools → ScraperGuard**.
4. Choose which bots you want to block (or “block all known bots”).
5. Pick your blocking method:
    * Enable Apache `.htaccess` blocking (fastest) if your host supports it, and/or
    * Enable WordPress-level blocking (for basic stats).

== Frequently Asked Questions ==

= Can I block all AI scrapers at once? =

Yes. Enable “Block all known AI/archiver bots”.

= Will this stop malicious bots? =

Not reliably. “Bad bots” may ignore robots.txt and can spoof User-Agent strings. Consider a WAF, rate limiting, login protection, and server-side security rules.

= Can I block whole countries? =

Sometimes.
The plugin supports optional country blocking at WordPress level. For best results, set “Country detection method” to “Auto” or “IP lookup”. Note that IP lookup needs outbound HTTPS from your server and may have privacy implications.

= Does `.htaccess` mode work on every host? =

No. It requires Apache + `AllowOverride` + a writable `.htaccess`. On Nginx, `.htaccess` is ignored; use WordPress-level blocking or configure your server rules manually.

= Why don’t the stats include `.htaccess` blocks? =

Because `.htaccess` blocks requests before WordPress runs. WordPress cannot count requests it never sees. Use your server logs for that.

= Can I see traffic / blocks in a graph? =

The plugin includes a simple “last 14 days” bar overview and a small table for WordPress-level blocks. For accurate totals (including `.htaccess` blocks), use server logs or analytics.

== Changelog ==

= 1.0.0 =
* Initial release.

== Upgrade Notice ==

= 1.0.0 =
Initial release. Block AI scrapers by User-Agent with optional .htaccess or WordPress-level blocking.
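
== Example: manual Apache rule ==

User-Agent blocking in `.htaccess` generally comes down to a `mod_rewrite` condition like the one below. This is a minimal sketch for orientation, not necessarily the exact rules ScraperGuard writes. It assumes `mod_rewrite` is enabled and that your host allows overrides; `GPTBot` and `CCBot` are example User-Agent substrings picked for illustration, so substitute the bots you actually want to block.

```apache
# Illustrative sketch only: the rules the plugin generates may differ.
<IfModule mod_rewrite.c>
    RewriteEngine On

    # Case-insensitive ([NC]) substring match on the User-Agent header.
    # GPTBot and CCBot are example values, not a complete list.
    RewriteCond %{HTTP_USER_AGENT} (GPTBot|CCBot) [NC]

    # Answer matching requests with 403 Forbidden ([F]) and stop
    # processing further rewrite rules ([L]).
    RewriteRule .* - [F,L]
</IfModule>
```

Because a rule like this runs before WordPress loads, requests it blocks never appear in the plugin's stats; check your server access logs for 403 responses instead.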