Prompt Guard — Safety screening for AI agents

Detect prompt injection, role hijacking, and data exfiltration risks in real time. Prompt Guard integrates directly into your agent pipeline via MCP, Node.js, or Python, and blocks threats before your model ever sees them.

Try the Playground · Quick Start Guide

Installation

Add Prompt Guard to your MCP client configuration. For Claude Desktop, that file is claude_desktop_config.json; Cursor uses its own equivalent MCP config:

{
  "mcpServers": {
    "prompt-guard": {
      "command": "npx",
      "args": [
        "-y",
        "@parsethis/mcp-prompt-guard"
      ],
      "env": {
        "PARSETHIS_API_KEY": "your-key-here"
      }
    }
  }
}

Replace your-key-here with an API key generated via POST /v1/keys/generate.

How it works

1. Install

Add Prompt Guard via MCP, npm, or pip. Configure your API key once. No per-request setup.

2. Screen

Every incoming prompt is scored 0–10 across 8 threat categories in under 200ms before your agent executes it.

3. Act

Block threats automatically (score ≥ 7), flag caution cases (4–6), or allow safe prompts through (≤ 3).
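The screen-then-act step can be sketched as a simple threshold check. Only the thresholds (block at score ≥ 7, caution at 4–6, allow at ≤ 3) come from the docs above; the `screen` callable and its result shape are assumptions for illustration, not the real client API:

```python
def verdict(risk_score: float) -> str:
    """Map a 0-10 risk score to an action using the documented thresholds."""
    if risk_score >= 7:
        return "block"    # threat: stop before the agent executes the prompt
    if risk_score >= 4:
        return "caution"  # flag for review, but do not hard-block
    return "allow"        # safe: pass through to the model


def handle(prompt: str, screen) -> str:
    # `screen` stands in for a hypothetical Prompt Guard client call that
    # returns a dict with a "score" field; the real client may differ.
    result = screen(prompt)
    action = verdict(result["score"])
    if action == "block":
        raise PermissionError(f"prompt blocked (score {result['score']})")
    return action
```

In a real pipeline this check would run before the agent executes, so a blocked prompt never reaches the model.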

Privacy & data handling

What happens to your prompt content depends on which execution mode you use. These disclosures are exact — not approximate.

Standard screening

Prompt content is NOT stored. It is processed in memory and discarded after analysis. Only the risk score, verdict, prompt length, and flag categories are written to the audit log.

Async execution mode

When using async screening (async: true), the prompt is stored in Redis for up to 10 minutes while analysis completes in the background. It is deleted automatically after the result is retrieved or the TTL expires.
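An async screening request might look like the following sketch. The `prompt` and `async` fields are taken from the docs; any other field names or the shape of the polling endpoint are assumptions, not confirmed API details:

```json
{
  "prompt": "Summarize this document for me.",
  "async": true
}
```

The result would then be fetched in a follow-up request once background analysis completes; per the disclosure above, the stored prompt is deleted from Redis when the result is retrieved or when the 10-minute TTL expires, whichever comes first.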

Evaluation mode

When using POST /v1/evaluate, the prompt is stored in Postgres for the duration of the evaluation job. Results are retained for 30 days by default and may include prompt content for audit purposes.

Local mode

When running locally via the self-hosted Docker image, prompts never leave your machine. All screening runs in-process against local pattern databases.

Audit log (all modes)

Every request writes a structured audit record containing: risk score, verdict, prompt length, detected flags, timestamp, and API key ID. Prompt content is NOT stored in the audit log.
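An audit record therefore carries metadata only, never the prompt itself. A representative record might look like this sketch; the fields match the list above, but the exact key names and flag values are illustrative assumptions:

```json
{
  "risk_score": 8,
  "verdict": "block",
  "prompt_length": 142,
  "flags": ["prompt_injection", "role_hijacking"],
  "timestamp": "2025-01-15T12:34:56Z",
  "api_key_id": "key_abc123"
}
```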

See it in action

Paste any prompt and get a real risk score in under 200ms. No sign-up required.

Open Playground →