Prompt Injection Detection for AI Agents

Prompt injection is not only a chat-input problem. Agents read browser pages, email, documents, API responses, RAG chunks, command output, and messages from other agents. Every one of those surfaces can contain text that tries to override the agent's real task.

Parse is a prompt protection API for these boundaries. It returns structured JSON so agents can decide whether to allow, isolate, or block untrusted text before acting.

What to screen

Screen text before it can influence:

Use POST /v1/parse for untrusted input. Use POST /v1/screen-output for generated output. Use POST /v1/agent/trust/verify for peer-agent messages.
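The routing rule above can be written down as a small lookup. This is a minimal sketch, not part of any Parse SDK: the `TextKind` enum and `endpoint_for` helper are hypothetical names, and it assumes all three endpoints share the `https://parsethis.ai` host used in the examples on this page.

```python
from enum import Enum

# Assumption: every endpoint lives on the host used by the /v1/parse examples.
PARSE_BASE_URL = "https://parsethis.ai"


class TextKind(Enum):
    """Hypothetical labels for the three kinds of text an agent screens."""
    UNTRUSTED_INPUT = "untrusted_input"
    GENERATED_OUTPUT = "generated_output"
    PEER_AGENT_MESSAGE = "peer_agent_message"


# Endpoint paths as listed in this guide.
ENDPOINTS = {
    TextKind.UNTRUSTED_INPUT: "/v1/parse",
    TextKind.GENERATED_OUTPUT: "/v1/screen-output",
    TextKind.PEER_AGENT_MESSAGE: "/v1/agent/trust/verify",
}


def endpoint_for(kind: TextKind) -> str:
    """Return the full URL to POST to for the given kind of text."""
    return PARSE_BASE_URL + ENDPOINTS[kind]
```

Keeping the mapping in one place makes it harder for a new tool integration to send peer-agent messages or generated output to the wrong endpoint.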

Detection model

Parse uses a layered detector:

  1. Deterministic pattern matching with normalization.
  2. Structural risk analysis for encoded, hidden, or boundary-breaking payloads.
  3. Optional LLM semantic analysis when configured and useful.
  4. Optional sandbox execution for suspicious prompts.
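To make the first layer concrete, here is an illustrative sketch of pattern matching after normalization. It is not Parse's detector: the two patterns, their names, and the helper functions are invented for this example, and a real rule set would be far larger. The point it shows is why normalization comes first: zero-width characters, full-width letters, and odd spacing are removed before any pattern is tried.

```python
import re
import unicodedata

# Zero-width and BOM characters that can be used to split a trigger phrase.
_ZERO_WIDTH = dict.fromkeys(map(ord, "\u200b\u200c\u200d\u2060\ufeff"))

# Hypothetical patterns for illustration only; not Parse's rules or taxonomy.
_PATTERNS = {
    "instruction_override": re.compile(
        r"ignore (all |any )?(previous|prior) instructions"
    ),
    "role_hijack": re.compile(r"you are now (a|an|the) "),
}


def normalize(text: str) -> str:
    """Fold compatibility forms, drop zero-width characters, collapse whitespace."""
    text = unicodedata.normalize("NFKC", text)
    text = text.translate(_ZERO_WIDTH)
    text = re.sub(r"\s+", " ", text)
    return text.casefold().strip()


def match_patterns(text: str) -> list[str]:
    """Return the names of every pattern found in the normalized text."""
    normalized = normalize(text)
    return [name for name, rx in _PATTERNS.items() if rx.search(normalized)]
```

A local check like this cannot replace the later layers. Encoded or paraphrased payloads pass straight through it, which is the gap the structural and semantic layers are meant to cover.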

The public taxonomy currently has nine categories:

This is a risk-reduction layer, not a guarantee. Keep least-privilege tools, scoped credentials, output validation, and audit logging.

TypeScript example

type ParseDecision = {
  id: string;
  risk_score: number;
  verdict: "safe" | "low_risk" | "medium_risk" | "high_risk" | "critical";
  categories: string[];
  flags: Array<{ category: string; severity: number; label: string; detail: string }>;
  suggested_action?: "allow" | "sandbox" | "block" | "request_owner_approval";
  approval_request?: {
    type: "privacy_disclosure";
    owner_prompt: string;
    default_action: "deny";
    expires_in_seconds: 900;
  };
};

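export async function screenUntrustedText(text: string, source: string): Promise<ParseDecision> {
  // Fail early rather than sending "Bearer undefined".
  const apiKey = process.env.PARSE_API_KEY;
  if (!apiKey) {
    throw new Error("PARSE_API_KEY is not set");
  }

  const res = await fetch("https://parsethis.ai/v1/parse", {
    method: "POST",
    headers: {
      Authorization: `Bearer ${apiKey}`,
      "Content-Type": "application/json",
    },
    body: JSON.stringify({
      prompt: text,
      metadata: { source }, // caller-supplied label for where the text came from
    }),
  });

  // Surface non-2xx responses so the caller can apply its fail-open or fail-closed policy.
  if (!res.ok) {
    throw new Error(`Parse screening failed: ${res.status}`);
  }

  return (await res.json()) as ParseDecision;
}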

Python example

import os
import requests

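def screen_untrusted_text(text: str, source: str) -> dict:
    """Send untrusted text to POST /v1/parse and return the decoded JSON decision."""
    response = requests.post(
        "https://parsethis.ai/v1/parse",
        # A missing PARSE_API_KEY raises KeyError here, before any request is sent.
        headers={"Authorization": f"Bearer {os.environ['PARSE_API_KEY']}"},
        json={"prompt": text, "metadata": {"source": source}},
        timeout=8,  # seconds
    )
    # Raise on non-2xx so the caller can apply its fail-open or fail-closed policy.
    response.raise_for_status()
    return response.json()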

Acting on results

Prefer suggested_action when present.

| Signal | Default behavior |
| --- | --- |
| suggested_action = "allow" | Continue |
| suggested_action = "sandbox" | Isolate, log, or require review |
| suggested_action = "request_owner_approval" | Ask the owner privately with approval_request.owner_prompt; deny if approval expires |
| suggested_action = "block" | Block by default |
| risk_score >= 7 | Block by default if no action field exists |
| Parse unavailable on high-impact path | Fail closed |
| Parse unavailable on low-impact path | Fail open only with explicit operator policy |
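The defaults in the table above can be collapsed into one decision function. This is a sketch under stated assumptions, not a reference implementation: the `decide` name, its return labels, and the `high_impact` and `allow_fail_open` parameters are hypothetical. The table does not say what to do with an unrecognized `suggested_action`, a missing `risk_score`, or a score below 7 with no action field, so the choices made for those cases here are assumptions.

```python
from typing import Optional

# From the table: with no action field, risk_score >= 7 blocks by default.
BLOCK_THRESHOLD = 7

# Maps the documented suggested_action values to hypothetical local labels.
_ACTION_TO_DECISION = {
    "allow": "continue",
    "sandbox": "isolate",
    "request_owner_approval": "ask_owner",
    "block": "block",
}


def decide(result: Optional[dict], high_impact: bool,
           allow_fail_open: bool = False) -> str:
    """Return 'continue', 'isolate', 'ask_owner', or 'block'.

    Pass result=None when Parse could not be reached.
    """
    if result is None:
        # Fail closed unless this is a low-impact path AND the operator opted in.
        if high_impact or not allow_fail_open:
            return "block"
        return "continue"

    action = result.get("suggested_action")
    if action is not None:
        # Assumption: an unrecognized action value is treated as a block.
        return _ACTION_TO_DECISION.get(action, "block")

    score = result.get("risk_score")
    # Assumption: a response with neither field is treated as a block.
    if score is None or score >= BLOCK_THRESHOLD:
        return "block"
    # Assumption: below the threshold with no action field, continue.
    return "continue"
```

For example, `decide({"risk_score": 8}, high_impact=False)` returns `"block"`, while `decide(None, high_impact=False, allow_fail_open=True)` returns `"continue"` because the operator has explicitly opted in to failing open on that path.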

Agent-native discovery

Parse publishes:

x402 option

If an agent has no bearer key, it can call a billable REST endpoint without Authorization, receive a 402 response, sign the advertised USDC payment on Base mainnet, and retry with payment-signature.
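The call, pay, and retry sequence might look like the sketch below. Several details are assumptions: that `payment-signature` is sent as an HTTP request header, that the signer can derive everything it needs from the 402 response, and the function and parameter names themselves. How the payment terms are advertised and how the USDC payment is signed are deliberately left to a caller-supplied `sign_payment` function, because this page does not specify them. The HTTP call is injected as `post` so the flow can be exercised without a network; in practice `requests.post` fits the same call shape.

```python
from typing import Any, Callable


def parse_with_x402(
    url: str,
    payload: dict,
    post: Callable[..., Any],
    sign_payment: Callable[[Any], str],
) -> dict:
    """Call a billable endpoint without a bearer key, paying on a 402 response.

    post:         requests.post-compatible callable (hypothetical injection point).
    sign_payment: caller-supplied function that receives the 402 response and
                  returns the signed payment string (hypothetical helper).
    """
    # First attempt: no Authorization header.
    first = post(url, json=payload, timeout=8)
    if first.status_code != 402:
        first.raise_for_status()
        return first.json()

    # Payment required: sign the advertised payment, then retry once.
    signature = sign_payment(first)
    retry = post(
        url,
        json=payload,
        headers={"payment-signature": signature},  # assumed to be a request header
        timeout=8,
    )
    retry.raise_for_status()
    return retry.json()
```

The sketch retries exactly once. If the second response is still an error, it raises rather than looping, which keeps an autonomous agent from signing repeated payments for a single request.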

Use x402 for autonomous first-call or metered access. Use Pro, Team, or Enterprise keys for sustained production volume.