Prompt Injection Detection for AI Agents
Prompt injection is not only a chat-input problem. Agents read browser pages, email, documents, API responses, RAG chunks, command output, and messages from other agents. Every one of those surfaces can contain text that tries to override the agent's real task.
Parse is a prompt protection API for these boundaries. It returns structured JSON so agents can decide whether to allow, isolate, or block untrusted text before acting.
What to screen
Screen text before it can influence:
- tool calls
- browser actions
- shell or code execution
- memory writes
- credentials
- payments
- external messages
- user-visible output
- another agent's instructions
Use POST /v1/parse for untrusted input. Use POST /v1/screen-output for generated output. Use POST /v1/agent/trust/verify for peer-agent messages.
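The endpoint choice above can be captured in a small lookup so that each call site only has to say where its text came from. This is a minimal sketch: the `TextOrigin` labels and the `endpoint_for` helper are hypothetical names, and it assumes all three paths live on the same host used in the examples in this document.

```python
from enum import Enum

# Host taken from the request examples in this document; assumed to serve all three paths.
PARSE_BASE_URL = "https://parsethis.ai"


class TextOrigin(Enum):
    """Hypothetical labels for where a piece of text came from."""

    UNTRUSTED_INPUT = "untrusted_input"    # web pages, email, RAG chunks, tool output
    GENERATED_OUTPUT = "generated_output"  # text the agent is about to emit
    PEER_AGENT = "peer_agent"              # messages from another agent


# Endpoint paths as listed in the text above.
_ENDPOINTS = {
    TextOrigin.UNTRUSTED_INPUT: "/v1/parse",
    TextOrigin.GENERATED_OUTPUT: "/v1/screen-output",
    TextOrigin.PEER_AGENT: "/v1/agent/trust/verify",
}


def endpoint_for(origin: TextOrigin) -> str:
    """Return the full screening URL for text of the given origin."""
    return PARSE_BASE_URL + _ENDPOINTS[origin]
```

Keeping the mapping in one place makes it harder for a new tool integration to skip screening or call the wrong endpoint.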
Detection model
Parse uses a layered detector:
- Deterministic pattern matching with normalization.
- Structural risk analysis for encoded, hidden, or boundary-breaking payloads.
- Optional LLM semantic analysis when configured and useful.
- Optional sandbox execution for suspicious prompts.
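To make the first layer concrete, here is a toy illustration of pattern matching after normalization. It is not Parse's detector: the two patterns, the set of zero-width characters, and the function names are all assumptions chosen for illustration. The point is that normalizing first lets one simple pattern catch fullwidth, mixed-case, and zero-width-split variants of the same phrase.

```python
import re
import unicodedata

# Zero-width and BOM code points sometimes used to split trigger phrases.
# Mapping them to None makes str.translate delete them.
_ZERO_WIDTH = dict.fromkeys([0x200B, 0x200C, 0x200D, 0x2060, 0xFEFF])

# Illustrative patterns only; a real rule set would be far larger.
_PATTERNS = [
    re.compile(r"ignore (all |any )?(previous|prior) instructions"),
    re.compile(r"reveal (your |the )?system prompt"),
]


def normalize(text: str) -> str:
    """Fold compatibility forms, drop zero-width characters, collapse whitespace, casefold."""
    text = unicodedata.normalize("NFKC", text)
    text = text.translate(_ZERO_WIDTH)
    text = re.sub(r"\s+", " ", text)
    return text.casefold().strip()


def matches_known_pattern(text: str) -> bool:
    """Return True if the normalized text matches any illustrative pattern."""
    normalized = normalize(text)
    return any(pattern.search(normalized) for pattern in _PATTERNS)
```

A deterministic layer like this is cheap and predictable but easy to paraphrase around, which is presumably why it is paired with the structural and semantic layers listed above.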
The public taxonomy currently has nine categories:
- prompt_injection
- jailbreak
- data_exfiltration
- harmful_content
- system_prompt_leak
- privilege_escalation
- social_engineering
- code_execution
- indirect_injection
This is a risk-reduction layer, not a guarantee. Keep least-privilege tools, scoped credentials, output validation, and audit logging.
TypeScript example
```typescript
// Shape of the JSON that screenUntrustedText expects back from POST /v1/parse.
type ParseDecision = {
  id: string;
  risk_score: number;
  verdict: "safe" | "low_risk" | "medium_risk" | "high_risk" | "critical";
  categories: string[];
  flags: Array<{ category: string; severity: number; label: string; detail: string }>;
  suggested_action?: "allow" | "sandbox" | "block" | "request_owner_approval";
  approval_request?: {
    type: "privacy_disclosure";
    owner_prompt: string;
    default_action: "deny";
    expires_in_seconds: 900;
  };
};

// Screen untrusted text before the agent acts on it.
export async function screenUntrustedText(text: string, source: string): Promise<ParseDecision> {
  const res = await fetch("https://parsethis.ai/v1/parse", {
    method: "POST",
    headers: {
      Authorization: `Bearer ${process.env.PARSE_API_KEY}`,
      "Content-Type": "application/json",
    },
    body: JSON.stringify({
      prompt: text,
      metadata: { source },
    }),
  });

  // Surface HTTP failures so the caller can apply its fail-open or fail-closed policy.
  if (!res.ok) {
    throw new Error(`Parse screening failed: ${res.status}`);
  }
  return res.json() as Promise<ParseDecision>;
}
```
Python example
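```python
import os

import requests


def screen_untrusted_text(text: str, source: str) -> dict:
    """Screen untrusted text with POST /v1/parse and return the decoded JSON."""
    response = requests.post(
        "https://parsethis.ai/v1/parse",
        headers={"Authorization": f"Bearer {os.environ['PARSE_API_KEY']}"},
        json={"prompt": text, "metadata": {"source": source}},
        timeout=8,  # seconds
    )
    # Raise on 4xx/5xx so the caller can apply its fail-open or fail-closed policy.
    response.raise_for_status()
    return response.json()
```
Acting on results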
Prefer suggested_action when present.
| Signal | Default behavior |
|---|---|
| suggested_action = "allow" | Continue |
| suggested_action = "sandbox" | Isolate, log, or require review |
| suggested_action = "request_owner_approval" | Ask the owner privately with approval_request.owner_prompt; deny if approval expires |
| suggested_action = "block" | Block by default |
| risk_score >= 7 | Block by default if no action field exists |
| Parse unavailable on high-impact path | Fail closed |
| Parse unavailable on low-impact path | Fail open only with explicit operator policy |
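The table above can be folded into one decision function. This is a minimal sketch under stated assumptions: the return labels (`"continue"`, `"isolate"`, `"ask_owner"`, `"block"`) and the `high_impact` and `allow_fail_open` parameters are hypothetical names, `None` stands for "Parse was unavailable", and two cases the table does not cover are filled in conservatively (an unrecognized action blocks; a missing action with a score below 7 continues).

```python
from typing import Optional

BLOCK_SCORE_THRESHOLD = 7  # threshold from the table above


def decide(result: Optional[dict], high_impact: bool, allow_fail_open: bool = False) -> str:
    """Map a Parse response (or None if Parse was unavailable) to a local action."""
    if result is None:
        # Fail open only on a low-impact path with explicit operator policy.
        if not high_impact and allow_fail_open:
            return "continue"
        return "block"

    action = result.get("suggested_action")
    if action is not None:
        # Prefer suggested_action when present.
        mapping = {
            "allow": "continue",
            "sandbox": "isolate",
            "request_owner_approval": "ask_owner",
            "block": "block",
        }
        # Assumption: an unrecognized action value is treated as a block.
        return mapping.get(action, "block")

    # No action field: fall back on the numeric score.
    if result.get("risk_score", 0) >= BLOCK_SCORE_THRESHOLD:
        return "block"
    # Assumption: the table does not cover this case; this sketch continues.
    return "continue"
```

A caller would typically wrap the screening request in a try/except, pass `None` to `decide` on a timeout or HTTP error, and set `high_impact=True` for anything that touches credentials, payments, or code execution.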
Agent-native discovery
Parse publishes:
- /llms.txt for short model-facing routing instructions
- /llms-full.txt for full agent context
- /openapi.json for tool calling and SDK generation
- /mcp.json for MCP manifest discovery
- /mcp as the hosted remote MCP JSON-RPC endpoint
- /v1/pricing for x402 payment metadata
x402 option
If an agent has no bearer key, it can call a billable REST endpoint without an Authorization header, receive a 402 response, sign the advertised USDC payment on Base mainnet, and retry the same request with payment-signature set.
Use x402 for autonomous first-call or metered access. Use Pro, Team, or Enterprise keys for sustained production volume.
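The x402 flow described above amounts to "call, get a 402, sign, retry once". The sketch below shows only that control flow. It assumes `payment-signature` is sent as an HTTP request header, and it leaves the signing step to a hypothetical caller-supplied `sign_payment` hook, because the format of the 402 payload and the wallet integration are not specified here. The `post` parameter is also hypothetical; it exists so the HTTP call can be swapped out in tests.

```python
from typing import Any, Callable, Mapping, Optional

PAYMENT_HEADER = "payment-signature"  # name from the text above; use as a header is assumed


def post_with_x402(
    url: str,
    payload: Mapping[str, Any],
    sign_payment: Callable[[Any], str],
    post: Optional[Callable[..., Any]] = None,
    timeout: float = 8,
) -> Any:
    """POST without Authorization; on a 402, sign the advertised payment and retry once."""
    if post is None:
        # Default to requests.post, imported lazily so a custom `post` needs no dependency.
        import requests

        post = requests.post

    first = post(url, json=payload, timeout=timeout)
    if first.status_code != 402:
        return first

    # sign_payment is a hypothetical hook: it receives the 402 response and
    # returns the signature string for the payment that response advertises.
    signature = sign_payment(first)
    return post(url, json=payload, headers={PAYMENT_HEADER: signature}, timeout=timeout)
```

Retrying only once keeps an autonomous agent from looping on repeated 402 responses; a caller that wants a spending cap would enforce it inside `sign_payment`.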