Technology
Parse screens untrusted text at agent trust boundaries before it can reach tools, memory, credentials, payments, code execution, or users.
- Tool result: External text asks the agent to treat data as a higher-priority instruction. (source: browser)
- POST /v1/parse: normalize, detect, score. Result: risk_score 8.4, verdict high_risk, suggested_action block.
- Act on policy: The host keeps retrieved content as data and prevents it from steering tools. (trace_id: prs_7fd2)
How the detector works.
Parse uses layered screening so hosts can make a practical allow, sandbox, approval, or block decision without exposing the internal detector recipe.
Fast public-category checks catch known instruction override, extraction, and boundary manipulation patterns.
Normalization and content-shape checks help spot encoded, hidden, or tool-shaped text trying to gain authority.
When configured and useful, semantic analysis evaluates ambiguous text in context before a host acts.
Suspicious workflows can be isolated so the host observes behavior without giving the text real authority.
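The allow, sandbox, approval, or block decision described above could be made on the host side roughly as follows. This is a minimal sketch, not Parse's own logic: the `route` function and its `high_impact` flag are hypothetical, and it assumes `suggested_action` carries one of the four decision names, of which only `block` is confirmed by the sample response.

```python
from typing import Any, Mapping

# The four host decisions named in the text above.
HOST_ACTIONS = ("allow", "sandbox", "approval", "block")


def route(result: Mapping[str, Any], high_impact: bool = False) -> str:
    """Map a screening result to one host decision.

    Assumption: `suggested_action` holds one of HOST_ACTIONS. Only "block"
    appears in the documented sample; the other values are hypothetical.
    """
    action = result.get("suggested_action")
    if result.get("attack_detected") or action == "block":
        return "block"
    if result.get("owner_approval_required") or action == "approval":
        return "approval"
    if action == "sandbox":
        return "sandbox"
    if action == "allow":
        return "allow"
    # Missing or unrecognized action: fail closed on high-impact paths,
    # otherwise isolate the workflow rather than trust the text.
    return "block" if high_impact else "sandbox"
```

Checking the stricter outcomes first means a contradictory response, such as `allow` together with `owner_approval_required`, resolves to the more cautious decision.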
Current evidence state.
The latest internal candidate freeze is now large enough for serious regression work, but it is intentionally not presented as publicly claimable evidence.
Schema-valid JSONL rows frozen for internal Hermes/runtime evidence hygiene.
Recursive intake audit passed with zero validation errors and zero duplicate prompts or IDs.
Benign agent workflow rows now clear the 5,000-row internal SOTA-style scale gate.
No public claimable row is counted until independent provenance, human review, detector lock, and CI-backed evaluation pass.
Stable row hash cda6d75e25f729ff3273f8041b9ef4a121c493ab1d5e127db3ca5e028e4dfcb0. Remaining blockers: blind human/adversarial review, independent provenance, detector/config lock, confidence intervals, and no-post-freeze tuning proof.
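The source does not say how the intake audit or the row hash is computed. As an illustration only, an audit of this kind could be sketched as below, assuming each JSONL row is an object with string `id` and `prompt` fields and that the digest is a SHA-256 over sorted per-row hashes. The function name, field names, and hashing scheme are all assumptions.

```python
import hashlib
import json


def audit_rows(lines):
    """Validate JSONL rows, reject duplicates, and return (count, digest)."""
    seen_ids, seen_prompts, row_hashes = set(), set(), []
    for n, line in enumerate(lines, start=1):
        row = json.loads(line)  # raises ValueError on malformed JSON
        if not isinstance(row, dict) or "id" not in row or "prompt" not in row:
            raise ValueError(f"row {n}: missing required field")
        if row["id"] in seen_ids or row["prompt"] in seen_prompts:
            raise ValueError(f"row {n}: duplicate id or prompt")
        seen_ids.add(row["id"])
        seen_prompts.add(row["prompt"])
        # Canonical form: sorted keys and fixed separators, so key order
        # and whitespace in the input do not change the row hash.
        canonical = json.dumps(row, sort_keys=True, separators=(",", ":"))
        row_hashes.append(hashlib.sha256(canonical.encode("utf-8")).hexdigest())
    # Sorting the per-row hashes makes the digest independent of row order.
    joined = "\n".join(sorted(row_hashes)).encode("utf-8")
    return len(row_hashes), hashlib.sha256(joined).hexdigest()
```

A digest built this way stays the same when rows are reordered or re-serialized, which is one plausible reading of "stable" here.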
What Parse looks for.
The public taxonomy is intentionally stable and implementation-neutral. It tells developers what kind of risk was found without publishing bypass recipes.
Instruction override
Text that tries to replace the agent's real task, hierarchy, or operating constraints.
prompt_injection
System prompt extraction
Requests to reveal hidden instructions, protected prompts, private policy, or developer messages.
system_prompt_leak
Tool misuse
Attempts to steer browser, HTTP, filesystem, payment, or execution tools outside the intended workflow.
privilege_escalation
Data exfiltration
Attempts to send secrets, private context, customer data, or owner details to an untrusted party.
data_exfiltration
Hidden content
Instructions hidden inside documents, pages, markup, structured fields, or tool output metadata.
indirect_injection
Agent spoofing
Messages that claim false identity, authority, urgency, or delegation rights from another agent.
social_engineering
Designed for developer review.
Parse returns machine-readable fields that are easy to log, test, route, and explain during integration review.
application/json
{
  "risk_score": 8.4,
  "verdict": "high_risk",
  "attack_detected": true,
  "policy_violation": true,
  "owner_approval_required": false,
  "categories": ["prompt_injection", "indirect_injection"],
  "flags": [
    {
      "type": "prompt_injection",
      "severity": "high",
      "description": "Untrusted text attempted to override host instructions."
    }
  ],
  "recommended_action": "block",
  "suggested_action": "block",
  "decision": {
    "action": "block",
    "basis": "attack_detected",
    "confidence": "high"
  },
  "trace_id": "prs_7fd2",
  "latency_ms": 31
}
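To make these fields easy to log and test, a host might load the response into a small typed record before acting on it. The sketch below covers a subset of the fields in the sample above. The `ParseResult` class is hypothetical, and the type checks reflect only what the sample shows, not a published schema.

```python
import json
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class ParseResult:
    risk_score: float
    verdict: str
    attack_detected: bool
    categories: List[str]
    suggested_action: str
    trace_id: str

    @classmethod
    def from_json(cls, body: str) -> "ParseResult":
        """Build a record from a response body; raises on missing fields."""
        data = json.loads(body)
        score = data["risk_score"]
        # bool is a subclass of int in Python, so exclude it explicitly.
        if isinstance(score, bool) or not isinstance(score, (int, float)):
            raise ValueError("risk_score must be a number")
        if not isinstance(data["attack_detected"], bool):
            raise ValueError("attack_detected must be a boolean")
        return cls(
            risk_score=float(score),
            verdict=str(data["verdict"]),
            attack_detected=data["attack_detected"],
            categories=[str(c) for c in data.get("categories", [])],
            suggested_action=str(data["suggested_action"]),
            trace_id=str(data["trace_id"]),
        )
```

Rejecting a malformed body outright, rather than filling in defaults, keeps a broken response from being mistaken for a clean one.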
Deployment expectations
- Call before untrusted text gains authority.
- Fail closed on high-impact paths.
- Keep tools and credentials least-privilege.
- Log trace_id for incident review.
- Test the same boundary in the playground.
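The expectations above can be combined in a thin host-side wrapper. This is a sketch under stated assumptions: `BASE_URL` is a placeholder, the request fields `text` and `source` are guesses at the request shape, `allow` is an assumed value of `suggested_action`, and authentication is omitted because the source does not describe it. The transport is injectable so the fail-closed path can be tested without a network.

```python
import json
import logging
import urllib.request
from typing import Callable

log = logging.getLogger("parse_client")

# Hypothetical base URL; replace with the real API host.
BASE_URL = "https://parse.example.invalid"


def http_post(path: str, payload: dict, timeout: float = 2.0) -> dict:
    """POST JSON with the standard library and decode the JSON reply."""
    req = urllib.request.Request(
        BASE_URL + path,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))


def screen(text: str, source: str, high_impact: bool,
           post: Callable[[str, dict], dict] = http_post) -> bool:
    """Return True only if the text may proceed past the trust boundary."""
    try:
        # The request fields "text" and "source" are assumed, not documented.
        result = post("/v1/parse", {"text": text, "source": source})
        action = result["suggested_action"]
        # Record the trace_id so an incident review can find this call.
        log.info("parse trace_id=%s action=%s", result.get("trace_id"), action)
    except Exception:
        log.exception("screening unavailable")
        # Fail closed where a mistake is costly; lower-impact paths proceed.
        return not high_impact
    # "allow" is an assumed value; anything else is treated as not cleared.
    return action == "allow"
```

Calling `screen(page_text, "browser", high_impact=True)` before handing retrieved content to a payment or execution tool means a timeout or malformed reply stops the action instead of letting it through.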
How to integrate it.
Use the surface that matches your agent runtime. The security decision stays the same: screen before authority.
Prompt screening
Screen untrusted text before tools, memory, credentials, payments, code, or users.
POST /v1/parse
Output screening
Screen generated or tool-derived output before forwarding it to another boundary.
POST /v1/screen-output
Hosted MCP
Expose prompt screening, output screening, trust verification, and pricing discovery.
POST /mcp
x402 pay-per-call
Let autonomous agents pay with USDC on Base mainnet when no account exists.
GET /v1/pricing
What we disclose publicly.
Good security products should be understandable without publishing the recipe attackers or competitors need.
| Public | Kept internal | Why |
|---|---|---|
| Risk categories | Detector internals | Developers need stable labels, not a map for bypassing detection. |
| Response fields | Scoring weights | Hosts need clear actions while scoring remains adaptable. |
| Playground fixtures | Synthetic candidate corpus before review | Internal rows improve regression coverage, but they are not public proof until claimability gates pass. |
| Limitations | Bypass recipes | Honest boundaries improve integrations; exploit instructions do not. |
Build with a clear trust boundary.
Use Parse where untrusted text is about to gain authority.