Parse Agents Risk Categories

Parse Agents reports a 0-10 risk_score, a verdict, typed flags, and categories. The canonical hosted detector currently uses nine public categories.

| Category | Meaning |
| --- | --- |
| prompt_injection | Attempts to override, replace, or smuggle instructions into an agent context |
| jailbreak | Attempts to bypass safety, policy, or role boundaries |
| data_exfiltration | Attempts to extract secrets, credentials, system prompts, private data, or protected context |
| harmful_content | Requests or generated content that may cause direct safety or abuse risk |
| system_prompt_leak | Attempts to reveal hidden system, developer, or policy instructions |
| privilege_escalation | Attempts to grant unauthorized authority, permissions, or execution scope |
| social_engineering | Manipulation through urgency, impersonation, false authority, or trust claims |
| code_execution | Attempts to run shell, code, package installation, file access, or network execution |
| indirect_injection | Hidden or remote instructions in documents, websites, tool output, metadata, or structured fields |

| Score | Typical action |
| --- | --- |
| 0-2 | Allow |
| 3-6 | Caution, isolate, or sandbox depending on the boundary |
| 7-10 | Block by default |
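
The score bands above can be sketched as a small lookup. This is a minimal illustration, not part of the Parse Agents API: the function name `default_action` and the return values `"allow"`, `"caution"`, and `"block"` are hypothetical labels chosen for this example. It also assumes that a non-integer score falling between two bands should be treated as the stricter band.

```python
def default_action(risk_score: float) -> str:
    """Map a 0-10 risk_score to the typical action for its band.

    The returned labels are hypothetical names for this sketch only.
    """
    if not 0 <= risk_score <= 10:
        raise ValueError(f"risk_score must be in 0-10, got {risk_score!r}")
    if risk_score <= 2:
        return "allow"      # 0-2: allow
    if risk_score <= 6:
        return "caution"    # 3-6: caution, isolate, or sandbox
    return "block"          # 7-10: block by default


if __name__ == "__main__":
    for score in (0, 2, 3, 6, 7, 10):
        print(score, default_action(score))
```

What "caution" means in practice (isolate, sandbox, or ask for review) depends on the boundary being protected, so the caller still has to decide how to handle that middle band.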

When the response includes policy fields or a suggested_action, use them rather than relying only on the typical score bands. Do not hard-code a security decision if the tenant has configured a stricter policy.
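
One way to apply this guidance is sketched below, under several assumptions. The response is modeled as a plain mapping with `risk_score` and an optional `suggested_action`. The action names (`"allow"`, `"caution"`, `"block"`), the `tenant_floor` parameter, and the helper names are hypothetical and are not taken from the Parse Agents API. The sketch prefers `suggested_action` when present, falls back to the score bands otherwise, and never returns an action looser than the tenant's configured minimum.

```python
from typing import Any, Mapping, Optional

# Hypothetical action names, ordered from least to most strict.
STRICTNESS = {"allow": 0, "caution": 1, "block": 2}


def band_action(risk_score: float) -> str:
    """Fallback: typical action for the 0-2 / 3-6 / 7-10 score bands."""
    if risk_score <= 2:
        return "allow"
    if risk_score <= 6:
        return "caution"
    return "block"


def resolve_action(result: Mapping[str, Any],
                   tenant_floor: Optional[str] = None) -> str:
    """Pick an action from a detector result (assumed shape).

    tenant_floor is a hypothetical stand-in for a tenant policy that
    sets the least strict action the tenant is willing to accept.
    """
    suggested = result.get("suggested_action")
    if suggested is not None:
        # Fail closed on values this sketch does not recognize.
        action = suggested if suggested in STRICTNESS else "block"
    else:
        action = band_action(result["risk_score"])

    # Never return something looser than the tenant's minimum.
    if tenant_floor is not None and STRICTNESS[tenant_floor] > STRICTNESS[action]:
        action = tenant_floor
    return action


if __name__ == "__main__":
    print(resolve_action({"risk_score": 1}))
    print(resolve_action({"risk_score": 1, "suggested_action": "caution"}))
    print(resolve_action({"risk_score": 1}, tenant_floor="block"))
```

Treating unrecognized `suggested_action` values as a block is a design choice for this sketch; a real integration should follow whatever values and policy fields the service actually documents.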

Limitations

These categories are detection signals, not guarantees. Pair Parse Agents with least-privilege tools, scoped credentials, output validation, logging, and operator review for high-impact actions.