Probabilistic Verification for AI Agents: DRO-Based Framework Bounds Policy Violation Risk
New research introduces a sound and efficient framework using distributionally robust optimization to compute upper bounds on policy violation probability for AI agents, overcoming the limitations of deterministic-only runtime monitoring.
Existing runtime monitoring approaches for AI agents are restricted to deterministic policies, a critical limitation identified in recent research (arXiv 2606.20510). As AI agents increasingly operate in ambiguous environments, there is a pressing need to enforce security policies that involve probabilistic predicates or state transitions. For example, a declassifier or Personally Identifiable Information (PII) detector may have a non-zero failure probability on each invocation, requiring a framework that can reason about probabilistic outcomes.
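When a detector with per-call failure probability ε is invoked k times, how the failures combine depends on their correlation, which is usually unknown. The bounds below are standard probability facts (Boole's inequality and the Fréchet bounds) in an assumed setting, not results taken from the paper. They illustrate why an independence assumption can be unsafe:

```latex
% Assumed setting: k detector invocations, each failing with probability \varepsilon;
% F_i denotes the event "invocation i fails".
\begin{aligned}
&\text{At least one failure, any correlation:}
  && \varepsilon \le \Pr\Big[\textstyle\bigcup_{i=1}^{k} F_i\Big] \le \min(1,\, k\varepsilon) \\
&\text{At least one failure, independence only:}
  && \Pr\Big[\textstyle\bigcup_{i=1}^{k} F_i\Big] = 1 - (1-\varepsilon)^{k} \\
&\text{All } k \text{ fail, any correlation:}
  && 0 \le \Pr\Big[\textstyle\bigcap_{i=1}^{k} F_i\Big] \le \varepsilon \\
&\text{All } k \text{ fail, independence only:}
  && \Pr\Big[\textstyle\bigcap_{i=1}^{k} F_i\Big] = \varepsilon^{k} \\
&\varepsilon = 0.01,\ k = 3:
  && 0.03 \text{ vs. } 1 - 0.99^{3} = 0.029701; \quad 0.01 \text{ vs. } 10^{-6}
\end{aligned}
```

Both worst cases are achievable: the union bound is attained when failures are mutually exclusive (with kε ≤ 1), and the intersection bound when all failures coincide. For redundant checks that must all fail, independence therefore underestimates the worst case by four orders of magnitude in this example.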
To address this gap, the authors introduce a sound and efficient framework for probabilistic verification of AI agents based on distributionally robust optimization (DRO). The core innovation is the ability to compute sound upper bounds on the probability of policy violation regardless of possible correlations between predicates. This is a significant advancement because, in many practical applications, one cannot easily make the independence assumptions necessary to invoke prior work on probabilistic inference in Datalog. By avoiding these assumptions, the framework provides rigorous guarantees even when predicate correlations are unknown or complex.
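The correlation-agnostic idea can be sketched by propagating probability intervals through a policy formula. This sketch is not the paper's DRO method: it substitutes the simpler Fréchet inequalities, which give sound bounds for every joint distribution with the given per-predicate marginals but may be looser than a full DRO solution (for example, when the same predicate occurs more than once). The formula classes, the `bounds` function, and the predicate names are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, Union


@dataclass(frozen=True)
class Interval:
    lo: float  # lower bound on the probability
    hi: float  # upper bound on the probability


@dataclass(frozen=True)
class Atom:
    name: str  # a probabilistic predicate with a known marginal


@dataclass(frozen=True)
class Not:
    arg: "Formula"


@dataclass(frozen=True)
class And:
    left: "Formula"
    right: "Formula"


@dataclass(frozen=True)
class Or:
    left: "Formula"
    right: "Formula"


Formula = Union[Atom, Not, And, Or]


def bounds(f: Formula, marginals: Dict[str, float]) -> Interval:
    """Return [lo, hi] containing P(f) under EVERY joint distribution whose
    per-atom marginals match `marginals`. The Frechet inequalities hold for
    arbitrary events, so no independence is assumed. Sound, but possibly
    loose when an atom occurs more than once in f."""
    if isinstance(f, Atom):
        p = marginals[f.name]
        return Interval(p, p)
    if isinstance(f, Not):
        b = bounds(f.arg, marginals)
        return Interval(1.0 - b.hi, 1.0 - b.lo)
    left = bounds(f.left, marginals)
    right = bounds(f.right, marginals)
    if isinstance(f, And):
        # max(0, P(A) + P(B) - 1) <= P(A and B) <= min(P(A), P(B))
        return Interval(max(0.0, left.lo + right.lo - 1.0), min(left.hi, right.hi))
    if isinstance(f, Or):
        # max(P(A), P(B)) <= P(A or B) <= min(1, P(A) + P(B))
        return Interval(max(left.lo, right.lo), min(1.0, left.hi + right.hi))
    raise TypeError(f"unknown formula node: {f!r}")


if __name__ == "__main__":
    # Hypothetical policy: violation if sensitive data passes a failed
    # declassifier, or if the PII detector misses something.
    policy = Or(And(Atom("sensitive"), Atom("declass_fail")), Atom("pii_miss"))
    m = {"sensitive": 0.3, "declass_fail": 0.05, "pii_miss": 0.02}
    print(bounds(policy, m))
```

For the example marginals the interval is roughly [0.02, 0.07]. An independence assumption would instead give 1 - (1 - 0.3 × 0.05)(1 - 0.02) ≈ 0.035, which lies inside the interval but is not guaranteed if the predicates are correlated; a monitor that wants a sound guarantee should compare the upper end, 0.07, against its risk budget.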
The framework builds on runtime monitors that express and enforce security policies in a formal language such as Datalog, an approach the authors describe as a promising solution for securing AI agents. Datalog's declarative rules give policies a precise specification, and the DRO-based verification extends such monitoring to probabilistic settings. Evaluated on standard benchmarks for terminal and tool-calling agents, the approach outperforms prior art and improves the security-utility trade-off while ensuring rigorous bounds on the probability of policy violation.
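A runtime monitor of this kind can be pictured as a rule over the agent's trace plus a risk budget that gates each new action. The sketch below is a hand-coded stand-in for a single Datalog-style rule, not a Datalog engine and not the paper's implementation; the rule, the `ToolCall` fields, the tool names, and the `budget` parameter are all hypothetical. Because the per-call "PII missed" events may be correlated, it bounds their disjunction with the union bound rather than multiplying probabilities.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class ToolCall:
    tool: str             # hypothetical tool name, e.g. "send_email"
    external: bool        # deterministic fact: data leaves the trust boundary
    pii_miss_prob: float  # assumed marginal P(PII detector misses PII on this call)


def violation_upper_bound(trace: List[ToolCall]) -> float:
    """Upper bound on P(violation) for the hypothetical rule

        violation :- call(C), external(C), pii_missed(C).

    pii_missed(C) is a probabilistic fact with a known marginal but unknown
    correlation across calls, so the disjunction over calls is bounded with
    Boole's inequality (union bound), which holds for any joint distribution."""
    return min(1.0, sum(c.pii_miss_prob for c in trace if c.external))


def allow(trace: List[ToolCall], next_call: ToolCall, budget: float) -> bool:
    """Admit next_call only if the bound for the extended trace stays within budget."""
    return violation_upper_bound(trace + [next_call]) <= budget


if __name__ == "__main__":
    trace = [
        ToolCall("read_file", external=False, pii_miss_prob=0.02),
        ToolCall("send_email", external=True, pii_miss_prob=0.01),
    ]
    nxt = ToolCall("http_post", external=True, pii_miss_prob=0.01)
    print(allow(trace, nxt, budget=0.05))   # True: bound is 0.02
    print(allow(trace, nxt, budget=0.015))  # False: 0.02 exceeds the budget
```

A smaller budget blocks more actions, trading utility for security, which is one simple way to picture the security-utility trade-off; how the paper actually sets or tunes its thresholds is not described here.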
This work may also have implications for blockchain and crypto ecosystems, where autonomous agents must operate under strict safety invariants. For on-chain AI agents, such as those in DeFi, the framework could provide provable upper bounds on the probability of policy violations (e.g., unauthorized token transfers) without requiring knowledge of correlations between oracle calls. In TEE-based AI inference, it could support sound risk bounding even when the agent's internal state is hidden from outside observers. MEV bots using ML models could enforce probabilistic compliance policies with formal guarantees, and the approach could potentially be combined with zero-knowledge proofs for verifiable policy adherence. By bridging the gap between deterministic runtime monitoring and real-world probabilistic agent behavior, this research opens new avenues for secure and trustworthy autonomous systems.
Evidence & Provenance
Every claim is hash-locked to its source span.
Claim 33 Existing runtime monitoring approaches for AI agents are restricted to deterministic policies and cannot handle probabilistic predicates or state transitions.
existing approaches are restricted to deterministic policies. In many practical applications of AI agents, there is a need to enforce security policies in the face of ambiguity, leading to probabilistic predicates or state transitions
42573191e710389d3868326c2688b112806209673de879d0abcb05bac85c0b39
Claim 34 A motivating example involves a declassifier or PII detector that has some failure probability on each invocation, requiring probabilistic policy enforcement.
for example, a declassifier or Personally Identifiable Information (PII) detector that has some failure probability on each invocation
c70c4d37192a602fd1191dc2d0cc1678eb83b2490cc10a33ca528c1d7acd715c
Claim 35 The paper introduces a sound and efficient framework for probabilistic verification of AI agents based on distributionally robust optimization.
We address this by introducing a sound and efficient framework for such verification based on distributionally robust optimization
4d6abbea78e91510fb0b437dd82e9aa9f001066c2f357c35b9fad8ce653aa06e
Claim 36 The framework computes sound upper bounds on the probability of policy violation regardless of possible correlations between predicates, avoiding independence assumptions.
computing sound upper bounds on the probability of policy violation regardless of possible correlations between predicates
ef209fc9c185333dc3cf219df74a94d1f7b42763efa85721258b52046c8d2a99
Claim 37 The approach does not require the independence assumptions necessary for prior work on probabilistic inference in Datalog.
in many such applications, one cannot easily make the independence assumptions necessary to invoke prior work on probabilistic inference in Datalog
7757ba7fc11a01266e188cd0e2ae6077ef64a5b3d505f2c7caa9091f57d748f4
Claim 38 The framework is evaluated on standard benchmarks for terminal and tool-calling agents, outperforming prior art and improving the security-utility trade-off while ensuring rigorous bounds on violation probability.
On standard benchmarks for terminal and tool calling agents, we demonstrate that our approach outperforms prior art and improves the security-utility trade-off while ensuring rigorous bounds on the probability of policy violation
0aaf587282cdb6acf862086a963a19b191fe0f6fbccbad314518cbc6194ea94b
Claim 39 Runtime monitoring approaches that express and enforce security policies in a formal language such as Datalog offer a promising solution for securing AI agents.
runtime monitoring approaches that formulate and enforce policies expressed in a formal language like Datalog offer a promising solution
cfe4aec4e6216895e4e59caa6a9ffcd5a22db70c694667af78a9f4b0d0d627db