Safety API Reference

SafetyPipeline

The main safety middleware. It composes injection detection, PII detection, and the action guard into a single check pipeline.

grampus.safety.pipeline.SafetyPipeline

Middleware that wraps agent actions with safety checks.

All checks are applied in order. A BLOCK-level result raises SafetyError. Non-blocking detections are logged and returned as SafetyViolation records.

Parameters:

injection_detector (object | None, default None): Optional injection detector. Skipped if None.
pii_detector (object | None, default None): Optional PII detector. Skipped if None.
action_guard (object | None, default None): Optional action guard. Skipped if None.
config (SafetyPipelineConfig | None, default None): Which check categories to enable.

check_input(text) async

Check user input for injection then PII.

Parameters:

text (str, required): Raw user input.

Returns:

tuple[str, list[SafetyViolation]]: Tuple of (possibly-redacted text, new violations from this call).

Raises:

SafetyError: code="INPUT_BLOCKED" if injection is detected at or above the configured threshold.
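A minimal sketch of how a caller might consume the documented return value and error code. StubPipeline, Violation, and handle_user_turn are hypothetical stand-ins written for this example; only the check_input contract (a tuple return, and SafetyError carrying code="INPUT_BLOCKED") is taken from this reference.

```python
import asyncio
from dataclasses import dataclass


class SafetyError(Exception):
    """Stand-in for the library's SafetyError; only `code` is modeled."""

    def __init__(self, code: str, message: str = "") -> None:
        super().__init__(message or code)
        self.code = code


@dataclass
class Violation:
    """Stand-in for SafetyViolation with a subset of its fields."""
    violation_type: str
    detail: str
    blocked: bool


class StubPipeline:
    """Hypothetical stub that mimics the documented check_input contract."""

    async def check_input(self, text: str) -> tuple[str, list[Violation]]:
        if "ignore previous instructions" in text.lower():
            raise SafetyError("INPUT_BLOCKED", "injection detected")
        if "@" in text:
            # Pretend an email address was found and redacted.
            return "[REDACTED]", [Violation("pii", "email redacted", False)]
        return text, []


async def handle_user_turn(pipeline: StubPipeline, text: str) -> str:
    """Return the text that is safe to forward, or a refusal message."""
    try:
        safe_text, violations = await pipeline.check_input(text)
    except SafetyError as exc:
        return f"refused ({exc.code})"
    for v in violations:
        # Non-blocking detections are returned, not raised.
        print(f"logged: {v.violation_type}: {v.detail}")
    return safe_text
```

The point of the pattern is that callers forward the returned text rather than the original input, since the pipeline may have redacted it.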

check_tool_result(result) async

Check tool result output for injection and PII.

Parameters:

result (ToolResult, required): The ToolResult from tool execution.

Returns:

tuple[ToolResult, list[SafetyViolation]]: Tuple of (possibly-redacted result, new violations).

Raises:

SafetyError: code="TOOL_RESULT_BLOCKED" if injection is detected.

check_llm_output(text) async

Check LLM response for PII only (injection not blocked to avoid loops).

Parameters:

text (str, required): LLM-generated text to check.

Returns:

tuple[str, list[SafetyViolation]]: Tuple of (possibly-redacted text, new violations).

check_tool_call(tool_call, *, calls_this_turn=0, consecutive_calls=0) async

Check a tool call against action guard policy.

Parameters:

tool_call (ToolCall, required): The tool call to evaluate.
calls_this_turn (int, default 0): Calls already made this turn.
consecutive_calls (int, default 0): Consecutive calls without a non-tool step.

Returns:

tuple[ToolCall, list[SafetyViolation]]: Tuple of (tool_call unchanged, new violations).

Raises:

SafetyError: code="ACTION_BLOCKED" if the action guard denies the call.

get_violations()

Return all violations recorded in this pipeline instance's lifetime.


SafetyPipelineConfig

from grampus.safety.pipeline import SafetyPipelineConfig

config = SafetyPipelineConfig(
    check_user_input=True,       # run injection + PII on user input
    check_tool_results=True,     # run injection + PII on tool output
    check_llm_output=True,       # run PII on LLM response (injection not blocked)
    check_memory_writes=True,    # run injection on memory write content
    log_violations=True,         # emit structlog events for violations
)

Injection detector

grampus.safety.injection.PromptInjectionDetector

Multi-layer prompt injection detector.

Three detection layers are applied in order:

1. Regex: known attack signatures (fast, zero false negatives on known patterns).
2. Heuristic: structural signals (role override attempts, instruction boundaries).
3. Keyword: semantic markers without full NLP.
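The layering can be illustrated with a small sketch. Everything here is an assumption made for illustration: the patterns, the keyword list, the confidence values, and the choice to stop at the first layer that fires are not taken from the library.

```python
import re
from typing import Optional

# Illustrative signatures only; not the library's pattern set.
REGEX_SIGNATURES = [re.compile(r"ignore (all )?previous instructions", re.I)]
ROLE_OVERRIDE = re.compile(r"^\s*(system|assistant)\s*:", re.I | re.M)
KEYWORDS = ("jailbreak", "developer mode")


def layered_confidence(text: str) -> tuple[float, Optional[str]]:
    """Run the three layers in order and return (confidence, layer name).

    The confidence assigned to each layer is a made-up value.
    """
    # Layer 1: regex signatures for known attacks.
    if any(p.search(text) for p in REGEX_SIGNATURES):
        return 0.95, "regex"
    # Layer 2: structural heuristic, here a line that starts with a role label.
    if ROLE_OVERRIDE.search(text):
        return 0.7, "heuristic"
    # Layer 3: plain keyword markers.
    lowered = text.lower()
    if any(k in lowered for k in KEYWORDS):
        return 0.4, "keyword"
    return 0.0, None
```

Ordering the layers from most to least specific means the cheapest, highest-confidence signal is reported first.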

Parameters:

level (DetectionLevel, default BALANCED): Detection strictness. Sets the confidence threshold at or above which detected=True is returned: STRICT >= 0.3, BALANCED >= 0.5, PERMISSIVE >= 0.8.
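The threshold rule can be written out as a short sketch. Level and is_detected are hypothetical names for this example; only the three threshold values and the BALANCED default come from the description above.

```python
from enum import Enum


class Level(Enum):
    """Stand-in for DetectionLevel; values are the documented thresholds."""
    STRICT = 0.3
    BALANCED = 0.5
    PERMISSIVE = 0.8


def is_detected(confidence: float, level: Level = Level.BALANCED) -> bool:
    """Return True when confidence meets or exceeds the level's threshold."""
    return confidence >= level.value
```

Under this rule, a confidence of 0.4 counts as a detection at STRICT but not at BALANCED or PERMISSIVE.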

check(text)

Synchronous check — no I/O. Returns InjectionResult.

Parameters:

text (str, required): The text to inspect for injection attempts.

Returns:

InjectionResult: InjectionResult with confidence score and matched patterns.

InjectionCheckResult

from dataclasses import dataclass

@dataclass
class InjectionCheckResult:
    detected: bool
    pattern: str | None        # matched pattern name
    confidence: float          # 0.0–1.0
    blocked: bool              # True if level blocks this confidence

PII detector

grampus.safety.pii.PIIDetector

Regex-based PII detector with configurable action per PII type.

Parameters:

actions (dict[PIIType, PIIAction] | None, default None): Map of PIIType -> PIIAction. When None, every type defaults to REDACT. When a map is provided, any type missing from it falls back to LOG.
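A sketch of the default-resolution rule, under one reading of the description: passing no map means REDACT for every type, while a type absent from a supplied map is only logged. The enums below are stand-ins; their members are taken from the example policy and the action names used elsewhere in this reference, and resolve_action is a hypothetical helper.

```python
from enum import Enum
from typing import Optional


class PIIType(Enum):
    """Stand-in; members taken from the example policy in this reference."""
    EMAIL = "email"
    PHONE = "phone"
    SSN = "ssn"
    CREDIT_CARD = "credit_card"


class PIIAction(Enum):
    """Stand-in for the three actions mentioned in this reference."""
    LOG = "log"
    REDACT = "redact"
    BLOCK = "block"


def resolve_action(
    pii_type: PIIType,
    actions: Optional[dict[PIIType, PIIAction]] = None,
) -> PIIAction:
    """Pick the action for one PII type (assumed resolution rule)."""
    if actions is None:
        # No map supplied: redact everything.
        return PIIAction.REDACT
    # Map supplied: unlisted types are only logged.
    return actions.get(pii_type, PIIAction.LOG)
```

The practical consequence is that supplying a partial map loosens handling for every type left out of it, so a map that is meant to tighten one type should list the others explicitly.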

PIICheckResult

from dataclasses import dataclass

@dataclass
class PIICheckResult:
    detected: bool
    types_found: list[str]        # ["email", "phone", ...]
    redacted_text: str            # original if action="log", else redacted
    blocked: bool                 # True if action="block" and PII found

Action guard

grampus.safety.action_guard.SafetyActionGuard

Orchestration-level action guard enforcing policy rules.

Can be constructed either with an ActionPolicy object or with inline keyword arguments (allowed_tools, denied_tools, max_tool_calls_per_turn) for quick one-off usage.

Parameters:

policy (ActionPolicy | None, default None): Full ActionPolicy for this agent. When provided, all keyword arguments are ignored.
allowed_tools (list[str] | None, default None): Optional allowlist. When set, only listed tools are permitted.
denied_tools (list[str] | None, default None): Explicit denylist of tool names.
max_tool_calls_per_turn (int, default 20): Hard cap on tool calls per agent turn.

ActionPolicy

from grampus.safety.action_guard import ActionPolicy

policy = ActionPolicy(
    allowed_tools=["web_search", "calculate"],  # explicit allowlist (None = allow all)
    denied_tools=[],                             # explicit denylist
    max_tool_calls_per_turn=20,                  # across all tools per turn
    max_consecutive_tool_calls=8,                # before requiring LLM step
    max_cost_per_action_usd=0.05,               # per-tool-call cost cap
)
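How such a policy might be evaluated for a single call can be sketched as follows. PolicySketch and deny_reason are hypothetical, and the evaluation order (denylist first, then allowlist, then the two caps) and the use of >= against counts of calls already made are assumptions. The cost cap is left out because this reference does not say where per-call cost comes from.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class PolicySketch:
    """Stand-in mirroring four of the ActionPolicy fields shown above."""
    allowed_tools: Optional[list[str]] = None   # None = allow all
    denied_tools: list[str] = field(default_factory=list)
    max_tool_calls_per_turn: int = 20
    max_consecutive_tool_calls: int = 8


def deny_reason(
    policy: PolicySketch,
    tool_name: str,
    calls_this_turn: int = 0,
    consecutive_calls: int = 0,
) -> Optional[str]:
    """Return why the call would be denied, or None if it is allowed."""
    if tool_name in policy.denied_tools:
        return "tool is on the denylist"
    if policy.allowed_tools is not None and tool_name not in policy.allowed_tools:
        return "tool is not on the allowlist"
    # Counters describe calls already made, so reaching the cap denies the next call.
    if calls_this_turn >= policy.max_tool_calls_per_turn:
        return "per-turn call cap reached"
    if consecutive_calls >= policy.max_consecutive_tool_calls:
        return "consecutive call cap reached"
    return None
```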

SafetyViolation

Structured record emitted for every detected issue:

from dataclasses import dataclass
from datetime import datetime

@dataclass
class SafetyViolation:
    violation_type: str    # "injection" | "pii" | "action_blocked"
    severity: str          # "critical" | "high" | "medium" | "low"
    detail: str            # human-readable description
    blocked: bool          # True = request was blocked, False = logged only
    timestamp: datetime
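Records with these fields are easy to aggregate, for example over the list returned by get_violations(). The sketch below redefines the dataclass locally so that it runs on its own; summarize is a hypothetical helper, not part of the library.

```python
from collections import Counter
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass
class SafetyViolation:
    """Local copy of the documented fields, so the sketch is self-contained."""
    violation_type: str
    severity: str
    detail: str
    blocked: bool
    timestamp: datetime


def summarize(violations: list[SafetyViolation]) -> dict:
    """Count violations in total, by type, and by how many were blocked."""
    by_type = Counter(v.violation_type for v in violations)
    blocked = sum(1 for v in violations if v.blocked)
    return {"total": len(violations), "blocked": blocked, "by_type": dict(by_type)}


now = datetime.now(timezone.utc)
sample = [
    SafetyViolation("pii", "low", "email redacted", False, now),
    SafetyViolation("injection", "critical", "role override", True, now),
    SafetyViolation("pii", "medium", "phone redacted", False, now),
]
summary = summarize(sample)
```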

Policy loader

grampus.safety.policies.PolicyLoader

Loads GrampusSafetyPolicy from a YAML file or dict.

load(path=None) staticmethod

Load and validate policy. Returns default policy if path is None.

Parameters:

path (str | None, default None): Filesystem path to a YAML policy file.

Returns:

GrampusSafetyPolicy: A validated GrampusSafetyPolicy instance.

Raises:

ConfigError: If the file is missing, is not valid YAML, or fails schema validation.

Example policy YAML

# safety_policy.yaml
injection:
  level: balanced

pii:
  action: redact
  types:
    - email
    - phone
    - ssn
    - credit_card

action_guard:
  allowed_tools:
    - web_search
    - calculate
  max_tool_calls_per_turn: 20
  max_consecutive_tool_calls: 8
  max_cost_per_action_usd: 0.05

pipeline:
  check_user_input: true
  check_tool_results: true
  check_llm_output: true
  check_memory_writes: true
  log_violations: true
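Since the loader is described as accepting a dict as well as a YAML file, the validation step for the pipeline section can be sketched on a plain dict. PipelineFlags, flags_from_policy, and the local ConfigError are hypothetical stand-ins, and the all-True defaults are an assumption, since this reference does not state the defaults.

```python
from dataclasses import dataclass, fields


class ConfigError(ValueError):
    """Stand-in for the library's ConfigError."""


@dataclass
class PipelineFlags:
    """Stand-in for the pipeline section; defaults are assumed, not documented."""
    check_user_input: bool = True
    check_tool_results: bool = True
    check_llm_output: bool = True
    check_memory_writes: bool = True
    log_violations: bool = True


def flags_from_policy(policy: dict) -> PipelineFlags:
    """Validate the 'pipeline' section of a policy dict and build the flags."""
    section = policy.get("pipeline", {})
    known = {f.name for f in fields(PipelineFlags)}
    unknown = set(section) - known
    if unknown:
        raise ConfigError(f"unknown pipeline keys: {sorted(unknown)}")
    for key, value in section.items():
        if not isinstance(value, bool):
            raise ConfigError(f"pipeline.{key} must be a boolean")
    return PipelineFlags(**section)
```

Rejecting unknown keys catches typos such as check_llm_outputs, which would otherwise leave a check silently at its default.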

Loading:

from grampus.safety.policies import load_safety_policy
from grampus.safety.pipeline import SafetyPipeline

safety_config = load_safety_policy("safety_policy.yaml")
pipeline = SafetyPipeline.from_config(safety_config)