Concepts¶
This page explains the mental models behind Grampus. Understanding these concepts will help you design agents that are reliable, safe, and observable.
The 4 memory types¶
Agents need to remember things at different timescales. Grampus provides four purpose-built memory stores:
graph LR
WM["Working Memory\n(current conversation\ntokens, auto-summarized)"]
EM["Episodic Memory\n(cross-session events\nwhat happened, when)"]
SM["Semantic Memory\n(SPO facts\nwhat is true)"]
PM["Procedural Memory\n(learned workflows\nhow to do things)"]
WM -->|"consolidation pipeline"| EM
EM -->|"fact extraction"| SM
SM -->|"pattern recognition"| PM
| Memory type | Timescale | Stores | Example |
|---|---|---|---|
| Working | Current session | Recent messages, token-limited | Last 20 turns of conversation |
| Episodic | Cross-session | Events with timestamps, embeddings | "On 2025-01-15, user asked about pricing" |
| Semantic | Persistent | Subject-Predicate-Object facts | user → prefers → dark mode |
| Procedural | Persistent | Learned workflows with trigger conditions | Steps to file a support ticket |
The MemoryManager provides a unified interface to all four types. You rarely interact with individual stores directly.
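To make the table concrete, here is a minimal sketch of two of the stores in plain Python. It is not the Grampus API: the `WorkingMemory` and `Fact` classes are hypothetical, and the sketch limits working memory by turn count rather than by tokens, with no auto-summarization.

```python
from collections import deque
from dataclasses import dataclass


@dataclass(frozen=True)
class Fact:
    """A Subject-Predicate-Object triple, the shape semantic memory stores."""
    subject: str
    predicate: str
    obj: str


class WorkingMemory:
    """Keeps only the most recent turns (hypothetical turn-based limit)."""

    def __init__(self, max_turns: int = 20) -> None:
        # A bounded deque silently drops the oldest turn once it is full.
        self._turns = deque(maxlen=max_turns)

    def add(self, message: str) -> None:
        self._turns.append(message)

    def recent(self) -> list:
        return list(self._turns)


wm = WorkingMemory(max_turns=3)
for i in range(5):
    wm.add(f"turn {i}")

fact = Fact("user", "prefers", "dark mode")
```

After the loop, `wm.recent()` holds only the last three turns, which is the behavior the "token-limited" column describes in simplified form.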
The ReAct loop¶
Every agent run is a loop of Observe → Think → Act until the agent produces a final answer or reaches the iteration limit:
flowchart TD
Input["User input"] --> Recall["Recall relevant memories"]
Recall --> LLM["LLM: think + decide"]
LLM -->|"tool call"| Tool["Execute tool"]
Tool --> Check["Safety check result"]
Check --> Store["Store to memory"]
Store --> LLM
LLM -->|"final answer"| Output["Return ExecutionResult"]
LLM -->|"max iterations"| Err["OrchestrationError"]
Each iteration:
- Build context from working memory and recalled episodic/semantic memories
- Call the LLM with the full message history
- If the LLM requests a tool call: validate, safety-check, execute, safety-check result
- Append tool result to message history and loop
- If the LLM returns a final text response: store to memory, return ExecutionResult
AgentRunner implements this loop. The max_iterations guard in RunnerConfig prevents infinite loops.
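The loop and its iteration guard can be sketched in a few lines. This is an illustration under stated assumptions, not how AgentRunner is implemented: `run_loop`, `Decision`, and the stubbed `fake_llm` are hypothetical names, the local `OrchestrationError` is a stand-in class, and memory recall and safety checks are left out for brevity.

```python
from dataclasses import dataclass
from typing import Callable, Optional


class OrchestrationError(RuntimeError):
    """Stand-in for the error raised when the iteration limit is hit."""


@dataclass
class Decision:
    """What the stubbed LLM decided: call a tool or give a final answer."""
    tool: Optional[str] = None
    argument: str = ""
    answer: str = ""


def run_loop(
    user_input: str,
    llm: Callable[[list], Decision],
    tools: dict,
    max_iterations: int = 5,
) -> str:
    history = [f"user: {user_input}"]
    for _ in range(max_iterations):
        decision = llm(history)
        if decision.tool is None:
            return decision.answer  # a final answer ends the loop
        result = tools[decision.tool](decision.argument)
        history.append(f"tool {decision.tool}: {result}")
    raise OrchestrationError(f"no final answer after {max_iterations} iterations")


def fake_llm(history: list) -> Decision:
    # Request one tool call, then answer with the tool result.
    if len(history) == 1:
        return Decision(tool="add", argument="2 3")
    return Decision(answer=history[-1].split(": ")[1])


tools = {"add": lambda arg: str(sum(int(x) for x in arg.split()))}
answer = run_loop("what is 2 + 3?", fake_llm, tools)
```

A model that keeps requesting tools never reaches the `return`, so the `for` loop runs out and the error is raised. That is the role the max_iterations guard plays.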
The graph engine¶
For complex workflows, use the Graph engine instead of (or alongside) AgentRunner. Nodes are async callables; edges define transitions:
graph LR
A["LLMNode\n(classify intent)"] -->|"search"| B["ToolNode\n(web_search)"]
A -->|"answer"| C["LLMNode\n(draft reply)"]
B --> C
C --> D["HumanNode\n(review gate)"]
D -->|"approved"| E["End"]
D -->|"revise"| C
Key properties:
- Checkpointing: state is saved to Dapr after each node, so a crashed agent can resume
- Parallel branches: independent branches run concurrently
- Conditional edges: functions that inspect state determine the next node
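As a rough illustration of nodes as async callables and edges as functions of state, the following sketch runs a three-node graph. The names `NODES`, `EDGES`, and `run_graph` are hypothetical and are not the Grampus Graph API. Checkpointing and parallel branches are omitted.

```python
import asyncio
from typing import Optional


async def classify(state: dict) -> dict:
    intent = "search" if "latest" in state["question"] else "answer"
    return {**state, "intent": intent}


async def search(state: dict) -> dict:
    return {**state, "evidence": "stub search result"}


async def draft(state: dict) -> dict:
    return {**state, "reply": f"draft using {state.get('evidence', 'no evidence')}"}


NODES = {"classify": classify, "search": search, "draft": draft}

# Each edge is a function of state that returns the next node name, or None to stop.
EDGES = {
    "classify": lambda s: "search" if s["intent"] == "search" else "draft",
    "search": lambda s: "draft",
    "draft": lambda s: None,
}


async def run_graph(start: str, state: dict) -> dict:
    current: Optional[str] = start
    visited = []
    while current is not None:
        state = await NODES[current](state)
        visited.append(current)  # a real engine would checkpoint state here
        current = EDGES[current](state)
    return {**state, "visited": visited}


result = asyncio.run(run_graph("classify", {"question": "latest pricing?"}))
```

The question contains "latest", so the conditional edge after `classify` routes through `search` before `draft`. A question without it would go straight to `draft`.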
Multi-agent debate¶
For high-stakes questions — legal analysis, medical triage, financial decisions — you can run the same question past a panel of LLMs and let them argue. DebateOrchestrator manages the rounds, detects convergence, and aggregates the result.
sequenceDiagram
participant Q as Question
participant D0 as Debater 0
participant D1 as Debater 1
participant D2 as Devil's advocate
participant Agg as Aggregator
Note over D0,D2: Round 1 — independent answers (concurrent)
Q->>D0: Think independently
Q->>D1: Think independently
Q->>D2: Think independently (skeptical)
D0-->>Q: Answer A, confidence 0.80
D1-->>Q: Answer A, confidence 0.72
D2-->>Q: Answer B, confidence 0.65
Note over D0,D2: Round 2 — critique peers (concurrent)
Q->>D0: Restate yours. Evaluate peers. Change only if logically compelled.
Q->>D1: same
Q->>D2: same
D0-->>Q: Still A, confidence 0.85
D1-->>Q: Still A, confidence 0.80
D2-->>Q: Still B, confidence 0.60
Agg-->>Q: Final: A (2/3 ≈ 0.67 ≥ threshold 0.5)
Key design decisions:
- Heterogeneous models beat same-model temperature diversity (M3MAD-Bench, ICLR 2025). Run haiku + sonnet + sonnet-with-devil's-advocate rather than three sonnets.
- Sycophancy resistance: round 2+ prompts require debaters to restate their prior answer before critiquing peers, and to justify any change with specific evidence (ACL 2025).
- Adaptive routing: when a single fast model reports high confidence (≥ 0.85 by default), the full debate is skipped. This eliminates ~40% of unnecessary calls with no quality loss.
- Escalation: when the panel still disagrees after all rounds, escalate_to_human=True is set on the result. Your graph can route this to a human_node for review.
The debate_node() factory wires a DebateOrchestrator into the graph engine, so escalation routing looks the same as any other conditional edge.
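The aggregation, escalation, and adaptive-routing rules can be sketched as follows. This assumes a simple majority vote compared against the threshold, which matches the 2/3 example in the diagram but may not be how DebateOrchestrator aggregates in practice. `aggregate`, `should_skip_debate`, and `DebateResult` are hypothetical names.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Optional


@dataclass
class DebateResult:
    answer: Optional[str]
    agreement: float
    escalate_to_human: bool


def aggregate(answers: list, threshold: float = 0.5) -> DebateResult:
    """Accept the most common answer if its vote share meets the threshold."""
    top_answer, votes = Counter(answers).most_common(1)[0]
    agreement = votes / len(answers)
    if agreement >= threshold:
        return DebateResult(top_answer, agreement, escalate_to_human=False)
    # The panel still disagrees: return no answer and flag for human review.
    return DebateResult(None, agreement, escalate_to_human=True)


def should_skip_debate(fast_confidence: float, cutoff: float = 0.85) -> bool:
    """Adaptive routing: skip the panel when the fast model is confident."""
    return fast_confidence >= cutoff


final = aggregate(["A", "A", "B"])
split = aggregate(["A", "B", "C"])
```

Here `final` accepts answer A with two of three votes, while `split` has no answer above the threshold and sets `escalate_to_human`, which a conditional edge could route to a review node.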
The safety layer¶
Every piece of text that flows through an agent — user input, tool results, LLM outputs, memory writes — passes through the SafetyPipeline:
flowchart LR
Input --> InjectionCheck["Injection\ndetection"]
InjectionCheck -->|"clean"| PIICheck["PII\ndetection"]
PIICheck -->|"redacted"| ActionGuard["Action\nguard"]
ActionGuard -->|"allowed"| Agent["Agent"]
InjectionCheck -->|"injection!"| Block1["SafetyError\n(blocked)"]
ActionGuard -->|"denied"| Block2["SafetyError\n(blocked)"]
The pipeline is configured via YAML policies — no code changes needed to tighten or relax safety rules.
Safety is not optional
The injection detector runs on tool results as well as user input, because tool output is the most common vector for prompt injection attacks. See Security Model for the threat model.
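The three stages in the diagram can be sketched as one function. The patterns and the denied-action set below are illustrative placeholders only, far simpler than a real detector. `check_text` is a hypothetical name, `SafetyError` is a local stand-in class, and in Grampus the rules come from YAML policies rather than constants in code.

```python
import re


class SafetyError(Exception):
    """Stand-in for the error raised when a stage blocks the text."""


# Illustrative patterns only; real detectors are far more thorough.
INJECTION_PATTERN = re.compile(r"ignore (all )?previous instructions", re.IGNORECASE)
EMAIL_PATTERN = re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+")
DENIED_ACTIONS = {"delete_database"}


def check_text(text: str, action: str = "") -> str:
    # Stage 1: injection detection blocks outright.
    if INJECTION_PATTERN.search(text):
        raise SafetyError("possible prompt injection")
    # Stage 2: PII detection redacts and lets the text continue.
    redacted = EMAIL_PATTERN.sub("[REDACTED_EMAIL]", text)
    # Stage 3: the action guard blocks denied actions.
    if action in DENIED_ACTIONS:
        raise SafetyError(f"action {action!r} denied by policy")
    return redacted


clean = check_text("Contact alice@example.com for details")
```

Note the asymmetry the diagram shows: injection and denied actions stop the flow with an error, while PII is redacted and passed on.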
Agent handoffs¶
Sometimes an agent discovers mid-run that a question is outside its expertise and needs to delegate to a specialist. That is an agent handoff — a runtime transfer of control from one agent to another, carrying the accumulated conversation context.
Handoffs differ from multi-agent crews in one key way: a crew's composition is decided upfront, before execution starts. A handoff happens dynamically, triggered by the running agent's own judgment. Use handoffs when the routing decision depends on what the user actually says, not what you predict they might say.
Security is built into the handoff layer: context passed to the target agent is tagged as LLM_GENERATED (trust 0.7, lower than direct USER_INPUT at 0.9), injection patterns are scanned before context is handed over, and HandoffPolicy.max_depth prevents infinite agent loops. See the Agent Handoffs guide →
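A small sketch of the depth limit and trust downgrade described above. `HandoffContext`, `hand_off`, and `HandoffDepthExceeded` are hypothetical names, not part of the Grampus handoff API. Only the 0.7 trust value and the idea of a `max_depth` limit come from the text.

```python
from dataclasses import dataclass, replace


class HandoffDepthExceeded(RuntimeError):
    """Hypothetical error for a handoff chain that grew too deep."""


LLM_GENERATED_TRUST = 0.7  # trust level the text gives for handed-over context


@dataclass(frozen=True)
class HandoffContext:
    summary: str
    trust_level: float
    depth: int = 0


def hand_off(context: HandoffContext, max_depth: int = 3) -> HandoffContext:
    """Return the context the target agent receives, or refuse if too deep."""
    if context.depth >= max_depth:
        raise HandoffDepthExceeded(
            f"handoff depth {context.depth} reached limit {max_depth}"
        )
    return replace(
        context,
        trust_level=min(context.trust_level, LLM_GENERATED_TRUST),
        depth=context.depth + 1,
    )


first = hand_off(HandoffContext("user asked about tax law", trust_level=0.9))
```

Capping trust with `min` means context never gains trust by being passed along, and the depth counter stops two agents from handing a question back and forth indefinitely.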
Dapr as infrastructure¶
Grampus never writes to databases or message brokers directly. All persistence and messaging goes through the Dapr sidecar:
graph LR
Agent["Grampus Agent"] -->|"HTTP :3500"| Dapr["Dapr Sidecar"]
Dapr -->|"statestore-postgres"| PG["PostgreSQL + pgvector"]
Dapr -->|"cache"| Redis["Redis"]
Dapr -->|"pubsub-redis"| Events["Event bus"]
Dapr -->|"OTEL"| Jaeger["Jaeger / Collector"]
This means:
- Swap any component without changing agent code — switch Redis to Kafka for pub/sub by editing a YAML file
- mTLS between services is handled by Dapr automatically
- Distributed workflows with checkpointing work out of the box
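To show what "goes through the Dapr sidecar" means at the wire level, the sketch below builds a save request for Dapr's state management HTTP API (`POST /v1.0/state/<store>`). The port and store name are taken from the diagram. The key, the value, and the `build_save_request` helper are made up for illustration, and Grampus may use a Dapr SDK rather than raw HTTP.

```python
import json
import urllib.request

DAPR_BASE_URL = "http://localhost:3500"  # sidecar HTTP port shown in the diagram
STATE_STORE = "statestore-postgres"      # component name shown in the diagram


def build_save_request(key: str, value: dict) -> urllib.request.Request:
    """Build a Dapr state-API save request; the body is a list of key/value items."""
    body = json.dumps([{"key": key, "value": value}]).encode("utf-8")
    return urllib.request.Request(
        url=f"{DAPR_BASE_URL}/v1.0/state/{STATE_STORE}",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


request = build_save_request("checkpoint:run-1", {"node": "draft_reply"})
# With a sidecar running, urllib.request.urlopen(request) would send it.
```

The agent only knows the store's component name. Which database sits behind that name is decided by the Dapr component YAML, which is why backends can be swapped without code changes.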
Provenance¶
Every memory write carries a Provenance record:
Provenance(
source_type=SourceType.TOOL_RESULT, # where did this come from?
source_id="web_search:call_abc123", # which specific invocation?
trust_level=0.6, # how trusted is this source?
timestamp=datetime.now(UTC),
content_hash_sha256="sha256:...", # tamper detection
)
Trust levels by source:
| Source type | Default trust |
|---|---|
| SYSTEM | 1.0 |
| USER_INPUT | 0.9 |
| LLM_GENERATED | 0.7 |
| TOOL_RESULT | 0.6 |
| EXTERNAL_DATA | 0.3 |
The memory auditor periodically verifies content hashes. Tampered entries are flagged. This is the primary defense against memory poisoning attacks (MINJA, MemoryGraft).
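Hash verification itself is straightforward. The sketch below shows the general technique with Python's standard `hashlib`. The `content_hash` and `is_tampered` helpers and the `sha256:` prefix handling are assumptions for illustration, not the auditor's actual code.

```python
import hashlib


def content_hash(content: str) -> str:
    """Hash the content and prefix the algorithm name, as in the record above."""
    return "sha256:" + hashlib.sha256(content.encode("utf-8")).hexdigest()


def is_tampered(content: str, recorded_hash: str) -> bool:
    """An entry is flagged when its current hash no longer matches the record."""
    return content_hash(content) != recorded_hash


stored = "user prefers dark mode"
recorded = content_hash(stored)  # computed once, at write time
```

Any later edit to the stored text, even a single character, produces a different digest, so `is_tampered` returns `True` for the modified entry.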
Next steps¶
- Single-agent guide → — Put all these concepts together
- Memory guide → — Deep dive into all four memory types
- Architecture overview → — Full 9-layer diagram and narrative