Skip to content

Architecture Overview

The 9 layers

Grampus is organized into 9 layers, each with a clear boundary and single responsibility:

graph TB
    subgraph "Layer 9 — CLI"
        CLI["grampus init · run · eval · dev · cost · memory"]
    end

    subgraph "Layer 8 — Evaluation"
        Eval["EvalSuite · Assertions · PromptVersions · QualityBaseline"]
    end

    subgraph "Layer 7 — Observability"
        Obs["GrampusTracer · GrampusMetrics · EventLog · BehaviorMonitor"]
    end

    subgraph "Layer 6 — Safety"
        Safety["SafetyPipeline · InjectionDetector · PIIDetector · ActionGuard"]
    end

    subgraph "Layer 5 — Orchestration"
        Orch["AgentRunner · Graph · Crew · Debate · ModelRouter · CostTracker"]
    end

    subgraph "Layer 4 — Tools"
        Tools["ToolRegistry · ToolExecutor · MCPClient · Sandbox · CodeExecutor"]
    end

    subgraph "Layer 3 — Memory"
        Mem["Working · Episodic · Semantic · Procedural · Manager · Security"]
    end

    subgraph "Layer 2 — Dapr Integration"
        Dapr["DaprClient · StateStore · PubSub · Workflow · Lock · Health"]
    end

    subgraph "Layer 1 — Core"
        Core["Config · Errors · Types · Logging · ModelClients"]
    end

    subgraph "Layer 0 — Dapr Runtime"
        Runtime["State · Pub/Sub · Workflows · Actors · mTLS · OTEL"]
    end

    CLI --> Eval
    CLI --> Orch
    Eval --> Orch
    Orch --> Safety
    Orch --> Mem
    Orch --> Tools
    Orch --> Obs
    Safety --> Core
    Mem --> Dapr
    Tools --> Dapr
    Obs --> Dapr
    Dapr --> Runtime
    Orch --> Core
    Mem --> Core
    Tools --> Core

Layer descriptions

Layer 0 — Dapr Runtime

Dapr provides the distributed infrastructure that Grampus runs on. It handles:

  • State management — PostgreSQL (primary) and Redis (cache) via the Dapr State API
  • Pub/Sub messaging — Redis Streams for event fanout and behavior monitoring
  • Durable workflows — checkpoint-based workflow execution for long-running agent tasks
  • Distributed actors — for stateful per-agent coordination
  • mTLS — mutual TLS between services, zero-configuration
  • OTEL — Dapr emits infrastructure-level traces automatically

Grampus never talks to PostgreSQL or Redis directly. All I/O goes through the Dapr sidecar at http://localhost:3500.

Layer 1 — Core

Foundation types that everything else builds on:

  • ConfigGrampusConfig via pydantic-settings; loads from env vars and YAML
  • ErrorsGrampusError hierarchy with machine-readable code strings
  • Types — Pydantic v2 models: AgentDefinition, AgentState, ExecutionResult, Message, ToolCall, ToolResult, TokenUsage
  • Logging — structlog with JSON (prod) or console (dev) output, correlation IDs via contextvars
  • Model clientsModelClient ABC with Anthropic and OpenAI implementations

Layer 2 — Dapr Integration

Typed Python wrappers around Dapr's APIs:

  • DaprStateStore — namespace-scoped keys, Pydantic serialization, ETag concurrency, bulk ops, transactions
  • DaprPubSub — typed publish/subscribe with handler decorators
  • DaprWorkflow — durable workflow base class with automatic checkpointing
  • Distributed lockasync with dapr_lock("resource"): ...
  • Health check — verifies sidecar availability before use

Layer 3 — Memory

Four memory types plus a security layer:

  • Working — in-session token window with auto-summarization (truncate / summarize / hybrid)
  • Episodic — cross-session events with embeddings, temporal decay, and importance scoring
  • Semantic — subject–predicate–object facts with confidence-weighted deduplication
  • Procedural — learned workflow templates with trigger conditions
  • Consolidation — async pipeline that extracts semantic facts from episodic records
  • Security — provenance tracking, trust scoring, injection detection, rate limiting, content hash auditing

All four types are accessed through MemoryManager — the orchestration layer never calls individual stores.

Layer 4 — Tools

Everything needed to safely invoke tools:

  • ToolRegistry — register tools via @registry.tool() decorator, list as JSON schema
  • ToolExecutor — validation, timeout, retry, idempotency, execution records
  • MCPClient — discover and invoke tools from MCP-compatible servers
  • SandboxManager — Docker container isolation for LLM-generated code
  • CodeExecutor — execute Python in sandbox with registered tools injected as callables
  • ActionGuard — per-agent allowlist/denylist, rate limiting, cost guard

Layer 5 — Orchestration

The agent loop and multi-agent coordination:

  • AgentRunner — ReAct loop (Observe→Think→Act). Integrates memory recall, tool execution, safety checks, cost tracking, state persistence
  • Graph — DAG-based workflow engine with conditional edges, parallel branches, Dapr checkpoints
  • Pre-built nodesLLMNode, ToolNode, ConditionalNode, HumanNode, SubgraphNode, debate_node
  • ModelRouter — route steps to cheapest capable model by tier (fast/balanced/powerful)
  • CostTracker — per-model/agent/session/step cost tracking with budget enforcement
  • Crew — sequential, parallel, and hierarchical multi-agent patterns
  • DebateOrchestrator — multi-agent debate over a single question: concurrent per-round asyncio.gather, Jaccard-similarity convergence detection, adaptive routing (skip when confident), sycophancy-resistant round-2+ prompts, three aggregation strategies (majority vote, weighted vote, judge model), and escalate_to_human for low-convergence answers

Layer 6 — Safety

Runtime safety middleware wrapping every agent action:

  • SafetyPipeline — composes all checks; configurable per check category
  • PromptInjectionDetector — regex + heuristic + semantic patterns at three strictness levels
  • PIIDetector — detect and redact email, phone, SSN, credit card, address
  • ActionGuard — per-agent tool boundaries, consecutive call limits, cost caps
  • Policy loader — YAML-based configuration, no code changes required

Layer 7 — Observability

Three-layer observability:

  • GrampusTracer — OTEL custom spans for every agent action (agent.run, agent.llm_call, agent.tool_call, etc.)
  • GrampusMetrics — Prometheus endpoint with counters, gauges, and histograms
  • EventLog — append-only, replayable audit log stored in Dapr state
  • BehaviorMonitor — rolling-window anomaly detection for cost, tool usage, latency

Layer 8 — Evaluation

Built-in testing for agent behavior:

  • EvalSuite — collection of EvalCase objects run against an AgentRunner
  • Assertions — 16 types: content, tools, performance, budget, safety, LLM-as-judge
  • PromptVersionManager — track, diff, A/B test, and rollback system prompts
  • QualityBaseline — pin pass rates and detect regressions
  • Reporters — text, JSON, JUnit XML for CI integration

Layer 9 — CLI

Developer experience:

  • grampus init — scaffold with templates (simple, crew, rag)
  • grampus run — REPL or single-shot with Dapr sidecar management
  • grampus eval — run eval suites with pass-rate gating for CI
  • grampus memory — inspect, clear, and show stats for agent memory
  • grampus cost — show cost summary across sessions
  • grampus dev — watch mode with auto-reload and live cost output

Data flow

A single user turn flows through the system as follows:

sequenceDiagram
    participant U as User
    participant SP as SafetyPipeline
    participant AR as AgentRunner
    participant MM as MemoryManager
    participant TE as ToolExecutor
    participant LLM as ModelClient
    participant EL as EventLog
    participant Dapr

    U->>SP: user input
    SP->>SP: injection check + PII scan
    SP->>AR: clean input
    AR->>EL: event: agent.started
    AR->>MM: recall(query=input)
    MM->>Dapr: state GET (episodic + semantic)
    Dapr-->>MM: recalled memories
    MM-->>AR: MemoryRecallResult
    AR->>LLM: complete(messages + memories + tools)
    LLM-->>AR: response with tool_calls

    loop ReAct iterations
        AR->>SP: check_tool_call(tool_call)
        SP-->>AR: approved
        AR->>TE: execute(tool_call)
        TE-->>AR: ToolResult
        AR->>SP: check_tool_result(result)
        SP-->>AR: clean result
        AR->>EL: event: tool.completed
        AR->>LLM: complete(updated messages)
        LLM-->>AR: final answer (no tool calls)
    end

    AR->>MM: remember(output)
    MM->>Dapr: state PUT (episodic record)
    AR->>SP: check_llm_output(answer)
    SP-->>AR: clean answer
    AR->>EL: event: agent.completed
    AR-->>U: ExecutionResult

Design principles

Async-first: all I/O uses async/await. CPU-bound work runs in thread pool executors. This enables high concurrency for I/O-bound agent workflows.

Dependency injection everywhere: no global mutable state. Every component receives its dependencies through the constructor. This makes testing trivial — swap real components for fakes without patching.

Dapr abstracts infrastructure: agent code never imports psycopg2, redis, or aiokafka. Components are swapped by editing Dapr YAML files, not Python code.

Safety by default: the SafetyPipeline is always active. To disable a check (e.g., for testing), you must explicitly configure SafetyPipelineConfig(check_user_input=False). There is no silent bypass.

Provenance on every write: the memory system rejects writes without provenance metadata at the DaprStateStore wrapper level — below application code. This makes provenance non-bypassable without changing framework code.


Next steps