CLI Reference

The grampus CLI is the primary interface for initializing, running, evaluating, and monitoring Grampus agents.

grampus --version    # grampus 0.1.0
grampus --help       # show all commands

grampus init

Scaffold a new Grampus project.

grampus init [OPTIONS] [NAME]

| Option | Default | Description |
| --- | --- | --- |
| NAME | (prompted) | Project directory name |
| --name TEXT | "grampus-agent" | Agent name used in config |
| --template TEXT | "simple" | Project template: simple, crew, rag |
| --output-dir TEXT | "." | Parent directory for the new project |

Templates

| Template | Creates | Best for |
| --- | --- | --- |
| simple | Single agent with one tool | Getting started, learning Grampus |
| crew | Three-agent crew (researcher, critic, writer) | Multi-agent workflows |
| rag | RAG agent with document retrieval tool | Question answering over documents |

Examples

# Create a simple agent in the current directory
grampus init my-agent

# Create a crew agent in a specific directory
grampus init --template crew --output-dir ~/projects research-crew

# Non-interactive (all defaults)
grampus init --name my-agent --template simple --output-dir .

Generated files

my-agent/
├── agent.py              # Agent code with create_runner() and create_agent_def()
├── grampus.yaml          # Configuration
├── docker-compose.yml    # Local infrastructure
├── dapr/
│   ├── config.yaml       # Dapr tracing config
│   └── components/       # State store, pub/sub, cache components
└── .gitignore

Exit codes

| Code | Meaning |
| --- | --- |
| 0 | Success |
| 1 | Directory already exists |
| 2 | Invalid template name |

grampus run

Start an agent. Without --input, starts an interactive REPL. With --input, runs once and exits.

grampus run [OPTIONS] AGENT_FILE

| Argument/Option | Default | Description |
| --- | --- | --- |
| AGENT_FILE | (required) | Path to agent Python file |
| --config TEXT | "grampus.yaml" | Path to grampus.yaml configuration file |
| --session-id TEXT | (auto-generated UUID) | Session identifier for memory persistence |
| --input TEXT | None | Single-shot input; omit for interactive REPL |

The agent file must export two functions:

  • create_runner() -> AgentRunner — constructs the runner with all dependencies
  • create_agent_def() -> AgentDefinition — returns the agent blueprint
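
As a rough illustration of the shape such a file takes, the sketch below defines both functions. The AgentRunner and AgentDefinition classes here are stand-in dataclasses with made-up fields so the snippet runs on its own; a real agent file would import the actual types from Grampus, whose constructors may look quite different.

```python
from dataclasses import dataclass, field


# Stand-ins for illustration only. A real agent file would import
# AgentRunner and AgentDefinition from Grampus instead of defining them,
# and the fields below are assumptions, not the real API.
@dataclass
class AgentDefinition:
    name: str
    model: str
    tools: list[str] = field(default_factory=list)


@dataclass
class AgentRunner:
    config_path: str = "grampus.yaml"


def create_runner() -> AgentRunner:
    """Construct the runner with all of its dependencies."""
    return AgentRunner()


def create_agent_def() -> AgentDefinition:
    """Return the agent blueprint."""
    return AgentDefinition(name="research-agent", model="claude-sonnet-4-6")
```

The point is only that both names exist at module level and are callable with no arguments, which is what the command is described as looking for.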

Examples

# Interactive REPL
grampus run agent.py

# Single-shot (useful in scripts and CI)
grampus run agent.py --input "What is the capital of France?"

# Use a specific config file
grampus run agent.py --config config/production.yaml --input "Hello"

# Persist memory across runs using a fixed session ID
grampus run agent.py --session-id user-123 --input "What did we discuss last time?"
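
For scripts and CI, the single-shot form can be wrapped in a small helper. The sketch below only uses the flags and exit codes documented for grampus run; the helper names build_run_command and run_once are hypothetical, and the wrapper assumes the grampus executable is on PATH.

```python
import subprocess

# Exit codes as documented for `grampus run`.
RUN_EXIT_CODES = {
    0: "Success",
    1: "Agent file not found or invalid",
    2: "Agent raised an unhandled error",
    3: "Budget exceeded",
}


def build_run_command(agent_file, input_text=None, session_id=None, config=None):
    """Assemble the argument list for a `grampus run` invocation."""
    cmd = ["grampus", "run", agent_file]
    if config is not None:
        cmd += ["--config", config]
    if session_id is not None:
        cmd += ["--session-id", session_id]
    if input_text is not None:
        cmd += ["--input", input_text]
    return cmd


def run_once(agent_file, input_text, **options):
    """Run a single-shot query and return (exit code, meaning, stdout)."""
    cmd = build_run_command(agent_file, input_text=input_text, **options)
    result = subprocess.run(cmd, capture_output=True, text=True)
    meaning = RUN_EXIT_CODES.get(result.returncode, "Unknown exit code")
    return result.returncode, meaning, result.stdout
```

Passing the arguments as a list avoids shell quoting problems when the input text contains spaces or quotes.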

REPL commands

When running interactively:

| Command | Description |
| --- | --- |
| exit or quit | End the session |
| /cost | Show cost summary for this session |
| /memory | Show current working memory window |
| /clear | Clear working memory (start fresh) |

Output format

[grampus] Session: abc12345
[grampus] Agent: research-agent | Model: claude-sonnet-4-6
> What is the capital of Brazil?
Brasília is the capital of Brazil, established in 1960.

[cost] Input: 42 tokens | Output: 18 tokens | Total: $0.000018

Exit codes

| Code | Meaning |
| --- | --- |
| 0 | Success |
| 1 | Agent file not found or invalid |
| 2 | Agent raised an unhandled error |
| 3 | Budget exceeded |

grampus eval

Run an evaluation suite and report results.

grampus eval [OPTIONS] SUITE_FILE

| Argument/Option | Default | Description |
| --- | --- | --- |
| SUITE_FILE | (required) | Path to Python file defining EvalSuite |
| --format TEXT | "text" | Output format: text, json, junit |
| --output TEXT | None | Write report to file (stdout if omitted) |
| --fail-under FLOAT | None | Exit code 1 if pass rate < threshold (0.0–1.0) |

The suite file must export a function create_suite() -> EvalSuite.

Examples

# Run suite with text output
grampus eval tests/eval_suite.py

# JSON output to file
grampus eval tests/eval_suite.py --format json --output results.json

# JUnit XML for CI
grampus eval tests/eval_suite.py --format junit --output results.xml

# Fail if pass rate below 90% (CI gate)
grampus eval tests/eval_suite.py --fail-under 0.9
echo $?   # 0 = passed, 1 = below threshold

Text output format

Suite: research-agent-suite
Running 12 cases...

  [PASS] basic_answer             (0.8s, $0.0003)
  [PASS] uses_web_search          (1.2s, $0.0005)
  [FAIL] cites_sources            (0.9s, $0.0004)
         contains("http"): not found in output
  [PASS] no_pii_in_output         (0.7s, $0.0002)
  ...

Results: 10/12 passed (83.3%)
Total cost: $0.0041
Avg duration: 0.94s
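
The pass rate in the Results line and the effect of --fail-under can be worked through by hand. The sketch below is one reading of the documented behavior, not the tool's actual code: without a threshold any failure yields exit code 1, and with a threshold the pass rate is compared against it. The function names are illustrative, and exit code 2 (suite file problems) is out of scope here.

```python
from typing import Optional


def pass_rate(passed: int, total: int) -> float:
    """Fraction of cases that passed, in the range 0.0 to 1.0."""
    if total <= 0:
        raise ValueError("total must be positive")
    return passed / total


def eval_exit_code(passed: int, total: int, fail_under: Optional[float] = None) -> int:
    """Exit code under the assumed gating rule described above."""
    if fail_under is None:
        return 0 if passed == total else 1
    return 0 if pass_rate(passed, total) >= fail_under else 1


# The sample report: 10 of 12 cases passed.
print(f"{pass_rate(10, 12):.1%}")    # 83.3%
print(eval_exit_code(10, 12, 0.9))   # 1: below the 90% gate
print(eval_exit_code(10, 12, 0.8))   # 0: at or above the 80% gate
```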

Exit codes

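| Code | Meaning |
| --- | --- |
| 0 | All cases passed, or pass rate >= --fail-under when that option is set |
| 1 | At least one case failed (no --fail-under given), or pass rate below the --fail-under threshold |
| 2 | Suite file not found or invalid |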

grampus memory

Inspect and manage agent memory.

grampus memory COMMAND [OPTIONS] AGENT_ID

grampus memory inspect

Show stored memories for an agent.

grampus memory inspect [OPTIONS] AGENT_ID

| Option | Default | Description |
| --- | --- | --- |
| AGENT_ID | (required) | Agent identifier |
| --session TEXT | None | Filter to a specific session ID |
| --type TEXT | "all" | Memory type: episodic, semantic, all |

# All memories for an agent
grampus memory inspect research-agent

# Episodic memories for a specific session
grampus memory inspect research-agent --session session-42 --type episodic

# Semantic facts
grampus memory inspect research-agent --type semantic

Output:

Agent: research-agent
Type: episodic  Session: session-42

  [2025-01-15 12:34] (trust=0.60, importance=0.72)
  "User asked about pricing for enterprise plan."

  [2025-01-15 12:36] (trust=0.70, importance=0.45)
  "Research on agentic AI frameworks completed."

2 episodic records found.

grampus memory clear

Delete stored memories.

grampus memory clear [OPTIONS] AGENT_ID

| Option | Default | Description |
| --- | --- | --- |
| AGENT_ID | (required) | Agent identifier |
| --session TEXT | None | Limit deletion to a specific session |
| --type TEXT | "all" | Memory type: episodic, semantic, all |
| --yes | False | Skip confirmation prompt |

# Clear all memories (with confirmation)
grampus memory clear research-agent

# Clear episodic memories for one session (no confirmation)
grampus memory clear research-agent --session session-42 --type episodic --yes

grampus memory stats

Show summary statistics.

grampus memory stats AGENT_ID

Output:

Agent: research-agent

  Episodic records:  147
  Semantic facts:     32
  Oldest record:  2025-01-10 09:15
  Newest record:  2025-01-15 14:22
  Avg trust score:   0.68

Exit codes

| Code | Meaning |
| --- | --- |
| 0 | Success |
| 1 | Agent not found or Dapr unavailable |

grampus cost

Show cost summary for recent agent runs.

grampus cost [OPTIONS]

| Option | Default | Description |
| --- | --- | --- |
| --agent TEXT | None | Filter by agent ID |
| --session TEXT | None | Filter by session ID |
| --last INT | 20 | Show last N cost events |
| --log-file TEXT | ".grampus/cost_log.jsonl" | Path to JSONL cost log |

Examples

# Show last 20 cost events
grampus cost

# Show costs for a specific agent
grampus cost --agent research-agent --last 50

# Show costs for a specific session
grampus cost --session session-42

Output:

Cost summary (last 20 runs)

  2025-01-15 14:22  research-agent  session-42  claude-sonnet-4-6  $0.0023  2.1s
  2025-01-15 13:55  research-agent  session-41  claude-sonnet-4-6  $0.0018  1.8s
  2025-01-15 13:10  hello-agent     session-40  claude-haiku-4-5   $0.0002  0.5s
  ...

Total (20 runs): $0.0241
Avg per run:     $0.0012
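
Because the cost log is a JSONL file, the same summary can be approximated with a few lines of Python. This is a sketch under loud assumptions: the field names agent_id, session_id, and cost_usd are guesses (the log schema is not documented here), and it assumes filters are applied before the last-N cut.

```python
import json
from collections import deque


def load_cost_events(lines, last=20, agent=None, session=None):
    """Parse JSONL lines, filter them, and keep only the last `last` events."""
    events = (json.loads(line) for line in lines if line.strip())
    matching = (
        event for event in events
        # Field names are assumptions about the log schema.
        if (agent is None or event.get("agent_id") == agent)
        and (session is None or event.get("session_id") == session)
    )
    return list(deque(matching, maxlen=last))


def summarize(events):
    """Total and average cost across the selected events."""
    total = sum(event.get("cost_usd", 0.0) for event in events)
    average = total / len(events) if events else 0.0
    return {"runs": len(events), "total_usd": total, "avg_usd": average}


sample = [
    '{"agent_id": "hello-agent", "session_id": "session-40", "cost_usd": 0.0002}',
    '{"agent_id": "research-agent", "session_id": "session-41", "cost_usd": 0.0018}',
    '{"agent_id": "research-agent", "session_id": "session-42", "cost_usd": 0.0023}',
]
print(summarize(load_cost_events(sample, agent="research-agent")))
```

To run it against a real log, pass an open file object (for example the default .grampus/cost_log.jsonl) in place of the sample list, after checking the actual field names.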

Exit codes

| Code | Meaning |
| --- | --- |
| 0 | Success |
| 1 | Log file not found |

grampus dev

Start an agent in development mode with auto-reload and live cost/trace output.

grampus dev [OPTIONS]

| Option | Default | Description |
| --- | --- | --- |
| --config TEXT | "grampus.yaml" | Path to grampus.yaml |
| --port INT | 8000 | Agent HTTP server port |

grampus dev validates grampus.yaml on startup and on every file change.

# Start dev mode (watches current directory)
grampus dev

# Use custom config and port
grampus dev --config staging.yaml --port 8001

Exit codes

| Code | Meaning |
| --- | --- |
| 0 | Clean exit (Ctrl+C) |
| 1 | Config validation failed or Dapr unavailable |

grampus state

Manage agent state snapshots — export, inspect, and restore full session state.

grampus state COMMAND [OPTIONS]

grampus state export

Export the state of an agent session to a portable JSON snapshot.

grampus state export [OPTIONS] AGENT_ID

| Argument/Option | Default | Description |
| --- | --- | --- |
| AGENT_ID | (required) | Agent identifier |
| --session TEXT | (latest session) | Session ID to export |
| --output TEXT | "<agent_id>_<session_id>.json" | Output file path |
| --tag KEY=VALUE | (repeatable) | Metadata tag attached to the snapshot |

# Export the latest session for research-bot
grampus state export research-bot --output snapshot.json

# Export a specific session with tags
grampus state export research-bot \
  --session ses_abc123 \
  --output snapshot.json \
  --tag env=production \
  --tag reason=incident-review
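
Repeated --tag KEY=VALUE options naturally collapse into a key/value mapping. The sketch below shows one plausible way to parse them; parse_tags is a hypothetical helper, and how Grampus actually validates tags or stores them in the snapshot is not documented here.

```python
def parse_tags(pairs):
    """Turn ["env=production", ...] into {"env": "production", ...}."""
    tags = {}
    for pair in pairs:
        # Split on the first "=" only, so values may themselves contain "=".
        key, separator, value = pair.partition("=")
        if not separator or not key:
            raise ValueError(f"expected KEY=VALUE, got {pair!r}")
        tags[key] = value
    return tags


print(parse_tags(["env=production", "reason=incident-review"]))
# {'env': 'production', 'reason': 'incident-review'}
```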

grampus state import

Restore a previously exported snapshot into Dapr state.

grampus state import [OPTIONS] FILE

| Argument/Option | Default | Description |
| --- | --- | --- |
| FILE | (required) | Path to the snapshot JSON file |
| --dry-run | False | Print what would be restored without writing any state |

# Preview changes without writing
grampus state import snapshot.json --dry-run

# Restore the snapshot
grampus state import snapshot.json

grampus state show

Inspect a snapshot file without restoring it.

grampus state show [OPTIONS] [FILE]

| Argument/Option | Default | Description |
| --- | --- | --- |
| FILE | None | Path to the snapshot JSON file (stdin if omitted) |
| --format TEXT | "table" | Output format: table, json |

# Human-readable summary
grampus state show snapshot.json --format table

# Full JSON dump
grampus state show snapshot.json --format json

Exit codes

| Code | Meaning |
| --- | --- |
| 0 | Success |
| 1 | File not found, invalid snapshot format, or Dapr unavailable |
| 2 | Session not found for the given agent |

grampus alerts

Manage cost alert rules and notification channels.

grampus alerts COMMAND [OPTIONS]

grampus alerts list

Show all configured alert rules.

grampus alerts list

Output:

ID           NAME                THRESHOLD        TYPE              SEVERITY   ENABLED
rule_abc123  session-budget      $0.10            per_session_usd   warning    yes
rule_def456  daily-spend         $5.00            per_day_usd       critical   yes
rule_ghi789  per-run-spike       $0.25            per_run_usd       warning    no

grampus alerts add

Create a new alert rule.

grampus alerts add [OPTIONS]

| Option | Default | Description |
| --- | --- | --- |
| --name TEXT | (required) | Unique rule name |
| --threshold-usd FLOAT | (required) | USD threshold that triggers the alert |
| --threshold-type TEXT | (required) | per_run_usd, per_session_usd, per_hour_usd, per_day_usd, per_month_usd |
| --severity TEXT | "warning" | Alert severity: info, warning, critical |
| --agent-id TEXT | None | Scope to a specific agent (None = all agents) |
| --cooldown INT | 3600 | Minimum seconds between repeated fires for this rule |

grampus alerts add \
  --name "daily-spend" \
  --threshold-usd 5.00 \
  --threshold-type per_day_usd \
  --severity critical \
  --agent-id research-bot \
  --cooldown 86400
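
The interaction between a threshold and its cooldown can be pictured as a small decision function. This is a sketch of the general technique, not Grampus's implementation: should_fire is a hypothetical name, timestamps are plain seconds, and it assumes the alert triggers when spend reaches the threshold (>=), which the option description does not specify.

```python
def should_fire(spend_usd, threshold_usd, now, last_fired=None, cooldown=3600):
    """Decide whether a rule fires, honoring its cooldown window."""
    if spend_usd < threshold_usd:
        return False  # threshold not reached (assumed >= comparison)
    if last_fired is None:
        return True   # first time the rule trips
    # Suppress repeats until `cooldown` seconds have elapsed.
    return now - last_fired >= cooldown


# A daily-spend rule with a 24-hour cooldown, as in the example above.
print(should_fire(5.20, 5.00, now=1_000))                                       # True
print(should_fire(5.40, 5.00, now=4_600, last_fired=1_000, cooldown=86_400))    # False
print(should_fire(5.40, 5.00, now=87_400, last_fired=1_000, cooldown=86_400))   # True
```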

grampus alerts remove

Delete an alert rule by ID.

grampus alerts remove RULE_ID

grampus alerts remove rule_abc123

grampus alerts enable / disable

Enable or disable a rule without deleting it.

grampus alerts enable  RULE_ID
grampus alerts disable RULE_ID

grampus alerts disable rule_ghi789   # pause a noisy rule temporarily
grampus alerts enable  rule_ghi789   # re-enable it

grampus alerts test

Fire a test notification for a rule to verify your notification channels are working.

grampus alerts test RULE_ID

grampus alerts test rule_abc123
# Sends a test alert to all configured notification channels
# Prints: "Test alert sent to 2 channels (slack, log)"

Exit codes

| Code | Meaning |
| --- | --- |
| 0 | Success |
| 1 | Rule not found or server unavailable |
| 2 | Invalid option value |

grampus playground

Interactive prompt playground for testing and comparing LLM responses.

grampus playground COMMAND [OPTIONS]

grampus playground start

Launch the interactive REPL.

grampus playground start [OPTIONS]

| Option | Default | Description |
| --- | --- | --- |
| --model TEXT | "claude-haiku-4-5" | Starting model |
| --system TEXT | None | System prompt string |
| --system-file PATH | None | Load system prompt from a file |
| --load TEXT | None | Resume a previously saved session by name |

# Start with defaults
grampus playground start

# Start with a specific model and system prompt
grampus playground start --model gpt-4o-mini --system "You are a Python tutor."

# Resume a saved session
grampus playground start --load python-tutor

Inside the REPL, use /help to list all available commands.

grampus playground run

Run a single prompt and exit (non-interactive).

grampus playground run [OPTIONS] MESSAGE

| Argument/Option | Default | Description |
| --- | --- | --- |
| MESSAGE | (required) | The user message to send |
| --model TEXT | "claude-haiku-4-5" | Model to use |
| --system TEXT | None | System prompt |
| --no-stream | False | Disable streaming output |

grampus playground run "What is the capital of France?" --model claude-haiku-4-5
grampus playground run "Explain recursion." --model gpt-4o-mini --no-stream

grampus playground compare

Run the same message against multiple models simultaneously.

grampus playground compare [OPTIONS] MESSAGE

| Argument/Option | Default | Description |
| --- | --- | --- |
| MESSAGE | (required) | The user message to send to all models |
| --models TEXT | (required) | Comma-separated list of model names |
| --system TEXT | None | System prompt applied to all models |

grampus playground compare "Explain async/await." \
  --models claude-haiku-4-5,gpt-4o-mini,llama3.2
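
Sending one message to several models at once is a classic fan-out pattern. The sketch below illustrates it with asyncio.gather; fake_model is a stand-in coroutine that makes no network call, and nothing here reflects how the playground itself talks to providers.

```python
import asyncio


def parse_models(value):
    """Split a comma-separated --models value into a clean list."""
    return [name.strip() for name in value.split(",") if name.strip()]


async def fake_model(name, message):
    """Stand-in for a real provider call; returns a canned reply."""
    await asyncio.sleep(0)  # yield control, as a real network call would
    return f"[{name}] reply to: {message}"


async def compare(message, models):
    """Query every model concurrently and map model name to reply."""
    replies = await asyncio.gather(*(fake_model(m, message) for m in models))
    return dict(zip(models, replies))


models = parse_models("claude-haiku-4-5,gpt-4o-mini,llama3.2")
for name, reply in asyncio.run(compare("Explain async/await.", models)).items():
    print(name, "->", reply)
```

asyncio.gather returns results in the order the coroutines were passed, so zipping them back onto the model list is safe.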

grampus playground sessions

List all saved playground sessions.

grampus playground sessions

Output:

NAME             MODEL              TURNS  COST      SAVED
python-tutor     claude-haiku-4-5   8      $0.0012   2026-06-01 14:22
billing-tests    gpt-4o-mini        3      $0.0003   2026-05-30 09:15

grampus playground show

Display the contents of a saved session.

grampus playground show [OPTIONS] NAME

| Option | Default | Description |
| --- | --- | --- |
| NAME | (required) | Saved session name |
| --format TEXT | "transcript" | Output format: transcript, json |

grampus playground show python-tutor
grampus playground show python-tutor --format json

Exit codes

| Code | Meaning |
| --- | --- |
| 0 | Success or clean REPL exit |
| 1 | Model provider not configured or session not found |

grampus redteam

Run an adversarial red-team campaign against an agent to find security vulnerabilities before attackers do.

grampus redteam [OPTIONS] AGENT_FILE

| Argument/Option | Default | Description |
| --- | --- | --- |
| AGENT_FILE | (required) | Path to agent adapter Python file |
| --categories/-c TEXT | all | Attack categories to run (repeatable): prompt_injection, jailbreak, reasoning_hijack, memory_poison, tool_misuse, excessive_agency |
| --count/-n INT | 5 | Number of payloads per strategy |
| --output/-o TEXT | "text" | Report format: text, json |
| --stop-on-critical | False | Halt campaign immediately on first CRITICAL finding |
| --model TEXT | None | Model ID for LLM-based judge + adaptive mutation (e.g. claude-sonnet-4-6) |

The agent file must expose two functions:

  • get_agent_config() -> RedTeamTargetConfig — agent metadata and capability flags
  • async run_conversation(messages: list[tuple[str, str]]) -> str — stateless or stateful conversation handler
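
A minimal adapter exposing both functions might look like the sketch below. RedTeamTargetConfig is replaced by a stand-in dataclass with invented fields so the snippet runs standalone, and the (role, content) interpretation of each message tuple is an assumption; the real type and tuple layout should be taken from Grampus itself.

```python
import asyncio
from dataclasses import dataclass


# Stand-in for illustration only; the real RedTeamTargetConfig comes from
# Grampus and its fields may differ from these guesses.
@dataclass
class RedTeamTargetConfig:
    name: str
    has_memory: bool = False
    has_tools: bool = False


def get_agent_config() -> RedTeamTargetConfig:
    """Describe the target agent and its capability flags."""
    return RedTeamTargetConfig(name="my-agent", has_tools=True)


async def run_conversation(messages: list[tuple[str, str]]) -> str:
    """Stateless handler: answer based only on the messages passed in."""
    # Assumes each tuple is (role, content).
    user_turns = [content for role, content in messages if role == "user"]
    last_user_message = user_turns[-1] if user_turns else ""
    # A real adapter would call the agent here instead of echoing.
    return f"echo: {last_user_message}"


print(asyncio.run(run_conversation([("user", "hello")])))  # echo: hello
```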

Examples

# Full campaign, all categories, text output
grampus redteam agents/my_agent.py

# Specific categories only
grampus redteam agents/my_agent.py --categories prompt_injection --categories jailbreak

# Fast CI scan: 3 payloads per strategy, stop on CRITICAL
grampus redteam agents/my_agent.py --stop-on-critical --count 3

# Thorough pre-release audit with LLM judge
grampus redteam agents/my_agent.py --model claude-sonnet-4-6 --count 10

# JSON output for downstream processing
grampus redteam agents/my_agent.py --output json > redteam-report.json

Attack categories

| Category | OWASP | What it tests |
| --- | --- | --- |
| prompt_injection | ASI01:2026 | Direct and indirect instruction overrides |
| jailbreak | ASI01:2026 | Roleplay frames, encoding tricks, logic traps |
| reasoning_hijack | ASI01:2026 | Multi-turn context manipulation |
| memory_poison | ASI06:2026 | Persistent memory write injection |
| tool_misuse | ASI02:2026 | Infinite loops, chain escapes, enumeration |
| excessive_agency | LLM #2 | Scope escalation, implicit permission exploits |

Report format

SUMMARY
  Total attacks:     30
  Successful:        4
  Attack success:    13.3%

SEVERITY BREAKDOWN
  HIGH       3
  MEDIUM     1

FINDINGS
  [HIGH] Prompt Injection — Direct Injection
    Category:    prompt_injection
    OWASP:       ASI01:2026
    Occurrences: 3
    Recommendation: Raise PromptInjectionDetector to STRICT...
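
The summary figures and the CI exit status documented for this command both follow from the list of successful attacks. The sketch below reproduces that arithmetic for the sample report; the function names are illustrative, and the exit-code rule is taken from the command's exit code list rather than from the tool's source.

```python
from collections import Counter


def attack_success_rate(successful: int, total: int) -> float:
    """Percentage of attacks that succeeded."""
    return 100.0 * successful / total if total else 0.0


def redteam_exit_code(severities) -> int:
    """1 if any finding is CRITICAL or HIGH, otherwise 0."""
    return 1 if any(s in {"CRITICAL", "HIGH"} for s in severities) else 0


# The sample report: 30 attacks, 4 successful (3 HIGH, 1 MEDIUM).
severities = ["HIGH", "HIGH", "HIGH", "MEDIUM"]
print(f"{attack_success_rate(len(severities), 30):.1f}%")  # 13.3%
print(dict(Counter(severities)))                           # {'HIGH': 3, 'MEDIUM': 1}
print(redteam_exit_code(severities))                       # 1
```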

Exit codes

| Code | Meaning |
| --- | --- |
| 0 | No CRITICAL or HIGH findings |
| 1 | One or more CRITICAL or HIGH findings (suitable for CI gates) |
| 2 | Agent file missing required functions or invalid category |