Quickstart
Build and run your first Grampus agent in 5 minutes.
What you'll build
A simple conversational agent powered by Claude that answers questions and reports the cost of each interaction.
Step 1 — Scaffold the project
This creates the following structure:
hello-agent/
├── agent.py              # Your agent code
├── grampus.yaml          # Project configuration
├── docker-compose.yml    # Local infrastructure
├── dapr/
│   └── components/       # Dapr component YAML files
└── .gitignore
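To confirm that the scaffold produced the layout above, a short check along these lines may help. This is a minimal sketch and not part of Grampus: the helper name `missing_scaffold_paths` is hypothetical, and the expected paths are simply copied from the tree shown above.

```python
from pathlib import Path

# Paths copied from the project tree above, relative to the project root.
EXPECTED_PATHS = (
    "agent.py",
    "grampus.yaml",
    "docker-compose.yml",
    "dapr/components",
    ".gitignore",
)


def missing_scaffold_paths(project_root: str) -> list[str]:
    """Return the expected paths that do not exist under project_root."""
    root = Path(project_root)
    return [rel for rel in EXPECTED_PATHS if not (root / rel).exists()]
```

Calling `missing_scaffold_paths("hello-agent")` returns an empty list when every file and directory from the tree is present, and otherwise lists what is missing.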
Step 2 — Review the configuration
Open grampus.yaml:
# grampus.yaml — project configuration
model:
  default_model: claude-sonnet-4-6      # LLM to use by default
  temperature: 0.0                      # Least-random sampling (not strictly deterministic)
  max_tokens: 4096                      # Max tokens per response
memory:
  working_memory_token_limit: 100000    # Context window size
  summarization_strategy: hybrid        # truncate | summarize | hybrid
  episodic_top_k: 5                     # Memories recalled per query
safety:
  injection_detection_level: balanced   # strict | balanced | permissive
  pii_detection_enabled: true           # Redact PII in outputs
dapr:
  host: localhost
  port: 3500                            # Dapr HTTP sidecar port
observability:
  otel_enabled: true
  log_level: INFO
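Several options above accept only a fixed set of values, so a typo can be easy to miss. The sketch below shows one way to sanity-check such a configuration once it has been loaded into a dict. It is an illustration under stated assumptions, not Grampus's own validation: `validate_config`, `ALLOWED_VALUES`, and `POSITIVE_INT_KEYS` are hypothetical names, the allowed values are copied from the comments in grampus.yaml, and treating a missing key as a problem is a choice made here for simplicity.

```python
# Allowed values copied from the comments in grampus.yaml above.
ALLOWED_VALUES = {
    ("memory", "summarization_strategy"): {"truncate", "summarize", "hybrid"},
    ("safety", "injection_detection_level"): {"strict", "balanced", "permissive"},
}

# Keys this sketch assumes must be positive integers.
POSITIVE_INT_KEYS = [
    ("model", "max_tokens"),
    ("memory", "working_memory_token_limit"),
    ("memory", "episodic_top_k"),
]


def validate_config(config: dict) -> list[str]:
    """Return a list of problems found; an empty list means none were found."""
    problems = []
    for (section, key), allowed in ALLOWED_VALUES.items():
        value = config.get(section, {}).get(key)
        if value not in allowed:
            problems.append(f"{section}.{key}: {value!r} is not one of {sorted(allowed)}")
    for section, key in POSITIVE_INT_KEYS:
        value = config.get(section, {}).get(key)
        if not isinstance(value, int) or value <= 0:
            problems.append(f"{section}.{key}: {value!r} is not a positive integer")
    port = config.get("dapr", {}).get("port")
    if not isinstance(port, int) or not 1 <= port <= 65535:
        problems.append(f"dapr.port: {port!r} is not a valid TCP port")
    return problems


# The relevant values from the grampus.yaml above, written as a Python dict.
EXAMPLE_CONFIG = {
    "model": {"default_model": "claude-sonnet-4-6", "temperature": 0.0, "max_tokens": 4096},
    "memory": {
        "working_memory_token_limit": 100000,
        "summarization_strategy": "hybrid",
        "episodic_top_k": 5,
    },
    "safety": {"injection_detection_level": "balanced", "pii_detection_enabled": True},
    "dapr": {"host": "localhost", "port": 3500},
}
```

With the values shown in this quickstart, `validate_config(EXAMPLE_CONFIG)` returns an empty list. Changing `injection_detection_level` to an unlisted value would add one entry describing the problem.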
Step 3 — Write the agent
Replace the contents of agent.py with:
import asyncio
import os

from grampus.core.models.anthropic import AnthropicClient
from grampus.core.types import AgentDefinition
from grampus.orchestration.runner import AgentRunner, RunnerConfig
from grampus.tools.executor import ToolExecutor
from grampus.tools.registry import ToolRegistry


def create_runner() -> AgentRunner:
    """Factory function called by `grampus run`."""
    client = AnthropicClient(api_key=os.environ["GRAMPUS_MODEL__ANTHROPIC_API_KEY"])
    registry = ToolRegistry()
    executor = ToolExecutor(registry)
    config = RunnerConfig(max_iterations=5, enable_memory=False)
    return AgentRunner(model_client=client, tool_executor=executor, config=config)


def create_agent_def() -> AgentDefinition:
    """Agent blueprint called by `grampus run`."""
    return AgentDefinition(
        name="hello",
        model="claude-sonnet-4-6",
        system_prompt="You are a helpful assistant. Answer concisely.",
    )


async def main() -> None:
    runner = create_runner()
    agent = create_agent_def()
    result = await runner.run(agent, "What is 2 + 2?", session_id="quickstart-1")
    print(f"Output: {result.output}")
    print(f"Steps: {result.steps_taken}")
    print(f"Tokens: {result.token_usage.total_tokens}")
    print(f"Cost: ${result.token_usage.cost_usd:.6f}")


if __name__ == "__main__":
    asyncio.run(main())
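The agent reads its API key with `os.environ[...]`, which raises a bare `KeyError` when the variable is unset. A small helper like the one below could make that failure easier to diagnose. This is an optional sketch using only the standard library. `require_env` is a hypothetical name and not a Grampus function.

```python
import os


def require_env(name: str) -> str:
    """Return the value of environment variable `name`, or raise a clear error."""
    value = os.environ.get(name, "").strip()
    if not value:
        raise RuntimeError(
            f"Environment variable {name} is not set. "
            "Export it in your shell before running the agent."
        )
    return value
```

In `create_runner`, the lookup would then read `AnthropicClient(api_key=require_env("GRAMPUS_MODEL__ANTHROPIC_API_KEY"))`. This version also rejects empty or whitespace-only values.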
Step 4 — Start infrastructure
Step 5 — Run the agent
Expected output:
What just happened?
sequenceDiagram
    participant You
    participant Runner as AgentRunner
    participant LLM as Claude API
    participant Cost as CostTracker
    You->>Runner: run(agent, "What is 2+2?")
    Runner->>LLM: messages=[system, user]
    LLM-->>Runner: "2 + 2 equals 4." (no tool calls)
    Runner->>Cost: record token usage
    Runner-->>You: ExecutionResult(output="2 + 2 equals 4.", ...)
The AgentRunner ran a single iteration of the ReAct loop: it sent the system prompt and your input to the LLM, received a final answer with no tool calls, recorded the token usage with the CostTracker, and returned an ExecutionResult.
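The loop described above can be sketched in a few lines of plain Python. This is an illustration of the general ReAct pattern, not Grampus's implementation: `ModelReply`, `LoopResult`, `react_loop`, and `fake_model` are hypothetical names, the token count is an arbitrary placeholder, and raising an error when the iteration limit is reached is an assumption about one reasonable behavior.

```python
from dataclasses import dataclass, field


@dataclass
class ModelReply:
    """Hypothetical stand-in for one LLM response."""
    text: str
    tool_calls: list = field(default_factory=list)  # (tool_name, argument) pairs
    tokens: int = 0


@dataclass
class LoopResult:
    output: str
    steps_taken: int
    total_tokens: int


def react_loop(model, tools: dict, user_input: str, max_iterations: int = 5) -> LoopResult:
    """Call the model until it answers without requesting any tool."""
    messages = [("user", user_input)]
    total_tokens = 0
    for step in range(1, max_iterations + 1):
        reply = model(messages)
        total_tokens += reply.tokens
        if not reply.tool_calls:
            # No tool requested: the reply is the final answer.
            return LoopResult(reply.text, step, total_tokens)
        messages.append(("assistant", reply.text))
        for name, argument in reply.tool_calls:
            observation = tools[name](argument)
            messages.append(("tool", f"{name} -> {observation}"))
    raise RuntimeError(f"No final answer after {max_iterations} iterations")


def fake_model(messages):
    """Pretend LLM that answers immediately, as in the quickstart run."""
    return ModelReply(text="2 + 2 equals 4.", tokens=25)  # placeholder token count


result = react_loop(fake_model, tools={}, user_input="What is 2 + 2?")
print(result.output, result.steps_taken)
```

Because `fake_model` never requests a tool, the loop returns on its first pass, which mirrors the single iteration in the diagram. A model that requests a tool first would take at least two steps.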
Next steps
- Single-agent guide → — Add tools, memory, and safety to build a research agent
- Memory guide → — Persist knowledge across sessions
- CLI reference → — All grampus commands and flags