Model Providers¶
Grampus supports multiple LLM providers through a unified ModelClient interface — switch providers by changing one line of configuration without touching the rest of your agent code.
Supported providers¶
| Provider | Class | Extra | Example models |
|---|---|---|---|
| Anthropic | AnthropicClient |
grampus-ai[anthropic] |
claude-opus-4-7, claude-sonnet-4-6, claude-haiku-4-5 |
| OpenAI | OpenAIClient |
grampus-ai[openai] |
gpt-4o, gpt-4o-mini, o1, o3 |
| Google Gemini | GeminiClient |
grampus-ai[gemini] |
gemini-2.0-flash, gemini-1.5-pro |
| Cohere | CohereClient |
grampus-ai[cohere] |
command-a-03-2025, command-r-plus-08-2024, command-r-08-2024, command-r7b-12-2024 |
| Ollama (local) | OllamaClient |
grampus-ai[ollama] |
llama3.2, mistral, qwen2.5, phi4, deepseek-r1, any pulled model |
Using each provider¶
Anthropic¶
import os
from grampus.core.models.anthropic import AnthropicClient
client = AnthropicClient(api_key=os.environ["GRAMPUS_MODEL__ANTHROPIC_API_KEY"])
Environment variable: GRAMPUS_MODEL__ANTHROPIC_API_KEY
OpenAI¶
import os
from grampus.core.models.openai import OpenAIClient
client = OpenAIClient(api_key=os.environ["GRAMPUS_MODEL__OPENAI_API_KEY"])
Environment variable: GRAMPUS_MODEL__OPENAI_API_KEY
Google Gemini¶
import os
from grampus.core.models.gemini import GeminiClient
client = GeminiClient(api_key=os.environ["GRAMPUS_MODEL__GEMINI_API_KEY"])
Environment variable: GRAMPUS_MODEL__GEMINI_API_KEY
Cohere¶
import os
from grampus.core.models.cohere import CohereClient
client = CohereClient(api_key=os.environ["GRAMPUS_MODEL__COHERE_API_KEY"])
Environment variable: GRAMPUS_MODEL__COHERE_API_KEY
Available models:
| Model | Context | Pricing (input / output per 1M tokens) | Best for |
|---|---|---|---|
command-a-03-2025 |
256K | $2.50 / $10.00 | Flagship — agentic tasks, tool use |
command-r-plus-08-2024 |
128K | $2.50 / $10.00 | RAG, long-context reasoning |
command-r-08-2024 |
128K | $0.15 / $0.60 | Balanced cost/quality |
command-r7b-12-2024 |
128K | $0.0375 / $0.15 | High-throughput, budget-constrained |
Cohere SDK version
Requires Cohere Python SDK v5.1.8+. Grampus uses the v2 client (AsyncClientV2) which accepts the same OpenAI-compatible message format, including tool calls.
Ollama (local models)¶
Ollama lets you run open-weight models locally with no API cost or data leaving your machine.
Step 1 — Install Ollama:
Step 2 — Start the Ollama server:
Step 3 — Pull a model:
Step 4 — Use with Grampus:
from grampus.core.models.ollama import OllamaClient
# Default: connects to http://localhost:11434
client = OllamaClient(host="http://localhost:11434")
Token usage with Ollama
Ollama models have zero API cost. Token usage is still tracked for context window management and working memory summarization triggers.
Using providers with AgentRunner¶
Pass a client directly to AgentRunner, or set model in AgentDefinition — the ModelRouter resolves the client from the configured providers:
import asyncio
import os
from grampus.core.models.gemini import GeminiClient
from grampus.core.types import AgentDefinition
from grampus.orchestration.runner import AgentRunner, RunnerConfig
from grampus.tools.executor import ToolExecutor
from grampus.tools.registry import ToolRegistry
async def main() -> None:
client = GeminiClient(api_key=os.environ["GRAMPUS_MODEL__GEMINI_API_KEY"])
registry = ToolRegistry()
executor = ToolExecutor(registry)
config = RunnerConfig(max_iterations=5, enable_memory=False)
runner = AgentRunner(model_client=client, tool_executor=executor, config=config)
agent_def = AgentDefinition(
name="gemini-agent",
model="gemini-2.0-flash",
system_prompt="You are a helpful assistant.",
)
result = await runner.run(agent_def, "What is the capital of Japan?")
print(result.output)
asyncio.run(main())
Switching from Gemini to Ollama is one-line:
from grampus.core.models.ollama import OllamaClient
client = OllamaClient(host="http://localhost:11434")
agent_def = AgentDefinition(
name="ollama-agent",
model="llama3.2",
system_prompt="You are a helpful assistant.",
)
Model router¶
For production deployments, the ModelRouter automatically selects the cheapest model capable of handling each step, with fallback on failure. Models are grouped into tiers:
| Tier | Example models | Use case |
|---|---|---|
fast |
claude-haiku-4-5, gemini-2.0-flash, command-r7b-12-2024, llama3.2 |
Simple reasoning, tool arg generation |
balanced |
claude-sonnet-4-6, gpt-4o-mini, command-r-08-2024, qwen2.5 |
Most tasks |
powerful |
claude-opus-4-7, gpt-4o, command-a-03-2025, o1 |
Complex reasoning, synthesis |
Configure routing in grampus.yaml:
model:
default_model: claude-sonnet-4-6
router:
enabled: true
fast: claude-haiku-4-5
balanced: claude-sonnet-4-6
powerful: claude-opus-4-7
See the Observability guide for tracking cost per model tier.
See also¶
- Prompt Playground → — Test prompts across multiple providers interactively
- Cost Management → — Track and alert on per-model spending
- Configuration reference → — Full
ModelConfigfield reference