Model Providers

Grampus supports multiple LLM providers through a unified ModelClient interface — switch providers by changing one line of configuration without touching the rest of your agent code.

Supported providers

Provider	Class	Install extra	Example models
Anthropic	`AnthropicClient`	`grampus-ai[anthropic]`	`claude-opus-4-7`, `claude-sonnet-4-6`, `claude-haiku-4-5`
OpenAI	`OpenAIClient`	`grampus-ai[openai]`	`gpt-4o`, `gpt-4o-mini`, `o1`, `o3`
Google Gemini	`GeminiClient`	`grampus-ai[gemini]`	`gemini-2.0-flash`, `gemini-1.5-pro`
Cohere	`CohereClient`	`grampus-ai[cohere]`	`command-a-03-2025`, `command-r-plus-08-2024`, `command-r-08-2024`, `command-r7b-12-2024`
Ollama (local)	`OllamaClient`	`grampus-ai[ollama]`	`llama3.2`, `mistral`, `qwen2.5`, `phi4`, `deepseek-r1`, any pulled model

Using each provider

Anthropic

pip install "grampus-ai[anthropic]"

import os

from grampus.core.models.anthropic import AnthropicClient

client = AnthropicClient(api_key=os.environ["GRAMPUS_MODEL__ANTHROPIC_API_KEY"])

Environment variable: GRAMPUS_MODEL__ANTHROPIC_API_KEY
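
Indexing os.environ directly raises a bare KeyError when the variable is unset. The sketch below is a hypothetical helper, not part of Grampus: require_api_key is an invented name, and it assumes only the GRAMPUS_MODEL__<PROVIDER>_API_KEY naming pattern that this page uses for the hosted providers.

```python
import os


def require_api_key(provider: str) -> str:
    """Return the API key for `provider`, or fail with a readable message.

    Hypothetical helper: builds the variable name from the
    GRAMPUS_MODEL__<PROVIDER>_API_KEY pattern documented on this page.
    """
    var_name = f"GRAMPUS_MODEL__{provider.upper()}_API_KEY"
    key = os.environ.get(var_name, "").strip()
    if not key:
        raise RuntimeError(f"Set {var_name} before creating the {provider} client")
    return key
```

With this in place, the client could be built as AnthropicClient(api_key=require_api_key("anthropic")), and the same call shape would apply to the other hosted providers below.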

OpenAI

pip install "grampus-ai[openai]"

import os

from grampus.core.models.openai import OpenAIClient

client = OpenAIClient(api_key=os.environ["GRAMPUS_MODEL__OPENAI_API_KEY"])

Environment variable: GRAMPUS_MODEL__OPENAI_API_KEY

Google Gemini

pip install "grampus-ai[gemini]"

import os

from grampus.core.models.gemini import GeminiClient

client = GeminiClient(api_key=os.environ["GRAMPUS_MODEL__GEMINI_API_KEY"])

Environment variable: GRAMPUS_MODEL__GEMINI_API_KEY

Cohere

pip install "grampus-ai[cohere]"

import os

from grampus.core.models.cohere import CohereClient

client = CohereClient(api_key=os.environ["GRAMPUS_MODEL__COHERE_API_KEY"])

Environment variable: GRAMPUS_MODEL__COHERE_API_KEY

Available models:

Model	Context window (tokens)	Price per 1M tokens in USD (input / output)	Best for
`command-a-03-2025`	256K	$2.50 / $10.00	Flagship — agentic tasks, tool use
`command-r-plus-08-2024`	128K	$2.50 / $10.00	RAG, long-context reasoning
`command-r-08-2024`	128K	$0.15 / $0.60	Balanced cost/quality
`command-r7b-12-2024`	128K	$0.0375 / $0.15	High-throughput, budget-constrained
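
To compare the models on cost, the table's prices can be turned into a per-request estimate. This is a minimal sketch: the prices are copied from the table above, while estimate_cost and the example token counts are made up for illustration and are not part of Grampus.

```python
# (input, output) prices in USD per 1M tokens, copied from the table above.
COHERE_PRICES = {
    "command-a-03-2025": (2.50, 10.00),
    "command-r-plus-08-2024": (2.50, 10.00),
    "command-r-08-2024": (0.15, 0.60),
    "command-r7b-12-2024": (0.0375, 0.15),
}


def estimate_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """Estimate the USD cost of one request from its token counts."""
    input_price, output_price = COHERE_PRICES[model]
    return (input_tokens * input_price + output_tokens * output_price) / 1_000_000


# Illustrative workload: 20K input tokens and 1K output tokens per request.
for name in COHERE_PRICES:
    print(f"{name}: ${estimate_cost(name, 20_000, 1_000):.6f}")
```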

Cohere SDK version

Requires Cohere Python SDK v5.1.8 or later. Grampus uses the SDK's v2 client (AsyncClientV2), which accepts OpenAI-compatible messages, including tool calls.
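
A startup check can catch an outdated SDK before the first request fails. The sketch below uses only the standard library and assumes the SDK is installed under the distribution name cohere. parse_version and cohere_sdk_ok are hypothetical helpers, and the parser ignores pre-release suffixes such as rc1.

```python
from importlib.metadata import PackageNotFoundError, version

MIN_COHERE_SDK = (5, 1, 8)  # minimum stated in the note above


def parse_version(text: str) -> tuple[int, ...]:
    """Turn '5.13.0' into (5, 13, 0), ignoring any non-numeric suffix."""
    parts = []
    for piece in text.split("."):
        digits = ""
        for ch in piece:
            if not ch.isdigit():
                break
            digits += ch
        if not digits:
            break
        parts.append(int(digits))
    return tuple(parts)


def cohere_sdk_ok() -> bool:
    """Return True if an installed 'cohere' package meets the minimum."""
    try:
        installed = version("cohere")
    except PackageNotFoundError:
        return False
    return parse_version(installed) >= MIN_COHERE_SDK


print("Cohere SDK OK:", cohere_sdk_ok())
```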

Ollama (local models)

Ollama lets you run open-weight models locally, with no API cost and no data leaving your machine.

pip install "grampus-ai[ollama]"

Step 1 — Install Ollama:

macOS:

brew install ollama

Linux:

curl -fsSL https://ollama.com/install.sh | sh

Step 2 — Start the Ollama server:

ollama serve

Step 3 — Pull a model:

ollama pull llama3.2
# or
ollama pull mistral
# or
ollama pull qwen2.5

Step 4 — Use with Grampus:

from grampus.core.models.ollama import OllamaClient

# Default: connects to http://localhost:11434
client = OllamaClient(host="http://localhost:11434")
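
If the client cannot connect, it can help to check the server independently of Grampus. The sketch below queries Ollama's /api/tags endpoint, which lists locally pulled models. list_ollama_models is a hypothetical helper name, and the 3-second timeout is an arbitrary choice.

```python
import json
import urllib.error
import urllib.request
from typing import Optional


def list_ollama_models(host: str = "http://localhost:11434") -> Optional[list[str]]:
    """Return the names of locally pulled models, or None if unreachable."""
    try:
        with urllib.request.urlopen(f"{host}/api/tags", timeout=3) as response:
            payload = json.load(response)
    except (urllib.error.URLError, OSError, ValueError):
        # Connection refused, timeout, or a non-JSON response.
        return None
    return [model["name"] for model in payload.get("models", [])]


models = list_ollama_models()
if models is None:
    print("Ollama is not reachable; is `ollama serve` running?")
else:
    print("Pulled models:", models)
```

The returned names typically include a tag, for example llama3.2:latest, so compare on the part before the colon when checking for a specific model.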

Token usage with Ollama

Ollama models have zero API cost. Token usage is still tracked for context window management and working memory summarization triggers.
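
How Grampus implements this tracking is not shown on this page. As a rough illustration of a summarization trigger, the sketch below keeps a running token count and signals once usage crosses a fraction of the context window. TokenBudget, its field names, the 8,000-token window, and the 80% threshold are all assumptions chosen for the example.

```python
from dataclasses import dataclass


@dataclass
class TokenBudget:
    """Hypothetical tracker; names and the default threshold are assumptions."""

    context_window: int        # model's context size in tokens
    summarize_at: float = 0.8  # fraction of the window that triggers a summary
    used: int = 0              # running total of tokens consumed so far

    def record(self, prompt_tokens: int, completion_tokens: int) -> None:
        """Add one model call's token counts to the running total."""
        self.used += prompt_tokens + completion_tokens

    def should_summarize(self) -> bool:
        """True once usage reaches the configured fraction of the window."""
        return self.used >= self.context_window * self.summarize_at


budget = TokenBudget(context_window=8_000)
budget.record(prompt_tokens=5_000, completion_tokens=900)
print(budget.should_summarize())  # 5,900 used, below the 6,400 threshold
budget.record(prompt_tokens=400, completion_tokens=200)
print(budget.should_summarize())  # 6,500 used, at or above the threshold
```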

Using providers with AgentRunner

Pass a client directly to AgentRunner, as in the example below, or set only model in AgentDefinition and let the ModelRouter resolve the client from the configured providers:

import asyncio
import os

from grampus.core.models.gemini import GeminiClient
from grampus.core.types import AgentDefinition
from grampus.orchestration.runner import AgentRunner, RunnerConfig
from grampus.tools.executor import ToolExecutor
from grampus.tools.registry import ToolRegistry


async def main() -> None:
    client = GeminiClient(api_key=os.environ["GRAMPUS_MODEL__GEMINI_API_KEY"])
    registry = ToolRegistry()
    executor = ToolExecutor(registry)
    config = RunnerConfig(max_iterations=5, enable_memory=False)

    runner = AgentRunner(model_client=client, tool_executor=executor, config=config)
    agent_def = AgentDefinition(
        name="gemini-agent",
        model="gemini-2.0-flash",
        system_prompt="You are a helpful assistant.",
    )

    result = await runner.run(agent_def, "What is the capital of Japan?")
    print(result.output)


asyncio.run(main())

Switching from Gemini to Ollama means swapping the client and the model name; the rest of the agent code stays the same:

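from grampus.core.models.ollama import OllamaClient
from grampus.core.types import AgentDefinition

# Change 1: construct the Ollama client instead of the Gemini client.
client = OllamaClient(host="http://localhost:11434")

# Change 2: point the agent definition at a locally pulled model.
agent_def = AgentDefinition(
    name="ollama-agent",
    model="llama3.2",
    system_prompt="You are a helpful assistant.",
)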
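
The swap is small because the agent code depends on a shared interface rather than on a concrete provider. The following standalone sketch illustrates that idea with typing.Protocol. ChatClient, complete, and the two fake clients are invented for illustration; the real ModelClient methods are not documented on this page.

```python
import asyncio
from typing import Protocol


class ChatClient(Protocol):
    """Hypothetical stand-in for a unified client interface."""

    async def complete(self, model: str, prompt: str) -> str: ...


class FakeGeminiClient:
    """Fake provider: echoes the prompt instead of calling an API."""

    async def complete(self, model: str, prompt: str) -> str:
        return f"[gemini:{model}] {prompt}"


class FakeOllamaClient:
    """Second fake provider with the same method signature."""

    async def complete(self, model: str, prompt: str) -> str:
        return f"[ollama:{model}] {prompt}"


async def ask(client: ChatClient, model: str, question: str) -> str:
    """Agent-side code: depends only on the shared interface."""
    return await client.complete(model, question)


# Only the client object and the model name differ between the two calls.
print(asyncio.run(ask(FakeGeminiClient(), "gemini-2.0-flash", "Hi")))
print(asyncio.run(ask(FakeOllamaClient(), "llama3.2", "Hi")))
```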

Model router

For production deployments, the ModelRouter automatically selects the cheapest model capable of handling each step, with fallback on failure. Models are grouped into tiers:

Tier	Example models	Use case
`fast`	`claude-haiku-4-5`, `gemini-2.0-flash`, `command-r7b-12-2024`, `llama3.2`	Simple reasoning, tool arg generation
`balanced`	`claude-sonnet-4-6`, `gpt-4o-mini`, `command-r-08-2024`, `qwen2.5`	Most tasks
`powerful`	`claude-opus-4-7`, `gpt-4o`, `command-a-03-2025`, `o1`	Complex reasoning, synthesis
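
The selection logic can be pictured as trying the cheapest adequate tier first and moving up on failure. This is only a sketch of that idea, not the ModelRouter implementation: route and call_model are hypothetical, the tier-to-model mapping reuses the Anthropic models from the table, and escalating to the next tier is one assumed fallback policy.

```python
from typing import Callable, Optional

# Tiers ordered from cheapest to most capable, using models from the table above.
TIERS = ["fast", "balanced", "powerful"]
TIER_MODELS = {
    "fast": "claude-haiku-4-5",
    "balanced": "claude-sonnet-4-6",
    "powerful": "claude-opus-4-7",
}


def route(min_tier: str, call_model: Callable[[str], str]) -> tuple[str, str]:
    """Try the cheapest adequate tier first, then escalate on failure.

    Returns (model_name, result). `call_model` is a hypothetical callable
    that takes a model name and raises an exception when the call fails.
    """
    last_error: Optional[Exception] = None
    for tier in TIERS[TIERS.index(min_tier):]:
        model = TIER_MODELS[tier]
        try:
            return model, call_model(model)
        except Exception as exc:  # sketch only; real code would catch narrower errors
            last_error = exc
    raise RuntimeError("all tiers failed") from last_error
```

For example, calling route("fast", call_model) with a callable that fails for claude-haiku-4-5 would return the result from claude-sonnet-4-6 instead.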

Configure routing in grampus.yaml:

model:
  default_model: claude-sonnet-4-6
  router:
    enabled: true
    fast: claude-haiku-4-5
    balanced: claude-sonnet-4-6
    powerful: claude-opus-4-7

See the Observability guide for tracking cost per model tier.