Skip to content

Model Providers

Grampus supports multiple LLM providers through a unified ModelClient interface — switch providers by changing one line of configuration without touching the rest of your agent code.


Supported providers

Provider Class Extra Example models
Anthropic AnthropicClient grampus-ai[anthropic] claude-opus-4-7, claude-sonnet-4-6, claude-haiku-4-5
OpenAI OpenAIClient grampus-ai[openai] gpt-4o, gpt-4o-mini, o1, o3
Google Gemini GeminiClient grampus-ai[gemini] gemini-2.0-flash, gemini-1.5-pro
Cohere CohereClient grampus-ai[cohere] command-a-03-2025, command-r-plus-08-2024, command-r-08-2024, command-r7b-12-2024
Ollama (local) OllamaClient grampus-ai[ollama] llama3.2, mistral, qwen2.5, phi4, deepseek-r1, any pulled model

Using each provider

Anthropic

pip install "grampus-ai[anthropic]"
import os

from grampus.core.models.anthropic import AnthropicClient

client = AnthropicClient(api_key=os.environ["GRAMPUS_MODEL__ANTHROPIC_API_KEY"])

Environment variable: GRAMPUS_MODEL__ANTHROPIC_API_KEY


OpenAI

pip install "grampus-ai[openai]"
import os

from grampus.core.models.openai import OpenAIClient

client = OpenAIClient(api_key=os.environ["GRAMPUS_MODEL__OPENAI_API_KEY"])

Environment variable: GRAMPUS_MODEL__OPENAI_API_KEY


Google Gemini

pip install "grampus-ai[gemini]"
import os

from grampus.core.models.gemini import GeminiClient

client = GeminiClient(api_key=os.environ["GRAMPUS_MODEL__GEMINI_API_KEY"])

Environment variable: GRAMPUS_MODEL__GEMINI_API_KEY


Cohere

pip install "grampus-ai[cohere]"
import os

from grampus.core.models.cohere import CohereClient

client = CohereClient(api_key=os.environ["GRAMPUS_MODEL__COHERE_API_KEY"])

Environment variable: GRAMPUS_MODEL__COHERE_API_KEY

Available models:

Model Context Pricing (input / output per 1M tokens) Best for
command-a-03-2025 256K $2.50 / $10.00 Flagship — agentic tasks, tool use
command-r-plus-08-2024 128K $2.50 / $10.00 RAG, long-context reasoning
command-r-08-2024 128K $0.15 / $0.60 Balanced cost/quality
command-r7b-12-2024 128K $0.0375 / $0.15 High-throughput, budget-constrained

Cohere SDK version

Requires Cohere Python SDK v5.1.8+. Grampus uses the v2 client (AsyncClientV2) which accepts the same OpenAI-compatible message format, including tool calls.


Ollama (local models)

Ollama lets you run open-weight models locally with no API cost or data leaving your machine.

pip install "grampus-ai[ollama]"

Step 1 — Install Ollama:

brew install ollama
curl -fsSL https://ollama.com/install.sh | sh

Step 2 — Start the Ollama server:

ollama serve

Step 3 — Pull a model:

ollama pull llama3.2
# or
ollama pull mistral
# or
ollama pull qwen2.5

Step 4 — Use with Grampus:

from grampus.core.models.ollama import OllamaClient

# Default: connects to http://localhost:11434
client = OllamaClient(host="http://localhost:11434")

Token usage with Ollama

Ollama models have zero API cost. Token usage is still tracked for context window management and working memory summarization triggers.


Using providers with AgentRunner

Pass a client directly to AgentRunner, or set model in AgentDefinition — the ModelRouter resolves the client from the configured providers:

import asyncio
import os

from grampus.core.models.gemini import GeminiClient
from grampus.core.types import AgentDefinition
from grampus.orchestration.runner import AgentRunner, RunnerConfig
from grampus.tools.executor import ToolExecutor
from grampus.tools.registry import ToolRegistry


async def main() -> None:
    client = GeminiClient(api_key=os.environ["GRAMPUS_MODEL__GEMINI_API_KEY"])
    registry = ToolRegistry()
    executor = ToolExecutor(registry)
    config = RunnerConfig(max_iterations=5, enable_memory=False)

    runner = AgentRunner(model_client=client, tool_executor=executor, config=config)
    agent_def = AgentDefinition(
        name="gemini-agent",
        model="gemini-2.0-flash",
        system_prompt="You are a helpful assistant.",
    )

    result = await runner.run(agent_def, "What is the capital of Japan?")
    print(result.output)


asyncio.run(main())

Switching from Gemini to Ollama is one-line:

from grampus.core.models.ollama import OllamaClient

client = OllamaClient(host="http://localhost:11434")
agent_def = AgentDefinition(
    name="ollama-agent",
    model="llama3.2",
    system_prompt="You are a helpful assistant.",
)

Model router

For production deployments, the ModelRouter automatically selects the cheapest model capable of handling each step, with fallback on failure. Models are grouped into tiers:

Tier Example models Use case
fast claude-haiku-4-5, gemini-2.0-flash, command-r7b-12-2024, llama3.2 Simple reasoning, tool arg generation
balanced claude-sonnet-4-6, gpt-4o-mini, command-r-08-2024, qwen2.5 Most tasks
powerful claude-opus-4-7, gpt-4o, command-a-03-2025, o1 Complex reasoning, synthesis

Configure routing in grampus.yaml:

model:
  default_model: claude-sonnet-4-6
  router:
    enabled: true
    fast: claude-haiku-4-5
    balanced: claude-sonnet-4-6
    powerful: claude-opus-4-7

See the Observability guide for tracking cost per model tier.


See also