Skip to content

Cost Management & Alerts

Grampus tracks cost at every level — individual model calls, tool steps, sessions, and agent lifetime — and can enforce hard budgets that stop an agent before it overspends. For production deployments, cost alerts fire notifications to Slack, email, or webhooks when thresholds are crossed.


Cost tracking

CostTracker records token usage and USD cost for every LLM call. The data is available immediately on ExecutionResult:

result = await runner.run(agent_def, "Research quantum computing.", session_id="s1")

print(f"Input tokens:  {result.token_usage.input_tokens}")
print(f"Output tokens: {result.token_usage.output_tokens}")
print(f"Total tokens:  {result.token_usage.total_tokens}")
print(f"Cost:          ${result.token_usage.cost_usd:.6f}")
print(f"Model:         {result.token_usage.model}")

Cost events are also published to the Dapr pub/sub bus, where the BehaviorMonitor and alert evaluator consume them.


Budget enforcement

Set cost_budget_usd on AgentDefinition to stop the agent before it exceeds a per-run limit. When the budget is reached, AgentRunner raises BudgetExceededError before the next LLM call:

from grampus.core.errors import BudgetExceededError
from grampus.core.types import AgentDefinition

agent_def = AgentDefinition(
    name="research-bot",
    model="claude-sonnet-4-6",
    system_prompt="You are a research assistant.",
    cost_budget_usd=0.50,   # hard stop at $0.50 per run
)

try:
    result = await runner.run(agent_def, "Research every aspect of quantum computing.")
except BudgetExceededError as e:
    print(f"Budget exceeded: {e}")
    print(f"Spent so far:    ${e.details['cost_usd_so_far']:.4f}")
    print(f"Limit:           ${e.details['budget_usd']:.4f}")

Use grampus cost on the CLI to review recent spending across all agents. See the CLI reference.


Cost alerts

Cost alerts let you react to spending patterns before they become surprises. Define rules, attach notification channels, and let the AlertEvaluator watch for threshold crossings.

Defining alert rules

from grampus.observability.alerts import AlertEvaluator, AlertRule, AlertSeverity, ThresholdType
from grampus.observability.notification import (
    LogChannel,
    NotificationDispatcher,
    SlackChannel,
    SmtpChannel,
    WebhookChannel,
)

rules = [
    AlertRule(
        name="session-budget",
        threshold_type=ThresholdType.PER_SESSION_USD,
        threshold_usd=0.10,
        severity=AlertSeverity.WARNING,
        cooldown_seconds=3600,   # at most one alert per hour for this rule
    ),
    AlertRule(
        name="daily-spend-research-bot",
        agent_id="research-bot",       # None = applies to all agents
        threshold_type=ThresholdType.PER_DAY_USD,
        threshold_usd=5.00,
        severity=AlertSeverity.CRITICAL,
        cooldown_seconds=86400,
    ),
    AlertRule(
        name="per-run-spike",
        threshold_type=ThresholdType.PER_RUN_USD,
        threshold_usd=0.25,
        severity=AlertSeverity.WARNING,
    ),
]

dispatcher = NotificationDispatcher(
    channels=[
        SlackChannel(webhook_url="https://hooks.slack.com/services/..."),
        SmtpChannel(
            host="smtp.myco.com",
            port=587,
            username="alerts@myco.com",
            password="...",
            to_addrs=["ops@myco.com"],
        ),
        LogChannel(),   # always log — even if Slack/email fails
    ]
)

evaluator = AlertEvaluator(rules=rules, dispatcher=dispatcher)

Pass the evaluator to AgentRunner and it monitors every cost event automatically:

runner = AgentRunner(
    model_client=client,
    tool_executor=executor,
    config=RunnerConfig(max_iterations=10),
    alert_evaluator=evaluator,
)

Notification channels

Channel Requires Behavior
WebhookChannel url POST JSON payload; optional HMAC-SHA256 signature for verification
SlackChannel Incoming webhook URL Slack Block Kit message with severity color coding
SmtpChannel SMTP host and credentials Plain text email via stdlib smtplib — no extra dependencies
LogChannel Nothing structlog warning; always succeeds even if other channels fail

Threshold types

Type Resets Use case
PER_RUN_USD Each run Alert on single expensive runs
PER_SESSION_USD Each session Track cumulative session spend
PER_HOUR_USD Each hour (rolling) Detect rate spikes
PER_DAY_USD Midnight UTC Daily budget monitoring
PER_MONTH_USD First of month UTC Monthly billing control

Alert rules via REST API

When the Grampus server is running (grampus serve), manage alert rules over HTTP:

# Create a rule
curl -X POST http://localhost:8000/alerts/rules \
  -H "Content-Type: application/json" \
  -d '{
    "name": "session-budget",
    "threshold_type": "per_session_usd",
    "threshold_usd": 0.10,
    "severity": "warning",
    "cooldown_seconds": 3600
  }'

# List all rules
curl http://localhost:8000/alerts/rules

# Enable or disable a rule
curl -X PATCH http://localhost:8000/alerts/rules/rule_abc123 \
  -H "Content-Type: application/json" \
  -d '{"enabled": false}'

# Delete a rule
curl -X DELETE http://localhost:8000/alerts/rules/rule_abc123

# View alert history (last 500 events)
curl "http://localhost:8000/alerts/history?limit=50"

Alert management via CLI

# List all configured rules
grampus alerts list

# Add a new rule interactively
grampus alerts add \
  --name "daily-spend" \
  --threshold-usd 5.00 \
  --threshold-type per_day_usd \
  --severity critical \
  --agent-id research-bot \
  --cooldown 86400

# Enable or disable a rule
grampus alerts enable  <rule_id>
grampus alerts disable <rule_id>

# Remove a rule
grampus alerts remove <rule_id>

# Fire a test notification for a rule (verify channels work)
grampus alerts test <rule_id>

See the CLI reference for all flags.


See also