Observability API Reference

GrampusTracer

Wraps the OpenTelemetry SDK with agent-specific span types.

grampus.observability.tracer.GrampusTracer

Wraps the OpenTelemetry SDK to produce Grampus-specific agent spans.

All span methods are synchronous context managers:

with tracer.agent_run(session_id="x") as span:
    span.set_attribute("custom", "value")

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| service_name | str | OTEL service name (e.g. "grampus-agent"). | 'grampus-agent' |
| otlp_endpoint | str \| None | Optional OTLP exporter endpoint (e.g. "http://localhost:4317"). When None, uses a NoOpTracerProvider, so no network calls are made. | None |
| agent_id | str | Default agent_id attached to every span. | 'unknown' |
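
The context-manager behaviour described above can be illustrated with a small stand-in. This is only a sketch: `SketchTracer` and `SketchSpan` are hypothetical names that are not part of Grampus, the `agent.id` and `session.id` attribute keys are assumptions, and the real class delegates to the OpenTelemetry SDK rather than storing attributes in a dict.

```python
from contextlib import contextmanager
from typing import Any, Dict, Iterator


class SketchSpan:
    """Stand-in for an OTEL span: it only stores attributes in memory."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.attributes: Dict[str, Any] = {}
        self.ended = False

    def set_attribute(self, key: str, value: Any) -> None:
        self.attributes[key] = value


class SketchTracer:
    """Hypothetical stand-in, NOT the GrampusTracer implementation."""

    def __init__(self, service_name: str = "grampus-agent", agent_id: str = "unknown") -> None:
        self.service_name = service_name
        self.agent_id = agent_id

    @contextmanager
    def agent_run(self, *, session_id: str) -> Iterator[SketchSpan]:
        span = SketchSpan("agent.run")
        # Attribute keys below are assumptions made for this sketch.
        span.set_attribute("agent.id", self.agent_id)
        span.set_attribute("session.id", session_id)
        try:
            yield span
        finally:
            # The span is closed even if the body raises.
            span.ended = True


tracer = SketchTracer(agent_id="agent-7")
with tracer.agent_run(session_id="x") as span:
    span.set_attribute("custom", "value")
```

The `try`/`finally` around `yield` is what guarantees the span is ended when the `with` block exits, including on an exception.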

llm_call(*, model, input_tokens=0, output_tokens=0, cost_usd=0.0, **extra_attrs)

Span for one LLM completion call.

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| model | str | Model identifier string. | required |
| input_tokens | int | Number of prompt tokens consumed. | 0 |
| output_tokens | int | Number of completion tokens produced. | 0 |
| cost_usd | float | Estimated cost in USD. | 0.0 |
| **extra_attrs | Any | Additional span attributes. | {} |

Yields:

| Type | Description |
| --- | --- |
| Span | The active OTEL Span. |

tool_call(*, tool_name, success=True, duration_ms=0.0, **extra_attrs)

Span for one tool execution.

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| tool_name | str | Registered name of the tool. | required |
| success | bool | Whether the tool call succeeded. | True |
| duration_ms | float | Wall-clock execution time in milliseconds. | 0.0 |
| **extra_attrs | Any | Additional span attributes. | {} |

Yields:

| Type | Description |
| --- | --- |
| Span | The active OTEL Span. |

memory_read(*, memory_type, records_returned=0)

Span for a memory recall operation.

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| memory_type | str | One of "working", "episodic", "semantic", "procedural". | required |
| records_returned | int | Number of records surfaced by the query. | 0 |

Yields:

| Type | Description |
| --- | --- |
| Span | The active OTEL Span. |

memory_write(*, memory_type, content_length=0)

Span for a memory store operation.

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| memory_type | str | Memory layer being written. | required |
| content_length | int | Byte length of the content stored. | 0 |

Yields:

| Type | Description |
| --- | --- |
| Span | The active OTEL Span. |

record_llm_call(span, *, model, input_tokens=0, output_tokens=0, cost_usd=0.0, latency_ms=0.0, **extra_attrs)

Record LLM call attributes on an existing span.

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| span | Span | The span to annotate. | required |
| model | str | Model identifier string. | required |
| input_tokens | int | Number of prompt tokens consumed. | 0 |
| output_tokens | int | Number of completion tokens produced. | 0 |
| cost_usd | float | Estimated cost in USD. | 0.0 |
| latency_ms | float | Call latency in milliseconds. | 0.0 |
| **extra_attrs | Any | Additional span attributes. | {} |
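
Because `record_llm_call` annotates a span after the fact, the caller has to measure latency and estimate cost itself. The sketch below shows one way to gather those values. `timed_call`, `estimate_cost_usd`, `fake_completion`, the model name, and the per-1k-token prices are all hypothetical and chosen only for illustration.

```python
import time
from typing import Any, Callable, Dict, Tuple


def timed_call(fn: Callable[[], Any]) -> Tuple[Any, float]:
    """Run fn() and return (result, elapsed wall-clock milliseconds)."""
    start = time.perf_counter()
    result = fn()
    latency_ms = (time.perf_counter() - start) * 1000.0
    return result, latency_ms


def estimate_cost_usd(
    input_tokens: int,
    output_tokens: int,
    *,
    usd_per_1k_input: float,
    usd_per_1k_output: float,
) -> float:
    """Linear cost estimate from token counts and per-1k-token prices."""
    return (input_tokens / 1000.0) * usd_per_1k_input + (output_tokens / 1000.0) * usd_per_1k_output


def fake_completion() -> Dict[str, int]:
    """Placeholder for a real LLM client call; returns token usage only."""
    return {"input_tokens": 1000, "output_tokens": 500}


usage, latency_ms = timed_call(fake_completion)
llm_attrs = {
    "model": "example-model",  # hypothetical model identifier
    "input_tokens": usage["input_tokens"],
    "output_tokens": usage["output_tokens"],
    "cost_usd": estimate_cost_usd(
        usage["input_tokens"],
        usage["output_tokens"],
        usd_per_1k_input=0.003,   # illustrative price, not a real rate
        usd_per_1k_output=0.015,  # illustrative price, not a real rate
    ),
    "latency_ms": latency_ms,
}
# With a real tracer and span, these would be passed as:
#   tracer.record_llm_call(span, **llm_attrs)
```

`time.perf_counter()` is used rather than `time.time()` because it is monotonic and intended for measuring short durations.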

Span context manager

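from grampus.observability.tracer import GrampusTracer

tracer = GrampusTracer(service_name="my-agent", otlp_endpoint="http://localhost:4317")

# Sync: do_work() stands in for application code
with tracer.span("agent.custom_step", attributes={"step.name": "validate"}):
    do_work()

# Async: llm and messages stand in for an LLM client and its input
async with tracer.async_span("agent.llm_call", attributes={"model": "claude-sonnet-4-6"}):
    response = await llm.complete(messages)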

Span types and attributes

| Span type | Key attributes |
| --- | --- |
| agent.run | agent.name, agent.model, session.id, agent.status |
| agent.llm_call | model, input_tokens, output_tokens, cost_usd, stop_reason |
| agent.tool_call | tool.name, tool.duration_ms, tool.success, tool.call_id |
| agent.memory_read | memory.type, memory.query, memory.results_count |
| agent.memory_write | memory.type, memory.source_type, memory.trust_level |
| agent.decision | agent.step, decision.action |

GrampusMetrics

Prometheus-compatible metrics endpoint.

grampus.observability.metrics.GrampusMetrics

In-process metrics collector with Prometheus-compatible text exposition.

Does NOT require a running Prometheus server — stores everything in memory and exports to Prometheus text format on demand.

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| agent_id | str | Scopes per-agent metrics. | required |

record_llm_call(*, model, input_tokens, output_tokens, cost_usd, latency_ms)

Increment token/cost/call counters. Record latency in histogram.

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| model | str | Model identifier. | required |
| input_tokens | int | Prompt token count. | required |
| output_tokens | int | Completion token count. | required |
| cost_usd | float | Estimated USD cost. | required |
| latency_ms | float | Round-trip latency in milliseconds. | required |

record_tool_call(*, tool_name, success, latency_ms)

Increment tool call counter. Record latency in histogram.

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| tool_name | str | Name of the invoked tool. | required |
| success | bool | Whether execution succeeded. | required |
| latency_ms | float | Execution time in milliseconds. | required |

record_error(*, error_type)

Increment error counter.

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| error_type | str | Short class name of the error. | required |

set_active_agents(count)

Update active agent gauge.

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| count | int | Current number of concurrently running agents. | required |

to_prometheus_text()

Export metrics in Prometheus text exposition format.

Returns:

| Type | Description |
| --- | --- |
| str | Multiline string with # HELP, # TYPE, and metric lines. |
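
For readers unfamiliar with the Prometheus text exposition format, the sketch below renders one metric family with `# HELP`, `# TYPE`, and sample lines. It is not the `GrampusMetrics` implementation: `render_metric` and `format_labels` are hypothetical helpers, and the sample values are made up.

```python
from typing import Dict, Sequence, Tuple


def format_labels(labels: Dict[str, str]) -> str:
    """Render a label set as {k="v",...}, escaping \\, \" and newlines in values."""
    if not labels:
        return ""

    def escape(value: str) -> str:
        return value.replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n")

    inner = ",".join(f'{key}="{escape(val)}"' for key, val in sorted(labels.items()))
    return "{" + inner + "}"


def render_metric(
    name: str,
    metric_type: str,
    help_text: str,
    samples: Sequence[Tuple[Dict[str, str], float]],
) -> str:
    """Render one metric family: a HELP line, a TYPE line, then one line per sample."""
    lines = [f"# HELP {name} {help_text}", f"# TYPE {name} {metric_type}"]
    for labels, value in samples:
        lines.append(f"{name}{format_labels(labels)} {value}")
    return "\n".join(lines) + "\n"


text = render_metric(
    "grampus_total_tokens",
    "counter",
    "Tokens consumed",
    [({"model": "example-model", "agent_name": "demo"}, 1500)],
)
print(text, end="")
```

Labels are sorted here only to make the output deterministic; the format itself does not require any particular label order.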

snapshot()

Return current accumulated metrics. Pure computation, no I/O.

Counter metrics

| Metric name | Labels | Description |
| --- | --- | --- |
| grampus_total_tokens | model, agent_name | Tokens consumed |
| grampus_total_cost_usd | model, agent_name | USD spent |
| grampus_total_tool_calls | tool_name, agent_name | Tool executions |
| grampus_total_errors | error_code, agent_name | Errors by type |
| grampus_llm_call_count | model, agent_name | Total LLM calls made |

Gauge metrics

| Metric name | Labels | Description |
| --- | --- | --- |
| grampus_active_agents | agent_name | Currently running agents |

Histogram metrics

| Metric name | Labels | Description |
| --- | --- | --- |
| grampus_llm_latency_ms | model, agent_name | LLM call latency in milliseconds |
| grampus_tool_latency_ms | tool_name, agent_name | Tool execution latency in milliseconds |
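
Prometheus histograms expose cumulative bucket counts (each bucket counts observations less than or equal to its upper bound), plus a running sum and count. The sketch below shows that bookkeeping for latency values. `LatencyHistogram` is a hypothetical class and the bucket bounds are assumptions; this reference does not state which bounds Grampus uses.

```python
from bisect import bisect_left
from typing import List, Sequence, Tuple


class LatencyHistogram:
    """Hypothetical in-memory histogram with Prometheus-style cumulative buckets."""

    def __init__(self, bounds_ms: Sequence[float]) -> None:
        self.bounds: List[float] = sorted(bounds_ms)
        # One slot per finite bound plus a final slot for the +Inf bucket.
        self.counts: List[int] = [0] * (len(self.bounds) + 1)
        self.total = 0.0
        self.count = 0

    def observe(self, value_ms: float) -> None:
        # bisect_left finds the first bound >= value, i.e. the smallest
        # bucket whose upper bound ("le") still includes the observation.
        self.counts[bisect_left(self.bounds, value_ms)] += 1
        self.total += value_ms
        self.count += 1

    def cumulative(self) -> List[Tuple[float, int]]:
        """Return (upper_bound, cumulative_count) pairs, ending with +Inf."""
        running = 0
        result = []
        for bound, bucket_count in zip(self.bounds + [float("inf")], self.counts):
            running += bucket_count
            result.append((bound, running))
        return result


hist = LatencyHistogram(bounds_ms=[100.0, 500.0, 1000.0])  # illustrative bounds
for latency in (50.0, 100.0, 250.0, 2000.0):
    hist.observe(latency)
buckets = hist.cumulative()
```

In the text format, each pair would become a `_bucket{le="..."}` line, followed by `_sum` and `_count` lines for the same metric.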

EventLog

Append-only audit log for every agent action.

grampus.observability.events.EventLog

Append-only, replayable log of agent events.

Backed by a Dapr state store when one is configured; falls back to an in-memory list when state_store is None (useful for testing).

Events are immutable once written. No update or delete operations.

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| agent_id | str | Scopes the log to this agent. | required |
| session_id | str | Current session. | required |
| state_store | Any \| None | Optional Dapr state store for persistence. | None |

append(event_type, payload=None) async

Create and store an AgentEvent. Returns the stored event.

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| event_type | EventType | The type of agent action being recorded. | required |
| payload | dict[str, Any] \| None | Arbitrary metadata about the event. | None |

Returns:

| Type | Description |
| --- | --- |
| AgentEvent | The persisted AgentEvent with an auto-assigned sequence number. |

replay() async

Return all events for this agent/session in sequence order.

Returns:

| Type | Description |
| --- | --- |
| list[AgentEvent] | Ordered list of AgentEvent records from sequence 0 onward. |

replay_since(sequence_number) async

Return events with sequence_number >= the given value.

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| sequence_number | int | Inclusive lower bound on sequence number. | required |

Returns:

| Type | Description |
| --- | --- |
| list[AgentEvent] | Filtered, ordered list of AgentEvent records. |

event_count()

Return the number of events appended in this instance's lifetime.
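
The append-only semantics and the inclusive `replay_since` bound can be illustrated with a small in-memory stand-in, roughly what the fallback path might look like. `InMemoryEventLog` and `SketchEvent` are hypothetical names, the event is reduced to three fields, and Dapr persistence is left out entirely.

```python
import asyncio
from dataclasses import dataclass
from typing import Any, Dict, List, Optional


@dataclass(frozen=True)
class SketchEvent:
    """Simplified, immutable event record used only in this sketch."""
    sequence_number: int
    event_type: str
    payload: Dict[str, Any]


class InMemoryEventLog:
    """Hypothetical stand-in for EventLog with state_store=None."""

    def __init__(self) -> None:
        self._events: List[SketchEvent] = []

    async def append(self, event_type: str, payload: Optional[Dict[str, Any]] = None) -> SketchEvent:
        # Sequence numbers start at 0 and grow by one per appended event.
        event = SketchEvent(len(self._events), event_type, dict(payload or {}))
        self._events.append(event)
        return event

    async def replay(self) -> List[SketchEvent]:
        # Return a copy so callers cannot mutate the log's internal list.
        return list(self._events)

    async def replay_since(self, sequence_number: int) -> List[SketchEvent]:
        # Inclusive lower bound, matching the documented >= semantics.
        return [e for e in self._events if e.sequence_number >= sequence_number]

    def event_count(self) -> int:
        return len(self._events)


async def demo():
    log = InMemoryEventLog()
    await log.append("agent.started", {"agent_name": "demo"})
    await log.append("tool.called", {"tool_name": "search"})
    await log.append("agent.completed")
    return log, await log.replay_since(1)


log, tail = asyncio.run(demo())
```

There are deliberately no update or delete methods, which mirrors the immutability guarantee stated above.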

AgentEvent

from dataclasses import dataclass
from datetime import datetime
from typing import Any

@dataclass
class AgentEvent:
    event_id: str
    session_id: str
    agent_name: str
    event_type: str          # see event types table below
    summary: str             # human-readable one-line description
    payload: dict[str, Any]  # full event data
    timestamp: datetime
    step: int                # ReAct iteration number
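
As an illustration of how a record with these fields could be exported, the sketch below redefines the dataclass locally and serializes one event to a JSON line. `to_json_line` is a hypothetical helper, not a Grampus API, and all field values are made up.

```python
import json
from dataclasses import asdict, dataclass
from datetime import datetime, timezone
from typing import Any, Dict


@dataclass
class AgentEvent:
    # Local copy of the fields listed above, so the sketch is self-contained.
    event_id: str
    session_id: str
    agent_name: str
    event_type: str
    summary: str
    payload: Dict[str, Any]
    timestamp: datetime
    step: int


def to_json_line(event: AgentEvent) -> str:
    """Serialize an event to a single JSON line (hypothetical helper)."""
    record = asdict(event)
    # datetime is not JSON-serializable, so store it as an ISO 8601 string.
    record["timestamp"] = event.timestamp.isoformat()
    return json.dumps(record, sort_keys=True)


event = AgentEvent(
    event_id="evt-0001",
    session_id="session-1",
    agent_name="demo-agent",
    event_type="tool.completed",
    summary="search finished in 12.5 ms",
    payload={"duration_ms": 12.5, "output_preview": "3 results"},
    timestamp=datetime(2024, 1, 1, 12, 0, tzinfo=timezone.utc),
    step=2,
)
line = to_json_line(event)
```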

Event types

| Event type | Payload keys |
| --- | --- |
| agent.started | agent_name, model, input |
| agent.completed | steps_taken, cost_usd, output_preview |
| agent.failed | error_code, error_message |
| llm.called | model, message_count, input_tokens |
| llm.responded | output_tokens, cost_usd, stop_reason |
| tool.called | tool_name, arguments |
| tool.completed | duration_ms, output_preview |
| tool.failed | error_code, error_message |
| memory.read | query, types, results_count |
| memory.written | memory_type, source_type, trust_level |
| safety.violation | violation_type, severity, blocked |
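
The table above can be turned into a simple payload check, for example in tests that assert events carry the expected keys. This is a sketch under assumptions: `EXPECTED_PAYLOAD_KEYS` and `missing_payload_keys` are hypothetical, and this reference does not say whether Grampus itself validates payloads.

```python
from typing import Any, Dict, FrozenSet, Set

# Transcribed from the event types table; not a Grampus constant.
EXPECTED_PAYLOAD_KEYS: Dict[str, FrozenSet[str]] = {
    "agent.started": frozenset({"agent_name", "model", "input"}),
    "agent.completed": frozenset({"steps_taken", "cost_usd", "output_preview"}),
    "agent.failed": frozenset({"error_code", "error_message"}),
    "llm.called": frozenset({"model", "message_count", "input_tokens"}),
    "llm.responded": frozenset({"output_tokens", "cost_usd", "stop_reason"}),
    "tool.called": frozenset({"tool_name", "arguments"}),
    "tool.completed": frozenset({"duration_ms", "output_preview"}),
    "tool.failed": frozenset({"error_code", "error_message"}),
    "memory.read": frozenset({"query", "types", "results_count"}),
    "memory.written": frozenset({"memory_type", "source_type", "trust_level"}),
    "safety.violation": frozenset({"violation_type", "severity", "blocked"}),
}


def missing_payload_keys(event_type: str, payload: Dict[str, Any]) -> Set[str]:
    """Return the documented keys that are absent from payload."""
    try:
        expected = EXPECTED_PAYLOAD_KEYS[event_type]
    except KeyError:
        raise ValueError(f"unknown event type: {event_type!r}") from None
    # Iterating a dict yields its keys, so difference() compares key sets.
    return set(expected.difference(payload))


missing = missing_payload_keys("tool.completed", {"duration_ms": 12.5})
```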

BehaviorMonitor

Tracks agent behavior patterns and detects anomalies.

grampus.observability.behavior.BehaviorMonitor

Tracks per-agent behavioral patterns and detects anomalies.

Maintains a rolling window of turn-level observations. After each turn is recorded, checks for anomalies against the baseline.

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| agent_id | str | Agent being monitored. | required |
| cost_spike_threshold | float | Multiplier above avg_cost triggering COST_SPIKE. | 3.0 |
| error_spike_threshold | float | Multiplier above avg_errors for ERROR_RATE_SPIKE. | 5.0 |
| tool_shift_threshold | float | Fraction of new tools triggering TOOL_USAGE_SHIFT. | 0.5 |

record_turn(*, cost_usd, tool_names, error_count)

Record one agent turn and return any anomalies detected.

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| cost_usd | float | Total cost incurred this turn. | required |
| tool_names | list[str] | Tools invoked during this turn. | required |
| error_count | int | Number of errors encountered. | required |

Returns:

| Type | Description |
| --- | --- |
| list[Anomaly] | List of new Anomaly objects (empty if baseline not yet established). |
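
The rolling-window check can be sketched for the cost case alone. This is a simplified illustration, not the `BehaviorMonitor` implementation: `CostSpikeDetector` is hypothetical, and the window size and the number of turns needed to establish a baseline are assumptions, since this reference does not state them. Only the 3.0 multiplier comes from the default documented above.

```python
from collections import deque
from statistics import fmean
from typing import Deque


class CostSpikeDetector:
    """Hypothetical, cost-only simplification of a rolling-window monitor."""

    def __init__(
        self,
        window: int = 20,             # assumed window size
        min_turns: int = 5,           # assumed turns needed for a baseline
        cost_spike_threshold: float = 3.0,
    ) -> None:
        self._costs: Deque[float] = deque(maxlen=window)
        self.min_turns = min_turns
        self.cost_spike_threshold = cost_spike_threshold

    def record_turn(self, cost_usd: float) -> bool:
        """Record one turn; return True if it is a spike versus the baseline."""
        spike = False
        if len(self._costs) >= self.min_turns:
            baseline = fmean(self._costs)
            # Compare against the baseline BEFORE the new turn is added to it.
            spike = baseline > 0 and cost_usd > self.cost_spike_threshold * baseline
        self._costs.append(cost_usd)
        return spike


detector = CostSpikeDetector()
flags = [detector.record_turn(cost) for cost in [0.01] * 5 + [0.02, 0.05]]
```

In this run the first five turns only build the baseline, 0.02 stays under three times the average, and 0.05 exceeds it.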

anomalies()

Return the most recent anomalies (up to 1,000) detected so far.

profile()

Return the current behavioral profile (snapshot).

Returns:

| Type | Description |
| --- | --- |
| AgentBehaviorProfile | AgentBehaviorProfile with rolling-window statistics. |

BehaviorAnomaly

from dataclasses import dataclass

@dataclass
class BehaviorAnomaly:
    pattern: str              # "tool_usage_shift" | "cost_spike" | etc.
    severity: str             # "warning" | "critical"
    description: str          # human-readable explanation
    current_value: float      # observed metric value
    baseline_value: float     # expected (rolling average) value
    ratio: float              # current / baseline
Monitored anomaly patterns

| Pattern | Trigger condition |
| --- | --- |
| tool_usage_shift | Tool X called > 2.5× or < 0.4× baseline frequency |
| cost_spike | Cost per run > 2.5× rolling average |
| memory_access_anomaly | Memory reads from unusual source types |
| error_rate_spike | Error rate > 2.5× baseline |
| latency_spike | P95 run duration > 2.5× baseline |
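
The ratio-based rows of this table share one shape: divide the current value by the baseline and compare the result with a bound. A minimal sketch of that check follows. `check_ratio` is a hypothetical helper, the dataclass is redefined locally, and the fixed "warning" severity is an assumption because this reference does not say how severity is chosen.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class BehaviorAnomaly:
    # Local copy of the documented fields, so the sketch is self-contained.
    pattern: str
    severity: str
    description: str
    current_value: float
    baseline_value: float
    ratio: float


def check_ratio(
    pattern: str,
    current: float,
    baseline: float,
    *,
    upper: float = 2.5,
    lower: Optional[float] = None,
) -> Optional[BehaviorAnomaly]:
    """Return an anomaly if current/baseline leaves the [lower, upper] band."""
    if baseline <= 0:
        return None  # no usable baseline yet
    ratio = current / baseline
    too_high = ratio > upper
    too_low = lower is not None and ratio < lower
    if not (too_high or too_low):
        return None
    return BehaviorAnomaly(
        pattern=pattern,
        severity="warning",  # assumption: severity rules are not documented here
        description=f"{pattern}: {current:g} vs baseline {baseline:g}",
        current_value=current,
        baseline_value=baseline,
        ratio=ratio,
    )


cost = check_ratio("cost_spike", 0.30, 0.10)
tool = check_ratio("tool_usage_shift", 1.0, 4.0, lower=0.4)
normal = check_ratio("error_rate_spike", 0.02, 0.01)
```

Only `tool_usage_shift` uses the lower bound, so `lower` defaults to `None` and the other patterns check the upper bound alone.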