Observability API Reference¶

GrampusTracer¶

Wraps the OpenTelemetry SDK with agent-specific span types.

`grampus.observability.tracer.GrampusTracer` ¶

Wraps the OpenTelemetry SDK to produce Grampus-specific agent spans.

All span methods are synchronous context managers:

with tracer.agent_run(session_id="x") as span:
    span.set_attribute("custom", "value")

Parameters:

Name	Type	Description	Default
`service_name`	`str`	OTEL service name (e.g. "grampus-agent").	`'grampus-agent'`
`otlp_endpoint`	`str \| None`	Optional OTLP exporter endpoint (e.g. "http://localhost:4317"). When None, uses a NoOpTracerProvider — no network calls.	`None`
`agent_id`	`str`	Default agent_id attached to every span.	`'unknown'`

`llm_call(*, model, input_tokens=0, output_tokens=0, cost_usd=0.0, **extra_attrs)` ¶

Span for one LLM completion call.

Parameters:

Name	Type	Description	Default
`model`	`str`	Model identifier string.	required
`input_tokens`	`int`	Number of prompt tokens consumed.	`0`
`output_tokens`	`int`	Number of completion tokens produced.	`0`
`cost_usd`	`float`	Estimated cost in USD.	`0.0`
`**extra_attrs`	`Any`	Additional span attributes.	`{}`

Yields:

Type	Description
`Span`	The active OTEL Span.
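To make the shape of this method concrete, here is a minimal stand-in built only on the standard library. `RecordingSpan` and `llm_call_span` are hypothetical names, not part of Grampus or OpenTelemetry. The sketch assumes only what the signature states: keyword-only arguments, a synchronous context manager, and a span object that accepts `set_attribute`. The attribute names it writes are assumptions.

```python
from contextlib import contextmanager
from typing import Any, Iterator


class RecordingSpan:
    """Hypothetical stand-in for an OTEL Span; it only stores attributes."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.attributes = {}
        self.ended = False

    def set_attribute(self, key: str, value: Any) -> None:
        self.attributes[key] = value


@contextmanager
def llm_call_span(
    *,
    model: str,
    input_tokens: int = 0,
    output_tokens: int = 0,
    cost_usd: float = 0.0,
    **extra_attrs: Any,
) -> Iterator[RecordingSpan]:
    """Keyword-only, synchronous context manager mirroring the llm_call signature."""
    span = RecordingSpan("agent.llm_call")
    span.set_attribute("model", model)
    span.set_attribute("input_tokens", input_tokens)
    span.set_attribute("output_tokens", output_tokens)
    span.set_attribute("cost_usd", cost_usd)
    for key, value in extra_attrs.items():
        span.set_attribute(key, value)
    try:
        yield span
    finally:
        # The span is marked as ended even if the body raises.
        span.ended = True


with llm_call_span(model="example-model", input_tokens=120, stop_reason="end_turn") as span:
    span.set_attribute("custom", "value")
```

Extra keyword arguments such as `stop_reason` travel through `**extra_attrs`, which is how call-specific attributes can be attached without widening the signature.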

`tool_call(*, tool_name, success=True, duration_ms=0.0, **extra_attrs)` ¶

Span for one tool execution.

Parameters:

Name	Type	Description	Default
`tool_name`	`str`	Registered name of the tool.	required
`success`	`bool`	Whether the tool call succeeded.	`True`
`duration_ms`	`float`	Wall-clock execution time.	`0.0`
`**extra_attrs`	`Any`	Additional span attributes.	`{}`

Yields:

Type	Description
`Span`	The active OTEL Span.

`memory_read(*, memory_type, records_returned=0)` ¶

Span for a memory recall operation.

Parameters:

Name	Type	Description	Default
`memory_type`	`str`	One of "working", "episodic", "semantic", "procedural".	required
`records_returned`	`int`	Number of records surfaced by the query.	`0`

Yields:

Type	Description
`Span`	The active OTEL Span.

`memory_write(*, memory_type, content_length=0)` ¶

Span for a memory store operation.

Parameters:

Name	Type	Description	Default
`memory_type`	`str`	Memory layer being written.	required
`content_length`	`int`	Byte length of the content stored.	`0`

Yields:

Type	Description
`Span`	The active OTEL Span.

`record_llm_call(span, *, model, input_tokens=0, output_tokens=0, cost_usd=0.0, latency_ms=0.0, **extra_attrs)` ¶

Record LLM call attributes on an existing span.

Parameters:

Name	Type	Description	Default
`span`	`Span`	The span to annotate.	required
`model`	`str`	Model identifier string.	required
`input_tokens`	`int`	Number of prompt tokens consumed.	`0`
`output_tokens`	`int`	Number of completion tokens produced.	`0`
`cost_usd`	`float`	Estimated cost in USD.	`0.0`
`latency_ms`	`float`	Call latency in milliseconds.	`0.0`
`**extra_attrs`	`Any`	Additional span attributes.	`{}`
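One plausible use for annotating an existing span is that token counts and latency are only known after the completion returns. The sketch below illustrates that pattern with standard-library code only. `AttrSpan` and `annotate_llm_call` are hypothetical stand-ins, the attribute names are assumptions, and the "LLM call" is a stub.

```python
import time
from typing import Any


class AttrSpan:
    """Hypothetical span stand-in that stores attributes in a dict."""

    def __init__(self) -> None:
        self.attributes = {}

    def set_attribute(self, key: str, value: Any) -> None:
        self.attributes[key] = value


def annotate_llm_call(
    span: AttrSpan,
    *,
    model: str,
    input_tokens: int = 0,
    output_tokens: int = 0,
    cost_usd: float = 0.0,
    latency_ms: float = 0.0,
    **extra_attrs: Any,
) -> None:
    """Copy LLM call details onto an already-open span (attribute names assumed)."""
    details = {
        "model": model,
        "input_tokens": input_tokens,
        "output_tokens": output_tokens,
        "cost_usd": cost_usd,
        "latency_ms": latency_ms,
        **extra_attrs,
    }
    for key, value in details.items():
        span.set_attribute(key, value)


span = AttrSpan()
started = time.perf_counter()
reply = "stubbed completion"  # stands in for the real LLM call
latency_ms = (time.perf_counter() - started) * 1000.0

# Annotate once the call has finished and the numbers are known.
annotate_llm_call(
    span, model="example-model", input_tokens=42, output_tokens=7, latency_ms=latency_ms
)
```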

Span context manager¶

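from grampus.observability.tracer import GrampusTracer

# The constructor parameter is otlp_endpoint (see the parameter table above).
tracer = GrampusTracer(service_name="my-agent", otlp_endpoint="http://localhost:4317")

# Synchronous span around an arbitrary step
with tracer.span("agent.custom_step", attributes={"step.name": "validate"}):
    do_work()

# Async span; it must be entered from inside a coroutine
async def call_llm(llm, messages):
    async with tracer.async_span("agent.llm_call", attributes={"model": "claude-sonnet-4-6"}):
        return await llm.complete(messages)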

Span types and attributes¶

Span type	Key attributes
`agent.run`	`agent.name`, `agent.model`, `session.id`, `agent.status`
`agent.llm_call`	`model`, `input_tokens`, `output_tokens`, `cost_usd`, `stop_reason`
`agent.tool_call`	`tool.name`, `tool.duration_ms`, `tool.success`, `tool.call_id`
`agent.memory_read`	`memory.type`, `memory.query`, `memory.results_count`
`agent.memory_write`	`memory.type`, `memory.source_type`, `memory.trust_level`
`agent.decision`	`agent.step`, `decision.action`

GrampusMetrics¶

Prometheus-compatible metrics endpoint.

`grampus.observability.metrics.GrampusMetrics` ¶

In-process metrics collector with Prometheus-compatible text exposition.

Does NOT require a running Prometheus server — stores everything in memory and exports to Prometheus text format on demand.

Parameters:

Name	Type	Description	Default
`agent_id`	`str`	Scopes per-agent metrics.	required

`record_llm_call(*, model, input_tokens, output_tokens, cost_usd, latency_ms)` ¶

Increment token/cost/call counters. Record latency in histogram.

Parameters:

Name	Type	Description	Default
`model`	`str`	Model identifier.	required
`input_tokens`	`int`	Prompt token count.	required
`output_tokens`	`int`	Completion token count.	required
`cost_usd`	`float`	Estimated USD cost.	required
`latency_ms`	`float`	Round-trip latency in milliseconds.	required

`record_tool_call(*, tool_name, success, latency_ms)` ¶

Increment tool call counter. Record latency in histogram.

Parameters:

Name	Type	Description	Default
`tool_name`	`str`	Name of the invoked tool.	required
`success`	`bool`	Whether execution succeeded.	required
`latency_ms`	`float`	Execution time in milliseconds.	required

`record_error(*, error_type)` ¶

Increment error counter.

Parameters:

Name	Type	Description	Default
`error_type`	`str`	Short class name of the error.	required

`set_active_agents(count)` ¶

Update active agent gauge.

Parameters:

Name	Type	Description	Default
`count`	`int`	Current number of concurrently running agents.	required

`to_prometheus_text()` ¶

Export metrics in Prometheus text exposition format.

Returns:

Type	Description
`str`	Multiline string with # HELP, # TYPE, and metric lines.
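For readers unfamiliar with the Prometheus text exposition format, the following sketch renders one in-memory counter as `# HELP`, `# TYPE`, and sample lines. It is not the Grampus implementation: `render_counter` and `escape_label` are hypothetical helpers, and the sample values are made up. The metric and label names are borrowed from the counter table on this page.

```python
def escape_label(value: str) -> str:
    """Escape backslash, double quote, and newline as the text format requires."""
    return value.replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n")


def render_counter(name: str, help_text: str, samples: list) -> str:
    """Render (labels, value) pairs as Prometheus text exposition lines."""
    lines = [f"# HELP {name} {help_text}", f"# TYPE {name} counter"]
    for labels, value in samples:
        # Sort labels so the output is deterministic.
        label_text = ",".join(
            f'{key}="{escape_label(str(val))}"' for key, val in sorted(labels.items())
        )
        lines.append(f"{name}{{{label_text}}} {value}")
    return "\n".join(lines) + "\n"


text = render_counter(
    "grampus_total_tokens",
    "Tokens consumed",
    [({"model": "example-model", "agent_name": "demo"}, 1500)],
)
print(text)
```

Because the output is plain text, it can be served from any HTTP handler or written to a file for a scraper to pick up.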

`snapshot()` ¶

Return current accumulated metrics. Pure computation, no I/O.

Counter metrics¶

Metric name	Labels	Description
`grampus_total_tokens`	`model`, `agent_name`	Tokens consumed
`grampus_total_cost_usd`	`model`, `agent_name`	USD spent
`grampus_total_tool_calls`	`tool_name`, `agent_name`	Tool executions
`grampus_total_errors`	`error_code`, `agent_name`	Errors by type
`grampus_llm_call_count`	`model`, `agent_name`	Total LLM calls made

Gauge metrics¶

Metric name	Labels	Description
`grampus_active_agents`	`agent_name`	Currently running agents

Histogram metrics¶

Metric name	Labels	Description
`grampus_llm_latency_ms`	`model`, `agent_name`	LLM call latency in milliseconds
`grampus_tool_latency_ms`	`tool_name`, `agent_name`	Tool execution latency in milliseconds
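Histograms in the Prometheus text format are exposed as cumulative `_bucket` samples plus `_sum` and `_count`. The sketch below shows that bookkeeping with the standard library only. `LatencyHistogram` is a hypothetical class, and the bucket bounds (100, 500, 1000 ms) are illustrative assumptions; this page does not state the bounds Grampus uses.

```python
from bisect import bisect_left


class LatencyHistogram:
    """Hypothetical in-memory histogram; bucket bounds are illustrative."""

    def __init__(self, bounds_ms: list) -> None:
        self.bounds = sorted(bounds_ms)
        # One slot per finite bound plus a final slot for the +Inf bucket.
        self.bucket_counts = [0] * (len(self.bounds) + 1)
        self.total_ms = 0.0
        self.count = 0

    def observe(self, value_ms: float) -> None:
        # bisect_left returns the index of the first bound >= value_ms,
        # i.e. the smallest bucket whose "le" limit covers the value.
        self.bucket_counts[bisect_left(self.bounds, value_ms)] += 1
        self.total_ms += value_ms
        self.count += 1

    def to_text(self, name: str) -> str:
        lines = [f"# TYPE {name} histogram"]
        running = 0
        for bound, in_bucket in zip(self.bounds + [float("inf")], self.bucket_counts):
            running += in_bucket  # exposed buckets are cumulative
            le = "+Inf" if bound == float("inf") else f"{bound:g}"
            lines.append(f'{name}_bucket{{le="{le}"}} {running}')
        lines.append(f"{name}_sum {self.total_ms:g}")
        lines.append(f"{name}_count {self.count}")
        return "\n".join(lines) + "\n"


hist = LatencyHistogram([100, 500, 1000])
for latency in (80, 100, 450, 2000):
    hist.observe(latency)
text = hist.to_text("grampus_llm_latency_ms")
print(text)
```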

EventLog¶

Append-only audit log for every agent action.

`grampus.observability.events.EventLog` ¶

Append-only, replayable log of agent events.

Backed by a Dapr state store when one is configured; falls back to an in-memory list when `state_store` is None (useful for testing).

Events are immutable once written. No update or delete operations.

Parameters:

Name	Type	Description	Default
`agent_id`	`str`	Scopes the log to this agent.	required
`session_id`	`str`	Current session.	required
`state_store`	`Any \| None`	Optional Dapr state store for persistence.	`None`

`append(event_type, payload=None)` `async` ¶

Create and store an AgentEvent. Returns the stored event.

Parameters:

Name	Type	Description	Default
`event_type`	`EventType`	The type of agent action being recorded.	required
`payload`	`dict[str, Any] \| None`	Arbitrary metadata about the event.	`None`

Returns:

Type	Description
`AgentEvent`	The persisted AgentEvent with an auto-assigned sequence number.

`replay()` `async` ¶

Return all events for this agent/session in sequence order.

Returns:

Type	Description
`list[AgentEvent]`	Ordered list of AgentEvent records from sequence 0 onward.

`replay_since(sequence_number)` `async` ¶

Return events with sequence_number >= the given value.

Parameters:

Name	Type	Description	Default
`sequence_number`	`int`	Inclusive lower bound on sequence number.	required

Returns:

Type	Description
`list[AgentEvent]`	Filtered, ordered list of AgentEvent records.

`event_count()` ¶

Return the number of events appended in this instance's lifetime.
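The append/replay contract described above can be sketched with an in-memory list, roughly what the fallback path might look like when no state store is configured. `InMemoryEventLog` and `LoggedEvent` are hypothetical names, event types are plain strings here rather than `EventType` values, and sequence numbers are assumed to start at 0 and grow by one per append.

```python
import asyncio
from dataclasses import dataclass, field
from typing import Any, Optional


@dataclass(frozen=True)
class LoggedEvent:
    """Hypothetical immutable record; not the library's AgentEvent."""

    sequence_number: int
    event_type: str
    payload: dict = field(default_factory=dict)


class InMemoryEventLog:
    """Append-only log: no update or delete methods are offered."""

    def __init__(self) -> None:
        self._events = []

    async def append(self, event_type: str, payload: Optional[dict] = None) -> LoggedEvent:
        # The next sequence number is the current length of the log.
        event = LoggedEvent(len(self._events), event_type, dict(payload or {}))
        self._events.append(event)
        return event

    async def replay(self) -> list:
        return list(self._events)  # copy so callers cannot mutate the log

    async def replay_since(self, sequence_number: int) -> list:
        # Inclusive lower bound, matching the description above.
        return [e for e in self._events if e.sequence_number >= sequence_number]

    def event_count(self) -> int:
        return len(self._events)


async def demo():
    log = InMemoryEventLog()
    await log.append("agent.started", {"agent_name": "demo"})
    await log.append("tool.called", {"tool_name": "search"})
    await log.append("agent.completed")
    return log, await log.replay_since(1)


log, tail = asyncio.run(demo())
```

Returning copies from `replay()` keeps the stored list private, which is one simple way to honor the append-only guarantee in a test double.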

AgentEvent¶

from dataclasses import dataclass
from datetime import datetime
from typing import Any

@dataclass
class AgentEvent:
    event_id: str
    session_id: str
    agent_name: str
    event_type: str          # see event types table below
    summary: str             # human-readable one-line description
    payload: dict[str, Any]  # full event data
    timestamp: datetime
    step: int                # ReAct iteration number

Event types¶

Event type	Payload keys
`agent.started`	`agent_name`, `model`, `input`
`agent.completed`	`steps_taken`, `cost_usd`, `output_preview`
`agent.failed`	`error_code`, `error_message`
`llm.called`	`model`, `message_count`, `input_tokens`
`llm.responded`	`output_tokens`, `cost_usd`, `stop_reason`
`tool.called`	`tool_name`, `arguments`
`tool.completed`	`duration_ms`, `output_preview`
`tool.failed`	`error_code`, `error_message`
`memory.read`	`query`, `types`, `results_count`
`memory.written`	`memory_type`, `source_type`, `trust_level`
`safety.violation`	`violation_type`, `severity`, `blocked`

BehaviorMonitor¶

Tracks agent behavior patterns and detects anomalies.

`grampus.observability.behavior.BehaviorMonitor` ¶

Tracks per-agent behavioral patterns and detects anomalies.

Maintains a rolling window of turn-level observations. After each turn is recorded, checks for anomalies against the baseline.

Parameters:

Name	Type	Description	Default
`agent_id`	`str`	Agent being monitored.	required
`cost_spike_threshold`	`float`	Multiplier above avg_cost triggering COST_SPIKE.	`3.0`
`error_spike_threshold`	`float`	Multiplier above avg_errors for ERROR_RATE_SPIKE.	`5.0`
`tool_shift_threshold`	`float`	Fraction of new tools triggering TOOL_USAGE_SHIFT.	`0.5`

`record_turn(*, cost_usd, tool_names, error_count)` ¶

Record one agent turn and return any anomalies detected.

Parameters:

Name	Type	Description	Default
`cost_usd`	`float`	Total cost incurred this turn.	required
`tool_names`	`list[str]`	Tools invoked during this turn.	required
`error_count`	`int`	Number of errors encountered.	required

Returns:

Type	Description
`list[Anomaly]`	List of new Anomaly objects (empty if baseline not yet established).
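The rolling-window check can be illustrated for the cost case alone. This is a sketch under stated assumptions, not the library's algorithm: `CostSpikeDetector` is a hypothetical class, the window size (20 turns) and the minimum number of turns before a baseline exists (3) are invented for the example, and anomalies are returned as plain strings rather than anomaly objects. Only the default multiplier of 3.0 comes from the parameter table above.

```python
from collections import deque
from statistics import mean


class CostSpikeDetector:
    """Hypothetical detector: flags a turn whose cost exceeds threshold x rolling average."""

    def __init__(self, cost_spike_threshold: float = 3.0, window: int = 20, min_turns: int = 3):
        self.threshold = cost_spike_threshold
        self.min_turns = min_turns          # assumed baseline warm-up
        self.costs = deque(maxlen=window)   # oldest turns fall out automatically

    def record_turn(self, *, cost_usd: float) -> list:
        anomalies = []
        # Compare against the baseline built from earlier turns only.
        if len(self.costs) >= self.min_turns:
            baseline = mean(self.costs)
            if baseline > 0 and cost_usd > self.threshold * baseline:
                anomalies.append("cost_spike")
        self.costs.append(cost_usd)
        return anomalies


detector = CostSpikeDetector()
results = [detector.record_turn(cost_usd=c) for c in (0.01, 0.01, 0.01, 0.02, 0.10)]
```

The first three turns return empty lists because no baseline exists yet. The fourth (0.02 against an average of 0.01) stays under the 3.0 multiplier, and the fifth (0.10 against an average of 0.0125) exceeds it.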

`anomalies()` ¶

Return the most recent anomalies (up to 1,000) detected so far.

`profile()` ¶

Return the current behavioral profile (snapshot).

Returns:

Type	Description
`AgentBehaviorProfile`	AgentBehaviorProfile with rolling-window statistics.

BehaviorAnomaly¶

from dataclasses import dataclass

@dataclass
class BehaviorAnomaly:
    pattern: str              # "tool_usage_shift" | "cost_spike" | etc.
    severity: str             # "warning" | "critical"
    description: str          # human-readable explanation
    current_value: float      # observed metric value
    baseline_value: float     # expected (rolling average) value
    ratio: float              # current / baseline

Monitored anomaly patterns¶

Pattern	Trigger condition
`tool_usage_shift`	Tool X called > 2.5× or < 0.4× baseline frequency
`cost_spike`	Cost per run > 2.5× rolling average
`memory_access_anomaly`	Memory reads from unusual source types
`error_rate_spike`	Error rate > 2.5× baseline
`latency_spike`	P95 run duration > 2.5× baseline
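The `tool_usage_shift` row is the only two-sided condition in the table, so a small worked example may help. The function below is a hypothetical helper, not a Grampus API. It computes the ratio as current / baseline, as in the `ratio` field above, and reports it only when it falls outside the 0.4× to 2.5× band. Returning `None` when there is no baseline is an assumption made to avoid dividing by zero.

```python
from typing import Optional


def tool_usage_shift(
    current_calls: float,
    baseline_calls: float,
    *,
    high: float = 2.5,
    low: float = 0.4,
) -> Optional[float]:
    """Return current/baseline when it lies outside [low, high], else None."""
    if baseline_calls <= 0:
        return None  # assumption: no baseline yet, so nothing to compare against
    ratio = current_calls / baseline_calls
    return ratio if ratio > high or ratio < low else None


print(tool_usage_shift(12, 4))  # well above baseline
print(tool_usage_shift(1, 4))   # well below baseline
print(tool_usage_shift(10, 4))  # exactly 2.5x, not strictly greater
```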
