Observability API Reference
GrampusTracer
Wraps the OpenTelemetry SDK with agent-specific span types.
grampus.observability.tracer.GrampusTracer
Wraps the OpenTelemetry SDK to produce Grampus-specific agent spans.
All span methods are synchronous context managers:
with tracer.agent_run(session_id="x") as span:
    span.set_attribute("custom", "value")
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `service_name` | `str` | OTEL service name (e.g. "grampus-agent"). | `'grampus-agent'` |
| `otlp_endpoint` | `str \| None` | Optional OTLP exporter endpoint (e.g. "http://localhost:4317"). When None, uses a NoOpTracerProvider, so no network calls are made. | `None` |
| `agent_id` | `str` | Default agent_id attached to every span. | `'unknown'` |
llm_call(*, model, input_tokens=0, output_tokens=0, cost_usd=0.0, **extra_attrs)
Span for one LLM completion call.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `model` | `str` | Model identifier string. | required |
| `input_tokens` | `int` | Number of prompt tokens consumed. | `0` |
| `output_tokens` | `int` | Number of completion tokens produced. | `0` |
| `cost_usd` | `float` | Estimated cost in USD. | `0.0` |
| `**extra_attrs` | `Any` | Additional span attributes. | `{}` |
Yields:
| Type | Description |
|---|---|
| `Span` | The active OTEL Span. |
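To make the keyword-only signature and the context-manager behavior concrete, the following is a minimal, self-contained sketch. It does not use the real GrampusTracer or the OpenTelemetry SDK: `SketchTracer` and `RecordingSpan` are hypothetical stand-ins, and the attribute keys (`agent_id`, `model`, and so on) are assumptions chosen to mirror the parameter names above.

```python
from contextlib import contextmanager
from typing import Any, Iterator, Optional


class RecordingSpan:
    """Hypothetical stand-in for an OTEL span: it only stores attributes."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.attributes: dict[str, Any] = {}

    def set_attribute(self, key: str, value: Any) -> None:
        self.attributes[key] = value


class SketchTracer:
    """Illustrative only; mirrors the documented constructor defaults."""

    def __init__(
        self,
        service_name: str = "grampus-agent",
        otlp_endpoint: Optional[str] = None,
        agent_id: str = "unknown",
    ) -> None:
        self.service_name = service_name
        self.otlp_endpoint = otlp_endpoint  # unused here: nothing is exported
        self.agent_id = agent_id
        self.finished: list[RecordingSpan] = []

    @contextmanager
    def llm_call(
        self,
        *,
        model: str,
        input_tokens: int = 0,
        output_tokens: int = 0,
        cost_usd: float = 0.0,
        **extra_attrs: Any,
    ) -> Iterator[RecordingSpan]:
        span = RecordingSpan("agent.llm_call")
        span.set_attribute("agent_id", self.agent_id)  # assumed attribute key
        span.set_attribute("model", model)
        span.set_attribute("input_tokens", input_tokens)
        span.set_attribute("output_tokens", output_tokens)
        span.set_attribute("cost_usd", cost_usd)
        for key, value in extra_attrs.items():
            span.set_attribute(key, value)
        try:
            yield span  # the caller can add attributes inside the with block
        finally:
            self.finished.append(span)  # a real tracer would end the span here


tracer = SketchTracer(agent_id="agent-1")
with tracer.llm_call(model="example-model", input_tokens=12, region="eu") as span:
    span.set_attribute("custom", "value")
```

Because the span is yielded from a `try`/`finally`, it is recorded even if the body of the `with` block raises. Whether the real tracer also marks the span as failed in that case is not stated in this reference.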
tool_call(*, tool_name, success=True, duration_ms=0.0, **extra_attrs)
Span for one tool execution.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `tool_name` | `str` | Registered name of the tool. | required |
| `success` | `bool` | Whether the tool call succeeded. | `True` |
| `duration_ms` | `float` | Wall-clock execution time in milliseconds. | `0.0` |
| `**extra_attrs` | `Any` | Additional span attributes. | `{}` |
Yields:
| Type | Description |
|---|---|
| `Span` | The active OTEL Span. |
memory_read(*, memory_type, records_returned=0)
Span for a memory recall operation.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `memory_type` | `str` | One of "working", "episodic", "semantic", "procedural". | required |
| `records_returned` | `int` | Number of records surfaced by the query. | `0` |
Yields:
| Type | Description |
|---|---|
| `Span` | The active OTEL Span. |
memory_write(*, memory_type, content_length=0)
Span for a memory store operation.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `memory_type` | `str` | Memory layer being written. | required |
| `content_length` | `int` | Byte length of the content stored. | `0` |
Yields:
| Type | Description |
|---|---|
| `Span` | The active OTEL Span. |
record_llm_call(span, *, model, input_tokens=0, output_tokens=0, cost_usd=0.0, latency_ms=0.0, **extra_attrs)
Record LLM call attributes on an existing span.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `span` | `Span` | The span to annotate. | required |
| `model` | `str` | Model identifier string. | required |
| `input_tokens` | `int` | Number of prompt tokens consumed. | `0` |
| `output_tokens` | `int` | Number of completion tokens produced. | `0` |
| `cost_usd` | `float` | Estimated cost in USD. | `0.0` |
| `latency_ms` | `float` | Call latency in milliseconds. | `0.0` |
| `**extra_attrs` | `Any` | Additional span attributes. | `{}` |
Span context manager
tracer = GrampusTracer(service_name="my-agent", otlp_endpoint="http://localhost:4317")
# Sync
with tracer.span("agent.custom_step", attributes={"step.name": "validate"}):
    do_work()
# Async (must run inside a coroutine)
async with tracer.async_span("agent.llm_call", attributes={"model": "claude-sonnet-4-6"}):
    response = await llm.complete(messages)
Span types and attributes
| Span type | Key attributes |
|---|---|
| `agent.run` | `agent.name`, `agent.model`, `session.id`, `agent.status` |
| `agent.llm_call` | `model`, `input_tokens`, `output_tokens`, `cost_usd`, `stop_reason` |
| `agent.tool_call` | `tool.name`, `tool.duration_ms`, `tool.success`, `tool.call_id` |
| `agent.memory_read` | `memory.type`, `memory.query`, `memory.results_count` |
| `agent.memory_write` | `memory.type`, `memory.source_type`, `memory.trust_level` |
| `agent.decision` | `agent.step`, `decision.action` |
GrampusMetrics
Prometheus-compatible metrics endpoint.
grampus.observability.metrics.GrampusMetrics
In-process metrics collector with Prometheus-compatible text exposition.
Does NOT require a running Prometheus server — stores everything in memory and exports to Prometheus text format on demand.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `agent_id` | `str` | Scopes per-agent metrics. | required |
record_llm_call(*, model, input_tokens, output_tokens, cost_usd, latency_ms)
Increment the token, cost, and call counters, and record the latency in the histogram.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `model` | `str` | Model identifier. | required |
| `input_tokens` | `int` | Prompt token count. | required |
| `output_tokens` | `int` | Completion token count. | required |
| `cost_usd` | `float` | Estimated USD cost. | required |
| `latency_ms` | `float` | Round-trip latency in milliseconds. | required |
record_tool_call(*, tool_name, success, latency_ms)
Increment the tool call counter and record the latency in the histogram.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `tool_name` | `str` | Name of the invoked tool. | required |
| `success` | `bool` | Whether execution succeeded. | required |
| `latency_ms` | `float` | Execution time in milliseconds. | required |
record_error(*, error_type)
Increment the error counter.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `error_type` | `str` | Short class name of the error. | required |
set_active_agents(count)
Update the active agent gauge.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `count` | `int` | Current number of concurrently running agents. | required |
to_prometheus_text()
Export metrics in Prometheus text exposition format.
Returns:
| Type | Description |
|---|---|
| `str` | Multiline string with `# HELP`, `# TYPE`, and metric lines. |
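As an illustration of what an in-memory collector with on-demand text export can look like, here is a small sketch that accumulates the three LLM counters and renders them with `# HELP` and `# TYPE` lines. `SketchMetrics` is a hypothetical class, not the GrampusMetrics implementation. The metric names and help strings come from the counter table in this reference; filling the `agent_name` label from `agent_id` is an assumption.

```python
from collections import defaultdict

# Help strings taken from the counter table in this reference.
HELP = {
    "grampus_total_tokens": "Tokens consumed",
    "grampus_total_cost_usd": "USD spent",
    "grampus_llm_call_count": "Total LLM calls made",
}


class SketchMetrics:
    """Hypothetical in-memory counter store with Prometheus text export."""

    def __init__(self, agent_id: str) -> None:
        self.agent_id = agent_id
        # (metric name, model) -> accumulated value
        self._counters: dict[tuple[str, str], float] = defaultdict(float)

    def record_llm_call(
        self,
        *,
        model: str,
        input_tokens: int,
        output_tokens: int,
        cost_usd: float,
        latency_ms: float,
    ) -> None:
        self._counters[("grampus_total_tokens", model)] += input_tokens + output_tokens
        self._counters[("grampus_total_cost_usd", model)] += cost_usd
        self._counters[("grampus_llm_call_count", model)] += 1
        # latency_ms would feed a histogram, which this sketch leaves out.

    def to_prometheus_text(self) -> str:
        lines: list[str] = []
        announced: set[str] = set()
        # Sorting keeps all samples of one metric together, as the format expects.
        for (name, model), value in sorted(self._counters.items()):
            if name not in announced:
                lines.append(f"# HELP {name} {HELP[name]}")
                lines.append(f"# TYPE {name} counter")
                announced.add(name)
            # Label values are not escaped here; real code must escape \ and ".
            labels = f'model="{model}",agent_name="{self.agent_id}"'
            lines.append(f"{name}{{{labels}}} {value}")
        return "\n".join(lines) + "\n"


metrics = SketchMetrics(agent_id="agent-1")
metrics.record_llm_call(
    model="example-model",
    input_tokens=100,
    output_tokens=50,
    cost_usd=0.25,
    latency_ms=420.0,
)
text = metrics.to_prometheus_text()
print(text)
```

Nothing in the sketch opens a socket or contacts a Prometheus server; the text is simply returned to the caller, which can serve it from whatever HTTP endpoint the application already exposes.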
snapshot()
Return current accumulated metrics. Pure computation, no I/O.
Counter metrics
| Metric name | Labels | Description |
|---|---|---|
| `grampus_total_tokens` | `model`, `agent_name` | Tokens consumed |
| `grampus_total_cost_usd` | `model`, `agent_name` | USD spent |
| `grampus_total_tool_calls` | `tool_name`, `agent_name` | Tool executions |
| `grampus_total_errors` | `error_code`, `agent_name` | Errors by type |
| `grampus_llm_call_count` | `model`, `agent_name` | Total LLM calls made |
Gauge metrics
| Metric name | Labels | Description |
|---|---|---|
| `grampus_active_agents` | `agent_name` | Currently running agents |
Histogram metrics
| Metric name | Labels | Description |
|---|---|---|
| `grampus_llm_latency_ms` | `model`, `agent_name` | LLM call latency in milliseconds |
| `grampus_tool_latency_ms` | `tool_name`, `agent_name` | Tool execution latency in milliseconds |
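This reference does not list the bucket boundaries used by the latency histograms. As a rough sketch of how a Prometheus-style histogram accumulates observations, the example below uses hypothetical boundaries of 100, 500, and 1000 ms and reports cumulative counts per upper bound, which is how `le` buckets are exposed. `SketchHistogram` is an illustrative class, not part of Grampus.

```python
import bisect


class SketchHistogram:
    """Hypothetical latency histogram with cumulative (le-style) buckets."""

    def __init__(self, bounds: list[float]) -> None:
        self.bounds = sorted(bounds)
        # One slot per finite bound plus a final slot for +Inf.
        self.counts = [0] * (len(self.bounds) + 1)
        self.total = 0.0
        self.count = 0

    def observe(self, value: float) -> None:
        # bisect_left finds the first bound >= value, i.e. the "le" bucket.
        index = bisect.bisect_left(self.bounds, value)
        self.counts[index] += 1
        self.total += value
        self.count += 1

    def cumulative(self) -> list[tuple[float, int]]:
        """Return (upper bound, observations <= bound) pairs."""
        result = []
        running = 0
        for bound, count in zip(self.bounds + [float("inf")], self.counts):
            running += count
            result.append((bound, running))
        return result


hist = SketchHistogram([100.0, 500.0, 1000.0])
for latency_ms in (50.0, 100.0, 250.0, 2000.0):
    hist.observe(latency_ms)
print(hist.cumulative())
```

An observation exactly equal to a boundary (100.0 here) lands in that boundary's bucket, and anything above the largest finite bound is counted only in the `+Inf` bucket.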
EventLog
Append-only audit log for every agent action.
grampus.observability.events.EventLog
Append-only, replayable log of agent events.
Backed by Dapr state store when configured; falls back to in-memory list when state_store is None (useful for testing).
Events are immutable once written. No update or delete operations.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `agent_id` | `str` | Scopes the log to this agent. | required |
| `session_id` | `str` | Current session. | required |
| `state_store` | `Any \| None` | Optional Dapr state store for persistence. | `None` |
async append(event_type, payload=None)
Create and store an AgentEvent. Returns the stored event.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `event_type` | `EventType` | The type of agent action being recorded. | required |
| `payload` | `dict[str, Any] \| None` | Arbitrary metadata about the event. | `None` |
Returns:
| Type | Description |
|---|---|
| `AgentEvent` | The persisted AgentEvent with an auto-assigned sequence number. |
async replay()
Return all events for this agent/session in sequence order.
Returns:
| Type | Description |
|---|---|
| `list[AgentEvent]` | Ordered list of AgentEvent records from sequence 0 onward. |
async replay_since(sequence_number)
Return events with sequence_number >= the given value.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `sequence_number` | `int` | Inclusive lower bound on sequence number. | required |
Returns:
| Type | Description |
|---|---|
| `list[AgentEvent]` | Filtered, ordered list of AgentEvent records. |
event_count()
Return the number of events appended in this instance's lifetime.
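The append-only behavior and the inclusive lower bound of `replay_since` can be sketched with a plain in-memory list, similar in spirit to the fallback used when `state_store` is None. `InMemoryEventLog` and `SketchEvent` are hypothetical names for illustration; the real class, its event fields, and its Dapr integration may differ.

```python
import asyncio
from dataclasses import dataclass
from typing import Any, Optional


@dataclass(frozen=True)
class SketchEvent:
    """Hypothetical event record; frozen so it cannot be reassigned after creation."""

    sequence_number: int
    event_type: str
    payload: dict[str, Any]


class InMemoryEventLog:
    """Illustrative append-only log with no update or delete operations."""

    def __init__(self, agent_id: str, session_id: str) -> None:
        self.agent_id = agent_id
        self.session_id = session_id
        self._events: list[SketchEvent] = []

    async def append(
        self, event_type: str, payload: Optional[dict[str, Any]] = None
    ) -> SketchEvent:
        # Sequence numbers start at 0 and grow by one per appended event.
        event = SketchEvent(len(self._events), event_type, dict(payload or {}))
        self._events.append(event)
        return event

    async def replay(self) -> list[SketchEvent]:
        return list(self._events)  # copy, so callers cannot mutate the log

    async def replay_since(self, sequence_number: int) -> list[SketchEvent]:
        return [e for e in self._events if e.sequence_number >= sequence_number]

    def event_count(self) -> int:
        return len(self._events)


async def demo() -> tuple[InMemoryEventLog, list[SketchEvent]]:
    log = InMemoryEventLog(agent_id="agent-1", session_id="session-1")
    await log.append("agent.started", {"model": "example-model"})
    await log.append("tool.called", {"tool_name": "search"})
    await log.append("agent.completed")
    return log, await log.replay_since(1)


log, tail = asyncio.run(demo())
print([event.event_type for event in tail])
```

Here `replay_since(1)` skips only the first event, because the bound is inclusive and numbering starts at 0.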
AgentEvent
@dataclass
class AgentEvent:
    event_id: str
    session_id: str
    agent_name: str
    event_type: str  # see event types table below
    summary: str  # human-readable one-line description
    payload: dict[str, Any]  # full event data
    timestamp: datetime
    step: int  # ReAct iteration number
Event types
| Event type | Payload keys |
|---|---|
| `agent.started` | `agent_name`, `model`, `input` |
| `agent.completed` | `steps_taken`, `cost_usd`, `output_preview` |
| `agent.failed` | `error_code`, `error_message` |
| `llm.called` | `model`, `message_count`, `input_tokens` |
| `llm.responded` | `output_tokens`, `cost_usd`, `stop_reason` |
| `tool.called` | `tool_name`, `arguments` |
| `tool.completed` | `duration_ms`, `output_preview` |
| `tool.failed` | `error_code`, `error_message` |
| `memory.read` | `query`, `types`, `results_count` |
| `memory.written` | `memory_type`, `source_type`, `trust_level` |
| `safety.violation` | `violation_type`, `severity`, `blocked` |
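One way to use the table above is as a checklist when building payloads. The sketch below encodes the listed keys in a dictionary and reports which ones a payload is missing. The helper `missing_payload_keys` is hypothetical; this reference does not say whether EventLog validates payloads itself.

```python
from typing import Any

# Payload keys per event type, copied from the event types table.
PAYLOAD_KEYS: dict[str, set[str]] = {
    "agent.started": {"agent_name", "model", "input"},
    "agent.completed": {"steps_taken", "cost_usd", "output_preview"},
    "agent.failed": {"error_code", "error_message"},
    "llm.called": {"model", "message_count", "input_tokens"},
    "llm.responded": {"output_tokens", "cost_usd", "stop_reason"},
    "tool.called": {"tool_name", "arguments"},
    "tool.completed": {"duration_ms", "output_preview"},
    "tool.failed": {"error_code", "error_message"},
    "memory.read": {"query", "types", "results_count"},
    "memory.written": {"memory_type", "source_type", "trust_level"},
    "safety.violation": {"violation_type", "severity", "blocked"},
}


def missing_payload_keys(event_type: str, payload: dict[str, Any]) -> list[str]:
    """Return the documented keys that are absent from payload, sorted by name."""
    expected = PAYLOAD_KEYS[event_type]  # raises KeyError for unknown event types
    return sorted(expected - set(payload))


incomplete = missing_payload_keys("tool.failed", {"error_code": "E_TIMEOUT"})
complete = missing_payload_keys(
    "tool.called", {"tool_name": "search", "arguments": {"q": "otel"}}
)
print(incomplete, complete)
```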
BehaviorMonitor
Tracks agent behavior patterns and detects anomalies.
grampus.observability.behavior.BehaviorMonitor
Tracks per-agent behavioral patterns and detects anomalies.
Maintains a rolling window of turn-level observations. After each turn is recorded, checks for anomalies against the baseline.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `agent_id` | `str` | Agent being monitored. | required |
| `cost_spike_threshold` | `float` | Multiplier above avg_cost triggering COST_SPIKE. | `3.0` |
| `error_spike_threshold` | `float` | Multiplier above avg_errors for ERROR_RATE_SPIKE. | `5.0` |
| `tool_shift_threshold` | `float` | Fraction of new tools triggering TOOL_USAGE_SHIFT. | `0.5` |
record_turn(*, cost_usd, tool_names, error_count)
Record one agent turn and return any anomalies detected.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `cost_usd` | `float` | Total cost incurred this turn. | required |
| `tool_names` | `list[str]` | Tools invoked during this turn. | required |
| `error_count` | `int` | Number of errors encountered. | required |
Returns:
| Type | Description |
|---|---|
| `list[Anomaly]` | List of new Anomaly objects (empty if baseline not yet established). |
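The cost check can be pictured as comparing each turn against the average of a rolling window. The sketch below covers only that one check and is not the BehaviorMonitor implementation: `SketchCostMonitor`, the window size, and the `min_turns` warm-up are hypothetical, while the 3.0 multiplier matches the documented `cost_spike_threshold` default.

```python
from collections import deque
from statistics import fmean


class SketchCostMonitor:
    """Hypothetical rolling-window cost spike detector."""

    def __init__(
        self,
        cost_spike_threshold: float = 3.0,
        window: int = 20,  # assumed window size
        min_turns: int = 5,  # assumed warm-up before a baseline exists
    ) -> None:
        self.cost_spike_threshold = cost_spike_threshold
        self.min_turns = min_turns
        self._costs: deque[float] = deque(maxlen=window)

    def record_turn(self, *, cost_usd: float) -> list[str]:
        anomalies: list[str] = []
        # Compare against the baseline built from earlier turns only.
        if len(self._costs) >= self.min_turns:
            baseline = fmean(self._costs)
            if baseline > 0 and cost_usd > self.cost_spike_threshold * baseline:
                anomalies.append("cost_spike")
        self._costs.append(cost_usd)
        return anomalies


monitor = SketchCostMonitor(min_turns=3)
results = [monitor.record_turn(cost_usd=c) for c in (1.0, 1.0, 1.0, 4.0, 2.0)]
print(results)
```

The first three turns return empty lists because no baseline exists yet. The fourth turn costs 4.0 against a baseline of 1.0 and is flagged. The fifth is not, since the spike itself has raised the rolling average to 1.75.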
anomalies()
Return the most recent anomalies (up to 1,000) detected so far.
profile()
Return the current behavioral profile (snapshot).
Returns:
| Type | Description |
|---|---|
| `AgentBehaviorProfile` | AgentBehaviorProfile with rolling-window statistics. |
BehaviorAnomaly
@dataclass
class BehaviorAnomaly:
    pattern: str  # "tool_usage_shift" | "cost_spike" | etc.
    severity: str  # "warning" | "critical"
    description: str  # human-readable explanation
    current_value: float  # observed metric value
    baseline_value: float  # expected (rolling average) value
    ratio: float  # current / baseline
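The `ratio` field is the observed value divided by the baseline. A small sketch of how such a record could be filled in follows. The `build_anomaly` helper and its `critical_ratio` cutoff of 5.0 are hypothetical, since this reference does not say how severity is chosen.

```python
from dataclasses import dataclass


@dataclass
class BehaviorAnomaly:
    pattern: str
    severity: str
    description: str
    current_value: float
    baseline_value: float
    ratio: float


def build_anomaly(
    pattern: str,
    current: float,
    baseline: float,
    critical_ratio: float = 5.0,  # assumed cutoff between warning and critical
) -> BehaviorAnomaly:
    if baseline <= 0:
        raise ValueError("baseline must be positive to compute a ratio")
    ratio = current / baseline
    severity = "critical" if ratio >= critical_ratio else "warning"
    return BehaviorAnomaly(
        pattern=pattern,
        severity=severity,
        description=f"{pattern}: {current:.2f} vs baseline {baseline:.2f} ({ratio:.1f}x)",
        current_value=current,
        baseline_value=baseline,
        ratio=ratio,
    )


anomaly = build_anomaly("cost_spike", current=4.0, baseline=1.0)
print(anomaly.description)
```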
Monitored anomaly patterns
| Pattern | Trigger condition |
|---|---|
| `tool_usage_shift` | Tool X called > 2.5× or < 0.4× baseline frequency |
| `cost_spike` | Cost per run > 2.5× rolling average |
| `memory_access_anomaly` | Memory reads from unusual source types |
| `error_rate_spike` | Error rate > 2.5× baseline |
| `latency_spike` | P95 run duration > 2.5× baseline |