What is AI Observability?

AI observability is the practice of monitoring, understanding, and debugging AI applications in production. For LLM-based applications, this means tracking:
  • What - The inputs and outputs of AI model calls
  • How - Performance metrics like latency and token usage
  • How much - Cost attribution and spending
  • Why - Error tracking and debugging information

OpenTelemetry

Anyway is built on OpenTelemetry (OTel), the industry-standard framework for observability. OTel provides:
  • Vendor-neutral instrumentation
  • Standard data formats
  • Wide ecosystem support
This means your telemetry data is portable and can be exported to other tools if needed.

Traces

A trace represents a single logical operation in your application, such as handling a user request. Traces help you understand the flow of requests through your system.
Trace: handle-user-message
├── validate-input (2ms)
├── retrieve-context (15ms)
├── openai.chat.completions (1,234ms)
├── format-response (3ms)
└── send-response (5ms)

Trace ID

Every trace has a unique ID that allows you to correlate all operations related to a single request:
Trace ID: 4bf92f3577b34da6a3ce929d0e0e4736
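As an illustration, a trace ID is a 128-bit value conventionally rendered as 32 lowercase hex characters. A minimal plain-Python sketch (not the Anyway API):

```python
import random

# A trace ID is a random 128-bit value, rendered as 32 hex characters.
trace_id = random.getrandbits(128)
trace_id_hex = format(trace_id, "032x")
print(trace_id_hex)  # e.g. 4bf92f3577b34da6a3ce929d0e0e4736
```

Every span created while handling the request carries this same ID, which is what lets a backend group them back into one trace.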

Spans

A span represents a single operation within a trace. Spans have:
  • Name - What operation is being performed
  • Start time - When it began
  • Duration - How long it took
  • Attributes - Key-value metadata
  • Status - Success or error
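As a rough illustration of those five fields, here is a plain-Python sketch (not the OTel or Anyway span type):

```python
import time
from dataclasses import dataclass, field

@dataclass
class Span:
    # Minimal stand-in for the fields listed above.
    name: str                # what operation is being performed
    start_time: float        # when it began (epoch seconds)
    duration_ms: float = 0.0 # how long it took
    attributes: dict = field(default_factory=dict)  # key-value metadata
    status: str = "OK"       # success or error

span = Span(name="llm_call.task", start_time=time.time())
span.attributes["gen_ai.request.model"] = "gpt-4o"
span.duration_ms = 1234.0
```

In the real SDK these fields are set for you by the instrumentation; you mostly interact with them when querying or filtering spans.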

Span Hierarchy

Spans can be nested to show parent-child relationships:
Trace: user-query
└── Span: process_query.workflow (parent)
    ├── Span: validate.task (child)
    ├── Span: llm_call.task (child)
    │   └── Span: openai.chat (grandchild - auto-instrumented)
    └── Span: format_response.task (child)
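The parent-child relationship comes from context propagation: a new span adopts whichever span is currently active as its parent. A minimal sketch of that mechanism, using a hypothetical `start_span` helper rather than the SDK API:

```python
import contextvars
from contextlib import contextmanager

_current = contextvars.ContextVar("current_span", default=None)

class Span:
    def __init__(self, name, parent=None):
        self.name, self.parent, self.children = name, parent, []

@contextmanager
def start_span(name):
    # A new span attaches to the currently active span, which is how
    # nesting like the hierarchy above is built up automatically.
    parent = _current.get()
    span = Span(name, parent)
    if parent is not None:
        parent.children.append(span)
    token = _current.set(span)
    try:
        yield span
    finally:
        _current.reset(token)

with start_span("process_query.workflow") as wf:
    with start_span("validate.task"):
        pass
    with start_span("llm_call.task") as llm:
        with start_span("openai.chat"):  # grandchild
            pass
    with start_span("format_response.task"):
        pass

print([c.name for c in wf.children])
```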

Span Attributes

Attributes provide context about each operation. For LLM spans, these follow the OpenTelemetry GenAI semantic conventions:
{
    "gen_ai.system": "openai",
    "gen_ai.request.model": "gpt-4o",
    "gen_ai.usage.prompt_tokens": 150,
    "gen_ai.usage.completion_tokens": 200,
    "gen_ai.usage.cost": 0.0035
}
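The cost attribute is typically derived from the token counts and per-token prices. A sketch of that arithmetic with hypothetical rates (real prices vary by model and provider):

```python
# Hypothetical per-token rates - check your provider's current pricing.
INPUT_RATE = 0.005 / 1000   # USD per prompt token
OUTPUT_RATE = 0.015 / 1000  # USD per completion token

def estimate_cost(prompt_tokens: int, completion_tokens: int) -> float:
    """Estimated total cost in USD for one LLM call."""
    return prompt_tokens * INPUT_RATE + completion_tokens * OUTPUT_RATE

print(round(estimate_cost(150, 200), 6))  # 0.00375 under these assumed rates
```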

Workflows and Tasks

The Anyway SDK uses workflows and tasks to create structured trace hierarchies:
from anyway.sdk.decorators import workflow, task

@task(name="summarize")
def summarize(text):
    result = call_llm(text)  # your LLM client call - auto-instrumented as a child span
    return result

@workflow(name="process_document")
def process_document(doc):
    # Creates a parent span containing all tasks
    return summarize(doc)

Metrics

Metrics are numerical measurements aggregated over time. Unlike traces (which capture individual requests), metrics summarize patterns:
  • Counter - Values that only increase (total requests, total tokens)
  • Gauge - Point-in-time values (queue size, active connections)
  • Histogram - Distribution of values (latency percentiles)
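To make the three instrument types concrete, here is a plain-Python sketch of their behavior (not the OTel metrics API):

```python
class Counter:
    # Monotonically increasing total (e.g. total requests, total tokens).
    def __init__(self):
        self.value = 0
    def add(self, n=1):
        assert n >= 0, "counters only go up"
        self.value += n

class Gauge:
    # Point-in-time value that can go up or down (e.g. queue size).
    def __init__(self):
        self.value = 0
    def set(self, v):
        self.value = v

class Histogram:
    # Records a distribution; percentiles are derived from the samples.
    def __init__(self):
        self.samples = []
    def record(self, v):
        self.samples.append(v)
    def percentile(self, p):
        s = sorted(self.samples)
        return s[min(len(s) - 1, int(p / 100 * len(s)))]

requests = Counter()
requests.add(3)
latency = Histogram()
for ms in (12, 15, 1234, 18, 20):
    latency.record(ms)
print(requests.value, latency.percentile(95))
```

A single slow LLM call barely moves a counter but shows up immediately in the histogram's high percentiles, which is why latency is usually reported as p95/p99 rather than an average.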

Semantic Conventions

Anyway follows the OpenTelemetry GenAI semantic conventions for standardized attribute names:
  • gen_ai.system - Provider (openai, anthropic)
  • gen_ai.request.model - Requested model
  • gen_ai.response.model - Model used in the response
  • gen_ai.usage.prompt_tokens - Input token count
  • gen_ai.usage.completion_tokens - Output token count
  • gen_ai.usage.cost - Estimated total cost (USD)

Next Steps

Quickstart

Get started with Anyway in 5 minutes.

Tracing

Deep dive into distributed tracing.