Metrics APIs Reference¶

This reference documents the comprehensive metrics and performance monitoring APIs in SemanticKernel.Graph, which provide detailed insights into graph execution performance, resource usage, and operational health.

GraphPerformanceMetrics¶

Comprehensive performance metrics collector for graph execution. Tracks node-level metrics, execution paths, resource usage, and performance indicators.

Properties¶

TotalExecutions: Total number of graph executions tracked
Uptime: Time since metrics collection started
NodeMetrics: Dictionary of metrics per node
PathMetrics: Dictionary of metrics per execution path
CircuitBreakerMetrics: Dictionary of circuit breaker metrics per node
ResourceUsage: Current system resource usage (CPU, memory)
LastSampleTime: Timestamp of the last resource sample

Methods¶

StartNodeTracking¶

public NodeExecutionTracker StartNodeTracking(string nodeId, string nodeName, string executionId)

Starts tracking a node execution and returns a tracker for completion.

Parameters: * nodeId: Node identifier * nodeName: Node name * executionId: Execution identifier

Returns: Tracking token for completion

CompleteNodeTracking¶

public void CompleteNodeTracking(NodeExecutionTracker tracker, bool success, object? result = null, Exception? exception = null)

Completes node execution tracking and records metrics.

Parameters: * tracker: Node execution tracker * success: Whether execution was successful * result: Execution result (optional) * exception: Exception if failed (optional)

GetNodeMetrics¶

// The metrics collector exposes a NodeMetrics dictionary. Example usage:
if (metrics.NodeMetrics.TryGetValue(nodeId, out var nodeMetrics))
{
    // nodeMetrics is a NodeExecutionMetrics instance
}
else
{
    // Not found
}

Retrieves metrics for a specific node from the NodeMetrics property.

GetPerformanceSummary¶

public GraphPerformanceSummary GetPerformanceSummary(TimeSpan timeWindow)

Generates a comprehensive performance summary for the specified timeWindow with aggregated statistics.

Parameters: - timeWindow: Time window to analyze (e.g. TimeSpan.FromMinutes(5))

Returns: Performance summary with key metrics

Configuration¶

var options = new GraphMetricsOptions
{
    EnableResourceMonitoring = true,
    ResourceSamplingInterval = TimeSpan.FromSeconds(5),
    MaxSampleHistory = 10000,
    EnableDetailedPathTracking = true,
    EnablePercentileCalculations = true,
    MetricsRetentionPeriod = TimeSpan.FromHours(24)
};

var metrics = new GraphPerformanceMetrics(options);

NodeExecutionMetrics¶

Tracks execution metrics for a specific graph node. Provides detailed statistics about node performance and behavior.

Properties¶

NodeId: Node identifier
NodeName: Node name
TotalExecutions: Total number of executions
SuccessfulExecutions: Number of successful executions
FailedExecutions: Number of failed executions
SuccessRate: Success rate as a percentage (0-100)
AverageExecutionTime: Average execution duration
MinExecutionTime: Minimum execution duration
MaxExecutionTime: Maximum execution duration
FirstExecution: Timestamp of first execution
LastExecution: Timestamp of last execution

Methods¶

RecordExecution¶

public void RecordExecution(TimeSpan duration, bool success, object? result = null, Exception? exception = null)

Records a single execution with its outcome and timing.

Parameters: * duration: Execution duration * success: Whether execution succeeded * result: Execution result (optional) * exception: Exception if failed (optional)

GetPercentiles¶

// NodeExecutionMetrics provides a GetPercentile method for a single percentile.
// Example: get P95 execution time for a node
var p95 = nodeMetrics.GetPercentile(95);

Calculates a single execution time percentile (e.g. P50, P95, P99) for the node.

Parameters: - percentile: Percentile value (0-100)

Returns: Execution time at the requested percentile as a TimeSpan

OpenTelemetry Meter Integration¶

SemanticKernel.Graph integrates with OpenTelemetry's Meter for standardized metrics collection and export.

Meter Configuration¶

// Default meter names used by the framework
var streamingMeter = new Meter("SemanticKernel.Graph.Streaming", "1.0.0");
var distributionMeter = new Meter("SemanticKernel.Graph.Distribution", "1.0.0");
var agentPoolMeter = new Meter("skg.agent_pool", "1.0.0");

Metric Instruments¶

Counters¶

// Event counters with tags
var eventsCounter = meter.CreateCounter<long>("skg.stream.events", unit: "count", 
    description: "Total events emitted by the stream");

// Usage
eventsCounter.Add(1, new KeyValuePair<string, object?>("event_type", "NodeStarted"),
    new KeyValuePair<string, object?>("executionId", executionId),
    new KeyValuePair<string, object?>("graph", graphId),
    new KeyValuePair<string, object?>("node", nodeId));

Histograms¶

// Latency histograms
var eventLatencyMs = meter.CreateHistogram<double>("skg.stream.event.latency_ms", 
    unit: "ms", description: "Latency per event");

// Payload size histograms
var serializedPayloadBytes = meter.CreateHistogram<long>("skg.stream.event.payload_bytes", 
    unit: "bytes", description: "Serialized payload size per event");

// Usage
eventLatencyMs.Record(elapsedMs, new KeyValuePair<string, object?>("event_type", "NodeCompleted"),
    new KeyValuePair<string, object?>("executionId", executionId),
    new KeyValuePair<string, object?>("graph", graphId),
    new KeyValuePair<string, object?>("node", nodeId));

Standard Metric Tags¶

All metrics in SemanticKernel.Graph use consistent tagging for correlation and filtering:

Core Tags¶

executionId: Unique identifier for each graph execution run
graph: Stable identifier for the graph definition
node: Stable identifier for the specific node
event_type: Type of event or operation being measured

Additional Context Tags¶

workflow.id: Multi-agent workflow identifier
workflow.name: Human-readable workflow name
agent.id: Agent identifier in multi-agent scenarios
operation.type: Type of operation being performed
compressed: Whether data compression was applied
memory_mapped: Whether memory-mapped buffers were used

Metric Naming Convention¶

Metrics follow a hierarchical naming pattern:

skg.{component}.{metric_name}

Examples: * skg.stream.events - Event counter for streaming * skg.stream.event.latency_ms - Event latency histogram * skg.stream.event.payload_bytes - Event payload size histogram * skg.stream.producer.flush_ms - Producer flush latency * skg.agent_pool.connections - Agent pool connection counter * skg.work_distributor.tasks - Work distribution task counter

Streaming Metrics¶

Event Stream Metrics¶

var options = new StreamingExecutionOptions
{
    EnableMetrics = true,
    MetricsMeterName = "MyApp.GraphExecution"
};

// executor may be an IStreamingGraphExecutor (e.g. StreamingGraphExecutor)
var stream = executor.ExecuteStreamAsync(kernel, args, options);

Available Metrics: * skg.stream.events - Total events emitted (counter) * skg.stream.event.latency_ms - Event processing latency (histogram) * skg.stream.event.payload_bytes - Serialized payload size (histogram) * skg.stream.producer.flush_ms - Producer buffer flush latency (histogram)

Connection Pool Metrics¶

var poolOptions = new StreamingPoolOptions
{
    EnableMetrics = true,
    MetricsMeterName = "MyApp.StreamingPool"
};

Available Metrics: * skg.stream.pool.connections - Active connections (counter) * skg.stream.pool.requests - Request count (counter) * skg.stream.pool.latency_ms - Request latency (histogram)

Multi-Agent Metrics¶

Agent Pool Metrics¶

var agentOptions = new AgentConnectionPoolOptions
{
    EnableMetrics = true,
    MetricsMeterName = "skg.agent_pool"
};

Available Metrics: * skg.agent_pool.connections - Active agent connections (counter) * skg.agent_pool.requests - Request count (counter) * skg.agent_pool.latency_ms - Request latency (histogram)

Work Distribution Metrics¶

var distributorOptions = new WorkDistributorOptions
{
    EnableMetrics = true,
    MetricsMeterName = "skg.work_distributor"
};

Available Metrics: * skg.work_distributor.tasks - Task count (counter) * skg.work_distributor.latency_ms - Task distribution latency (histogram) * skg.work_distributor.queue_size - Queue size (gauge)

Performance Monitoring¶

Resource Monitoring¶

var metricsOptions = new GraphMetricsOptions
{
    EnableResourceMonitoring = true,
    ResourceSamplingInterval = TimeSpan.FromSeconds(5)
};

var metrics = new GraphPerformanceMetrics(metricsOptions);

Monitored Resources: * CPU usage percentage * Available memory (MB) * Process processor time * System load indicators

Execution Path Analysis¶

// GraphPerformanceMetrics exposes a PathMetrics dictionary. Example lookup:
if (metrics.PathMetrics.TryGetValue("path_signature", out var pathMetrics))
{
    var avgTime = pathMetrics.AverageExecutionTime;
    var successRate = pathMetrics.SuccessRate;
    var executionCount = pathMetrics.ExecutionCount;
}

Path Metrics: * Execution count per path * Success/failure rates * Average execution times * Path-specific performance trends

Metrics Export and Visualization¶

Export Formats¶

var exporter = new GraphMetricsExporter();
var jsonMetrics = exporter.ExportMetrics(metrics, MetricsExportFormat.Json, TimeSpan.FromHours(1));
var prometheusMetrics = exporter.ExportMetrics(metrics, MetricsExportFormat.Prometheus, TimeSpan.FromHours(1));

Dashboard Integration¶

var dashboard = new MetricsDashboard();
dashboard.RegisterExecution(executionContext, metrics);

var heatmap = dashboard.GeneratePerformanceHeatmap(metrics, visualizationData);
var summary = dashboard.ExportMetricsForVisualization(metrics, MetricsExportFormat.Json);

Configuration Examples¶

Development Environment¶

var devOptions = GraphMetricsOptions.CreateDevelopmentOptions();
// High-frequency sampling, detailed tracking, short retention

Production Environment¶

var prodOptions = GraphMetricsOptions.CreateProductionOptions();
// Balanced sampling, comprehensive tracking, extended retention

Performance-Critical Scenarios¶

var minimalOptions = GraphMetricsOptions.CreateMinimalOptions();
// Minimal overhead, basic metrics, short retention

Best Practices¶

Metric Tagging¶

Consistent Tags: Always use the standard tag set (executionId, graph, node)
Cardinality Management: Avoid high-cardinality tags that could explode metric series
Semantic Naming: Use descriptive tag values that aid in debugging and analysis

Performance Considerations¶

Sampling: Use appropriate sampling intervals for resource monitoring
Retention: Balance historical data needs with memory usage
Export Frequency: Configure export intervals based on monitoring requirements

Integration¶

OpenTelemetry: Leverage the built-in OpenTelemetry integration for standard observability
Custom Metrics: Extend with application-specific metrics using the same tagging patterns
Alerting: Use metric thresholds for proactive monitoring and alerting

Metrics APIs Reference¶

GraphPerformanceMetrics¶

Properties¶

Methods¶

StartNodeTracking¶

CompleteNodeTracking¶

GetNodeMetrics¶

GetPerformanceSummary¶

Configuration¶

NodeExecutionMetrics¶

Properties¶

Methods¶

RecordExecution¶

GetPercentiles¶

OpenTelemetry Meter Integration¶

Meter Configuration¶

Metric Instruments¶

Counters¶

Histograms¶

Standard Metric Tags¶

Core Tags¶

Additional Context Tags¶

Metric Naming Convention¶

Streaming Metrics¶

Event Stream Metrics¶

Connection Pool Metrics¶

Multi-Agent Metrics¶

Agent Pool Metrics¶

Work Distribution Metrics¶

Performance Monitoring¶

Resource Monitoring¶

Execution Path Analysis¶

Metrics Export and Visualization¶

Export Formats¶

Dashboard Integration¶

Configuration Examples¶

Development Environment¶

Production Environment¶

Performance-Critical Scenarios¶

Best Practices¶

Metric Tagging¶

Performance Considerations¶

Integration¶

See Also¶