
Metrics

The MCP Lifecycle Operator exposes Prometheus metrics from the controller manager (the operator Deployment that reconciles MCPServer resources). These include the metrics registered by controller-runtime (for example, workqueue and Kubernetes API client metrics) and the custom mcpserver_* series documented below.

This page is aimed at platform and cluster operators who scrape Prometheus and tune alerting, rather than at users who only author MCPServer manifests.

Metrics endpoint

Metrics are exposed over HTTPS at path /metrics on port 8443 on the controller manager metrics Service.

After a typical install from the release install.yaml, scrape:

https://mcp-lifecycle-operator-controller-manager-metrics-service.mcp-lifecycle-operator-system.svc:8443/metrics

Adjust the Service name and namespace if you change the Kustomize namePrefix / namespace when deploying.
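
If you run Prometheus without the Prometheus Operator, a static scrape job along the following lines may be enough. This is a minimal sketch, not taken from the repository: the job name is hypothetical, and it assumes Prometheus runs in the same cluster with a service account token that is allowed to read /metrics, mirroring the ServiceMonitor example later on this page.

```yaml
scrape_configs:
  - job_name: mcp-lifecycle-operator          # hypothetical job name
    scheme: https
    metrics_path: /metrics
    authorization:
      type: Bearer
      # Assumes Prometheus runs in-cluster and its service account may read /metrics.
      credentials_file: /var/run/secrets/kubernetes.io/serviceaccount/token
    tls_config:
      insecure_skip_verify: true              # tighten for production
    static_configs:
      - targets:
          # Default Service name and namespace from the release install.yaml.
          - mcp-lifecycle-operator-controller-manager-metrics-service.mcp-lifecycle-operator-system.svc:8443
```

Replace the target if you changed the Kustomize namePrefix or namespace.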

Tuning the metrics listen address (advanced)

If you installed from the release install.yaml, metrics are already served over HTTPS on :8443, so you can use the scrape URL above and skip this section.

If you customize the operator Deployment, check how the manager container sets its args. The repository’s sample patch shows what a typical install uses:

config/default/manager_metrics_patch.yaml

Those args correspond to the following flags:

| Flag | Default | Description |
| --- | --- | --- |
| --metrics-bind-address | 0 (disabled) | Address to serve metrics on. Set to :8443 for HTTPS or :8080 for HTTP. The sample patch above sets this to :8443. |
| --metrics-secure | true | Serve metrics over HTTPS. Set to false for plain HTTP. |
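
As an illustration of how these flags might appear on a customized Deployment, the fragment below sets both explicitly. It is a sketch rather than the contents of the repository's patch file; the container name manager is an assumption, so match it to your own manifest.

```yaml
# Illustrative excerpt of the operator Deployment (not the repository's patch file).
spec:
  template:
    spec:
      containers:
        - name: manager                       # assumed container name
          args:
            - --metrics-bind-address=:8443    # serve metrics on port 8443
            - --metrics-secure=true           # HTTPS (the flag's default)
```

For plain HTTP you would instead set --metrics-bind-address=:8080 and --metrics-secure=false, and adjust the metrics Service port and scrape scheme to match.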

Custom metrics

Custom metrics use the Prometheus namespace mcpserver (exported names start with mcpserver_).

| Metric | Type | Description |
| --- | --- | --- |
| mcpserver_condition_info | gauge | Current Accepted / Ready condition snapshot per MCPServer. Value is always 1; filter by labels. |
| mcpserver_validation_failures_total | counter | Total permanent configuration validation failures (ValidationError). |
| mcpserver_deployment_failures_total | counter | Total failures when reconciling the workload Deployment (reason is currently ReconcileError). |
| mcpserver_service_failures_total | counter | Total failures when reconciling the Service (reason is currently ReconcileError). |
| mcpserver_reconcile_phase_duration_seconds | histogram | Duration of reconciliation phases validation, deployment, and service (seconds; default Prometheus histogram buckets). |

Labels for mcpserver_condition_info

| Label | Description |
| --- | --- |
| name | MCPServer name |
| namespace | MCPServer namespace |
| type | Condition type: Accepted or Ready |
| status | True, False, or Unknown |
| reason | Condition reason (intended to mirror .status.conditions[]; see Gauge versus API status) |

Only one active series exists per (name, namespace, type). When an MCPServer is deleted, its gauge series and its *_failures_total counter series are removed from the exporter.

Typical reasons

| type | Typical reason values | status notes |
| --- | --- | --- |
| Accepted | Valid, Invalid | Usually True or False |
| Ready | Available, ConfigurationInvalid, DeploymentUnavailable, ServiceUnavailable, ScaledToZero, Initializing, MCPEndpointUnavailable | May be Unknown (for example Initializing while the Deployment has not reported conditions yet) |

Gauge versus API status

In rare cases the mcpserver_condition_info gauge can diverge from what you see in MCPServer.status.conditions. When investigating correctness, treat MCPServer.status as the source of truth.

  • Permanent validation error: Ready / ConfigurationInvalid may appear in the API only after a successful status write, while the gauge was updated earlier or on a different path.
  • MCP handshake: after Available, a failed handshake can set the status to MCPEndpointUnavailable without a second gauge update in the same reconcile.

Example queries

sum by (namespace, type, status, reason) (mcpserver_condition_info)
sum by (reason) (mcpserver_condition_info{type="Ready", status="False"})
sum(rate(mcpserver_validation_failures_total[5m])) by (namespace)
histogram_quantile(
  0.99,
  sum(rate(mcpserver_reconcile_phase_duration_seconds_bucket[5m])) by (le, phase)
)
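
Building on the condition queries above, an alerting rule for servers that stay not Ready could look roughly like this. It is a sketch in the standard Prometheus rule-file format; the group name, alert name, severity label, and 10m hold time are illustrative assumptions, not values shipped with the operator.

```yaml
groups:
  - name: mcpserver-conditions                # hypothetical group name
    rules:
      - alert: MCPServerNotReady              # hypothetical alert name
        # The gauge value is always 1, so the label matchers do the filtering.
        expr: mcpserver_condition_info{type="Ready", status="False"} == 1
        for: 10m                              # assumed hold time; tune to your environment
        labels:
          severity: warning
        annotations:
          summary: "MCPServer {{ $labels.namespace }}/{{ $labels.name }} is not Ready ({{ $labels.reason }})"
```

Because reason is a label on the series, a change of reason produces a new alert instance and restarts the for timer. You may also want to exclude reasons that are expected in your environment by adding a reason!="..." matcher.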

Labels for failure counters (mcpserver_*_failures_total)

mcpserver_validation_failures_total, mcpserver_deployment_failures_total, and mcpserver_service_failures_total share the same label set:

| Label | Description |
| --- | --- |
| name | MCPServer name |
| namespace | MCPServer namespace |
| reason | Depends on which counter (see below) |

reason values

  • mcpserver_validation_failures_total — permanent validation errors currently use Invalid.
  • mcpserver_deployment_failures_total and mcpserver_service_failures_total — currently ReconcileError when the corresponding reconcile step returns an error.
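
The failure counters lend themselves to increase-based alerts. The rule file below is a minimal sketch under assumed names and thresholds (group name, alert names, 15m window, and severity are all hypothetical); adapt it before use.

```yaml
groups:
  - name: mcpserver-failures                  # hypothetical group name
    rules:
      - alert: MCPServerDeploymentReconcileFailing   # hypothetical alert name
        # Fires when the Deployment reconcile step kept failing over the window.
        expr: sum by (namespace, name) (increase(mcpserver_deployment_failures_total[15m])) > 0
        for: 5m
        labels:
          severity: warning
        annotations:
          summary: "Deployment reconcile is failing for MCPServer {{ $labels.namespace }}/{{ $labels.name }}"
      - alert: MCPServerServiceReconcileFailing      # hypothetical alert name
        expr: sum by (namespace, name) (increase(mcpserver_service_failures_total[15m])) > 0
        for: 5m
        labels:
          severity: warning
        annotations:
          summary: "Service reconcile is failing for MCPServer {{ $labels.namespace }}/{{ $labels.name }}"
```

Note that increase() only sees changes between samples, so the very first failure of a newly created series may not register until the counter increments again.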

Labels for mcpserver_reconcile_phase_duration_seconds

| Label | Description |
| --- | --- |
| phase | Reconciliation phase: validation, deployment, or service |

Histogram time series use the usual _bucket, _sum, and _count suffixes for quantiles and averages.
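
To show how the suffixes combine, the recording rules below precompute a per-phase average (from _sum and _count) and a per-phase p99 (from _bucket). This is a sketch; the group name and recorded series names are hypothetical and only follow the common level:metric:operation naming convention.

```yaml
groups:
  - name: mcpserver-reconcile-duration        # hypothetical group name
    rules:
      # Average phase duration over 5 minutes: rate of _sum divided by rate of _count.
      - record: phase:mcpserver_reconcile_phase_duration_seconds:avg5m
        expr: |
          sum by (phase) (rate(mcpserver_reconcile_phase_duration_seconds_sum[5m]))
          /
          sum by (phase) (rate(mcpserver_reconcile_phase_duration_seconds_count[5m]))
      # 99th percentile per phase, estimated from the _bucket series.
      - record: phase:mcpserver_reconcile_phase_duration_seconds:p99_5m
        expr: |
          histogram_quantile(
            0.99,
            sum by (le, phase) (rate(mcpserver_reconcile_phase_duration_seconds_bucket[5m]))
          )
```

The average is undefined (no result) for phases with no observations in the window, because the denominator is zero.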

Prometheus Operator

If you use the Prometheus Operator, apply a ServiceMonitor that selects the controller-manager metrics Service. Example:

apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: controller-manager-metrics-monitor
  namespace: mcp-lifecycle-operator-system   # namespace where the operator runs
  labels:
    control-plane: controller-manager
    app.kubernetes.io/name: mcp-lifecycle-operator
spec:
  endpoints:
    - path: /metrics
      port: https
      scheme: https
      bearerTokenFile: /var/run/secrets/kubernetes.io/serviceaccount/token
      tlsConfig:
        insecureSkipVerify: true   # tighten for production (e.g. cert-manager); see repo sample
  selector:
    matchLabels:
      control-plane: controller-manager
      app.kubernetes.io/name: mcp-lifecycle-operator

The repository maintains the full sample at config/prometheus/monitor.yaml. Wire it into your install by uncommenting the [PROMETHEUS] resource (../prometheus) in config/default/kustomization.yaml, or apply an equivalent manifest alongside kube-prometheus-stack. Add labels your Prometheus ServiceMonitor selector expects (for example release: prometheus).
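
For example, if your Prometheus resource selects ServiceMonitors by a release: prometheus label (common with kube-prometheus-stack, but an assumption here; check spec.serviceMonitorSelector on your Prometheus resource), the ServiceMonitor metadata might be extended like this:

```yaml
metadata:
  name: controller-manager-metrics-monitor
  namespace: mcp-lifecycle-operator-system
  labels:
    control-plane: controller-manager
    app.kubernetes.io/name: mcp-lifecycle-operator
    release: prometheus                       # assumed label; must match your Prometheus selector
```

The spec.selector.matchLabels block stays unchanged, since it selects the metrics Service rather than the ServiceMonitor itself.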

Next steps

  • Introduction — Architecture and MCPServer overview (including status conditions)
  • Quickstart — Deploy an MCP server and inspect status
  • Contributing — How to contribute