wylon

Inference observability

Every inference call on wylon is measured end-to-end. Pre-built dashboards give you traffic, latency, error, and capacity telemetry out of the box, and a Prometheus-compatible metrics API lets you ship everything into your own stack.

What you get

Available metrics

| Category | Metric | Description |
| --- | --- | --- |
| Traffic | wylon_requests_per_minute | Total requests per minute. |
| Traffic | wylon_input_tokens_per_minute | Input (prompt) tokens per minute. |
| Traffic | wylon_output_tokens_per_minute | Generated output tokens per minute. |
| Latency | wylon_request_duration_seconds | End-to-end time from request sent to full response received. |
| Latency | wylon_time_to_first_token_seconds | Time to first token (TTFT): latency until the first streamed token. |
| Latency | wylon_output_tokens_per_second | Output speed after the first token. |
| Capacity | wylon_batch_jobs_in_progress | Number of Batch jobs currently executing. |
| Capacity | wylon_queue_depth | Pending requests waiting for a GPU slot. |
| Errors | wylon_error_rate | Percentage of failed requests, grouped by HTTP status (4xx, 429, 5xx). |
| Errors | wylon_success_rate | Percentage of 2xx responses. |
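
The three latency metrics are related, and a rough back-of-the-envelope model can help when reading them side by side. The sketch below assumes a streamed response whose tokens arrive at a constant rate once the first token has been delivered; that constant-rate assumption and the function name are illustrative, not something wylon documents.

```python
def estimate_request_duration(ttft_seconds: float, output_tokens: int,
                              tokens_per_second: float) -> float:
    """Approximate end-to-end duration of a streamed request.

    Assumption: decoding proceeds at a constant rate after the first token.
    """
    if tokens_per_second <= 0:
        raise ValueError("tokens_per_second must be positive")
    # The first token is already covered by TTFT; the rest stream at the decode rate.
    remaining_tokens = max(output_tokens - 1, 0)
    return ttft_seconds + remaining_tokens / tokens_per_second


# 0.4 s to first token, then 500 more tokens at 100 tokens/s
print(round(estimate_request_duration(0.4, 501, 100.0), 2))
```

If the measured wylon_request_duration_seconds is much larger than this estimate, the gap is likely time spent outside generation, such as queueing or network transfer.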

Filters and dimensions

Every metric can be sliced by any combination of the labels below.

Metrics API

The Prometheus-format endpoint returns all metrics with their live values.

curl https://api.wylon.cn/v1/metrics \
  -H "Authorization: Bearer $WYLON_API_KEY"

Example PromQL queries:

# p99 TTFT for kimi-k2.5, last 15 minutes
histogram_quantile(0.99,
  sum(rate(wylon_time_to_first_token_seconds_bucket{model="moonshotai/kimi-k2.5"}[15m])) by (le)
)

# average error rate by status class, last 1 hour
# (wylon_error_rate is a percentage, so it is averaged rather than passed to rate())
avg by (status_class) (avg_over_time(wylon_error_rate[1h]))
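
For ad hoc checks outside Prometheus, the endpoint's text output can also be read directly. The following is a minimal sketch using only the Python standard library. It assumes the response follows the standard Prometheus text exposition format; `parse_metrics`, `fetch_metrics`, and the sample payload are illustrative names and data, not part of any wylon SDK.

```python
import re
import urllib.request

# One sample line: metric name, optional {labels}, value. Timestamps are ignored here.
_SAMPLE = re.compile(r'^([a-zA-Z_:][a-zA-Z0-9_:]*)(?:\{(.*)\})?\s+(\S+)')
_LABEL = re.compile(r'([a-zA-Z_][a-zA-Z0-9_]*)="((?:[^"\\]|\\.)*)"')


def parse_metrics(text):
    """Parse Prometheus text exposition format into (name, labels, value) tuples."""
    samples = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):  # skip blanks and HELP/TYPE comments
            continue
        match = _SAMPLE.match(line)
        if match is None:
            continue
        name, raw_labels, value = match.groups()
        labels = dict(_LABEL.findall(raw_labels or ""))
        samples.append((name, labels, float(value)))
    return samples


def fetch_metrics(api_key, url="https://api.wylon.cn/v1/metrics"):
    """Fetch the raw metrics text with bearer-token auth (same request as the curl call)."""
    request = urllib.request.Request(url, headers={"Authorization": f"Bearer {api_key}"})
    with urllib.request.urlopen(request, timeout=10) as response:
        return response.read().decode("utf-8")


# Hypothetical payload for illustration; real label sets and values will differ.
example = """\
# TYPE wylon_queue_depth gauge
wylon_queue_depth{model="moonshotai/kimi-k2.5"} 3
wylon_requests_per_minute 120
"""
for name, labels, value in parse_metrics(example):
    print(name, labels, value)
```

To run against the live endpoint, pass the output of `fetch_metrics(os.environ["WYLON_API_KEY"])` to `parse_metrics` instead of the sample string. For anything beyond quick inspection, a maintained Prometheus client parser is a safer choice than a hand-rolled regex.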

Exporters

| Target | How |
| --- | --- |
| Prometheus | Scrape /v1/metrics with bearer-token auth. |
| Grafana® | Point a Prometheus data source at the same URL; a starter dashboard JSON is published in the cookbook. |
| OpenTelemetry | Push spans + metrics to any OTLP-compatible sink via the collector sidecar. |
| Datadog / New Relic | Use their OTLP intake endpoints with the OTel collector. |

Request logs

Structured JSON logs for each request are retained for 30 days. Payload capture (prompts and completions) is off by default. Enable it per-project only where required — it affects billing and has clear privacy implications.

PII in prompts. Treat payload capture as you would any log with potential personal data. See Privacy policy and Data processing for retention and regional controls.

Access control

Metric and log visibility follows your organization role:

FAQ

How fresh are the metrics?
Near real time: typically under 30 seconds from request completion to dashboard visibility.

Is there a cost to scrape /v1/metrics?
No. Metrics are included with every plan. High-volume log shipping may incur egress costs.

Can I correlate logs with client-side traces?
Yes — include a traceparent header (W3C Trace Context) and it will be propagated through wylon’s spans.
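
If your client is not already instrumented with a tracing library, a traceparent value can be built by hand. The sketch below follows the W3C Trace Context version 00 layout (version, 32-hex trace ID, 16-hex parent ID, 2-hex flags); `new_traceparent` is an illustrative helper, and the Authorization value is a placeholder.

```python
import secrets


def new_traceparent(sampled=True):
    """Build a W3C Trace Context `traceparent` header value (version 00)."""
    trace_id = secrets.token_hex(16)   # 16 random bytes -> 32 hex characters
    parent_id = secrets.token_hex(8)   # 8 random bytes -> 16 hex characters
    # The spec forbids all-zero IDs; regenerate in that (vanishingly unlikely) case.
    while int(trace_id, 16) == 0:
        trace_id = secrets.token_hex(16)
    while int(parent_id, 16) == 0:
        parent_id = secrets.token_hex(8)
    flags = "01" if sampled else "00"
    return f"00-{trace_id}-{parent_id}-{flags}"


# Attach the header alongside the usual bearer token on each inference request.
headers = {
    "Authorization": "Bearer <your API key>",
    "traceparent": new_traceparent(),
}
print(headers["traceparent"])
```

Record the trace ID on the client side so the matching request logs can be looked up later. If you already use an OpenTelemetry SDK, its propagator normally injects this header for you, and hand-building it is unnecessary.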
