wylon
POST https://api.wylon.cn/v1/chat/completions

Chat completions

Generate model responses from a list of messages. The endpoint is fully OpenAI-compatible; streaming, function calling, and structured output are controlled through request parameters.

Authorization

Authorization string · header required
Bearer token, e.g. Bearer wl-xxxxxxxx. Create API keys in the dashboard.

Request body

Content-Type: application/json

model string required
Model ID to call, e.g. moonshotai/Kimi-K2. The available list is in the model catalog or via the list models endpoint.
messages array required
Ordered conversation turns; each item has a role and content.
role enum required
One of system, user, assistant or tool.
content string · array required
Text content; vision-language models also accept multimodal content blocks as an array of {type, text|image_url}.
tool_call_id string optional
Only used when role="tool"; correlates to the tool-call ID from the previous assistant turn.
stream boolean false
When true, deltas are streamed as SSE (text/event-stream); the stream terminates with data: [DONE].
temperature number optional
Sampling temperature, range 0 – 2. Higher means more random; tune either this or top_p, not both.
top_p number optional
Nucleus-sampling threshold, range 0 – 1. Samples from the smallest set whose cumulative probability reaches top_p.
top_k integer optional
Sample from the top-K most probable tokens at each step; 0 disables.
max_tokens integer optional
Maximum tokens to generate; cannot exceed the model context window. If unset, the model's default cap is used.
n integer 1
Number of completion candidates to return. Note: billing is n × completion_tokens.
stop string · array optional
Up to 4 stop sequences; generation halts as soon as one is matched.
presence_penalty number 0
Range -2.0 – 2.0. Positive values reduce topic repetition and encourage novelty.
frequency_penalty number 0
Range -2.0 – 2.0. Positive values penalize tokens by frequency, suppressing verbatim repetition.
seed integer optional
Best-effort deterministic sampling seed; same input ⇒ same output.
tools array optional
Function definitions the model may call. Each is {type: "function", function: {name, description, parameters}}. See Function calling.
tool_choice string · object "auto"
Control tool invocation: "auto" lets the model decide; "none" disables tools; "required" forces at least one call; or pass {type:"function", function:{name:"..."}} to force a specific function.
response_format object optional
Force a response format: {type:"json_object"} or {type:"json_schema", json_schema:{...}}. See Structured output.
logprobs boolean false
Whether to return per-token log probabilities.
top_logprobs integer optional
When logprobs is true, return per-step top-N candidate probabilities, range 0 – 20.
user string optional
Stable end-user identifier used for abuse monitoring and organization-level audit.
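As a hedged, minimal sketch of assembling a non-streaming request with the parameters above (the API key and prompts are placeholders; the commented-out transport uses Python's stdlib urllib rather than any official SDK):

```python
import json

API_KEY = "wl-xxxxxxxx"  # placeholder; create real keys in the dashboard

# Headers and body follow the OpenAI-compatible shape documented above.
headers = {
    "Authorization": f"Bearer {API_KEY}",
    "Content-Type": "application/json",
}

payload = {
    "model": "moonshotai/Kimi-K2",
    "messages": [
        {"role": "system", "content": "You are a concise assistant."},
        {"role": "user", "content": "Explain KV cache in one sentence."},
    ],
    "temperature": 0.6,   # tune either temperature or top_p, not both
    "max_tokens": 256,
    # "stream": True,                              # enable SSE streaming
    # "response_format": {"type": "json_object"},  # structured output
}

body = json.dumps(payload)

# To actually send it (requires network access):
# import urllib.request
# req = urllib.request.Request(
#     "https://api.wylon.cn/v1/chat/completions",
#     data=body.encode("utf-8"), headers=headers, method="POST")
# with urllib.request.urlopen(req) as resp:
#     print(json.load(resp)["choices"][0]["message"]["content"])
```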

Response

Non-streaming: returns a chat.completion object.
Streaming (stream=true): chunks of chat.completion.chunk are returned as SSE, terminated by data: [DONE].
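The SSE framing described above can be parsed with a small sketch like the following (the chunk payloads here are synthetic examples in the documented chat.completion.chunk shape; a real client would read lines off the HTTP response):

```python
import json

DONE = object()  # sentinel for the "data: [DONE]" terminator

def parse_sse_line(line: str):
    """Parse one SSE line from a stream=true response.

    Returns None for non-data lines (comments, keep-alives, blanks),
    DONE for the terminator, or the decoded chat.completion.chunk dict.
    """
    line = line.strip()
    if not line.startswith("data:"):
        return None
    data = line[len("data:"):].strip()
    if data == "[DONE]":
        return DONE
    return json.loads(data)

def collect_text(lines):
    """Concatenate delta.content across chunks until [DONE]."""
    parts = []
    for raw in lines:
        chunk = parse_sse_line(raw)
        if chunk is None:
            continue
        if chunk is DONE:
            break
        for choice in chunk.get("choices", []):
            parts.append(choice.get("delta", {}).get("content") or "")
    return "".join(parts)

# Synthetic stream in the documented shape:
stream = [
    'data: {"object":"chat.completion.chunk","choices":[{"index":0,"delta":{"content":"KV "}}]}',
    'data: {"object":"chat.completion.chunk","choices":[{"index":0,"delta":{"content":"cache"}}]}',
    "data: [DONE]",
]
print(collect_text(stream))  # -> KV cache
```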

id string
Unique identifier for this request.
object string
Either chat.completion or chat.completion.chunk (streaming).
created integer
UNIX timestamp (seconds).
model string
The model ID that actually served this request.
choices array
List of completion candidates.
index integer
Index of the candidate, starting at 0.
message object
Returned in non-streaming mode: the complete {role, content, tool_calls?}.
delta object
Incremental content chunk returned in streaming mode.
finish_reason enum
stop / length / tool_calls / content_filter.
usage object
Token usage statistics.
prompt_tokens integer
Input token count.
completion_tokens integer
Output token count.
total_tokens integer
Total token count.
cache_hit_ratio number · wylon extension
Share of input tokens served by the system-level context cache (0 – 1). The more of the prompt prefix that repeats across requests, the higher the ratio and the lower the cost.
Example response
{
  "id": "cmpl-9f1c7b2e8a41",
  "object": "chat.completion",
  "created": 1744828800,
  "model": "moonshotai/Kimi-K2",
  "choices": [
    {
      "index": 0,
      "message": { "role": "assistant", "content": "KV cache stores …" },
      "finish_reason": "stop"
    }
  ],
  "usage": {
    "prompt_tokens": 24,
    "completion_tokens": 128,
    "total_tokens": 152,
    "cache_hit_ratio": 0.71
  }
}
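Given a chat.completion object like the example above, a client might summarize the fields it usually needs as follows (a sketch; the cached-token estimate derived from cache_hit_ratio is an illustration, not billing logic):

```python
def summarize(response: dict) -> dict:
    """Pull the common fields out of a non-streaming chat.completion."""
    choice = response["choices"][0]
    usage = response.get("usage", {})
    # cache_hit_ratio is a wylon extension: share of prompt tokens served
    # from the system-level context cache.
    cached = round(usage.get("prompt_tokens", 0) * usage.get("cache_hit_ratio", 0.0))
    return {
        "text": choice["message"].get("content"),
        # stop / length / tool_calls / content_filter; "length" means the
        # reply was truncated by max_tokens.
        "finish_reason": choice["finish_reason"],
        "total_tokens": usage.get("total_tokens"),
        "cached_prompt_tokens": cached,
    }

resp = {  # same shape as the example response above, abbreviated
    "choices": [{"index": 0,
                 "message": {"role": "assistant", "content": "KV cache stores ..."},
                 "finish_reason": "stop"}],
    "usage": {"prompt_tokens": 24, "completion_tokens": 128,
              "total_tokens": 152, "cache_hit_ratio": 0.71},
}
info = summarize(resp)
# info["cached_prompt_tokens"] == round(24 * 0.71) == 17
```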

Bad request (400), authentication failure (401), or an exceeded rate limit (429). Errors use the OpenAI-compatible envelope.

error.type string
Error category, e.g. invalid_request_error, authentication_error, rate_limit_exceeded.
error.message string
Human-readable error description.
error.code string
Optional fine-grained error code for programmatic handling.
Example — 429 rate-limited
{
  "error": {
    "type": "rate_limit_exceeded",
    "message": "Rate limit reached for moonshotai/Kimi-K2 on tier 2.",
    "retry_after": 3.4
  }
}

Transient server error (e.g. 503). Retry with exponential backoff and jitter.

Example — 503
{
  "error": {
    "type": "server_overloaded",
    "message": "Upstream model is temporarily unavailable, please retry."
  }
}
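The backoff advice above can be sketched as follows (TransientError is a hypothetical client-side wrapper for 429/503 replies, not part of the API; its retry_after hint mirrors the field in the 429 example):

```python
import random
import time

class TransientError(Exception):
    """Hypothetical wrapper a client might raise on a 429/503 reply."""
    def __init__(self, retry_after=None):
        super().__init__("transient upstream error")
        self.retry_after = retry_after  # seconds, from the error body if present

def call_with_retries(send, max_attempts=5, base=0.5, cap=8.0):
    """Retry a zero-argument callable with exponential backoff and full jitter."""
    for attempt in range(max_attempts):
        try:
            return send()
        except TransientError as err:
            if attempt == max_attempts - 1:
                raise
            # Honor the server's retry_after hint when present; otherwise
            # back off exponentially with full jitter to avoid retry storms.
            if err.retry_after is not None:
                delay = err.retry_after
            else:
                delay = min(cap, base * 2 ** attempt) * random.random()
            time.sleep(delay)

# Simulated transport: fails twice with a transient error, then succeeds.
attempts = {"n": 0}
def flaky():
    attempts["n"] += 1
    if attempts["n"] < 3:
        raise TransientError(retry_after=0.0)
    return {"ok": True}

result = call_with_retries(flaky)  # -> {"ok": True} on the third attempt
```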