POST
https://api.wylon.cn/v1/chat/completions
Chat completions
Generate model responses from a list of messages. Fully OpenAI-compatible; streaming, function calling and structured output are controlled through request parameters.
Authorization
Authorization
string · header
required
Bearer token, e.g.
Bearer wl-xxxxxxxx. Create API keys in the dashboard.
Request body
Content-Type: application/json
model
string
required
Model ID to call, e.g.
moonshotai/Kimi-K2. The available list is in the model catalog or via the list models endpoint.
messages
array
required
Ordered conversation turns; each item has a
role and content.
role
enum
required
One of
system, user, assistant or tool.
content
string · array
required
Text content; vision-language models also accept multimodal content blocks as an array of
{type, text|image_url}.
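For vision-language models, a user turn can mix text and image blocks. A minimal sketch of such a message, assuming the OpenAI convention where an `image_url` block wraps an object with a `url` field (the URL here is a placeholder; check Structured output and the model catalog for which models accept images):

```python
# Hypothetical multimodal user message: one text block plus one image block.
message = {
    "role": "user",
    "content": [
        {"type": "text", "text": "What is shown in this image?"},
        # Placeholder URL; the image_url object shape follows the OpenAI convention.
        {"type": "image_url", "image_url": {"url": "https://example.com/photo.png"}},
    ],
}
```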
tool_call_id
string
optional
Only used when
role="tool"; correlates to the tool-call ID from the previous assistant turn.
stream
boolean
false
When
true, deltas are streamed as SSE (text/event-stream); the stream terminates with data: [DONE].
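A client consuming the SSE stream only needs to read `data:` lines, stop at `data: [DONE]`, and concatenate the per-chunk deltas. A minimal parser, demonstrated offline on a captured two-chunk stream (the JSON bodies are illustrative):

```python
import json

def parse_sse_chunks(raw: str):
    """Yield parsed chat.completion.chunk objects from an SSE body.

    Each event line looks like `data: {...}`; the stream terminates
    with `data: [DONE]`.
    """
    for line in raw.splitlines():
        if not line.startswith("data: "):
            continue  # skip blank lines and SSE comments/keep-alives
        payload = line[len("data: "):]
        if payload == "[DONE]":
            return
        yield json.loads(payload)

# Offline demo with a captured two-chunk stream:
sample = (
    'data: {"object":"chat.completion.chunk","choices":[{"index":0,"delta":{"content":"Hel"}}]}\n'
    'data: {"object":"chat.completion.chunk","choices":[{"index":0,"delta":{"content":"lo"}}]}\n'
    "data: [DONE]\n"
)
text = "".join(c["choices"][0]["delta"].get("content", "")
               for c in parse_sse_chunks(sample))
print(text)  # → Hello
```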
temperature
number
optional
Sampling temperature, range
0 – 2. Higher means more random; tune either this or top_p, not both.
top_p
number
optional
Nucleus-sampling threshold, range
0 – 1. Samples from the smallest set whose cumulative probability reaches top_p.
top_k
integer
optional
Sample from the top-K most probable tokens at each step;
0 disables.
max_tokens
integer
optional
Maximum tokens to generate; cannot exceed the model context window. If unset, the model's default cap is used.
n
integer
1
Number of completion candidates to return. Note: billing is
n × completion_tokens.
stop
string · array
optional
Up to 4 stop sequences; generation halts as soon as one is matched.
presence_penalty
number
0
Range
-2.0 – 2.0. Positive values reduce topic repetition and encourage novelty.
frequency_penalty
number
0
Range
-2.0 – 2.0. Positive values penalize tokens by frequency, suppressing verbatim repetition.
seed
integer
optional
Best-effort deterministic sampling seed; the same seed and input should yield the same output, though determinism is not guaranteed.
tools
array
optional
Function definitions the model may call. Each is
{type: "function", function: {name, description, parameters}}. See Function calling.
tool_choice
string · object
"auto"
Control tool invocation:
"auto" lets the model decide; "none" disables tools; "required" forces at least one call; or pass {type:"function", function:{name:"..."}} to force a specific function.
response_format
object
optional
Force a response format:
{type:"json_object"} or {type:"json_schema", json_schema:{...}}. See Structured output.
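A `json_schema` request could be sketched as below. The schema and its `name`/`schema`/`strict` wrapper follow the OpenAI structured-output convention; consult Structured output for the exact fields this API expects:

```python
# Hypothetical extraction schema; field names are illustrative.
request = {
    "model": "moonshotai/Kimi-K2",
    "messages": [{"role": "user", "content": "Extract the city and date."}],
    "response_format": {
        "type": "json_schema",
        "json_schema": {
            "name": "extraction",
            "schema": {
                "type": "object",
                "properties": {
                    "city": {"type": "string"},
                    "date": {"type": "string"},
                },
                "required": ["city", "date"],
            },
            "strict": True,
        },
    },
}
```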
logprobs
boolean
false
Whether to return per-token log probabilities.
top_logprobs
integer
optional
When
logprobs is true, return per-step top-N candidate probabilities, range 0 – 20.
user
string
optional
Stable end-user identifier used for abuse monitoring and organization-level audit.
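A minimal non-streaming request combining the required fields with a few common sampling parameters. The API key is a placeholder, and the sketch only builds the headers and payload; send it with any HTTP client:

```python
API_KEY = "wl-xxxxxxxx"  # placeholder; create real keys in the dashboard
URL = "https://api.wylon.cn/v1/chat/completions"

headers = {
    "Authorization": f"Bearer {API_KEY}",
    "Content-Type": "application/json",
}
payload = {
    "model": "moonshotai/Kimi-K2",
    "messages": [
        {"role": "system", "content": "You are a concise assistant."},
        {"role": "user", "content": "What does a KV cache store?"},
    ],
    "temperature": 0.6,
    "max_tokens": 256,
}
# Send with any HTTP client, e.g.:
#   requests.post(URL, headers=headers, json=payload)
```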
Response
Non-streaming: returns a chat.completion object.
Streaming (stream=true): chunks of chat.completion.chunk are returned as SSE, terminated by data: [DONE].
id
string
Unique identifier for this request.
object
string
Either
chat.completion or chat.completion.chunk (streaming).
created
integer
UNIX timestamp (seconds).
model
string
The model ID that actually served this request.
choices
array
List of completion candidates.
index
integer
Index of the candidate, starting at 0.
message
object
Returned in non-streaming mode: complete
{role, content, tool_calls?}.
delta
object
Incremental content chunk returned in streaming mode.
finish_reason
enum
stop / length / tool_calls / content_filter.
usage
object
Token usage stats.
prompt_tokens
integer
Input token count.
completion_tokens
integer
Output token count.
total_tokens
integer
Total token count.
cache_hit_ratio
number · wylon extension
Share of input tokens served by the system-level context cache (0 – 1). The more repeated prefix, the higher the ratio and the lower the cost.
Example response
{
"id": "cmpl-9f1c7b2e8a41",
"object": "chat.completion",
"created": 1744828800,
"model": "moonshotai/Kimi-K2",
"choices": [
{
"index": 0,
"message": { "role": "assistant", "content": "KV cache stores …" },
"finish_reason": "stop"
}
],
"usage": {
"prompt_tokens": 24,
"completion_tokens": 128,
"total_tokens": 152,
"cache_hit_ratio": 0.71
}
}
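Sanity-checking the usage block from the example above (JSON reproduced inline): multiplying `cache_hit_ratio` by `prompt_tokens` gives a rough count of input tokens served from the cache.

```python
import json

example = """
{
  "usage": {
    "prompt_tokens": 24,
    "completion_tokens": 128,
    "total_tokens": 152,
    "cache_hit_ratio": 0.71
  }
}
"""
usage = json.loads(example)["usage"]
# Input plus output should equal the total:
assert usage["prompt_tokens"] + usage["completion_tokens"] == usage["total_tokens"]
# Approximate number of input tokens served from the context cache:
cached_tokens = round(usage["cache_hit_ratio"] * usage["prompt_tokens"])
print(cached_tokens)  # → 17
```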
Bad request, authentication failure, or an exceeded rate limit. Errors use the OpenAI-compatible envelope.
error.type
string
Error category, e.g.
invalid_request_error, authentication_error, rate_limit_exceeded.
error.message
string
Human-readable error description.
error.code
string
Optional fine-grained error code for programmatic handling.
Example — 429 rate-limited
{
"error": {
"type": "rate_limit_exceeded",
"message": "Rate limit reached for moonshotai/Kimi-K2 on tier 2.",
"retry_after": 3.4
}
}
Transient server error. Retry with exponential backoff and jitter.
Example — 503
{
"error": {
"type": "server_overloaded",
"message": "Upstream model is temporarily unavailable, please retry."
}
}
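One way to implement the recommended retry policy is exponential backoff with full jitter, preferring the server's `retry_after` hint when a 429 body includes one. The `call_api` function in the usage comment is hypothetical:

```python
import random

def backoff_delays(base: float = 0.5, cap: float = 30.0, attempts: int = 5):
    """Exponential backoff with full jitter for retrying 429/5xx responses.

    Each delay is drawn uniformly from [0, min(cap, base * 2**attempt)].
    """
    for attempt in range(attempts):
        yield random.uniform(0.0, min(cap, base * 2 ** attempt))

# Usage sketch (call_api is a hypothetical wrapper around the HTTP request):
# import time
# for delay in backoff_delays():
#     resp = call_api()
#     if resp.status_code not in (429, 500, 502, 503):
#         break
#     # Prefer the server's retry_after hint when present:
#     hint = resp.json().get("error", {}).get("retry_after")
#     time.sleep(hint if hint is not None else delay)

delays = list(backoff_delays())
print(len(delays))  # → 5
```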