POST https://api.wylon.cn/v1/chat/completions

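Chat completions (OpenAI)

Generates a model reply from a list of messages. Fully compatible with the OpenAI protocol; streaming, function calling, and structured output are controlled through request parameters.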

Authorization

Authorization string · header required

Bearer token, in the form Bearer wl-xxxxxxxx. Create an API key in the console.

Request body

Content-Type: application/json

model string required

ID of the model to call, for example moonshotai/Kimi-K2. For the available models, see the model catalog or call the list-models endpoint.

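messages array required

Ordered conversation turns. Each item is an object with role and content.

role enum required

One of system, user, assistant, or tool.

content string · array required

Text content. Vision-language models also accept an array of multimodal content blocks of the form {type, text|image_url}.

tool_call_id string optional

Used only when role="tool". References the ID of the tool call issued by the assistant in the preceding turn.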
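As a minimal sketch of the fields described so far, the following builds a non-streaming request with Python's standard library. The endpoint URL and model ID come from this page; the helper name build_chat_request and the WYLON_API_KEY environment variable are hypothetical, chosen only for illustration. The request is constructed but not sent.

```python
import json
import os
import urllib.request

API_URL = "https://api.wylon.cn/v1/chat/completions"


def build_chat_request(api_key, model, messages, **params):
    """Build (but do not send) a POST request for the chat completions endpoint."""
    body = {"model": model, "messages": messages, **params}
    return urllib.request.Request(
        API_URL,
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


request = build_chat_request(
    os.environ.get("WYLON_API_KEY", "wl-xxxxxxxx"),  # hypothetical variable name
    "moonshotai/Kimi-K2",
    [
        {"role": "system", "content": "You are a concise assistant."},
        {"role": "user", "content": "What does a KV cache store?"},
    ],
    max_tokens=256,
)

# To actually send it: urllib.request.urlopen(request)
```

Optional parameters such as max_tokens are passed through as keyword arguments, so the same helper covers any of the optional fields in this section.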

stream boolean false

When true, the reply is returned token by token as an SSE stream (text/event-stream), terminated by data: [DONE].
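A rough sketch of consuming such a stream: the function below takes an iterable of already-decoded text lines and yields each JSON payload until the data: [DONE] sentinel. It assumes one complete JSON object per data: line, which this page does not state explicitly; the name iter_sse_chunks and the sample lines are illustrative only.

```python
import json


def iter_sse_chunks(lines):
    """Yield parsed JSON payloads from SSE lines until the [DONE] sentinel."""
    for raw in lines:
        line = raw.strip()
        if not line.startswith("data:"):
            continue  # skip blank separators, comments, and other SSE fields
        payload = line[len("data:"):].strip()
        if payload == "[DONE]":
            return
        yield json.loads(payload)


# Hand-written sample lines, not captured from the real service.
sample = [
    'data: {"object": "chat.completion.chunk", "choices": [{"index": 0, "delta": {"content": "KV"}}]}',
    "",
    'data: {"object": "chat.completion.chunk", "choices": [{"index": 0, "delta": {"content": " cache"}}]}',
    "",
    "data: [DONE]",
    'data: {"ignored": true}',
]
chunks = list(iter_sse_chunks(sample))
```

Anything after the sentinel is ignored, so the caller can stop reading the connection once the generator is exhausted.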

temperature number optional

Sampling temperature, from 0 to 2. Higher values make replies more random. Adjust either this or top_p, not both.

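top_p number optional

Nucleus sampling threshold, from 0 to 1. Sampling is restricted to the smallest candidate set whose cumulative probability reaches top_p.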

top_k integer optional

At each step, sample only from the K most probable candidates; 0 means no limit.
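To make the top_k and top_p definitions concrete, here is a small illustration over a toy distribution. It is a sketch of the definitions, not of the server's sampler; applying top_k before top_p is an assumption, and filter_candidates is a hypothetical name.

```python
def filter_candidates(probs, top_k=0, top_p=1.0):
    """Return the tokens still eligible after top-k, then top-p filtering."""
    ranked = sorted(probs.items(), key=lambda item: item[1], reverse=True)
    if top_k > 0:  # 0 means no limit
        ranked = ranked[:top_k]
    kept, cumulative = [], 0.0
    for token, p in ranked:
        kept.append(token)
        cumulative += p
        if cumulative >= top_p:  # smallest set whose mass reaches top_p
            break
    return kept


probs = {"cache": 0.5, "store": 0.3, "key": 0.15, "zebra": 0.05}
by_top_k = filter_candidates(probs, top_k=2)      # keeps the 2 most probable tokens
by_top_p = filter_candidates(probs, top_p=0.9)    # keeps tokens until mass reaches 0.9
```

With this distribution, top_k=2 keeps "cache" and "store", while top_p=0.9 also keeps "key", because the first two tokens only cover 0.8 of the probability mass.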

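max_tokens integer optional

Maximum number of tokens to generate for this request; must not exceed the model's context window. When unset, the model's default limit applies.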

n integer 1

Returns N candidate replies for the same list of messages. Note that output is billed as n × completion_tokens.

stop string · array optional

Up to 4 stop sequences. Output is truncated as soon as one of them is matched.

presence_penalty number 0

Range: -2.0 to 2.0. Positive values discourage returning to topics already mentioned and encourage new ones.

frequency_penalty number 0

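Range: -2.0 to 2.0. Positive values penalize tokens in proportion to how often they have already appeared, suppressing verbatim repetition.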
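The difference between the two penalties is easier to see in code. The sketch below uses the formulation published in OpenAI's documentation (a flat presence term plus a count-scaled frequency term subtracted from each logit); whether this service applies exactly the same formula is an assumption, and penalize_logits is a hypothetical name.

```python
from collections import Counter


def penalize_logits(logits, generated_tokens, presence_penalty=0.0, frequency_penalty=0.0):
    """Subtract presence and frequency penalties from each token's logit."""
    counts = Counter(generated_tokens)
    adjusted = {}
    for token, logit in logits.items():
        count = counts[token]  # 0 for tokens not generated yet
        adjusted[token] = (
            logit
            - frequency_penalty * count                       # grows with each repeat
            - presence_penalty * (1.0 if count > 0 else 0.0)  # flat, once seen
        )
    return adjusted


logits = {"cache": 2.0, "store": 1.0}
adjusted = penalize_logits(
    logits, ["cache", "cache"], presence_penalty=0.5, frequency_penalty=0.25
)
```

Here "cache" has already appeared twice, so its logit drops by 0.5 (presence) plus 2 × 0.25 (frequency), while "store" is untouched.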

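seed integer optional

Seed for best-effort deterministic sampling. Repeating a request with the same seed and inputs should reproduce the same output, although this is not guaranteed.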

tools array optional

The set of function definitions the model may call. Each item has the form {type: "function", function: {name, description, parameters}}. See Function calling for details.

tool_choice string · object "auto"

Controls tool calling: "auto" lets the model decide; "none" disables tool calls; "required" forces at least one call; {type:"function", function:{name:"..."}} forces a call to the named function.
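The following sketch shows one way the tools, tool_choice, and tool_call_id fields might fit together. The get_weather function, its schema, and the call ID are hypothetical. The shape of the assistant's tool call (id plus function.name and a JSON-encoded function.arguments string) follows the OpenAI convention and is assumed here, since this page does not spell it out.

```python
import json

get_weather_tool = {
    "type": "function",
    "function": {
        "name": "get_weather",  # hypothetical function
        "description": "Look up the current weather for a city.",
        "parameters": {
            "type": "object",
            "properties": {"city": {"type": "string"}},
            "required": ["city"],
        },
    },
}

payload = {
    "model": "moonshotai/Kimi-K2",
    "messages": [{"role": "user", "content": "What is the weather in Hangzhou?"}],
    "tools": [get_weather_tool],
    # Force a call to this specific function.
    "tool_choice": {"type": "function", "function": {"name": "get_weather"}},
}


def tool_result_message(tool_call, result):
    """Build the role="tool" message that answers one assistant tool call."""
    return {
        "role": "tool",
        "tool_call_id": tool_call["id"],
        "content": json.dumps(result),
    }


# Assumed OpenAI-style shape of one entry in message.tool_calls.
example_call = {
    "id": "call_123",
    "type": "function",
    "function": {"name": "get_weather", "arguments": '{"city": "Hangzhou"}'},
}
arguments = json.loads(example_call["function"]["arguments"])
reply = tool_result_message(example_call, {"city": arguments["city"], "temp_c": 21})
```

The reply message would then be appended to messages, after the assistant message that issued the call, and sent in a follow-up request.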

response_format object optional

Forces the response format: {type:"json_object"} or {type:"json_schema", json_schema:{...}}. See Structured output for details.
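A minimal sketch of the simpler json_object mode: the request asks for JSON, and the client decodes the first choice's content. The prompt wording, the parse_json_reply helper, and the sample response are illustrative assumptions, not output from the service.

```python
import json

payload = {
    "model": "moonshotai/Kimi-K2",
    "messages": [
        {"role": "system", "content": "Reply with a JSON object with keys 'term' and 'summary'."},
        {"role": "user", "content": "Explain KV cache in one sentence."},
    ],
    "response_format": {"type": "json_object"},
}


def parse_json_reply(response):
    """Decode the first choice's content as JSON; raises ValueError if it is not valid JSON."""
    content = response["choices"][0]["message"]["content"]
    return json.loads(content)


# Hypothetical response, trimmed to the fields used here.
fake_response = {
    "choices": [
        {
            "index": 0,
            "message": {
                "role": "assistant",
                "content": '{"term": "KV cache", "summary": "Stores attention keys and values."}',
            },
            "finish_reason": "stop",
        }
    ]
}
parsed = parse_json_reply(fake_response)
```

Even in JSON mode it seems prudent to keep the ValueError path, for example when finish_reason is length and the object was cut off mid-way.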

logprobs boolean false

Whether to return the log probability of each sampled token in the response.

top_logprobs integer optional

When logprobs is true, returns the N most likely candidates at each step with their log probabilities. Range: 0 to 20.
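Log probabilities are usually easier to read once converted back to plain probabilities with exp. The sketch below assumes the OpenAI-style layout choices[i].logprobs.content[j].{token, logprob}; this page does not document where the values appear in the response, so treat the field path as an assumption and token_probabilities as a hypothetical helper.

```python
import math


def token_probabilities(choice):
    """Convert each sampled token's log probability into a plain probability."""
    entries = choice["logprobs"]["content"]  # assumed OpenAI-style field path
    return [(entry["token"], math.exp(entry["logprob"])) for entry in entries]


# Hand-written sample; a logprob of 0.0 corresponds to probability 1.0.
choice = {
    "logprobs": {
        "content": [
            {"token": "KV", "logprob": 0.0},
            {"token": " cache", "logprob": math.log(0.5)},
        ]
    }
}
probabilities = token_probabilities(choice)
```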

user string optional

Stable identifier for the end user, used for abuse monitoring and organization-level auditing.

Response

Non-streaming: returns a chat.completion object.
Streaming (stream=true): returns chat.completion.chunk objects one at a time over SSE, terminated by data: [DONE].

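id string

Unique identifier for this request.

object string

Always chat.completion, or chat.completion.chunk when streaming.

created integer

UNIX timestamp, in seconds.

model string

ID of the model that actually served the request.

choices array

List of candidate replies.

index integer

Index of the candidate, starting at 0.

message object

Non-streaming only: the complete message, {role, content, tool_calls?}.

delta object

Streaming only: the incremental content fragment.

finish_reason enum

One of stop / length / tool_calls / content_filter.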
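When streaming, the client has to stitch the delta fragments back together per candidate. A sketch under two assumptions: each delta carries its text under a content key, and finish_reason stays empty until a candidate's final chunk (both follow OpenAI's behavior but are not stated on this page). accumulate_stream is a hypothetical name and the chunks are hand-written.

```python
def accumulate_stream(chunks):
    """Merge chat.completion.chunk objects into per-choice text and finish reasons."""
    texts, finish_reasons = {}, {}
    for chunk in chunks:
        for choice in chunk.get("choices", []):
            index = choice["index"]
            delta = choice.get("delta") or {}
            # A delta may omit content (for example a role-only first chunk).
            texts[index] = texts.get(index, "") + (delta.get("content") or "")
            if choice.get("finish_reason"):
                finish_reasons[index] = choice["finish_reason"]
    return texts, finish_reasons


chunks = [
    {"choices": [{"index": 0, "delta": {"role": "assistant"}, "finish_reason": None}]},
    {"choices": [{"index": 0, "delta": {"content": "KV cache"}, "finish_reason": None}]},
    {"choices": [{"index": 0, "delta": {"content": " stores"}, "finish_reason": None}]},
    {"choices": [{"index": 0, "delta": {}, "finish_reason": "stop"}]},
]
texts, finish_reasons = accumulate_stream(chunks)
```

Keying on index keeps the candidates separate when n is greater than 1 and their chunks arrive interleaved.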

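usage object

Token usage statistics.

prompt_tokens integer

Number of input tokens.

completion_tokens integer

Number of output tokens.

total_tokens integer

Total number of tokens.

cache_hit_ratio number wylon extension

Fraction of this request that hit the system-level context cache (0 to 1). The more repeated prefix a request shares with earlier ones, the higher the ratio and the more favorable the billing.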

Example response

{
  "id": "cmpl-9f1c7b2e8a41",
  "object": "chat.completion",
  "created": 1744828800,
  "model": "moonshotai/Kimi-K2",
  "choices": [
    {
      "index": 0,
      "message": { "role": "assistant", "content": "KV cache stores …" },
      "finish_reason": "stop"
    }
  ],
  "usage": {
    "prompt_tokens": 24,
    "completion_tokens": 128,
    "total_tokens": 152,
    "cache_hit_ratio": 0.71
  }
}
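As an illustration of reading these fields, the sketch below pulls the reply text and usage figures out of a parsed chat.completion object shaped like the example. The summarize helper is hypothetical, and treating usage and cache_hit_ratio as possibly absent is a defensive assumption rather than documented behavior.

```python
def summarize(response):
    """Extract the first reply and the usage figures from a chat.completion object."""
    choice = response["choices"][0]
    usage = response.get("usage", {})
    return {
        "text": choice["message"]["content"],
        # "length" means the reply was cut off by max_tokens or the context window.
        "truncated": choice["finish_reason"] == "length",
        "total_tokens": usage.get("total_tokens"),
        "cache_hit_ratio": usage.get("cache_hit_ratio"),  # wylon extension
    }


# Same values as the example response on this page.
response = {
    "id": "cmpl-9f1c7b2e8a41",
    "object": "chat.completion",
    "created": 1744828800,
    "model": "moonshotai/Kimi-K2",
    "choices": [
        {
            "index": 0,
            "message": {"role": "assistant", "content": "KV cache stores …"},
            "finish_reason": "stop",
        }
    ],
    "usage": {
        "prompt_tokens": 24,
        "completion_tokens": 128,
        "total_tokens": 152,
        "cache_hit_ratio": 0.71,
    },
}
summary = summarize(response)
```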

Returned for invalid parameters, authentication failures, or rate limiting. The response body is an OpenAI-compatible error envelope.

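error.type string

Broad error category, such as invalid_request_error, authentication_error, or rate_limit_exceeded.

error.message string

Human-readable description of the error.

error.code string

Fine-grained error code (optional), useful for programmatic handling.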

Example: 429 rate limit

{
  "error": {
    "type": "rate_limit_exceeded",
    "message": "Rate limit reached for moonshotai/Kimi-K2 on tier 2.",
    "retry_after": 3.4
  }
}
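One way a client might surface this envelope is to map it onto an exception. In the sketch below, WylonAPIError and parse_error are hypothetical names; retry_after is read because it appears in the 429 example, and its unit is assumed to be seconds since this page does not say.

```python
import json


class WylonAPIError(Exception):
    """Hypothetical exception type carrying the fields of the error envelope."""

    def __init__(self, status, error):
        super().__init__(error.get("message", ""))
        self.status = status
        self.type = error.get("type")
        self.code = error.get("code")  # optional per the field list
        self.retry_after = error.get("retry_after")  # assumed to be seconds


def parse_error(status, body):
    """Turn an HTTP status and a JSON error body into a WylonAPIError."""
    envelope = json.loads(body)
    return WylonAPIError(status, envelope.get("error", {}))


body = """
{
  "error": {
    "type": "rate_limit_exceeded",
    "message": "Rate limit reached for moonshotai/Kimi-K2 on tier 2.",
    "retry_after": 3.4
  }
}
"""
err = parse_error(429, body)
```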

Transient server-side failure. Retrying with exponential backoff and jitter is recommended.

Example: 503

{
  "error": {
    "type": "server_overloaded",
    "message": "Upstream model is temporarily unavailable, please retry."
  }
}
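A possible shape for the recommended retry policy is sketched below, using the "full jitter" variant of exponential backoff. The base delay, cap, and retry count are arbitrary illustrative values, not limits published by the service. Honoring a server-supplied retry_after as a minimum wait in seconds is an assumption based on the 429 example.

```python
import random


def backoff_delays(max_retries=5, base=0.5, cap=30.0, rng=random.random):
    """Yield full-jitter delays: rng() scaled by min(cap, base * 2**attempt)."""
    for attempt in range(max_retries):
        yield rng() * min(cap, base * (2 ** attempt))


def next_delay(computed, retry_after=None):
    """Wait at least retry_after when the server supplied one."""
    return computed if retry_after is None else max(computed, retry_after)


# With rng fixed at 1.0 the upper bounds of the schedule become visible.
upper_bounds = list(backoff_delays(max_retries=4, rng=lambda: 1.0))
```

In a retry loop, each failed attempt would sleep for next_delay(delay, retry_after) seconds before the next try and give up once the generator is exhausted. Randomizing the whole interval, rather than adding a small offset, spreads out clients that were throttled at the same moment.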