wylon

Quickstart

Token Factory is wylon's developer-facing inference service — a single API for the leading open-source LLMs (MiniMax, Kimi, GLM, Qwen, DeepSeek), running on a purpose-built GPU compute backbone. Drop in any OpenAI- or Anthropic-compatible client and you can ship your first request in minutes.

Overview

Browse the model catalog to pick a model, then call the OpenAI / Anthropic-compatible API from your application. Works seamlessly with common frameworks — LangChain, LlamaIndex, LiteLLM, the OpenAI SDK, and more.

Start building now

Follow the three steps below to send your first request to the wylon inference API.

  1. Create an account

    Sign up for a free wylon account in the dashboard. After completing identity verification you can use every available model; see Billing & consumption for plan details and free credit.

    info
    Already using OpenAI or Anthropic? wylon is drop-in compatible with both SDKs — you can migrate in minutes. See Switch to wylon.
  2. Generate an API key

    Navigate to Account Settings → API keys in the dashboard and click Create new key. Copy the generated API token. Detailed key management and scope rules live in API Keys.

    Export it to your shell environment so the code samples below can authenticate:

    # Add to ~/.bashrc or ~/.profile
    export WYLON_API_KEY="wl-••••••••••••••••••••••••••••••••"
    export WYLON_BASE_URL="https://api.wylon.cn/v1"
    # Add to ~/.zshrc
    export WYLON_API_KEY="wl-••••••••••••••••••••••••••••••••"
    export WYLON_BASE_URL="https://api.wylon.cn/v1"
    # PowerShell — persistent for the current user
    [Environment]::SetEnvironmentVariable("WYLON_API_KEY", "wl-••••••••••••••••••••••••••••••••", "User")
    [Environment]::SetEnvironmentVariable("WYLON_BASE_URL", "https://api.wylon.cn/v1", "User")
    key
    Keep your key secret. Never commit API keys to source control or ship them in client-side bundles. Use a secret manager or server-side proxy for production workloads.
  3. Send your first request

    Point any OpenAI-compatible client at https://api.wylon.cn/v1 and use a supported model ID. The examples below send the same chat prompt to Kimi K2 from Python, Node.js, cURL, and Go.

    Python:

    from openai import OpenAI
    import os
    
    client = OpenAI(
        api_key=os.environ["WYLON_API_KEY"],
        base_url="https://api.wylon.cn/v1",
    )
    
    response = client.chat.completions.create(
        model="moonshotai/kimi-k2.5",
        messages=[
            {"role": "system", "content": "You are a helpful assistant."},
            {"role": "user", "content": "Explain KV cache in one paragraph."},
        ],
        temperature=0.6,
        max_tokens=512,
    )
    
    print(response.choices[0].message.content)

    Node.js:

    import OpenAI from "openai";
    
    const client = new OpenAI({
      apiKey: process.env.WYLON_API_KEY,
      baseURL: "https://api.wylon.cn/v1",
    });
    
    const response = await client.chat.completions.create({
      model: "moonshotai/kimi-k2.5",
      messages: [
        { role: "system", content: "You are a helpful assistant." },
        { role: "user",   content: "Explain KV cache in one paragraph." },
      ],
      temperature: 0.6,
      max_tokens: 512,
    });
    
    console.log(response.choices[0].message.content);

    cURL:

    curl https://api.wylon.cn/v1/chat/completions \
      -H "Content-Type: application/json" \
      -H "Authorization: Bearer $WYLON_API_KEY" \
      -d '{
        "model": "moonshotai/kimi-k2.5",
        "messages": [
          {"role": "system", "content": "You are a helpful assistant."},
          {"role": "user",   "content": "Explain KV cache in one paragraph."}
        ],
        "temperature": 0.6,
        "max_tokens": 512
      }'

    Go:

    package main
    
    import (
        "context"
        "fmt"
        "os"
    
        "github.com/sashabaranov/go-openai"
    )
    
    func main() {
        cfg := openai.DefaultConfig(os.Getenv("WYLON_API_KEY"))
        cfg.BaseURL = "https://api.wylon.cn/v1"
        client := openai.NewClientWithConfig(cfg)
    
        resp, err := client.CreateChatCompletion(context.Background(),
            openai.ChatCompletionRequest{
                Model: "moonshotai/kimi-k2.5",
                Messages: []openai.ChatCompletionMessage{
                    {Role: "user", Content: "Explain KV cache in one paragraph."},
                },
            })
        if err != nil {
            fmt.Println(err)
            return
        }
        fmt.Println(resp.Choices[0].Message.Content)
    }

    A successful response returns the assistant’s completion, the tokens consumed, and the share of the prompt served from wylon’s system-level KV cache.

    {
      "id": "cmpl-9f1c7b2e8a41",
      "object": "chat.completion",
      "model": "moonshotai/kimi-k2.5",
      "created": 1744828800,
      "choices": [{
        "index": 0,
        "message": {
          "role": "assistant",
          "content": "KV cache stores the key and value tensors …"
        },
        "finish_reason": "stop"
      }],
      "usage": {
        "prompt_tokens": 24,
        "completion_tokens": 128,
        "total_tokens": 152,
        "cache_hit_ratio": 0.71      // wylon extension: context-cache hit ratio
      }
    }

    The cache_hit_ratio in usage is a wylon extension to the OpenAI schema: it reports the share of input tokens served from the system-level context cache (the more repeated prefix, the higher the hit ratio and the lower the cost). All other fields match the OpenAI protocol.
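To make the savings arithmetic concrete, the usage block can be split into cached and fresh token counts. A minimal sketch: `cache_summary` is a hypothetical helper, and the per-token prices are placeholders, not wylon's published rates.

```python
def cache_summary(usage, full_price=1.0, cached_price=0.25):
    """Split prompt tokens into cache hits and fresh tokens.

    full_price / cached_price are illustrative per-million-token
    rates, NOT wylon's actual pricing.
    """
    prompt = usage["prompt_tokens"]
    cached = round(prompt * usage.get("cache_hit_ratio", 0.0))  # tokens served from cache
    fresh = prompt - cached                                     # tokens computed from scratch
    cost = (fresh * full_price + cached * cached_price) / 1_000_000
    return {"cached_tokens": cached, "fresh_tokens": fresh, "prompt_cost": cost}

# With the example response above: 71% of the 24 prompt tokens came from cache.
summary = cache_summary({"prompt_tokens": 24, "cache_hit_ratio": 0.71})
print(summary["cached_tokens"], summary["fresh_tokens"])  # 17 7
```

Because cached tokens are billed at the lower rate, keeping a stable prompt prefix (system message, tool definitions) directly reduces cost.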

API endpoints

The wylon inference API follows the OpenAI protocol. The table below lists the endpoints you’ll use most often.

Method & path            Purpose                      Notes
POST  /chat/completions  Conversational generation    Supports streaming, function calling, structured output.
POST  /completions       Legacy text completion       For models without chat templates.
GET   /models            List available models        Returns the model IDs available to your account.
POST  /batches           Submit a batch job           Process large async workloads at discounted pricing. See Batch.
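With "stream": true, /chat/completions returns server-sent events: one data: line per chunk, terminated by data: [DONE]. A minimal sketch of accumulating the content deltas, assuming OpenAI-style chunk objects; the sample lines below are fabricated for illustration:

```python
import json

def collect_stream(lines):
    """Accumulate content deltas from OpenAI-style SSE `data:` lines."""
    text = []
    for line in lines:
        if not line.startswith("data: "):
            continue  # skip blank keep-alive lines and comments
        payload = line[len("data: "):]
        if payload == "[DONE]":
            break  # end-of-stream sentinel
        chunk = json.loads(payload)
        delta = chunk["choices"][0]["delta"]
        if "content" in delta:  # first chunk may carry only the role
            text.append(delta["content"])
    return "".join(text)

sample = [
    'data: {"choices": [{"delta": {"role": "assistant"}}]}',
    'data: {"choices": [{"delta": {"content": "KV cache "}}]}',
    'data: {"choices": [{"delta": {"content": "stores keys and values."}}]}',
    "data: [DONE]",
]
print(collect_stream(sample))  # KV cache stores keys and values.
```

In practice the SDKs do this for you (e.g. iterating over the response object when stream=True); the sketch only shows the wire-level shape.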

Common parameters

Every chat completion request accepts the parameters below. Defaults are tuned for balanced quality and latency.

Parameter        Type     Description
model            string   Model ID, e.g. moonshotai/kimi-k2.5. See all models.
messages         array    Ordered list of {role, content} turns. Roles: system, user, assistant, tool.
temperature      number   Sampling temperature between 0 and 2. Defaults to 0.7.
max_tokens       integer  Maximum tokens to generate. Bounded by the model’s context window.
stream           boolean  Return a server-sent event stream of token deltas.
tools            array    Function definitions the model may call. See Function calling.
response_format  object   Force JSON or a named schema. See Structured output.
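The messages parameter carries the whole conversation on every request: to continue a dialogue, append the assistant's reply and the next user turn before calling the API again. A small sketch; add_turn is a hypothetical helper, not part of any SDK:

```python
def add_turn(messages, role, content):
    """Append one {role, content} turn, rejecting roles the API does not accept."""
    if role not in {"system", "user", "assistant", "tool"}:
        raise ValueError(f"unknown role: {role!r}")
    messages.append({"role": role, "content": content})
    return messages

history = []
add_turn(history, "system", "You are a helpful assistant.")
add_turn(history, "user", "Explain KV cache in one paragraph.")
# ...send `history` as `messages`, then feed the reply back in:
add_turn(history, "assistant", "KV cache stores the key and value tensors …")
add_turn(history, "user", "Now compress that to one sentence.")
print([m["role"] for m in history])  # ['system', 'user', 'assistant', 'user']
```

Keeping the system message first and unchanged across turns also maximizes the context-cache hit ratio described above.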

Explore

You’re set up. Dive deeper into the capabilities you’ll need next.

Need help?

For support, reach out via Contact us; live service health is published on the status page.
