Billing & consumption
wylon uses pay-as-you-go billing metered per token, with optional prepaid balance and invoiced contracts for larger teams. This page explains how usage is measured, how charges accrue, and how to control spend with budgets and alerts.
Pricing model
Every inference request is billed on two dimensions:
- Input tokens — everything you send: the prompt, system message, tool definitions, and prior turns.
- Output tokens — everything the model generates, including tool-call arguments and reasoning tokens.
Rates are quoted per million tokens and vary by model. Batch jobs receive a discount on the standard rate. See the pricing page for current rates.
The token counts in the usage field of each response are authoritative:
they are what you will be charged for. Cached prefix tokens (when eligible) are billed at a
reduced rate and appear as cached_input_tokens.
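As a rough illustration of how the per-million rates combine with the usage counts, the sketch below estimates the charge for a single response. The rates are made-up placeholders, not wylon's prices, and `estimate_cost` is a hypothetical helper. It also assumes that the response usage field carries `input_tokens` and `output_tokens`, and that `input_tokens` includes the cached portion; check a real response before relying on either assumption.

```python
from decimal import Decimal

# Hypothetical per-million-token rates in CNY; real rates are on the pricing page.
EXAMPLE_RATES = {
    "input": Decimal("2.00"),
    "cached_input": Decimal("0.50"),
    "output": Decimal("8.00"),
}

MILLION = Decimal(1_000_000)


def estimate_cost(usage: dict, rates: dict = EXAMPLE_RATES) -> Decimal:
    """Estimate the charge for one response from its usage counts.

    Assumption: input_tokens includes cached_input_tokens, so the cached
    portion is subtracted before the full input rate is applied.
    """
    cached = usage.get("cached_input_tokens", 0)
    uncached = usage["input_tokens"] - cached
    total = (
        uncached * rates["input"]
        + cached * rates["cached_input"]
        + usage["output_tokens"] * rates["output"]
    )
    return total / MILLION


# Illustrative usage counts, not taken from a real response.
usage = {"input_tokens": 12_000, "cached_input_tokens": 8_000, "output_tokens": 1_500}
print(estimate_cost(usage))
```

Using `Decimal` rather than floats keeps the arithmetic exact, which matters when small per-request amounts are summed over many calls.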
Prepaid balance
Most teams fund their organization by topping up a prepaid balance with a credit card. Each
request decrements the balance in real time. When the balance crosses zero, new API calls
return 402 Insufficient balance until you top up again.
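Because a 402 persists until someone tops up, a client should treat it differently from a transient failure. The following is a minimal sketch of that idea, not an official client: `send` is a hypothetical callable that performs one request and returns a `(status_code, parsed_body)` pair, and the retry policy for 5xx responses is an assumption chosen for illustration.

```python
import time
from typing import Callable, Tuple


class InsufficientBalanceError(RuntimeError):
    """The API answered 402: the prepaid balance is exhausted."""


def send_with_retry(
    send: Callable[[], Tuple[int, dict]],
    max_attempts: int = 3,
    backoff_s: float = 1.0,
) -> dict:
    """Call send() and retry only on 5xx; never retry on 402.

    send is a hypothetical callable returning (status_code, parsed_body).
    """
    for attempt in range(1, max_attempts + 1):
        status, body = send()
        if status == 402:
            # Retrying cannot succeed until the balance is topped up.
            raise InsufficientBalanceError("prepaid balance exhausted; top up to resume")
        if status < 500:
            return body  # success or another client error: hand back to the caller
        if attempt < max_attempts:
            time.sleep(backoff_s * attempt)  # simple linear backoff
    raise RuntimeError(f"server errors persisted after {max_attempts} attempts")


# Usage with a stubbed transport that always reports an empty balance.
try:
    send_with_retry(lambda: (402, {"error": "Insufficient balance"}))
except InsufficientBalanceError as exc:
    print("stop sending and alert the billing owner:", exc)
```

Failing fast on 402 avoids burning retries on a condition that only a top-up or auto-recharge can clear.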
| Action | Where | Notes |
|---|---|---|
| Top up | Dashboard → Billing → Add credit | Minimum ¥50, maximum ¥50,000 per transaction. |
| Auto-recharge | Dashboard → Billing → Auto-recharge | Set a trigger balance and recharge amount. |
| Payment method | Dashboard → Billing → Payment methods | Alipay, WeChat Pay, corporate transfer; enterprise users can switch to invoiced post-payment. |
Monthly invoicing
Teams spending over ¥10,000 / month can apply to switch to post-paid invoicing. We issue a VAT electronic invoice on the first business day of each month for the prior month’s usage, payable net-30 via corporate transfer. Submit a request via Contact us.
Usage dashboard
Dashboard → Usage breaks down consumption by model, project, API key, and day. Export any view as CSV for internal chargeback or BI pipelines.
# Query usage programmatically
# (the URL is quoted so the shell does not treat & as a background operator)
curl "https://api.wylon.cn/v1/usage?start=2026-04-01&end=2026-04-22" \
  -H "Authorization: Bearer $WYLON_ADMIN_KEY"
# The same query in Python, grouped by model
import os
import requests

r = requests.get(
    "https://api.wylon.cn/v1/usage",
    headers={"Authorization": f"Bearer {os.environ['WYLON_ADMIN_KEY']}"},
    params={"start": "2026-04-01", "end": "2026-04-22", "group_by": "model"},
)
r.raise_for_status()  # fail loudly on 4xx/5xx instead of parsing an error body
for row in r.json()["data"]:
    print(row["model"], row["input_tokens"], row["output_tokens"], row["cost_cny"])
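For chargeback or BI pipelines that pull from the API rather than the dashboard export, the rows can be written to CSV with the standard library. This is a sketch under assumptions: `rows_to_csv` is a hypothetical helper, the column set simply mirrors the fields printed in the example above, and the sample rows are invented for illustration rather than real API output.

```python
import csv
import io

# Columns mirror the fields used in the usage query example.
FIELDS = ["model", "input_tokens", "output_tokens", "cost_cny"]


def rows_to_csv(rows: list) -> str:
    """Serialize usage rows (a list of dicts) to CSV text with a header row."""
    buf = io.StringIO()
    # extrasaction="ignore" drops any fields not listed in FIELDS.
    writer = csv.DictWriter(buf, fieldnames=FIELDS, extrasaction="ignore")
    writer.writeheader()
    writer.writerows(rows)
    return buf.getvalue()


# Invented sample rows standing in for r.json()["data"].
sample_rows = [
    {"model": "model-a", "input_tokens": 1200, "output_tokens": 300, "cost_cny": "0.42"},
    {"model": "model-b", "input_tokens": 800, "output_tokens": 100, "cost_cny": "0.10"},
]
print(rows_to_csv(sample_rows))
```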
Budgets & spend caps
Set a hard monthly cap at the organization or project level. When usage reaches the cap,
subsequent requests are rejected with 429 Budget exceeded until the next billing
period starts or an admin raises the limit.
- Soft alerts — email + webhook when usage crosses 50 %, 80 %, and 100 % of budget.
- Hard cap — block further API calls at 100 %. Off by default, toggle per project.
- Per-key limits — set a daily spend ceiling on individual API keys for extra safety on untrusted clients.
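The alert thresholds and hard cap described above can be modeled in a few lines, which may help when testing a webhook consumer or reasoning about when calls get blocked. This is an illustrative sketch, not the platform's implementation: the function names are hypothetical, and it assumes a threshold fires once, at the moment spend first reaches it.

```python
ALERT_THRESHOLDS_PCT = (50, 80, 100)  # soft-alert levels from the list above


def crossed_thresholds(previous_spend, current_spend, budget):
    """Return the alert percentages newly crossed between two spend readings.

    Comparisons use integer percentages to avoid floating-point edge cases.
    """
    return [
        pct
        for pct in ALERT_THRESHOLDS_PCT
        if previous_spend * 100 < pct * budget <= current_spend * 100
    ]


def is_blocked(spend, budget, hard_cap_enabled):
    """With the hard cap on, calls are blocked once spend reaches the budget."""
    return hard_cap_enabled and spend >= budget


# Spend moved from 400 to 850 against a budget of 1000 (illustrative numbers).
print(crossed_thresholds(400, 850, 1000))
print(is_blocked(1000, 1000, hard_cap_enabled=True))
```

Comparing the previous and current readings, rather than only the current one, keeps an alert from firing again on every request after a threshold has been passed.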
Invoices & receipts
Every charge produces a PDF receipt downloadable from Dashboard → Billing → History. For tax-compliant invoices (VAT, Chinese fapiao, etc.), enter your company details under Billing → Tax information before the charge is issued.
Credits & trial
New organizations receive a free credit (valid for 60 days) once identity verification is complete. Credits are consumed before paid balance and are non-refundable. Research and open-source programs can apply for extended grants via Contact us.
Refunds
Unused prepaid balance is refundable within 30 days of top-up. Consumed tokens are non-refundable except in the case of a platform incident acknowledged on our status page, in which case service credits are issued in proportion to the impacted volume.
FAQ
- Are failed requests billed? No — 4xx and 5xx responses are not charged. Timed-out streams are billed for tokens actually produced.
- Are batch jobs billed differently? Yes — batch jobs are still billed per token, but at a discount versus the real-time rate. See Batch.
- Can I get a PO? Yes, on invoiced contracts — contact sales.