Per-model pricing
All prices in USD per 1 million tokens. Input = your prompt, Output = model's response. Prices already include our 25% markup.
| Model | Input (per 1M) | Output (per 1M) | Context |
|---|---|---|---|
| DeepSeek V4-Flash (Popular): deepseek-v4-flash · 1M context, 384K output, dual reasoning modes | ~~$0.14~~ $0.175 | ~~$0.28~~ $0.35 | 1M |
| DeepSeek V4-Pro (Flagship): deepseek-v4-pro · 1.6T params, SWE-bench 80.6%, approaches Claude Opus | ~~$0.4264~~ $0.533 | ~~$0.8528~~ $1.066 | 1M |
| DeepSeek Chat (legacy): deepseek-chat · Alias of v4-flash non-thinking · deprecates 2026-07-24 | ~~$0.14~~ $0.175 | ~~$0.28~~ $0.35 | 128K |
| DeepSeek Reasoner (legacy): deepseek-reasoner · Alias of v4-flash thinking · deprecates 2026-07-24 | ~~$0.4264~~ $0.533 | ~~$0.8528~~ $1.066 | 128K |
| Qwen Plus (New): qwen3.6-plus · Alibaba flagship, 78.8% SWE-Bench, 1M context | ~~$0.40~~ $0.50 | ~~$1.20~~ $1.50 | 1M |
| Qwen Max (Flagship): qwen3.6-max-preview · #1 on 6 coding benchmarks, closed weights | ~~$1.60~~ $2.00 | ~~$6.40~~ $8.00 | 260K |
| Qwen Turbo: qwen-turbo · Fast and cheap, good for high-volume | ~~$0.05~~ $0.0625 | ~~$0.20~~ $0.25 | 131K |
| GLM-5.1 (New): glm-5.1 · Zhipu flagship, #1 on SWE-Bench Pro | ~~$1.00~~ $1.25 | ~~$4.00~~ $5.00 | 200K |
| GLM-5 (New): glm-5 · 744B MoE, approaches Claude Opus-level coding | ~~$1.00~~ $1.25 | ~~$4.00~~ $5.00 | 200K |
| GLM-4.7: glm-4.7 · 73.8% SWE-Bench, best value for coding | ~~$0.15~~ $0.1875 | ~~$0.60~~ $0.75 | 128K |
| GLM-4.7 Flash (Low cost): glm-4.7-flash · Free tier, good for simple completions | ~~$0.05~~ $0.0625 | ~~$0.05~~ $0.0625 | 203K |
| Kimi K2.6 (New): kimi-k2.6 · Long-context coding stability, 256K window | ~~$0.60~~ $0.75 | ~~$3.00~~ $3.75 | 256K |
| Kimi K2.5: kimi-k2.5 · 1T MoE, 32B active, proven stable | ~~$0.60~~ $0.75 | ~~$3.00~~ $3.75 | 256K |
| MiniMax M2.7: MiniMax-M2.7 · 230B MoE, strong on software engineering | ~~$0.30~~ $0.375 | ~~$1.20~~ $1.50 | 245K |
Strikethrough shows the direct provider price; the price after it is the XinoAPI price you pay.
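To make the table concrete, here is a minimal sketch that computes the cost of a single request from the XinoAPI prices listed above. The `PRICES` dictionary and the `request_cost` helper are illustrative names, not part of any XinoAPI SDK, and only three models are included. The request shape in the example is hypothetical.

```python
# XinoAPI prices in USD per 1M tokens, copied from the pricing table.
# Each entry is (input_price, output_price); only a few models are shown.
PRICES = {
    "deepseek-v4-flash": (0.175, 0.35),
    "deepseek-v4-pro": (0.533, 1.066),
    "qwen3.6-plus": (0.50, 1.50),
}

def request_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """Return the USD cost of one request at the published per-1M-token rates."""
    input_price, output_price = PRICES[model]
    return (input_tokens * input_price + output_tokens * output_price) / 1_000_000

# Hypothetical request shape: 2,000 prompt tokens and 500 response tokens.
cost = request_cost("deepseek-v4-flash", 2_000, 500)
print(f"${cost:.6f}")
```

Multiplying the per-request figure by an expected request volume gives a rough monthly estimate for a workload.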
Real-world cost examples
What common workloads actually cost at XinoAPI prices.
Chat Assistant (500 users/day)
Code Assistant (10 devs)
Document Summary (bulk)
Reasoning Workflow
Free to start, no credit card
Every new account gets $2.00 in credits on signup. On DeepSeek V4-Flash that's roughly 11 million input tokens or 5.7 million output tokens, enough to build and ship a working prototype.
Production billing
No token-price discounting. Larger customers get operational guarantees, billing support, and dedicated routing instead of hidden price cuts.
For enterprise volumes (>$5K/month), invoicing, or dedicated routing, contact sales.
Data and compliance notes
Pricing is not the whole decision. XinoAPI is designed for users outside mainland China and routes requests to third-party model providers with different data policies.
| Control | Current policy | Why it matters |
|---|---|---|
| Mainland China access | Not permitted for registration, purchase, dashboard access, or API use. | Maintains a clear cross-border service boundary for Chinese LLM inference export. |
| Prompt/response storage | No plaintext content retention by default; billing uses metadata such as model, tokens, status, and timestamps. | Reduces data exposure for production agent and application workloads. |
| Provider terms | Users must comply with each upstream provider's terms, data policy, and regional restrictions. | XinoAPI is a gateway, not the developer or operator of upstream models. |
| Sensitive data | Use the Privacy SDK for local PII and secret redaction before sending prompts. | Provider-side policies vary, especially for models operated in mainland China. |
See the Compliance Center and Security Whitepaper for the full policy.
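As an illustration of what local redaction before sending a prompt can look like, here is a minimal regex-based sketch. It is not the Privacy SDK's API, which is not documented on this page. The two patterns, including the `sk-` key prefix, are assumptions chosen for the example and cover only a small fraction of real PII and secrets.

```python
import re

# Illustrative patterns only; real PII and secret detection needs far more coverage.
# The "sk-" key prefix is an assumption for this example.
PATTERNS = {
    "EMAIL": re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}"),
    "API_KEY": re.compile(r"\bsk-[A-Za-z0-9]{16,}\b"),
}

def redact(text: str) -> str:
    """Replace each match with a placeholder such as [EMAIL] before the prompt leaves the machine."""
    for label, pattern in PATTERNS.items():
        text = pattern.sub(f"[{label}]", text)
    return text

print(redact("Contact jane@example.com, key sk-abcdef1234567890XYZ"))
```

Running the redaction locally means the upstream provider only ever sees the placeholders, whichever provider's data policy applies.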
Pricing FAQ
Is XinoAPI cheaper than using DeepSeek or Qwen directly?
No. XinoAPI prices are 25% above direct provider prices, so going direct is 20% cheaper per token. The markup pays for unified billing, no-KYC access, sub-200ms latency from Singapore, and the included Privacy SDK. If you're based in China and only use one model, direct providers will be cheaper. If you're outside China, need multiple models, or want built-in security features, XinoAPI is usually the better economic choice despite the markup.
Are there any hidden fees?
No. You pay per token at the published rate. There's no API request fee separate from tokens, no inactivity fee, and credits never expire. The minimum Stripe top-up is $20 to keep card processing overhead sustainable.
How is this different from OpenRouter's pricing?
OpenRouter charges 0% token markup + 5.5% fee on credit purchases (minimum $0.80). XinoAPI charges 25% token markup + 0% purchase fee. At typical usage (~$50-100/month), OpenRouter ends up roughly 15% cheaper on pure token cost. XinoAPI's value is in the security features, Chinese model specialization, and no-KYC requirement, not price competition.
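The comparison can be checked with a short sketch. It assumes both gateways pass through the same provider token price and that OpenRouter credits are bought in a single top-up that exactly covers the month's usage. Both function names are illustrative.

```python
def openrouter_cost(provider_token_cost: float) -> float:
    """Cost via OpenRouter: 0% token markup plus a 5.5% credit purchase fee (minimum $0.80)."""
    fee = max(0.055 * provider_token_cost, 0.80)
    return provider_token_cost + fee

def xinoapi_cost(provider_token_cost: float) -> float:
    """Cost via XinoAPI: 25% token markup, no purchase fee."""
    return provider_token_cost * 1.25

# Usage is expressed at direct provider prices, in USD per month.
for usage in (50.0, 100.0):
    o, x = openrouter_cost(usage), xinoapi_cost(usage)
    print(f"usage ${usage:.0f}: OpenRouter ${o:.2f}, XinoAPI ${x:.2f}, "
          f"OpenRouter cheaper by {(x - o) / x:.1%}")
```

Under these assumptions the gap is about 15.6% of the XinoAPI cost at both usage levels. Splitting purchases into many small top-ups would raise the OpenRouter figure because of the $0.80 minimum fee.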
What payment methods are accepted?
Credit and debit cards (Visa, Mastercard, AmEx) via Stripe. Bank transfers (ACH, wire) for top-ups above $500. Cryptocurrency payments are coming soon. We do not accept Alipay or WeChat Pay; use the direct provider APIs if you need these payment methods.
Do credits expire?
No. Credits never expire on any plan. Unused credits remain in your account indefinitely.
Can I get a refund for unused credits?
Yes, within 30 days of purchase and provided no more than 10% of the credit has been consumed. Contact support@xinoapi.com with your account email and order ID.
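The two refund conditions can be expressed as a small check. This is a sketch of the stated policy, not XinoAPI's actual billing logic. Treating day 30 and exactly 10% consumption as still eligible is an assumption, and `refund_eligible` is a hypothetical helper name.

```python
from datetime import date

REFUND_WINDOW_DAYS = 30       # "within 30 days of purchase"
MAX_CONSUMED_FRACTION = 0.10  # "no more than 10% of the credit has been consumed"

def refund_eligible(purchased_on: date, today: date,
                    credit_usd: float, consumed_usd: float) -> bool:
    """Check both refund conditions; boundaries are treated as inclusive (assumption)."""
    within_window = 0 <= (today - purchased_on).days <= REFUND_WINDOW_DAYS
    low_usage = consumed_usd <= MAX_CONSUMED_FRACTION * credit_usd
    return within_window and low_usage

# $50 top-up with $4 (8%) consumed, checked 19 days and 44 days after purchase.
print(refund_eligible(date(2026, 6, 1), date(2026, 6, 20), 50.0, 4.0))
print(refund_eligible(date(2026, 6, 1), date(2026, 7, 15), 50.0, 4.0))
```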
How many tokens will $1 buy me?
Depends on the model. On DeepSeek V4-Flash input, $1 buys about 5.7 million tokens (roughly 4 million words). On DeepSeek V4-Pro output, $1 buys about 940,000 tokens. Use the cost calculator for exact estimates based on your expected request shape.
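The figures above follow directly from the per-1M-token prices. A minimal sketch, with `tokens_per_dollar` as an illustrative helper name:

```python
def tokens_per_dollar(price_per_million_usd: float) -> float:
    """Tokens that $1 buys at a given USD price per 1M tokens."""
    return 1_000_000 / price_per_million_usd

print(round(tokens_per_dollar(0.175)))  # DeepSeek V4-Flash input price
print(round(tokens_per_dollar(1.066)))  # DeepSeek V4-Pro output price
```

The same function works for any row of the pricing table. A real request mixes input and output tokens, so the effective number sits between the input and output figures for the chosen model.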
Do you charge for failed requests?
No. If an upstream provider returns a 5xx error or the request fails before the model generates output, you're not charged. Rate limit errors (429) and your own malformed requests (4xx) also don't consume credits.
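For reconciling your own usage logs against this policy, the rule can be sketched as a single predicate. The function name and its two inputs are hypothetical. The policy above does not say how a request that fails partway through generating output is billed, so that case is not covered here.

```python
def is_billable(status_code: int, output_generated: bool) -> bool:
    """Return True only when a request should consume credits under the stated policy."""
    if status_code >= 400:
        # 4xx responses (including 429 rate limits) and upstream 5xx errors are not charged.
        return False
    # Non-error status: charged only if the model actually generated output.
    return output_generated

print(is_billable(200, True))   # normal completion
print(is_billable(429, False))  # rate limited
print(is_billable(502, False))  # upstream provider error
```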
Is the Privacy SDK free?
Yes. The XinoAPI Privacy SDK is MIT-licensed and free for any use, including with other LLM providers. Install from PyPI with pip install xinoapi-privacy.
Do you offer a free tier for students / open-source projects?
Yes. Open-source maintainers and students can apply for $20/month in free credits by emailing community@xinoapi.com with a link to their project or a copy of their student ID.
What's the price of DeepSeek V4 on XinoAPI?
DeepSeek V4-Flash costs $0.175 per 1M input tokens and $0.35 per 1M output tokens. DeepSeek V4-Pro costs $0.533 input and $1.066 output per 1M tokens, based on Tencent Cloud's June 2026 V4-Pro price cut converted at Stripe's 7.0380 CNY/USD rate with a 25% XinoAPI markup. Both models support 1M context and 384K max output.
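The markup step of this calculation can be reproduced from the direct provider prices shown in the pricing table. A minimal sketch, with `xinoapi_price` as an illustrative helper name; the CNY-to-USD conversion is not reproduced here because the original CNY prices are not stated on this page.

```python
MARKUP = 0.25  # XinoAPI's stated token markup

def xinoapi_price(direct_price_usd: float) -> float:
    """Apply the 25% markup to a direct provider price (USD per 1M tokens)."""
    return round(direct_price_usd * (1 + MARKUP), 4)

# Direct provider prices are the struck-through values in the pricing table.
print(xinoapi_price(0.14), xinoapi_price(0.28))      # DeepSeek V4-Flash input, output
print(xinoapi_price(0.4264), xinoapi_price(0.8528))  # DeepSeek V4-Pro input, output
```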
Which DeepSeek model should I use?
Use deepseek-v4-flash for general tasks (chat, RAG, code completion) — it's the successor to V3.2 at the same price. Use deepseek-v4-pro when you need flagship reasoning performance (SWE-bench 80.6%, approaches Claude Opus 4.6). Note that deepseek-chat and deepseek-reasoner are now aliases of V4-Flash and will be deprecated on 2026-07-24 — migrate to explicit V4 model IDs before then.
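One way to prepare for the deprecation is to centralize the alias mapping in your own code. The sketch below is an assumption about how a client might do that: `LEGACY_ALIASES` and `migrate_model` are hypothetical names, and the boolean only records which reasoning mode the alias implied. How thinking mode is actually selected in a request is not described on this page.

```python
from typing import Optional, Tuple

# Legacy alias -> (explicit V4 model ID, thinking mode implied by the alias).
LEGACY_ALIASES = {
    "deepseek-chat": ("deepseek-v4-flash", False),     # non-thinking
    "deepseek-reasoner": ("deepseek-v4-flash", True),  # thinking
}

def migrate_model(model: str) -> Tuple[str, Optional[bool]]:
    """Map a legacy alias to its explicit model ID; other IDs pass through unchanged."""
    if model in LEGACY_ALIASES:
        return LEGACY_ALIASES[model]
    return model, None  # None: no reasoning mode is implied by the model ID

print(migrate_model("deepseek-reasoner"))
print(migrate_model("deepseek-v4-pro"))
```

Routing every model ID through one function like this keeps the migration to a single change when the aliases are removed on 2026-07-24.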
What's the cheapest model on XinoAPI?
GLM-4.7 Flash is free (limited quota). Among paid models, Qwen Turbo (qwen-turbo) at $0.0625 input / $0.25 output per 1M tokens is the cheapest. For quality-to-cost ratio, DeepSeek V4-Flash ($0.175 / $0.35) with 1M context is usually the best choice.
Start building with $2 free credits
No credit card required. 5 Chinese LLM families. Unified OpenAI-compatible API.