XinoAPI Documentation
One unified OpenAI-compatible API for DeepSeek, Qwen, GLM, Kimi, and MiniMax. No KYC required. Get started in 2 minutes.
Quick start
The XinoAPI endpoint is a drop-in replacement for the OpenAI API endpoint, so the official OpenAI SDKs work unchanged. Change the API key and the base URL, pick a model ID, and you're using Chinese LLMs.
1. Get your API key
Sign up at api.xinoapi.com/register. You'll get $2.00 in free credits and can create API keys from the dashboard. Keys start with xino-.
2. Install the OpenAI SDK
pip install openai                          # Python
npm install openai                          # Node.js
go get github.com/sashabaranov/go-openai    # Go
3. Send your first request
# Python
from openai import OpenAI

client = OpenAI(
    api_key="xino-your-key-here",
    base_url="https://api.xinoapi.com/v1",
)

response = client.chat.completions.create(
    model="deepseek-chat",
    messages=[
        {"role": "user", "content": "Hello!"}
    ],
)
print(response.choices[0].message.content)
// Node.js
import OpenAI from "openai";

const client = new OpenAI({
  apiKey: "xino-your-key-here",
  baseURL: "https://api.xinoapi.com/v1",
});

const response = await client.chat.completions.create({
  model: "deepseek-chat",
  messages: [{ role: "user", content: "Hello!" }],
});
console.log(response.choices[0].message.content);
// Go
package main

import (
	"context"
	"fmt"
	"log"

	openai "github.com/sashabaranov/go-openai"
)

func main() {
	config := openai.DefaultConfig("xino-your-key-here")
	config.BaseURL = "https://api.xinoapi.com/v1"
	client := openai.NewClientWithConfig(config)

	resp, err := client.CreateChatCompletion(
		context.Background(),
		openai.ChatCompletionRequest{
			Model: "deepseek-chat",
			Messages: []openai.ChatCompletionMessage{
				{Role: "user", Content: "Hello!"},
			},
		},
	)
	if err != nil {
		// Without this check, a failed request would panic on resp.Choices[0].
		log.Fatalf("chat completion failed: %v", err)
	}
	fmt.Println(resp.Choices[0].Message.Content)
}
# cURL
curl https://api.xinoapi.com/v1/chat/completions \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer xino-your-key-here" \
  -d '{
    "model": "deepseek-v4-flash",
    "messages": [{"role": "user", "content": "Hello!"}]
  }'
Your first response should come back in under 200ms. If you see an authentication error, double-check your API key in the dashboard.
Authentication
All requests require a bearer token in the Authorization header:
Authorization: Bearer xino-your-key-here
You can create and revoke keys at any time from api.xinoapi.com/token. Each key has its own usage quota and can be restricted to specific models.
Never commit API keys to git. Use environment variables: os.environ["XINOAPI_KEY"] in Python, process.env.XINOAPI_KEY in Node.js.
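As a minimal sketch of the advice above, the helper below reads the key from the environment and fails fast when it is missing or malformed. The name load_api_key is hypothetical and not part of any SDK; the XINOAPI_KEY variable and the xino- prefix are taken from this page.

```python
import os


def load_api_key(env=os.environ, var="XINOAPI_KEY"):
    """Return the API key from the environment, failing fast if it is absent.

    Hypothetical helper for illustration; not part of any SDK.
    """
    key = env.get(var, "").strip()
    if not key:
        raise RuntimeError(f"Set the {var} environment variable first")
    if not key.startswith("xino-"):
        # This page states that keys start with "xino-"; anything else is
        # probably a key for a different provider.
        raise RuntimeError(f"{var} does not look like a XinoAPI key")
    return key
```

Pass the result as api_key=load_api_key() when constructing the OpenAI client, so the key never appears in source control.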
Available models
Pass the model ID as the model parameter. Full pricing at /pricing.
DeepSeek
Alibaba Qwen
Zhipu GLM
Moonshot Kimi
MiniMax
Chat completions
Standard OpenAI-compatible chat completion endpoint. It supports the usual OpenAI SDK features: system prompts, multi-turn conversations, temperature/top_p, max_tokens, stop sequences, JSON mode, and function calling. Function calling support varies by model (see Function calling).
response = client.chat.completions.create(
    model="deepseek-chat",
    messages=[
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Explain quantum entanglement in 3 sentences."},
    ],
    temperature=0.7,
    max_tokens=500,
)
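Multi-turn conversations work by resending the full message history on every request. The sketch below shows one way to keep that history; the Conversation class is a hypothetical helper, and it assumes only the client.chat.completions.create call already used on this page.

```python
class Conversation:
    """Keeps the running message list for a multi-turn chat (illustrative)."""

    def __init__(self, client, model, system_prompt=None):
        self.client = client
        self.model = model
        self.messages = []
        if system_prompt:
            self.messages.append({"role": "system", "content": system_prompt})

    def ask(self, text, **params):
        """Send one user turn and record the assistant reply in the history."""
        self.messages.append({"role": "user", "content": text})
        response = self.client.chat.completions.create(
            model=self.model,
            messages=list(self.messages),  # send a copy of the full history
            **params,  # e.g. temperature=0.7, max_tokens=500
        )
        reply = response.choices[0].message.content
        self.messages.append({"role": "assistant", "content": reply})
        return reply
```

Each call to ask() sends every earlier turn again, so long conversations consume more input tokens per request.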
Streaming
Set stream=True to receive Server-Sent Events. The response streams token-by-token in the standard OpenAI format.
# Python
stream = client.chat.completions.create(
    model="deepseek-chat",
    messages=[{"role": "user", "content": "Write a haiku about code."}],
    stream=True,
)
for chunk in stream:
    # Guard against chunks that arrive with an empty choices list.
    if chunk.choices and chunk.choices[0].delta.content:
        print(chunk.choices[0].delta.content, end="", flush=True)
// Node.js
const stream = await client.chat.completions.create({
  model: "deepseek-chat",
  messages: [{ role: "user", content: "Write a haiku." }],
  stream: true,
});
for await (const chunk of stream) {
  process.stdout.write(chunk.choices[0]?.delta?.content || "");
}
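If you need the complete text as well as the live tokens, the chunks can be accumulated as they arrive. This is a minimal sketch that assumes the standard OpenAI chunk shape (choices[0].delta.content); collect_stream is a hypothetical helper name.

```python
def collect_stream(stream, on_token=None):
    """Consume a chat-completion stream and return the concatenated text."""
    parts = []
    for chunk in stream:
        # A chunk may carry no choices or no text (for example a final
        # bookkeeping chunk), so guard before reading the delta.
        if not chunk.choices:
            continue
        text = chunk.choices[0].delta.content
        if text:
            parts.append(text)
            if on_token is not None:
                on_token(text)  # e.g. print the token as it arrives
    return "".join(parts)
```

Pass the stream object returned by create(..., stream=True) as the first argument, and a callback such as print for on_token if you want live output.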
Function calling
Most XinoAPI models support OpenAI-style function calling (tools parameter). DeepSeek, Qwen, and GLM have native support. Kimi and MiniMax have partial support — test your specific use case.
tools = [{
    "type": "function",
    "function": {
        "name": "get_weather",
        "description": "Get current weather for a city",
        "parameters": {
            "type": "object",
            "properties": {
                "city": {"type": "string"},
            },
            "required": ["city"],
        },
    },
}]

response = client.chat.completions.create(
    model="deepseek-chat",
    messages=[{"role": "user", "content": "What's the weather in Tokyo?"}],
    tools=tools,
)
tool_calls = response.choices[0].message.tool_calls
If you're building an agent that executes tool calls, use the Privacy SDK's response scanner to detect malicious tool calls injected by compromised intermediaries. See arXiv:2604.08407 for the threat model.
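Once the model returns tool_calls, your code has to run the requested functions and send the results back. The sketch below shows one way to do that, assuming the standard OpenAI tool-call shape (id, function.name, and function.arguments as a JSON string). The get_weather body, TOOL_REGISTRY, and run_tool_calls are hypothetical names for illustration.

```python
import json


def get_weather(city):
    """Stand-in implementation; a real one would call a weather service."""
    return {"city": city, "forecast": "unknown"}


# Only functions listed here can be invoked by the model.
TOOL_REGISTRY = {"get_weather": get_weather}


def run_tool_calls(tool_calls, registry=TOOL_REGISTRY):
    """Execute each requested tool and return role="tool" messages."""
    results = []
    for call in tool_calls or []:  # tool_calls is None when no tool was requested
        name = call.function.name
        try:
            if name not in registry:
                raise ValueError(f"unknown tool: {name}")
            # Arguments arrive as a JSON-encoded string, not a dict.
            args = json.loads(call.function.arguments)
            output = registry[name](**args)
        except Exception as exc:
            # Report the failure to the model instead of crashing the agent.
            output = {"error": str(exc)}
        results.append({
            "role": "tool",
            "tool_call_id": call.id,
            "content": json.dumps(output),
        })
    return results
```

To continue the conversation, append response.choices[0].message and then the returned tool messages to messages, and call client.chat.completions.create again. Looking tools up in an explicit registry means a tool name the application never registered is rejected rather than executed.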
Privacy SDK
The XinoAPI Privacy SDK is a drop-in replacement for the OpenAI client that adds three defense layers: PII redaction, response threat scanning, and hash-chained audit logs. Free and open source (MIT).
pip install xinoapi-privacy
from xinoapi_privacy import PrivateClient

client = PrivateClient(
    api_key="xino-your-key",
    base_url="https://api.xinoapi.com/v1",
)

# PII is redacted before sending, restored in the response.
# Response is scanned for threats. Audit log maintains hash chain.
response = client.chat.completions.create(
    model="deepseek-chat",
    messages=[{
        "role": "user",
        "content": "Email john@acme.com about Q3.",
    }],
)
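To make the redact-then-restore idea concrete, here is a small conceptual sketch that handles email addresses only. It is not the Privacy SDK's implementation: the regular expression, the placeholder format, and the function names redact_emails and restore are all assumptions chosen for illustration.

```python
import re

# Deliberately simple pattern; real PII detection covers far more cases.
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")


def redact_emails(text):
    """Replace each email address with a placeholder; return text and mapping."""
    mapping = {}

    def substitute(match):
        value = match.group(0)
        # Reuse the placeholder if the same address appears more than once.
        for placeholder, original in mapping.items():
            if original == value:
                return placeholder
        placeholder = f"<EMAIL_{len(mapping) + 1}>"
        mapping[placeholder] = value
        return placeholder

    return EMAIL_RE.sub(substitute, text), mapping


def restore(text, mapping):
    """Put the original values back into a model response."""
    for placeholder, original in mapping.items():
        text = text.replace(placeholder, original)
    return text
```

In this scheme the upstream provider only ever sees the placeholder, and the mapping stays on the client, where it is used to restore the original value in the response.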
Verifying response signatures
Every response from XinoAPI includes an X-SB-Signature header — an HMAC-SHA256 signature of the response body. Verify this to detect tampering between the gateway and your client.
Get your signing secret from /v1/security/signing-secret:
curl https://api.xinoapi.com/v1/security/signing-secret \
-H "Authorization: Bearer xino-your-key"
Then verify in your code:
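from xinoapi_privacy.verifier import SignatureVerifier


# SecurityError is not a Python built-in, so define it here
# (or import an equivalent if your codebase already has one).
class SecurityError(Exception):
    """Raised when a response fails signature verification."""


verifier = SignatureVerifier(signing_secret="your-secret")

# response_bytes is the raw response body; response.headers holds the HTTP headers.
result = verifier.verify(
    body=response_bytes,
    timestamp=response.headers["X-SB-Timestamp"],
    signature=response.headers["X-SB-Signature"],
)
if not result.valid:
    raise SecurityError(f"Tampered: {result.reason}")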
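For readers who want to see what an HMAC-SHA256 check involves, the sketch below uses only the Python standard library. It rests on assumptions this page does not confirm: that the signature is hex-encoded and that it covers the raw response body alone. If the gateway also mixes X-SB-Timestamp into the signed message, this sketch will not match, so treat SignatureVerifier as the reference. The function names are hypothetical.

```python
import hashlib
import hmac


def sign_body(secret: str, body: bytes) -> str:
    """Hex HMAC-SHA256 of the body (the hex encoding is an assumption)."""
    return hmac.new(secret.encode("utf-8"), body, hashlib.sha256).hexdigest()


def is_untampered(secret: str, body: bytes, signature: str) -> bool:
    """Recompute the signature and compare it in constant time."""
    expected = sign_body(secret, body).encode("ascii")
    received = signature.strip().lower().encode("utf-8")
    # compare_digest avoids leaking how many leading characters matched.
    return hmac.compare_digest(expected, received)
```

Always verify against the exact bytes received on the wire. Re-serializing parsed JSON can change whitespace or key order and invalidate the signature.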
Provider Data Policies
XinoAPI is a gateway to third-party model providers. We do not train on your prompts or responses, but upstream providers operate under their own terms, data policies, and regional obligations. Do not send sensitive data unless your organization has reviewed the selected provider's policy and applied client-side redaction where appropriate.
Error handling
XinoAPI returns standard HTTP status codes and OpenAI-compatible error bodies.
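The FAQ on this page suggests exponential backoff for 429 and a retry for 502/504, while 402 needs a top-up instead of a retry. The sketch below shows that policy in isolation. call_with_retry and the send callable are hypothetical: send stands for any zero-argument function of yours that performs one request and returns (status_code, body).

```python
import random
import time

RETRYABLE = {429, 502, 504}  # rate limit and upstream errors


def call_with_retry(send, max_attempts=5, base_delay=1.0, sleep=time.sleep):
    """Call send() until it returns a non-retryable status or attempts run out."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(max_attempts):
        status, body = send()
        if status == 402:
            # Retrying cannot help: the account needs more credits.
            raise RuntimeError("Insufficient credits: top up before retrying")
        if status not in RETRYABLE or attempt == max_attempts - 1:
            return status, body
        # Exponential backoff with jitter: about 1s, 2s, 4s, ... plus up to 25%.
        delay = base_delay * (2 ** attempt)
        sleep(delay + random.uniform(0, delay * 0.25))
```

The jitter spreads out retries from clients that were rate-limited at the same moment. The OpenAI Python SDK also retries some failures on its own through its max_retries client option, so check that the two layers do not multiply each other.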
Rate limits
Default limits per API key:
- 60 requests/minute across all models
- 1 million tokens/minute across all models
- 10 concurrent streaming connections per key
Rate limit info is returned in response headers: X-RateLimit-Limit, X-RateLimit-Remaining, X-RateLimit-Reset. For higher limits, email support@xinoapi.com.
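A client can read these headers to pause before it hits the limit. The sketch below parses the three headers named above. It assumes that the values are integers and that X-RateLimit-Reset holds the number of seconds until the window resets; this page does not specify the format, so if the gateway sends a Unix timestamp instead, subtract the current time. The helper names are hypothetical.

```python
def parse_rate_limit(headers):
    """Extract the three rate-limit headers as integers (None if absent)."""
    # HTTP header names are case-insensitive, so normalize before lookup.
    lowered = {k.lower(): v for k, v in headers.items()}

    def as_int(name):
        try:
            return int(lowered.get(name.lower()))
        except (TypeError, ValueError):
            return None

    return {
        "limit": as_int("X-RateLimit-Limit"),
        "remaining": as_int("X-RateLimit-Remaining"),
        "reset": as_int("X-RateLimit-Reset"),
    }


def seconds_to_wait(headers):
    """Return how long to pause before the next request (0 if none needed).

    ASSUMPTION: X-RateLimit-Reset is a count of seconds until the reset.
    """
    info = parse_rate_limit(headers)
    if info["remaining"] is None or info["remaining"] > 0:
        return 0
    return max(info["reset"] or 0, 0)
```

With the OpenAI Python SDK, response headers are available through client.chat.completions.with_raw_response.create(...), which returns an object exposing headers and a parse() method for the usual completion.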
Migration from other providers
From OpenAI
# Before
client = OpenAI(api_key="sk-...")

# After (model names change)
client = OpenAI(
    api_key="xino-...",
    base_url="https://api.xinoapi.com/v1",
)
# gpt-5 → deepseek-chat (or another model)
# gpt-5-mini → qwen-turbo or glm-4-flash
From OpenRouter
# Before
client = OpenAI(
    api_key="sk-or-...",
    base_url="https://openrouter.ai/api/v1",
)
# Uses "provider/model" format: "deepseek/deepseek-chat"

# After
client = OpenAI(
    api_key="xino-...",
    base_url="https://api.xinoapi.com/v1",
)
# Uses bare model name: "deepseek-chat"
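If your code base stores OpenRouter-style model IDs, a one-line helper can strip the provider prefix during migration. This is a sketch that assumes the XinoAPI ID is exactly the part after the slash, as in the deepseek-chat example above; check each result against the model list. The name to_bare_model_id is hypothetical.

```python
def to_bare_model_id(model):
    """Drop an OpenRouter-style "provider/" prefix, if present."""
    # split with maxsplit=1 keeps everything after the first slash;
    # an ID without a slash is returned unchanged.
    return model.split("/", 1)[-1]
```

For example, to_bare_model_id("deepseek/deepseek-chat") returns "deepseek-chat", and an ID that is already bare passes through untouched.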
From direct DeepSeek
# Before: Chinese KYC required
client = OpenAI(
    api_key="sk-...",
    base_url="https://api.deepseek.com",
)

# After: no KYC, plus 4 other providers available
client = OpenAI(
    api_key="xino-...",
    base_url="https://api.xinoapi.com/v1",
)
# Model names unchanged: deepseek-chat, deepseek-reasoner
Frequently asked questions
How do I access DeepSeek from outside China?
Use XinoAPI as a proxy. It accepts international credit cards, requires no Chinese phone verification, and routes DeepSeek requests from Singapore with a time to first token (TTFT) of 180ms. Sign up at api.xinoapi.com/register, then set base_url="https://api.xinoapi.com/v1" in the OpenAI SDK.
Is the XinoAPI endpoint OpenAI SDK compatible?
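Yes. The /v1/chat/completions endpoint accepts the same request format as OpenAI and returns the same response shape. Streaming (SSE), JSON mode, logit_bias, and function calling all use the standard OpenAI parameters, although function calling support varies by model (see "Which models support function calling?").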
What model name should I use for DeepSeek V4?
deepseek-v4-flash for the general-purpose V4 model (replaces V3.2 at the same price). deepseek-v4-pro for the flagship 1.6T-parameter model with Claude Opus-level performance. The legacy names deepseek-chat and deepseek-reasoner still work as aliases of V4-Flash but will be removed on 2026-07-24 — migrate to the explicit V4 IDs now.
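One way to prepare for the alias removal is to translate legacy names in a single place before every request. The mapping below follows the answer above, where both legacy names are described as aliases of V4-Flash. resolve_model and LEGACY_ALIASES are hypothetical names.

```python
# Legacy names are described above as aliases of V4-Flash.
LEGACY_ALIASES = {
    "deepseek-chat": "deepseek-v4-flash",
    "deepseek-reasoner": "deepseek-v4-flash",
}


def resolve_model(model):
    """Return the explicit V4 ID for a legacy alias, or the ID unchanged."""
    return LEGACY_ALIASES.get(model, model)
```

Calling create(model=resolve_model(name), ...) lets existing configuration keep working while you update stored model names.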
Which models support function calling?
DeepSeek (all), Qwen (plus/turbo/max), GLM-4 (plus/flash), Kimi K2.5. MiniMax M2.7 has partial support. All use the standard OpenAI tools parameter format.
How do I stream responses?
Add stream=True (Python) or stream: true (Node.js) to your request. The response becomes an iterator (an async iterator in Node.js) that yields OpenAI-format chunks. See the streaming section.
What's the maximum context window?
Varies by model: MiniMax M2.7 offers 245K, Kimi K2.5 offers 200K, DeepSeek/Qwen/GLM offer 128K. See available models for per-model limits.
Are there rate limits I should know about?
60 requests/minute and 1M tokens/minute per key by default. Higher limits available on request. Rate limit headers are returned with every response.
How do I handle errors?
XinoAPI uses standard HTTP status codes. For 402 (insufficient credits), prompt the user to top up. For 429 (rate limit), implement exponential backoff. For 502/504 (upstream errors), retry with a different model or wait. See the error reference.
Is my prompt data used to train models?
XinoAPI itself never trains on your data. Upstream providers have their own policies — DeepSeek, Qwen, and MiniMax offer opt-out; Kimi and GLM default to no training. Use the Privacy SDK to redact sensitive data before it reaches any upstream.
What's the SLA?
99.5% uptime target on all API endpoints. Upstream provider outages are not counted — the gateway routes around failed providers when possible. Real-time status at status.xinoapi.com.