XinoAPI Documentation

One unified OpenAI-compatible API for DeepSeek, Qwen, GLM, Kimi, and MiniMax. No KYC required. Get started in 2 minutes.

Quick start

The XinoAPI endpoint is a drop-in replacement for the OpenAI API, so you keep using your existing OpenAI SDK. Change two lines (the API key and the base URL) and you're calling Chinese LLMs.

1. Get your API key

Sign up at api.xinoapi.com/register. You'll get $2.00 in free credits and can create API keys from the dashboard. Keys start with xino-.

2. Install the OpenAI SDK

pip install openai

npm install openai

go get github.com/sashabaranov/go-openai

3. Send your first request

from openai import OpenAI

client = OpenAI(
  api_key="xino-your-key-here",
  base_url="https://api.xinoapi.com/v1",
)

response = client.chat.completions.create(
  model="deepseek-v4-flash",
  messages=[
    {"role": "user", "content": "Hello!"}
  ],
)

print(response.choices[0].message.content)

import OpenAI from "openai";

const client = new OpenAI({
  apiKey: "xino-your-key-here",
  baseURL: "https://api.xinoapi.com/v1",
});

const response = await client.chat.completions.create({
  model: "deepseek-v4-flash",
  messages: [{ role: "user", content: "Hello!" }],
});

console.log(response.choices[0].message.content);

package main

import (
  "context"
  "fmt"
  "log"

  openai "github.com/sashabaranov/go-openai"
)

func main() {
  config := openai.DefaultConfig("xino-your-key-here")
  config.BaseURL = "https://api.xinoapi.com/v1"
  client := openai.NewClientWithConfig(config)

  resp, err := client.CreateChatCompletion(
    context.Background(),
    openai.ChatCompletionRequest{
      Model: "deepseek-v4-flash",
      Messages: []openai.ChatCompletionMessage{
        {Role: openai.ChatMessageRoleUser, Content: "Hello!"},
      },
    },
  )
  // Check the error before touching resp; on failure resp.Choices is empty.
  if err != nil {
    log.Fatalf("chat completion failed: %v", err)
  }
  fmt.Println(resp.Choices[0].Message.Content)
}

curl https://api.xinoapi.com/v1/chat/completions \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer xino-your-key-here" \
  -d '{
    "model": "deepseek-v4-flash",
    "messages": [{"role": "user", "content": "Hello!"}]
  }'

That's it

Your first response should come back in under 200ms. If you get an authentication error (HTTP 401), double-check your API key in the dashboard.

Authentication

All requests require a bearer token in the Authorization header:

Authorization: Bearer xino-your-key-here

You can create and revoke keys at any time from api.xinoapi.com/token. Each key has its own usage quota and can be restricted to specific models.

Security

Never commit API keys to git. Use environment variables: os.environ["XINOAPI_KEY"] in Python, process.env.XINOAPI_KEY in Node.js.
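A minimal sketch of loading the key from the environment and failing early with a clear message. The variable name XINOAPI_KEY comes from the note above and the xino- prefix check relies on the key format from Quick start; the helper name load_api_key is hypothetical.

```python
import os


def load_api_key(env_var: str = "XINOAPI_KEY") -> str:
    """Read the XinoAPI key from the environment instead of hard-coding it."""
    key = os.environ.get(env_var, "").strip()
    if not key:
        raise RuntimeError(f"{env_var} is not set; export it before running")
    if not key.startswith("xino-"):
        # XinoAPI keys start with "xino-"; anything else is probably a key
        # for a different provider pasted by mistake.
        raise ValueError(f"{env_var} does not look like a XinoAPI key")
    return key
```

Pass the result straight to the client: OpenAI(api_key=load_api_key(), base_url="https://api.xinoapi.com/v1").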

Available models

Pass the model ID as the model parameter. The first figure after each ID is its context window in tokens. Full pricing at /pricing.

DeepSeek

deepseek-v4-flash 1M · popular, 384K output

deepseek-v4-pro 1M · flagship, 1.6T params

deepseek-chat 128K · legacy alias (deprecates 2026-07-24)

deepseek-reasoner 128K · legacy alias (deprecates 2026-07-24)

Alibaba Qwen

qwen3.6-plus 1M · flagship, 78.8% SWE-Bench

qwen3.6-max-preview 260K · #1 on 6 coding benchmarks

qwen-turbo 131K · fast + cheap

qwen-plus 131K · legacy, use qwen3.6-plus instead

Zhipu GLM

glm-5.1 200K · #1 on SWE-Bench Pro

glm-5 200K · 744B MoE flagship

glm-4.7 128K · 73.8% SWE-Bench, best value

glm-4.7-flash 203K · free tier

Moonshot Kimi

kimi-k2.6 256K · latest, long-context coding

kimi-k2.5 256K · 1T MoE, stable

MiniMax

MiniMax-M2.7 245K · 230B MoE, software engineering
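If you pick models programmatically, you can encode the context windows listed above in a lookup table. This is a sketch: the table is copied by hand from this page (legacy aliases omitted) and will drift as models change, it reads K as 1,000 and M as 1,000,000 tokens (an assumption; some providers count 1K as 1,024), and models_that_fit is a hypothetical helper.

```python
# Context windows from the model list above, in tokens.
# ASSUMPTION: "K" = 1,000 and "M" = 1,000,000.
CONTEXT_WINDOWS = {
    "deepseek-v4-flash": 1_000_000,
    "deepseek-v4-pro": 1_000_000,
    "qwen3.6-plus": 1_000_000,
    "qwen3.6-max-preview": 260_000,
    "kimi-k2.6": 256_000,
    "kimi-k2.5": 256_000,
    "MiniMax-M2.7": 245_000,
    "glm-4.7-flash": 203_000,
    "glm-5.1": 200_000,
    "glm-5": 200_000,
    "qwen-turbo": 131_000,
    "glm-4.7": 128_000,
}


def models_that_fit(prompt_tokens: int, max_output_tokens: int = 0) -> list:
    """Return model IDs whose window holds prompt + output, largest window first."""
    needed = prompt_tokens + max_output_tokens
    fitting = [m for m, window in CONTEXT_WINDOWS.items() if window >= needed]
    return sorted(fitting, key=lambda m: CONTEXT_WINDOWS[m], reverse=True)
```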

Chat completions

POST https://api.xinoapi.com/v1/chat/completions

Standard OpenAI-compatible chat completion endpoint. It accepts the usual OpenAI request features: system prompts, multi-turn conversations, temperature/top_p, max_tokens, stop sequences, JSON mode, and function calling (support varies by model; see Function calling).

response = client.chat.completions.create(
  model="deepseek-v4-flash",
  messages=[
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "Explain quantum entanglement in 3 sentences."},
  ],
  temperature=0.7,
  max_tokens=500,
)
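Multi-turn conversations work by resending the whole history: each assistant reply is appended to messages before the next user turn. A minimal sketch of that bookkeeping; the Conversation class is hypothetical and accepts any OpenAI-compatible client, such as the one from Quick start.

```python
class Conversation:
    """Keeps the message history for a multi-turn chat with one model."""

    def __init__(self, client, model: str, system_prompt: str = ""):
        self.client = client  # any OpenAI-compatible client
        self.model = model
        self.messages = []
        if system_prompt:
            self.messages.append({"role": "system", "content": system_prompt})

    def ask(self, text: str, **params) -> str:
        """Send the history plus a new user turn; keep both only if the call succeeds."""
        turn = self.messages + [{"role": "user", "content": text}]
        response = self.client.chat.completions.create(
            model=self.model, messages=turn, **params
        )
        reply = response.choices[0].message.content
        self.messages = turn + [{"role": "assistant", "content": reply}]
        return reply
```

Building the new list before the call means a failed request (for example a 429) leaves the stored history unchanged, so the same question can simply be asked again.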

Streaming

Set stream=True to receive Server-Sent Events. The response streams token-by-token in the standard OpenAI format.

stream = client.chat.completions.create(
  model="deepseek-v4-flash",
  messages=[{"role": "user", "content": "Write a haiku about code."}],
  stream=True,
)

for chunk in stream:
  # Some chunks (for example a final usage chunk) may have an empty choices list
  if chunk.choices and chunk.choices[0].delta.content:
    print(chunk.choices[0].delta.content, end="", flush=True)

const stream = await client.chat.completions.create({
  model: "deepseek-v4-flash",
  messages: [{ role: "user", content: "Write a haiku." }],
  stream: true,
});

for await (const chunk of stream) {
  process.stdout.write(chunk.choices[0]?.delta?.content || "");
}
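If you call the endpoint without an SDK, you parse the Server-Sent Events yourself. This sketch assumes the standard OpenAI stream framing, where each event is a data: line carrying one JSON chunk and the stream ends with data: [DONE]; iter_stream_text is a hypothetical name.

```python
import json
from typing import Iterable, Iterator


def iter_stream_text(lines: Iterable[str]) -> Iterator[str]:
    """Yield content deltas from OpenAI-style SSE lines until [DONE]."""
    for raw in lines:
        line = raw.strip()
        if not line.startswith("data:"):
            continue  # skip blank separators and ": comment" keep-alives
        payload = line[len("data:"):].strip()
        if payload == "[DONE]":
            return
        chunk = json.loads(payload)
        for choice in chunk.get("choices", []):
            content = (choice.get("delta") or {}).get("content")
            if content:
                yield content
```

With httpx, for example, you can feed it response.iter_lines() from a streaming POST to /v1/chat/completions with "stream": true in the body.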

Function calling

Most XinoAPI models support OpenAI-style function calling (tools parameter). DeepSeek, Qwen, and GLM have native support. Kimi and MiniMax have partial support — test your specific use case.

tools = [{
  "type": "function",
  "function": {
    "name": "get_weather",
    "description": "Get current weather for a city",
    "parameters": {
      "type": "object",
      "properties": {
        "city": {"type": "string"},
      },
      "required": ["city"],
    },
  },
}]

response = client.chat.completions.create(
  model="deepseek-v4-flash",
  messages=[{"role": "user", "content": "What's the weather in Tokyo?"}],
  tools=tools,
)

tool_calls = response.choices[0].message.tool_calls
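tool_calls is None when the model answers directly. When it is set, you run each requested function yourself and send the results back as "tool" messages so the model can write its final answer. A sketch following the OpenAI tool-calling message format; get_weather is a stub and run_tool_calls is a hypothetical helper.

```python
import json


def get_weather(city: str) -> str:
    # Hypothetical stub; replace with a real weather lookup.
    return json.dumps({"city": city, "forecast": "sunny"})


LOCAL_TOOLS = {"get_weather": get_weather}


def run_tool_calls(message, tools=LOCAL_TOOLS) -> list:
    """Run the tool calls in an assistant message; return the messages to append."""
    if not message.tool_calls:
        return []  # the model answered directly
    followups = [{
        "role": "assistant",
        "content": message.content,
        "tool_calls": [
            {"id": c.id, "type": "function",
             "function": {"name": c.function.name, "arguments": c.function.arguments}}
            for c in message.tool_calls
        ],
    }]
    for call in message.tool_calls:
        func = tools.get(call.function.name)
        if func is None:
            # Report unknown tools back to the model instead of crashing.
            result = json.dumps({"error": f"unknown tool {call.function.name}"})
        else:
            result = func(**json.loads(call.function.arguments or "{}"))
        followups.append({"role": "tool", "tool_call_id": call.id, "content": result})
    return followups
```

Append the returned messages to the original messages list, then call client.chat.completions.create again with the same tools to get the final answer.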

Agent security

If you're building an agent that executes tool calls, use the Privacy SDK's response scanner to detect malicious tool calls injected by compromised intermediaries. See arXiv:2604.08407 for the threat model.
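The Privacy SDK scanner's API is not shown on this page, but a cheap local check complements it: refuse any tool call that names a function you did not offer, or whose arguments do not match the schema you declared. A minimal sketch that checks only the tool name, required and unexpected keys, and string types (not full JSON Schema validation); validate_tool_call is a hypothetical helper.

```python
import json

# Same schema as the get_weather tool in the Function calling example.
ALLOWED_TOOLS = {
    "get_weather": {
        "type": "object",
        "properties": {"city": {"type": "string"}},
        "required": ["city"],
    },
}


def validate_tool_call(name: str, arguments: str) -> dict:
    """Return the parsed arguments, or raise ValueError if the call looks wrong."""
    schema = ALLOWED_TOOLS.get(name)
    if schema is None:
        raise ValueError(f"model requested a tool we never offered: {name!r}")
    try:
        args = json.loads(arguments)
    except json.JSONDecodeError as exc:
        raise ValueError(f"arguments are not valid JSON: {exc}") from exc
    if not isinstance(args, dict):
        raise ValueError("arguments must be a JSON object")
    props = schema["properties"]
    missing = [k for k in schema.get("required", []) if k not in args]
    extra = [k for k in args if k not in props]
    if missing or extra:
        raise ValueError(f"missing={missing} unexpected={extra}")
    for key, value in args.items():
        if props[key].get("type") == "string" and not isinstance(value, str):
            raise ValueError(f"{key} must be a string")
    return args
```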

Privacy SDK

The XinoAPI Privacy SDK is a drop-in replacement for the OpenAI client that adds three defense layers: PII redaction, response threat scanning, and hash-chained audit logs. Free and open source (MIT).
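To see what a hash-chained audit log guarantees, here is a toy version: each entry stores the SHA-256 of the previous entry, so editing or deleting an earlier record breaks every hash after it. It only illustrates the idea and is not the Privacy SDK's log format; AuditLog is a hypothetical class. Dropping entries from the end is detectable only if you keep the latest hash somewhere else.

```python
import hashlib
import json

GENESIS = "0" * 64  # stand-in "previous hash" for the first entry


class AuditLog:
    """Append-only log where each entry commits to the one before it."""

    def __init__(self):
        self.entries = []

    def append(self, event: dict) -> str:
        prev = self.entries[-1]["hash"] if self.entries else GENESIS
        body = json.dumps(event, sort_keys=True)
        digest = hashlib.sha256((prev + body).encode()).hexdigest()
        self.entries.append({"prev": prev, "event": event, "hash": digest})
        return digest

    def verify(self) -> bool:
        """Recompute every hash; False means an entry was altered or removed."""
        prev = GENESIS
        for entry in self.entries:
            body = json.dumps(entry["event"], sort_keys=True)
            expected = hashlib.sha256((prev + body).encode()).hexdigest()
            if entry["prev"] != prev or entry["hash"] != expected:
                return False
            prev = entry["hash"]
        return True
```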

pip install xinoapi-privacy

from xinoapi_privacy import PrivateClient

client = PrivateClient(
  api_key="xino-your-key",
  base_url="https://api.xinoapi.com/v1",
)

# PII is redacted before sending, restored in the response.
# Response is scanned for threats. Audit log maintains hash chain.
response = client.chat.completions.create(
  model="deepseek-v4-flash",
  messages=[{
    "role": "user",
    "content": "Email john@acme.com about Q3."
  }],
)
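The redact-then-restore step described in the comments above can be pictured with this toy version, which handles email addresses only. It illustrates the technique and is not the Privacy SDK's implementation (real PII detection covers far more than one regex); redact and restore are hypothetical names.

```python
import re

EMAIL = re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+")


def redact(text: str):
    """Replace each email with a placeholder; return (safe_text, mapping)."""
    mapping = {}

    def _sub(match):
        value = match.group(0)
        for placeholder, original in mapping.items():
            if original == value:
                return placeholder  # reuse the placeholder for repeats
        placeholder = f"[EMAIL_{len(mapping) + 1}]"
        mapping[placeholder] = value
        return placeholder

    return EMAIL.sub(_sub, text), mapping


def restore(text: str, mapping: dict) -> str:
    """Put the original values back into the model's reply."""
    for placeholder, original in mapping.items():
        text = text.replace(placeholder, original)
    return text
```

Only the placeholder text leaves your machine; the mapping stays local and is applied to the reply.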

Verifying response signatures

Every response from XinoAPI includes an X-SB-Signature header, an HMAC-SHA256 signature of the response body, together with an X-SB-Timestamp header that the verifier also checks. Verify the signature to detect tampering between the gateway and your client.

Get your signing secret from /v1/security/signing-secret:

curl https://api.xinoapi.com/v1/security/signing-secret \
  -H "Authorization: Bearer xino-your-key"

Then verify in your code:

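from openai import OpenAI
from xinoapi_privacy.verifier import SignatureVerifier

class SecurityError(Exception):
  """Raised when a response fails signature verification."""

client = OpenAI(api_key="xino-your-key", base_url="https://api.xinoapi.com/v1")
verifier = SignatureVerifier(signing_secret="your-secret")

# with_raw_response exposes the HTTP headers and the exact body bytes
raw = client.chat.completions.with_raw_response.create(
  model="deepseek-v4-flash",
  messages=[{"role": "user", "content": "Hello!"}],
)

result = verifier.verify(
  body=raw.http_response.content,
  timestamp=raw.headers["X-SB-Timestamp"],
  signature=raw.headers["X-SB-Signature"],
)

if not result.valid:
  raise SecurityError(f"Tampered: {result.reason}")

response = raw.parse()  # the usual ChatCompletion object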
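If you want to check signatures without the SDK, the core is an HMAC-SHA256 computation plus a constant-time comparison. The exact signed message is not documented on this page. This sketch assumes it is the timestamp, a period, and the raw body, that the signature is hex-encoded, and that the timestamp is in Unix seconds; confirm the real format (for example against the SDK's SignatureVerifier) before relying on it. The 300-second replay window is likewise an arbitrary choice, and sign and verify_signature are hypothetical names.

```python
import hashlib
import hmac
import time
from typing import Optional


def sign(secret: str, body: bytes, timestamp: str) -> str:
    # ASSUMPTION: signed message = "<timestamp>." + raw body, hex-encoded digest.
    message = timestamp.encode() + b"." + body
    return hmac.new(secret.encode(), message, hashlib.sha256).hexdigest()


def verify_signature(secret: str, body: bytes, timestamp: str, signature: str,
                     max_age_s: int = 300, now: Optional[float] = None) -> bool:
    """Return True only if the signature matches and the timestamp is fresh."""
    current = time.time() if now is None else now
    try:
        age = abs(current - int(timestamp))  # ASSUMPTION: Unix seconds
    except ValueError:
        return False
    if age > max_age_s:
        return False  # stale or future-dated: possible replay
    expected = sign(secret, body, timestamp)
    # compare_digest avoids leaking how many leading characters matched
    return hmac.compare_digest(expected.encode(), signature.encode())
```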

Provider Data Policies

XinoAPI is a gateway to third-party model providers. We do not train on your prompts or responses, but upstream providers operate under their own terms, data policies, and regional obligations. Do not send sensitive data unless your organization has reviewed the selected provider's policy and applied client-side redaction where appropriate.

XinoAPI default: Metadata-only billing logs; no plaintext prompt/response retention by default

Mainland China users: Registration, purchase, dashboard access, and API use are not permitted

DeepSeek / Qwen / GLM / Kimi / MiniMax: Provider-specific terms, data retention, training, and regional rules apply

Sensitive data: Use the Privacy SDK for local PII and secret redaction before prompts leave your environment

Error handling

XinoAPI returns standard HTTP status codes and OpenAI-compatible error bodies.

400: Bad request (malformed JSON)

401: Invalid or missing API key

402: Insufficient credits; top up to continue

422: Response blocked by security scanner

429: Rate limit exceeded

500: Internal gateway error

502: Upstream provider error

504: Upstream timeout
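The codes above split into ones worth retrying (429, and upstream failures such as 502 and 504) and ones where repeating the same request fails the same way (400, 401, 402, 422). A sketch of a retry helper with exponential backoff and full jitter, assuming the raised exception carries the HTTP code in a status_code attribute, as openai.APIStatusError does in the official Python SDK. Treating 500 as retryable and the delay values are choices of this example, and with_backoff is a hypothetical name. Note that the official OpenAI SDKs already retry some errors on their own (see the client's max_retries option).

```python
import random
import time

RETRYABLE = {429, 500, 502, 504}


def with_backoff(call, max_attempts: int = 5, base_delay: float = 1.0,
                 max_delay: float = 30.0, sleep=time.sleep):
    """Run call(); on retryable HTTP errors, wait with exponential backoff and retry."""
    for attempt in range(max_attempts):
        try:
            return call()
        except Exception as exc:
            status = getattr(exc, "status_code", None)
            if status not in RETRYABLE or attempt == max_attempts - 1:
                raise  # 400/401/402/422, or out of attempts: surface the error
            # Full jitter: random delay up to base * 2^attempt, capped at max_delay.
            sleep(random.uniform(0, min(max_delay, base_delay * 2 ** attempt)))
```

Usage: with_backoff(lambda: client.chat.completions.create(model="deepseek-v4-flash", messages=messages)).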

Rate limits

Default limits per API key:

60 requests/minute across all models
1 million tokens/minute across all models
10 concurrent streaming connections per key

Rate limit info is returned in response headers: X-RateLimit-Limit, X-RateLimit-Remaining, X-RateLimit-Reset. For higher limits, email support@xinoapi.com.
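To stay under the default 60 requests/minute on your side instead of reacting to 429s, you can throttle locally with a sliding window. A single-threaded sketch with a hypothetical class name; it does not track the tokens-per-minute or streaming-connection limits, and the clock and sleep functions are injectable so it can be tested without waiting.

```python
import time
from collections import deque


class SlidingWindowLimiter:
    """Allow at most max_calls requests in any rolling window of window_s seconds."""

    def __init__(self, max_calls: int = 60, window_s: float = 60.0,
                 clock=time.monotonic, sleep=time.sleep):
        self.max_calls = max_calls
        self.window_s = window_s
        self.clock = clock
        self.sleep = sleep
        self.calls = deque()  # timestamps of recent requests, oldest first

    def acquire(self) -> None:
        """Block until one more request fits in the window, then record it."""
        while True:
            now = self.clock()
            # Forget requests that have left the window.
            while self.calls and now - self.calls[0] >= self.window_s:
                self.calls.popleft()
            if len(self.calls) < self.max_calls:
                self.calls.append(now)
                return
            # Wait until the oldest request in the window expires.
            self.sleep(self.window_s - (now - self.calls[0]))
```

Call limiter.acquire() immediately before each client.chat.completions.create; share one limiter per API key, since the limit applies across all models.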

Migration from other providers

From OpenAI

# Before
client = OpenAI(api_key="sk-...")

# After — model names change
client = OpenAI(
  api_key="xino-...",
  base_url="https://api.xinoapi.com/v1",
)
# gpt-5        → deepseek-v4-flash (or another model)
# gpt-5-mini   → qwen-turbo or glm-4.7-flash

From OpenRouter

# Before
client = OpenAI(
  api_key="sk-or-...",
  base_url="https://openrouter.ai/api/v1",
)
# Uses "provider/model" format: "deepseek/deepseek-chat"

# After
client = OpenAI(
  api_key="xino-...",
  base_url="https://api.xinoapi.com/v1",
)
# Uses bare model name: "deepseek-chat"
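If your code stores OpenRouter-style IDs, a small shim can drop the provider prefix. This assumes the part after the slash matches a XinoAPI model ID, which holds for the deepseek/deepseek-chat example above but not necessarily for every OpenRouter ID, so check each one against Available models; to_xino_model is a hypothetical helper.

```python
from typing import Iterable, Optional


def to_xino_model(openrouter_id: str, known: Optional[Iterable[str]] = None) -> str:
    """Turn an OpenRouter "provider/model" ID into a bare XinoAPI model name."""
    bare = openrouter_id.rpartition("/")[2]  # "deepseek/deepseek-chat" -> "deepseek-chat"
    if known is not None and bare not in set(known):
        raise ValueError(f"{bare!r} is not a known XinoAPI model; map it by hand")
    return bare
```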

From direct DeepSeek

# Before — Chinese KYC required
client = OpenAI(
  api_key="sk-...",
  base_url="https://api.deepseek.com",
)

# After — no KYC, plus 4 other providers available
client = OpenAI(
  api_key="xino-...",
  base_url="https://api.xinoapi.com/v1",
)
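# Model names unchanged: deepseek-chat and deepseek-reasoner work as-is, but they
# are legacy aliases removed on 2026-07-24; prefer deepseek-v4-flash / deepseek-v4-pro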

Frequently asked questions

How do I access DeepSeek from outside China?

Use XinoAPI as a proxy. It accepts international credit cards, requires no Chinese phone verification, and routes DeepSeek requests from Singapore with a 180ms time to first token (TTFT). Sign up at api.xinoapi.com/register, then set base_url="https://api.xinoapi.com/v1" in the OpenAI SDK.

Is the XinoAPI endpoint OpenAI SDK compatible?

Yes. The /v1/chat/completions endpoint accepts the same request format as OpenAI and returns the same response shape. Streaming (SSE), JSON mode, logit_bias, and function calling use the standard OpenAI parameters; function-calling support varies by model (see Function calling).

What model name should I use for DeepSeek V4?

deepseek-v4-flash for the general-purpose V4 model (replaces V3.2 at the same price). deepseek-v4-pro for the flagship 1.6T-parameter model with Claude Opus-level performance. The legacy names deepseek-chat and deepseek-reasoner still work as aliases of V4-Flash but will be removed on 2026-07-24 — migrate to the explicit V4 IDs now.
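To find stragglers before the 2026-07-24 cutoff, you can route model names through a small resolver that swaps the legacy aliases for their V4 replacement and emits a warning. The mapping follows this answer (both aliases point to V4-Flash); resolve_model is a hypothetical helper.

```python
import warnings

# Legacy alias -> explicit V4 model ID, per the answer above.
REPLACEMENTS = {
    "deepseek-chat": "deepseek-v4-flash",
    "deepseek-reasoner": "deepseek-v4-flash",
}


def resolve_model(name: str) -> str:
    """Return the explicit model ID, warning when a legacy alias is used."""
    replacement = REPLACEMENTS.get(name)
    if replacement is None:
        return name
    warnings.warn(f"{name} is a legacy alias removed on 2026-07-24; use {replacement}",
                  DeprecationWarning, stacklevel=2)
    return replacement
```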

Which models support function calling?

DeepSeek, Qwen, and GLM models have native support. Kimi (kimi-k2.6, kimi-k2.5) and MiniMax-M2.7 have partial support, so test your specific use case. All use the standard OpenAI tools parameter format.

How do I stream responses?

Add stream=True (Python) or stream: true (Node.js) to your request. The response becomes an iterator of OpenAI-format chunks: loop over it with for in Python, or with for await in Node.js. See the streaming section.

What's the maximum context window?

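Varies by model. The largest is 1M tokens (deepseek-v4-flash, deepseek-v4-pro, qwen3.6-plus). Kimi K2.5 and K2.6 offer 256K, MiniMax-M2.7 245K, GLM-5 and GLM-5.1 200K, and glm-4.7 and the legacy DeepSeek aliases 128K. See available models for per-model limits.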

Are there rate limits I should know about?

60 requests/minute, 1M tokens/minute, and 10 concurrent streaming connections per key by default. Higher limits available on request. Rate limit headers are returned with every response.

How do I handle errors?

XinoAPI uses standard HTTP status codes. For 402 (insufficient credits), prompt the user to top up. For 429 (rate limit), implement exponential backoff. For 502/504 (upstream errors), retry with a different model or wait. See the error reference.
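The "retry with a different model" advice for 502 and 504 can be expressed as an ordered fallback list. A sketch, assuming errors expose the HTTP code as a status_code attribute (as openai.APIStatusError does in the official Python SDK); the model order is up to you and complete_with_fallback is a hypothetical name.

```python
FALLBACK_ON = {502, 504}  # upstream provider error, upstream timeout


def complete_with_fallback(client, models, messages, **params):
    """Try each model in order; move on only when the upstream provider fails."""
    models = list(models)
    if not models:
        raise ValueError("need at least one model")
    for i, model in enumerate(models):
        try:
            return client.chat.completions.create(model=model, messages=messages, **params)
        except Exception as exc:
            status = getattr(exc, "status_code", None)
            if status not in FALLBACK_ON or i == len(models) - 1:
                # Auth, credit, and rate-limit errors (limits are per key) fail the
                # same way on any model; the last model's error is surfaced as-is.
                raise
```

For example: complete_with_fallback(client, ["deepseek-v4-flash", "glm-5.1", "qwen3.6-plus"], messages).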

Is my prompt data used to train models?

XinoAPI itself never trains on your data. Upstream providers have their own policies — DeepSeek, Qwen, and MiniMax offer opt-out; Kimi and GLM default to no training. Use the Privacy SDK to redact sensitive data before it reaches any upstream.

What's the SLA?

99.5% uptime target on all API endpoints. Upstream provider outages are not counted — the gateway routes around failed providers when possible. Real-time status at status.xinoapi.com.