One API
200+ LLMs Connected

100% compatible with the OpenAI SDK. Change one base_url to access 200+ models instantly. Fallback, retry, timeout, caching, and observability are all declarative request parameters, so business code stays untouched.

Get Started in 5 Min
POST /v1/chat/completions
# pip install openai
from openai import OpenAI

client = OpenAI(
    base_url="https://api.wanflow.ai/v1",
    api_key="wf-...",
)

resp = client.chat.completions.create(
    model="claude-opus-4-8",
    messages=[
        {"role": "system", "content": "You are ACME Corp's data analysis assistant."},
        {"role": "user",   "content": "Analyze this Q4 sales data and give three insights."},
    ],
    # Wanflow-specific extras (all optional)
    extra_body={
        "fallback": ["gpt-5", "gemini-2-5-pro"],   # auto-fallback on primary model failure
        "cache":    {"mode": "semantic", "ttl": 86400},
        "timeout":  30,
        "tags":     {"project": "finance", "team": "analytics"},
    },
)

print(resp.choices[0].message.content)
print("cache hit:", resp.cached, "· cost: ¥", resp.cost_cny)

Intelligent API Dispatch Layer Optimized for LLMs

Supports multimodal endpoints: Text Chat · Vector Embedding · Image Generation · Video Generation · Speech Synthesis, with unified auth, billing, and fallback.

Text Chat /v1/chat/completions
Vector Embedding /v1/embeddings
Image Generation /v1/images/generations
Video Generation /v1/videos/generations
Speech Synthesis /v1/audio/speech
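
Since every endpoint listed above sits under the same /v1 prefix and shares one API key, a single request builder can address all of them. The following is a minimal sketch under stated assumptions: the helper name build_request and the ENDPOINTS table are hypothetical, bearer-token auth is assumed to follow OpenAI's HTTP convention (the page only states SDK compatibility), and the embeddings payload fields and model name are borrowed from OpenAI's API rather than confirmed for Wanflow. The request is built but never sent.

```python
import json
import urllib.request

BASE_URL = "https://api.wanflow.ai/v1"  # base URL from the quick-start example

# Endpoint paths as listed on this page, relative to BASE_URL.
ENDPOINTS = {
    "chat": "/chat/completions",
    "embeddings": "/embeddings",
    "images": "/images/generations",
    "videos": "/videos/generations",
    "speech": "/audio/speech",
}


def build_request(kind: str, payload: dict, api_key: str) -> urllib.request.Request:
    """Build (but do not send) a POST request for one of the listed endpoints."""
    return urllib.request.Request(
        url=BASE_URL + ENDPOINTS[kind],
        data=json.dumps(payload).encode("utf-8"),
        headers={
            # Assumption: bearer-token auth, as in the OpenAI HTTP API.
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


# Assumption: payload fields follow OpenAI's embeddings API; the model name is illustrative.
req = build_request(
    "embeddings",
    {"model": "text-embedding-3-small", "input": "Q4 sales summary"},
    api_key="wf-...",
)
print(req.get_method(), req.full_url)
```

Passing the request to urllib.request.urlopen would send it; in practice the OpenAI SDK shown in the quick start handles this for you.
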
01 SDK Compatible

100% OpenAI SDK Compatible

No need to rewrite existing OpenAI code. Change base_url and instantly run Claude, Gemini, DeepSeek, and every other supported model. tool_calls, streaming, JSON mode, and vision are all preserved.

02 Fallback

Multi-Model Auto Fallback

Declare a fallback list in the request. When the primary model times out, is rate limited, or refuses, the request automatically falls through to the next model in the list. Callers see no disruption, and the failure rate drops from 0.8% to 0.02%.
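
The gateway performs this fallback on the server side, and its implementation is not described here. To make the "try the next model" behavior concrete, here is a small client-side sketch of the same idea. All names (call_with_fallback, ModelUnavailable, fake_call) are hypothetical, and the upstream is simulated rather than called.

```python
from typing import Callable, Optional, Sequence, Tuple


class ModelUnavailable(Exception):
    """Hypothetical error standing in for a timeout, rate limit, or refusal."""


def call_with_fallback(
    call_model: Callable[[str], str],
    primary: str,
    fallback: Sequence[str],
) -> Tuple[str, str]:
    """Try the primary model, then each fallback in order.

    Returns (model_used, response). Raises RuntimeError, chained to the
    last failure, if every model in the list fails.
    """
    last_error: Optional[Exception] = None
    for model in [primary, *fallback]:
        try:
            return model, call_model(model)
        except ModelUnavailable as exc:
            last_error = exc  # remember the failure and move on to the next model
    raise RuntimeError("all models failed") from last_error


# Simulated upstream: the primary model is rate limited, the first fallback answers.
def fake_call(model: str) -> str:
    if model == "claude-opus-4-8":
        raise ModelUnavailable("rate limited")
    return f"answer from {model}"


used, text = call_with_fallback(fake_call, "claude-opus-4-8", ["gpt-5", "gemini-2-5-pro"])
print(used, "->", text)
```

With the declarative fallback parameter, none of this loop lives in caller code; the sketch only illustrates the ordering semantics.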

03 Cache

Semantic Caching

Similar prompts hit the vector cache, and cache hits cost 0 tokens. The similarity threshold and TTL are adjustable. In real-world customer-service and FAQ workloads the hit rate is 42%, reducing the monthly bill by 38%.
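
How a similarity threshold and a TTL interact is easier to see in code. The sketch below is a toy in-memory semantic cache, not Wanflow's implementation: the class SemanticCache, its parameters, and the use of cosine similarity are assumptions for illustration, and hand-written vectors stand in for real prompt embeddings.

```python
import math
import time
from typing import Callable, List, Optional, Sequence, Tuple


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity of two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


class SemanticCache:
    """Toy in-memory semantic cache (hypothetical sketch)."""

    def __init__(self, threshold: float = 0.9, ttl: float = 86400,
                 clock: Callable[[], float] = time.time) -> None:
        self.threshold = threshold  # minimum similarity that counts as a hit
        self.ttl = ttl              # entry lifetime in seconds
        self.clock = clock          # injectable so expiry can be tested
        self._entries: List[Tuple[Sequence[float], str, float]] = []

    def put(self, embedding: Sequence[float], response: str) -> None:
        """Store a response together with its expiry time."""
        self._entries.append((embedding, response, self.clock() + self.ttl))

    def get(self, embedding: Sequence[float]) -> Optional[str]:
        """Return the most similar unexpired response, or None on a miss."""
        now = self.clock()
        self._entries = [e for e in self._entries if e[2] > now]  # drop expired
        best = max(self._entries, key=lambda e: cosine(embedding, e[0]), default=None)
        if best is not None and cosine(embedding, best[0]) >= self.threshold:
            return best[1]
        return None


cache = SemanticCache(threshold=0.95, ttl=60)
cache.put([1.0, 0.0, 0.2], "Refunds take 3-5 business days.")
hit = cache.get([0.98, 0.05, 0.21])   # near-duplicate prompt: similarity above 0.95
miss = cache.get([0.0, 1.0, 0.0])     # unrelated prompt: similarity 0
print(hit, miss)
```

Raising the threshold trades hit rate for answer fidelity; a production cache would also use an approximate nearest-neighbor index instead of a linear scan.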

04 Routing

Global Proximity Routing

12 regional nodes · BGP + Anycast. Users in mainland China route through the Hong Kong and Shenzhen nodes; overseas users route through US East and Frankfurt. End-to-end P50 latency: 178 ms.

05 Smart Routing

Cost · Latency · Availability 3D Routing

Three routing strategies, switchable with one click. Lowest Cost: auto-selects the cheapest channel when several vendors offer the same model. Lowest Latency: prioritizes the fastest-responding node. Auto-Balance: schedules dynamically based on cost, latency, and success rate. Strategies can be configured per model or set globally. Upstream failures trigger automatic fallback with cooldown recovery, invisible to the caller.
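
The three strategies can be pictured as three different ranking functions over the same set of channels. The sketch below is illustrative only: the Channel fields, the strategy names as strings, the sample numbers, and especially the Auto-Balance scoring formula and its weight are assumptions, since the page does not describe how the gateway actually scores channels.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Channel:
    name: str
    cost_per_1k: float    # price per 1k tokens, in any consistent currency
    latency_ms: float     # recent median latency
    success_rate: float   # fraction of recent calls that succeeded, 0..1


def pick(channels: List[Channel], strategy: str) -> Channel:
    """Choose one channel for a model according to the named strategy."""
    if strategy == "lowest_cost":
        return min(channels, key=lambda c: c.cost_per_1k)
    if strategy == "lowest_latency":
        return min(channels, key=lambda c: c.latency_ms)
    if strategy == "auto_balance":
        # Normalize cost and latency against the worst channel, then add a
        # failure penalty. The weight of 10 is arbitrary, chosen for illustration.
        max_cost = max(c.cost_per_1k for c in channels)
        max_lat = max(c.latency_ms for c in channels)

        def score(c: Channel) -> float:
            return (c.cost_per_1k / max_cost
                    + c.latency_ms / max_lat
                    + (1 - c.success_rate) * 10)

        return min(channels, key=score)
    raise ValueError(f"unknown strategy: {strategy}")


# Made-up channels offering the same model through different vendors.
channels = [
    Channel("vendor-a", cost_per_1k=0.010, latency_ms=900, success_rate=0.999),
    Channel("vendor-b", cost_per_1k=0.008, latency_ms=1400, success_rate=0.950),
    Channel("vendor-c", cost_per_1k=0.012, latency_ms=700, success_rate=0.998),
]

for strategy in ("lowest_cost", "lowest_latency", "auto_balance"):
    print(strategy, "->", pick(channels, strategy).name)
```

With this sample data each strategy selects a different vendor, which shows why the choice of strategy matters when the cheapest channel is also the slowest and least reliable.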

06 Observability

Every Call Is Traceable

Tags mark caller dimensions, and the console aggregates queries by project, team, employee, or user. Failure reasons, latency distribution, token usage, and cost are all visualized.
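
The aggregation the console performs can be approximated locally to see what tagging buys you. This is a rough sketch, not the console's logic: the function aggregate_by_tag and the record layout are hypothetical, and the call records are made up. The tag keys and the cost_cny field name mirror the quick-start example.

```python
from collections import defaultdict
from typing import Dict, Iterable, Mapping


def aggregate_by_tag(calls: Iterable[Mapping], dimension: str) -> Dict[str, Dict[str, float]]:
    """Sum call count, tokens, and cost for each value of one tag dimension."""
    totals: Dict[str, Dict[str, float]] = defaultdict(
        lambda: {"calls": 0, "tokens": 0, "cost_cny": 0.0}
    )
    for call in calls:
        # Calls that lack the tag are grouped under a placeholder key.
        key = call["tags"].get(dimension, "(untagged)")
        totals[key]["calls"] += 1
        totals[key]["tokens"] += call["tokens"]
        totals[key]["cost_cny"] += call["cost_cny"]
    return dict(totals)


# Made-up call records for illustration.
calls = [
    {"tags": {"project": "finance", "team": "analytics"}, "tokens": 1200, "cost_cny": 0.50},
    {"tags": {"project": "finance", "team": "reporting"}, "tokens": 800, "cost_cny": 0.25},
    {"tags": {"project": "support"}, "tokens": 400, "cost_cny": 0.125},
]

by_project = aggregate_by_tag(calls, "project")
by_team = aggregate_by_tag(calls, "team")
print(by_project["finance"])
print(sorted(by_team))
```

The untagged bucket is a reminder that a dimension is only queryable for calls that actually carried the tag.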

Global Nodes · Real Latency & Availability.

7-day rolling average · Updated in the last 24h

| Node | Region | TTFT P50 | TTFT P99 | Full Request P50 | Uptime 30d | Status |
|---|---|---|---|---|---|---|
| cn-north-1 (Beijing) | Mainland China | 142 ms | 418 ms | 1.20 s | 99.997% | Healthy |
| cn-south-1 (Shenzhen) | Mainland China | 138 ms | 402 ms | 1.18 s | 99.999% | Healthy |
| hk-1 (Hong Kong) | Hong Kong | 156 ms | 442 ms | 1.32 s | 99.994% | Healthy |
| sg-southeast-1 (Singapore) | Southeast Asia | 168 ms | 468 ms | 1.38 s | 99.998% | Healthy |
| jp-tokyo-1 (Tokyo) | Japan | 152 ms | 432 ms | 1.26 s | 99.996% | Healthy |
| us-east-1 (Virginia) | US East | 184 ms | 488 ms | 1.42 s | 99.999% | Healthy |
| us-west-2 (Oregon) | US West | 192 ms | 504 ms | 1.48 s | 99.995% | Maintenance |
| eu-central-1 (Frankfurt) | Central Europe | 176 ms | 456 ms | 1.34 s | 99.998% | Healthy |
| eu-west-1 (Ireland) | Western Europe | 182 ms | 476 ms | 1.40 s | 99.997% | Healthy |
| au-sydney-1 (Sydney) | Oceania | 198 ms | 518 ms | 1.52 s | 99.992% | Healthy |
Real-time latency data integrates with Datadog and Prometheus.
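
Uptime percentages this close to 100% are hard to compare by eye, so it can help to convert them into minutes of downtime over the 30-day window. The short calculation below does that for the best and worst figures in the table; the function name downtime_minutes is hypothetical, and the conversion assumes the window is exactly 30 full days.

```python
def downtime_minutes(uptime_percent: float, days: int = 30) -> float:
    """Minutes of downtime implied by an uptime percentage over `days` days."""
    total_minutes = days * 24 * 60  # 43,200 minutes in 30 days
    return total_minutes * (1 - uptime_percent / 100)


# Highest and lowest 30-day uptime figures from the table above.
for node, uptime in [("cn-south-1", 99.999), ("au-sydney-1", 99.992)]:
    print(f"{node}: {downtime_minutes(uptime):.2f} min")
```

By this arithmetic, 99.999% corresponds to roughly 0.43 minutes of downtime in 30 days, and 99.992% to roughly 3.46 minutes.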