One API
200+ LLMs Connected

100% compatible with the OpenAI SDK. Change one base_url to access 200+ models instantly. Add fallback, retry, timeout, cache, and observability as declarative request parameters, with no changes to your business code.

Get Started in 5 Min

POST /v1/chat/completions

# pip install openai
from openai import OpenAI

client = OpenAI(
    base_url="https://api.wanflow.ai/v1",
    api_key="wf-...",
)

resp = client.chat.completions.create(
    model="claude-opus-4-8",
    messages=[
        {"role": "system", "content": "You are ACME Corp's data analysis assistant."},
        {"role": "user",   "content": "Analyze this Q4 sales data and give three insights."},
    ],
    # Wanflow-specific extras (all optional)
    extra_body={
        "fallback": ["gpt-5", "gemini-2-5-pro"],   # auto-fallback on primary model failure
        "cache":    {"mode": "semantic", "ttl": 86400},
        "timeout":  30,
        "tags":     {"project": "finance", "team": "analytics"},
    },
)

print(resp.choices[0].message.content)
print("cache hit:", resp.cached, "· cost: ¥", resp.cost_cny)

// npm i openai
import OpenAI from "openai";

const client = new OpenAI({
  baseURL: "https://api.wanflow.ai/v1",
  apiKey: process.env.WANFLOW_KEY,
});

const resp = await client.chat.completions.create({
  model: "claude-opus-4-8",
  messages: [
    { role: "system", content: "You are ACME Corp's data analysis assistant." },
    { role: "user",   content: "Analyze this Q4 sales data and give three insights." },
  ],
  fallback: ["gpt-5", "gemini-2-5-pro"],
  cache:    { mode: "semantic", ttl: 86400 },
  timeout:  30,
  tags:     { project: "finance", team: "analytics" },
});

console.log(resp.choices[0].message.content);

curl "https://api.wanflow.ai/v1/chat/completions" \
  -H "Authorization: Bearer $WANFLOW_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "claude-opus-4-8",
    "messages": [
      {"role": "system", "content": "You are ACME Corp's data analysis assistant."},
      {"role": "user",   "content": "Analyze this Q4 sales data and give three insights."}
    ],
    "fallback": ["gpt-5", "gemini-2-5-pro"],
    "cache":    {"mode": "semantic", "ttl": 86400},
    "timeout":  30,
    "tags":     {"project": "finance", "team": "analytics"}
  }'

package main

import (
    "context"
    "fmt"
    "log"

    openai "github.com/sashabaranov/go-openai"
)

func main() {
    cfg := openai.DefaultConfig("wf-...")
    cfg.BaseURL = "https://api.wanflow.ai/v1"
    client := openai.NewClientWithConfig(cfg)

    resp, err := client.CreateChatCompletion(
        context.Background(),
        openai.ChatCompletionRequest{
            Model: "claude-opus-4-8",
            Messages: []openai.ChatCompletionMessage{
                {Role: openai.ChatMessageRoleUser, Content: "Analyze Q4 sales data"},
            },
        },
    )
    if err != nil {
        // On failure resp.Choices is empty, so stop before indexing it.
        log.Fatalf("chat completion failed: %v", err)
    }
    fmt.Println(resp.Choices[0].Message.Content)
}

Intelligent API Dispatch Layer Optimized for LLMs

Multimodal endpoints for Text Chat · Vector Embedding · Image Generation · Video Generation · Speech Synthesis, all with unified auth, billing, and fallback.

Text Chat /v1/chat/completions
Vector Embedding /v1/embeddings
Image Generation /v1/images/generations
Video Generation /v1/videos/generations
Speech Synthesis /v1/audio/speech
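Because every endpoint above shares the same auth scheme, one request builder can serve all of them. The sketch below uses only the Python standard library and builds (without sending) a POST to /v1/embeddings. The payload fields follow the OpenAI embeddings schema, which is an assumption based on the compatibility claim, and the model ID is a placeholder rather than a confirmed Wanflow model.

```python
import json
import urllib.request

API_ROOT = "https://api.wanflow.ai"  # host taken from the quick-start samples


def build_request(path: str, payload: dict, api_key: str) -> urllib.request.Request:
    """Build (but do not send) an authenticated JSON POST for one endpoint."""
    return urllib.request.Request(
        url=f"{API_ROOT}{path}",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


# "example-embedding-model" is a placeholder, not a confirmed model ID.
req = build_request(
    "/v1/embeddings",
    {"model": "example-embedding-model", "input": "Q4 sales summary"},
    api_key="wf-test",
)
print(req.get_method(), req.full_url)
```

Passing the object to urllib.request.urlopen(req) would send it; swapping the path and payload targets any of the other endpoints in the same way.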

01 SDK Compatible

100% OpenAI SDK Compatible

There is no need to rewrite existing OpenAI code: change base_url and run Claude, Gemini, DeepSeek, and every other supported model. tool_calls, streaming, JSON mode, and vision input are all preserved.

02 Fallback

Multi-Model Auto Fallback

Declare a fallback list in the request. When the primary model times out, is rate limited, or refuses the request, the call automatically falls through to the next model in the list. Callers see no disruption, and the failure rate drops from 0.8% to 0.02%.
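To make the fallback semantics concrete, here is a minimal client-side sketch of the same idea: try each model in order and return the first success. It is not Wanflow's server-side implementation. UpstreamError, complete_with_fallback, and the stub fake_call are hypothetical names invented for illustration.

```python
from typing import Callable, Sequence


class UpstreamError(Exception):
    """Hypothetical error for a timeout, rate limit, or refusal from one model."""


def complete_with_fallback(
    call_model: Callable[[str, str], str],
    models: Sequence[str],
    prompt: str,
) -> tuple[str, str]:
    """Try each model in order; return (model_used, answer) from the first success."""
    errors: list[str] = []
    for model in models:
        try:
            return model, call_model(model, prompt)
        except UpstreamError as exc:
            errors.append(f"{model}: {exc}")  # remember why this model was skipped
    raise UpstreamError("all models failed: " + "; ".join(errors))


# Stub upstream: the primary model always times out, the others answer.
def fake_call(model: str, prompt: str) -> str:
    if model == "claude-opus-4-8":
        raise UpstreamError("timeout")
    return f"[{model}] reply to: {prompt}"


used, answer = complete_with_fallback(
    fake_call, ["claude-opus-4-8", "gpt-5", "gemini-2-5-pro"], "hello"
)
print(used)  # gpt-5
```

With the gateway, this loop is unnecessary in caller code: the fallback list in the request declares the same ordering.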

03 Cache

Semantic Caching

Similar prompts hit the vector cache, and cache hits consume 0 tokens. The similarity threshold and TTL are adjustable. In real-world customer-service and FAQ workloads, the hit rate is 42% and the monthly bill drops by 38%.
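The threshold and TTL knobs can be illustrated with a toy in-memory cache. This is a sketch of the general technique (cosine similarity over prompt embeddings), not Wanflow's implementation: the SemanticCache class, the 0.9 default threshold, and the two-dimensional vectors are all assumptions, and a real system would obtain embeddings from an embedding model.

```python
import math
import time
from typing import Optional, Sequence


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


class SemanticCache:
    def __init__(self, threshold: float = 0.9, ttl: float = 86400.0):
        self.threshold = threshold  # minimum similarity that counts as a hit
        self.ttl = ttl              # seconds an entry stays valid
        self._entries: list = []    # (embedding, answer, stored_at)

    def put(self, embedding, answer: str, now: Optional[float] = None) -> None:
        stored_at = time.time() if now is None else now
        self._entries.append((list(embedding), answer, stored_at))

    def get(self, embedding, now: Optional[float] = None) -> Optional[str]:
        now = time.time() if now is None else now
        # Drop expired entries, then return the most similar one above threshold.
        self._entries = [e for e in self._entries if now - e[2] < self.ttl]
        best_answer, best_score = None, 0.0
        for emb, answer, _ in self._entries:
            score = cosine(embedding, emb)
            if score > best_score:
                best_answer, best_score = answer, score
        return best_answer if best_score >= self.threshold else None


cache = SemanticCache(threshold=0.9, ttl=86400)
cache.put([1.0, 0.0], "Refunds take 3 to 5 days.", now=0)
print(cache.get([0.98, 0.2], now=10))  # near-duplicate prompt: cache hit
print(cache.get([0.0, 1.0], now=10))   # unrelated prompt: None
```

Raising the threshold trades hit rate for answer fidelity; lowering it does the opposite.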

04 Routing

Global Proximity Routing

12 regional nodes · BGP + Anycast. Users in mainland China route through the Hong Kong and Shenzhen nodes, overseas users through US East and Frankfurt. End-to-end P50 latency is 178 ms.

05 Smart Routing

Cost · Latency · Availability 3D Routing

Three routing strategies, switchable with one click. Lowest Cost: selects the cheapest channel when several vendors serve the same model. Lowest Latency: prefers the fastest-responding node. Auto-Balance: schedules dynamically on cost, latency, and success rate. Each model can be configured independently or inherit a global setting. Upstream failures trigger automatic fallback with cooldown recovery, invisible to the caller.
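The three strategies can be sketched as a single selection function over candidate channels. Everything here is illustrative: the Channel fields, the strategy names, the sample numbers, and the equal weighting used for auto-balance are assumptions, since the page does not publish the actual scoring formula.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass
class Channel:
    name: str
    cost_per_1k: float   # price per 1K tokens, must be > 0
    latency_ms: float    # recent response latency, must be > 0
    success_rate: float  # fraction of successful calls, 0..1


def pick_channel(channels: Sequence[Channel], strategy: str) -> Channel:
    """Select one channel for a model according to the chosen strategy."""
    if strategy == "lowest_cost":
        return min(channels, key=lambda c: c.cost_per_1k)
    if strategy == "lowest_latency":
        return min(channels, key=lambda c: c.latency_ms)
    if strategy == "auto_balance":
        max_cost = max(c.cost_per_1k for c in channels)
        max_latency = max(c.latency_ms for c in channels)

        def score(c: Channel) -> float:
            # Lower is better: normalized cost + normalized latency + failure rate.
            return (c.cost_per_1k / max_cost
                    + c.latency_ms / max_latency
                    + (1.0 - c.success_rate))

        return min(channels, key=score)
    raise ValueError(f"unknown strategy: {strategy}")


channels = [
    Channel("vendor-a", cost_per_1k=1.0, latency_ms=300, success_rate=0.999),
    Channel("vendor-b", cost_per_1k=2.0, latency_ms=150, success_rate=0.999),
    Channel("vendor-c", cost_per_1k=1.2, latency_ms=180, success_rate=0.995),
]
for strategy in ("lowest_cost", "lowest_latency", "auto_balance"):
    print(strategy, "->", pick_channel(channels, strategy).name)
```

In this sample, auto-balance picks vendor-c: it is neither the cheapest nor the fastest, but it has the best combined score.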

06 Observability

Every Call Is Traceable

Tags label each call by caller dimension, and the console aggregates queries by project / team / employee / user. Failure reasons, latency distribution, token usage, and cost are all visualized.
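The kind of roll-up the console performs can be approximated locally once call records carry tags. The record shape below (a "tags" dict plus a "cost" number) is a hypothetical export format chosen for illustration, not a documented Wanflow schema.

```python
from collections import defaultdict
from typing import Iterable


def cost_by_tag(records: Iterable[dict], tag_key: str) -> dict:
    """Sum the cost of call records grouped by the value of one tag."""
    totals: dict = defaultdict(float)
    for record in records:
        group = record["tags"].get(tag_key, "(untagged)")
        totals[group] += record["cost"]
    return dict(totals)


# Hypothetical call log; tag keys mirror the quick-start example.
calls = [
    {"tags": {"project": "finance", "team": "analytics"}, "cost": 0.5},
    {"tags": {"project": "finance", "team": "reporting"}, "cost": 0.25},
    {"tags": {"project": "support"}, "cost": 1.0},
]
print(cost_by_tag(calls, "project"))  # {'finance': 0.75, 'support': 1.0}
print(cost_by_tag(calls, "team"))
```

Calls without the requested tag land in an "(untagged)" bucket, which makes gaps in tagging visible rather than silently dropping them.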

Global Nodes · Real Latency & Availability.

7-day rolling average · Updated in last 24h

Node	Region	TTFT P50	TTFT P99	Full Request P50	Uptime 30d	Status
cn-north-1 Beijing	Mainland China	142 ms	418 ms	1.20 s	99.997%	Healthy
cn-south-1 Shenzhen	Mainland China	138 ms	402 ms	1.18 s	99.999%	Healthy
hk-1 Hong Kong	Hong Kong	156 ms	442 ms	1.32 s	99.994%	Healthy
sg-southeast-1 Singapore	Southeast Asia	168 ms	468 ms	1.38 s	99.998%	Healthy
jp-tokyo-1 Tokyo	Japan	152 ms	432 ms	1.26 s	99.996%	Healthy
us-east-1 Virginia	US East	184 ms	488 ms	1.42 s	99.999%	Healthy
us-west-2 Oregon	US West	192 ms	504 ms	1.48 s	99.995%	Maintenance
eu-central-1 Frankfurt	Central Europe	176 ms	456 ms	1.34 s	99.998%	Healthy
eu-west-1 Ireland	Western Europe	182 ms	476 ms	1.40 s	99.997%	Healthy
au-sydney-1 Sydney	Oceania	198 ms	518 ms	1.52 s	99.992%	Healthy
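To read the Uptime 30d column in practical terms, an uptime percentage converts to total downtime within the window. The helper below is a small worked calculation; the function name is illustrative, and the three percentages are taken from the table above.

```python
SECONDS_PER_30_DAYS = 30 * 24 * 60 * 60  # 2,592,000 seconds


def downtime_seconds(uptime_percent: float,
                     window_seconds: int = SECONDS_PER_30_DAYS) -> float:
    """Total downtime implied by an uptime percentage over a window."""
    return window_seconds * (1 - uptime_percent / 100)


for node, uptime in [
    ("cn-south-1", 99.999),
    ("cn-north-1", 99.997),
    ("au-sydney-1", 99.992),
]:
    print(f"{node}: {downtime_seconds(uptime):.1f} s")
# cn-south-1: 25.9 s
# cn-north-1: 77.8 s
# au-sydney-1: 207.4 s
```

So the gap between the best and worst rows in the table is roughly three minutes of downtime per 30 days.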

Real-time latency data is integrated with Datadog and Prometheus.
