Supports multimodal endpoints — Text Chat · Vector Embedding · Image Generation · Video Generation · Speech Synthesis, unified auth, billing, and fallback
No need to rewrite existing OpenAI code. Change base_url and instantly run Claude / Gemini / DeepSeek and all other models. tool_calls, streaming, JSON mode, vision all preserved.
Declare a fallback list in the request — when the primary model times out / is rate limited / refuses, it automatically falls to the next; callers experience zero disruption, failure rate drops from 0.8% to 0.02%.
Similar prompts hit the vector cache, cached hits cost 0 tokens. Adjustable threshold and TTL, real-world CS/FAQ hit rate 42%, monthly bill reduced by 38%.
12 regional nodes · BGP + Anycast. Domestic users route through HK/SZ nodes, overseas through US East / Frankfurt — end-to-end P50 178ms.
Three routing strategies switchable with one click — Lowest Cost: auto-selects the cheapest channel among multi-vendor same-model options; Lowest Latency: prioritizes the fastest responding node; Auto-Balance: dynamically schedules based on cost, latency, and success rate. Each model can be configured independently or set globally. Upstream failures auto-fallback with cooldown recovery, zero caller awareness.
Tags mark caller dimensions, console aggregates queries by project / team / employee / user; failure reasons, latency distribution, token usage, cost — fully visualized.
7-day rolling average · Updated in last 24h