@@ -69,6 +69,7 @@ The problem is that stacking them by hand is painful: fourteen different SDKs, f
 - **OpenAI-compatible** — `POST /v1/chat/completions` and `GET /v1/models` work with the official OpenAI SDKs and any OpenAI-compatible client (LangChain, LlamaIndex, Continue, Hermes, etc.). Just change `base_url`.
 - **Streaming and non-streaming** — Server-Sent Events for `stream: true`, JSON response otherwise. Every provider adapter implements both.
+- **Tool calling** — OpenAI-style `tools` / `tool_choice` requests are passed through, and assistant `tool_calls` + `tool` role follow-up messages round-trip across providers.
 - **Automatic failover** — If the chosen provider returns a 429, 5xx, or times out, the router skips it, puts the key on a short cooldown, and retries on the next model in your fallback chain (up to 20 attempts).
 - **Per-key rate tracking** — RPM, RPD, TPM, and TPD counters per `(platform, model, key)` so the router always picks a key that's under its caps.
 - **Sticky sessions** — Multi-turn conversations keep talking to the same model for 30 minutes to avoid the hallucination spike that comes from mid-conversation model switches.
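
The failover and cooldown behavior above can be sketched as a small loop over the fallback chain. This is a minimal illustration only, not the router's actual code: `route`, `COOLDOWN_S`, and the `(status, body)` shape of the provider callables are all hypothetical.

```python
import time

COOLDOWN_S = 60       # hypothetical cooldown length; the real value isn't given
MAX_ATTEMPTS = 20     # matches the attempt cap described above

def route(chain, providers, cooldowns, now=time.monotonic):
    """Try each (provider, key) pair in the fallback chain in order,
    skipping keys that are still cooling down after a recent failure."""
    attempts = 0
    for provider, key in chain:
        if attempts >= MAX_ATTEMPTS:
            break
        if cooldowns.get(key, 0) > now():
            continue                    # key failed recently; skip it for now
        attempts += 1
        status, body = providers[provider](key)
        if status == 200:
            return body                 # success: stop walking the chain
        if status == 429 or status >= 500:
            # rate-limited or server error: bench the key, try the next model
            cooldowns[key] = now() + COOLDOWN_S
    raise RuntimeError("all providers in the fallback chain failed")
```

A real implementation would also have to treat timeouts as failures and consult the per-key RPM/TPM counters before dispatching, which this sketch omits.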
...
...
@@ -86,7 +87,6 @@ The scope is deliberately narrow. If a feature isn't on this list and isn't belo
 - **Embeddings** (`/v1/embeddings`)
 - **Image generation** (`/v1/images/*`)
 - **Audio / speech** (`/v1/audio/*`)
-- **Function / tool calling** — the request schema doesn't pass `tools` through yet
 - **Vision / multimodal inputs** — message content is text-only
 - **Legacy completions** (`/v1/completions`) — only the chat endpoint is implemented