- 26 Apr, 2026 1 commit
-
Tashfeen authored
* fix(server): guard errorHandler against headers-already-sent

  When an LLM completion errors mid-stream, the response is already flushing tokens to the client. The handler then unconditionally called res.status().json(), throwing ERR_HTTP_HEADERS_SENT and triggering a pm2 restart. Short-circuit to Express's default handler once headers have been sent so the socket closes cleanly.

* feat(providers): configurable per-provider HTTP timeout

  OpenAICompatProvider now accepts an optional timeoutMs constructor option (default 15000ms). Cloud APIs respond well within the existing default, but locally-hosted OpenAI-compatible inference (llama.cpp, vLLM on CPU) can take 30-120s for long prompts and was being aborted mid-generation, causing the proxy to mark the key invalid.

* fix(keys): add zhipu, moonshot, minimax to platform allowlist

  These three platforms exist in the Platform type union and have provider registrations, but were missing from the PLATFORMS array in the keys route. Without them, the addKey Zod schema rejects requests to add API keys for these providers.
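The headers-already-sent guard described in the first fix can be sketched as follows. This is a minimal illustration, not the repo's actual handler; the error shape and status fallback are assumptions.

```javascript
// Express error-handling middleware (4-arg signature). If the streaming
// response has already flushed headers, we cannot send a JSON error body;
// delegating to next(err) lets Express's default handler close the socket
// instead of throwing ERR_HTTP_HEADERS_SENT.
function errorHandler(err, req, res, next) {
  if (res.headersSent) {
    return next(err);
  }
  // Normal (non-streamed) path: status code and JSON body are illustrative.
  res.status(err.status || 500).json({ error: { message: err.message } });
}
```

Registered last with `app.use(errorHandler)`, this keeps mid-stream failures from crashing the process while preserving JSON errors for requests that have not started streaming.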
-
- 25 Apr, 2026 1 commit
-
Tashfeen authored
Live-probed against real free-tier keys on 2026-04-25. Adds 8 models that returned 200 with content, drops the one OpenRouter :free route that 404s, and corrects two Google rate limits whose catalog values were ~10x-50x too high.

Adds:
- Cloudflare: @cf/moonshotai/kimi-k2.5, @cf/qwen/qwen3-30b-a3b-fp8, @cf/deepseek-ai/deepseek-r1-distill-qwen-32b
- Google preview: gemini-3-flash-preview, gemini-3.1-flash-lite-preview, gemini-3.1-pro-preview (Pro confirmed free-tier-eligible by the free_tier_requests quota metric in 429 errors)
- OpenRouter: google/gemma-4-31b-it:free, liquid/lfm-2.5-1.2b-instruct:free

Removes:
- openrouter/arcee-ai/trinity-large-preview:free (404 No endpoints found)

Corrects:
- gemini-2.5-flash and gemini-2.5-flash-lite RPD 250/1000 -> 20. The free tier now uniformly enforces 20 RPD per model per project.

Updates router test rationale: gemini-3.1-pro-preview at rank 1 now outranks Groq's gpt-oss-120b (rank 6) when keys exist for both.
-
- 23 Apr, 2026 1 commit
-
Tashfeen authored
Google moved Pro-tier Gemini off the free tier on 2026-04-01. Cerebras added z.ai GLM-4.7 (355B) to their free tier, throttled to 10 RPM / 100 RPD with an 8192 context cap while demand stays high.
-
- 22 Apr, 2026 6 commits
-
Tashfeen authored
Adds a worked Python example showing the round-trip (assistant tool_calls → tool role follow-up → final answer), notes streaming support, and clarifies the pass-through vs. Gemini-translation split.
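The round-trip this commit documents can be sketched as the message sequence below. The message shapes follow the standard OpenAI chat format; the tool name, arguments, and prompt are illustrative, not taken from the repo's README example.

```javascript
// Builds the three-message tool-calling round-trip:
//   1. user prompt
//   2. assistant turn carrying tool_calls instead of text
//   3. tool-role follow-up with the tool's result, keyed by tool_call_id
// The next completion request with these messages yields the final answer.
function buildToolRoundTrip(toolCallId, toolResult) {
  return [
    { role: 'user', content: 'What is the weather in Lahore?' },
    {
      role: 'assistant',
      content: null,
      tool_calls: [
        {
          id: toolCallId,
          type: 'function',
          function: { name: 'get_weather', arguments: '{"city":"Lahore"}' },
        },
      ],
    },
    { role: 'tool', tool_call_id: toolCallId, content: toolResult },
  ];
}
```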
-
Tashfeen authored
Cloudflare's OpenAI-compat endpoint rejects assistant messages with content: null, even when tool_calls are present (standard OpenAI format). Added normalizeMessages() that converts null content to "" before dispatch, plus a regression test covering the null-content + tool_calls case. Also credits @moaaz12-web in README Contributors for the tool-calling PR.
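A minimal sketch of the normalization described above; the real normalizeMessages() in the repo may handle more cases, but the null-content coercion is the core of the fix.

```javascript
// Cloudflare's OpenAI-compat endpoint rejects assistant messages whose
// content is null, even when tool_calls are present (which is valid in the
// standard OpenAI format). Coerce null content to "" before dispatch,
// leaving every other field untouched.
function normalizeMessages(messages) {
  return messages.map((m) => (m.content === null ? { ...m, content: '' } : m));
}
```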
-
Moaaz Siddiqui authored
-
Tashfeen authored
-
Tashfeen authored
The stack grew wider; the headline had to bow.
-
Tashfeen authored
Kimi, Gemma, M1, HF depart the stage,
DeepSeek, Maverick, Ling, and GPT-OSS engage.
Twenty-two new rows, four fade from view —
Agentic rank reshapes what falls through.
-
- 21 Apr, 2026 1 commit
-
tashfeenahmed authored
Self-hosted OpenAI-compatible proxy that aggregates the free tiers of fourteen LLM providers — Google, Groq, Cerebras, SambaNova, NVIDIA, Mistral, OpenRouter, GitHub Models, Hugging Face, Cohere, Cloudflare, Zhipu, Moonshot, MiniMax — behind a single /v1/chat/completions endpoint.

Server:
- Express + SQLite, per-provider adapters with streaming and non-streaming support, automatic failover on 429/5xx, per-key RPM/RPD/TPM/TPD tracking, sticky sessions for multi-turn, AES-256-GCM encrypted key storage, unified bearer-token auth, periodic health checks.

Client:
- React + Vite + shadcn/ui admin dashboard: keys, fallback chain (drag to reorder, color-coded per-provider monthly token budget), playground, analytics with per-provider breakdowns.

Tooling:
- GitHub Actions CI (server tests + client build), MIT license, README with provider-by-provider ToS review.

For personal experimentation, not production.
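Using the proxy then looks like any OpenAI-compatible client call. The sketch below builds such a request; the base URL, token, and model name are placeholders, not values from this repo.

```javascript
// Assembles a request against the proxy's single endpoint. The returned
// { url, options } pair can be passed directly to fetch(url, options).
// Auth is the proxy's unified bearer token, not any provider's key.
function buildChatRequest(baseUrl, token, model, prompt) {
  return {
    url: `${baseUrl}/v1/chat/completions`,
    options: {
      method: 'POST',
      headers: {
        'Content-Type': 'application/json',
        Authorization: `Bearer ${token}`,
      },
      body: JSON.stringify({
        model,
        messages: [{ role: 'user', content: prompt }],
        stream: false,
      }),
    },
  };
}
```

Because the endpoint speaks the OpenAI wire format, existing OpenAI SDKs should also work by pointing their base URL at the proxy.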
-