- 25 Apr, 2026 1 commit
-
-
Tashfeen authored
Live-probed against real free-tier keys on 2026-04-25. Adds 8 models that returned 200 with content, drops the one OR :free route that 404s, and corrects two Google rate limits whose catalog values were ~10x-50x too high.

Adds:
- Cloudflare: @cf/moonshotai/kimi-k2.5, @cf/qwen/qwen3-30b-a3b-fp8, @cf/deepseek-ai/deepseek-r1-distill-qwen-32b
- Google preview: gemini-3-flash-preview, gemini-3.1-flash-lite-preview, gemini-3.1-pro-preview (Pro confirmed free-tier-eligible by the free_tier_requests quota metric in 429 errors)
- OpenRouter: google/gemma-4-31b-it:free, liquid/lfm-2.5-1.2b-instruct:free

Removes:
- openrouter/arcee-ai/trinity-large-preview:free (404 No endpoints found)

Corrects:
- gemini-2.5-flash and gemini-2.5-flash-lite RPD 250/1000 -> 20. The free tier now uniformly enforces 20 RPD per model per project.

Updates router test rationale: gemini-3.1-pro-preview at rank 1 now outranks Groq's gpt-oss-120b (rank 6) when keys exist for both.
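The uniform 20-RPD enforcement described above can be sketched as a per-(project, model) daily counter. This is a hypothetical illustration, not the proxy's actual tracker; the name RpdLimiter and its API are assumptions.

```python
import datetime
from collections import defaultdict

class RpdLimiter:
    """Tracks requests-per-day per (project, model) and enforces a daily cap."""

    def __init__(self, rpd_cap=20):
        self.rpd_cap = rpd_cap
        self._counts = defaultdict(int)  # (project, model, date) -> count

    def allow(self, project, model, today=None):
        """Record and permit the request if today's count is under the cap."""
        today = today or datetime.date.today().isoformat()
        key = (project, model, today)
        if self._counts[key] >= self.rpd_cap:
            return False
        self._counts[key] += 1
        return True

limiter = RpdLimiter(rpd_cap=20)
results = [limiter.allow("proj-a", "gemini-2.5-flash", today="2026-04-25")
           for _ in range(21)]
# the first 20 calls pass, the 21st is rejected
```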
-
- 23 Apr, 2026 1 commit
-
-
Tashfeen authored
Google moved Pro-tier Gemini off the free tier on 2026-04-01. Cerebras added z.ai GLM-4.7 (355B) to their free tier, throttled to 10 RPM / 100 RPD with an 8192 context cap while demand stays high.
-
- 22 Apr, 2026 6 commits
-
-
Tashfeen authored
Adds a worked Python example showing the round-trip (assistant tool_calls → tool role follow-up → final answer), notes streaming support, and clarifies the pass-through vs. Gemini-translation split.
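The round-trip mentioned above follows the standard OpenAI chat format; a minimal sketch of the message shapes involved (the tool name get_weather, its arguments, and the dispatch callback are hypothetical, and the final assistant turn is stubbed rather than produced by a model):

```python
import json

def run_round_trip(dispatch):
    """Illustrates the three-step tool-calling exchange in OpenAI chat format."""
    messages = [{"role": "user", "content": "Weather in Lahore?"}]

    # Step 1: the assistant replies with tool_calls instead of content.
    assistant_msg = {
        "role": "assistant",
        "content": None,
        "tool_calls": [{
            "id": "call_1",
            "type": "function",
            "function": {"name": "get_weather",
                         "arguments": json.dumps({"city": "Lahore"})},
        }],
    }
    messages.append(assistant_msg)

    # Step 2: the client executes each tool and appends a tool-role follow-up.
    for call in assistant_msg["tool_calls"]:
        args = json.loads(call["function"]["arguments"])
        messages.append({
            "role": "tool",
            "tool_call_id": call["id"],
            "content": json.dumps(dispatch(call["function"]["name"], args)),
        })

    # Step 3: the model would now generate the final answer from the tool result.
    messages.append({"role": "assistant", "content": "It is 31 C in Lahore."})
    return messages

history = run_round_trip(lambda name, args: {"temp_c": 31})
```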
-
Tashfeen authored
Cloudflare's OpenAI-compat endpoint rejects assistant messages with content: null, even when tool_calls are present, although null content alongside tool_calls is standard OpenAI format. Added normalizeMessages() that converts null content to "" before dispatch, plus a regression test covering the null-content + tool_calls case. Also credits @moaaz12-web in README Contributors for the tool-calling PR.
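The normalization described above can be sketched as follows. The shipped normalizeMessages() lives in the JavaScript server; this is an assumed Python equivalent for illustration only.

```python
def normalize_messages(messages):
    """Replace content: None with "" so endpoints that reject null content
    (e.g. Cloudflare's OpenAI-compat API) still accept assistant messages
    carrying tool_calls. Returns new dicts; the input is left untouched."""
    normalized = []
    for msg in messages:
        if msg.get("content") is None:
            msg = {**msg, "content": ""}  # shallow copy with content patched
        normalized.append(msg)
    return normalized

msgs = [{"role": "assistant", "content": None, "tool_calls": [{"id": "call_1"}]}]
fixed = normalize_messages(msgs)
```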
-
Moaaz Siddiqui authored
-
Tashfeen authored
-
Tashfeen authored
The stack grew wider; the headline had to bow.
-
Tashfeen authored
Kimi, Gemma, M1, HF depart the stage, DeepSeek, Maverick, Ling, and GPT-OSS engage. Twenty-two new rows, four fade from view — Agentic rank reshapes what falls through.
-
- 21 Apr, 2026 1 commit
-
-
tashfeenahmed authored
Self-hosted OpenAI-compatible proxy that aggregates the free tiers of fourteen LLM providers — Google, Groq, Cerebras, SambaNova, NVIDIA, Mistral, OpenRouter, GitHub Models, Hugging Face, Cohere, Cloudflare, Zhipu, Moonshot, MiniMax — behind a single /v1/chat/completions endpoint.

Server:
- Express + SQLite, per-provider adapters with streaming and non-streaming support, automatic failover on 429/5xx, per-key RPM/RPD/TPM/TPD tracking, sticky sessions for multi-turn, AES-256-GCM encrypted key storage, unified bearer-token auth, periodic health checks.

Client:
- React + Vite + shadcn/ui admin dashboard: keys, fallback chain (drag to reorder, color-coded per-provider monthly token budget), playground, analytics with per-provider breakdowns.

Tooling:
- GitHub Actions CI (server tests + client build), MIT license, README with provider-by-provider ToS review.

For personal experimentation, not production.
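The automatic failover on 429/5xx could be sketched like this. A hypothetical simplification: in the real server the chain order comes from the dashboard config and each provider is a full adapter, not a stub callable.

```python
def complete_with_failover(providers, request):
    """Try each (name, call) pair in chain order; fall through on 429 or 5xx."""
    last_status = None
    for name, call in providers:
        status, body = call(request)
        if status == 200:
            return name, body
        if status == 429 or status >= 500:
            last_status = status
            continue  # rate-limited or server error: try the next provider
        raise RuntimeError(f"{name} returned non-retryable status {status}")
    raise RuntimeError(f"all providers exhausted (last status {last_status})")

# Stub providers standing in for real adapters.
chain = [
    ("groq", lambda req: (429, None)),      # rate-limited
    ("cerebras", lambda req: (503, None)),  # transient server error
    ("google", lambda req: (200, {"choices": [{"message": {"content": "ok"}}]})),
]
winner, resp = complete_with_failover(chain, {"messages": []})
```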
-