1. 26 Apr, 2026 3 commits
      fix(server): error handler crash + missing platforms + per-provider timeout (#7) · 694d75cf
      Tashfeen authored
      * fix(server): guard errorHandler against headers-already-sent
      
      When an LLM completion errors mid-stream, the response is already
      flushing tokens to the client. The handler then unconditionally called
      res.status().json(), throwing ERR_HTTP_HEADERS_SENT and triggering a
      pm2 restart. Short-circuit to Express's default handler once headers
      have been sent so the socket closes cleanly.
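      A minimal sketch of the guard described above. The structural types stand in
      for Express's Request/Response/NextFunction so the example is self-contained;
      the function name and error shape are illustrative, not the project's actual code.

      ```typescript
      // Minimal structural stand-ins so the sketch runs without the express
      // package; in the real app these are Express's Response and NextFunction.
      type Res = {
        headersSent: boolean;
        status: (code: number) => Res;
        json: (body: unknown) => void;
      };
      type Next = (err?: unknown) => void;

      function errorHandler(err: Error, _req: unknown, res: Res, next: Next): void {
        // After the first streamed token, the status line and headers are
        // already on the wire; res.status().json() would throw
        // ERR_HTTP_HEADERS_SENT.
        if (res.headersSent) {
          return next(err); // Express's default handler closes the socket cleanly
        }
        res.status(500).json({ error: err.message });
      }
      ```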
      
      * feat(providers): configurable per-provider HTTP timeout
      
      OpenAICompatProvider now accepts an optional timeoutMs constructor
      option (default 15000ms). Cloud APIs respond well within the existing
      default, but locally-hosted OpenAI-compatible inference (llama.cpp,
      vLLM on CPU) can take 30-120s for long prompts and was being aborted
      mid-generation, causing the proxy to mark the key invalid.
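      A hedged sketch of the option. Only the `timeoutMs` name and the 15000ms
      default come from this commit; the constructor shape, field names, and
      request method are assumptions for illustration.

      ```typescript
      // Commit specifies only the option name and default; the rest is assumed.
      const DEFAULT_TIMEOUT_MS = 15_000;

      class OpenAICompatProvider {
        readonly timeoutMs: number;

        constructor(
          private readonly baseUrl: string,
          opts: { timeoutMs?: number } = {},
        ) {
          this.timeoutMs = opts.timeoutMs ?? DEFAULT_TIMEOUT_MS;
        }

        // Each request carries an AbortSignal that fires after timeoutMs, so a
        // slow local backend can be given a longer budget than a cloud API.
        chat(body: unknown): Promise<Response> {
          return fetch(`${this.baseUrl}/chat/completions`, {
            method: "POST",
            headers: { "Content-Type": "application/json" },
            body: JSON.stringify(body),
            signal: AbortSignal.timeout(this.timeoutMs),
          });
        }
      }
      ```

      A llama.cpp or CPU-bound vLLM registration might pass `{ timeoutMs: 120_000 }`
      while cloud providers keep the default.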
      
      * fix(keys): add zhipu, moonshot, minimax to platform allowlist
      
      These three platforms exist in the Platform type union and have
      provider registrations, but were missing from the PLATFORMS array
      in the keys route. Without them, the addKey Zod schema rejects
      requests to add API keys for these providers.
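      A sketch of the relationship between the allowlist and the schema. The exact
      platform identifier strings are assumptions (derived from the provider list in
      the initial release); only zhipu/moonshot/minimax being the missing entries
      comes from this commit, and the real route validates with a Zod enum rather
      than the hand-rolled check below.

      ```typescript
      // Identifier strings are assumed; the real project may spell them differently.
      const PLATFORMS = [
        "google", "groq", "cerebras", "sambanova", "nvidia", "mistral",
        "openrouter", "github", "huggingface", "cohere", "cloudflare",
        "zhipu", "moonshot", "minimax", // previously missing from the allowlist
      ] as const;

      type Platform = (typeof PLATFORMS)[number];

      // Stand-in for the Zod enum check: a platform string is accepted only if
      // it appears in PLATFORMS, so any Platform union member left out of the
      // array is silently rejected at the API boundary.
      function isAllowedPlatform(p: string): p is Platform {
        return (PLATFORMS as readonly string[]).includes(p);
      }
      ```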
  2. 25 Apr, 2026 1 commit
      feat(catalog): migrateModelsV6 — probe-verified additions and Google RPD fix (#6) · fbb2a175
      Tashfeen authored
      Live-probed against real free-tier keys on 2026-04-25. Adds 8 models
      that returned 200 with content, drops the one OR :free route that
      404s, and corrects two Google rate-limits whose catalog values were
      ~10x-50x too high.
      
      Adds:
      - Cloudflare: @cf/moonshotai/kimi-k2.5, @cf/qwen/qwen3-30b-a3b-fp8,
        @cf/deepseek-ai/deepseek-r1-distill-qwen-32b
      - Google preview: gemini-3-flash-preview, gemini-3.1-flash-lite-preview,
        gemini-3.1-pro-preview (Pro confirmed free-tier-eligible by the
        free_tier_requests quota metric in 429 errors)
      - OpenRouter: google/gemma-4-31b-it:free, liquid/lfm-2.5-1.2b-instruct:free
      
      Removes:
      - openrouter/arcee-ai/trinity-large-preview:free (404 No endpoints found)
      
      Corrects:
      - gemini-2.5-flash and gemini-2.5-flash-lite RPD 250/1000 -> 20.
        Free tier now uniformly enforces 20 RPD per model per project.
      
      Updates router test rationale: gemini-3.1-pro-preview at rank 1 now
      outranks Groq's gpt-oss-120b (rank 6) when keys exist for both.
  3. 23 Apr, 2026 1 commit
  4. 22 Apr, 2026 6 commits
  5. 21 Apr, 2026 1 commit
      Initial release of FreeLLMAPI · 04e15037
      tashfeenahmed authored
      Self-hosted OpenAI-compatible proxy that aggregates the free tiers of
      fourteen LLM providers — Google, Groq, Cerebras, SambaNova, NVIDIA,
      Mistral, OpenRouter, GitHub Models, Hugging Face, Cohere, Cloudflare,
      Zhipu, Moonshot, MiniMax — behind a single /v1/chat/completions endpoint.
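      As a sketch of what the single endpoint looks like from a client's side: the
      host, port, and token below are placeholders; only the /v1/chat/completions
      path and bearer-token auth come from the release notes.

      ```typescript
      // Builds an OpenAI-style request against the proxy. Host/port and token
      // are illustrative; the path and Authorization header are from the notes.
      function buildChatRequest(token: string, model: string, prompt: string) {
        return {
          url: "http://localhost:3000/v1/chat/completions", // assumed host/port
          init: {
            method: "POST",
            headers: {
              "Content-Type": "application/json",
              Authorization: `Bearer ${token}`, // unified bearer-token auth
            },
            body: JSON.stringify({
              model,
              messages: [{ role: "user", content: prompt }],
            }),
          },
        };
      }

      // Usage: const { url, init } = buildChatRequest(token, "some-model", "Hello");
      //        const res = await fetch(url, init);
      ```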
      
      Server:
      - Express + SQLite, per-provider adapters with streaming and non-streaming
    support, automatic failover on 429/5xx, per-key RPM/RPD/TPM/TPD tracking,
        sticky sessions for multi-turn, AES-256-GCM encrypted key storage,
        unified bearer-token auth, periodic health checks.
      
      Client:
      - React + Vite + shadcn/ui admin dashboard: keys, fallback chain (drag
        to reorder, color-coded per-provider monthly token budget), playground,
        analytics with per-provider breakdowns.
      
      Tooling:
      - GitHub Actions CI (server tests + client build), MIT license,
        README with provider-by-provider ToS review.
      
      For personal experimentation, not production.