- 26 Apr, 2026 1 commit
-
Tashfeen authored
* fix(server): guard errorHandler against headers-already-sent

  When an LLM completion errors mid-stream, the response is already flushing tokens to the client. The handler then unconditionally called res.status().json(), throwing ERR_HTTP_HEADERS_SENT and triggering a pm2 restart. Short-circuit to Express's default handler once headers have been sent so the socket closes cleanly.

* feat(providers): configurable per-provider HTTP timeout

  OpenAICompatProvider now accepts an optional timeoutMs constructor option (default 15000ms). Cloud APIs respond well within the existing default, but locally-hosted OpenAI-compatible inference (llama.cpp, vLLM on CPU) can take 30-120s for long prompts and was being aborted mid-generation, causing the proxy to mark the key invalid.

* fix(keys): add zhipu, moonshot, minimax to platform allowlist

  These three platforms exist in the Platform type union and have provider registrations, but were missing from the PLATFORMS array in the keys route. Without them, the addKey Zod schema rejects requests to add API keys for these providers.
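The headers-already-sent guard described in the first fix can be sketched as follows. This is a minimal illustration, not the repo's actual handler; the error shape and status fallback are assumptions.

```javascript
// Express error-handling middleware (4-arg signature). If the streaming
// response has already flushed headers, we cannot send a JSON error body;
// delegating to next(err) lets Express's default handler close the socket
// instead of throwing ERR_HTTP_HEADERS_SENT.
function errorHandler(err, req, res, next) {
  if (res.headersSent) {
    return next(err);
  }
  // Normal (non-streamed) path: status code and JSON body are illustrative.
  res.status(err.status || 500).json({ error: { message: err.message } });
}
```

Registered last with `app.use(errorHandler)`, this keeps mid-stream failures from crashing the process while preserving JSON errors for requests that have not started streaming.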
-
- 25 Apr, 2026 1 commit
-
Tashfeen authored
Live-probed against real free-tier keys on 2026-04-25. Adds 8 models that returned 200 with content, drops the one OpenRouter :free route that 404s, and corrects two Google rate limits whose catalog values were ~10x-50x too high.

Adds:
- Cloudflare: @cf/moonshotai/kimi-k2.5, @cf/qwen/qwen3-30b-a3b-fp8, @cf/deepseek-ai/deepseek-r1-distill-qwen-32b
- Google preview: gemini-3-flash-preview, gemini-3.1-flash-lite-preview, gemini-3.1-pro-preview (Pro confirmed free-tier-eligible by the free_tier_requests quota metric in 429 errors)
- OpenRouter: google/gemma-4-31b-it:free, liquid/lfm-2.5-1.2b-instruct:free

Removes:
- openrouter/arcee-ai/trinity-large-preview:free (404 No endpoints found)

Corrects:
- gemini-2.5-flash and gemini-2.5-flash-lite RPD 250/1000 -> 20. The free tier now uniformly enforces 20 RPD per model per project.

Updates router test rationale: gemini-3.1-pro-preview at rank 1 now outranks Groq's gpt-oss-120b (rank 6) when keys exist for both.
-
- 23 Apr, 2026 1 commit
-
Tashfeen authored
Google moved Pro-tier Gemini off the free tier on 2026-04-01. Cerebras added z.ai GLM-4.7 (355B) to their free tier, throttled to 10 RPM / 100 RPD with an 8192 context cap while demand stays high.
-
- 22 Apr, 2026 6 commits
-
Tashfeen authored
Adds a worked Python example showing the round-trip (assistant tool_calls → tool role follow-up → final answer), notes streaming support, and clarifies the pass-through vs. Gemini-translation split.
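The round-trip this commit documents can be sketched as the message sequence below. The message shapes follow the standard OpenAI chat format; the tool name, arguments, and prompt are illustrative, not taken from the repo's README example.

```javascript
// Builds the three-message tool-calling round-trip:
//   1. user prompt
//   2. assistant turn carrying tool_calls instead of text
//   3. tool-role follow-up with the tool's result, keyed by tool_call_id
// The next completion request with these messages yields the final answer.
function buildToolRoundTrip(toolCallId, toolResult) {
  return [
    { role: 'user', content: 'What is the weather in Lahore?' },
    {
      role: 'assistant',
      content: null,
      tool_calls: [
        {
          id: toolCallId,
          type: 'function',
          function: { name: 'get_weather', arguments: '{"city":"Lahore"}' },
        },
      ],
    },
    { role: 'tool', tool_call_id: toolCallId, content: toolResult },
  ];
}
```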
-
Tashfeen authored
Cloudflare's OpenAI-compat endpoint rejects assistant messages with content: null, even when tool_calls are present (standard OpenAI format). Added normalizeMessages() that converts null content to "" before dispatch, plus a regression test covering the null-content + tool_calls case. Also credits @moaaz12-web in README Contributors for the tool-calling PR.
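A minimal sketch of the normalization described above; the real normalizeMessages() in the repo may handle more cases, but the null-content coercion is the core of the fix.

```javascript
// Cloudflare's OpenAI-compat endpoint rejects assistant messages whose
// content is null, even when tool_calls are present (which is valid in the
// standard OpenAI format). Coerce null content to "" before dispatch,
// leaving every other field untouched.
function normalizeMessages(messages) {
  return messages.map((m) => (m.content === null ? { ...m, content: '' } : m));
}
```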
-
Moaaz Siddiqui authored
-
Tashfeen authored
-
Tashfeen authored
The stack grew wider; the headline had to bow.
-
Tashfeen authored
Kimi, Gemma, M1, HF depart the stage,
DeepSeek, Maverick, Ling, and GPT-OSS engage.
Twenty-two new rows, four fade from view —
Agentic rank reshapes what falls through.
-
- 21 Apr, 2026 1 commit
-
tashfeenahmed authored
Self-hosted OpenAI-compatible proxy that aggregates the free tiers of fourteen LLM providers — Google, Groq, Cerebras, SambaNova, NVIDIA, Mistral, OpenRouter, GitHub Models, Hugging Face, Cohere, Cloudflare, Zhipu, Moonshot, MiniMax — behind a single /v1/chat/completions endpoint.

Server:
- Express + SQLite, per-provider adapters with streaming and non-streaming support, automatic failover on 429/5xx, per-key RPM/RPD/TPM/TPD tracking, sticky sessions for multi-turn, AES-256-GCM encrypted key storage, unified bearer-token auth, periodic health checks.

Client:
- React + Vite + shadcn/ui admin dashboard: keys, fallback chain (drag to reorder, color-coded per-provider monthly token budget), playground, analytics with per-provider breakdowns.

Tooling:
- GitHub Actions CI (server tests + client build), MIT license, README with provider-by-provider ToS review.

For personal experimentation, not production.
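Using the proxy then looks like any OpenAI-compatible client call. The sketch below builds such a request; the base URL, token, and model name are placeholders, not values from this repo.

```javascript
// Assembles a request against the proxy's single endpoint. The returned
// { url, options } pair can be passed directly to fetch(url, options).
// Auth is the proxy's unified bearer token, not any provider's key.
function buildChatRequest(baseUrl, token, model, prompt) {
  return {
    url: `${baseUrl}/v1/chat/completions`,
    options: {
      method: 'POST',
      headers: {
        'Content-Type': 'application/json',
        Authorization: `Bearer ${token}`,
      },
      body: JSON.stringify({
        model,
        messages: [{ role: 'user', content: prompt }],
        stream: false,
      }),
    },
  };
}
```

Because the endpoint speaks the OpenAI wire format, existing OpenAI SDKs should also work by pointing their base URL at the proxy.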
-