Zero‑cost AI stack: how we run 33 agents for under $10/month
We built a machine‑learning backbone that scales to 33 reasoning agents without adding a single hire. The secret? No big‑budget ML infra: a few cheap APIs and a lean architecture that strips everything down to the things that actually matter.
1. Cloud‑native, serverless sprint
Everything lives on Vercel and Supabase. Next.js is the front end, and the whole API layer is a set of serverless functions. No VMs, no Kubernetes. Functions spin up on demand, so aside from the occasional cold start, we pay only when a request actually arrives.
We plug every request into the same Supabase database. Permissions are handled by Row Level Security (RLS). That means each agent can only see the rows it needs, no extra logic layers, no extra cost.
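As an illustration of the RLS approach, a policy along these lines keeps each agent scoped to its own rows. The table and column names (`agent_rows`, `agent_id`) and the JWT claim are assumptions for the sketch, not the actual schema:

```sql
-- Illustrative Supabase RLS policy: each agent sees only rows tagged
-- with its own id, read from a claim in the request's JWT.
alter table agent_rows enable row level security;

create policy agent_isolation on agent_rows
  for select
  using (agent_id = auth.jwt() ->> 'agent_id');
```

With a policy like this, no application-level filtering is needed; Supabase enforces the scope on every query.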
2. Inference on a cheap, scale‑up chip
We run every LLM inference through Groq's cloud API, which is backed by their custom inference chips. That costs 5 cents per 1 k tokens. We batch the calls in 128‑token chunks and reuse the same context as much as possible. The Groq API is generous; we never hit throttling because we keep the batch size low and cache the embeddings.
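The 128‑token batching can be sketched as below. Real token counts come from the model's tokenizer; here a token is approximated as one whitespace‑separated word, which is an assumption for illustration:

```typescript
// Split a text into ~128-token chunks for batched inference.
// Tokens are approximated as whitespace-separated words (an assumption;
// production code would use the model's actual tokenizer).
function chunkTokens(text: string, chunkSize = 128): string[] {
  const tokens = text.split(/\s+/).filter(Boolean);
  const chunks: string[] = [];
  for (let i = 0; i < tokens.length; i += chunkSize) {
    chunks.push(tokens.slice(i, i + chunkSize).join(" "));
  }
  return chunks;
}
```

Each chunk then goes out as one API call, keeping individual requests small and predictable.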
Gemini, our multimodal backbone, is called only for vision‑heavy or user‑specific tasks. We keep that call to once per user session. The rest of the stack is pure text, so it lives entirely on Groq.
3. Tiny pipelines, big ideas
Each agent is an encapsulated pipeline:
1. Agent A – policy checker: a prompt that reads a policy file in Supabase, runs a minimal 200‑token LLM query, and returns a boolean verdict.
2. Agent B – summarizer: fetches the last 10 minutes of logs, slices them, and produces a 100‑token summary.
3. Agent C – recommender: pulls user history, computes cosine similarity over cached embeddings, returns the top 3 suggestions.
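Agent C's ranking step can be sketched as cosine similarity over cached embeddings. The item ids and vectors here are illustrative placeholders, not real data:

```typescript
// Cosine similarity between two embedding vectors.
function cosine(a: number[], b: number[]): number {
  let dot = 0, na = 0, nb = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    na += a[i] * a[i];
    nb += b[i] * b[i];
  }
  return dot / (Math.sqrt(na) * Math.sqrt(nb));
}

// Rank cached item embeddings against a user vector, return the top k ids.
function topSuggestions(
  user: number[],
  items: Record<string, number[]>,
  k = 3
): string[] {
  return Object.entries(items)
    .map(([id, vec]) => ({ id, score: cosine(user, vec) }))
    .sort((x, y) => y.score - x.score)
    .slice(0, k)
    .map((s) => s.id);
}
```

Because the embeddings are precomputed and cached, this step involves no LLM call at all.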
The agent JSON schema gives us version control and automated testing. Adding a new agent is just creating a new serverless function and a minimal wrapper. Runtime stays under 200 ms, so our latency budget is satisfied.
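One possible shape for that agent schema is sketched below; the field names are assumptions for illustration, not the actual schema:

```typescript
// Hypothetical agent spec: versioned in Git, validated in CI.
interface AgentSpec {
  name: string;
  version: string;           // bumped on every prompt change
  prompt: string;            // the system prompt sent to the LLM
  maxTokens: number;         // e.g. 200 for the policy checker
  model: "groq" | "gemini";  // which backend serves this agent
}

const policyChecker: AgentSpec = {
  name: "policy-checker",
  version: "1.0.0",
  prompt: "Read the policy file and return a boolean verdict.",
  maxTokens: 200,
  model: "groq",
};
```

A schema like this is what makes "adding a new agent" a one‑file change that automated tests can validate.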
4. Real‑world case: IOTA integration platform
The Hive built an internal product, AppX, to manage IOTA node staking. The platform manages 546 nodes, each needing a health‑check, reward‑prediction, and migration script. We ran 33 agents across those responsibilities. The budget? Under $8.70/month.
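A quick back‑of‑the‑envelope check of those numbers: $8.70/month spread across 33 agents comes out to roughly 26 cents per agent per month.

```typescript
// Per-agent monthly cost, from the figures stated above.
const monthlyBudget = 8.7;
const agentCount = 33;
const perAgent = monthlyBudget / agentCount;
console.log(perAgent.toFixed(2)); // ≈ "0.26"
```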
How the stack worked
The result? Developers could spawn new agents or tweak prompts with a Git push. No costly retraining cycles ever entered the conversation.
Unlock your own ultra‑low‑cost AI stack
You own the schema, you host your front‑end, you pay for the inference you actually use. Drop us a line at hello@the-hive-iota.vercel.app or try the prototype at the-hive-iota.vercel.app. We're giving away a 30‑minute walkthrough, and our free plan never touches a credit card. The stack is ready for you.
Built by The Hive
Need this built for your company?
The same AI-powered workflows behind this article — applied to your product. Next.js, Flutter, Node.js, AI integration. Fixed price, shipped in weeks.
Start a project →