Describe what you need. The system builds it — writes the instructions, tests them, picks the right model. You get a typed, production-ready function you can call from anywhere. That's it.
$ npm i aifunctions-js
You write a prompt. It works. Then edge cases break it. You add retries, JSON parsing, validation. Two days later you have 200 lines of glue code for one function.
Next project, you need something similar. You copy-paste, it drifts. There's no contract, no tests, no way to know if it's still working. The prompt is buried in a codebase where nobody else can find it.
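The glue code described above tends to look the same everywhere. A hypothetical sketch, not this library's API: `callModel` stands in for whatever inference client you happen to use, and the retry, parse, and validate steps are the 200 lines in miniature.

```javascript
// Hypothetical sketch of the glue code that piles up around one prompt:
// retries, JSON parsing, hand-rolled validation. `callModel` stands in for
// whatever inference client you use; nothing here is the library's API.
async function classifyTicket(text, callModel, maxRetries = 3) {
  for (let attempt = 1; attempt <= maxRetries; attempt++) {
    const raw = await callModel(`Classify this support ticket: ${text}`);
    try {
      const parsed = JSON.parse(raw);
      // Validate the shape by hand: no schema, no contract.
      if (typeof parsed.category === "string" && parsed.category.length > 0) {
        return parsed;
      }
    } catch {
      // Malformed JSON: fall through and retry.
    }
  }
  throw new Error(`classifyTicket failed after ${maxRetries} attempts`);
}
```

Every copy of this wrapper drifts independently, which is exactly the problem described above.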
Describe what you need: input, output, a few examples. The system autonomously writes the instructions, tests them, and picks the right model. You get a function that works.
Call it from any project, any language. It's tested, typed, and stays in a shared library so your whole team can use it. When you want to be sure it's production-ready, release it with a quality gate.
You do two things: say what you want, and use the result. Everything in between is autonomous.
Say what the function should do. Add a few real examples of good and bad output. That's enough.
The system writes the instructions, derives scoring rules from your examples, and selects the right model for your cost/quality tradeoff.
Run → judge → fix → repeat. Autonomous loop against your examples until the quality threshold is met.
Call it like any function — from Node.js, over HTTP, from any language. Typed input, typed output. Done.
Every function works as a library call or a REST endpoint. No boilerplate, no prompt engineering.
```javascript
import { classify, summarize, run } from "aifunctions-js/functions";

// Built-in function — one line
const { categories } = await classify({
  text: "I was charged twice this month.",
  categories: ["Billing", "Auth", "Support"],
});

// Your custom function — same simplicity
const { lines } = await run("extract-invoices", { text: invoiceText });
```
```
# Create a function
POST /functions
{ "id": "extract-invoices", "description": "Extract line items", "scoreGate": 0.85 }

# Call it — typed input, typed output
POST /functions/extract-invoices/run
{ "input": { "text": "Invoice #1234..." } }
→ { result, usage: { tokens, model, latencyMs }, requestId }

# Works from Python, Go, curl, anything
# No SDK required
```
You provide a description and a few real examples. The system autonomously writes instructions, tests them, scores them, and rewrites until they pass your quality bar.
1. You: good & bad output
2. Autonomous: instructions + rules
3. Autonomous: score against rules
4. Autonomous: rewrite & retry
5. You: quality bar met ✓
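The loop above can be sketched as a plain function. This is illustrative only: `generate`, `score`, and `rewrite` are stand-ins for what the system runs server-side, and the names are hypothetical.

```javascript
// Hypothetical sketch of the run → judge → fix → repeat loop.
// `generate` produces output from instructions, `score` grades it against
// the derived rules, `rewrite` fixes the instructions. All are stand-ins
// for the real system's internals.
function improveUntilGateMet(instructions, examples, { generate, score, rewrite, gate = 0.85, maxRounds = 5 }) {
  for (let round = 1; round <= maxRounds; round++) {
    const outputs = examples.map((ex) => generate(instructions, ex.input));
    const avg = outputs.reduce((sum, out, i) => sum + score(out, examples[i]), 0) / outputs.length;
    if (avg >= gate) return { instructions, score: avg, rounds: round };
    instructions = rewrite(instructions, outputs, examples); // fix and retry
  }
  throw new Error("quality gate not met");
}
```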
Give it test cases and a description. It writes instructions from scratch, runs them, tests the output, and loops autonomously until your quality bar is met.
Provide a few real examples labeled good or bad — with a brief note on why. The system derives the scoring rules autonomously. You review them. Human judgment in, not AI-judges-AI.
Benchmark your function across models — or sweep temperatures on a single model. Winners are stored as profiles: run with mode: "best" or "cheapest" and the system uses the winning config automatically.
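A model race reduces to: run every candidate config on the same cases, score them, keep the winner as the profile. A local sketch under assumed names; the function mirrors the `raceModels` feature in spirit, but the body and signature are illustrative, not the actual API.

```javascript
// Hypothetical sketch of a model race: score each candidate config on the
// same test cases and return the winner, which would be stored as a profile.
function raceModels(candidates, cases, scoreCase) {
  const results = candidates.map((candidate) => ({
    candidate,
    score: cases.reduce((sum, c) => sum + scoreCase(candidate, c), 0) / cases.length,
  }));
  results.sort((a, b) => b.score - a.score);
  return { best: results[0].candidate, leaderboard: results };
}
```

The same shape covers a temperature sweep: the candidates are one model at several temperatures.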
For production use, functions can go through a release pipeline. Nothing goes live without passing your quality bar.
New functions are callable immediately for testing. Responses include "draft": true so you know you're in sandbox mode. Iterate freely.
Run :validate to check quality. Schema validation confirms the output shape. Semantic scoring tests every case against your rules. Both must pass your gate. Use it in CI to fail the build.
Quality gate passes — immutable version tagged. Pinned contract, pinned model. Roll back to any previous version if something regresses. Stable forever.
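Immutable versioning amounts to an append-only registry: releasing appends a frozen snapshot, and rollback reads an older entry rather than mutating anything. A minimal sketch with hypothetical names, not the platform's storage layer:

```javascript
// Hypothetical sketch of immutable release versions: each release appends a
// frozen snapshot; rolling back means reading an older entry, never editing one.
function createRegistry() {
  const versions = [];
  return {
    release(contract, model) {
      const snapshot = Object.freeze({ version: versions.length + 1, contract, model });
      versions.push(snapshot);
      return snapshot.version;
    },
    get(version) {
      return versions[version - 1]; // pinned: same contract, same model, forever
    },
    latest() {
      return versions[versions.length - 1];
    },
  };
}
```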
One developer described what they needed and provided 8 test cases. Here's what the autonomous loop did.
Seed instruction: "Extract line items from this invoice." — one sentence. Missed currency fields, broke on multi-page invoices, inconsistent output structure.
System rewrote instructions 4 times. Generated 6 scoring rules from examples. Final instructions: 340 words, explicit about edge cases. Selected gpt-4o via raceModels (beat Sonnet by 4% on this task).
You're running production traffic through this. Here's how we make that safe.
We log requestId, function name, model, latency, and token count. We do not log your input data, output data, or API keys. The server is a stateless proxy — your payloads pass through and are never stored.
Every response includes a requestId. Enable "trace": true on any call to get the full prompt, model selection reasoning, and scores — for that request only, returned to you, not stored. Replay any request against a pinned version.
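A traced call would then just add the flag to the request body from the earlier `/run` example. The field name is taken from the description above; the exact request shape may differ:

```
POST /functions/extract-invoices/run
{ "input": { "text": "Invoice #1234..." }, "trace": true }
```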
The system picks the best model based on race results. Pin a specific model when you need exact reproducibility. Every response tells you exactly which model answered. Full control when you want it, smart defaults when you don't.
The system asks you for real examples before generating scoring rules. You review and approve the rules before they're used. The evaluator is grounded in your judgment — not AI grading its own homework.
Tag any call with projectId, traceId, and custom tags. The system adds functionId automatically. Every usage response carries these back — so you can group costs by project, trace requests across systems, and filter analytics by any dimension.
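Grouping costs by project then reduces to a plain fold over the usage records each response carries back. The record shape here is assumed from the fields named above (`projectId`, `tokens`), not taken from a documented schema:

```javascript
// Hypothetical sketch: roll up token usage by projectId from tagged usage
// records. Record shape is assumed from the fields described above.
function costsByProject(usageRecords) {
  return usageRecords.reduce((totals, record) => {
    const key = record.projectId ?? "untagged";
    totals[key] = (totals[key] ?? 0) + record.tokens;
    return totals;
  }, {});
}
```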
Check your OpenRouter balance and generation history, or pull OpenAI usage and cost data — all from the same API. Filter by date, model, project, or function. Your provider keys, your data, proxied directly. No separate dashboards needed.
The full platform is free with your own inference key. A managed Pro tier is on the roadmap.
Rate limits are reported through the standard X-RateLimit-Remaining and X-RateLimit-Reset response headers.
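Clients can use those headers to back off. A minimal sketch, assuming X-RateLimit-Reset carries a Unix timestamp in seconds (verify the actual semantics against the API; some services send seconds-until-reset instead):

```javascript
// Sketch: how long to wait before the next call, based on the standard
// rate-limit headers. Assumes X-RateLimit-Reset is a Unix timestamp in
// seconds; header keys are lowercased as Node's HTTP clients do.
function msUntilReset(headers, nowMs) {
  const remaining = Number(headers["x-ratelimit-remaining"]);
  if (remaining > 0) return 0; // quota left, call immediately
  const resetAtSec = Number(headers["x-ratelimit-reset"]);
  return Math.max(0, resetAtSec * 1000 - nowMs);
}
```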
Describe what you need. The system builds it. Call it like any function. That's it.