Define a contract — input schema, output schema, quality threshold. The system autonomously writes, optimizes, and validates the instructions. Release it as a score-gated, versioned API endpoint. Call it from anywhere.
$ npm i aifunctions-js
You write a prompt. It works. Then edge cases break it. You add retries, JSON parsing, validation. Two days later you have 200 lines of glue code for one function.
Next project, you need something similar. You copy-paste, it drifts. There's no contract, no test suite, no version history, no release gate. The prompt is buried in a codebase where nobody else can find it.
Define a function contract-first: input schema, output schema, quality threshold. The system autonomously writes instructions, generates scoring rules, and optimizes until your gate is met.
Release it as a versioned, score-gated endpoint with stored test suites, pinned model selection, and rollback. Call it from any project, any language, over HTTP.
You do two things: define the contract, and release. Everything in between is autonomous.
Define input/output schemas, add 3–10 real examples, set your quality threshold. Hit POST /functions and you're done.
The system writes instructions, derives scoring rules from your examples, and selects the right model tier for your cost/quality tradeoff.
Run → judge → fix → repeat. Autonomous loop against your test suite until the score threshold is met. No hand-holding.
Score gate passes. You release. Immutable version, pinned contract, pinned model. Call it over HTTP from anywhere, forever.
Every function is a REST endpoint with a typed contract. Use the npm package for Node.js, or call the API directly from any language.
POST /functions
{
  "id": "extract-invoices",
  "description": "Extract line items",
  "inputSchema": {
    "type": "object",
    "properties": { "text": { "type": "string" } }
  },
  "outputSchema": {
    "type": "object",
    "properties": { "lines": { "type": "array" } }
  },
  "scoreGate": 0.85,
  "modelPolicy": "auto"
}
# Call the latest draft
POST /functions/extract-invoices/run
→ { result, usage: { tokens, model, latencyMs }, requestId, draft: true }

# Call a pinned release
POST /functions/extract-invoices/versions/v1/run
→ { result, usage, version: "v1" }

# Add test cases (persisted per function)
PUT /functions/extract-invoices/test-cases

# Validate — schema + semantic scoring
POST /functions/extract-invoices:validate
→ { score: 0.93, passed: true, cases: [...] }

# Release (blocked if below scoreGate)
POST /functions/extract-invoices:release
→ { version: "v2", score: 0.93 }
Start with a description and examples. The system autonomously writes, tests, scores, and rewrites until your threshold is met. You approve the examples — the system handles everything else.
You: description + real examples
Autonomous: instructions + rules
Autonomous: score against rules
Autonomous: rewrite & improve
You: threshold met ✓
Give it test cases and a description. It writes instructions from scratch, runs them, judges output, and loops autonomously until your threshold is met. Results are persisted back to the content store — no manual write step.
Provide 3–10 real examples labeled good or bad. The system derives scoring rules autonomously. You review and approve them — grounding the evaluator in human judgment, not AI-judges-AI.
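A test-case payload might look like the sketch below. The field names (input, expected, label) are assumptions for illustration; the documentation above only specifies 3–10 examples labeled good or bad, persisted via PUT /functions/{id}/test-cases.

```
PUT /functions/extract-invoices/test-cases
[
  {
    "input": { "text": "Invoice #1042\n2x Widget @ $20.00\nTotal: $40.00" },
    "expected": { "lines": [{ "item": "Widget", "qty": 2, "total": 40.0 }] },
    "label": "good"
  },
  {
    "input": { "text": "Total due: see attached" },
    "expected": { "lines": [] },
    "label": "bad"
  }
]
```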
Benchmark your function across models. Find which one performs best for your specific task — not generic benchmarks. Pin the winner via modelPolicy for reproducibility and cost control.
Functions go through draft → validate → release. Nothing reaches production without passing schema validation and semantic scoring against your test suite.
New functions start as drafts. Callable immediately via /run for testing, but responses include "draft": true. Iterate freely — no commitments. Test cases are persisted and versioned alongside the function.
Run :validate to check readiness. Two layers: schema validation confirms output shape, then semantic scoring judges every test case against your rules. Both must pass your scoreGate. Use it in CI: curl POST :validate and fail the build if score is below threshold.
Score gate passes → immutable version tagged in git. Pinned contract, pinned instructions, pinned model. Test suite is frozen at release. Call /versions/v1/run for the pinned release. Roll back to any previous version if something regresses.
Here's what the autonomous loop actually looks like for a real function.
Seed instruction: "Extract line items from this invoice." One sentence. It missed currency fields, broke on multi-page invoices, and returned inconsistent structures.
System rewrote instructions 4 times. Generated 6 scoring rules from examples. Final instructions: 340 words, explicit about edge cases. Selected gpt-4o via raceModels (beat Claude Sonnet by 4% on this task).
You're routing production traffic through this. Here's how we make that safe.
We log requestId, function name, model, latency, token count, and score. We do not log your input data, output data, or API keys. The server is a stateless proxy — your payloads pass through and are never stored.
Every response includes a requestId for tracing. Enable "trace": true in any run to get the full prompt, model selection reasoning, and judge scores — for that request only, returned to you, not stored. Replay any request against a pinned version for debugging.
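A traced run request might look like the fragment below. "trace": true is the documented flag; the "input" field name is an assumption for illustration.

```
POST /functions/extract-invoices/run
{
  "input": { "text": "Invoice #1042, 2x Widget, Total $40.00" },
  "trace": true
}
```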
By default, modelPolicy: "auto" picks the best model for your function based on race results. Set modelPolicy: "pin" with a specific model to lock it down. Every run response includes the exact model and version used. Full reproducibility when you need it, smart routing when you don't.
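A pinned policy in the function definition might look like the fragment below. modelPolicy: "pin" and gpt-4o appear in the docs above; the companion "model" field name is an assumption.

```
{
  "modelPolicy": "pin",
  "model": "gpt-4o"
}
```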
The system asks you for 3–10 real examples before generating scoring rules. You review and approve the derived rules before they're used for optimization or release gating. The evaluator is anchored in your judgment — not AI-judges-AI in a loop.
The full platform is free with your own inference key. A managed Pro tier is on the roadmap — here's what it will include.
Rate limits surfaced through standard response headers: X-RateLimit-Remaining and X-RateLimit-Reset.
Define a contract. Let the system optimize autonomously. Release a versioned endpoint. Call it from anywhere.