Open source · npm install aifunctions-js

Deploy AI functions as versioned endpoints

Define a contract — input schema, output schema, quality threshold. The system autonomously writes, optimizes, and validates the instructions. Release it as a score-gated, versioned API endpoint. Call it from anywhere.

$ npm i aifunctions-js

Every AI function starts easy and ends messy

Without AI Functions

You write a prompt. It works. Then edge cases break it. You add retries, JSON parsing, validation. Two days later you have 200 lines of glue code for one function.

Next project, you need something similar. You copy-paste, it drifts. There's no contract, no test suite, no version history, no release gate. The prompt is buried in a codebase where nobody else can find it.

With AI Functions

Define a function contract-first: input schema, output schema, quality threshold. The system autonomously writes instructions, generates scoring rules, and optimizes until your gate is met.

Release it as a versioned, score-gated endpoint with stored test suites, pinned model selection, and rollback. Call it from any project, any language, over HTTP.

From rough idea to production endpoint

You do two things: define the contract, and release. Everything in between is autonomous.

You
1

Create

Define input/output schemas, add 3–10 real examples, set your quality threshold. Hit POST /functions and you're done.

Autonomous
2

Generate

The system writes instructions, derives scoring rules from your examples, and selects the right model tier for your cost/quality tradeoff.

Autonomous
3

Optimize

Run → judge → fix → repeat. Autonomous loop against your test suite until the score threshold is met. No hand-holding.

You
4

Release

Score gate passes. You release. Immutable version, pinned contract, pinned model. Call it over HTTP from anywhere, forever.

Contract-first. Create, call, and version over HTTP.

Every function is a REST endpoint with a typed contract. Use the npm package for Node.js, or call the API directly from any language.

Create with a contract
POST /functions
{
  "id": "extract-invoices",
  "description": "Extract line items",
  "inputSchema": {
    "type": "object",
    "properties": {
      "text": { "type": "string" }
    }
  },
  "outputSchema": {
    "type": "object",
    "properties": {
      "lines": { "type": "array" }
    }
  },
  "scoreGate": 0.85,
  "modelPolicy": "auto"
}
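As a sketch of what creating this contract from Node might look like — the local `checkContract` guard and the commented fetch call are my own illustration, not part of the documented API; only the `POST /functions` route and the contract fields come from the example above:

```javascript
// Hypothetical client-side sketch. Only the POST /functions route
// and the contract fields are taken from the docs above.
const contract = {
  id: "extract-invoices",
  description: "Extract line items",
  inputSchema: { type: "object", properties: { text: { type: "string" } } },
  outputSchema: { type: "object", properties: { lines: { type: "array" } } },
  scoreGate: 0.85,
  modelPolicy: "auto",
};

// Local sanity check before sending: both schemas present,
// scoreGate inside [0, 1]. (Illustrative helper, not library code.)
function checkContract(c) {
  return (
    typeof c.id === "string" &&
    c.inputSchema?.type === "object" &&
    c.outputSchema?.type === "object" &&
    c.scoreGate >= 0 &&
    c.scoreGate <= 1
  );
}

// Against a real deployment you would then POST it, e.g.:
// await fetch(`${BASE_URL}/functions`, {
//   method: "POST",
//   headers: { "content-type": "application/json" },
//   body: JSON.stringify(contract),
// });
```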
Call, validate, release
# Call the latest draft
POST /functions/extract-invoices/run
→ { result, usage: { tokens, model,
    latencyMs }, requestId, draft: true }

# Call a pinned release
POST /functions/extract-invoices/versions/v1/run
→ { result, usage, version: "v1" }

# Add test cases (persisted per function)
PUT  /functions/extract-invoices/test-cases

# Validate — schema + semantic scoring
POST /functions/extract-invoices:validate
→ { score: 0.93, passed: true, cases: [...] }

# Release (blocked if below scoreGate)
POST /functions/extract-invoices:release
→ { version: "v2", score: 0.93 }
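One way to keep the draft and pinned routes straight in client code — `runUrl` is a hypothetical helper of my own; only the two route shapes come from the examples above:

```javascript
// Build the run URL for a function: latest draft by default,
// or a pinned release when a version is given. (Illustrative helper.)
function runUrl(fn, version) {
  return version
    ? `/functions/${fn}/versions/${version}/run`
    : `/functions/${fn}/run`;
}
```

For example, `runUrl("extract-invoices", "v1")` yields the pinned-release route shown above, while omitting the version targets the latest draft.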

You don't write instructions. The system does.

Start with a description and examples. The system autonomously writes, tests, scores, and rewrites until your threshold is met. You approve the examples — the system handles everything else.

You

Description

+ real examples

Autonomous

Generate

instructions + rules

Autonomous

Judge

score against rules

Autonomous

Fix

rewrite & improve

You

Release

threshold met ✓

generateInstructions

Give it test cases and a description. It writes instructions from scratch, runs them, judges output, and loops autonomously until your threshold is met. Results are persisted back to the content store — no manual write step.

generateJudgeRules

Provide 3–10 real examples labeled good or bad. The system derives scoring rules autonomously. You review and approve them — grounding the evaluator in human judgment, not AI-judges-AI.

raceModels

Benchmark your function across models. Find which one performs best for your specific task — not generic benchmarks. Pin the winner via modelPolicy for reproducibility and cost control.
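The winner-picking step can be sketched as follows — the `{ model: score }` result shape is an assumption for illustration, not the documented raceModels output:

```javascript
// Pick the best-scoring model from raceModels-style results.
// The { model: score } shape is assumed for illustration.
function pickWinner(scores) {
  return Object.entries(scores).reduce((best, cur) =>
    cur[1] > best[1] ? cur : best
  )[0];
}

const race = { "gpt-4o": 0.93, "claude-sonnet": 0.89 };
```

The winner would then be pinned via `modelPolicy` so later runs stay reproducible.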

Score-gated releases with versioning and rollback

Functions go through draft → validate → release. Nothing reaches production without passing schema validation and semantic scoring against your test suite.

📝

Draft

New functions start as drafts. Callable immediately via /run for testing, but responses include "draft": true. Iterate freely — no commitments. Test cases are persisted and versioned alongside the function.

⚖️

Validate

Run :validate to check readiness. Two layers: schema validation confirms output shape, then semantic scoring judges every test case against your rules. Schema validation must pass and the semantic score must clear your scoreGate. Use it in CI: call :validate with curl and fail the build if the score is below threshold.
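The CI gate can be a few lines — the `ciExitCode` helper is my own sketch; only the `{ score, passed }` response shape comes from the :validate example above:

```javascript
// CI gate sketch: turn a :validate response into an exit code.
// The { score, passed } shape comes from the :validate example;
// the helper itself is illustrative.
function ciExitCode(validateResponse, scoreGate) {
  return validateResponse.score >= scoreGate ? 0 : 1;
}
```

In a CI step you might end with `process.exit(ciExitCode(res, 0.85))` so a sub-threshold score fails the build.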

🚀

Release

Score gate passes → immutable version tagged in git. Pinned contract, pinned instructions, pinned model. Test suite is frozen at release. Call /versions/v1/run for the pinned release. Roll back to any previous version if something regresses.
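Rollback-target selection might look like this — the `v1`, `v2`, … naming is taken from the examples above, but the helper and its logic are my own illustration:

```javascript
// Given released versions, find the rollback target: the version
// just below the current one. "v1", "v2", ... naming is assumed
// from the examples above; the helper is illustrative.
function rollbackTarget(versions, current) {
  const n = Number(current.slice(1));
  const prev = versions
    .map((v) => Number(v.slice(1)))
    .filter((x) => x < n)
    .sort((a, b) => b - a)[0];
  return prev === undefined ? null : `v${prev}`;
}
```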

A real function, start to finish

Here's what the autonomous loop actually looks like for a real function.

extract-invoice-lines

Released · v2

Before optimization

0.52 score (8 test cases)

Seed instruction: "Extract line items from this invoice." — one sentence. Missed currency fields, broke on multi-page invoices, returned inconsistent structures.

After 4 autonomous cycles

0.93 score (same 8 test cases)

System rewrote instructions 4 times. Generated 6 scoring rules from examples. Final instructions: 340 words, explicit about edge cases. Selected gpt-4o via raceModels (beat Claude Sonnet by 4% on this task).

Cycle 1 0.52 → generated initial rules
Cycle 2 0.71 → fixed currency handling
Cycle 3 0.85 → added multi-page logic
Cycle 4 0.93 → threshold met ✓
Released as v2; v1 retained for rollback
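The cycle trace above can be mocked as a simple loop. The judge is replaced by the hard-coded scores from the case study, and the 0.9 threshold is my assumption (the case study's exact gate isn't stated) so that all four cycles run:

```javascript
// Mock of the autonomous optimize loop: each "cycle" would run the
// function, judge the output, and rewrite instructions. Here the
// judge is replaced by the scores from the case study above.
function optimize(judgeScores, threshold) {
  let cycles = 0;
  let score = 0;
  for (const s of judgeScores) {
    cycles += 1;
    score = s; // in reality: run tests, judge output, rewrite
    if (score >= threshold) break;
  }
  return { cycles, score, passed: score >= threshold };
}

const result = optimize([0.52, 0.71, 0.85, 0.93], 0.9);
```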

Verifiable, debuggable, no surprises

You're routing production traffic through this. Here's how we make that safe.

What we log / don't log

We log requestId, function name, model, latency, token count, and score. We do not log your input data, output data, or API keys. The server is a stateless proxy — your payloads pass through and are never stored.

Verify in source →

Observability without surveillance

Every response includes a requestId for tracing. Enable "trace": true in any run to get the full prompt, model selection reasoning, and judge scores — for that request only, returned to you, not stored. Replay any request against a pinned version for debugging.

Model selection & pinning

By default, modelPolicy: "auto" picks the best model for your function based on race results. Set modelPolicy: "pin" with a specific model to lock it down. Every run response includes the exact model and version used. Full reproducibility when you need it, smart routing when you don't.
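A sketch of how that resolution might work — the `{ mode, model }` policy shape and `resolveModel` helper are assumptions for illustration, not the documented API:

```javascript
// Sketch of how modelPolicy might resolve to a concrete model:
// "auto" picks the race winner; "pin" locks a specific model.
// Shapes here are assumed for illustration.
function resolveModel(policy, raceResults) {
  if (policy.mode === "pin") return policy.model;
  // "auto": highest-scoring model from race results
  return Object.entries(raceResults).reduce((best, cur) =>
    cur[1] > best[1] ? cur : best
  )[0];
}

const race = { "gpt-4o": 0.93, "claude-sonnet": 0.89 };
```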

Human-anchored evaluation

The system asks you for 3–10 real examples before generating scoring rules. You review and approve the derived rules before they're used for optimization or release gating. The evaluator is anchored in your judgment — not AI-judges-AI in a loop.

Share with your team — or keep it private

Functions are backed by a git content store. You decide who gets access.

Free today. Pro when you need it.

The full platform is free with your own inference key. A managed Pro tier is on the roadmap — here's what it will include.

Free — available now
$0
Bring your own OpenRouter key. You pay inference directly. Full platform, no restrictions.
  • Full Functions API
  • Unlimited functions
  • Autonomous optimization
  • Score-gated releases
  • Versioning & rollback
  • Your own inference key
Coming later
Pro
Usage-based
We handle inference. Pay per token. No keys to manage. Plus managed infrastructure and advanced capabilities.
  • Everything in Free
  • Managed inference — no API key setup
  • Team workspaces & RBAC
  • Priority rate limits
Advanced capabilities
  • Monitoring & alerts — track score drift, latency spikes, cost anomalies per function
  • Usage dashboard — tokens, cost, and performance per function over time
  • Agentic features — chain functions, conditional routing, multi-step workflows
  • Agentic memory — persistent context across function calls for stateful workflows
Published limits (Free tier): 60 requests/min per key · 10 concurrent AI calls · 100KB max payload · 120s timeout per run. All responses include standard rate-limit headers: X-RateLimit-Remaining, X-RateLimit-Reset.
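A client-side backoff sketch using the rate-limit headers listed above. This assumes `X-RateLimit-Reset` is a Unix timestamp in seconds — verify that against the actual API before relying on it; the helper itself is my own illustration:

```javascript
// How long to wait before retrying, from the rate-limit headers
// listed above. Assumes X-RateLimit-Reset is a Unix timestamp in
// seconds (an assumption, not documented behavior).
function retryAfterMs(headers, nowSec) {
  const remaining = Number(headers["x-ratelimit-remaining"]);
  const reset = Number(headers["x-ratelimit-reset"]);
  if (remaining > 0) return 0; // budget left, no need to wait
  return Math.max(0, (reset - nowSec) * 1000);
}
```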

Stop burying prompts in codebases

Define a contract. Let the system optimize autonomously. Release a versioned endpoint. Call it from anywhere.