Vorana speaks OpenAI's wire format and adds reliability, cost control, and audit on top. Five minutes from API key to first call. Stay in your stack — we won't make you switch.
For local dev, use vorana-dev-key against the gateway on localhost:5057. Production keys are minted from the admin portal per tenant.
The .NET SDK adds typed requests, options, and Entra OIDC auth. Or skip it — the API is plain HTTP and works from any language.
Pick a pipeline, send your inputs, get a response with confidence, citations, and a run_id you can audit later.
# Local dev — gateway runs on :5057, default key is `vorana-dev-key`
curl -X POST http://localhost:5057/v1/generate \
-H "X-Api-Key: vorana-dev-key" \
-H "Content-Type: application/json" \
-d '{
"pipeline_id": "pipeline.faq_assistant.v1",
"inputs": { "user_query": "What is our refund policy?" },
"options": { "mode": "balanced" }
}'
using Vorana.Client;
using Vorana.Shared;
// In Program.cs / Startup
services.AddVoranaClient(opts =>
{
opts.BaseAddress = new Uri("http://localhost:5057");
opts.ApiKey = "vorana-dev-key";
opts.DefaultPipelineId = "pipeline.faq_assistant.v1";
});
// Anywhere IVoranaClient is injected
var resp = await client.GenerateAsync(new GenerateRequest
{
Inputs = new() { ["user_query"] = "What is our refund policy?" }
});
Console.WriteLine(resp.Output["content"]);
Console.WriteLine($"run_id={resp.RunId}, confidence={resp.Confidence}");
const res = await fetch('https://gateway.vorana.ai/v1/generate', {
method: 'POST',
headers: {
'X-Api-Key': process.env.VORANA_API_KEY!,
'Content-Type': 'application/json',
},
body: JSON.stringify({
pipeline_id: 'pipeline.faq_assistant.v1',
inputs: { user_query: "What is our refund policy?" },
}),
});
const data = await res.json();
console.log(data.output.content, data.confidence, data.run_id);
import httpx, os
resp = httpx.post(
"https://gateway.vorana.ai/v1/generate",
headers={"X-Api-Key": os.environ["VORANA_API_KEY"]},
json={
"pipeline_id": "pipeline.faq_assistant.v1",
"inputs": {"user_query": "What is our refund policy?"},
},
).json()
print(resp["output"]["content"], resp["confidence"], resp["run_id"])
Send X-Api-Key: <key> on every request. The key maps to a tenant_id + app_id.
Optionally override the tenant with X-Tenant-Id.
X-Api-Key: vorana-dev-key
X-Tenant-Id: tenant_acme // optional
Content-Type: application/json
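As a minimal sketch, the header set above can be assembled in Python as follows. `build_headers` is a hypothetical helper, not part of any Vorana SDK; it assumes only the three header names shown above.

```python
from typing import Optional


def build_headers(api_key: str, tenant_id: Optional[str] = None) -> dict[str, str]:
    """Return the headers for an API-key request (hypothetical helper)."""
    if not api_key:
        raise ValueError("api_key is required on every request")
    headers = {
        "X-Api-Key": api_key,
        "Content-Type": "application/json",
    }
    # X-Tenant-Id is optional: send it only to override the key's tenant.
    if tenant_id is not None:
        headers["X-Tenant-Id"] = tenant_id
    return headers


default_headers = build_headers("vorana-dev-key")
override_headers = build_headers("vorana-dev-key", tenant_id="tenant_acme")
```

Leaving `X-Tenant-Id` out unless it is explicitly requested keeps the default behaviour (the tenant mapped to the key) intact.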
When Auth:Entra:Authority is set on the gateway, send a bearer token instead.
The .NET SDK fetches and refreshes tokens automatically via TokenCredential.
services.AddVoranaClient(opts =>
{
opts.BaseAddress = new Uri("https://gateway.vorana.ai");
opts.Credential = new DefaultAzureCredential();
opts.Scope = "api://vorana/.default";
});
POST /v1/generate takes a pipeline_id and your inputs.
Vorana runs validation, retrieval, generation, scoring, and policy — then returns
the answer along with the decisions, citations, metrics, and a run_id you can replay.
{
"pipeline_id": "pipeline.faq_assistant.v1",
"policy_id": null,
"inputs": {
"user_query": "What's our refund window for purchases over $500?"
},
"options": {
"mode": "balanced", // fast | balanced | strict
"max_tokens": 800,
"trace": true
}
}
{
"run_id": "r_8f3a2c…",
"trace_id": "01J5HQ…",
"status": "ok", // ok | low_confidence | denied | failed
"output": {
"content": "Refunds within 30 days, full…",
"confidence": 0.92
},
"confidence": 0.92,
"decisions": [
{ "step": "validation.input", "rule_or_policy": "schema", "result": "pass" },
{ "step": "llm.consensus", "rule_or_policy": ">= 0.85", "result": "pass", "reason": "0.92" },
{ "step": "scoring.composite","rule_or_policy": "weighted", "result": "pass" }
],
"citations": [
{ "source": "policy/refunds.pdf", "section": "§3.2" }
],
"metrics": {
"latency_ms": 1500,
"prompt_tokens": 280,
"completion_tokens": 120,
"cost_usd": 0.0142
},
"obligations": []
}
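One way a caller might branch on the four `status` values is sketched below. The field names come from the response above. What to do in each branch, and the idea that a non-passing decision carries a `result` other than `"pass"`, are assumptions made for illustration.

```python
from typing import Any


def summarize(response: dict[str, Any]) -> str:
    """Turn a generate response into a one-line outcome (illustrative only)."""
    status = response["status"]
    run_id = response["run_id"]
    if status == "ok":
        sources = ", ".join(c["source"] for c in response.get("citations", []))
        return f"answer ({response['confidence']:.2f}) from [{sources}] run_id={run_id}"
    if status == "low_confidence":
        # Assumption: the caller routes low-confidence answers to human review.
        return f"needs review ({response['confidence']:.2f}) run_id={run_id}"
    if status == "denied":
        # Assumption: decisions that did not pass have a result other than "pass".
        failed = [d["step"] for d in response.get("decisions", []) if d["result"] != "pass"]
        return f"denied by {failed} run_id={run_id}"
    return f"failed run_id={run_id}"


sample = {
    "run_id": "r_example",  # made-up id for the sketch
    "status": "ok",
    "confidence": 0.92,
    "decisions": [{"step": "validation.input", "rule_or_policy": "schema", "result": "pass"}],
    "citations": [{"source": "policy/refunds.pdf", "section": "§3.2"}],
}
print(summarize(sample))
```

Every branch keeps the `run_id` in its result so that any outcome can be looked up later.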
POST /v1/generate:stream emits server-sent events. Each chunk is a JSON
object: token deltas first, then a final done chunk with
run_id, decisions, citations, and confidence.
curl -N -X POST http://localhost:5057/v1/generate:stream \
-H "X-Api-Key: vorana-dev-key" \
-H "Content-Type: application/json" \
-d '{ "pipeline_id": "pipeline.faq_assistant.v1",
"inputs": { "user_query": "Refund window over $500?" } }'
# Output (server-sent events)
data: {"event":"token","content":"Refunds "}
data: {"event":"token","content":"within "}
data: {"event":"token","content":"30 days…"}
data: {"event":"done","run_id":"r_8f3a2c…","confidence":0.92}
// Browser SSE — Vorana also accepts query-string auth for EventSource
const es = new EventSource(
'/v1/generate:stream?api_key=vorana-dev-key' +
'&pipeline_id=pipeline.faq_assistant.v1' +
'&inputs.user_query=Refund%20window%3F'
);
es.onmessage = (e) => {
const chunk = JSON.parse(e.data);
if (chunk.event === 'token') appendToUi(chunk.content);
if (chunk.event === 'done') { saveRunId(chunk.run_id); es.close(); }
};
import httpx, json, os
with httpx.stream(
    "POST", "https://gateway.vorana.ai/v1/generate:stream",
    headers={"X-Api-Key": os.environ["VORANA_API_KEY"]},
    json={"pipeline_id": "pipeline.faq_assistant.v1",
          "inputs": {"user_query": "Refund window?"}},
) as r:
    for line in r.iter_lines():
        if not line.startswith("data:"): continue
        chunk = json.loads(line[5:])
        if chunk["event"] == "token":
            print(chunk["content"], end="", flush=True)
        elif chunk["event"] == "done":
            print(f"\nrun_id={chunk['run_id']}")
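The chunk handling can be separated from the network call, which makes it easy to exercise offline. The sketch below assumes only the `token` and `done` event shapes shown in the sample output; `collect_stream` is a hypothetical helper name.

```python
import json
from typing import Iterable, Optional


def collect_stream(lines: Iterable[str]) -> tuple[str, Optional[dict]]:
    """Join token deltas into the full answer and return the final `done` chunk."""
    parts: list[str] = []
    done: Optional[dict] = None
    for line in lines:
        # SSE payload lines start with "data:"; skip blank lines and other fields.
        if not line.startswith("data:"):
            continue
        chunk = json.loads(line[len("data:"):])
        if chunk.get("event") == "token":
            parts.append(chunk["content"])
        elif chunk.get("event") == "done":
            done = chunk
            break  # the done chunk is the last one in the stream
    return "".join(parts), done


sample_lines = [
    'data: {"event":"token","content":"Refunds "}',
    'data: {"event":"token","content":"within "}',
    'data: {"event":"token","content":"30 days"}',
    "",
    'data: {"event":"done","run_id":"r_example","confidence":0.92}',
]
text, final = collect_stream(sample_lines)
```

Any iterable of lines works as input, including `r.iter_lines()` from an `httpx` streaming response.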
POST /v1/chat/completions matches OpenAI's wire format. Set the
model to vorana:<pipeline_id> (or send
X-Vorana-Pipeline), point your base_url at the gateway, and
you're done. Streaming is already wired — SSE chunks come back in OpenAI's shape,
plus a final vorana.completion.metadata chunk with the trust-layer fields.
import os
from openai import OpenAI
client = OpenAI(
    base_url = "https://gateway.vorana.ai/v1",      # <-- point at the gateway
    api_key  = os.environ["VORANA_API_KEY"],        # tenant key, not the local dev key
)
resp = client.chat.completions.create(
    model    = "vorana:pipeline.faq_assistant.v1",  # <-- pick a pipeline
    messages = [{"role": "user", "content": "What's our refund policy?"}],
    stream   = True,
)
for chunk in resp:
    # Guard against chunks with no choices, which the final metadata chunk may be.
    if chunk.choices:
        print(chunk.choices[0].delta.content or "", end="", flush=True)
import OpenAI from 'openai';
const client = new OpenAI({
baseURL: 'https://gateway.vorana.ai/v1',
apiKey: process.env.VORANA_API_KEY!,
});
const stream = await client.chat.completions.create({
model: 'vorana:pipeline.faq_assistant.v1',
messages: [{ role: 'user', content: "What's our refund policy?" }],
stream: true,
});
for await (const chunk of stream) {
process.stdout.write(chunk.choices[0]?.delta?.content ?? '');
}
curl -N https://gateway.vorana.ai/v1/chat/completions \
-H "Authorization: Bearer $VORANA_API_KEY" \
-H "Content-Type: application/json" \
-d '{
"model": "vorana:pipeline.faq_assistant.v1",
"messages": [{ "role": "user", "content": "Refund policy?" }],
"stream": true
}'
Each pipeline is a DAG of steps. Every step carries a typed config block (with:),
plus optional when: guards, retries, and fallbacks. Drop a YAML file into
src/Vorana.Gateway/pipelines/ (or load from blob storage in production), and
reference it by id from your app.
id: pipeline.faq_assistant.v1
version: 1
steps:
  - id: validate
    type: validation.composite
    with:
      rules:
        - { field: inputs.user_query, required: true, max_length: 2000 }
  - id: retrieve
    type: retrieval.hybrid
    with:
      index: refund_kb
      top_k: 5
      min_score: 0.4
  - id: answer
    type: llm.consensus
    with:
      providers: [azure_openai, anthropic]
      threshold: 0.85
    on_error: fallback
    fallback: answer_strong
  - id: answer_strong          # only runs if `answer` failed
    type: llm.execute
    with: { provider: azure_openai, model: gpt-4o }
    when: "exists intermediate.answer.error"
  - id: score
    type: scoring.composite
    with:
      consensus_weight: 0.4
      judge_weight: 0.4
      cache_weight: 0.2
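The three weights in the `score` step suggest a weighted combination of per-signal scores. The source does not state the exact formula, so the sketch below assumes a plain weighted average; the signal values are made up for illustration.

```python
def composite_score(signals: dict[str, float], weights: dict[str, float]) -> float:
    """Weighted average of per-signal scores (assumed formula, not confirmed)."""
    if set(signals) != set(weights):
        raise ValueError("signals and weights must name the same components")
    total_weight = sum(weights.values())
    if total_weight <= 0:
        raise ValueError("weights must sum to a positive number")
    # Dividing by the total keeps the result in [0, 1] even if weights do not sum to 1.
    return sum(signals[name] * weights[name] for name in weights) / total_weight


weights = {"consensus": 0.4, "judge": 0.4, "cache": 0.2}    # from the `score` step
signals = {"consensus": 0.92, "judge": 0.85, "cache": 1.0}  # made-up inputs
score = composite_score(signals, weights)
print(f"composite={score:.3f}")
```

With these inputs the arithmetic is 0.4 × 0.92 + 0.4 × 0.85 + 0.2 × 1.0 = 0.908.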
A Skill is a versioned, signed package of capability — a step plugin,
a sub-pipeline, a prompt template, or all three. Publish to your org registry once;
every team's pipelines reference it by skill.<org>.<name>.<ver>. Update
it; the next deploy picks up the fix everywhere.
Skills follow semver. Pinned references (v2.1.0) never change without a redeploy.
Floating references (~v2.1) auto-pick the latest patch. The registry tracks a
used-by graph so removing or breaking a skill flags every dependent pipeline.
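The difference between pinned and floating references can be sketched as a small resolver. This is an illustration of the rule described above, not the registry's actual implementation; `parse` and `resolve` are hypothetical names, and the list of published versions is invented.

```python
from typing import Optional


def parse(version: str) -> tuple[int, ...]:
    """'v2.1.0' or '2.1.0' -> (2, 1, 0)."""
    return tuple(int(part) for part in version.lstrip("v").split("."))


def resolve(reference: str, published: list[str]) -> Optional[str]:
    """Pick the published version a reference points at (illustrative rules).

    Pinned ('2.1.0'): exact match only.
    Floating ('~2.1'): highest patch within the same major.minor.
    """
    if reference.startswith("~"):
        prefix = parse(reference[1:])[:2]
        candidates = [v for v in published if parse(v)[:2] == prefix]
        return max(candidates, key=parse) if candidates else None
    wanted = parse(reference)
    return next((v for v in published if parse(v) == wanted), None)


published = ["2.0.4", "2.1.0", "2.1.3", "2.1.10", "2.2.0"]  # invented registry contents
pinned = resolve("2.1.0", published)
floating = resolve("~2.1", published)
```

Versions are compared as integer tuples rather than strings, so `2.1.10` correctly ranks above `2.1.3`.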
# skills/redact_customer_pii/skill.yaml — the manifest committed to your repo
id: skill.acme.security.redact_customer_pii
version: 2.1.0
owner: "@acme/security"
description: Mask names, SSN, email, card. EU-GDPR + US-PII profiles.
entrypoint:
  type: step.composite        # or step.grpc, step.wasm, pipeline.fragment
  steps:
    - { id: detect, type: validation.composite, with: { profile: pii_us_eu } }
    - { id: mask,   type: redaction.tokenize,   with: { strategy: replace_with_tag } }
inputs:  { prompt: string }
outputs: { prompt: string, masked_count: integer }
signing:
  required: true              # reject unsigned skills at deploy
  trust_chain: acme-root
# From your skill repo — package, sign, push to the org registry
$ vorana skill build ./redact_customer_pii
→ bundle: dist/redact_customer_pii-2.1.0.tgz (12 KiB)
$ vorana skill sign ./dist/redact_customer_pii-2.1.0.tgz \
--key acme-root.pem
→ signature OK
$ vorana skill publish ./dist/redact_customer_pii-2.1.0.tgz \
--org acme
→ published skill.acme.security.redact_customer_pii@2.1.0
→ downstream pipelines: 14 (use --check to preview impact)
# Reference a published skill from any pipeline in your org
id: pipeline.support_copilot.v3
steps:
  - id: redact
    type: skill.acme.security.redact_customer_pii   # <-- here
    version: 2.1.0                                  # pinned (recommended for prod)
  - id: answer
    type: llm.execute
    with: { provider: azure_openai, model: gpt-4o }
  - id: judge
    type: skill.acme.legal.judge_compliance         # another shared skill
    version: "~1.4"                                 # floating: auto-track patches
# Inspect what's available in your org and where each skill is used
$ vorana skill list --org acme
NAME                 VERSION  OWNER           USED BY
redact_customer_pii  2.1.0    @acme/security  14 pipelines
judge_compliance     1.4.2    @acme/legal     8 pipelines
retrieve_policy_kb   3.0.1    @acme/platform  22 pipelines
summarize_call_note  0.9.0    @acme/cx        6 pipelines
$ vorana skill describe redact_customer_pii
→ v2.1.0 signed by @acme/security published 2026-04-02
→ used by pipelines:
customer-chat-prod, support-copilot, claims-intake, advisor-agent,
fraud-triage, broker-assistant, ... (8 more)
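A used-by graph of this kind is a reverse index from skill to pipelines. The sketch below builds one from hypothetical pipeline definitions; the names are borrowed from the output above, but which pipeline uses which skill is invented. It is not how the registry stores its data.

```python
from collections import defaultdict


def build_used_by(pipelines: dict[str, list[str]]) -> dict[str, set[str]]:
    """Invert pipeline -> skills into skill -> pipelines."""
    used_by: dict[str, set[str]] = defaultdict(set)
    for pipeline, skills in pipelines.items():
        for skill in skills:
            used_by[skill].add(pipeline)
    return dict(used_by)


# Hypothetical data: the skill assignments here are made up.
pipelines = {
    "customer-chat-prod": ["redact_customer_pii", "retrieve_policy_kb"],
    "support-copilot": ["redact_customer_pii", "judge_compliance"],
    "claims-intake": ["retrieve_policy_kb"],
}
used_by = build_used_by(pipelines)

# Pipelines to flag before removing or breaking a skill.
affected = sorted(used_by.get("redact_customer_pii", set()))
```

Looking a skill up in the inverted map yields every dependent pipeline in one step, which is what an impact preview needs.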
Look up a run by run_id, replay it against the current pipeline + policy
bundle, or search across all runs in a tenant. Audit blobs are CMK-encrypted in
production and respect retention policies you set per tenant.
# Pull the full run record — inputs, decisions, citations, output, metrics
curl -H "X-Api-Key: vorana-dev-key" \
http://localhost:5057/v1/runs/r_8f3a2c…
# Response shape mirrors GenerateResponse, plus original `inputs` and timestamps
# Re-execute against today's pipeline + policy. Useful for regression tests,
# A/B'ing prompt changes, or verifying a fix to a low-confidence run.
curl -X POST -H "X-Api-Key: vorana-dev-key" \
http://localhost:5057/v1/replay/r_8f3a2c…
# Optional body: { "pipeline_id": "...override...", "policy_id": "..." }
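For regression testing, a replayed run can be compared with the original. The sketch below assumes the replay response has the same shape as the generate response, which the source does not confirm. `compare_runs` and the 0.05 tolerance are hypothetical choices.

```python
from typing import Any


def compare_runs(original: dict[str, Any], replayed: dict[str, Any],
                 tolerance: float = 0.05) -> list[str]:
    """List the differences between a run and its replay (illustrative check)."""
    findings: list[str] = []
    if original["status"] != replayed["status"]:
        findings.append(f"status: {original['status']} -> {replayed['status']}")
    drop = original["confidence"] - replayed["confidence"]
    if drop > tolerance:
        findings.append(f"confidence dropped by {drop:.2f}")
    before = {c["source"] for c in original.get("citations", [])}
    after = {c["source"] for c in replayed.get("citations", [])}
    if before != after:
        findings.append(f"citations changed: {sorted(before)} -> {sorted(after)}")
    return findings


# Made-up run records for the sketch.
original = {"status": "ok", "confidence": 0.92,
            "citations": [{"source": "policy/refunds.pdf"}]}
replayed = {"status": "ok", "confidence": 0.80,
            "citations": [{"source": "policy/refunds.pdf"}]}
findings = compare_runs(original, replayed)
```

An empty list means the replay matched the original within the tolerance; anything else is a candidate regression to inspect by `run_id`.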
# Search by tenant, status, score, or text. Indexed via Postgres in production.
curl -H "X-Api-Key: vorana-dev-key" \
"http://localhost:5057/v1/audit/search?tenant=tenant_acme&status=low_confidence&from=2026-04-01&q=refund"
# → { "runs": [ { "run_id": "...", "status": "low_confidence", "confidence": 0.74, ... } ], "next": "..." }
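The `next` field in the search response suggests cursor-style pagination. The source does not say how the cursor is passed back, so the sketch below leaves the HTTP call to a caller-supplied function and shows only the loop. `iter_runs` and `fetch_page` are hypothetical names.

```python
from typing import Any, Callable, Iterator, Optional

Page = dict[str, Any]


def iter_runs(fetch_page: Callable[[Optional[str]], Page]) -> Iterator[dict[str, Any]]:
    """Yield every run, following `next` until the server stops returning one."""
    cursor: Optional[str] = None
    while True:
        page = fetch_page(cursor)
        yield from page.get("runs", [])
        cursor = page.get("next")
        if not cursor:
            break


# Stand-in for the HTTP call so the sketch runs offline.
fake_pages = {
    None: {"runs": [{"run_id": "r_1"}, {"run_id": "r_2"}], "next": "page2"},
    "page2": {"runs": [{"run_id": "r_3"}], "next": None},
}
run_ids = [run["run_id"] for run in iter_runs(lambda cursor: fake_pages[cursor])]
```

In real use, `fetch_page` would issue the search request with whatever cursor parameter the gateway expects and return the parsed JSON body.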