enterprise AI gateway

Every AI call under governance. Budgets that never blow up.

Govern people and agents in a single layer: ACID financial pre-authorization, multi-LLM routing and tamper-proof audit — between your applications and any provider (OpenAI, Anthropic, Google or self-hosted).

fail-closed by default

POST /v1/chat/completions

# gw_… key: allowed models, rate limit and IP allowlist are bound to the token
curl https://api.ai-gateway-core.com/v1/chat/completions \
  -H "Authorization: Bearer gw_live_••••" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "anthropic/claude-opus",
    "messages": [{ "role": "user", "content": "summarize the contract" }]
  }'
# 1 reserve → 2 provider → 3 settle

reserved: $0.042 · settled: $0.028 · audit #48211 · hash ✓

Multi-LLM · providers + self-hosted
ACID · reserve → settle on every call
SHA-256 · chained, verifiable audit
0 · provider keys exposed to the caller

// running AI without governance is expensive

What breaks when AI hits production

Unpredictable costs

Every team spins up its own provider key, with no limit and no visibility. The bill only shows up at the end of the month.

→

A reserve → settle cycle runs on every call. If the area has no budget left, the call is refused (HTTP 402).

Ungoverned agents

Every automation — n8n flow, agent, script — gets its own key and spends without limits. Nobody knows which agent did what, or how to cut one off without taking down the rest.

→

Each agent is a headless identity with an owner, environment, allowed models, allowed IPs and its own token. Budget and rate limit are set per agent; any single agent can be suspended or have its token rotated in isolation.

No traceability

With no reliable record of what was spent and why, audit and compliance turn into fragile manual work.

→

An append-only audit trail, chained with SHA-256 and verifiable at any time, with PII redacted before anything is persisted.

// what the gateway does on every call

What happens between your app and the provider — on every call

Routing & governance

multi-LLM · encrypted keys

Multi-LLM routing

Managed providers (OpenAI, Anthropic, Google) and self-hosted models on your own infrastructure (vLLM, Ollama or internal LLMs) sit behind a single OpenAI-compatible API. Switch providers without changing code; credentials stay encrypted server-side (AES-256) and are never revealed to the caller.

✓ Multi-type providers: managed and self-hosted in one catalog
✓ Sensitive data stays on your network with internal models
✓ Per-token allowed models, with IP allowlist
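The per-token checks above can be pictured as a small routing function. This is a minimal sketch, assuming model IDs use the `provider/model` form seen in the request example (`anthropic/claude-opus`); the `CATALOG` contents, `TokenPolicy` and `Forbidden` are hypothetical names, not the gateway's actual code. Every check refuses by raising, so a request that fails any of them never reaches a provider.

```python
import ipaddress
from dataclasses import dataclass

# Hypothetical catalog: model prefix -> provider kind (managed or self-hosted).
CATALOG = {
    "openai": "managed",
    "anthropic": "managed",
    "google": "managed",
    "vllm": "self-hosted",
    "ollama": "self-hosted",
}


class Forbidden(Exception):
    """Request refused before any provider is contacted (fail-closed)."""


@dataclass
class TokenPolicy:
    allowed_models: set   # e.g. {"anthropic/claude-opus"}
    ip_allowlist: list    # CIDR strings, e.g. ["10.0.0.0/8"]


def route(policy: TokenPolicy, model: str, client_ip: str) -> tuple:
    """Return (provider, upstream model name) or raise Forbidden."""
    provider, sep, name = model.partition("/")
    if not sep or not name or provider not in CATALOG:
        raise Forbidden(f"unknown provider in {model!r}")
    if model not in policy.allowed_models:
        raise Forbidden(f"model {model!r} not allowed for this token")
    ip = ipaddress.ip_address(client_ip)
    if not any(ip in ipaddress.ip_network(cidr) for cidr in policy.ip_allowlist):
        raise Forbidden(f"IP {client_ip} not in allowlist")
    return provider, name
```

Because the provider is chosen from the model prefix, switching from a managed model to a self-hosted one is a change to the `model` string and the catalog, not to the calling code.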

consumed by: Internal chat · Automations · Agents (n8n) · API (gw_…)
› /v1 › routed to: OpenAI · Anthropic · Google · Self-hosted
governance on every call · keys encrypted server-side

active agents · Finance area · owner: maria@

agent                  env      status
n8n-fechamento-bot     prod     active
classificador-nf       prod     active
extrator-relatorios    staging  rate-limit
crawler-precos         dev      suspended

headless identity · own gw_ token · auditable by owner

M2M identity · headless

Agent management

Headless automations — n8n flows, agents and scripts — are first-class identities, separated from people. Each agent has a responsible owner, purpose, environment, allowed models, IPs and its own rotating gw_… token — with independent budget and rate limit.

✓ M2M identities never log in: token-only auth, fail-closed
✓ Audit answers "which agents does this human own?"
✓ Suspend or rotate one agent’s token without touching the rest
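One way to model agent identities is a small registry keyed by agent name. This is a minimal sketch, assuming a hypothetical token format (`gw_` plus a random suffix) and hypothetical `Agent`/`AgentRegistry` names; it keeps tokens in plain text for brevity, whereas a production registry would more likely store only a hash of each token. Rotating or suspending one agent touches only that agent's record.

```python
import secrets
from dataclasses import dataclass, field


def new_token() -> str:
    # Hypothetical format: "gw_" prefix plus a random URL-safe suffix.
    return "gw_" + secrets.token_urlsafe(24)


@dataclass
class Agent:
    name: str
    owner: str             # the human accountable for this agent
    environment: str       # e.g. "prod", "staging", "dev"
    allowed_models: set = field(default_factory=set)
    token: str = field(default_factory=new_token)
    status: str = "active"


class AgentRegistry:
    def __init__(self):
        self.agents = {}

    def register(self, agent: Agent) -> None:
        self.agents[agent.name] = agent

    def rotate(self, name: str) -> str:
        # Only this agent's token changes; its old token stops working at once.
        self.agents[name].token = new_token()
        return self.agents[name].token

    def suspend(self, name: str) -> None:
        self.agents[name].status = "suspended"

    def owned_by(self, owner: str) -> list:
        # Answers "which agents does this human own?"
        return sorted(a.name for a in self.agents.values() if a.owner == owner)

    def authenticate(self, token: str):
        # Token-only auth, fail-closed: unknown or suspended tokens get None.
        for agent in self.agents.values():
            if secrets.compare_digest(agent.token.encode(), token.encode()):
                return agent if agent.status == "active" else None
        return None
```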

1 · reserve (worst case): $0.042
2 · provider response: 200 OK
3 · settle (real cost): $0.028
reservation freed · budget returned · audit ✓

pre-authorization · ACID

ACID financial governance

This is not a thin proxy. Before calling the model, the gateway reserves the worst-case cost against the area’s budget; once the response arrives, it debits the real cost and frees the reservation. Every step is transactional with row locks, so concurrent calls cannot spend the same budget twice. With no budget left, the call never leaves the gateway (HTTP 402).

✓ Atomic reserve, settle and rollback
✓ Provider failure → automatic rollback (502)

Cost & resilience

forecast · A/B efficiency

Cost control & FinOps

Per-call cost ceiling, per-area budget, end-of-month forecast and automatic switching to a cheaper model of equivalent quality. A semantic cache (embeddings-based) answers repeated, semantically similar requests without calling the provider again.

✓ Daily spend forecast and per-area baselines
✓ Per-token semantic cache (toggle)
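A semantic cache matches on meaning rather than exact text: each request is embedded, and a cached response is reused when a stored embedding is similar enough. This is a minimal sketch, assuming embeddings are produced elsewhere (no embedding model is shown), a hypothetical cosine-similarity threshold of 0.95, and entries scoped per token alongside the per-token toggle; the gateway's real matching rules are not described on this page.

```python
import math
from typing import Optional


def cosine(a: list, b: list) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


class SemanticCache:
    def __init__(self, threshold: float = 0.95):
        self.threshold = threshold   # hypothetical similarity cut-off
        self.enabled = {}            # token -> bool (the per-token toggle)
        self.entries = {}            # token -> [(embedding, response), ...]

    def lookup(self, token: str, embedding: list) -> Optional[str]:
        """Return the most similar cached response at or above the threshold, else None."""
        if not self.enabled.get(token, False):
            return None
        best, best_sim = None, self.threshold
        for cached_emb, response in self.entries.get(token, []):
            sim = cosine(cached_emb, embedding)
            if sim >= best_sim:
                best, best_sim = response, sim
        return best

    def store(self, token: str, embedding: list, response: str) -> None:
        if self.enabled.get(token, False):
            self.entries.setdefault(token, []).append((embedding, response))
```

On a hit the provider is never called, so no reservation is needed; a linear scan is used here only for clarity, and a real deployment would likely use a vector index.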

spend this month · Engineering area: $4,280 / $6,000 · forecast: $5.9k
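The page does not say how the end-of-month forecast is computed. The simplest version is a linear run-rate: average daily spend so far multiplied by the days in the month. This sketch assumes exactly that; `forecast_month_end` and `budget_alert` are hypothetical names, and a real forecast would likely also use the per-area baselines mentioned above.

```python
import calendar
from datetime import date


def forecast_month_end(spent_to_date: float, today: date) -> float:
    """Project month-end spend from the average daily spend so far (linear run-rate)."""
    days_in_month = calendar.monthrange(today.year, today.month)[1]
    daily_rate = spent_to_date / today.day
    return daily_rate * days_in_month


def budget_alert(spent_to_date: float, budget: float, today: date) -> bool:
    """True when the projected month-end spend exceeds the area budget."""
    return forecast_month_end(spent_to_date, today) > budget
```

For example, $3,000 spent by June 15 projects to $6,000 for the 30-day month.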

14:02 · tok_gw_a1 · claude-opus · z 0.4
14:02 · tok_gw_7c · gpt-4o · z 0.9
14:03 · tok_gw_e2 · burst x18 · z 4.6 ⚠
14:03 · tok_gw_e2 · suspended · auto-block

PII redacted before persisting · async

z-score · async logs

Observability & anomalies

All spending is traceable by area, account and token. Logs are written asynchronously (90-day retention), with PII (national IDs, email, phone, card) redacted before persisting. Z-score detection suspends anomalous tokens without blocking the response.

✓ Real-time financial and usage dashboard
✓ Automatic suspension on anomalous behavior
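A z-score measures how many standard deviations the current value sits from a token's normal behavior: z = (x − μ) / σ. This is a minimal sketch, assuming the metric is requests per minute, the baseline is a list of recent per-minute counts, and the cut-off is a hypothetical 3.0; `AnomalyGuard` is an illustrative name. In the gateway this check runs asynchronously, off the response path.

```python
import math
import statistics


def z_score(value: float, baseline: list) -> float:
    mu = statistics.mean(baseline)
    sigma = statistics.pstdev(baseline)
    if sigma == 0:
        return 0.0 if value == mu else math.inf
    return (value - mu) / sigma


class AnomalyGuard:
    def __init__(self, threshold: float = 3.0):
        self.threshold = threshold   # hypothetical cut-off, not from the product
        self.suspended = set()

    def observe(self, token: str, requests_this_minute: int, baseline: list) -> float:
        z = z_score(requests_this_minute, baseline)
        if z > self.threshold:
            self.suspended.add(token)  # later calls with this token are refused
        return z
```

With a baseline of roughly 10 requests per minute, a burst of 40 lands far above the threshold and suspends the token, while 11 is well within normal variation.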

fail-closed · rollback

Automatic fallback & resilience

A provider error never becomes a wrong charge or a corrupted response: the gateway rolls back the reservation and returns 502 predictably. Rate limit (sliding window) and IP allowlist protect every token, always fail-closed.

✓ Reservation rollback on upstream failure
✓ Per-token rate limit (req/min and req/hour)
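A sliding window counts requests in the last 60 seconds and the last 3,600 seconds at the moment of each call, instead of resetting on fixed clock boundaries. This is a minimal in-memory sketch; `SlidingWindowLimiter` is a hypothetical name, the `now` parameter exists only to make it testable, and a multi-instance gateway would need shared storage for the timestamps.

```python
import time
from collections import deque
from typing import Optional


class SlidingWindowLimiter:
    """Per-token limits over a sliding last minute and last hour."""

    def __init__(self, per_minute: int, per_hour: int):
        self.limits = ((60.0, per_minute), (3600.0, per_hour))
        self.hits = {}  # token -> deque of accepted request timestamps

    def allow(self, token: str, now: Optional[float] = None) -> bool:
        now = time.monotonic() if now is None else now
        q = self.hits.setdefault(token, deque())
        while q and now - q[0] >= 3600.0:   # drop entries older than the largest window
            q.popleft()
        for window, limit in self.limits:
            if sum(1 for t in q if now - t < window) >= limit:
                return False                # over either limit: refuse, do not count
        q.append(now)
        return True
```

Refused requests are not recorded, so a client that keeps retrying does not extend its own lockout.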

provider A ✕ 503 → rollback

retry / 502 ✓ budget returned

no charge · consistent state

Chat productivity

library · 18 prompts

prompt                  area · author      scope
Executive summary       General · admin    Public
Email {{tone}}          Marketing · you    Private
Meeting minutes         General · admin    Public
Support reply {{case}}  CX · john          Pending

{{…}} variables pre-fill the chat · scope by approval

library · {{ }} variables

Prompt library

Register reusable prompts with {{…}} variables and use them in the chat, pre-filled, with one click. Anyone can create prompts and share them as private (visible only to you) or public (the whole company, after admin approval). Search by area and author.

✓ Variables become a mini-form when used in the chat
✓ Private or public sharing, with admin moderation
✓ Ready-made starter catalog, extensible by any user
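Turning `{{…}}` variables into a mini-form comes down to two steps: extract the variable names, then substitute the user's values. This is a minimal sketch, assuming variables are word-character names inside double braces; `variables` and `render` are hypothetical helpers, not the product's template engine.

```python
import re

# {{name}} placeholders; whitespace inside the braces is tolerated.
VARIABLE = re.compile(r"\{\{\s*(\w+)\s*\}\}")


def variables(template: str) -> list:
    """Variable names in order of first appearance: the fields of the mini-form."""
    names = []
    for name in VARIABLE.findall(template):
        if name not in names:
            names.append(name)
    return names


def render(template: str, values: dict) -> str:
    """Fill every variable; refuse to render if any field is missing."""
    missing = [name for name in variables(template) if name not in values]
    if missing:
        raise KeyError(f"missing values for: {', '.join(missing)}")
    return VARIABLE.sub(lambda m: values[m.group(1)], template)
```

A variable that appears twice yields a single form field, and both occurrences receive the same value.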

chat → structure → render

Artifacts from the chat

The conversation doesn’t stop at text. With one click, the user turns what they discussed into a presentation (PPTX), document (PDF) or infographic — the model generates the structure and a server-side renderer builds the file, ready to download.

✓ Enabled only for capable models (admin gating)
✓ Infographic carrying your brand’s visual identity
✓ On-demand generation — nothing persisted without need

Infographic · Presentation .pptx · Document .pdf

Example infographics generated from a chat conversation, one per style: Clay, Bricks, Sketch, Instructional.

4 infographic styles · real product samples
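The split between "the model generates the structure" and "a server-side renderer builds the file" implies a validation step in between. This is a minimal sketch, assuming a hypothetical JSON outline shape (`format` plus a list of `sections`, each with a `title` and `bullets`); the real schema and renderer are not described on this page. Malformed output is rejected before any file is built.

```python
import json
from dataclasses import dataclass

FORMATS = {"pptx", "pdf", "infographic"}


@dataclass
class Section:
    title: str
    bullets: list


def parse_outline(raw: str):
    """Validate the model's JSON outline; return (format, sections) or raise ValueError."""
    try:
        data = json.loads(raw)
    except json.JSONDecodeError as exc:
        raise ValueError(f"not JSON: {exc}") from exc
    if not isinstance(data, dict) or data.get("format") not in FORMATS:
        raise ValueError("unknown or missing format")
    items = data.get("sections")
    if not isinstance(items, list) or not items:
        raise ValueError("missing or empty sections")
    sections = []
    for item in items:
        ok = (isinstance(item, dict)
              and isinstance(item.get("title"), str)
              and isinstance(item.get("bullets"), list)
              and all(isinstance(b, str) for b in item["bullets"]))
        if not ok:
            raise ValueError(f"malformed section: {item!r}")
        sections.append(Section(item["title"], item["bullets"]))
    return data["format"], sections
```

Only the validated `Section` list would be handed to a renderer, so the model's output never reaches the file builder unchecked.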

// drop-in, OpenAI-compatible

Change the base URL. Get governance.

The API is compatible with the OpenAI format. Point your SDK at the gateway, use a gw_… token instead of the provider key, and every call becomes authorized, metered and audited without touching the rest of your code.

Python · Node · cURL · n8n

- base_url = "https://api.openai.com/v1"

+ base_url = "https://api.ai-gateway-core.com/v1"

api_key = "gw_live_••••" # gateway token

one line changes · governance on every call
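Without any SDK, the same request can be built with the standard library. This is a minimal sketch that only constructs the request (nothing is sent); `build_chat_request` is a hypothetical helper, and the token value in the usage is a placeholder. The body follows the OpenAI chat-completions format, with the gateway's base URL in place of the provider's.

```python
import json
import urllib.request

GATEWAY_BASE_URL = "https://api.ai-gateway-core.com/v1"


def build_chat_request(token: str, model: str, messages: list,
                       base_url: str = GATEWAY_BASE_URL) -> urllib.request.Request:
    """Build an OpenAI-format chat request aimed at the gateway instead of the provider."""
    body = json.dumps({"model": model, "messages": messages}).encode("utf-8")
    return urllib.request.Request(
        f"{base_url}/chat/completions",
        data=body,
        headers={"Authorization": f"Bearer {token}",   # gw_… token, not a provider key
                 "Content-Type": "application/json"},
        method="POST",
    )

# Sending it would be urllib.request.urlopen(req); omitted here on purpose.
```

With the official OpenAI Python SDK, the equivalent change is passing `base_url` and `api_key` to the `OpenAI(...)` client constructor.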

// security and compliance

Designed for audit, not retrofitted later

Encrypted credentials

Provider keys are encrypted with AES-256. The master key lives in an environment variable, never in the database. A key can be revealed only by an admin, once, after explicit confirmation.

Tamper-proof audit

The audit trail is append-only and chained with SHA-256. Deletes are blocked at the database level, and the chain can be verified at any time.
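In a hash chain, each entry's hash covers the previous entry's hash plus its own content, so changing any past record invalidates every hash after it. This is a minimal sketch using `hashlib` and canonical JSON; the record fields and the all-zeros genesis value are assumptions. Note that chaining alone does not reveal a deleted final entry, which is one reason to also block deletes in the database.

```python
import hashlib
import json

GENESIS = "0" * 64  # assumed previous-hash value for the first entry


def entry_hash(prev_hash: str, record: dict) -> str:
    # Canonical JSON so the same record always hashes the same way.
    payload = json.dumps(record, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256((prev_hash + payload).encode("utf-8")).hexdigest()


def append(chain: list, record: dict) -> None:
    prev = chain[-1]["hash"] if chain else GENESIS
    chain.append({"record": record, "prev": prev, "hash": entry_hash(prev, record)})


def verify(chain: list) -> bool:
    """Recompute every link; editing, reordering or deleting a non-final entry breaks it."""
    prev = GENESIS
    for entry in chain:
        if entry["prev"] != prev or entry["hash"] != entry_hash(prev, entry["record"]):
            return False
        prev = entry["hash"]
    return True
```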

PII redacted both ways

Personal data (national IDs, email, phone, card) is redacted before persisting and before sending to the provider — closing off inbound and outbound leaks.
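Redaction of this kind is typically pattern-based, applied both to the prompt before it is forwarded and to what gets written to logs. This is a minimal sketch with deliberately simplified regular expressions; the national-ID pattern assumes one common format (Brazil's CPF, `000.000.000-00`), and the product's actual detectors are not described here. Order matters: IDs and card numbers are replaced before the broader phone pattern can swallow them.

```python
import re

# Simplified example patterns. The national-ID pattern assumes a
# CPF-style format (000.000.000-00); other countries need other patterns.
PATTERNS = [
    ("EMAIL", re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")),
    ("NATIONAL_ID", re.compile(r"\b\d{3}\.\d{3}\.\d{3}-\d{2}\b")),
    ("CARD", re.compile(r"\b\d(?:[ -]?\d){12,15}\b")),
    ("PHONE", re.compile(r"\+?\d[\d\s().-]{7,}\d")),
]


def redact(text: str) -> str:
    """Replace PII with typed placeholders, applying the patterns in order."""
    for label, pattern in PATTERNS:
        text = pattern.sub(f"[{label}]", text)
    return text
```

The phone pattern is intentionally broad (it will also catch things like long dates), trading some over-redaction for fewer leaks.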

Fail-closed by default

Any uncertainty about quota, authentication or state results in a refusal. Rate limit, IP allowlist and anomaly suspension protect every token.

Enterprise SSO

Login via Entra ID / Microsoft 365, Okta or Google Workspace, with account provisioning on first access and role-based profiles.

Deploy in your environment

Docker Compose stack as the production artifact: on-premise or in your cloud. Datastores on an isolated network, daily backup with configurable retention.

// frequently asked questions

Frequently asked questions

Am I locked into a vendor?

No. The API is OpenAI-compatible and routes to OpenAI, Anthropic, Google or self-hosted models (vLLM, Ollama). Switching providers is configuration, not a code rewrite.

Does the gateway add latency?

Overhead is small: observability runs in the background (it never blocks the response), and a semantic-cache hit is answered without a round trip to the provider. Reserve and settle are millisecond-scale database operations.

Can I run it on-premise or in my own cloud?

Yes. The production artifact is a Docker Compose stack — on-premise or in your cloud, with datastores on an isolated network, daily backup and assisted zero-downtime migration.

What about data protection and sensitive data?

PII (national IDs, email, phone, card) is redacted before persisting and before sending to the provider. With self-hosted models, data never leaves your network. Every operation lands in a verifiable audit trail.

How much effort to integrate?

Change your SDK base URL and use a gw_ token instead of the provider key. Works with Python, Node, cURL and automations like n8n — without changing your application.

Does it work for automations and agents?

Yes. Each automation (n8n flow, agent, script) is a headless identity with its own owner, budget, models and token — governed and auditable individually.

Put your company’s AI under governance.

Book a demo and watch reservation, settlement and audit happening on a real call — in your environment.
