→ Your finance team modelling next quarter, two weeks before the call.
None of them are pasting that into ChatGPT. So we built the other thing.
feder8d runs the same chat, the same RAG, the same OpenAI-compatible API your
engineers already know — but the prompt never leaves a boundary your CISO signed
off on. Open weights only. Your region. Your encryption keys. A signed audit log
that ends where your data ends.
No card required for trial · Card never stored by us · Source available to your auditor
[ No OpenAI in the prompt path ][ Open-weight models, audited weights ][ Your KMS, your region ][ POPIA · GDPR · SOC 2 in progress ]
[What's wrong with public cloud AI?]
Public cloud AI APIs are excellent — for problems where the prompt isn't a liability.
For anything regulated, confidential, or competitively sensitive, every request creates
five problems your security team will eventually find.
[01]
Your prompts are logged
Retention windows are at the provider's discretion. Subpoenas, breaches, and policy changes are out of your control.
[02]
Your data may train future models
Opt-out flags exist but are silently scoped. One misconfigured API key and your customer corpus is in the next training run.
[03]
Cross-jurisdiction without disclosure
Your prompt to a US endpoint sits in US storage. Good luck explaining that to your DPA, POPIA Information Officer, or a German DPO.
[04]
Audit trail ends at their firewall
You can't prove what happened to a prompt after it left your network. Compliance auditors don't accept 'they said they don't log it.'
[05]
Vendor model risk
Pricing, capability, and policy change with one blog post. Your roadmap becomes their roadmap. Switching costs grow every month.
[Where the bytes go]
Three boxes, two arrows. The orchestrator sits in our control plane. Your namespace
holds everything that's yours. The inference layer is stateless — it never sees a
prompt across a boundary, and there is no closed third-party API behind it unless
you opt in per collection.
Inference is stateless: no prompt or completion is persisted on the model side. Audit logs store hashes only.
Cross-boundary KV cache reuse is OFF and locked in code. Flipping it would require a full threat-model review first.
Your identity is derived from your pod's environment at boot, never read from the wire. No code path accepts it as request input.
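A minimal sketch of that last rule, assuming a hypothetical environment variable `FEDER8D_BOUNDARY_ID` and a hypothetical request handler; neither name comes from the feder8d codebase. The boundary is resolved once at process start, and nothing a caller sends is consulted for identity.

```python
import os
from dataclasses import dataclass


@dataclass(frozen=True)
class BoundaryIdentity:
    """Identity resolved once at boot; frozen so handlers cannot mutate it."""
    boundary_id: str


def load_identity(environ=os.environ) -> BoundaryIdentity:
    # FEDER8D_BOUNDARY_ID is a hypothetical variable name used for illustration.
    boundary_id = environ.get("FEDER8D_BOUNDARY_ID", "").strip()
    if not boundary_id:
        # Fail closed: a pod without an identity must not serve traffic.
        raise RuntimeError("FEDER8D_BOUNDARY_ID is not set; refusing to start")
    return BoundaryIdentity(boundary_id)


def handle_request(identity: BoundaryIdentity, payload: dict) -> dict:
    # The payload is never consulted for identity, even if a caller sends a
    # "tenant_id" or "boundary_id" field; only the boot-time value is used.
    return {
        "boundary_id": identity.boundary_id,
        "question": payload.get("question"),
    }
```

A request carrying `{"tenant_id": "t_globex"}` into a pod booted as `t_acme` is still handled as `t_acme`, because the field is never read.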
[Your data, in five separate boundaries]
Belt and braces, then a second belt. Every layer below is something a competitor might be
relying on alone. We make all five mandatory — so a bug in any one of them still leaves
four more between your data and anyone else's.
Dedicated schema. Dedicated DB role. Connection pool scoped to your boundary. Row-level security on top.
[ Layer 03 ]
Vector store
Physically separate Qdrant collection. Every query filters by your boundary ID as defence-in-depth.
[ Layer 04 ]
Cache
Logical Redis DB scoped to you. Cache keys always begin with your boundary ID. No cross-boundary collisions possible.
[ Layer 05 ]
Object store
Dedicated KMS key. Dedicated S3 prefix. IAM scoped to that prefix. Envelope encryption on PI.
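The cache and vector-store rules above can be sketched as follows. This is an illustration under assumptions, not feder8d's implementation: the function names and the `boundary_id` payload field are hypothetical, and the filter dictionary simply follows the JSON shape Qdrant accepts for a `must` / `match` condition.

```python
import hashlib


def cache_key(boundary_id: str, namespace: str, raw_key: str) -> str:
    """Build a cache key that always starts with the boundary ID."""
    if not boundary_id or ":" in boundary_id:
        raise ValueError("boundary_id must be non-empty and contain no ':'")
    # Hash the caller-supplied part so it cannot smuggle in another prefix.
    digest = hashlib.sha256(raw_key.encode("utf-8")).hexdigest()
    return f"{boundary_id}:{namespace}:{digest}"


def scoped_vector_filter(boundary_id: str) -> dict:
    """Defence-in-depth filter to attach to every vector query.

    "boundary_id" as a payload field name is an assumption for this sketch.
    """
    return {"must": [{"key": "boundary_id", "match": {"value": boundary_id}}]}
```

Because the prefix is added by the helper rather than by the caller, two boundaries asking the same question still land on different keys.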
[What it looks like for your team]
Three real workloads we built the platform for. None of them work on a public
cloud AI API without your security team finding out and clawing it back.
[ 01 / Legal ]
Outside counsel · M&A
"Summarise every change between draft 7 and draft 9 of this settlement, and flag
anything that affects the indemnity cap."
The draft is privileged. The deal is unannounced. A leak via training data is a
firm-ending event. The associates do this manually today — eight hours per pair
of drafts — because Microsoft Copilot's terms still mention "service improvement."
▸ Boundary: firm tenant on ZA/EU residency
▸ Model: Mistral-Small 24B self-hosted
▸ Retention: 30 days, Special-PI flag
[ 02 / Healthcare ]
Radiology group · 14 sites
"Draft the impression for this chest CT in our house style, referencing the
two prior studies."
The CT contains a date of birth, a hospital MRN, and a tumour. POPIA §26 classes
health data as special personal information, and §72 restricts sending it abroad
without consent. Their existing
speech-to-text vendor was sued in 2024 over exactly this. They want AI; they
cannot have it via OpenAI.
▸ Boundary: hospital VPC, on-prem GPUs
▸ Model: Qwen 2.5 7B + BGE-large
▸ Audit: per-MRN access log, 7-year retention
[ 03 / Financial services ]
Asset manager · pre-announcement
"Compare our Q3 numbers against last year's tone. Highlight anything an analyst
would push back on."
The numbers are MNPI for the next eleven days. The model never sees an external
endpoint. The audit log proves to compliance that nobody asked it anything
between the close and the call.
▸ Boundary: firm tenant, MFA-only access
▸ Model: Mistral-Small 24B + BGE rerank
▸ Compliance: SOC 2 Type II observation report
Composite scenarios, drawn from buyer conversations. Identifying details changed.
If you recognised your team in any of these, we should talk —
hello@feder8d.ai.
[Built different on purpose]
Most AI infrastructure decisions are downstream of "how do we get this shipped fast."
Ours are downstream of "what would the security team find on review." Five places we
do not compromise:
01
No third-party AI provider in the default prompt path. Ever.
Not "we anonymise first." Not "we have a DPA with them." Not "they say they
don't log." If the request leaves your boundary, our roadmap says you opted
into it per collection, with a console banner, and we wrote that audit-log
entry on the way out. Default-off. Compile-time guarded.
02
Open weights, audited licences, named provenance.
Every model in the registry has a published weights hash, a real SPDX licence,
and a known training corpus. No "trust us, it's safe." We document where the
weights came from and your auditor can verify the hash against Hugging Face on
the way in. Closed-weight foundation models stay off the platform.
03
The tenant boundary is physical, not logical.
Schema-per-tenant in Postgres. Dedicated Qdrant collection. Dedicated KMS key.
Namespace-scoped NetworkPolicy. The operator never accepts a tenant_id from the
wire. A bug in our application code can corrupt your data — it cannot leak it
into someone else's.
04
The audit log is the source of truth.
Every model call, every retrieval, every config change writes an entry signed
with the tenant's KMS key. The log lives in your tenant's Postgres — not ours,
not somebody's central observability bus. Your DPO can read it. Your auditor
can subpoena it. We can't redact it for you.
05
If we ever betray the principle, you can fork us.
The whole stack is published source-available under AGPL-3.0. You can read
every commit. You can audit every dependency. If our incentives ever drift from
yours, you can run the version that doesn't drift — on your own infrastructure,
under your own control. That's the contract.
We started this because we kept watching legal, healthcare, and financial teams
stick to spreadsheets and copy-paste because nobody had built them an AI tool
their security team would sign off on. So we did.
— feder8d, Cape Town
[Same `openai` SDK. Different endpoint.]
One line changes. Same wire format, same streaming, same SDKs, same LangChain
integrations. Change the base URL, swap the API key, your prompts stop leaving
your jurisdiction. We add one header
(X-Feder8d-Retrieval) for collection-scoped RAG. Everything else is identical.
1
Point your client at your private base URL: your subdomain, your custom domain, or api.feder8d.ai.
2
Mint an API key in the console, scoped to a user, a workspace, or a specific collection.
3
Stream completions over SSE. Citations come back in the response payload alongside the token deltas.
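As a rough sketch of those three steps with the official `openai` Python SDK (v1 or later): the base URL, the key, and the collection ID are placeholders you would replace with your own. `retrieval_header` and `ask` are hypothetical helper names, and the header body follows the JSON shape shown in the curl example further down this page.

```python
import json


def retrieval_header(collection_ids, top_k=6):
    """Serialise the retrieval scope into the X-Feder8d-Retrieval header."""
    scope = {"collection_ids": list(collection_ids), "top_k": top_k}
    return {"X-Feder8d-Retrieval": json.dumps(scope, separators=(",", ":"))}


def ask(base_url, api_key, question, collection_ids, model="qwen2.5-7b-instruct"):
    """Stream one collection-scoped answer and return the joined text."""
    # Imported lazily so retrieval_header() works without the SDK installed.
    from openai import OpenAI

    # Step 1 and 2: point the client at your base URL and use your own key.
    client = OpenAI(base_url=base_url, api_key=api_key)
    # Step 3: stream the completion; extra_headers adds the one custom header.
    stream = client.chat.completions.create(
        model=model,
        messages=[{"role": "user", "content": question}],
        stream=True,
        extra_headers=retrieval_header(collection_ids),
    )
    parts = []
    for chunk in stream:
        # Some chunks carry no choices or an empty delta; skip those.
        if chunk.choices and chunk.choices[0].delta.content:
            parts.append(chunk.choices[0].delta.content)
    return "".join(parts)
```

A call would look like `ask("https://acme.feder8d.ai/v1", "<your key>", "What is the parental leave policy?", ["col_kb"])`. Only the constructor arguments and the extra header differ from a stock OpenAI call.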
Every model has published weights, a real SPDX licence, and a verifiable Hugging Face
revision. Your auditor can check the SHA-256 on the way in. Closed-weight foundation
models stay off the platform.
The registry below is generated from deploy/config/models.yaml, with licence attribution baked in. A local-dev tier (Ollama) ships with every install so you can build offline.
Chat
Mistral-Small-24B-Instruct-2501 · 24B
Default medium-tier. Stronger reasoning than 7B.
Apache-2.0 · self-hosted
DeepSeek-R1-Distill-Qwen-32B · 32B
Default reasoning tier. Comparable to o1-mini on math + code; fits g6e.xlarge.
Apache-2.0 · self-hosted
Qwen2.5-7B-Instruct · 7B
Default production small-tier. Strong at Q&A + tool use.
Apache-2.0 · self-hosted
DeepSeek-R1-Distill-Qwen-7B · 7B
Reasoning + coding at small-pool footprint. Fits g5.xlarge.
Apache-2.0 · self-hosted
Qwen2.5-1.5B-Instruct · 1.5B
On-prem dev / Tier-0 local laptop default.
Apache-2.0 · self-hosted
Llama-3.2-3B-Instruct (Ollama) · 3B
Local laptop chat via Ollama for dev demo.
Llama-3.2-Community · self-hosted
Embeddings
BAAI/bge-small-en-v1.5 · 384 dim
Default small embedding (Starter / Pro).
MIT
BAAI/bge-large-en-v1.5 · 1024 dim
Default high-quality embedding (Pro+).
MIT
Reranker
BAAI/bge-reranker-v2-m3
Default reranker: multilingual, folded into base price.
Apache-2.0
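The SHA-256 check an auditor might run on a downloaded weights file can be sketched like this. It is a generic illustration, not a feder8d tool: the function names are hypothetical, and the published hash is whatever value the registry or the model host lists for that exact file.

```python
import hashlib
import hmac


def file_sha256(path, chunk_size=1024 * 1024):
    """Hash a file in chunks so multi-gigabyte weights never load into memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for block in iter(lambda: fh.read(chunk_size), b""):
            digest.update(block)
    return digest.hexdigest()


def verify_weights(path, published_sha256):
    """Return True only if the local file matches the published hex digest."""
    expected = published_sha256.strip().lower()
    return hmac.compare_digest(file_sha256(path), expected)
```

Comparing against the published value before a model is loaded turns "trust the registry" into something that can be re-run at any time.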
[ BYO provider — opt in only ]
If you do want to route specific collections to a public cloud AI provider,
plug your own key. Per-collection scope. Logged. Disabled by default.
[Three-axis token meter]
Input, output, and embedding tokens metered separately at model-tiered rates. Drag the
sliders below to see what each plan would actually cost for your traffic. Numbers come
straight from plans.yaml.
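Interactive calculator (not reproduced here): three sliders plus a model-tier selector, shown at the small (Qwen 7B) tier. It reports the resulting workload as input, output, and embedding tokens in millions, and a monthly total per plan.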
Assumes ~150 input tokens + ~200 output tokens per turn at the chosen tier, plus 800
embedding tokens per session for ingest. BYO-provider tokens do not count.
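The per-turn and per-session assumptions stated here translate into a simple workload estimate. The sketch below uses only those three figures; the function name, the `sessions` and `turns_per_session` parameters, and the example traffic are illustrative assumptions, and no prices are included because they live in plans.yaml.

```python
from dataclasses import dataclass

# Figures quoted in the pricing note: per turn and per session.
INPUT_TOKENS_PER_TURN = 150
OUTPUT_TOKENS_PER_TURN = 200
EMBED_TOKENS_PER_SESSION = 800


@dataclass(frozen=True)
class Workload:
    input_tokens: int
    output_tokens: int
    embedding_tokens: int


def monthly_workload(sessions: int, turns_per_session: int) -> Workload:
    """Estimate monthly token volume on the three metered axes."""
    turns = sessions * turns_per_session
    return Workload(
        input_tokens=turns * INPUT_TOKENS_PER_TURN,
        output_tokens=turns * OUTPUT_TOKENS_PER_TURN,
        embedding_tokens=sessions * EMBED_TOKENS_PER_SESSION,
    )
```

For example, 2,000 sessions a month at 6 turns each comes to 1.8M input, 2.4M output, and 1.6M embedding tokens, which can then be compared against a plan's included allowances.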
[Your data stays in your jurisdiction]
Your data lives in af-south-1 (Cape Town), POPIA-clean. Inference runs in eu-central-1
(Frankfurt) where the GPUs are. The cross-border path uses Linkerd mTLS, ships only
tokens (never raw documents), and is disclosed in every DPA. No bait, no switch.
Your Postgres + vector store + object store never leave af-south-1.
Audit log stays with you. Inference layer logs hashes only.
One-click data export to a bucket you control (GDPR Art. 20).
Phase 2: in-country GPUs when an enterprise customer underwrites it.
[Sleeves rolled up]
For the security team's diligence pack. Every claim above this section maps to a
verifiable artefact below.
$ feder8d-cli license decode
Ed25519 / EdDSA
# header (verified, not trusted)
{ "alg": "EdDSA", "typ": "JWT" }
# claims (verified against the public key in your Helm values)
{
  "iss": "feder8d.orchestrator",
  "sub": "t_8e7f29a14d3b",          # tenant_id
  "iat": 1717250400,
  "exp": 1720101600,                # iat + 33 days = billing period + 3d grace
  "tenant_id": "t_8e7f29a14d3b",
  "plan_id": "business",
  "entitlements": {
    "allowed_models": ["small_pool", "medium_pool", "embedding_*", "reranker"],
    "max_seats": -1,                # -1 = unlimited
    "max_custom_domains": -1,
    "sso_included": true,
    "scim_included": true,
    "widget_enabled": true,
    "included_tokens_input": 50000000,
    "included_tokens_output": 10000000,
    "included_tokens_embedding": 200000000,
    "included_storage_gb": 100,
    "special_pi_allowed": true
  },
  "degraded": false,
  "trial": false,
  "lifecycle_state": "active"
}
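Once the signature has been verified, the claims can be evaluated with a few small checks. The sketch below is an assumption about how such checks might look, not the orchestrator's code: it does not verify the EdDSA signature, it treats `embedding_*` as a glob pattern, and pool names such as `embedding_small` are made up for the example.

```python
import time
from fnmatch import fnmatchcase


def license_active(claims, now=None):
    """True while the current time is inside [iat, exp) and the state is active."""
    now = time.time() if now is None else now
    in_window = claims["iat"] <= now < claims["exp"]
    return in_window and claims.get("lifecycle_state") == "active"


def model_allowed(claims, pool_name):
    """Match a pool name against allowed_models, honouring '*' wildcards."""
    patterns = claims["entitlements"]["allowed_models"]
    return any(fnmatchcase(pool_name, pattern) for pattern in patterns)


def seats_available(claims, seats_in_use):
    """A limit of -1 means unlimited, as annotated in the decoded claims."""
    limit = claims["entitlements"]["max_seats"]
    return limit == -1 or seats_in_use < limit
```

Passing `now` explicitly keeps the expiry check deterministic in tests and makes the grace window easy to reason about.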
$ kubectl get networkpolicy -n tenant-acme -o yaml
applied at namespace create
# Default-deny: nothing leaves this namespace
# unless an explicit rule allows it.
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: deny-egress-default
  namespace: tenant-acme
spec:
  podSelector: {}
  policyTypes: [Egress]
  # empty egress = deny all
---
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: allow-required
  namespace: tenant-acme
spec:
  podSelector: {}
  policyTypes: [Egress]
  egress:
    # Model Gateway only; no other AI providers
    - to:
        - namespaceSelector: { matchLabels: { name: feder8d-model-plane } }
      ports: [{ protocol: TCP, port: 8080 }]
    # Shared Postgres (your schema only; RLS enforces)
    - to:
        - namespaceSelector: { matchLabels: { name: feder8d-data } }
      ports: [{ protocol: TCP, port: 5432 }]
    # DNS + cluster Service mesh control plane
    - ports:
        - { protocol: UDP, port: 53 }
$ psql feder8d -c '\dn'
one schema per tenant
       Name        |     Owner
-------------------+----------------
 public            | feder8d
 tenant_acme       | tenant_acme       # dedicated role
 tenant_globex     | tenant_globex
 tenant_sudosky    | tenant_sudosky
# Each tenant's role can ONLY see their own schema.
# RLS is enforced as belt-and-braces, but the schema
# boundary alone makes cross-tenant reads impossible.
CREATE ROLE tenant_acme NOINHERIT LOGIN PASSWORD '...';
GRANT USAGE ON SCHEMA tenant_acme TO tenant_acme;
REVOKE ALL ON SCHEMA public FROM tenant_acme;
ALTER ROLE tenant_acme SET search_path = tenant_acme;
$ cat values-on-prem.yaml
target — Dedicated On-Prem
# Dedicated On-Prem deployments are sales-led for v1;
# these values are tuned together with your ops team.
mode: on_prem
billing:
  backend: license_only        # offline-signed license
  license_path: /etc/feder8d/license.jwt
identity:
  provider: internal           # email + password, argon2id; no SSO
secrets:
  backend: vault               # bundled HashiCorp Vault
object_store:
  backend: minio               # bundled, S3-compatible
model_plane:
  provider: self_hosted
  pool:
    small: { model: qwen2.5-7b-instruct, vram_gb: 16 }
    medium: { model: mistral-small-24b, vram_gb: 48 }
backups:
  target: s3://customer-controlled-bucket
  velero_schedule: "0 */4 * * *"
telemetry:
  egress_allowed: false        # air-gapped mode
[60-second proof]
Three terminal commands against your tenant. Token from the console; everything else is
canonical OpenAI wire format.
zsh — acme.feder8d.ai
# 1. Ingest a document into a collection (parses → chunks → embeds → upserts Qdrant)
$ curl https://acme.feder8d.ai/ingest/upload/col_kb \
-H "Authorization: Bearer fed_…" \
-F file=@./onboarding-handbook.pdf
{"document_id":"doc_b2a9","chunks":47,"embedded":47,"version":3}
# 2. Ask a question scoped to that collection — streamed answer with inline citations
$ curl https://acme.feder8d.ai/v1/chat/completions \
-H "Authorization: Bearer fed_…" \
-H "Content-Type: application/json" \
-H 'X-Feder8d-Retrieval: {"collection_ids":["col_kb"],"top_k":6}' \
-d '{"model":"qwen2.5-7b-instruct","messages":[{"role":"user","content":"What is the parental leave policy?"}],"stream":true}'
data: {"choices":[{"delta":{"content":"16 weeks fully paid"}}],"x-feder8d-citations":[{"chunk_id":"c_d4f3","score":0.91}]}
data: [DONE]
# 3. Export everything this tenant has (GDPR Art. 20 / POPIA §23)
$ curl -X POST https://acme.feder8d.ai/me/export \
-H "Authorization: Bearer fed_…"
{"job_id":"dsar_…","status":"queued","archive_uri":"s3://acme-export/…tar.gz"}
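The streamed frames in step 2 can also be read without any SDK. The parser below is a minimal sketch that assumes each frame is a single `data:` line carrying the fields shown in the sample output (`choices[0].delta.content` and `x-feder8d-citations`); `parse_sse` is a hypothetical helper name.

```python
import json


def parse_sse(lines):
    """Yield (text, citations) for each data frame until the [DONE] marker."""
    for raw in lines:
        line = raw.strip()
        if not line.startswith("data:"):
            continue  # skip blank separators and non-data fields
        payload = line[len("data:"):].strip()
        if payload == "[DONE]":
            return  # end of stream; ignore anything after it
        event = json.loads(payload)
        choices = event.get("choices") or [{}]
        text = choices[0].get("delta", {}).get("content") or ""
        yield text, event.get("x-feder8d-citations", [])
```

Fed the two `data:` lines from the sample response, it yields one pair: the text `16 weeks fully paid` and a single citation for chunk `c_d4f3`.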
[ Format ]
Drop-in OpenAI /v1 spec. openai-python, openai-node, LangChain, LlamaIndex — all work unchanged.
Every call writes an entry to your tenant's audit_log with hashes of prompt + completion. Raw text stays in your tenant DB.
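A hashes-only audit entry could be built along these lines. This is a sketch under stated assumptions: the field names are hypothetical, and HMAC-SHA256 with a local key stands in for the KMS signing call, which a real deployment would make against the tenant's key instead.

```python
import hashlib
import hmac
import json


def sha256_hex(text: str) -> str:
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


def _canonical(entry: dict) -> bytes:
    # Sorted keys and fixed separators keep the signed bytes stable.
    return json.dumps(entry, sort_keys=True, separators=(",", ":")).encode("utf-8")


def audit_entry(boundary_id, model, prompt, completion, signing_key: bytes) -> dict:
    """Record hashes of the prompt and completion, never the raw text."""
    entry = {
        "boundary_id": boundary_id,
        "model": model,
        "prompt_sha256": sha256_hex(prompt),
        "completion_sha256": sha256_hex(completion),
    }
    # Stand-in for a KMS signature (assumption): HMAC over the canonical entry.
    entry["signature"] = hmac.new(signing_key, _canonical(entry), hashlib.sha256).hexdigest()
    return entry


def verify_entry(entry: dict, signing_key: bytes) -> bool:
    """Recompute the signature over every field except the signature itself."""
    unsigned = {k: v for k, v in entry.items() if k != "signature"}
    expected = hmac.new(signing_key, _canonical(unsigned), hashlib.sha256).hexdigest()
    return hmac.compare_digest(expected, entry.get("signature", ""))
```

An auditor holding the original prompt can recompute its hash and match it to the entry, while the log itself reveals nothing about the text.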
[Plans that don't pretend]
Five plans + one POC. Every number on this card is live from plans.yaml. A CI drift check fails the build if the marketing site ever disagrees with the
runtime.
Posture below is generated from compliance.yaml. Launched 2026-06-01.
DSAR endpoints (export + erase) wired into the API.
72-hour breach notification pipeline wired to your DPO contact.
2 critical sub-processors; the full list is public at /legal/sub-processors.
No third-party AI provider appears on that list.
One-click data export to a bucket you control.
GDPR: compliant
POPIA: compliant
SOC 2 Type II: observation in progress
ISO 27001: stage 1 evaluation
Encryption at rest: enabled
Encryption in transit: enabled
[Auditable by your security team]
The source is published so your security team, your auditor, or your DPO can read
exactly how we handle your data — every line, every commit. There’s nothing
to take our word for. Source-available is a diligence credential, not a free tier:
running it in production needs a commercial agreement with us.
Start with a 14-day trial on our managed boundary in your jurisdiction. If your
compliance team needs an on-prem install instead, we run that conversation under a
Dedicated contract. Either way, public cloud AI never sees a single token.