Self-hosted RAG

Private AI for the documents that can't leave your network.

Chat with your contracts, policies, and research. Not a byte leaves your network. Deploy on your hardware. Audit every answer.

Let's talk · See how it works

On-premises or your VPC · No data leaves your perimeter · Every answer shows its sources

The trade-off

The trade-off you shouldn't have to make

Cloud AI tools want your documents. Building your own RAG stack takes months. Kernel gives you a third option: an enterprise-grade private platform you control, deployable in a single afternoon.

Capability	ChatGPT Enterprise / Copilot / Glean	Building it in-house	Kernel
Data leaves your network	Yes, vendor cloud	No	No
Time to deploy	Weeks of legal review	Months of engineering	An afternoon
Audit log + RBAC + lifecycle	Vendor's policy	Build it	Built in
Pick which model runs each stage	Single vendor model	DIY	Yes, admin UI
“Why this answer?” trace	No	DIY	Yes
Where your data lives	Vendor's cloud	You decide	You decide

Built for the buyer who has to say "yes" to compliance

Kernel was built for the team lead who's tired of telling people "we can't use AI for that."

Your data stays in your VPC. Your audit team sees every retrieval. Your users get GPT-class chat over the documents they actually work with.

Your data, your perimeter.

Runs on your own server, your EC2, or a single-tenant VPC we manage. No telemetry. No “we may use your prompts” clause.

Per-stage model routing.

Route every pipeline stage independently. Fast local model for the router, mid-size for expansion, frontier only for the final answer.

Every answer shows its work.

One click reveals the route, the chunks, the model, and whether the grounding check passed. Self-RAG flags low confidence when it can't verify.

What's inside

Production-ready out of the box.

Not a demo.

the retrieval layer

Hybrid retrieval, one router.

Vector search, knowledge graph, and structured SQL fused by an intent router. Each question gets the retrieval path that actually suits it.

RAPTOR summarisation

Hierarchical clustering so broad questions get synthesis and pointed ones get exact passages.
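
As a rough illustration of the idea, the sketch below builds a tree of summaries bottom-up. It is a simplification under stated assumptions: RAPTOR clusters chunks by embedding similarity, while this toy groups neighbouring chunks in fixed-size batches, and `summarise` is a hypothetical stand-in for a model call. The names `build_summary_tree` and `group_size` are illustrative, not part of Kernel.

```python
from typing import Callable


def build_summary_tree(
    chunks: list[str],
    summarise: Callable[[list[str]], str],
    group_size: int = 2,
) -> list[list[str]]:
    """Return levels of text: level 0 is the raw chunks, the last level holds one root node."""
    if group_size < 2:
        raise ValueError("group_size must be at least 2 so each level shrinks")
    levels = [list(chunks)]
    while len(levels[-1]) > 1:
        current = levels[-1]
        # Assumption: fixed-size neighbour groups stand in for embedding clusters.
        groups = [current[i:i + group_size] for i in range(0, len(current), group_size)]
        levels.append([summarise(group) for group in groups])
    return levels
```

Indexing every level lets a broad question match a high-level summary while a pointed question still matches a raw chunk at level 0.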

CRAG + Self-RAG

Chunks scored for relevance. Answers verified against context and retried on failure.

Workspaces + governance

draft › pending › approved

RBAC across owner, admin, manager, user, auditor.

Audit + backup

Tamper-evident audit chain. Encrypted backups covering vectors, graph, and metadata.
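
One common way to make a log tamper-evident is a hash chain, where each entry's hash covers the previous entry's hash. The sketch below shows that general technique only; the entry fields, the `GENESIS` value, and the function names are assumptions for illustration, not Kernel's actual audit format.

```python
import hashlib
import json

GENESIS = "0" * 64  # assumed placeholder hash for the first entry


def _digest(prev_hash: str, event: dict) -> str:
    # Serialise deterministically so the same event always hashes the same way.
    payload = json.dumps(event, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256((prev_hash + payload).encode("utf-8")).hexdigest()


def append_entry(chain: list[dict], event: dict) -> None:
    """Append an event whose hash covers the previous entry's hash."""
    prev_hash = chain[-1]["hash"] if chain else GENESIS
    chain.append({"event": event, "prev": prev_hash, "hash": _digest(prev_hash, event)})


def verify_chain(chain: list[dict]) -> bool:
    """Recompute every link; an edited entry fails verification."""
    prev_hash = GENESIS
    for entry in chain:
        if entry["prev"] != prev_hash or entry["hash"] != _digest(prev_hash, entry["event"]):
            return False
        prev_hash = entry["hash"]
    return True
```

A chain like this detects edits to existing entries. Detecting truncation of the newest entries would additionally require storing the latest hash somewhere the log writer cannot change.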

Cloud sync, your terms

Pull from Dropbox, Drive, OneDrive. Originals stay on your storage; nothing routes through us.

ClamAV scanning · Prompt-injection guard · Field-level encryption · TLS by default · SSO ready

How it works

How a question flows through Kernel

A user asks one question. Behind the scenes, Kernel runs a pipeline of small, specialised steps. Most run on local models. Only the final answer ever needs a frontier model, and only if you choose.

stage 1 of 7 · local · qwen 3B

Router

Where should this question go?

The router classifies the intent. Fact lookup goes to vector. Aggregation goes to SQL. Relationship questions go to the knowledge graph. One word out; everything else depends on it.

on your hardware
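
The routing step above can be pictured as a function that returns exactly one route label per question. In this minimal sketch, simple keyword rules stand in for the local router model. The hint lists and the name `classify_intent` are hypothetical.

```python
from enum import Enum


class Route(Enum):
    VECTOR = "vector"
    SQL = "sql"
    GRAPH = "graph"


# Hypothetical keyword hints standing in for the local router model.
AGGREGATION_HINTS = ("how many", "total", "average", "sum of", "count")
RELATIONSHIP_HINTS = ("related to", "connected to", "reports to", "linked to")


def classify_intent(question: str) -> Route:
    """Return exactly one route label for a question."""
    q = question.lower()
    if any(hint in q for hint in AGGREGATION_HINTS):
        return Route.SQL
    if any(hint in q for hint in RELATIONSHIP_HINTS):
        return Route.GRAPH
    # Default: treat everything else as a fact lookup.
    return Route.VECTOR
```

Because every later stage depends on this single label, the output type is deliberately a closed enum and not free text.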

stage 2 of 7 · local · qwen 7B

Expansion

Three ways to ask the same thing.

The user asked once. We generate two or three alternative phrasings so retrieval doesn't miss chunks that use different wording. Recall goes up, latency stays flat.

on your hardware
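
A minimal sketch of the expansion step, assuming the alternative phrasings are searched concurrently and the hits are merged without duplicates. `paraphrase` and `search` are hypothetical callables standing in for the local model and the retriever.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Sequence


def retrieve_with_expansion(
    question: str,
    paraphrase: Callable[[str], Sequence[str]],
    search: Callable[[str], Sequence[str]],
) -> list[str]:
    """Search with the original question plus its paraphrases; merge unique hits."""
    queries = [question, *paraphrase(question)]
    # Run the searches concurrently so extra phrasings add little wall-clock time.
    with ThreadPoolExecutor(max_workers=len(queries)) as pool:
        result_lists = list(pool.map(search, queries))
    merged: list[str] = []
    seen: set[str] = set()
    for hits in result_lists:
        for chunk_id in hits:
            if chunk_id not in seen:
                seen.add(chunk_id)
                merged.append(chunk_id)
    return merged
```

Hits from the original question come first in the merged list, followed by anything only the paraphrases found.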

stage 3 of 7 · local · qwen 7B

HyDE

A plausible answer, unretrieved.

Hypothetical Document Embeddings (HyDE) drafts what the answer might look like, then embeds that draft. Retrieval biases toward documents that look like the answer, not just documents that share keywords with the question.

on your hardware
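
To illustrate the HyDE idea, the sketch below embeds a drafted answer, not the question, and picks the nearest document. It rests on assumptions: `embed` is a toy bag-of-words embedding standing in for an embedding model, and `draft_answer` is a hypothetical callable standing in for the local model.

```python
import math
from collections import Counter
from typing import Callable


def embed(text: str) -> Counter:
    """Toy bag-of-words embedding; a real system would use an embedding model."""
    return Counter(text.lower().split())


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def hyde_search(question: str, draft_answer: Callable[[str], str], docs: list[str]) -> str:
    """Embed a hypothetical answer instead of the question, then pick the nearest doc."""
    hypothetical = draft_answer(question)
    query_vec = embed(hypothetical)
    return max(docs, key=lambda d: cosine(query_vec, embed(d)))
```

The drafted answer does not need to be correct. It only needs to resemble the wording of the passage that holds the real answer.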

stage 4 of 7 · hybrid · your infrastructure

Retrieval

Vector, graph, and SQL, fused.

The router's choice runs. Vector search on your embeddings. Cypher traversal on your knowledge graph. SQL on your structured tables. Results merge with source attribution intact.

hybrid retrieval
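
The page does not say how results are merged, so the sketch below assumes reciprocal rank fusion (RRF), one common way to combine ranked lists from different backends. Each merged hit keeps its source and the backends that returned it. The `Hit` type, the `fuse` function, and `k = 60` are illustrative choices, not Kernel internals.

```python
from dataclasses import dataclass, field


@dataclass
class Hit:
    chunk_id: str
    source: str  # e.g. the file or table the chunk came from
    backends: set[str] = field(default_factory=set)
    score: float = 0.0


def fuse(results: dict[str, list[tuple[str, str]]], k: int = 60) -> list[Hit]:
    """Merge ranked (chunk_id, source) lists from several backends with RRF."""
    merged: dict[str, Hit] = {}
    for backend, ranked in results.items():
        for rank, (chunk_id, source) in enumerate(ranked, start=1):
            hit = merged.setdefault(chunk_id, Hit(chunk_id, source))
            hit.backends.add(backend)  # keep attribution for the answer trace
            hit.score += 1.0 / (k + rank)
    return sorted(merged.values(), key=lambda h: h.score, reverse=True)
```

A chunk returned by more than one backend accumulates score from each list, so agreement between backends pushes it up the ranking.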

stage 5 of 7 · local · qwen 3B

Rerank

Keep the useful chunks, drop the rest.

Retrieval returns candidates. A small local model scores each chunk against the question. Only the chunks that pass make it into the context window. Cheap, precise, refined by CRAG.

on your hardware
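
The rerank step can be sketched as score, filter, then sort. The threshold, the chunk limit, and the toy `overlap_score` below are assumptions for illustration. In practice, the scorer would be the small local model.

```python
from typing import Callable


def rerank(
    question: str,
    chunks: list[str],
    score: Callable[[str, str], float],
    threshold: float = 0.5,
    max_chunks: int = 5,
) -> list[str]:
    """Keep chunks whose relevance score passes the threshold, best first."""
    scored = [(score(question, c), c) for c in chunks]
    kept = [(s, c) for s, c in scored if s >= threshold]
    kept.sort(key=lambda pair: pair[0], reverse=True)
    return [c for _, c in kept[:max_chunks]]


def overlap_score(question: str, chunk: str) -> float:
    """Toy scorer: fraction of question words that appear in the chunk."""
    q_words = set(question.lower().split())
    c_words = set(chunk.lower().split())
    return len(q_words & c_words) / len(q_words) if q_words else 0.0
```

Only what `rerank` returns goes into the context window, so the threshold trades recall against prompt size.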

stage 6 of 7 · your choice · local or cloud

Generation

Your choice, local or frontier.

This is the answer the user reads. The one stage where you might route to a frontier cloud model (Claude, GPT, Gemini) for quality, or stay 100% local for zero egress. Configurable per project.

generation

stage 7 of 7 · local · qwen 3B

Grounding check

Prove the answer is supported.

Self-RAG verifies the generated answer against the retrieved context. If it can't prove grounding, the pipeline retries with a stricter instruction, or flags low confidence. In regulated industries this is the difference between shippable and not.

on your hardware
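
The verify-and-retry loop described above might look like the sketch below. `generate` and `is_grounded` are hypothetical callables standing in for the generation model and the grounding verifier, and the `Answer` type is illustrative.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Answer:
    text: str
    grounded: bool
    attempts: int


def answer_with_grounding(
    question: str,
    context: list[str],
    generate: Callable[[str, list[str], bool], str],
    is_grounded: Callable[[str, list[str]], bool],
    max_attempts: int = 2,
) -> Answer:
    """Generate, verify against context, retry in strict mode, else flag low confidence."""
    text = ""
    for attempt in range(1, max_attempts + 1):
        strict = attempt > 1  # later attempts use the stricter instruction
        text = generate(question, context, strict)
        if is_grounded(text, context):
            return Answer(text, grounded=True, attempts=attempt)
    # Verification never passed: return the last draft flagged as low confidence.
    return Answer(text, grounded=False, attempts=max_attempts)
```

The caller can surface `grounded=False` to the user as a low-confidence flag, so an unverified answer is never shown as verified.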

Router: local
Expansion: local
HyDE: local
Retrieval: hybrid
Rerank: local
Generation: your choice
Grounding check: local

local · hybrid (vector + graph + sql) · your choice (local or cloud)

Each stage is independently configurable. Run the whole thing locally for zero data egress, or route specific stages through cloud models for higher answer quality. Cloud routing is available as an add-on.
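
Per-stage configuration could be represented as a mapping from stage name to model, with a check that lists which stages would leave the perimeter. This is a sketch under assumptions: the stage keys, the `StageModel` type, and the model names are hypothetical, not Kernel's configuration schema.

```python
from dataclasses import dataclass

# Stages that run a model; retrieval runs on your own stores and is omitted here.
MODEL_STAGES = ("router", "expansion", "hyde", "rerank", "generation", "grounding_check")


@dataclass(frozen=True)
class StageModel:
    name: str    # model identifier (hypothetical)
    local: bool  # True if the model runs inside your perimeter


def cloud_stages(pipeline: dict[str, StageModel]) -> list[str]:
    """Return the stages that would send data to a cloud model."""
    missing = [s for s in MODEL_STAGES if s not in pipeline]
    if missing:
        raise ValueError(f"no model configured for: {missing}")
    return [s for s in MODEL_STAGES if not pipeline[s].local]
```

An empty result from `cloud_stages` corresponds to the zero-egress setup. A result of `["generation"]` corresponds to routing only the final answer to a cloud model.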

Three ways to run Kernel

Choose where it lives. Pricing is shaped to your scale, your model mix, and whether you want us to operate it for you.

Self-hosted

Run Kernel on your own infrastructure. You manage upgrades, you keep the keys. Best for teams that already operate a Linux + Docker stack and want maximum control.

Let's talk

Self-hosted with support

Same deployment, with a support SLA, audit-log export, SSO/SAML, and assistance with upgrades. Best for compliance-driven mid-market firms.

Let's talk

Managed private cloud

We run Kernel for you on dedicated single-tenant infrastructure inside your preferred cloud region. You point a domain at it and your team starts using it. Best for regulated organisations that want the outcome without the operations.

Let's talk

Cloud-model routing (Claude, GPT, Gemini for any pipeline stage) is available as an add-on. Talk to us about your mix.

See it work

A 20-minute walkthrough on your own documents will tell you more than any datasheet. We'll set up a temporary private instance, ingest a sample of your corpus, and let you ask real questions live.

Request a demo

From the engineering team

We write about how Kernel is built. Two posts to start with:

12 min read

How we cut LLM costs 87% by routing only the generation step to Claude

A practical case for per-stage model selection in production RAG. Same answer quality, a fraction of the spend.

Read the post

10 min read

Why we ship Self-RAG retries, and what happened when we didn't

A measurement bug that lived in our streaming pipeline for months, what users actually saw, and how we fixed it without sacrificing latency.

Read the post

Trusted by teams that need to say "yes" to compliance

Ready to talk?

Tell us a bit about your team, your documents, and what compliance constraints you're working under. We'll show you exactly what Kernel would look like for you.

Let's talk
