Kernel
Self-hosted RAG · v1.0

Private AI for the documents that

Kernel is a self-hosted RAG platform that lets your team chat with your contracts, policies, research, and reports — without sending a single byte to OpenAI, Anthropic, or Google.

Deploy on your hardware. Govern it like the rest of your stack. Audit every answer.

On-premises or your VPC · No data leaves your perimeter · Every answer shows its sources

The trade-off

The trade-off you shouldn't have to make

Cloud AI tools want your documents. Building your own RAG stack takes months. Kernel gives you a third option: an enterprise-grade private platform you control, deployable in a single afternoon.

Feature                          | ChatGPT / Copilot / Glean | In-house build        | Kernel
Data leaves your network         | Yes (vendor cloud)        | No                    | No
Time to deploy                   | Weeks of legal review     | Months of engineering | An afternoon
Audit log + RBAC + lifecycle     | Vendor's policy           | Build it              | Built in
Pick which model runs each stage | Single vendor model       | DIY                   | Yes (admin UI)
“Why this answer?” trace         | No                        | DIY                   | Yes
Where your data lives            | Vendor's cloud            | You decide            | You decide

Built for compliance

Built for the buyer who has to say "yes" to compliance

Kernel was built for the team lead who's tired of telling people "we can't use AI for that."

Your data stays in your VPC. Your audit team sees every retrieval. Your users get GPT-class chat over the documents they actually work with.

Your data, your perimeter.

Run Kernel on your own server, your own EC2 instance, or a single-tenant VPC we manage for you. No telemetry. No “we may use your prompts to improve our models” clause. The only egress to a third-party model is the one you explicitly configure, and only for the pipeline stages you choose.

Per-stage model routing.

Most RAG tools force one model to do everything. Kernel doesn't. Pick a fast local model for query routing, a mid-size local model for expansion, and route only the final answer through a frontier cloud model — or stay 100% local. Each pipeline stage (router, expansion, HyDE, relevance, RAPTOR, entity extraction, generation) gets its own primary and fallback, configurable from an admin UI.
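
To make the primary-and-fallback idea concrete, here is a minimal sketch in Python. It is not Kernel's actual configuration format or API: the `StageRoute` type, the `ROUTES` table, the `run_stage` helper, and the model names are all hypothetical, and only the stage names come from the list above.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Tuple


@dataclass(frozen=True)
class StageRoute:
    primary: str   # model tried first for this stage
    fallback: str  # model tried if the primary call raises


# Hypothetical routing table. Model names are placeholders, not real endpoints.
ROUTES: Dict[str, StageRoute] = {
    "router": StageRoute(primary="local-small", fallback="local-medium"),
    "expansion": StageRoute(primary="local-medium", fallback="local-small"),
    "generation": StageRoute(primary="cloud-frontier", fallback="local-large"),
}


def run_stage(
    stage: str, prompt: str, call_model: Callable[[str, str], str]
) -> Tuple[str, str]:
    """Run one stage on its primary model, falling back once on failure.

    Returns (model_used, output). `call_model(model, prompt)` is supplied by
    the caller, so this sketch makes no assumption about any model client.
    """
    route = ROUTES[stage]
    try:
        return route.primary, call_model(route.primary, prompt)
    except Exception:
        # Any failure of the primary triggers the configured fallback.
        return route.fallback, call_model(route.fallback, prompt)
```

With a table like this, keeping a stage fully local is just a matter of naming local models for both its primary and its fallback.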

Every answer shows its work.

Click “Why this answer?” on any response and see the route taken, the chunks retrieved, the model that wrote it, and whether the grounding check passed. If our Self-RAG verifier can't confirm the answer is supported by the sources, the UI flags it as low confidence. In regulated industries, “I can't explain where that came from” isn't an acceptable answer — Kernel never makes you say it.

What's inside

Production-ready out of the box.

Not a demo.

Hybrid retrieval

Vector search, knowledge graph, and structured SQL fused by an intent router. Each question gets the retrieval path that actually suits it.
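
As a rough illustration of intent routing, the sketch below picks one retrieval path per question using keyword rules. This is a deliberate simplification and an assumption throughout: the page does not say how Kernel classifies intent or fuses results, and the `classify_intent` and `retrieve` names and the keyword lists are hypothetical.

```python
from typing import Callable, Dict, List

Backend = Callable[[str], List[str]]


def classify_intent(question: str) -> str:
    """Toy keyword classifier returning 'sql', 'graph', or 'vector'."""
    q = question.lower()
    # Aggregation-style questions suit structured SQL.
    if any(k in q for k in ("how many", "total", "average", "sum of")):
        return "sql"
    # Relationship-style questions suit the knowledge graph.
    if any(k in q for k in ("related to", "connected to", "who reports to")):
        return "graph"
    # Everything else falls through to vector search.
    return "vector"


def retrieve(question: str, backends: Dict[str, Backend]) -> List[str]:
    """Dispatch the question to the backend chosen by the classifier."""
    return backends[classify_intent(question)](question)
```

A production router would more plausibly use a small local model than keyword lists, and could query several backends and merge the results.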

RAPTOR summarisation

Hierarchical document clustering so broad questions get high-level synthesis and pointed questions get exact passages.

CRAG filtering + Self-RAG grounding

Retrieved chunks are scored for relevance; the final answer is verified against context and retried on failure.
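
The filter-then-verify loop described above might look roughly like the following Python sketch. The scoring, generation, and grounding functions are passed in as parameters because the page does not describe them; the `answer_with_grounding` name, the 0.5 threshold, and the two-attempt limit are assumptions chosen for illustration.

```python
from typing import Callable, List, Tuple


def answer_with_grounding(
    question: str,
    chunks: List[str],
    score: Callable[[str, str], float],
    generate: Callable[[str, List[str]], str],
    is_grounded: Callable[[str, List[str]], bool],
    min_score: float = 0.5,   # assumed relevance threshold
    max_attempts: int = 2,    # assumed retry budget
) -> Tuple[str, bool]:
    """Return (answer, grounded). grounded=False means low confidence."""
    # Relevance filtering: keep only chunks scored at or above the threshold.
    relevant = [c for c in chunks if score(question, c) >= min_score]

    answer = ""
    for _ in range(max_attempts):
        answer = generate(question, relevant)
        # Grounding check: accept the answer only if the context supports it.
        if is_grounded(answer, relevant):
            return answer, True
    # Every attempt failed the check; return the last answer, flagged.
    return answer, False
```

Returning the flag rather than raising lets a caller still show the answer while marking it as low confidence.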

Workspaces + governance

File lifecycle (draft → pending review → approved), classification labels, RBAC across owner / admin / manager / user / auditor.
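
One way to picture the lifecycle and role model together is as a small state machine with a permission check on each transition. The states and role names below come from the text; which role may perform which transition is not stated on this page, so the `CAN_SUBMIT` and `CAN_APPROVE` sets, like the `advance` helper itself, are assumptions.

```python
# Lifecycle from the text: draft -> pending review -> approved.
NEXT_STATE = {"draft": "pending_review", "pending_review": "approved"}

# Assumed permissions (not from the source): who may move a file forward.
CAN_SUBMIT = {"owner", "admin", "manager", "user"}   # draft -> pending_review
CAN_APPROVE = {"owner", "admin", "manager"}          # pending_review -> approved


def advance(state: str, role: str) -> str:
    """Move a file to its next lifecycle state if the role is permitted."""
    nxt = NEXT_STATE.get(state)
    if nxt is None:
        raise ValueError(f"no transition out of state {state!r}")
    allowed = CAN_SUBMIT if state == "draft" else CAN_APPROVE
    if role not in allowed:
        raise PermissionError(f"role {role!r} may not advance a {state!r} file")
    return nxt
```

In this sketch the auditor role can never advance a file, which fits a read-only reviewer, but that too is an assumption.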

Audit + backup

Tamper-evident audit chain. Scheduled encrypted backups covering the vector store, knowledge graph, and metadata.
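
A common way to make a log tamper-evident is a hash chain, where each entry's hash covers the previous entry's hash. The sketch below shows that general technique using Python's standard library; whether Kernel's audit chain works this way is not stated here, and the entry layout and function names are assumptions.

```python
import hashlib
import json
from typing import Any, Dict, List

GENESIS = "0" * 64  # placeholder "previous hash" for the first entry


def _digest(prev_hash: str, event: Dict[str, Any]) -> str:
    """SHA-256 over the previous hash plus a canonical JSON form of the event."""
    payload = json.dumps({"prev": prev_hash, "event": event}, sort_keys=True)
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


def append_entry(chain: List[Dict[str, Any]], event: Dict[str, Any]) -> None:
    """Append an event, linking it to the hash of the last entry."""
    prev_hash = chain[-1]["hash"] if chain else GENESIS
    chain.append({"prev": prev_hash, "event": event, "hash": _digest(prev_hash, event)})


def verify(chain: List[Dict[str, Any]]) -> bool:
    """Recompute every link; any edited, removed, or reordered entry fails."""
    prev_hash = GENESIS
    for entry in chain:
        if entry["prev"] != prev_hash or entry["hash"] != _digest(prev_hash, entry["event"]):
            return False
        prev_hash = entry["hash"]
    return True
```

Editing any past event changes its hash, which breaks the link stored in every later entry, so a single pass of `verify` detects the change.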

Cloud sync (your terms)

Pull documents from Dropbox, Google Drive, or OneDrive. The originals stay on your storage; nothing routes through us.

ClamAV scanning · Prompt-injection guard · Field-level encryption · TLS by default · SSO ready

How it works

How a question flows through Kernel

A user asks one question. Behind the scenes, Kernel runs a pipeline of small, specialised steps — most of them on local models. Only the final answer ever needs a frontier model, and only if you choose.

Router (local) → Expansion (local) → HyDE (local) → Retrieval (hybrid) → Rerank (local) → Generation (your choice) → Grounding check (local)

Legend: local · hybrid (vector + graph + SQL) · your choice (local or cloud)

Each box is independently configurable. Run the whole thing locally for zero data egress, or route specific stages through cloud models for higher answer quality. Cloud routing is available as an add-on.
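
A quick way to reason about data egress is to list which stages are set to run on a cloud model. The Python sketch below does that over a hypothetical placement map. Stage names follow the pipeline above, the `"local"` and `"cloud"` values and the `egress_stages` helper are assumptions, and retrieval is treated as in-perimeter on the assumption that its stores live inside the deployment.

```python
from typing import Dict, List

# Hypothetical placement per stage. Dicts keep insertion order, so this
# also records the order in which the stages run.
ALL_LOCAL: Dict[str, str] = {
    "router": "local",
    "expansion": "local",
    "hyde": "local",
    "retrieval": "local",   # assumed: stores run inside the deployment
    "rerank": "local",
    "generation": "local",
    "grounding_check": "local",
}


def egress_stages(placement: Dict[str, str]) -> List[str]:
    """Return the stages that would send data to a cloud model, in order."""
    return [stage for stage, where in placement.items() if where == "cloud"]


# Example: route only the final answer through a cloud model.
cloud_generation = {**ALL_LOCAL, "generation": "cloud"}
```

An empty list from `egress_stages` corresponds to the zero-egress configuration described above.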

Three ways to run it

Three ways to run Kernel

Choose where it lives. Pricing is shaped to your scale, your model mix, and whether you want us to operate it for you.

Self-hosted

Run Kernel on your own infrastructure. You manage upgrades, you keep the keys. Best for teams that already operate a Linux + Docker stack and want maximum control.

Let's talk
Self-hosted with support (most popular)

Same deployment, with a support SLA, audit-log export, SSO/SAML, and assistance with upgrades. Best for compliance-driven mid-market firms.

Let's talk

Managed private cloud

We run Kernel for you on dedicated single-tenant infrastructure inside your preferred cloud region. You point a domain at it and your team starts using it. Best for regulated organisations who want the outcome without the operations.

Let's talk

Cloud-model routing (Claude, GPT, Gemini for any pipeline stage) is available as an add-on. Talk to us about your mix.

See it work

A 20-minute walkthrough on your own documents will tell you more than any datasheet. We'll set up a temporary private instance, ingest a sample of your corpus, and let you ask real questions live.

Request a demo

Trusted by teams that need to say "yes" to compliance

BlueSky · CDC · LiveStyle · SCS

Ready to talk?

Tell us a bit about your team, your documents, and what compliance constraints you're working under. We'll show you exactly what Kernel would look like for you.

Let's talk