ChatGPT Is Not an Option. Here Is What Is.
For regulated, classified, and sovereign environments, public AI APIs are not a security risk to be mitigated. They are architecturally disqualified. The question is what you build instead.
Somewhere in your organisation right now, someone is pasting sensitive text into ChatGPT. They are not doing it maliciously. They are doing it because it works, because it is fast, and because nobody told them clearly enough that the data left the building the moment they pressed enter. In most commercial environments this is a governance problem. In classified, regulated, or sovereign environments it is a disqualifying act — one that, if it involves the wrong data, can void a contract, trigger a mandatory incident report, and end a security clearance.
The answer that most vendors will sell you is a private deployment of the same public models — OpenAI's enterprise tier, Google's Vertex AI, Microsoft's Azure OpenAI Service. These are better. They are not enough. And for a significant class of organisations, they remain architecturally ineligible regardless of their contractual data handling commitments.
This article is about what those organisations can actually use — and why the infrastructure that makes it possible is the same bare-metal air-gapped Kubernetes platform we have already argued for on security and economic grounds.
The disqualification is architectural, not contractual
When a cloud vendor tells you your data is private, they mean it contractually. They do not mean it architecturally. Your inference request still travels over a public network. It still lands on shared compute infrastructure owned and operated by a third party. The model weights that process your data still reside on hardware you do not control, in a facility you cannot audit, under a legal jurisdiction that may not be your own.
For most businesses this is an acceptable risk managed by contract. For the following environments it is not acceptable at all — regardless of what the contract says.
| Provider | Why it is disqualified |
|---|---|
| OpenAI / ChatGPT Enterprise | Data processed on Microsoft Azure infrastructure. No air-gap possible. Contractual data retention controls exist but inference traverses public network and third-party compute. Ineligible under sovereign data residency mandates. |
| Google Gemini / Vertex AI | GCP infrastructure, multi-region by default. No physical data containment. Even single-region deployments rely on Google-operated hardware. Audit trail dependent on Google's own logging — not independently verifiable. |
| Azure OpenAI Service | Closest to compliant for some regulated sectors. Still ineligible for air-gap requirements. Data remains on Microsoft infrastructure. GovCloud variants improve data residency but cannot satisfy physical isolation mandates. |
| Anthropic / AWS Bedrock | API calls traverse public internet to Anthropic or AWS endpoints. No self-hosted deployment option for proprietary models. Weights not available for private deployment. Structurally incompatible with air-gap requirements. |
| Any SaaS AI product | By definition: third-party compute, third-party network, third-party audit. Zero physical control. The business model of SaaS is incompatible with the security model of classified infrastructure. |
The pattern is consistent across every public AI provider: the data handling commitment is contractual, the infrastructure is shared, and the audit trail is dependent on the vendor's own systems. For environments where the security mandate requires physical data containment and independently verifiable audit capability, none of these options qualify. This is not a criticism of the vendors. It is a description of what their architecture is designed to do — and what it is not.
“The question is not whether you trust OpenAI. The question is whether your security mandate permits you to trust any third party with physical access to your inference workload. For classified environments, the answer is no — regardless of who that third party is.”
What sovereign AI actually looks like
The architecture that satisfies these requirements exists today and is operationally mature. It runs on the same bare-metal air-gapped Kubernetes infrastructure described in this series — extended with GPU nodes, an internal model registry, and an agentic application layer written for performance and operational simplicity.
Full stack
Agentic layer
Workflow orchestration written in Go. Stateless agents query internal APIs, reason over retrieved context, and act on internal systems. No external calls. No third-party orchestration framework dependency. Go was chosen for binary simplicity, performance under load, and the absence of a runtime that needs managing in an air-gapped environment.
Inference
vLLM serving fine-tuned open-weight models via an OpenAI-compatible API, so internal clients need no modification. Deployed on KServe with scale-to-zero: idle model servers consume no GPU, and replicas scale back up on GPU nodes on demand when requests arrive.
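Because the API surface is OpenAI-compatible, an internal client is ordinary HTTP. A minimal sketch, assuming an illustrative in-cluster service name and model id (neither is prescribed by vLLM), modelling only the request fields used here:

```go
package main

import (
	"bytes"
	"encoding/json"
	"fmt"
	"net/http"
)

// Message and ChatRequest mirror the subset of the OpenAI
// chat-completions wire format that this sketch needs.
type Message struct {
	Role    string `json:"role"`
	Content string `json:"content"`
}

type ChatRequest struct {
	Model    string    `json:"model"`
	Messages []Message `json:"messages"`
}

// newInferenceRequest builds a POST against an internal vLLM
// endpoint. The hostname and model name in main are illustrative;
// any in-cluster DNS name works, since nothing about the client
// depends on where the OpenAI-compatible server runs.
func newInferenceRequest(endpoint, model, prompt string) (*http.Request, error) {
	body, err := json.Marshal(ChatRequest{
		Model:    model,
		Messages: []Message{{Role: "user", Content: prompt}},
	})
	if err != nil {
		return nil, err
	}
	req, err := http.NewRequest(http.MethodPost,
		endpoint+"/v1/chat/completions", bytes.NewReader(body))
	if err != nil {
		return nil, err
	}
	req.Header.Set("Content-Type", "application/json")
	return req, nil
}

func main() {
	req, _ := newInferenceRequest(
		"http://vllm.inference.svc.cluster.local", // illustrative service name
		"mistral-nemo-12b",                        // illustrative model id
		"Summarise the attached procedure.")
	fmt.Println(req.Method, req.URL.String())
}
```

The same request shape works against any of the deployed models; switching models is a string change, not a client rewrite.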
Models
Open weights, fine-tunable, and licensed for self-hosted deployment on private infrastructure: Mistral NeMo 12B, Mistral 7B, Llama 3.1 8B, Llama 3.1 70B. Fine-tuned on domain data using QLoRA via PyTorchJob distributed training. Training data never leaves the cluster, and model weights are stored in the internal S3-compatible object store.
GPU compute
Dedicated bare-metal GPU nodes within the air-gapped cluster. No cloud GPU rental. No shared-tenancy GPU pools. Hardware owned and physically controlled by the organisation. NCCL tuned for multi-node distributed training across the internal fabric.
Model registry
Harbor internal registry mirrors all container images and model artefacts. No pull from Docker Hub, Hugging Face, or any public endpoint during operation. The cluster is fully self-contained from day one of production.
Storage and vector search
Garage S3-compatible object store for model weights, training datasets, and pipeline artefacts. OpenEBS LocalPV for high-throughput node-local storage on GPU nodes.
For structured operational data and vector search, PostgreSQL via the CloudNativePG operator (CNPG) with the pgvector extension serves both roles in a single, operationally familiar system. There is no separate vector database to deploy, mirror, patch, or audit. Document embeddings are stored as vectors directly in PostgreSQL tables alongside their source metadata, queryable with standard SQL and HNSW or IVFFlat indexes for approximate nearest-neighbour retrieval at scale. One operator, one backup strategy, one monitoring surface.
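A sketch of what the single-system approach looks like in practice. The table and column names are illustrative; `vector(768)` assumes a 768-dimensional embedding model, and the helper renders an embedding in pgvector's text input format for use where no native driver adapter is available:

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// toVectorLiteral renders an embedding in pgvector's text input
// format, "[0.1,0.2,...]", suitable for parameterised inserts when
// the SQL driver has no pgvector-aware type adapter.
func toVectorLiteral(v []float32) string {
	parts := make([]string, len(v))
	for i, x := range v {
		parts[i] = strconv.FormatFloat(float64(x), 'f', -1, 32)
	}
	return "[" + strings.Join(parts, ",") + "]"
}

// Schema and query sketches. All identifiers are illustrative;
// <=> is pgvector's cosine-distance operator.
const (
	createTable = `CREATE TABLE chunks (
    id        bigserial PRIMARY KEY,
    source    text NOT NULL,
    body      text NOT NULL,
    embedding vector(768));`

	createIndex = `CREATE INDEX ON chunks
    USING hnsw (embedding vector_cosine_ops);`

	topKQuery = `SELECT body FROM chunks
    ORDER BY embedding <=> $1 LIMIT 5;`
)

func main() {
	fmt.Println(toVectorLiteral([]float32{0.25, -1, 0.5}))
}
```

Everything here is standard SQL plus one operator and one index type, which is why no second datastore needs to exist.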
Embeddings are generated entirely on-cluster by a dedicated open source embedding model — nomic-embed-text, bge-m3, or mxbai-embed-large depending on the retrieval task — served via the same vLLM / KServe stack as the generative models. No document ever leaves the perimeter to be embedded. No third-party embedding API is called. The full retrieval-augmented generation pipeline — document ingestion, chunking, embedding, vector storage, semantic retrieval, and generation — executes entirely within the air-gapped boundary.
Network
Fully air-gapped. No external network path exists. Internal DNS, internal certificate authority, internal NTP. An inference request issued inside the perimeter is answered inside the perimeter. Packet inspection at the boundary would find nothing — because nothing crosses it.
The RAG pipeline that never leaves the building
Retrieval-augmented generation is the architecture that makes domain-specific AI genuinely useful — the model reasons over retrieved internal documents rather than relying solely on what it learned during training. In a standard cloud deployment, a RAG pipeline typically involves at least three external services: an embedding API, a hosted vector database, and an inference endpoint. Each is a data exfiltration path. Each is a compliance boundary to negotiate. Each is a vendor dependency that can change its pricing, its API, or its terms with thirty days' notice.
On sovereign air-gapped infrastructure the same pipeline has zero external dependencies. Internal documents are chunked and passed to the on-cluster embedding model. The resulting vectors are written directly into PostgreSQL with pgvector. At query time the agentic Go layer embeds the user's query on-cluster, retrieves semantically similar document chunks via pgvector's HNSW index, constructs a context-enriched prompt, and passes it to the on-cluster fine-tuned LLM for generation. The answer is returned to the user. Not a single byte of the process crosses the cluster perimeter.
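The ranking pgvector performs at query time is cosine nearest-neighbour search. Stripped of the database, the retrieval step is small enough to show in full; this in-memory Go sketch illustrates the ordering, not the production code path:

```go
package main

import (
	"fmt"
	"math"
	"sort"
)

// cosine returns the cosine similarity of two equal-length vectors.
func cosine(a, b []float64) float64 {
	var dot, na, nb float64
	for i := range a {
		dot += a[i] * b[i]
		na += a[i] * a[i]
		nb += b[i] * b[i]
	}
	return dot / (math.Sqrt(na) * math.Sqrt(nb))
}

type chunk struct {
	Body      string
	Embedding []float64
}

// topK ranks chunks by similarity to the query embedding, the same
// ordering pgvector's cosine-distance operator produces, here over
// an in-memory slice for illustration.
func topK(query []float64, chunks []chunk, k int) []chunk {
	sort.SliceStable(chunks, func(i, j int) bool {
		return cosine(query, chunks[i].Embedding) >
			cosine(query, chunks[j].Embedding)
	})
	if k > len(chunks) {
		k = len(chunks)
	}
	return chunks[:k]
}

func main() {
	docs := []chunk{
		{"unrelated note", []float64{0, 1}},
		{"relevant clause", []float64{1, 0.1}},
	}
	best := topK([]float64{1, 0}, docs, 1)
	fmt.Println(best[0].Body)
}
```

In the real pipeline the vectors come from the on-cluster embedding model and the ranking runs inside PostgreSQL over an HNSW index; the mathematics is exactly this.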
The operational advantage of using PostgreSQL as the vector store — rather than a specialist system like Pinecone, Weaviate, or Qdrant — deserves particular emphasis in air-gapped environments. Specialist vector databases require their own container images to be mirrored internally, their own operator or Helm chart lifecycle to be managed, their own backup and restore procedures, and their own monitoring integration. pgvector adds a single extension to a database the organisation is already running, managed by an operator it already operates, backed up by a process it already has. In an environment where every additional system is an audit burden, consolidation is not a preference. It is discipline.
Why Go for the agentic layer
The choice of Go for agentic workflow code deserves a brief explanation because it is not the obvious one. The Python AI ecosystem is vast — LangChain, LlamaIndex, CrewAI, AutoGen — and most agentic frameworks are built around it. In a standard cloud-connected environment, these are reasonable choices.
In an air-gapped production environment, Python's dependency graph becomes a serious operational liability. A framework like LangChain carries hundreds of transitive dependencies, each of which must be mirrored, versioned, and audited internally. A security patch to any one of them requires a controlled update cycle across the entire dependency tree. The operational surface is enormous relative to the value delivered.
Go compiles to a single static binary with no runtime dependency. Deploying an updated agentic workflow means replacing one file. The internal dependency mirror for a Go service is a fraction of the size of its Python equivalent. In an environment where every external dependency is a controlled asset, minimising the dependency count is an operational requirement, not a stylistic choice.
“A Go binary has no runtime. It has no pip install. It has no transitive dependency that silently upgraded overnight. In an air-gapped cluster, that is not a minor convenience. It is the difference between a manageable update process and an audit nightmare.”
What this unlocks that was previously impossible
The practical consequence of this architecture is that organisations operating in classified or highly regulated environments can now deploy AI capabilities that were previously structurally unavailable to them. A defence systems integrator can run a domain-specific assistant trained on internal technical documentation — without that documentation ever touching a public model API. A regulated financial institution can run AI-assisted analysis on client data that cannot legally leave its jurisdiction. A government agency can deploy agentic workflows that query internal knowledge bases, draft documents, and act on internal systems — with a complete, independently auditable trail of every inference and every action.
These are not hypothetical capabilities. They are running today on bare-metal air-gapped Kubernetes clusters with the stack described above. The open source model ecosystem — Mistral, Llama, and their derivatives — has reached a capability level where fine-tuned domain-specific models are genuinely useful for the document-heavy, reasoning-intensive workloads that define these environments. The infrastructure to run them privately has been mature for longer than most people realise.
The gap has never been technical. It has been awareness. Organisations that assumed AI meant OpenAI, and therefore assumed AI was unavailable to them, are now discovering that the infrastructure they already built for security reasons is the same infrastructure that makes sovereign AI possible.
The air-gapped cluster was always the security boundary. It is now also the intelligence layer. Nothing about that required a public API. Nothing about it ever will.
Catalin Sugau is a senior infrastructure engineer and founder of Sugau, a consultancy specialising in bare-metal Kubernetes, air-gapped infrastructure, and private AI / LLMOps deployments. sugau.com