The Trojan Horse in Your Air-Gap: Why Model Provenance Is the Sovereign AI Question Nobody Is Asking
The release of Kimi K2.6 this week generated exactly the reaction Moonshot AI wanted. Benchmark numbers. GitHub stars. Developer excitement. A 1-trillion-parameter open-weight model that matches GPT-5.4 and Claude Opus 4.6 on agentic coding benchmarks: free, MIT-licensed, runs on your hardware. What’s not to like?
Here is what nobody is asking: who built the weights, and what is inside them.
The Inspection Problem Is Not Theoretical
When you deploy K2.6 on your infrastructure, you are running opaque computation, developed by a Beijing-based laboratory, against your data, on your network, with your credentials in scope. The fact that the weights are published on Hugging Face does not make them auditable. One trillion parameters in INT4 quantisation is not inspectable by any organisation operating below a nation-state intelligence budget. The MoE routing architecture (384 experts, 8 active per token) means that specific input patterns could theoretically activate specific expert combinations producing adversarial outputs that would never surface in standard benchmark evaluation.
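To make the routing concern concrete, here is a minimal sketch of top-k MoE gating in plain NumPy. The 384-expert, 8-active figures are K2’s published configuration as cited above; the model dimension and random weights are illustrative stand-ins, not the real gating network.

```python
import numpy as np

def route_tokens(hidden, gate_weights, k=8):
    """Top-k MoE gating: each token's hidden state scores all experts
    and the k highest-scoring ones run. The selected set is entirely
    input-dependent, so a rare input signature can activate an expert
    combination that ordinary benchmark traffic never exercises."""
    logits = hidden @ gate_weights                # (tokens, n_experts)
    return np.argsort(logits, axis=-1)[:, -k:]    # indices of the k winners

rng = np.random.default_rng(0)
d_model, n_experts = 64, 384                      # 384 experts, 8 active: K2's cited config
gate = rng.standard_normal((d_model, n_experts))  # stand-in for the learned gate

benign  = rng.standard_normal((1, d_model))
trigger = rng.standard_normal((1, d_model))       # stands in for a crafted input signature
print(route_tokens(benign, gate))                 # one expert set
print(route_tokens(trigger, gate))                # a different set, unseen in any benchmark
```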
This is not a crude backdoor. It does not need to be. Steganographic behaviours triggered by specific prompts. Subtle output manipulation on high-value query types. Data patterns embedded in generated code that exfiltrate context through seemingly legitimate outputs. None of these require the model to “know” it is doing anything wrong. None are detectable without exhaustive adversarial red-teaming at a scale no single organisation can fund.
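“A scale no single organisation can fund” is not rhetoric; it falls straight out of the combinatorics of the routing described above, using only the expert counts already cited:

```python
from math import comb

experts, active = 384, 8            # the MoE configuration cited above
per_layer = comb(experts, active)   # distinct expert sets one token can select in one layer
print(f"{per_layer:.2e}")           # ~1.09e+16
```

Roughly 10^16 routing outcomes per layer, per token, before compounding across layers. Exhaustive adversarial coverage of that space is not a budget any single organisation carries, and a trigger needs to occupy exactly one point in it.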
You would not run an unsigned binary from a Beijing-adjacent developer with root access on your production cluster. Running K2.6 on sovereign infrastructure is functionally identical. The computation is opaque. The provenance is unverifiable. The attack surface is enormous.
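The binary analogy can be pushed one step further. A hash check is the strongest verification available for published weights today, and it proves only that you received the bytes the vendor published; it says nothing about what the training process put inside them. A minimal sketch, with the shard name and digest as hypothetical placeholders:

```python
import hashlib
from pathlib import Path

def sha256_of(path: Path, chunk: int = 1 << 20) -> str:
    """Stream a multi-gigabyte weight shard through SHA-256."""
    h = hashlib.sha256()
    with path.open("rb") as f:
        while block := f.read(chunk):
            h.update(block)
    return h.hexdigest()

# Hypothetical manifest entry: shard filename -> vendor-published digest.
name, expected = "model-00001-of-00040.safetensors", "0f3a..."
status = "OK" if sha256_of(Path(name)) == expected else "MISMATCH"
# "OK" answers "did I get the published bytes?" -- not "are the bytes safe?"
print(name, status)
```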
The Architecture Argument Is Separate — And Also Points the Wrong Direction
Even setting provenance aside, the “one monster model does everything” architecture is wrong for sovereign deployments on its own merits.
A 1T-parameter general-purpose model is not more capable than a purpose-built stack for your specific domain. It is more expensive, slower, harder to audit, and impossible to explain to a compliance officer or a court. In defence and regulated environments, explainability is not a nice-to-have. It is a legal and operational requirement.
The correct sovereign AI architecture is not a frontier model doing everything. It is specialised models fine-tuned on narrow domain tasks — document classification, threat summarisation, code review, anomaly detection — coordinated by a mid-size orchestrator that handles routing and planning. Each model is small enough to inspect, fast enough to be operationally useful, and scoped narrowly enough that its failure modes are predictable and bounded.
The orchestrator does not need to be a trillion-parameter beast. It needs to be reliable, deterministic enough to audit, and fast enough not to be the bottleneck. A well-configured 32B model with a disciplined system prompt does this job correctly. The 1T-parameter model is architectural overkill that creates problems it does not solve.
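A minimal sketch of that routing layer, assuming each specialist runs behind its own on-prem inference endpoint; the task taxonomy, addresses, and the `orchestrator` call wrapper are illustrative, not a reference design:

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Specialist:
    endpoint: str     # on-prem inference server owning one narrow task
    max_tokens: int   # bounded output keeps failure modes predictable

SPECIALISTS = {
    "classify_document": Specialist("http://10.0.0.11/v1", 256),
    "summarise_threat":  Specialist("http://10.0.0.12/v1", 1024),
    "review_code":       Specialist("http://10.0.0.13/v1", 2048),
    "detect_anomaly":    Specialist("http://10.0.0.14/v1", 128),
}

ROUTER_PROMPT = (
    "You are a router. Reply with exactly one task name from: "
    + ", ".join(SPECIALISTS) + ". Nothing else."
)

def route(orchestrator, request: str) -> Specialist:
    """Ask the mid-size orchestrator (run at temperature 0 so routing
    decisions are reproducible and auditable) which specialist owns
    the request, then fail closed on anything outside the taxonomy."""
    task = orchestrator(system=ROUTER_PROMPT, user=request, temperature=0).strip()
    if task not in SPECIALISTS:
        raise ValueError(f"unroutable request: {task!r}")  # fail closed, never improvise
    return SPECIALISTS[task]
```

The fixed taxonomy is the point: every request either maps to one bounded, inspectable specialist or is rejected, and the routing decision itself is a single reproducible token rather than an open-ended plan.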
Model Provenance Is the Question Your Vendor Is Not Answering
The sovereign AI conversation in Australia, the UK, and across Five Eyes is accelerating. CISA guidance is tightening. ASD is watching. The question every serious procurement conversation will eventually reach is not “can this model do the task” — it is “can we verify what this model will do, and who is ultimately responsible for its behaviour.”
For that question, model provenance is not one consideration among many. It is the threshold requirement. A model with unverifiable training provenance from a jurisdiction with state-directed technology policy does not belong inside an air-gapped sovereign deployment regardless of its benchmark scores. The benchmark scores are precisely the point — capability without provenance is the threat model, not the solution.
Western-origin open models exist with documented training pipelines, published data provenance, and existing defence-community vetting. Llama 3.x carries Meta provenance and has been through more adversarial scrutiny than any other open-weight family. Mistral operates under EU jurisdiction with French government visibility into its research. These are not as exciting as a trillion-parameter Chinese model that tops agentic benchmarks. They are, however, deployable in environments where the answer to “who built this and what is in it” has to be more than “we don’t know, but the numbers look good.”
The Air-Gap Does Not Protect You from the Model
This is the point that gets missed. Organisations treat air-gapping as the security answer. Run it on our hardware, no internet, problem solved. The air-gap protects you from network-layer exfiltration. It does not protect you from computation that manipulates outputs, embeds patterns in generated artefacts, or behaves differently on inputs that match specific signatures. The threat is inside the weights. The air-gap is irrelevant to that threat.
Sovereign AI means knowing what is running on your data. That requires model provenance you can defend, architecture you can explain, and a vendor relationship with legal accountability in a jurisdiction your security team recognises.
A trillion parameters and an MIT licence is not sovereignty. It is the appearance of control without the substance of it.
Catalin Lichi is the founder of Sugau — a bare-metal Kubernetes consultancy specialising in sovereign infrastructure and private AI for defence and regulated industries. Based in Romania, operating across Australia and Europe. sugau.com