We Left the Cloud. Here Are the Actual Numbers.

Cloud repatriation is no longer a fringe idea. It is a financial decision — and for the organisations that have made it, the spreadsheet rarely lies.

By Catalin Sugau · Sugau Infrastructure


When a CTO asks me about leaving the cloud, the conversation usually starts with anxiety and ends with arithmetic. The anxiety is understandable. A decade of industry consensus has told them that the cloud is inevitable, that on-premise is legacy, that anyone running their own hardware is one hard drive failure away from catastrophe.

The arithmetic tells a different story.

Over the past several years I have worked with organisations across defence, gaming, and enterprise software that made the decision to repatriate workloads from major cloud providers onto bare-metal infrastructure. What follows are two anonymised case studies from that work — the real numbers, the real friction, and what those organisations found on the other side.

What the cloud bill actually contains

Before the numbers, a brief word on cloud billing opacity. Most CTOs can tell you their monthly cloud spend to within a few thousand euros. Very few can tell you precisely what they are paying for. Data egress fees buried in networking costs. Idle reserved instances that were purchased during a capacity spike two years ago and never released. Logging storage that grew silently for eighteen months. A managed Kubernetes control plane fee charged per cluster, per hour, invisibly.

This is not accidental. Cloud pricing is engineered to be difficult to optimise. Simplifying it would reduce revenue. The FinOps industry — an entire professional discipline dedicated to understanding your cloud bill — exists precisely because the vendors have made their own pricing illegible. You are paying a consultant to decode an invoice. That cost never appears in the headline number.

“The cloud bill is not a utility invoice. It is a negotiated treaty written by one side of the table, in a language they invented, subject to change with thirty days' notice.”

Case study A — defence systems integrator, air-gapped production cluster

This case is structurally different from Case B, and it would be dishonest to frame it the same way. There was no cloud bill to compare against. There was no migration from a public provider. The question was never whether to run on bare-metal air-gapped infrastructure — the security mandate made that decision. For workloads of this classification, public cloud is categorically ineligible. The infrastructure decision was never a commercial one.

The real financial question here is not what the infrastructure cost. It is what a breach would have cost.

The organisation handled classified data on behalf of sovereign defence clients. We will say nothing further about the nature of that data — nor should we need to. The classification level alone defines the consequences of a breach: immediate contract termination, mandatory notification of sovereign clients, and near-certain revocation of the security clearances on which the entire business depends.

| Breach consequence | Estimated impact |
| --- | --- |
| Contract value at risk (active + pipeline) | €200M+ |
| Security clearance revocation | Total — one incident = ineligible for future tenders |
| Sovereign client trust | Permanent loss — not recoverable |

This is not a company that would pay a regulatory fine and issue a press release. A serious incident would not cost it money. It would end it. The entire pipeline of future work — built on years of trust, clearance, and domain competence — dissolves with a single verified breach. The business case for correct infrastructure is not return on investment. It is organisational survival.

Why cloud was never considered: Sovereign data residency requirements mandated that classified material never traverse a network segment outside physical organisational control. Public cloud providers — including GovCloud variants — cannot satisfy this requirement by architecture. No encryption standard and no compliance certification is a substitute for an air gap. The mandate came from the client. It was not negotiable.

What was required — and delivered — was an operationally mature air-gapped Kubernetes cluster: fully disconnected from public networks, with all dependencies mirrored internally, and a security posture auditable end-to-end by the client's own compliance team without reliance on any third-party attestation.
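In practice, "all dependencies mirrored internally" means every container image crosses the boundary exactly once, through a controlled transfer, into a registry inside the perimeter. A minimal sketch of that workflow using skopeo — `registry.internal` and the transfer path are illustrative placeholders, not the client's actual topology:

```shell
# On a connected staging host: pull the image once and write it to an archive.
skopeo copy docker://docker.io/library/nginx:1.27 \
    oci-archive:/transfer/nginx-1.27.tar

# The archive crosses the air gap on vetted removable media — no network path.

# Inside the perimeter: load the archive into the internal registry.
skopeo copy oci-archive:/transfer/nginx-1.27.tar \
    docker://registry.internal/library/nginx:1.27
```

The same pattern applies to OS packages, Helm charts, and model weights: one controlled ingress point, everything auditable.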

The value was not cost savings. The value was that the client could finally answer the question their sovereign customers ask at every audit: “Show me, completely, where our data is and who can reach it.” On a well-built air-gapped bare-metal cluster, that question has a complete and demonstrable answer. On any public cloud variant, it does not.

| Risk dimension | Public cloud | Air-gapped bare-metal |
| --- | --- | --- |
| Data exfiltration vector | Network-accessible by design | Physically impossible — no external network path |
| Shared-tenancy risk | Hypervisor-level exposure exists | Dedicated hardware, no co-tenants |
| Audit completeness | Dependent on provider attestation | Full in-house audit, no third-party dependency |
| Regulatory eligibility | Ineligible by sovereign mandate | Fully compliant |
| Business continuity after incident | Near zero | Incident scope containable and fully traceable |

The infrastructure is now a platform, not just a perimeter

What has changed recently — and materially — is what this infrastructure can now do beyond protecting data. The same air-gapped Kubernetes cluster that keeps classified workloads unreachable from the outside world can today run fine-tuned open source large language models on dedicated GPU nodes, entirely within the perimeter. Inference requests never leave the facility. Training data never touches a public API. Agentic AI workflows, built in Go for performance and operational simplicity, can operate across internal systems — querying, reasoning, and acting — without a single packet crossing an external boundary.

For defence and sovereign clients, this is not a convenience. It is the only architecture under which AI capabilities can be deployed at all. The air-gapped cluster was always the security boundary. It is now also the intelligence layer.

“For defence-grade workloads, the infrastructure question is not 'what does it cost to run correctly?' It is 'what does it cost to get it wrong?' That number has nine figures. The infrastructure bill does not.”

This is the conversation cloud vendors are structurally incapable of having with you honestly. They can sell you a compliance badge. They cannot sell you an air gap. And in the sectors where an air gap is the requirement, no badge is a substitute.


Case study B — gaming platform, AWS to bare-metal migration

| Metric | Figure |
| --- | --- |
| Monthly AWS spend before repatriation | €28,000 |
| Monthly infrastructure cost after repatriation (year 1) | €5,200 |
| Monthly saving | 81% |
| Hardware ROI | Achieved in under 8 months |

A mid-size gaming platform had grown its AWS footprint organically over several years — the classic pattern of fast-moving product teams provisioning services without centralised governance. By the time repatriation became a serious conversation, they had active workloads spread across three AWS regions, a significantly over-provisioned managed RDS cluster, and data egress costs that had become a material budget line due to player-facing telemetry volume.

The migration took four months including a parallel-run period. Hardware acquisition was completed in the first month. By month eight, the capital expenditure on hardware had been entirely recovered from the monthly savings differential. From that point forward, the organisation was operationally profitable on infrastructure it owned outright.

The engineering team reported that system behaviour became significantly more predictable once they controlled the full stack — a latency anomaly that had plagued them for over a year disappeared within weeks of migration and was traced to a shared-tenancy network issue on the cloud provider's side.

The numbers the cloud vendors don't show you

The standard objection to repatriation is total cost of ownership: hardware, colocation, power, staffing, maintenance. It is a legitimate consideration presented almost exclusively in bad faith. Cloud vendors commission TCO studies that assume on-premise infrastructure is run by the same headcount as a greenfield cloud deployment, with no economies of scale, and priced at retail hardware rates.

A more honest comparison:

| Cost dimension | Cloud | Bare-metal (year 2+) |
| --- | --- | --- |
| Compute | Variable, elastic, premium-priced | Fixed, predictable, owned |
| Storage | Per-GB/month + retrieval fees | Amortised hardware, no retrieval cost |
| Egress / networking | Charged per GB — often a surprise line | Flat colo bandwidth or near-zero |
| Compliance tooling | Paid add-ons (CSPM, GuardDuty, etc.) | Open source tooling, owned audit trail |
| Billing complexity | Requires dedicated FinOps function | Single invoice, full transparency |
| Vendor lock-in risk | High — proprietary APIs, pricing changes | None — open standards throughout |

What this decision actually requires

Repatriation is not right for every organisation. If you have genuinely unpredictable, spiky workloads with no historical pattern — true burst compute needs measured in hours per year — managed cloud capacity may remain economical for that specific use case.

But for the majority of enterprise workloads — steady-state applications, databases, internal tooling, AI inference, data pipelines — the variability argument is not the real reason organisations stay. The real reason is inertia, accumulated tooling debt, and the understandable fear of operational responsibility.

Bare-metal Kubernetes on owned hardware is not the operational nightmare it was in 2015. The tooling has matured. Bootstrapping a production-grade cluster is a days-long exercise, not a months-long project. The operational surface, precisely because it is smaller and more legible, is easier to maintain than a complex cloud-native architecture with twelve managed services in the dependency graph.
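As an indication of scale rather than a runbook: on prepared hosts with a container runtime and kubelet already installed, standing up a minimal cluster with kubeadm comes down to a handful of commands. Angle-bracket values are placeholders that kubeadm prints for you during init:

```shell
# On the first control-plane node:
kubeadm init --pod-network-cidr=10.244.0.0/16

# Install a CNI plugin (from an internally mirrored manifest if air-gapped),
# then join each worker node using the token kubeadm printed:
kubeadm join <control-plane-ip>:6443 --token <token> \
    --discovery-token-ca-cert-hash sha256:<hash>
```

Hardening, HA control planes, and storage take longer — but "days, not months" is the right order of magnitude for a competent team.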

“The organisations that leave the cloud do not regret the technical complexity of the move. They regret how long they waited to make it.”

Pull your last twelve months of cloud invoices. Identify the ten largest cost lines. For each one, ask whether you are paying for elasticity you actually use, or for the privilege of not having to think about the problem. In most cases, the answer will tell you everything you need to know.
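The exercise can even be mechanised. A minimal sketch in Python, assuming you have exported your invoice lines to CSV — the `service` and `monthly_cost` column names and the figures below are invented for illustration; adjust to your provider's actual export format:

```python
import csv
from io import StringIO

def top_cost_lines(csv_text: str, n: int = 10) -> list[tuple[str, float]]:
    """Return the n largest cost lines as (service, total_cost), descending."""
    totals: dict[str, float] = {}
    for row in csv.DictReader(StringIO(csv_text)):
        # Sum across months so recurring lines surface by cumulative weight.
        totals[row["service"]] = totals.get(row["service"], 0.0) + float(row["monthly_cost"])
    return sorted(totals.items(), key=lambda kv: kv[1], reverse=True)[:n]

# Toy invoice export (figures are invented for illustration).
sample = """service,monthly_cost
EC2,12000
RDS,6500
DataTransfer,4200
EC2,11800
CloudWatch,900
"""

for service, cost in top_cost_lines(sample, n=3):
    print(f"{service}: €{cost:,.0f}")
```

For each line that surfaces, ask the question above: elasticity you use, or convenience you pay for.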


Catalin Sugau is a senior infrastructure engineer and founder of Sugau, a consultancy specialising in bare-metal Kubernetes deployment, cloud repatriation, and private AI infrastructure. sugau.com