Systems Foundations

Virtualization, Bare Metal, and VM Orchestration as Product Infrastructure

How QEMU/KVM, libvirt, OVS, KubeVirt, Firecracker-style microVMs, Linux, bare metal lifecycle, VM orchestration, metering, and billing become the compute substrate beneath larger platform ecosystems.


KVM, QEMU, libvirt, and Open vSwitch virtualization architecture diagram

Context

Bare metal and virtual machines look like infrastructure primitives, but the companies that learned to operate them as products built some of the largest ecosystems in technology. AWS EC2, Google Compute Engine, and Azure Virtual Machines are not just compute services. They are control planes for capacity, lifecycle, identity, networking, storage, metering, billing, support, and the higher-level products that sit on top.

Kubernetes did not replace that substrate. It usually rides on it. Managed Kubernetes, databases, AI platforms, build systems, notebooks, observability products, and enterprise applications depend on the provider first solving machine lifecycle, image lifecycle, tenant boundaries, network attachment, storage attachment, quotas, usage accounting, failure recovery, and billing. If you build on bare metal or VMs, you inherit those responsibilities instead of getting them for free.

Decision Guide

Frame the decision before choosing the architecture.

Decision

Should the product run on VMs, bare metal, Kubernetes, microVMs, or a mix, and what does that mean for customers?

Who It Helps

Platform teams building compute products, private clouds, AI platforms, or customer-specific infrastructure systems.

Proof to Look For

Lifecycle automation, isolation boundaries, metering, billing hooks, workload placement, failure recovery, and operator access paths.

Linux Is the Common Substrate

Linux powers most of this stack because it owns the primitives the platform composes: KVM for hardware virtualization, cgroups and namespaces for isolation, bridges and OVS for networking, block devices and filesystems for storage, systemd for host lifecycle, eBPF and perf tooling for observability, and device drivers for GPUs, NICs, disks, and accelerators.

QEMU, KVM, libvirt, OVS, container runtimes, Kubernetes nodes, Slurm nodes, and many cloud agents are different layers over the same operating substrate. A serious platform has to understand where Linux ends, where the control plane begins, and which layer owns scheduling, isolation, networking, storage, telemetry, and recovery.

Virtualization and Bare Metal Are Product Boundaries

A VM is not only a technical isolation mechanism. It is a product boundary that customers understand: a shape, price, lifecycle, SLA, image, network interface, disk, console, log stream, and support contract. Once you expose that boundary, you need to decide what customers can control and what the platform owns.

Bare metal removes some virtualization overhead and exposes hardware directly, which can be valuable for GPUs, NICs, storage, low-latency systems, compliance, or customers who want full machine ownership. It also removes many guardrails. Firmware, BMC access, disk state, PXE or image boot, secure wipe, hardware failure, rack locality, network ports, and support workflows become part of the product.

The Ecosystem Does Not Exist Until You Build It

A bare metal or VM product needs more than a scheduler. It needs a way to discover inventory, classify hardware, allocate capacity, install or boot hosts, manage firmware and images, attach networking and storage, recover failed machines, rotate credentials, meter usage, and explain charges. Without that system, every customer request becomes a manual operations ticket.

This is why hyperscalers became ecosystems. They did not only rent servers. They built APIs, instance types, identity, VPCs, volumes, snapshots, images, placement, autoscaling, tagging, metering, billing, support, quotas, and eventually managed services on top of the compute substrate.
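
The inventory and allocation steps described in this section can be sketched in a few lines. This is a minimal illustration, not a production design: the `Host` fields, the state names, and the `gpu-8x` class label are all hypothetical, and a real system would persist this state and guard it against concurrent allocation.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Host:
    host_id: str
    host_class: str               # hypothetical shape label, e.g. "gpu-8x"
    rack: str
    healthy: bool = True
    state: str = "available"      # available | allocated | wiping
    tenant: Optional[str] = None


class Inventory:
    """In-memory inventory; a real one would be a durable, audited store."""

    def __init__(self, hosts):
        self._hosts = {h.host_id: h for h in hosts}

    def allocate(self, host_class, tenant):
        # Hand out the first healthy, available host of the requested class.
        for host in self._hosts.values():
            if (host.host_class == host_class and host.healthy
                    and host.state == "available"):
                host.state = "allocated"
                host.tenant = tenant
                return host
        return None  # no capacity: the caller decides whether to queue or reject

    def release(self, host_id):
        # A released host is not reusable until its disks have been wiped.
        host = self._hosts[host_id]
        host.state = "wiping"
        host.tenant = None

    def mark_wiped(self, host_id):
        self._hosts[host_id].state = "available"
```

Even at this size, the sketch shows why allocation cannot be separated from health and wipe state: a host that is unhealthy or still being wiped must not be handed to the next tenant.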

Billing Is Architecture

Usage records are not an afterthought. The billing model decides what the system must observe: allocated time, running time, reserved capacity, GPU class, CPU and memory shape, storage size, network egress, snapshots, IPs, licenses, support tier, or marketplace fees. If the platform cannot meter the thing customers believe they bought, either revenue leaks or billing disputes turn into operational work.

Good infrastructure products connect inventory, allocation, identity, metering, invoices, support, and lifecycle state. That is what makes cloud platforms feel like products instead of a pile of hosts, and it is why products built above EC2, GCE, and Azure VMs can become major revenue streams.
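
To make the difference between allocated time and running time concrete, here is a small sketch that derives both from lifecycle events. The event names (`allocated`, `running`, `stopped`, `released`) and the `LifecycleEvent` type are assumptions made for illustration; a real metering pipeline would also need idempotent ingestion, clock handling, and late-event correction.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LifecycleEvent:
    resource_id: str
    timestamp: int   # seconds since epoch
    state: str       # "allocated", "running", "stopped", or "released"


def usage_seconds(events, period_end):
    """Return (allocated_seconds, running_seconds) for one resource."""
    allocated = running = 0
    alloc_start = run_start = None
    for ev in sorted(events, key=lambda e: e.timestamp):
        if ev.state == "allocated" and alloc_start is None:
            alloc_start = ev.timestamp
        elif ev.state == "running" and run_start is None:
            run_start = ev.timestamp
        elif ev.state == "stopped" and run_start is not None:
            running += ev.timestamp - run_start
            run_start = None
        elif ev.state == "released" and alloc_start is not None:
            if run_start is not None:          # release implies stop
                running += ev.timestamp - run_start
                run_start = None
            allocated += ev.timestamp - alloc_start
            alloc_start = None
    # Intervals still open at the end of the billing period are cut off there.
    if run_start is not None:
        running += period_end - run_start
    if alloc_start is not None:
        allocated += period_end - alloc_start
    return allocated, running
```

Whether the invoice uses the first or the second number is a contract decision, which is why the event model has to record both.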

Comparison

What You Have to Build to Make Bare Metal or VMs a Product

The core work is not only creating machines. It is creating the control plane, lifecycle system, and commercial evidence needed for customers to trust, operate, and pay for them.

Layer	What It Must Do	Why It Matters
Inventory	Track hosts, parts, GPU class, NICs, disks, rack, power, firmware, health, and ownership.	You cannot allocate, repair, price, or support capacity you cannot describe.
Provisioning	Install, image, boot, configure, validate, and hand off bare metal or VM instances repeatedly.	Manual provisioning turns every customer launch into a services engagement.
Lifecycle	Create, start, stop, resize, rebuild, drain, recover, wipe, retire, and reassign resources.	Customers buy an operating model, not just access to a machine.
Networking	Attach IPs, VLANs, VPC-like boundaries, firewalls, SR-IOV, RDMA, load balancing, DNS, and routing policy.	Most customer failures show up as reachability, performance, isolation, or policy problems.
Storage	Attach disks, images, snapshots, shared filesystems, object storage, backup, and secure deletion.	Compute products become useful when state and recovery are predictable.
Metering and billing	Record usage, reservations, support tiers, credits, quotas, limits, invoices, and revenue reporting.	A platform cannot scale commercially if finance and operations disagree about consumption.
Higher-level products	Host Kubernetes, Slurm, databases, AI serving, build systems, notebooks, and managed services on the substrate.	The base compute layer becomes the launchpad for larger ecosystems and recurring revenue streams.
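
The Lifecycle row is easier to reason about as an explicit state machine. The sketch below is one possible shape, with hypothetical state names; the point is that transitions are declared in one table and anything outside it is rejected.

```python
# Hypothetical lifecycle states and the transitions allowed between them.
ALLOWED = {
    "provisioning": {"validating", "failed"},
    "validating": {"available", "failed"},
    "available": {"allocated", "draining"},
    "allocated": {"draining", "failed"},
    "draining": {"wiping"},
    "failed": {"repair"},
    "repair": {"validating", "retired"},
    "wiping": {"validating", "retired"},
    "retired": set(),
}


class InvalidTransition(Exception):
    """Raised when a requested state change is not in the ALLOWED table."""


def transition(current, target):
    # Return the new state, or refuse a change the table does not list.
    if target not in ALLOWED.get(current, set()):
        raise InvalidTransition(f"{current} -> {target} is not allowed")
    return target
```

In this table an allocated resource cannot return to `available` directly. It has to pass through draining, wiping, and validation, which is the kind of guarantee secure deprovisioning depends on.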

What to Understand

Bare metal and VMs are not just resources; they are product contracts for lifecycle, isolation, support, pricing, and recovery.
Kubernetes, Slurm, databases, notebooks, and AI products usually sit above a lower compute substrate that still needs ownership.
The missing ecosystem is the hard part: inventory, provisioning, networking, storage, identity, metering, billing, support, and failure recovery.
Virtualization is a product boundary as much as an infrastructure boundary: it affects isolation, upgrades, debugging, billing, support, and what customers believe they are buying.
QEMU/KVM and libvirt give broad VM flexibility, but that architecture only holds up in production when CPU topology, memory backing, PCIe devices, OVS networking, image lifecycle, and host maintenance are explicit.
KubeVirt lets VMs live inside Kubernetes workflows, which is useful when teams need VM semantics with Kubernetes scheduling, GitOps, policy, service discovery, and operator patterns.
Firecracker-style microVMs are attractive for fast, strongly isolated CPU and memory workloads, but they do not provide QEMU-style VFIO passthrough and should not be treated as the answer for direct GPU or NIC ownership.
GPU and NIC passthrough belongs in the QEMU/KVM, libvirt, or bare-metal decision path; it can preserve performance, but it moves risk into firmware, IOMMU groups, NUMA placement, reset behavior, live migration expectations, and failure recovery.
Full passthrough outside Kubernetes can be the right answer when the workload needs simpler device ownership, lower orchestration overhead, predictable PCIe locality, direct scheduler control, or fewer abstraction layers between the job and hardware.
Slurm can live outside Kubernetes for classic supercomputing scheduling, inside Kubernetes through integration layers, or beside Kubernetes as a separate control plane; the right choice depends on job shape, tenancy, accounting, operator model, and how much Kubernetes should own.
Nested control planes can fight each other. A VM manager, Kubernetes, KubeVirt, batch scheduler, model server, network plugin, and storage client may all believe they own placement and recovery.
The right abstraction depends on the workload: developer sandbox, tenant VM, microVM job runner, bare-metal training node, edge appliance, or long-lived service host.
Isolation should be explicit across identity, data, credentials, topology, logs, management-plane visibility, and who can reach the guest or host control socket.
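
The IOMMU group risk mentioned above can be inspected directly on a Linux host. The kernel exposes groups under `/sys/kernel/iommu_groups/<group>/devices/`, and with VFIO the group, not the individual device, is the unit of assignment. The helper names below are hypothetical, and the sketch only reads sysfs; it does not bind or assign anything.

```python
import os


def iommu_groups(root="/sys/kernel/iommu_groups"):
    """Map each IOMMU group number to the PCI addresses it contains."""
    groups = {}
    if not os.path.isdir(root):
        return groups  # IOMMU disabled or not a Linux host
    for group in sorted(os.listdir(root)):
        devices_dir = os.path.join(root, group, "devices")
        if os.path.isdir(devices_dir):
            groups[group] = sorted(os.listdir(devices_dir))
    return groups


def group_mates(groups, pci_address):
    """Return the other devices that share a group with pci_address."""
    for devices in groups.values():
        if pci_address in devices:
            return [d for d in devices if d != pci_address]
    raise KeyError(pci_address)
```

If `group_mates` returns anything for a GPU or NIC you plan to pass through, those devices travel with it, which is worth knowing before the shape is sold.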

KubeVirt virtual machine running on Kubernetes

Common Failure Modes

The team builds a scheduler but not provisioning, repair, billing, quota, support evidence, or secure deprovisioning.
VMs are exposed as a product before image lifecycle, networking, storage, metering, and failure recovery are repeatable.
Bare metal becomes a manual operations queue because firmware, BMC, disk wipe, inventory, and validation loops are not automated.
Billing is added late, so usage records do not match customer contracts, resource ownership, or finance reporting.
The VM boundary hides topology and makes performance bugs look like application bugs instead of host, PCIe, NUMA, network, or storage placement issues.
Teams adopt KubeVirt because Kubernetes is familiar, then forget that VM boot, image import, guest agents, migration, storage classes, and node drains fail differently from pods.
Kubernetes is forced into the control path for workloads that would be easier to operate with full passthrough, direct host ownership, or a Slurm-first scheduling model.
Slurm and Kubernetes both try to own placement, accounting, lifecycle, or recovery without a clear boundary for who schedules GPUs, drains nodes, handles failed jobs, and reports utilization.
Firecracker or microVM isolation is treated as a drop-in answer for GPU or NIC workloads even though it is not a VFIO passthrough runtime and its networking, storage, debugging, snapshotting, and observability paths have different constraints.
GPU reset, live migration expectations, or host drain behavior are not tested before customer workloads depend on them.
NUMA, hugepages, CPU pinning, interrupt placement, tap devices, OVS bridges, and storage paths are left to defaults on latency-sensitive workloads.
Isolation requirements are discussed after the architecture already assumes shared hosts, broad credentials, privileged agents, or a control plane with too much reach.
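
One way to catch the "left to defaults" failure is to check a libvirt domain definition for the settings that should be explicit. The sketch below parses domain XML (for example, the output of `virsh dumpxml`) and reports which of four elements are missing. The function name and the choice of checks are assumptions, and a missing element is only a prompt for review, not proof of a misconfiguration.

```python
import xml.etree.ElementTree as ET


def topology_gaps(domain_xml):
    """List topology settings that a libvirt domain XML leaves implicit."""
    root = ET.fromstring(domain_xml)
    gaps = []
    if root.find("./memoryBacking/hugepages") is None:
        gaps.append("no hugepage memory backing")
    if not root.findall("./cputune/vcpupin"):
        gaps.append("no vCPU pinning")
    if root.find("./numatune/memory") is None:
        gaps.append("no NUMA memory policy")
    if root.find("./cpu/topology") is None:
        gaps.append("no explicit guest CPU topology")
    return gaps


example = """
<domain type='kvm'>
  <name>demo</name>
  <memoryBacking><hugepages/></memoryBacking>
  <vcpu placement='static'>2</vcpu>
  <cputune>
    <vcpupin vcpu='0' cpuset='2'/>
    <vcpupin vcpu='1' cpuset='3'/>
  </cputune>
</domain>
"""

for gap in topology_gaps(example):
    print(gap)
```

Running a check like this in CI for latency-sensitive shapes turns a tuning convention into something testable.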

What Good Looks Like

Every machine or VM has a clear owner, lifecycle state, image, network, storage, health, metering, billing, and support evidence.
Customers can consume repeatable shapes through APIs or workflows instead of one-off operator actions.
The platform can host higher-level products like Kubernetes, Slurm, AI serving, and databases without losing track of cost, ownership, and failure domains.
The design states which runtime owns which job: QEMU/libvirt for flexible VMs and passthrough, KubeVirt for Kubernetes-native VM lifecycle, Firecracker-style microVMs for fast isolated non-VFIO units, Slurm for supercomputing-style batch work, and bare metal when abstraction adds risk.
Full passthrough paths are documented as a first-class option when direct device ownership, deterministic locality, and simpler failure recovery matter more than Kubernetes-native lifecycle control.
Slurm and Kubernetes boundaries are explicit: which system schedules jobs, owns quotas and accounting, handles node drains, exposes logs, and reports GPU utilization.
Topology-sensitive settings are visible and testable: CPU layout, memory backing, PCIe devices, NIC queues, OVS paths, hugepages, storage class, guest image, and node placement.
Kubernetes and VM orchestration boundaries are clear: scheduling, admission, networking, storage, migration, health, rollback, and evidence collection each have an owner.
Operational playbooks cover host maintenance, stuck devices, guest failure, image updates, VM migration limits, node drains, and what evidence to collect before rebooting or rescheduling.
Bare metal remains an explicit option when virtualization would hide hardware behavior, complicate passthrough, or create more customer risk than value.
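
The "each has an owner" requirement can be checked mechanically once every control plane declares what it claims. The sketch below assumes a hypothetical declaration format (a mapping from control plane name to the set of responsibilities it claims) and reports responsibilities with no owner and responsibilities with more than one.

```python
def ownership_conflicts(claims, responsibilities):
    """Return (unowned, contested) for the given ownership claims.

    claims: dict mapping control plane name -> set of claimed responsibilities.
    responsibilities: every responsibility that must have exactly one owner.
    """
    owners = {r: [] for r in responsibilities}
    for plane, claimed in claims.items():
        for r in claimed:
            owners.setdefault(r, []).append(plane)
    unowned = sorted(r for r, planes in owners.items() if not planes)
    contested = {r: sorted(planes) for r, planes in owners.items()
                 if len(planes) > 1}
    return unowned, contested


unowned, contested = ownership_conflicts(
    claims={
        "slurm": {"gpu_scheduling", "accounting"},
        "kubernetes": {"gpu_scheduling"},
    },
    responsibilities=["gpu_scheduling", "node_drain", "accounting"],
)
print(unowned)    # ['node_drain']
print(contested)  # {'gpu_scheduling': ['kubernetes', 'slurm']}
```

An empty result for both values is the target state: no gaps, and no two schedulers that both believe they place GPUs.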

Quick Diagnostic

What is the unit of sale or operation: bare-metal host, VM shape, reserved capacity, GPU slice, tenant cluster, or managed platform surface?
Can the platform prove one source of truth for inventory, allocation, lifecycle state, metering, billing, support, and ownership?
Which layer owns Linux host configuration, firmware and BMC state, images, networking, storage, identity, quota, repair, and deprovisioning?
Which runtime is responsible for the workload: QEMU/libvirt VM, KubeVirt VM, Firecracker-style non-VFIO microVM, container, Slurm node, or bare metal?


Evidence to Look For

Inventory schema covering host class, GPU, NIC, disk, rack, power, firmware, BMC, health, lifecycle state, and customer or tenant ownership.
Provisioning and image-lifecycle record showing boot path, OS image, kernel, drivers, network attachment, storage attachment, validation checks, and handoff state.
Metering event model that maps resource allocation and runtime state to invoices, credits, quotas, alerts, support records, and revenue reporting.
Guest CPU layout, memory backing, hugepage use, device locality, NIC queues, OVS bridge path, storage class, and guest image notes.


Protected Preview

Control-plane schema reviews for inventory, allocation, lifecycle, billing, and support state.
Metering and billing model walkthroughs for bare metal, VM, storage, network, support, and reservation products.
Host lifecycle runbooks for provisioning, firmware, BMC, secure wipe, repair, drains, and reassignment.
QEMU/KVM/libvirt topology review examples.


Virtualization stack diagram for VM architecture

Further Resources

AI Infrastructure: Use this to decide where virtualization fits in the larger AI platform.
Kubernetes & GitOps: Use this for the platform layer that often runs on top of bare metal and VM substrates.
On-Prem AI: Use this for enterprise deployment and isolation constraints.
Rust Systems Automation: Use this for host validation and operator tooling around VM fleets.
Hermetic Build Workflows: Use this for repeatable tooling, host configuration, and validation workflows around platform operations.

Apply to a Decision

Apply this to a product, infrastructure, or diligence decision.

If this resource matches a decision you need to make, these services turn the framework into a review, roadmap, validation plan, or risk assessment for a specific environment.

Hardware Infrastructure: Design VM, bare-metal, lifecycle, metering, and isolation paths that match the platform business model.
Engineering Leadership: Use Staff Engineer judgment before compute substrate decisions harden into product constraints.

Private Resources

Host configs, topology diagrams, passthrough notes, customer isolation reviews, billing schemas, and lifecycle runbooks stay in the protected area.
